csv dump of answer sets #172

Open
karafecho opened this issue Jul 24, 2023 · 12 comments

@karafecho

This issue is to suggest that we add a feature to allow users to download csv dumps of answer sets and individual results, in addition to JSON dumps.

@karafecho
Author

Update: More than one external user has requested a csv or txt dump of results. I strongly suggest that we prioritize this request.

@karafecho
Author

Note that one reason we have not offered non-JSON formats is that the conversion to csv or txt, for example, will result in a very messy, non-readable file due to the many nested attributes and node/edge properties. We may want to simplify the output by providing, for example, subject + predicate + object + primary/aggregator knowledge source + publications + statements. Of course, we will need to confirm that a simplified output file is acceptable to users.
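A minimal sketch of the simplified output described above, assuming edges follow the TRAPI 1.4+ layout (a `sources` list with `resource_id`/`resource_role`, and publications stored in an attribute whose `attribute_type_id` is `biolink:publications`). The function names, column names, the `|` separator, and the sample data are hypothetical. The "statements" field is left out because the thread does not say which attribute holds it.

```python
import csv
import io

# Hypothetical column names for the simplified export.
COLUMNS = [
    "subject",
    "predicate",
    "object",
    "primary_knowledge_source",
    "aggregator_knowledge_sources",
    "publications",
]


def edge_to_row(edge):
    """Flatten one TRAPI-style edge dict into a flat dict of strings."""
    sources = edge.get("sources") or []
    primary = [s["resource_id"] for s in sources
               if s.get("resource_role") == "primary_knowledge_source"]
    aggregators = [s["resource_id"] for s in sources
                   if s.get("resource_role") == "aggregator_knowledge_source"]
    publications = []
    for attribute in edge.get("attributes") or []:
        if attribute.get("attribute_type_id") == "biolink:publications":
            value = attribute.get("value") or []
            publications.extend(value if isinstance(value, list) else [value])
    return {
        "subject": edge["subject"],
        "predicate": edge["predicate"],
        "object": edge["object"],
        # Multi-valued fields are joined with "|" (an arbitrary choice).
        "primary_knowledge_source": "|".join(primary),
        "aggregator_knowledge_sources": "|".join(aggregators),
        "publications": "|".join(str(p) for p in publications),
    }


def edges_to_csv(knowledge_graph):
    """Return CSV text with one row per edge in the knowledge graph."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for edge in knowledge_graph.get("edges", {}).values():
        writer.writerow(edge_to_row(edge))
    return buffer.getvalue()


# Placeholder data for illustration only; identifiers are not real.
SAMPLE_KG = {
    "edges": {
        "e0": {
            "subject": "EXAMPLE:0001",
            "predicate": "biolink:treats",
            "object": "EXAMPLE:0002",
            "sources": [
                {"resource_id": "infores:example-primary",
                 "resource_role": "primary_knowledge_source"},
                {"resource_id": "infores:example-aggregator",
                 "resource_role": "aggregator_knowledge_source"},
            ],
            "attributes": [
                {"attribute_type_id": "biolink:publications",
                 "value": ["PMID:0000001", "PMID:0000002"]},
            ],
        }
    }
}

print(edges_to_csv(SAMPLE_KG))
```

Everything else on the edge (other attributes, qualifiers, upstream provenance) is dropped, which is the trade-off users would need to accept.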

@EvanDietzMorris

EvanDietzMorris commented Sep 18, 2024

To add to the messiness point: it's not just the attributes that are actually nested that make it messy. TRAPI results in general are made up of many dictionaries. Even primary and aggregator knowledge sources are represented in a dictionary format that would be complicated to translate into CSV in a meaningful way. We usually have very simple examples, but the TRAPI spec allows for complex chains of provenance. In the latest version of Plater (not yet deployed) we will also have support/aux graphs representing subclass edges, which are referenced by other results in a way that is easy to read in TRAPI/json but would be difficult to represent in CSV.

In short, it'd be easy to spit out very simplified versions of results, but there is A LOT of stuff in TRAPI that is easier to read and understand in json, and would require quite a bit of work and design to transform into CSV. The compromise would be to include json strings as values inside the csv, which somewhat defeats the point.
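To illustrate the compromise, here is a rough sketch that stores the whole `sources` list as a JSON string in a single CSV cell. The function names, the `sources_json` column, and the sample edge are hypothetical. The nesting survives the round trip, but only for a reader who parses the cell as JSON again.

```python
import csv
import io
import json


def write_rows_with_json_cell(edges):
    """Write one row per edge, keeping nested provenance as a JSON string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["edge_id", "subject", "predicate", "object", "sources_json"])
    for edge_id, edge in edges.items():
        writer.writerow([
            edge_id,
            edge["subject"],
            edge["predicate"],
            edge["object"],
            # The csv module quotes this cell and doubles the embedded quotes.
            json.dumps(edge.get("sources", [])),
        ])
    return buffer.getvalue()


def read_sources_back(csv_text):
    """Recover the nested sources; requires JSON parsing of the cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["edge_id"]: json.loads(row["sources_json"]) for row in reader}


# Placeholder edge with a two-step provenance chain (illustrative only).
SAMPLE_EDGES = {
    "e0": {
        "subject": "EXAMPLE:0001",
        "predicate": "biolink:treats",
        "object": "EXAMPLE:0002",
        "sources": [
            {"resource_id": "infores:example-primary",
             "resource_role": "primary_knowledge_source"},
            {"resource_id": "infores:example-aggregator",
             "resource_role": "aggregator_knowledge_source",
             "upstream_resource_ids": ["infores:example-primary"]},
        ],
    }
}

csv_text = write_rows_with_json_cell(SAMPLE_EDGES)
print(csv_text)
```

In a spreadsheet, the `sources_json` cell shows up as one long quoted string, so a user who wants to avoid JSON gains little from it.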

@karafecho
Author

FWIW, I asked one of the external users who requested a csv or tsv dump if the minimal fields I suggested above would be sufficient to support this person's needs. Will update this ticket after receiving a response.

@karafecho
Author

Additional feedback from external user:

"All I need is the subject, predicate, object and publication. The other fields would be nice. I am not sure if you have the time, but maybe you could add a ‘check box’ for the attributes to include."
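One way the "check box" idea could work is sketched below: the four requested fields are always exported, and each checked optional field adds a column. The names `REQUIRED`, `OPTIONAL`, and `build_csv`, and the particular optional fields, are assumptions made for illustration. They are not taken from the existing UI.

```python
import csv
import io

# Fields the external user said are needed in every export.
REQUIRED = ("subject", "predicate", "object", "publications")
# Hypothetical optional fields a user could tick in the UI.
OPTIONAL = ("primary_knowledge_source", "score")


def build_csv(rows, checked=()):
    """Export the required columns plus any checked optional columns."""
    unknown = set(checked) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown optional columns: {sorted(unknown)}")
    fieldnames = list(REQUIRED) + [c for c in OPTIONAL if c in checked]
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that were not selected;
    # restval fills in selected columns missing from a row.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder row for illustration only.
SAMPLE_ROWS = [
    {
        "subject": "EXAMPLE:0001",
        "predicate": "biolink:treats",
        "object": "EXAMPLE:0002",
        "publications": "PMID:0000001",
        "primary_knowledge_source": "infores:example-primary",
        "score": 0.87,
    }
]

print(build_csv(SAMPLE_ROWS))
print(build_csv(SAMPLE_ROWS, checked=["score"]))
```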

@ntuck

ntuck commented Oct 16, 2024

Would it be possible to download just what is seen on the face of the table?

@EvanDietzMorris

Yup, we could do something like that.

@ntuck

ntuck commented Jan 15, 2025

I tried to write up what a CSV would achieve:

A CSV file of query results would complement the user interface. Currently, the table makes results navigable but offers no easy way to transfer the information to secondary analysis. A downloadable CSV file of query responses would let users who are not familiar with parsing JSON files or TRAPI queries save the KG response. Secondary analysis in this case might include generating figures that summarize the graph using the built-in scores/rankings, or analyzing response frequency. A simple download of the table shown on the results screen would let users save the response generated at a given moment and would facilitate their own additional analysis. Although the table on the results page does not capture the full breadth and depth of the graph, a user would likely be interested primarily in the broad strokes of the information presented in the table, and could use the tools built into the results page to analyze the particular components of the response that are relevant.
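As a rough illustration of the "response frequency" kind of secondary analysis, the sketch below counts how often each value appears in one column of a downloaded file. The column names and sample rows are placeholders, not the actual layout of the results table.

```python
import csv
import io
from collections import Counter


def count_column(csv_text, column):
    """Count how often each value occurs in one column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Placeholder export for illustration only.
SAMPLE_CSV = """subject,predicate,object
EXAMPLE:0001,biolink:treats,EXAMPLE:0101
EXAMPLE:0002,biolink:treats,EXAMPLE:0101
EXAMPLE:0003,biolink:affects,EXAMPLE:0102
"""

for value, count in count_column(SAMPLE_CSV, "predicate").most_common():
    print(value, count)
```

The same counts could feed a bar chart or a spreadsheet pivot table, with no TRAPI parsing needed on the user's side.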

@karafecho
Author

I agree with all of Nyssa's arguments. I'll add that many of our users are simply more familiar with tabular data than with graphs, which are admittedly challenging to navigate and interpret, especially when exploring results from complex queries. The users who have requested csv dumps are fine with losing some of the richness of "knowledge" captured in the KG as a trade-off for gaining more flexibility and the ability to conduct secondary analysis. While the UI can be modified to address some of these users' desires, e.g., by providing a tool to facilitate JSON to CSV conversion, I'm not convinced that the UI will ever be sufficient to address the desires of users who simply prefer tabular data.

@Woozl
Member

Woozl commented Feb 4, 2025

Names have been added to the CSV download, replacing the CURIEs. Chris mentioned it is probably worth having columns for both the names and the CURIEs: RobokopU24/qgraph#295

@karafecho
Author

Any chance we can add names and not replace the CURIEs? Both are helpful. :-)

@karafecho
Author

Update:

  • node CURIEs + names, scores, publications added
  • predicates will be added to complete the minimal viable product
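For readers curious how names and CURIEs can sit side by side, here is a minimal sketch that builds one row per result from a TRAPI-style message. It assumes the TRAPI 1.4+ layout (`node_bindings` on each result, `score` on each entry of `analyses`). The column naming and the choice of the highest analysis score are assumptions, and this is not the qgraph implementation.

```python
def result_rows(message):
    """Build one flat dict per result with CURIE, name, and score columns."""
    nodes = message.get("knowledge_graph", {}).get("nodes", {})
    rows = []
    for result in message.get("results", []):
        row = {}
        for qnode_id, bindings in sorted(result.get("node_bindings", {}).items()):
            curies = [binding["id"] for binding in bindings]
            # Look up each bound CURIE's name; fall back to an empty string.
            names = [nodes.get(curie, {}).get("name") or "" for curie in curies]
            row[f"{qnode_id}_curie"] = "|".join(curies)
            row[f"{qnode_id}_name"] = "|".join(names)
        scores = [analysis["score"] for analysis in result.get("analyses", [])
                  if analysis.get("score") is not None]
        # Assumption: report the highest score when several analyses exist.
        row["score"] = max(scores) if scores else ""
        rows.append(row)
    return rows


# Placeholder message for illustration only; identifiers are not real.
SAMPLE_MESSAGE = {
    "knowledge_graph": {
        "nodes": {
            "EXAMPLE:0001": {"name": "example drug"},
            "EXAMPLE:0002": {"name": "example disease"},
        }
    },
    "results": [
        {
            "node_bindings": {
                "n0": [{"id": "EXAMPLE:0001"}],
                "n1": [{"id": "EXAMPLE:0002"}],
            },
            "analyses": [{"score": 0.42}, {"score": 0.87}],
        }
    ],
}

for row in result_rows(SAMPLE_MESSAGE):
    print(row)
```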
