csv dump of answer sets #172
Comments
Update: More than one external user has requested a csv or txt dump of results. I strongly suggest that we prioritize this request. |
Note that one reason we have not offered non-JSON formats is that the conversion to csv or txt, for example, will result in a very messy, non-readable file due to the many nested attributes and node/edge properties. We may want to simplify the output by providing, for example, subject + predicate + object + primary/aggregator knowledge source + publications + statements. Of course, we will need to confirm that a simplified output file is acceptable to users. |
To add to the messiness point, it's not just the nested attributes that make it messy; TRAPI results in general are made up of many dictionaries. Even primary and aggregator knowledge sources are represented in a dictionary format that would be complicated to translate into CSV in a meaningful way. We usually have very simple examples, but the TRAPI spec allows for complex chains of provenance. In the latest version of Plater (not yet deployed) we will also have support/aux graphs representing subclass edges, which are referenced by other results in a way that is easy to read in TRAPI/JSON but would be difficult to represent in CSV. In short, it'd be easy to spit out very simplified versions of results, but there is A LOT of stuff in TRAPI that is easier to read and understand in JSON and would require quite a bit of work and design to transform into CSV. The compromise would be to include JSON strings as values inside the CSV, which somewhat defeats the point. |
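For illustration, here is a rough sketch of what the "very simplified" flattening could look like: one row per knowledge-graph edge with subject, predicate, object, knowledge sources, and publications. It assumes a TRAPI 1.4-style message in which each edge carries a `sources` list (with `resource_id` and `resource_role`) and publications stored as an attribute with `attribute_type_id` of `biolink:publications`. The function names, column names, and the `|` separator inside a cell are assumptions made for this example, not anything taken from Plater or qgraph.

```python
import csv
import io

# Column names are hypothetical; they mirror the minimal fields proposed above.
COLUMNS = [
    "subject",
    "predicate",
    "object",
    "primary_knowledge_source",
    "aggregator_knowledge_sources",
    "publications",
]


def edge_to_row(edge):
    """Flatten one TRAPI-style edge dict into a flat dict of strings."""
    sources = edge.get("sources") or []
    primary = [
        s["resource_id"]
        for s in sources
        if s.get("resource_role") == "primary_knowledge_source"
    ]
    aggregators = [
        s["resource_id"]
        for s in sources
        if s.get("resource_role") == "aggregator_knowledge_source"
    ]
    publications = []
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == "biolink:publications":
            value = attr.get("value") or []
            # The value may be a single string or a list of strings.
            publications.extend([value] if isinstance(value, str) else value)
    return {
        "subject": edge["subject"],
        "predicate": edge["predicate"],
        "object": edge["object"],
        # Multi-valued fields are joined with "|" (an assumption of this sketch).
        "primary_knowledge_source": "|".join(primary),
        "aggregator_knowledge_sources": "|".join(aggregators),
        "publications": "|".join(publications),
    }


def edges_to_csv(message):
    """Return CSV text with one row per edge in message['knowledge_graph']."""
    edges = (message.get("knowledge_graph") or {}).get("edges") or {}
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for edge in edges.values():
        writer.writerow(edge_to_row(edge))
    return buffer.getvalue()
```

Note what this drops: upstream provenance chains, support/aux graphs, and every attribute other than publications. That is exactly the trade-off described above, so a flattening like this only makes sense if users confirm the reduced output is acceptable.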
FWIW, I asked one of the external users who requested a csv or tsv dump if the minimal fields I suggested above would be sufficient to support this person's needs. Will update this ticket after receiving a response. |
Additional feedback from external user: "All I need is the subject, predicate, object and publication. The other fields would be nice. I am not sure if you have the time, but maybe you could add a ‘check box’ for the attributes to include." |
Would it be possible to download just what is seen on the face of the table? |
Yup, we could do something like that. |
I tried to write up what a CSV would achieve: A CSV file of results from a query would complement the user interface. In its current state, the table makes results navigable but offers no easy way to transfer the information to secondary analysis. A downloadable CSV file of query responses would cache the KG response for a user not acquainted with parsing JSON files or TRAPI queries. Secondary analysis in this case may include generating figures that summarize the graph using the built-in scores/rankings, or analyzing response frequency. A simple download of the table presented on the results screen would give users the comfort of saving the response generated at a given moment, and would facilitate additional analysis by the end user. Although the table on the results page does not capture the full breadth and depth of the graph, a user would likely be interested primarily in the broad strokes of the information presented in the table and could use the tools built into the results page for additional analysis of the relevant components of the response. |
I agree with all of Nyssa's arguments. I'll add that many of our users are simply more familiar with tabular data than graphs, which are admittedly challenging to navigate and interpret, especially when exploring results from complex queries. Those users who have requested CSV dumps are fine with losing some of the richness of "knowledge" captured in the KG as a trade-off for gaining more flexibility and the ability to conduct secondary analysis. While the UI can be modified to address some of these users' desires, e.g., by providing a tool to facilitate JSON to CSV conversion, I'm not convinced that the UI will ever be sufficient to address the desires of users who simply prefer tabular data. |
Names have been added to the CSV download, replacing the CURIEs. Chris mentioned it is probably worth having columns for both the names and the CURIEs: RobokopU24/qgraph#295 |
Any chance we can add names and not replace the CURIEs? Both are helpful. :-) |
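As a sketch of what keeping both could look like, the helper below adds `subject_name` and `object_name` columns next to the existing CURIE columns instead of overwriting them. It assumes rows are plain dicts keyed by `subject` and `object`, and that names come from a TRAPI-style `knowledge_graph.nodes` mapping of CURIE to node dict. The function name and column names are hypothetical and are not taken from the qgraph implementation.

```python
def add_names(row, nodes):
    """Return a copy of row with name columns added beside the CURIE columns.

    row:   dict with at least "subject" and "object" CURIEs.
    nodes: mapping of CURIE -> node dict, where a node may have a "name".
    """
    named = dict(row)  # copy so the original row is left untouched
    for key in ("subject", "object"):
        node = nodes.get(row[key]) or {}
        # Leave the name blank when the node is missing or unnamed,
        # so the CURIE column stays the single source of identity.
        named[f"{key}_name"] = node.get("name") or ""
    return named
```

Keeping the CURIE column means the downloaded rows can still be joined back to the JSON response or to other identifier-based resources, while the name column keeps the file readable in a spreadsheet.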
This issue is to suggest that we add a feature to allow users to download csv dumps of answer sets and individual results, in addition to JSON dumps.