csv dump of answer sets #172

karafecho · 2023-07-24T12:40:23Z

This issue is to suggest that we add a feature to allow users to download csv dumps of answer sets and individual results, in addition to JSON dumps.

karafecho · 2024-09-18T18:28:44Z

Update: More than one external user has requested a csv or txt dump of results. I strongly suggest that we prioritize this request.

karafecho · 2024-09-18T18:38:18Z

Note that one reason we have not offered non-JSON formats is that the conversion to csv or txt, for example, will result in a very messy, non-readable file due to the many nested attributes and node/edge properties. We may want to simplify the output by providing, for example, subject + predicate + object + primary/aggregator knowledge source + publications + statements. Of course, we will need to confirm that a simplified output file is acceptable to users.
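To make the simplified output concrete, here is a minimal sketch of flattening edges into one row each. It assumes edges shaped like a TRAPI 1.4+ knowledge graph (a `sources` list with `resource_role`, and publications carried in an attribute whose `attribute_type_id` is `biolink:publications`). The sample edge, the CURIEs, the `infores` IDs, the PMIDs, the function names, and the `|` separator are all made up for illustration; statements are left out.

```python
import csv
import io

# Hypothetical, hand-written edge; none of these identifiers are real data.
EDGES = {
    "e0": {
        "subject": "EXAMPLE:drug1",
        "predicate": "biolink:treats",
        "object": "EXAMPLE:disease1",
        "sources": [
            {"resource_id": "infores:example-primary",
             "resource_role": "primary_knowledge_source"},
            {"resource_id": "infores:example-aggregator",
             "resource_role": "aggregator_knowledge_source",
             "upstream_resource_ids": ["infores:example-primary"]},
        ],
        "attributes": [
            {"attribute_type_id": "biolink:publications",
             "value": ["PMID:0000001", "PMID:0000002"]},
        ],
    }
}


def primary_source(edge):
    """Return the resource_id of the primary knowledge source, or ""."""
    for source in edge.get("sources") or []:
        if source.get("resource_role") == "primary_knowledge_source":
            return source.get("resource_id", "")
    return ""


def publications(edge):
    """Collect publication IDs from any biolink:publications attributes."""
    pubs = []
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == "biolink:publications":
            value = attr.get("value")
            pubs.extend(value if isinstance(value, list) else [value])
    return pubs


def edges_to_csv(edges):
    """Write one CSV row per edge; multiple publications are joined with '|'."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object",
                     "primary_knowledge_source", "publications"])
    for edge in edges.values():
        writer.writerow([edge["subject"], edge["predicate"], edge["object"],
                         primary_source(edge), "|".join(publications(edge))])
    return buf.getvalue()


print(edges_to_csv(EDGES))
```

Anything that does not fit a flat column (aggregator chains, other attributes) is simply dropped here, which is the trade-off we would need to confirm with users.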

EvanDietzMorris · 2024-09-18T18:47:22Z

To add to the messiness point: it's not just the nested attributes that make the output messy. TRAPI results in general are made up of many dictionaries. Even primary and aggregator knowledge sources are represented in a dictionary format that would be complicated to translate into CSV in a meaningful way. Our examples are usually very simple, but the TRAPI spec allows for complex chains of provenance. In the latest version of Plater (not yet deployed) we will also have support/aux graphs representing subclass edges, which are referenced by other results in a way that is easy to read in TRAPI/JSON but would be difficult to represent in CSV.

In short, it'd be easy to spit out very simplified versions of results, but there is A LOT of stuff in TRAPI that is easier to read and understand in json, and would require quite a bit of work and design to transform into CSV. The compromise would be to include json strings as values inside the csv, which somewhat defeats the point.
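For reference, the JSON-strings-in-cells compromise might look roughly like the sketch below. The `sources` list is a made-up example in the TRAPI provenance shape, and the `sources_json` column name is hypothetical. It round-trips fine for a script, but the cell is not pleasant to read in a spreadsheet.

```python
import csv
import io
import json

# Hypothetical provenance chain; the infores IDs are invented.
sources = [
    {"resource_id": "infores:example-aggregator",
     "resource_role": "aggregator_knowledge_source",
     "upstream_resource_ids": ["infores:example-primary"]},
    {"resource_id": "infores:example-primary",
     "resource_role": "primary_knowledge_source"},
]

# Write the nested structure as a JSON string in a single CSV cell.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["subject", "predicate", "object", "sources_json"])
writer.writerow(["EXAMPLE:drug1", "biolink:treats", "EXAMPLE:disease1",
                 json.dumps(sources)])
csv_text = buf.getvalue()
print(csv_text)

# A script can recover the structure, but only by parsing the cell as JSON again.
rows = list(csv.reader(io.StringIO(csv_text)))
round_tripped = json.loads(rows[1][3])
```

The CSV writer has to quote the cell and double every inner quote, so the user still ends up needing a JSON parser to use that column.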

karafecho · 2024-09-18T22:12:08Z

FWIW, I asked one of the external users who requested a csv or tsv dump if the minimal fields I suggested above would be sufficient to support this person's needs. Will update this ticket after receiving a response.

karafecho · 2024-09-20T03:59:39Z

Additional feedback from external user:

"All I need is the subject, predicate, object and publication. The other fields would be nice. I am not sure if you have the time, but maybe you could add a ‘check box’ for the attributes to include."
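A rough sketch of how the "check box" idea could work on the export side: the four fields the user asked for are always written, and any other column is included only if selected. The column names, the `write_selected` helper, and the sample row are assumptions for illustration, not the actual qgraph implementation.

```python
import csv
import io

# Hypothetical column set; the first four are the fields the user said they need.
ALL_COLUMNS = ["subject", "predicate", "object", "publications",
               "primary_knowledge_source", "score"]
REQUIRED = ["subject", "predicate", "object", "publications"]

# Made-up example row.
ROWS = [
    {"subject": "EXAMPLE:drug1", "predicate": "biolink:treats",
     "object": "EXAMPLE:disease1", "publications": "PMID:0000001",
     "primary_knowledge_source": "infores:example-primary", "score": 0.5},
]


def write_selected(rows, selected):
    """Write rows to CSV, keeping required columns plus the checked optional ones."""
    columns = [c for c in ALL_COLUMNS if c in REQUIRED or c in selected]
    buf = io.StringIO()
    # extrasaction="ignore" silently drops keys that were not selected.
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Example: the user ticked only the "score" box.
print(write_selected(ROWS, {"score"}))
```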

ntuck · 2024-10-16T18:09:46Z

Would it be possible to download just what is seen on the face of the table?

EvanDietzMorris · 2024-10-18T01:54:39Z

Yup, we could do something like that.

ntuck · 2025-01-15T19:05:07Z

I tried to write up what a CSV will achieve:

A CSV file of results from a query would support the user interface. In its current state, the table makes results navigable but does not let users easily transfer the information to secondary analysis. A downloadable CSV file of query responses would cache the KG response for a user who is not acquainted with parsing JSON files or TRAPI queries. Secondary analysis in this case may include generating figures that summarize the graph using the built-in scores/rankings, or analysis by way of response frequency. A simple download of the table presented on the results screen would give users the comfort of saving the response generated at a given moment and would facilitate additional analysis by the end user. Although the table on the results page does not capture the full breadth and depth of the graph, a user would likely be interested primarily in the broad strokes of the information presented in the table and could use the tools built into the results page for additional analysis of the particular relevant components of the response.

karafecho · 2025-01-15T22:21:41Z

I agree with all of Nyssa's arguments. I'll add that many of our users are simply more familiar with tabular data than graphs, which are admittedly challenging to navigate and interpret, especially when exploring results from complex queries. Those users who have requested csv dumps are fine with losing some of the richness of "knowledge" captured in the KG as a trade-off for gaining more flexibility and the ability to conduct secondary analysis. While the UI can be modified to address some of these users' desires, e.g., providing a tool to facilitate JSON to CSV conversion, I'm not convinced that the UI will ever be sufficient to address the desires of users who simply prefer tabular data.

Woozl · 2025-02-04T16:18:11Z

Names have been added to the CSV download, replacing the CURIEs. Chris mentioned it is probably worth having columns for both the names and the CURIEs. See RobokopU24/qgraph#295.

karafecho · 2025-02-06T02:26:36Z

Any chance we can add names and not replace the CURIEs? Both are helpful. :-)

karafecho · 2025-03-05T18:58:22Z

Update:

node CURIEs + names, scores, publications added
predicates will be added to complete the minimal viable product
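For readers following along, a result-level row with both CURIE and name columns plus a score could be built roughly as follows. This is only a sketch against a made-up message in the TRAPI 1.4+ shape (`node_bindings`, `analyses[].score`); the `{qnode}_curie` / `{qnode}_name` column naming and taking the maximum score across analyses are assumptions, not a description of what the qgraph download actually does.

```python
import csv
import io

# Hypothetical message; node IDs, names, and the score are invented.
MESSAGE = {
    "knowledge_graph": {
        "nodes": {
            "EXAMPLE:drug1": {"name": "example drug"},
            "EXAMPLE:disease1": {"name": "example disease"},
        },
    },
    "results": [
        {
            "node_bindings": {
                "n0": [{"id": "EXAMPLE:drug1"}],
                "n1": [{"id": "EXAMPLE:disease1"}],
            },
            "analyses": [{"score": 0.87}],
        }
    ],
}


def results_to_rows(message):
    """Build one flat dict per result with CURIE and name columns per query node."""
    nodes = message["knowledge_graph"]["nodes"]
    rows = []
    for result in message["results"]:
        row = {}
        for qnode, bindings in result["node_bindings"].items():
            curies = [b["id"] for b in bindings]
            row[f"{qnode}_curie"] = "|".join(curies)
            row[f"{qnode}_name"] = "|".join(
                nodes.get(c, {}).get("name") or "" for c in curies)
        # Assumption: report the highest score across analyses, blank if none.
        scores = [a["score"] for a in result.get("analyses") or []
                  if a.get("score") is not None]
        row["score"] = max(scores) if scores else ""
        rows.append(row)
    return rows


rows = results_to_rows(MESSAGE)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```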

karafecho assigned Woozl Sep 18, 2024

karafecho added user priority UI use cases labels Sep 18, 2024

hina-shah added the enhancement New feature or request label Oct 8, 2024

EvanDietzMorris mentioned this issue Oct 14, 2024

Downloadable CSV file of question builder results #236

Open
