G'day there,

Here is a quick summary of what is contained in this codebase and some advice on the next steps.

What's the context & question?

The task is to explore tools to identify problems in the text reports of implantable medical devices.

The current basic approach is to compare the baseline occurrence of each problem against potential increases, and then relate any increased occurrences to the corresponding activities.
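As a rough illustration of the baseline comparison, here is a minimal sketch rather than the code in this repo. It assumes reports are plain strings split into a baseline period and a recent period; `count_mentions`, `rate_ratio` and the `smoothing` parameter are hypothetical names made up for this example.

```python
def count_mentions(reports, term):
    """Number of report texts that mention `term` (case-insensitive substring match)."""
    term = term.lower()
    return sum(1 for text in reports if term in text.lower())


def rate_ratio(baseline_reports, recent_reports, term, smoothing=0.5):
    """Recent mention rate divided by the baseline mention rate.

    Additive smoothing (an assumption of this sketch) keeps the ratio finite
    when a term never appears in the baseline period.
    """
    base_rate = (count_mentions(baseline_reports, term) + smoothing) / (
        len(baseline_reports) + 2 * smoothing
    )
    recent_rate = (count_mentions(recent_reports, term) + smoothing) / (
        len(recent_reports) + 2 * smoothing
    )
    return recent_rate / base_rate


# Toy data, invented for illustration only.
baseline = ["device fine", "pain after implant", "no issue", "no issue"]
recent = ["pain and erosion", "severe pain", "no issue", "pain"]
ratio = rate_ratio(baseline, recent, "pain")  # values above 1 suggest an increase
```

A ratio well above 1 flags a term worth a closer look; a proper analysis would also need a significance test and enough reports per period.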

Open Questions

  • What is the best vocabulary of problems/activities to search for?
  • How can we identify significant words, word pairs or collections of words for each manufacturer?
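One possible starting point for the second question is to score each word by how much more often it appears in one manufacturer's reports than in everyone else's. The sketch below uses a smoothed log-ratio of word frequencies. This is just one option, not a method this project has settled on. The function names, the `alpha` smoothing parameter and the toy data are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def distinctive_words(reports_by_manufacturer, target, top_n=5, alpha=1.0):
    """Rank words by log(P(word | target) / P(word | all other manufacturers)).

    `alpha` is an additive smoothing constant so unseen words do not divide by zero.
    """
    target_counts, other_counts = Counter(), Counter()
    for manufacturer, texts in reports_by_manufacturer.items():
        bucket = target_counts if manufacturer == target else other_counts
        for text in texts:
            bucket.update(tokenize(text))

    vocab = set(target_counts) | set(other_counts)
    n_target = sum(target_counts.values())
    n_other = sum(other_counts.values())

    scores = {}
    for word in vocab:
        p_target = (target_counts[word] + alpha) / (n_target + alpha * len(vocab))
        p_other = (other_counts[word] + alpha) / (n_other + alpha * len(vocab))
        scores[word] = math.log(p_target / p_other)

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy data, invented for illustration only.
toy_reports = {
    "A": ["mesh erosion pain", "mesh erosion"],
    "B": ["lead fracture pain", "lead fracture"],
}
top_words = [word for word, _ in distinctive_words(toy_reports, "A", top_n=2)]
```

Word pairs could be handled the same way by counting adjacent token pairs instead of single tokens.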

What Data?

Main data

Two sets of main data: the TGA data and the FDA MAUDE data.

Both of these datasets have two varieties: the device/report data and the text/narrative data. The latter is of interest for text analysis.

The TGA data was scraped using the code in the tgaScraping subfolder and is available in the data/tga folder. The MAUDE Jupyter notebook will download the MAUDE data for you. See the notebooks for a preliminary analysis.

Helpful data

Useful external tools

  • medaCy: medical named entity recognition.
  • PyMedTermino: medical technical terminology mapping.
  • medpie: an information extraction package for medical message board posts.

Other Glossary

  • tgaScraping/ProsthesisSponsors.xlsx is the list of all sponsors in the DAEN data.
  • random_files/FDA MAUDE Primer.pdf is a PDF report explaining what the FDA MAUDE data means.
  • random_files/Grant Proposal.pdf is the project proposed for this grant.

Specific Todo-list items:

  • Map synonymous terms to a single vocabulary term (e.g. 'sexual intercourse', 'sex' and 'intercourse' -> 'sex').
  • Map variant spellings of each manufacturer's name to a single canonical manufacturer.
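
Both todo items could start from something like the sketch below. It is only a starting point under a few assumptions: the synonym table is hand-written, manufacturer names differ only in case, punctuation and legal suffixes, and all names here (`SYNONYMS`, `LEGAL_SUFFIXES`, `normalise_terms`, `manufacturer_key`, "Acme Medical") are hypothetical.

```python
import re

# Hand-maintained phrase -> canonical term table (illustrative entries only).
SYNONYMS = {
    "sexual intercourse": "sex",
    "intercourse": "sex",
}

# Legal-entity suffixes to ignore when comparing manufacturer names (assumed list).
LEGAL_SUFFIXES = {"inc", "ltd", "pty", "llc", "corp", "corporation", "limited", "co"}


def normalise_terms(text, synonyms=SYNONYMS):
    """Replace each known phrase in `text` with its canonical term."""
    # Longest phrases first, so "sexual intercourse" wins over "intercourse".
    phrases = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: synonyms[m.group(0).lower()], text)


def manufacturer_key(name):
    """Lower-case the name, drop punctuation and legal suffixes, and rejoin the words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


cleaned = normalise_terms("Pain during sexual intercourse and after intercourse")
same_maker = manufacturer_key("ACME Medical, Inc.") == manufacturer_key("Acme Medical Pty Ltd")
```

Grouping reports by `manufacturer_key` gives a first pass at merging variant names. Misspellings and renamed companies would still need fuzzy matching or a manual lookup table.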