G'day there,
Here is a quick summary of what is contained in this codebase and some advice on the next steps.
The task is to explore tools to identify problems in the text reports of implantable medical devices.
The current basic approach is to compare the baseline occurrence of problems against potential increases, and similarly to relate those increased occurrences to corresponding activities.
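As a rough illustration of the baseline comparison, the sketch below computes how often a term is mentioned in a baseline set of report narratives versus a more recent set. The function names, the substring matching and the smoothing constant are assumptions made for this example, not the method used in the notebooks.

```python
from typing import Iterable


def mention_rate(reports: Iterable[str], term: str) -> float:
    """Fraction of reports whose text contains `term` (case-insensitive substring)."""
    reports = list(reports)
    if not reports:
        return 0.0
    hits = sum(term.lower() in text.lower() for text in reports)
    return hits / len(reports)


def rate_ratio(baseline: Iterable[str], recent: Iterable[str], term: str,
               smoothing: float = 0.5) -> float:
    """Recent mention rate divided by baseline mention rate.

    Additive smoothing (an assumption of this sketch) avoids division by
    zero when the term never appears in the baseline period.
    """
    baseline, recent = list(baseline), list(recent)
    b_hits = sum(term.lower() in t.lower() for t in baseline)
    r_hits = sum(term.lower() in t.lower() for t in recent)
    b_rate = (b_hits + smoothing) / (len(baseline) + 2 * smoothing)
    r_rate = (r_hits + smoothing) / (len(recent) + 2 * smoothing)
    return r_rate / b_rate


# Made-up narratives, for illustration only.
baseline_reports = ["device fracture noted", "no issue", "battery low", "routine check"]
recent_reports = ["fracture of lead", "lead fractured", "fracture after running", "no issue"]

ratio = rate_ratio(baseline_reports, recent_reports, "fracture")
print(f"fracture is mentioned {ratio:.2f}x as often as in the baseline")
```

A ratio well above 1 flags a term worth a closer look. With small report counts the ratio is noisy, so a proper significance test would be needed before drawing conclusions.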
- What is the best vocabulary of problems/activities to search for?
- How can we identify significant words, word pairs or collections of words for each manufacturer?
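One possible starting point for the second question is to rank words by how much more frequent they are in one manufacturer's reports than in everyone else's. The sketch below uses a smoothed log ratio of relative word frequencies. The function names, the tokeniser and the example narratives are all assumptions for illustration, and other scores (for example TF-IDF) may suit the data better.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def distinctive_words(reports_by_maker, maker, top_n=5, alpha=1.0):
    """Rank words by smoothed log ratio of frequency: `maker` vs. all others."""
    target, rest = Counter(), Counter()
    for name, reports in reports_by_maker.items():
        bucket = target if name == maker else rest
        for text in reports:
            bucket.update(tokenize(text))

    vocab = set(target) | set(rest)
    # Additive smoothing so unseen words do not produce log(0).
    t_total = sum(target.values()) + alpha * len(vocab)
    r_total = sum(rest.values()) + alpha * len(vocab)
    scores = {
        w: math.log((target[w] + alpha) / t_total)
        - math.log((rest[w] + alpha) / r_total)
        for w in vocab
    }
    # Sort by score (descending), then alphabetically for a stable order.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:top_n]


# Hypothetical manufacturers and narratives, for illustration only.
reports_by_maker = {
    "Maker A": ["Patient reported pain after lead fracture",
                "Lead fracture found at revision"],
    "Maker B": ["Patient reported pain and swelling",
                "Battery depleted early, patient reported pain"],
}

print(distinctive_words(reports_by_maker, "Maker A", top_n=2))
```

The same idea extends to word pairs by counting adjacent token pairs instead of single tokens.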
There are two main datasets:
- From Australia, the Therapeutic Goods Administration (TGA) provides the Database of Adverse Event Notifications (DAEN) for medical devices.
- From the US, the Food and Drug Administration (FDA) provides the much larger Manufacturer and User Facility Device Experience Database (MAUDE). The file FDA MAUDE Primer.pdf may help in understanding this data.
Both of these datasets have two varieties: the device/report data and the text/narrative data. The latter is of interest for text analysis.
The TGA data has been scraped in the tgaScraping subfolder. The data is available in the data/tga folder.
The MAUDE Jupyter notebook will download the MAUDE data for you. See the notebooks for a preliminary analysis.
- vocabulary/CPA_activities.csv is a list of physical activities from the 2011 Compendium of Physical Activities.
- vocabulary/annex_*.json is the terminology of possible adverse events as classified by the IMDRF.
- medaCy: medical named entity recognition.
- PyMedTermino: medical technical terminology mapping.
- medpie: an information extraction package for medical message board posts.
- tgaScraping/ProsthesisSponsors.xlsx is the list of all sponsors in the DAEN data.
- random_files/FDA MAUDE Primer.pdf is a PDF report explaining what the FDA MAUDE data means.
- random_files/Grant Proposal.pdf is the project proposed for this grant.
- Map common words to the same vocabulary ('sexual intercourse', 'sex' and 'intercourse' -> 'sex')
- Map varied manufacturer names to distinct manufacturers.
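Both mapping steps could be prototyped with the standard library alone. The sketch below is a minimal example under stated assumptions: the synonym table only holds the example given above, the canonical manufacturer names are hypothetical placeholders, and the corporate-suffix list and the 0.8 similarity cutoff are guesses that would need tuning against the real sponsor list.

```python
import difflib
import re

# Phrase -> canonical term. Only the example from the list above is included.
SYNONYMS = {"sexual intercourse": "sex", "intercourse": "sex"}

# Longest phrases first so "sexual intercourse" wins over "intercourse".
_SYNONYM_RE = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def normalise_terms(text):
    """Replace each known phrase with its canonical vocabulary term."""
    return _SYNONYM_RE.sub(lambda m: SYNONYMS[m.group(0).lower()], text)


# Hypothetical canonical names; the real list would come from the sponsor data.
CANONICAL_MAKERS = ["Acme Medical", "Example Orthopaedics"]

# Assumed list of corporate suffixes to ignore when comparing names.
_SUFFIX_RE = re.compile(r"\b(inc|ltd|pty|llc|corp|corporation|co)\b")


def _clean(name):
    """Lowercase, drop punctuation and corporate suffixes, collapse spaces."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    name = _SUFFIX_RE.sub(" ", name)
    return " ".join(name.split())


def canonical_maker(raw_name, cutoff=0.8):
    """Return the closest canonical manufacturer name, or None if none is close."""
    cleaned = {_clean(c): c for c in CANONICAL_MAKERS}
    match = difflib.get_close_matches(_clean(raw_name), list(cleaned),
                                      n=1, cutoff=cutoff)
    return cleaned[match[0]] if match else None


print(normalise_terms("Pain during sexual intercourse"))
print(canonical_maker("ACME MEDICAL, INC."))
```

Names that return None are best collected for manual review rather than forced onto the nearest match.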