Can we consider using Llama to solve this issue? #7
@ZhouNan2020 Great idea! I imagine we would still need to validate some of the results to check that the performance is good enough, but if it is reliable we could achieve an automatic translation. Did you have any idea of how to try it?
First of all, I've used the LLM playground to try to clean some disease names, and asked GPT or Claude to convert these disease names to ICD-10 codes, and the results seem good. Second, if we want to plug this into DiAna, we should look into Llama's API instead of using playgrounds. Third, I see that DiAna currently mainly uses JS and R, but the code that connects to LLM APIs would probably be written mostly in Python. Finally, as you said, we need to verify the reliability of the results. I don't think we have to guarantee that every result is correct, as long as we use a reasonable verification process to show that reliability stays within an acceptable range. For example, if we randomly select and cross-validate the results of a zero-shot run, and the AUC of the LLM's performance reaches 0.8, I think that will be fine. We must remember that what we are looking for is a balance between human effort and accuracy, not complete accuracy, because even with human screening and cleaning the results are not necessarily accurate.
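One way to read the validation step proposed above, sketched in Python under stated assumptions: draw a reproducible random sample of LLM mappings for manual review, then compute ROC AUC from a per-item confidence score against the reviewers' correct/incorrect labels. The names `sample_for_review` and `roc_auc` are hypothetical, and the existence of a confidence score for each mapping is an assumption. AUC here measures how well that score separates correct from incorrect mappings, which is not the same thing as overall accuracy.

```python
import random
from typing import Sequence


def sample_for_review(items: Sequence[str], k: int, seed: int = 0) -> list[str]:
    """Draw a reproducible random subset of LLM outputs for manual cross-validation."""
    rng = random.Random(seed)  # fixed seed so reviewers can re-draw the same sample
    return rng.sample(list(items), k=min(k, len(items)))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed as the Mann-Whitney U statistic.

    scores: confidence attached to each reviewed mapping (assumed to exist).
    labels: 1 if a human reviewer judged the mapping correct, else 0.
    Returns the fraction of (correct, incorrect) pairs in which the correct
    mapping has the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect mapping")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1])` gives 2/3, because one correct mapping scored below the incorrect one. The pairwise loop is quadratic, which is acceptable for a manually reviewed sample.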
Great work! If the mapping from MedDRA to ICD-10 is of interest to you, also note that there is a human-generated validated mapping between MedDRA and ICD-10 that you could find useful. For the non-translated drugs, it would definitely be worth a try, even just to provide a first automatic suggestion for translation that can then be validated depending on the need. Validating and compiling the drug name translation row by row is a really big effort that we have to repeat at every new quarterly update, and it would be great to have automatic support (at the moment we are just using the already validated dictionary together with some fuzzy techniques based on Levenshtein distance and string editing to precompile translations to be validated)! This month we will be very busy, but if you are interested in trying to implement your idea we can discuss it next month. :)
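A minimal sketch of the kind of fuzzy matching described above: normalize the raw drug name with simple string editing, then find the closest entry of an already validated dictionary by Levenshtein distance and return it only as a suggestion to be validated. This is not DiAna's actual code; the function names, the normalization rules, and the `max_dist` threshold are hypothetical choices.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalize(name: str) -> str:
    """Hypothetical string editing: lowercase, punctuation to spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def suggest_translation(raw: str, dictionary: dict[str, str],
                        max_dist: int = 2) -> Optional[tuple[str, str, int]]:
    """Return (validated_key, translation, distance) for the closest key, or None.

    The result is only a precompiled suggestion; a human still validates it.
    """
    target = normalize(raw)
    best: Optional[tuple[str, str, int]] = None
    for key, value in dictionary.items():
        d = levenshtein(target, normalize(key))
        if best is None or d < best[2]:
            best = (key, value, d)
    if best is not None and best[2] <= max_dist:
        return best
    return None
```

With a hypothetical dictionary `{"paracetamol": "ACETAMINOPHEN"}`, the misspelled `"Paracetamo1"` is matched at distance 1, while an unrelated string returns `None` and would go to manual translation.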
Actually, I'm writing my thesis now, so I can't guarantee that my workload will allow me to focus on this project.
Maybe I'll have time next year.
(Sent by email in reply to Michele Fusaroli, 2024-10-01 22:03.)
Understood! Same situation ;) In case you have not seen it: if you need an already cleaned version of FAERS for your thesis, including drugs and events, https://github.com/fusarolimichele/DiAna_package hosts an R package that lets you download the cleaned version with a single command! Good luck with your thesis, and feel free to reach out about anything!
I think we can use 'zero-shot' prompts to clean and normalize free-text data in FAERS; open-source LLMs such as Qwen 2.5 and Llama 3.2 may be useful in this project.
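A sketch of the zero-shot normalization idea: the prompt describes the task without any worked examples, and the model is asked to answer in JSON so the reply can be parsed and low-confidence or malformed answers can be routed to manual review. The `generate` callable is a hypothetical stand-in for whatever client serves the model (for example a locally hosted Qwen 2.5 or Llama 3.2); the prompt wording and the JSON fields are assumptions, not a tested recipe.

```python
import json
from typing import Callable

# Hypothetical zero-shot instruction: a task description only, no worked examples.
PROMPT_TEMPLATE = (
    "You normalize free-text entries from FAERS adverse event reports.\n"
    "Field: {field}\n"
    "Raw value: {raw}\n"
    'Reply with JSON only: {{"normalized": "<cleaned value>", "confident": true|false}}'
)


def build_prompt(field: str, raw: str) -> str:
    """Fill the template for one raw value (doubled braces survive as literal JSON braces)."""
    return PROMPT_TEMPLATE.format(field=field, raw=raw)


def normalize_with_llm(field: str, raw: str, generate: Callable[[str], str]) -> dict:
    """Run one zero-shot query; flag the row for human review on any doubtful output.

    generate: hypothetical function that sends a prompt to the model and returns its text.
    """
    reply = generate(build_prompt(field, raw))
    try:
        parsed = json.loads(reply)
        return {
            "raw": raw,
            "normalized": str(parsed["normalized"]),
            # only an explicit JSON true skips review
            "needs_review": parsed.get("confident") is not True,
        }
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        # non-JSON text, missing field, or a non-object JSON value
        return {"raw": raw, "normalized": None, "needs_review": True}
```

In a test, `generate` can be a stub such as `lambda prompt: '{"normalized": "ASPIRIN", "confident": true}'`; any reply that is not a JSON object with a `normalized` field comes back with `needs_review` set to `True`, which keeps the human validation step in the loop.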