Can we consider using Llama to solve this issue? #7

ZhouNan2020 · 2024-09-27T07:57:15Z

I think we can use zero-shot prompting to clean and normalize the free-text data in FAERS. Some open-source LLMs, such as Qwen 2.5 or Llama 3.2, may be useful in this project.
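To make the zero-shot idea concrete, here is a minimal sketch in Python of how such a prompt could be assembled. The function name `build_zero_shot_prompt`, the prompt wording, and the `UNKNOWN` fallback are all assumptions for illustration; nothing here is part of DiAna, and no model is actually called.

```python
def build_zero_shot_prompt(raw_drug_name: str) -> str:
    """Build a zero-shot prompt: an instruction only, with no worked examples.

    Hypothetical helper; the wording of the instruction is an assumption.
    """
    # Collapse stray whitespace so the model sees a tidy input string.
    cleaned = " ".join(raw_drug_name.split())
    return (
        "You are normalizing free-text drug names from FAERS reports.\n"
        "Return only the standardized active ingredient name in lowercase.\n"
        "If the text is not a recognizable drug, return UNKNOWN.\n\n"
        f"Free text: {cleaned}\n"
        "Standardized name:"
    )


prompt = build_zero_shot_prompt("  ASPIRIN   81MG tab ")
print(prompt)
```

The resulting string could then be sent to whichever model is chosen. Asking for a fixed fallback token such as `UNKNOWN` makes it easier to route uncertain rows to manual review.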

fusarolimichele · 2024-10-01T12:30:27Z

@ZhouNan2020 Great idea! I imagine we would still need to validate some of the results to check that the performance is good enough, but if it is reliable we could achieve an automatic translation. Do you have any ideas on how to try it?

ZhouNan2020 · 2024-10-01T12:57:20Z

First of all, I've used LLM playgrounds to try cleaning some disease names, and asked GPT and Claude to convert these disease names to ICD-10 codes. The results seem to be good.

Second, if we want to plug this into DiAna, we should look into Llama's API instead of using playgrounds.

Third, I see that DiAna currently uses mainly JS and R, but code that connects to LLM APIs would probably need to be written mostly in Python.

Finally, as you said, we need to verify the reliability of the results. I don't think we have to guarantee that every result is correct, as long as we use a reasonable verification process to show that reliability stays within an acceptable range. For example, if we randomly sample the zero-shot results and cross-validate them, and the AUC of the LLM's performance reaches 0.8, I think that will be fine. What we are looking for is a balance between human effort and accuracy, not complete accuracy, because even manual screening and cleaning does not guarantee accurate results.
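The random-sample check described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the helper names `sample_for_review` and `agreement_rate` are hypothetical, and it reports a plain agreement rate with a normal-approximation (Wald) 95% interval rather than the AUC mentioned in the comment, since a mapping task yields match/no-match outcomes rather than scores.

```python
import math
import random


def sample_for_review(items, k, seed=0):
    """Draw a reproducible random sample of rows for manual review."""
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


def agreement_rate(pairs):
    """Share of rows where the LLM output matches the reviewer's answer.

    `pairs` is a list of (llm_output, reviewer_output) strings.
    Returns (rate, lower, upper) with a Wald 95% interval clamped to [0, 1].
    """
    if not pairs:
        raise ValueError("need at least one reviewed pair")
    n = len(pairs)
    matches = sum(
        1 for llm, human in pairs
        if llm.strip().lower() == human.strip().lower()
    )
    rate = matches / n
    half_width = 1.96 * math.sqrt(rate * (1 - rate) / n)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# Made-up reviewed rows, purely for illustration.
reviewed = [
    ("aspirin", "Aspirin"),
    ("ibuprofen", "ibuprofen"),
    ("paracetamol", "acetaminophen"),
    ("metformin", "metformin"),
]
rate, low, high = agreement_rate(reviewed)
print(f"agreement={rate:.2f}, 95% interval=({low:.2f}, {high:.2f})")
```

With a sample this small the interval is very wide, which is the point of reporting it: the sample size determines how confident one can be that the threshold is really met.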

fusarolimichele · 2024-10-01T14:02:47Z

Great work! If the mapping from MedDRA to ICD-10 is of interest to you, also note that there is a human-generated validated mapping between MedDRA and ICD-10 that you could find useful.

For the non-translated drug names, it would definitely be worth a try, even just to provide a first automatic suggestion for translation that can then be validated depending on the need. Validating and compiling the drugname translation row by row is a really big effort that we have to repeat at every new quarterly update, and it would be great to have automatic support. At the moment we are just using the already validated dictionary together with some fuzzy techniques, based on Levenshtein distance and string editing, to precompile translations to be validated.
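For readers unfamiliar with the fuzzy approach mentioned above, the general idea of combining a validated dictionary with Levenshtein distance can be sketched as follows. This is not DiAna's actual implementation (which is in R); the function names, the `max_distance` threshold, and the dictionary entries are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggest_translation(
    raw_name: str, validated: Dict[str, str], max_distance: int = 2
) -> Optional[Tuple[str, int]]:
    """Return (suggested translation, distance), or None if nothing is close enough."""
    key = raw_name.strip().lower()
    if key in validated:
        return validated[key], 0
    best = min(validated, key=lambda k: levenshtein(key, k), default=None)
    if best is None:
        return None
    distance = levenshtein(key, best)
    return (validated[best], distance) if distance <= max_distance else None


# Hypothetical validated dictionary: raw spelling -> standardized name.
validated = {"aspirin": "acetylsalicylic acid", "ibuprofen": "ibuprofen"}
print(suggest_translation("Asprin", validated))
print(suggest_translation("zzzzzz", validated))
```

Returning the distance alongside the suggestion lets a reviewer sort precompiled rows by how uncertain the match is. An LLM-based suggestion could slot into the same workflow for rows where no dictionary entry is close enough.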

For this month we will be very busy, but if you are interested in trying to implement your idea we can discuss it next month. :)

ZhouNan2020 · 2024-10-02T00:25:01Z

Actually, I'm writing my thesis now, so I can't commit to focusing on this project. Maybe I'll have time next year.

fusarolimichele · 2024-10-03T07:35:37Z

Understood! Same situation ;) In case you have not seen it: if you need an already cleaned version of FAERS for your thesis, including drugs and events, https://github.com/fusarolimichele/DiAna_package hosts an R package that lets you download the cleaned version with just one command! Good luck with your thesis, and feel free to reach out for anything!
