Replies: 3 comments 2 replies
-
I think the problem most often faced is that, while those running LLaMA might think the rest of us are wasting money on the OpenAI API, running LLaMA locally isn't really free either: for anything comparable to GPT-4's results, you need something like 4-8 NVLinked RTX GPUs. My point is that calling it "free", when it comes to running it locally, is very deceptive.

Also, while I wish it weren't so, my analysis of the energy costs to fine-tune, train, or run inference on one of the most robust models I could find data on (I think it was a ~65B-parameter LLaMA derivative) showed that the electricity costs alone exceeded the equivalent per-prompt API costs. So even after I buy all those massive GPUs, I'm paying out more than OpenAI would charge me.

I don't want to discourage the evolution of LLaMA or open source (which I'm excited about), but everyone calling these things free must be teenagers who only view it as free because they weren't the ones who paid for their $3700 gaming rig, nor do they realize that all the power it draws is not really free.

EDIT: Let me qualify my assessment by saying that, if I understand LoRA correctly, there are massive exceptions to what I've said. If you use GPT-4 questions and answers to create a specialized LLaMA that addresses very frequent questions with a very narrow skillset, that would definitely reduce the power needed.
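For anyone who wants to run that per-prompt comparison themselves, here is a minimal back-of-envelope sketch. Every constant in it (GPU count, power draw, electricity rate, local throughput, API price) is an illustrative assumption rather than a measurement, and the outcome flips depending on what you plug in:

```python
# Back-of-envelope comparison: electricity cost of local inference vs. API cost.
# All constants below are assumptions for illustration; substitute your own numbers.

GPU_COUNT = 4                   # assumed number of RTX-class GPUs
WATTS_PER_GPU = 350             # assumed draw per GPU under load (W)
ELECTRICITY_USD_PER_KWH = 0.15  # assumed electricity rate (USD/kWh)
TOKENS_PER_SECOND = 10          # assumed local throughput for a large model
API_USD_PER_1K_TOKENS = 0.03    # assumed API price per 1K generated tokens

def local_cost_per_1k_tokens() -> float:
    """Electricity cost (USD) to generate 1,000 tokens on the local rig."""
    seconds = 1000 / TOKENS_PER_SECOND
    kwh = GPU_COUNT * WATTS_PER_GPU * seconds / 3600 / 1000
    return kwh * ELECTRICITY_USD_PER_KWH

if __name__ == "__main__":
    print(f"Local electricity cost per 1K tokens: ${local_cost_per_1k_tokens():.4f}")
    print(f"API cost per 1K tokens:               ${API_USD_PER_1K_TOKENS:.4f}")
```

Note that this only counts electricity for inference; hardware amortization and any fine-tuning or training energy would push the local side higher.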
-
We have already implemented local LLMs in this PR: #289
-
If you have any other requirements, feel free to reopen this discussion!
-
I don't fully grasp this subject of models and AIs, but I think we can use free and open models that run on our own machines and prompt them.
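For example, here is a minimal sketch of prompting an open model locally, assuming the Hugging Face `transformers` library; the model name is just a placeholder for whatever freely downloadable model fits your hardware:

```python
# Minimal sketch: download an open model and prompt it locally.
# Assumes `transformers` (and `accelerate` for device_map) are installed.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example open model; swap for your own
    device_map="auto",                           # uses a GPU if available; drop to run on CPU
)

prompt = "Explain in one sentence what a local LLM is."
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```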