
Train LLMs on private data. Simply make an API request to our training endpoint specifying your data and model. LangDrive will handle the rest. ⚡


Shanjaiadithyan/langdrive

 
 


LangDrive

Train, deploy and query open source LLMs using your private data, all from one library.



Join our Discord to keep up to date with the community and roadmap.




LangDrive is an open-source AI library that simplifies training, deploying, and querying open-source large language models (LLMs) using private data. It supports data ingestion, fine-tuning, and deployment via a command-line interface, YAML file, or API, with a quick, easy setup.

Read the docs for more.


Train Your First LLM

We've replicated one of our training images as a Google Colab Notebook. Here's what it does:

  • Fine-tunes falcon-7b-instruct
  • Creates a Flask web server
  • Opens an ngrok API endpoint so you can call the API

Try it out here


Use cases

LangDrive lets you fine-tune LLMs to build AI apps such as:

  • Question/Answering over internal documents
  • Chatbots
  • AI agents
  • Content generation

Features:

  • Data ingestion: LangDrive comes with the following built-in data connectors to simplify data ingestion:

    • Firebase Firestore
    • Email ingestion via SMTP
    • Google Drive
    • CSV
    • Website URL
    • (more coming soon, or you can build your own, since LangDrive is open source)
  • Fine-tuning

    • Fine-tune open-source LLMs easily by formatting your data into input:output completion pairs
  • Deployment

    • Add your Hugging Face access token to deploy your model directly to the Hugging Face Hub after fine-tuning
  • Inference

    • Query our supported open-source models
  • Data utils

    • LangDrive comes with built-in data utils for CRUD operations on the different data connectors
  • API


Docs

For full documentation and examples, see the docs.


Getting started

The simplest way to get started with LangDrive is through your CLI. For a more detailed overview on getting started using the YAML config and API, please visit the docs.

Using the CLI

Node developers can train and deploy a model in two simple steps.

  1. npm install langdrive
  2. langdrive train --csv ./path/to/csvFileName.csv --hftoken apikey123 --deploy

In this case, LangDrive will retrieve the data, train a model, host its weights on Hugging Face, and return an inference endpoint you can use to query the LLM.
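
If you want to script the command above from Node, the sketch below assembles its argument list from a small options object. It uses only the flags shown in the example (--csv, --hftoken, --deploy). The helper name buildTrainArgs and the options object are illustrative and not part of LangDrive.

```javascript
// Hypothetical helper (not part of LangDrive): builds the argument list
// for `langdrive train` using only the flags shown in the example command.
function buildTrainArgs({ csv, hfToken, deploy = false } = {}) {
  const args = ['train'];
  if (csv) args.push('--csv', csv);             // path to the two-column training CSV
  if (hfToken) args.push('--hftoken', hfToken); // Hugging Face API key with write permissions
  if (deploy) args.push('--deploy');            // ask for an inference endpoint
  return args;
}

const trainArgs = buildTrainArgs({
  csv: './path/to/csvFileName.csv',
  hfToken: 'apikey123', // placeholder value from the example above
  deploy: true,
});

console.log(['langdrive', ...trainArgs].join(' '));
```

The resulting array can be passed to child_process.spawnSync('langdrive', trainArgs, { stdio: 'inherit' }), assuming the langdrive executable is on your PATH.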

The langdrive train command trains the LLM. It accepts the arguments listed below.

args:

  • yaml: Path to an optional YAML config file. Default value: './LangDrive.yaml'. This loads any configured class and queries it for records and their values for both inputs and outputs.
  • csv: Path to the training data CSV. The training data should be a two-column CSV of input and output pairs.
  • hfToken: An API key provided by Hugging Face with write permissions. Get one here.
  • baseModel: The original model to train. This can be any of our supported models, shown at the bottom of this page.
  • deployToHf: true | false
  • hfModelPath: The full path to the Hugging Face model repo where the model should be deployed. Format: Hugging Face username/model
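
As a rough illustration of the two-column CSV described for the csv argument, the sketch below turns an array of input/output pairs into CSV text in Node. The function names and sample pairs are made up for this example. The source does not say whether a header row is expected, so none is written here.

```javascript
// Quote a field (RFC 4180 style) when it contains a comma, a double quote,
// or a line break; inner double quotes are doubled.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn [{ input, output }, ...] into two-column CSV text, one pair per row.
// Assumption: no header row; add one if your setup expects it.
function pairsToCsv(pairs) {
  return pairs
    .map(({ input, output }) => `${csvField(input)},${csvField(output)}`)
    .join('\n') + '\n';
}

// Illustrative training pairs (not from the LangDrive docs).
const examplePairs = [
  { input: 'What is LangDrive?', output: 'An open-source library for training LLMs.' },
  { input: 'Say "hi", please', output: 'hi' },
];

const csvText = pairsToCsv(examplePairs);
console.log(csvText);
```

To save the result, call require('fs').writeFileSync('./train.csv', csvText) and pass that path to the csv argument.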

By default, langdrive train assumes you do not want to deploy your model and provides a link where you can download the weights. Adding --deploy returns a link to the inference endpoint instead.

More information on how to ingest simple data using the CLI can be found in the docs.


Contributions

LangDrive is open source and we welcome contributions from the community. To contribute, please make a PR through the "fork and pull request" process.


