Stars
Setu is a comprehensive pipeline designed to clean, filter, and deduplicate diverse data sources including Web, PDF, and Speech data. Built on Apache Spark, Setu encompasses four key stages: docume…
A 4-hour coding workshop to understand how LLMs are implemented and used
The practical LLM guide: from the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
A Comprehensive Toolkit for High-Quality PDF Content Extraction
A tool that facilitates easy, efficient, and high-quality fine-tuning of Cohere's models
Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]
This repository offers a comprehensive collection of tutorials and implementations for Prompt Engineering techniques, ranging from fundamental concepts to advanced strategies. It serves as an essen…
Speech To Speech: an effort toward an open-source and modular GPT-4o
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
This repository contains all the setup required to execute Trainium training jobs.
Build resilient language agents as graphs.
Generative AI with Large Language Models on Coursera, offered by DeepLearning.AI and AWS.
Learn about LLMs, LLMOps, and vector DBs for free by designing, training, and deploying a real-time financial advisor LLM system ~ source code + video & reading materials
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Practical course about Large Language Models.
Curated list of datasets and tools for post-training.
Learn for free how to build an end-to-end production-ready LLM & RAG system using LLMOps best practices ~ source code + 12 hands-on lessons
LLM Workshop at Data Hack Summit 2023
Due to LLaMA's license restrictions, we reimplement BLOOM-LoRA (the BLOOM license is much less restrictive: https://huggingface.co/spaces/bigscience/license) using Alpaca-LoRA and Alpaca_data_cleaned.json
Instruct-tune LLaMA on consumer hardware
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-Tuning) for easy use. We welcome open-source enthusiasts…
Text-Prompted Generative Audio Model