Deploy Llama for the Jarvis project using Docker, quickly enabling GPU-powered text generation.

Jarvis Llama Inference

This repository provides two Dockerfiles for running Llama (or Llama 2) in a Docker container:

  • GPU Version: Uses an NVIDIA CUDA base image for GPU-accelerated inference (x86_64 + NVIDIA GPU).
  • CPU Version: Uses a plain Ubuntu base image for CPU-only inference, suitable for quick tests or Apple Silicon fallback.

Dockerfile.gpu

  • Based on nvidia/cuda:11.8.0-devel-ubuntu22.04.
  • Installs PyTorch with CUDA 11.8 support (torch==2.0.1+cu118, etc.).
  • Requires an x86_64 environment with NVIDIA drivers and the NVIDIA Container Toolkit installed.
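The repository's `scripts/build_and_run_gpu.sh` wraps the build-and-run steps; a manual equivalent might look like the sketch below. The image tag `jarvis-llama:gpu` and port `8080` are illustrative, not fixed by this README (the port matches the Quick Start example).

```shell
# Build the GPU image from Dockerfile.gpu (tag is illustrative).
docker build -f Dockerfile.gpu -t jarvis-llama:gpu .

# Run with GPU access; "--gpus all" requires NVIDIA drivers and the
# NVIDIA Container Toolkit on the host.
docker run --rm --gpus all -p 8080:8080 jarvis-llama:gpu
```

`--gpus all` exposes every host GPU to the container; pass `--gpus '"device=0"'` instead to pin the container to a single GPU.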

Dockerfile.cpu

  • Based on ubuntu:22.04.
  • Installs CPU-only PyTorch wheels.
  • Suitable for local testing or Apple Silicon Docker (ARM), though inference will be slower.
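For the CPU image, no GPU runtime flags are needed, so a plain `docker run` suffices. As above, the tag `jarvis-llama:cpu` and port `8080` are assumptions for illustration.

```shell
# Build the CPU-only image from Dockerfile.cpu (tag is illustrative).
docker build -f Dockerfile.cpu -t jarvis-llama:cpu .

# Run without any GPU flags; works on x86_64 and Apple Silicon (ARM),
# though inference will be slower than on the GPU image.
docker run --rm -p 8080:8080 jarvis-llama:cpu
```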

Why We Built This

Within Jarvis, we frequently need natural-language processing and generative AI to:

  • Interpret user instructions (“Buy SOL if RSI < 30…”).
  • Generate human-like responses for trading insights or DeFi management tasks.
  • Rapidly prototype AI-driven features without reconfiguring GPU servers.

By bundling Llama in a Docker container, we simplify setup and ensure consistent deployments across development, staging, and production.


Where We Are Using It

  • Jarvis Core: Primary inference engine for prompt-based instructions, turning plain-English commands into structured actions.
  • DeFi Management: Automated liquidity provisioning and yield-farming instructions via Raydium or other protocols.
  • Internal Tools: Prototyping and testing new AI-driven components quickly and reliably.

Quick Start

  1. Requirements:

     • Docker installed.
     • For the GPU version: an x86_64 host with NVIDIA drivers and the NVIDIA Container Toolkit.
  2. Build & Run:

    cd scripts
    ./build_and_run_gpu.sh
    
  3. Test the endpoint:

     curl -X POST http://localhost:8080/generate \
       -H "Content-Type: application/json" \
       -d '{"prompt": "Hello, Jarvis!", "max_new_tokens": 30}'
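If `jq` is installed, the same request can be pretty-printed for easier reading; this assumes only that the endpoint returns JSON, not any particular response schema.

```shell
# Same request as above, with the JSON response pretty-printed by jq.
curl -s -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, Jarvis!", "max_new_tokens": 30}' | jq .
```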
