AlienKevin/starcoder_azure
Set up a virtual machine

  1. Go to the Azure portal and hit "Create a resource": Step 0

  2. Hit "Create" under "Virtual machine": Step 1

  3. Give the machine a name; here we use "starcoder": Step 2

  4. Select "Standard" as the security type; it must be Standard, otherwise the Nvidia driver cannot be installed later. Select Ubuntu Server 22.04 LTS as the image and Standard_NV36_A10_v5 - 36 vcpus as the size. Step 3

  5. Use "Password" as the Authentication type for easy access through SSH: Step 4

  6. Create and attach a new disk to persistently store model weights and server files: Step 5 Step 6 Step 7

  7. Hit "Review + create" to create the instance: Step 8

  8. Once the instance is created, configure its network settings to allow inbound connections through port 8080 for use with our starcoder server: Step 10 Step 11 Step 12 Step 13
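If you prefer the command line, the same inbound rule for port 8080 can be added with the Azure CLI instead of the portal. A sketch, assuming a resource group named starcoder-rg and the VM name starcoder (substitute your own names):

```shell
# Assumption: resource group "starcoder-rg" and VM name "starcoder";
# substitute your own names. Requires a logged-in Azure CLI (az).

# Add an inbound NSG rule so the starcoder server is reachable on port 8080.
az vm open-port \
    --resource-group starcoder-rg \
    --name starcoder \
    --port 8080 \
    --priority 900
```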

Install dependencies

  1. Get your instance's public IP address from the instance dashboard: IP Address

  2. Connect via ssh

    ssh -o ServerAliveInterval=60 azureuser@<your_instance_ip_address>
    
  3. Follow the official tutorial to attach the data disk created in the setup step. Name the mount point /workspace instead of /datadir in the tutorial. Start from the "Prepare a new empty disk" section and stop after finishing the "Verify the disk" section in the tutorial: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/attach-disk-portal

  4. Grant the user permission to the mount point (otherwise every file operation requires sudo):

    sudo chmod -R -v 777 /workspace
    
  5. Follow the steps in the official doc to install the Nvidia driver. The instance restarts automatically during installation, which breaks your SSH connection; reconnect once the installation finishes: https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-linux#azure-portal

  6. Install dependencies

    sudo apt-get update
    sudo apt-get install build-essential wget nvidia-cuda-toolkit -y
    

    The installation can take several minutes.

  7. RESTART the instance from its dashboard. Otherwise, running nvidia-smi fails with "Failed to initialize NVML: Driver/library version mismatch".

  8. Verify that the GPU is correctly recognized:

    nvidia-smi
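The disk preparation from the tutorial linked in step 3 boils down to a few commands. A sketch, assuming the data disk appears as /dev/sdc (check with lsblk; the device name on your instance may differ):

```shell
# Identify the new data disk first; it is assumed to be /dev/sdc below.
lsblk

# Partition and format the disk with XFS, then mount it at /workspace.
sudo parted /dev/sdc --script mklabel gpt mkpart xfspart xfs 0% 100%
sudo mkfs.xfs /dev/sdc1
sudo partprobe /dev/sdc1
sudo mkdir -p /workspace
sudo mount /dev/sdc1 /workspace

# To make the mount survive reboots, add an /etc/fstab entry keyed by the
# partition's UUID (shown by `sudo blkid /dev/sdc1`), as the tutorial describes.
```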
    

Setup llama.cpp server

  1. Download a quantized version of the StarCoder 2 model from Hugging Face:

    cd /workspace
    mkdir models/
    cd models/
    wget https://huggingface.co/second-state/StarCoder2-15B-GGUF/resolve/main/starcoder2-15b-Q5_K_M.gguf
    
  2. Build llama.cpp

    cd /workspace
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    CUDA_DOCKER_ARCH=compute_86 make GGML_CUDA=1
    

    Note: compute_86 corresponds to the A10 GPU (compute capability 8.6); other GPUs need a different value. See https://github.com/distantmagic/paddler/blob/main/infra/tutorial-installing-llamacpp-aws-cuda.md#cuda-architecture-must-be-explicitly-provided

  3. Start the server to listen on port 8080 for requests

    cd /workspace/llama.cpp
    
    ./llama-server \
        -t 10 \
        -ngl 64 \
        -b 512 \
        --ctx-size 16384 \
        -m ../models/starcoder2-15b-Q5_K_M.gguf \
        --color \
        --seed 42 \
        --temp 0.8 \
        --top_k 5 \
        --repeat_penalty 1.1 \
        --host :: \
        --port 8080 \
        -n -1
    

    Optional: Use GNU screen to keep the server running even when the SSH connection breaks:

    1. Start a new screen session:

      screen -S starcoder

      This starts a new screen session named starcoder.

    2. Run your server inside the screen session:

      cd /workspace/llama.cpp
      
      ./llama-server \
          -t 10 \
          -ngl 64 \
          -b 512 \
          --ctx-size 16384 \
          -m ../models/starcoder2-15b-Q5_K_M.gguf \
          --color \
          --seed 42 \
          --temp 0.8 \
          --top_k 5 \
          --repeat_penalty 1.1 \
          --host :: \
          --port 8080 \
          -n -1

    This will keep running inside the screen session.

    3. Detach from the screen session: Press Ctrl-a followed by d. This detaches the screen session and keeps it running in the background.

    4. Reattach to the screen session (if needed):

      screen -r starcoder

      This reattaches you to the starcoder session.

    5. List all screen sessions:

      screen -ls

      This shows all the running screen sessions.

    6. Kill a screen session (when you're done): First, reattach to the session:

      screen -r starcoder

      Then, press Ctrl-C to stop the server and exit the session:

      exit
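Once the server is up, you can exercise it from any machine that can reach port 8080. A minimal Python sketch against llama.cpp's /completion endpoint; substitute your instance's public IP address, and note that the payload fields shown are only a small subset of what the endpoint accepts:

```python
"""Minimal client for the llama.cpp server started above."""
import json
import urllib.request

# Substitute your instance's public IP address (port 8080 was opened earlier).
SERVER_URL = "http://<your_instance_ip_address>:8080"


def build_completion_request(prompt, n_predict=64, temperature=0.8):
    """Build a JSON payload for llama.cpp's /completion route."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,  # max tokens to generate; -1 means no limit
        "temperature": temperature,
    }


def complete(prompt, **kwargs):
    """POST the prompt to the server and return the generated text."""
    data = json.dumps(build_completion_request(prompt, **kwargs)).encode()
    req = urllib.request.Request(
        f"{SERVER_URL}/completion",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]


# Example (requires the server to be running):
#   print(complete("def fibonacci(n):"))
```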
