Refactor and ROCm c9s container #5

Merged 1 commit on May 2, 2024
2 changes: 1 addition & 1 deletion .dockerignore
@@ -1,4 +1,4 @@
/.git
#/.git
/.github
/Makefile
/README.md
139 changes: 90 additions & 49 deletions .github/workflows/containers.yaml
@@ -13,63 +13,73 @@ permissions:
contents: read

env:
IMAGE_NAME: quay.io/tiran/instructlab-containers
REGISTRY: ghcr.io
IMAGE_NAME: ghcr.io/${{ github.repository }}

jobs:
container:
name: "${{ matrix.suffix }} container"
runs-on: ubuntu-latest
strategy:
fail-fast: false
fail-fast: true
matrix:
include:
- containerfile: Containerfile.gfx1100
suffix: rocm-gfx1100
- containerfile: containers/rocm/Containerfile.fedora
suffix: rocm-fc40-gfx1100
free_diskspace: true
- containerfile: Containerfile.gfx1030
build_args: |
AMDGPU_ARCH=gfx1100
HSA_OVERRIDE_GFX_VERSION=11.0.0
PKG_CACHE=off

- containerfile: containers/rocm/Containerfile.fedora
suffix: rocm-gfx1030
free_diskspace: true
- containerfile: Containerfile.cuda-root
build_args: |
AMDGPU_ARCH=gfx1030
HSA_OVERRIDE_GFX_VERSION=10.3.0

- containerfile: containers/cuda/Containerfile
suffix: cuda-ubi9
free_diskspace: true
- containerfile: Containerfile.cpu
build_args: |
PKG_CACHE=off

- containerfile: containers/rocm/Containerfile.c9s
suffix: rocm-c9s
free_diskspace: true
build_args: |
PKG_CACHE=off
# FLASH_ATTN_AMDGPU_TARGETS=gfx90a,gfx942

- containerfile: containers/cpu/Containerfile
suffix: cpu
free_diskspace: false
build_args: |
PKG_CACHE=off
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
submodules: true

- name: Replace submodule .git reference for setuptools-scm
- name: Install jq
run: |
set -ex
# unset worktree first
git -C instructlab config --unset core.worktree
# replace file with git directory
rm instructlab/.git
cp -r .git/modules/instructlab instructlab/.git
# verify
git -C instructlab show
set -e
sudo apt update
sudo apt install -y jq

- name: Login to Quay.io
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ secrets.QUAY_USERNAME }}
password: ${{ secrets.QUAY_ROBOT_TOKEN }}

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Reconfigure Docker data-root
run: |
set -e
sudo mkdir /mnt/docker
jq '. + {"data-root": "/mnt/docker"}' < /etc/docker/daemon.json | tee /tmp/daemon.json
sudo mv /tmp/daemon.json /etc/docker/daemon.json
sudo systemctl restart docker.service

- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.IMAGE_NAME }}
tags: |
type=raw,value=${{ matrix.suffix }}
- name: Docker info
run: docker info

# Container builds need a lot of disk space, and GHA runners have limited
# disk space. Clean up unnecessary packages.
@@ -78,8 +88,13 @@ jobs:
if: ${{ matrix.free_diskspace }}
run: |
df -h
ls -la /opt/
sudo rm -rf /opt/az
sudo rm -rf /opt/hostedtoolcache
sudo rm -rf /opt/ghc
sudo rm -rf /opt/google
sudo rm -rf /opt/microsoft
sudo rm -rf /opt/pipx*
sudo rm -rf /usr/local
sudo rm -rf /usr/share/dotnet
sudo rm -rf /usr/share/swift
@@ -88,30 +103,53 @@ jobs:
- name: Clean Docker images
if: ${{ matrix.free_diskspace }}
run: |
docker image prune -a -f
docker system prune -a -f
df -h

- name: Clean Debian packages
if: ${{ matrix.free_diskspace }}
- name: Replace submodule .git reference for setuptools-scm
run: make hack-submodule

- name: Disable cache mount to save disk space
run: |
sudo apt purge -y -f microsoft-edge-stable google-chrome-stable firefox azure-cli google-cloud-cli mono-complete
sudo apt autoremove -y
sudo apt autoclean -y
# sudo dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -nr | head -n20
df -h
sed -i 's/^RUN --mount=type=cache.*$/RUN \\/' ${{ matrix.containerfile }}

- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.IMAGE_NAME }}
tags: |
type=raw,value=${{ matrix.suffix }}

- name: Build and export image ${{ steps.meta.outputs.tags }}
id: build
uses: docker/build-push-action@v5
with:
context: instructlab/
load: true
load: True
push: false
file: ${{ matrix.containerfile }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
# cache-from: type=gha
# cache-to: type=gha
build-args: ${{ matrix.build_args }}
cache-from: type=gha
cache-to: type=gha

- name: List images ${{ steps.meta.outputs.tags }}
run: |
docker images

- name: Test image ${{ steps.meta.outputs.tags }}
run: |
set -e
@@ -127,12 +165,15 @@ jobs:
docker run --rm ${{ steps.meta.outputs.tags }} python3 -c 'import torch, llama_cpp'
echo "::endgroup::"

- name: Push image ${{ steps.meta.outputs.tags }}
- name: Push image all tags ${{ env.IMAGE_NAME }}
if: ${{ (github.event_name == 'push' && github.ref == 'refs/heads/main') || github.event_name == 'workflow_dispatch' }}
uses: docker/build-push-action@v5
run: |
docker push --all-tags ${{ env.IMAGE_NAME }}

- name: Generate artifact attestation
if: ${{ (github.event_name == 'push' && github.ref == 'refs/heads/main') || github.event_name == 'workflow_dispatch' }}
uses: actions/attest-build-provenance@v1
with:
context: instructlab/
push: true
file: ${{ matrix.containerfile }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
subject-name: ${{ env.IMAGE_NAME }}
subject-digest: ${{ steps.build.outputs.digest }}
push-to-registry: true
68 changes: 0 additions & 68 deletions Containerfile.cpu

This file was deleted.

90 changes: 0 additions & 90 deletions Containerfile.cuda-root

This file was deleted.
