Skip to content

atantawi/caspian

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

60 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Caspian

Caspian is a controller for multi-cluster kubernetes environment that decides on scheduling and placement of workloads so that the total carbon footprint generated by executing the workloads is minimized. Caspian lives in a hub cluster and through a multi-cluster management platform applies its placement and scheduling decisions on workloads to the destination spoke clusters.

caspian-architecture

Caspian works in a time slotted manner and has the following main components:

  • Carbon Monitoring: It periodically fetches the predicted values of carbon intensity of spoke clusters.
  • Green Scheduler: Based on the current and future values of carbon intensity, the status of workloads in the system, and available capacity of spoke clusters in the next T time slots, it calls an optimizer to obtain the best scheduling/placement for the workloads. Once the Scheduler obtains the solution, it updates the spec of workloads to notify the multi-cluster manager about its decisions. The optimizer uses clp package in its core. Clp provides a Go interface to the COIN-OR Linear Programming (CLP) librarywhich is part of the COIN-OR suite.

Caspian and MCAD

Caspian uses MicroMCAD as a Workload queueing and multi-cluster management platform to dispatch workloads to the destination clusters. A summary of the interactions between MicroMCAD and Caspian is depicted below.

caspian-mcad

Installation and Setup

Running outside a cluster

This section explains how to run MCAD and Caspian locally. You’ll need a Kubernetes cluster (as hub) to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster. You will also need a minimum one kubernetes cluster (as spoke) for dispatching workloads.

  • Step 1: Clone multicluster branch of MCAD repository.
git clone [email protected]:tardieu/mcad.git -b multicluster
  • Step 2: Clone Caspian repository.
git clone [email protected]:sustainablecomputing/caspian.git
  • Step 3: Run MCAD against the hub cluster in dispatching mode.
go run ./mcad/cmd/main.go --kube-context=kind-hub --mode=dispatcher --metrics-bind-address=127.0.0.1:8080 --health-probe-bind-address=127.0.0.1:8081

  • Step 4: Run MicroMCAD against each spoke cluster in runner mode.
go run ./mcad/cmd/main.go --kube-context=kind-spoke1 --mode=runner --metrics-bind-address=127.0.0.1:8082 --health-probe-bind-address=127.0.0.1:8083 --clusterinfo-name=spoke1
  • Step 5: Run syncer/syncers to syncing between hub cluster and spoke cluster/clusters.
node syncer.js kind-hub kind-spoke1 default spoke1
  • Step 5: Run Caspian against the hub cluster.
go run ./caspian/cmd/main.go --kube-context=kind-hub 

You’ll need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster.

Running on cluster

How to use it

Once Caspian and MCAD are installed, you can deploy appwrappers in the hub clusters and watch their status. Caspian looks at the specifications of each appwrapper to determine the total CPU/GPU requirement, the run time, and the deadline for executing the appwrapper. The example below shows an example of an appwrapper. Under sustainale filed, you can specify the run time (in hours) and the deadline. If users does not fill these filed, Caspian by default assumes that the run time of the appwrapper is one hour and there is no deadline for finishing the appwrapper.

apiVersion: workload.codeflare.dev/v1beta1
kind: AppWrapper
metadata:
  namespace: default
  name: aw1
spec:
  priority: 1
  schedulingSpec:
    minAvailable: 1
    requeuing:
      maxNumRequeuings: 5
  sustainable:
    runTime: 3
    deadline: 2023-11-07T17:09:23-08:00
  resources:
    GenericItems:
    - custompodresources:
      - requests:
          cpu: 3
        replicas: 1
      generictemplate:
        apiVersion: v1
        kind: Pod
        metadata:
          namespace: default
          name: aw1-1
          labels:
            workload.codeflare.dev/namespace: default
            workload.codeflare.dev: aw1
        spec:
          restartPolicy: Never
          containers:
            - name: busybox
              image: busybox
              command: ["sh", "-c", "sleep 45"]
              resources:
                requests:
                  cpu: 3
                limits:
                  cpu: 3
 

Publications

  • Scheduling on a single cluster:

    • Tayebeh Bahreini, Asser Tantawi and Alaa Youssef, "An Approximation Algorithm for Minimizing the Cloud Carbon Footprint through Workload Scheduling", Proc. of the IEEE International Conference on Cloud Computing (IEEE CLOUD), 2022 (link).
  • Scheduling and placement over multi clusters:

    • Tayebeh Bahreini, Asser Tantawi and Alaa Youssef, "A Carbon-aware Workload Dispatcher in Cloud Computing Systems", Proc. of the IEEE International Conference on Cloud Computing (IEEE CLOUD), 2023 (link).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Go 100.0%