Setting up JupyterHub on Google Cloud with Kubernetes
By Thalia 2020/03/15
Reference: https://zero-to-jupyterhub.readthedocs.io/en/latest/google/step-zero-gcp.html
You need to create a Google Cloud project and run the commands below in its Cloud Shell.
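If your Cloud Shell is not already pointed at the project you want to use, you can select it first (the project ID below is a placeholder):
gcloud config set project <YOUR-PROJECT-ID>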
Create a managed Kubernetes cluster and a default node pool.
gcloud container clusters create dscluster \
  --machine-type n1-standard-2 \
  --num-nodes 1 \
  --zone us-west2 \
  --cluster-version latest
To test if your cluster is initialized, run:
kubectl get node
Give your account permissions to perform all administrative actions needed.
kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole=cluster-admin \
[email protected]
output:
clusterrolebinding.rbac.authorization.k8s.io/cluster-admin-binding created
Now that you have your Kubernetes cluster running, the next step is Setting up Helm.
Reference: https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-jupyterhub/setup-helm.html
Helm, the package manager for Kubernetes, is a useful tool for installing, upgrading, and managing applications on a Kubernetes cluster. Helm packages are called charts. We will be installing and managing JupyterHub on our Kubernetes cluster using a Helm chart.
Run this in Google Cloud Shell to install Helm:
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
Helm has two parts: a client (helm) and a server (tiller). Tiller runs inside your Kubernetes cluster as a pod in the kube-system namespace. Tiller manages both the releases (installations) and revisions (versions) of charts deployed on the cluster. When you run helm commands, your local Helm client sends instructions to tiller in the cluster, which in turn makes the requested changes.
- Set up a ServiceAccount for use by tiller.
kubectl --namespace kube-system create serviceaccount tiller
- Initialize helm and tiller.
helm init --service-account tiller --history-max 100 --wait
This command only needs to be run once per Kubernetes cluster. It creates a tiller deployment in the kube-system namespace and sets up your local helm client. In other words, it installs and configures tiller, the server-side part of Helm, on the remote Kubernetes cluster. Later, when you deploy changes with helm (the local CLI), it talks to tiller, which then executes those instructions from within the cluster. We limit the history to 100 previous installs, as very long histories slow down helm commands a lot.
- Ensure that tiller is secure from access inside the cluster:
kubectl patch deployment tiller-deploy --namespace=kube-system --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'
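If you want to confirm that the patched tiller deployment rolled out cleanly before moving on, a standard kubectl check (my addition, not from the guide) is:
kubectl rollout status deployment tiller-deploy --namespace kube-system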
You can verify that you have the correct version and that it installed properly by running:
helm version
Helm is now set up! Let's continue with setting up JupyterHub.
Reference: https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-jupyterhub/setup-jupyterhub.html
Now that we have a Kubernetes cluster and Helm set up, we can proceed by using Helm to install JupyterHub and related Kubernetes resources from a Helm chart.
Generate a random hex string representing 32 bytes to use as a security token. Run this command in a terminal and copy the output:
openssl rand -hex 32
Output
14d2b#########################################f2f5cef10e933e8
Create and start editing a file called config.yaml. In the code snippet below we start the widely available nano editor, but any editor will do.
nano config.yaml
Write the following into the config.yaml file
proxy:
  secretToken: "14d2b#########################################f2f5cef10e933e8"
Save the config.yaml file. In the nano editor this is done by pressing CTRL+X or CMD+X followed by a confirmation to save the changes.
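Make sure your local Helm client knows about the JupyterHub Helm chart repository; the zero-to-jupyterhub guide registers it with:
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
Then install the chart configured by your config.yaml by running: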
helm upgrade --install jhub jupyterhub/jupyterhub \
--namespace jhub \
--version=0.8.2 \
--values config.yaml
Expected output:
Release "jhub" does not exist. Installing it now.
If you get a release named <YOUR-RELEASE-NAME> already exists error, delete the release by running helm delete --purge <YOUR-RELEASE-NAME>, then reinstall by repeating this step. If the error persists, also run kubectl delete namespace <YOUR-NAMESPACE> and try again.
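While you wait for the release to come up, you can check whether the hub and proxy pods are ready yet:
kubectl get pod --namespace jhub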
Find the IP we can use to access the JupyterHub. Run the following command until the EXTERNAL-IP of the proxy-public service is available like in the example output.
kubectl get service --namespace jhub
Output:
NAME | TYPE | CLUSTER-IP | EXTERNAL-IP | PORT(S) | AGE
---|---|---|---|---|---
hub | ClusterIP | 10.18.296.118 | &lt;none&gt; | 8081/TCP | 2m25s
proxy-api | ClusterIP | 10.18.244.7 | &lt;none&gt; | 8001/TCP | 2m25s
proxy-public | LoadBalancer | 10.18.241.124 | 15.239.22.110 | 80:31748/TCP,443:32558/TCP | 2m25s
To use JupyterHub, enter the external IP of the proxy-public service into a browser. JupyterHub is running with a default dummy authenticator, so entering any username and password combination will let you enter the hub.
Now that you have basic JupyterHub running, you can extend it and optimize it in many ways to meet your needs. To let users use JupyterLab by default, add the following entries to your config.yaml:
singleuser:
  defaultUrl: "/lab"
hub:
  extraConfig:
    jupyterlab: |
      c.Spawner.cmd = ['jupyter-labhub']
Then re-run the helm upgrade command so the new configuration takes effect:
helm upgrade --install jhub jupyterhub/jupyterhub --namespace jhub --version=0.8.2 --values config.yaml
JupyterHub Reference: https://zero-to-jupyterhub.readthedocs.io/en/v0.4-doc/user-experience.html
1. Install Docker Toolbox
From: https://docs.docker.com/toolbox/toolbox_install_windows/
After finishing the installation, click the Docker QuickStart icon to launch a pre-configured Docker Toolbox terminal. Run the command:
docker run hello-world
If you see the output Hello from Docker!, Docker is successfully installed.
2. Install repo2docker
pip install jupyter-repo2docker
repo2docker will convert a GitHub repo into a Docker image. The GitHub repo should contain at least a requirements.txt file like the one shown below.
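The exact contents depend on your project; a hypothetical requirements.txt for a small data-science repo might simply list the packages to install:
numpy
pandas
matplotlib
scikit-learn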
3. Create a docker image from a github repository
jupyter-repo2docker --no-run --image=jhubimg https://github.com/wenxuan0923/DStoolkit.git
NOTE: If you are running this command on a Windows machine like me, you will encounter an error in ..\repo2docker\app.py:
AttributeError: module 'os' has no attribute 'geteuid'
To fix this, add import psutil at the top of app.py and replace the call os.geteuid() with psutil.Process(os.getpid()).pid.
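Once the build finishes, you can confirm the image exists locally (this check is my addition, not from the repo2docker docs):
docker images jhubimg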
4. Tag the docker image
docker tag jhubimg gcr.io/dstoolkit/jhub
5. Push the docker image onto Google Cloud
docker push gcr.io/dstoolkit/jhub
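If the push is rejected with an authentication error, you may need to let Docker authenticate to Google Container Registry first, for example with:
gcloud auth configure-docker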
NOTE: In most of the documentation I saw people use:
jupyter-repo2docker <YOUR-GITHUB-REPOSITORY> --image=gcr.io/<PROJECT-NAME>/<IMAGE-NAME>:<TAG> --no-run
to generate the Docker image, but it did not work for me. The error message was:
--image-name: 'gcr.io/....' is not a valid docker image name.
So I chose to create the Docker image on my local machine first and then push it to Google Cloud.
6. Edit the config.yaml file
Edit the JupyterHub configuration to build from this image by including these lines in config.yaml:
singleuser:
  image:
    name: gcr.io/dstoolkit/jhub
    tag: latest
7. Relaunch JupyterHub with the new config file pointing at the new Docker image
helm upgrade jhub jupyterhub/jupyterhub --version=0.8.2 --values config.yaml
8. Restart your notebook
If you already have a running JupyterHub session, you'll need to restart it (by stopping and starting your server from the control panel in the top right) so it picks up the new image. New users won't have to do this.
To embed your jupyterhub into website you can use iframe:
<iframe src='http://<your_external_IP>'></iframe>
But this is not enough, because you will encounter an error about the Content Security Policy.
To allow the embedding, add a headers configuration like the sketch below to your config.yaml file and then re-run the helm upgrade command.
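The exact setting depends on your JupyterHub version and domain, so treat this as a hedged sketch: it uses JupyterHub's tornado_settings to send a Content-Security-Policy header whose frame-ancestors directive allows your own site (the domain below is a placeholder you must replace) to frame the hub.
hub:
  extraConfig:
    embedding: |
      c.JupyterHub.tornado_settings = {
        'headers': {
          'Content-Security-Policy': "frame-ancestors 'self' https://<your-website-domain>"
        }
      }
Yay! You have a JupyterHub server on your own website page now!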