Follow steps in guide: https://huggingface.co/docs/transformers/training
- Login:
huggingface-cli login
If you see a warning like "Authenticated through git-credential store but this isn't the helper defined on your machine", follow the instructions it prints to fix it.
Tip: You can get your token from https://huggingface.co/settings/tokens; it must be a WRITE token.
- Run
python hugging-face/hf_fine_tune_hello_world.py
Manually upload data from the web UI or via the API.
To load the dataset afterwards:
from datasets import load_dataset
remote_dataset = load_dataset("noahgift/social-power-nba")
remote_dataset
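Once loaded, each split behaves like a sequence of row dicts. The offline stand-in below sketches typical exploration; the column names are invented for illustration and are NOT the real social-power-nba schema (with a real Hub dataset you would use `remote_dataset["train"].column_names` and `.filter(...)`):

```python
# Hedged, offline stand-in for exploring a loaded dataset.
# Columns here are illustrative, not the actual social-power-nba schema.
rows = [
    {"player": "A", "twitter_followers": 1_000_000, "salary": 25.0},
    {"player": "B", "twitter_followers": 50_000, "salary": 3.5},
    {"player": "C", "twitter_followers": 2_000_000, "salary": 30.0},
]

# Peek at the "schema" (datasets exposes this as .column_names)
columns = sorted(rows[0].keys())
print(columns)  # ['player', 'salary', 'twitter_followers']

# Filter, analogous to dataset.filter(lambda r: r["twitter_followers"] > 100_000)
popular = [r for r in rows if r["twitter_followers"] > 100_000]
print(len(popular))  # 2
```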
- Find a simple and small dataset: Kaggle, your own, or a sample dataset
- Go to the Hugging Face website and upload it
- Download and explore the dataset
- Enhance the dataset by filling out its metadata
- Build a demo for it
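Filling out dataset metadata means editing the YAML front matter at the top of the dataset's README.md on the Hub. A hedged example of the kind of fields involved (values are illustrative, not the actual metadata of noahgift/social-power-nba):

```yaml
# Illustrative dataset card front matter; values are examples only.
license: mit
language:
  - en
task_categories:
  - tabular-classification
tags:
  - nba
  - social-media
pretty_name: Social Power NBA
size_categories:
  - 1K<n<10K
```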
Use the huggingface-cli
(venv) @noahgift ➜ /workspaces/hugging-face-tutorials (GPU) $ huggingface-cli scan-cache
REPO ID REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS LOCAL PATH
---------------------------- --------- ------------ -------- ------------- ------------- ---- ----------------------------------------------------------------------------
bert-base-cased model 436.4M 5 2 days ago 2 days ago main /home/codespace/.cache/huggingface/hub/models--bert-base-cased
bert-base-uncased model 441.2M 5 2 hours ago 2 hours ago main /home/codespace/.cache/huggingface/hub/models--bert-base-uncased
google/pegasus-cnn_dailymail model 1.9M 4 1 hour ago 1 hour ago main /home/codespace/.cache/huggingface/hub/models--google--pegasus-cnn_dailymail
gpt2 model 551.0M 5 2 days ago 2 days ago main /home/codespace/.cache/huggingface/hub/models--gpt2
gpt2-xl model 6.4G 5 1 hour ago 1 hour ago main /home/codespace/.cache/huggingface/hub/models--gpt2-xl
- Upload model to Hugging Face website
- Fill out model card
- Use model
Why transfer learning?
- One batch in PyTorch
- Using sacrebleu, which is precision-based. Per Wikipedia: "Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that were retrieved."
- The ROUGE score was developed specifically for applications like summarization, where high recall is more important than precision alone.
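The Wikipedia definitions above reduce to simple ratios. A toy worked example (the counts are made up for illustration):

```python
# Suppose a search returns 8 documents and 6 of them are relevant,
# out of 10 relevant documents that exist in total (numbers are made up).
retrieved = 8
relevant_retrieved = 6
relevant_total = 10

precision = relevant_retrieved / retrieved    # fraction of retrieved that are relevant
recall = relevant_retrieved / relevant_total  # fraction of relevant that were retrieved

print(precision)  # 0.75
print(recall)     # 0.6
```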
from datasets import load_metric  # in newer versions of datasets, use evaluate.load instead
rouge_metric = load_metric("rouge")
bleu_metric = load_metric("sacrebleu")
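The precision-vs-recall distinction between these two metrics can be seen with a deliberately simplified unigram-only sketch; real BLEU (sacrebleu) uses 1-4 gram precision with a brevity penalty, and real ROUGE adds stemming and longest-common-subsequence variants:

```python
from collections import Counter

def unigram_overlap(candidate: str, reference: str):
    """Clipped unigram overlap between a candidate and a reference.

    Simplified sketch: BLEU-style scoring divides the overlap by the
    candidate length (precision); ROUGE-style divides by the reference
    length (recall).
    """
    cand = candidate.split()
    ref = reference.split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand)  # BLEU-style: divide by candidate length
    recall = overlap / len(ref)      # ROUGE-style: divide by reference length
    return precision, recall

# A very short candidate scores perfect precision while missing most of the
# reference -- exactly why recall-oriented ROUGE suits summarization.
p, r = unigram_overlap("the cat", "the cat sat on the mat")
print(p, r)  # 1.0 0.3333...
```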
- Need a token; follow the guide
- Refer to the Hugging Face course
- Run huggingface-cli login
and pass in your token
The following examples test out the GPU
- run pytorch training test:
python utils/quickstart_pytorch.py
- run pytorch CUDA test:
python utils/verify_cuda_pytorch.py
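The CUDA test script isn't shown here; a minimal sketch of what such a check typically does (the real utils/verify_cuda_pytorch.py may differ, and the import is deferred so the function degrades gracefully on machines without torch installed):

```python
def cuda_status() -> str:
    """Report whether PyTorch can see a CUDA GPU.

    Hedged sketch of a verify-CUDA check; the actual
    utils/verify_cuda_pytorch.py may do more or differently.
    """
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # Name of the first visible GPU, e.g. an NVIDIA device
        return f"CUDA OK: {torch.cuda.get_device_name(0)}"
    return "CUDA not available"

print(cuda_status())
```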
- run tensorflow training test:
python utils/quickstart_tf2.py
- run nvidia monitoring test:
nvidia-smi -l 1
it should show a GPU
- run whisper transcribe test:
./utils/transcribe-whisper.sh
and verify the GPU is working with nvidia-smi -l 1
Additionally, this workspace is set up to fine-tune Hugging Face models:
python hf_fine_tune_hello_world.py
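The hello-world fine-tune follows the training guide linked at the top. A hedged sketch of its likely shape, based on that guide (the model, dataset, and hyperparameters below are assumptions about what a "hello world" script does; the actual hf_fine_tune_hello_world.py may differ, and imports are deferred so the sketch loads without transformers installed):

```python
def build_trainer():
    """Sketch of a minimal Hugging Face fine-tune, per the training guide.

    bert-base-cased / yelp_review_full come from the linked guide, but this
    is an assumption about what hf_fine_tune_hello_world.py actually runs.
    Calling this downloads data and model weights, so it is only a sketch.
    """
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    dataset = load_dataset("yelp_review_full")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

    def tokenize(batch):
        return tokenizer(batch["text"], padding="max_length", truncation=True)

    tokenized = dataset.map(tokenize, batched=True)
    # Shrink the train split so a smoke test finishes quickly
    small_train = tokenized["train"].shuffle(seed=42).select(range(100))

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-cased", num_labels=5)
    args = TrainingArguments(output_dir="test_trainer")
    return Trainer(model=model, args=args, train_dataset=small_train)

# trainer = build_trainer(); trainer.train()  # downloads weights, needs GPU time
```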
Used as the base and customized in the following Duke MLOps and Applied Data Engineering Coursera Labs:
- MLOPs-C2-Lab1-CICD
- MLOps-C2-Lab2-PokerSimulator
- MLOps-C2-Final-HuggingFace
- Coursera-MLOps-C2-lab3-probability-simulations
- Coursera-MLOps-C2-lab4-greedy-optimization
- nlp-with-transformers / notebooks
- Natural Language Processing with Transformers, Revised Edition
- Building Cloud Computing Solutions at Scale Specialization
- Python, Bash and SQL Essentials for Data Engineering Specialization
- Implementing MLOps in the Enterprise
- Practical MLOps: Operationalizing Machine Learning Models
- Coursera-Dockerfile