Merge pull request #101 from NVIDIA/dev2.8
Dev2.8
Showing 13 changed files with 688 additions and 192 deletions.
@@ -0,0 +1,7 @@
FROM nvcr.io/nvidia/pytorch:21.02-py3

RUN pip install pytorch-lightning==1.2.2
RUN pip install torchmetrics

RUN git clone https://github.com/PyTorchLightning/pytorch-lightning.git
COPY test.sh /
@@ -0,0 +1,9 @@
# Image Classification Speed Test

This example is based on a
[PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pl_examples/domain_templates/computer_vision_fine_tuning.py)
image classification with transfer learning example. The provided `run.sh` script builds a Docker container for you and runs the test. You can edit the
script to specify the batch size, the number of epochs, the number of GPUs (set to 0 for a CPU test), as well as the number of cores / workers.

On a GPU, this test takes just a few minutes to run, depending on the model. It will likely take quite a bit longer on a CPU.
@@ -0,0 +1,30 @@
#!/bin/sh

# the name for the transient docker image
IMG=foo

# use 128 for 16G cards
# batch size 16 takes less than 5 GB of GPU memory
BATCH_SIZE=128
RUN_EPOCHS=15
# set to 0 for a CPU-only test
# on a multi-GPU machine, setting this to > 1 works,
# but the test has overhead, so the results are not representative
GPUS=1

# this setting should likely match the number of cores in the system;
# it is the number of workers used by the dataloader
WORKERS=16

echo `date` building docker image
docker build -t ${IMG} -f Dockerfile .

echo `date` launching...

# the idea is to do two runs and time only the second:
# the first (shorter, 1-epoch) run downloads and prepares / caches the data, so we don't time it
docker run --rm --ipc=host ${IMG} /test.sh ${BATCH_SIZE} ${RUN_EPOCHS} ${GPUS} ${WORKERS}

# docker rmi ${IMG}
echo `date` all done
@@ -0,0 +1,20 @@
#!/bin/bash

if [ $# -lt 4 ]
then
    echo "use: test.sh <batch_size> <epochs> <gpus> <num_workers>"
    exit 1
fi

BATCH="$1"
EPOCHS="$2"
GPUS="$3"
WORKERS="$4"

echo "warmup run starting"
python /workspace/pytorch-lightning/pl_examples/domain_templates/computer_vision_fine_tuning.py --epochs 1 --batch-size ${BATCH} --gpus ${GPUS} --num_workers ${WORKERS}

echo "running timed test"
time python /workspace/pytorch-lightning/pl_examples/domain_templates/computer_vision_fine_tuning.py --epochs ${EPOCHS} --batch-size ${BATCH} --gpus ${GPUS} --num_workers ${WORKERS}
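The warmup-then-timed pattern that `test.sh` implements (an untimed first run to download and cache the data, then a timed run) can be sketched generically in Python. The `run_benchmark` helper and the toy workload below are illustrative stand-ins, not part of the repository; in the real test the workload is the Lightning fine-tuning script.

```python
import time

def run_benchmark(workload, warmup_runs=1, timed_runs=1):
    """Time a workload after discarding warmup runs that populate caches."""
    for _ in range(warmup_runs):
        workload()  # untimed: fetches / caches data, warms up the pipeline
    start = time.perf_counter()
    for _ in range(timed_runs):
        workload()
    return (time.perf_counter() - start) / timed_runs

# stand-in workload for illustration only
elapsed = run_benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"mean timed run: {elapsed:.4f}s")
```

Timing only the second run keeps one-time costs (dataset download, data preparation) out of the measurement, which is why the results are comparable across machines with different network speeds.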
@@ -0,0 +1,10 @@
# kNN Speed Test

This example is based on Chris Deotte's [Kaggle notebook](https://www.kaggle.com/cdeotte/rapids-gpu-knn-mnist-0-97/notebook), in which a GPU-accelerated kNN classifier
is used in the Kaggle MNIST competition. The GPU / CPU speedup will depend on your hardware, but we routinely see 100x+ performance improvements. On a GPU,
the 100x inference (cell 13) takes less than a minute to run. To run the same test on a CPU, select the number of cores (cell 14) and then run the test in cell 15.

These massive speedups are a game changer for rapid experimentation, model architecture selection, and hyperparameter optimization.

To run this example, simply launch the data science stack container or conda environment and run the notebook. Alternatively, you can use one of the RAPIDS containers.
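For readers unfamiliar with the algorithm being accelerated, here is a minimal CPU-only sketch of brute-force kNN classification in plain NumPy. This is an illustration, not the cuML implementation the notebook uses; RAPIDS cuML exposes the same algorithm on the GPU through a scikit-learn-style `KNeighborsClassifier` API.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=3):
    """Brute-force kNN: Euclidean distances, majority vote over the k nearest."""
    # squared distances between every test and train sample, shape (n_test, n_train)
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest neighbours
    votes = train_y[nearest]                  # their class labels
    # majority vote per test sample
    return np.array([np.bincount(row).argmax() for row in votes])

# toy two-class problem: points near 0 are class 0, points near 5 are class 1
train_x = np.array([[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]])
train_y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(train_x, train_y, np.array([[0.2], [5.8]])))  # → [0 1]
```

The pairwise distance matrix is what makes brute-force kNN expensive (it grows with n_test × n_train), and also what makes it so GPU-friendly: the whole computation is dense, parallel linear algebra.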
313 changes: 313 additions & 0 deletions — benchmarks/rapids/knn/DigitRecognizer/rapids-gpu-knn-mnist-0-97.ipynb
Large diffs are not rendered by default.
Binary files not shown.