REL DOC Updates for main branch switch
[skip ci] Update master references for main branch
mike-wendt authored Jul 16, 2020
2 parents 2240327 + e8bf8ea commit 7a63301
Showing 2 changed files with 6 additions and 6 deletions.
10 changes: 5 additions & 5 deletions RecSys2019/README.md
@@ -1,25 +1,25 @@
## Accelerating Recommender Systems by 15x with RAPIDS (Source Code)
- This repository demonstrates the acceleration techniques used during the RecSys 2019 Challenge, in which the team placed 15th of 1,534 teams: training of the fastai tabular deep learning model was accelerated by 15x, and feature engineering of the model inputs by 9.7x. The [paper describing our solution in detail](https://github.com/rapidsai/deeplearning/blob/master/RecSys2019/RAPIDS%20RecSys%20Challenge%202019.pdf) is also available. Please follow the instructions carefully, as the repository currently contains additional files that will be used in future versions.
+ This repository demonstrates the acceleration techniques used during the RecSys 2019 Challenge, in which the team placed 15th of 1,534 teams: training of the fastai tabular deep learning model was accelerated by 15x, and feature engineering of the model inputs by 9.7x. The [paper describing our solution in detail](https://github.com/rapidsai/deeplearning/blob/main/RecSys2019/RAPIDS%20RecSys%20Challenge%202019.pdf) is also available. Please follow the instructions carefully, as the repository currently contains additional files that will be used in future versions.

## Prerequisites:
- Collect the data from https://recsys.trivago.cloud/challenge/dataset/ (you need to sign up to get access)
- [Install either RAPIDS 0.9 or the nightly RAPIDS (0.10a+)](https://rapids.ai/start.html) (or use the RAPIDS containers), [PyTorch nightly](https://pytorch.org/get-started/locally/), and [fastai v1](https://docs.fast.ai/install.html)
- A GPU capable of fitting the entire dataset in GPU memory (e.g., a 32GB Tesla V100). We are working on versions that remove this restriction.

## Feature Creation
- To give a point of comparison, we've provided feature creation using both [RAPIDS cuDF](https://github.com/rapidsai/dataloaders/tree/master/RecSys2019/FeatureEngineering/rapids) and [pandas](https://github.com/rapidsai/dataloaders/tree/master/RecSys2019/FeatureEngineering/pandas), which will hopefully help you translate your own feature engineering steps into cuDF.
+ To give a point of comparison, we've provided feature creation using both [RAPIDS cuDF](https://github.com/rapidsai/dataloaders/tree/main/RecSys2019/FeatureEngineering/rapids) and [pandas](https://github.com/rapidsai/dataloaders/tree/main/RecSys2019/FeatureEngineering/pandas), which will hopefully help you translate your own feature engineering steps into cuDF.

- [create_data_pair_comparison_rapids.ipynb](https://github.com/rapidsai/dataloaders/blob/master/RecSys2019/FeatureEngineering/rapids/create_data_pair_comparison-rapids.ipynb) (or [its pandas equivalent](https://github.com/rapidsai/dataloaders/blob/master/RecSys2019/FeatureEngineering/pandas/create_data_pair_comparison-panda.ipynb)) is the starting point for feature engineering and must be run before all other scripts.
+ [create_data_pair_comparison_rapids.ipynb](https://github.com/rapidsai/dataloaders/blob/main/RecSys2019/FeatureEngineering/rapids/create_data_pair_comparison-rapids.ipynb) (or [its pandas equivalent](https://github.com/rapidsai/dataloaders/blob/main/RecSys2019/FeatureEngineering/pandas/create_data_pair_comparison-panda.ipynb)) is the starting point for feature engineering and must be run before all other scripts.
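To illustrate the pandas-to-cuDF translation the notebooks demonstrate, here is a minimal sketch (the column names are hypothetical, not the challenge's actual schema) of a typical session-level feature-engineering step written against the pandas API; cuDF mirrors this API closely, so the same groupby/merge code typically runs on the GPU after swapping the import for `import cudf as pd`.

```python
import pandas as pd  # swap for `import cudf as pd` to run on the GPU with RAPIDS

# Hypothetical click-log fragment standing in for the challenge data.
clicks = pd.DataFrame({
    "session_id": [1, 1, 2, 2, 2],
    "item_id":    [10, 11, 10, 12, 11],
    "price":      [100.0, 80.0, 100.0, 120.0, 80.0],
})

# Per-session aggregate features: click count and mean displayed price.
session_feats = clicks.groupby("session_id").agg(
    n_clicks=("item_id", "count"),
    mean_price=("price", "mean"),
).reset_index()

# Join the session-level features back onto every click in the log.
clicks = clicks.merge(session_feats, on="session_id", how="left")
```

The groupby-aggregate-merge pattern above is the workhorse of tabular feature engineering, and it is exactly the kind of step that ports to cuDF with no code changes beyond the import.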

- Other examples of feature engineering are available in the [rapids](https://github.com/rapidsai/dataloaders/tree/master/RecSys2019/FeatureEngineering/rapids) and [pandas](https://github.com/rapidsai/dataloaders/tree/master/RecSys2019/FeatureEngineering/pandas) folders, but they should not be run in the current iteration because they scale the dataset to a size that no longer fits in GPU memory during training. A version of the preprocessing that handles larger-than-GPU-memory datasets has been developed and is currently being refined; it will be made available for RecSys 2019.
+ Other examples of feature engineering are available in the [rapids](https://github.com/rapidsai/dataloaders/tree/main/RecSys2019/FeatureEngineering/rapids) and [pandas](https://github.com/rapidsai/dataloaders/tree/main/RecSys2019/FeatureEngineering/pandas) folders, but they should not be run in the current iteration because they scale the dataset to a size that no longer fits in GPU memory during training. A version of the preprocessing that handles larger-than-GPU-memory datasets has been developed and is currently being refined; it will be made available for RecSys 2019.

After you have successfully completed feature engineering, verify that the exported train.parquet, valid.parquet, and test.parquet files exist in the rsc19/cache/ folder. Once verified, you can proceed to preprocessing and model training.
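The verification step above can be automated with a small check (the paths are taken from the README; adjust them if your cache directory differs):

```python
from pathlib import Path

# Expected feature-engineering outputs, per the README.
cache_dir = Path("rsc19/cache")
expected = ["train.parquet", "valid.parquet", "test.parquet"]

# Collect any exports that are not on disk yet.
missing = [name for name in expected if not (cache_dir / name).exists()]

if missing:
    print(f"Missing exports, re-run feature engineering: {missing}")
else:
    print("All parquet exports found; proceed to preprocessing and training.")
```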

Note that the included version does not require the LAMB optimizer; we were able to achieve similar performance with AdamW. For deeper networks, LAMB will likely be a critical component. The version of LAMB we used in our testing can be found [here](https://github.com/cybertronai/pytorch-lamb).
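For reference, AdamW differs from plain Adam in that weight decay is applied directly to the weights rather than folded into the gradient. A minimal sketch of one AdamW update for a single scalar parameter (the standard published rule, not the team's actual training configuration or hyperparameters):

```python
import math

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a scalar parameter w with gradient g.

    m, v are the running first/second moment estimates; t is the
    1-based step count used for bias correction. Weight decay is
    decoupled: it is applied to w directly, not added to g.
    """
    m = beta1 * m + (1 - beta1) * g        # first-moment EMA
    v = beta2 * v + (1 - beta2) * g * g    # second-moment EMA
    m_hat = m / (1 - beta1 ** t)           # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# One step from w=1.0 with gradient 0.5 moves the weight slightly down.
w, m, v = adamw_step(w=1.0, g=0.5, m=0.0, v=0.0, t=1)
```

LAMB builds on this by rescaling each layer's update by the ratio of weight norm to update norm, which is what makes it matter for deeper networks and large batches.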

## Preprocessing and Training

- The Training folder (currently in development) will contain the end-to-end workflows for an unoptimized version and [a GPU in-memory version](https://github.com/rapidsai/dataloaders/blob/master/RecSys2019/Training/optimized_training_workflow_gpu.ipynb). Setting `to_cpu = True` in cell 7 of the GPU notebook modifies it so that the tensor is copied to the CPU and dataloading happens from there. Note that in order to see the full optimization effects you need to use the nightly version of PyTorch; without the kernel enhancement, the model is ~6.5x slower.
+ The Training folder (currently in development) will contain the end-to-end workflows for an unoptimized version and [a GPU in-memory version](https://github.com/rapidsai/dataloaders/blob/main/RecSys2019/Training/optimized_training_workflow_gpu.ipynb). Setting `to_cpu = True` in cell 7 of the GPU notebook modifies it so that the tensor is copied to the CPU and dataloading happens from there. Note that in order to see the full optimization effects you need to use the nightly version of PyTorch; without the kernel enhancement, the model is ~6.5x slower.
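The notebook cell itself is not reproduced here, but the dataflow the `to_cpu` flag selects can be sketched in simplified form, with a numpy array standing in for the torch tensor (no GPU required): either batches are sliced from the device-resident tensor directly, or from a host-memory copy of it.

```python
import numpy as np

to_cpu = True  # the flag from the notebook; True routes batches via host memory

# Stand-in for the full dataset tensor resident on the GPU.
gpu_data = np.arange(12, dtype=np.float32)

# .copy() here plays the role of tensor.cpu(): a host-side copy to load from.
source = gpu_data.copy() if to_cpu else gpu_data

# Dataloading is then just contiguous slicing from whichever copy was chosen.
batch_size = 4
batches = [source[i:i + batch_size] for i in range(0, len(source), batch_size)]
```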

## Future work

2 changes: 1 addition & 1 deletion pytorch/batch_dataloader/README.md
@@ -13,7 +13,7 @@ Using the proposed method results in better GPU utilization, and better throughput

I've created source for a batch dataloader and batch dataset modelled after their vanilla counterparts, and would love to see it integrated into the PyTorch repo. Usage is similar, and I've tried to stick to the PyTorch variable naming and formatting.

- Code can be found here: https://github.com/rapidsai/dataloaders/tree/master/pytorch/batch_dataloader
+ Code can be found here: https://github.com/rapidsai/dataloaders/tree/main/pytorch/batch_dataloader
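The core batching idea can be sketched as follows (a simplified illustration, not the repository's actual classes, with a numpy array standing in for a torch tensor): instead of fetching one row per `__getitem__` and collating, each index returns a whole contiguous batch slice of the preloaded data.

```python
import numpy as np

class BatchDataset:
    """Dataset whose items are whole batches sliced from preloaded data."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches; the last one may be short.
        return (len(self.data) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, i):
        # One slice per index, so no per-row fetch or collate step is needed.
        start = i * self.batch_size
        return self.data[start:start + self.batch_size]

ds = BatchDataset(np.arange(10), batch_size=4)
batches = [ds[i] for i in range(len(ds))]  # batches of 4, 4, and 2 rows
```

Because each item is already a batch, the dataloader's job reduces to shuffling batch indices and (optionally) pinning memory, which is where the throughput win over per-row collation comes from.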

It should hopefully be ready to go; I've tested it with both base PyTorch and with Ignite, but more eyes on it would definitely be beneficial, particularly in use cases beyond tabular data, such as text or small images. It should be applicable to anyone who isn't doing large images or heavy image augmentation. It has undergone an internal NVIDIA review by @ptrblck, who was immensely helpful in refining it, and @ngimel, who reviewed the codebase and had helpful suggestions regarding memory pinning.

