
Add scalability
angus924 committed Nov 3, 2019
1 parent fe2b997 commit 84d24c0
Showing 3 changed files with 398 additions and 4 deletions.
28 changes: 26 additions & 2 deletions README.md
@@ -35,11 +35,15 @@ To use ROCKET, you will need:

* Python (3.7+);
* Numba (0.45.1+);
* NumPy; and
* NumPy;
* scikit-learn (or equivalent).

All of these should be ready to go in [Anaconda](https://www.anaconda.com/distribution/).

For `reproduce_experiments_bakeoff.py`, we also use pandas (included in Anaconda).

For `reproduce_experiments_scalability.py`, you will also need [PyTorch](https://pytorch.org/) (1.2+).

## Basic Use

The key ROCKET functions, `generate_kernels(...)` and `apply_kernels(...)`, are contained in [`rocket_functions.py`](./code/rocket_functions.py). A worked example is provided in the [demo](./code/demo.ipynb) notebook.
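The workflow those two functions support can be sketched in a few lines. The snippet below is a simplified, self-contained stand-in for the real `generate_kernels(...)` / `apply_kernels(...)` in `rocket_functions.py` (which are Numba-compiled and handle more kernel parameters); it only illustrates the core idea: convolve each series with random, dilated, mean-centred kernels and pool each convolution into two features (the maximum and the proportion of positive values).

```python
import numpy as np

def generate_kernels_sketch(input_length, num_kernels, rng):
    # Random kernels: length in {7, 9, 11}, mean-centred Gaussian weights,
    # random bias, and a dilation sampled on a log scale so the receptive
    # field never exceeds the input length.
    kernels = []
    for _ in range(num_kernels):
        length = rng.choice([7, 9, 11])
        weights = rng.normal(size=length)
        weights -= weights.mean()
        bias = rng.uniform(-1, 1)
        max_exponent = np.log2((input_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exponent))
        kernels.append((weights, bias, dilation))
    return kernels

def apply_kernels_sketch(X, kernels):
    # Two features per kernel: max and "proportion of positive values" (ppv).
    features = np.zeros((len(X), 2 * len(kernels)))
    for i, x in enumerate(X):
        for j, (w, b, d) in enumerate(kernels):
            span = (len(w) - 1) * d  # receptive field of the dilated kernel
            conv = np.array([np.dot(x[s : s + span + 1 : d], w) + b
                             for s in range(len(x) - span)])
            features[i, 2 * j] = conv.max()
            features[i, 2 * j + 1] = (conv > 0).mean()
    return features

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 150))                    # 8 series of length 150
kernels = generate_kernels_sketch(150, 100, rng)
features = apply_kernels_sketch(X, kernels)      # shape (8, 200)
```

The transformed features are then passed to a linear classifier (e.g. scikit-learn's `RidgeClassifierCV`), as in the demo notebook.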
Expand Down Expand Up @@ -99,7 +103,27 @@ python reproduce_experiments_bakeoff.py -i ./Univariate_arff/ -o ./ -n 1 -k 100

### Scalability

*(Forthcoming...)*
[`reproduce_experiments_scalability.py`](./code/reproduce_experiments_scalability.py) is intended to:

* allow for reproduction of the scalability experiments (in terms of dataset size); and
* serve as a template for integrating ROCKET with logistic / softmax regression and stochastic gradient descent (or, e.g., Adam) for other large datasets using PyTorch.

The required arguments are:

* `-tr` or `--training_path`, the training dataset (csv);
* `-te` or `--test_path`, the test dataset (csv);
* `-o` or `--output_path`, the directory in which to save the results;
* `-k` or `--num_kernels`, the number of kernels.

**Note**: It may be necessary to adapt the code to your dataset in terms of dataset size and structure, regularisation, etc.

Examples:

```bash
python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 100
python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 1_000
python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 10_000
```
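For intuition, the classifier that the scalability script pairs with the ROCKET transform — softmax regression trained by minibatch stochastic gradient descent — can be sketched in plain NumPy. The real script implements this with PyTorch (optionally with Adam); the function name, hyperparameters, and toy data below are illustrative only.

```python
import numpy as np

def train_softmax_sgd(X, y, num_classes, lr=0.1, batch_size=64, epochs=20, seed=0):
    # Softmax (multinomial logistic) regression, minibatch SGD on cross-entropy.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start : start + batch_size]
            logits = X[idx] @ W + b
            logits -= logits.max(axis=1, keepdims=True)   # numerical stability
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            probs[np.arange(len(idx)), y[idx]] -= 1       # d(loss)/d(logits) = probs - onehot
            W -= lr * (X[idx].T @ probs) / len(idx)
            b -= lr * probs.mean(axis=0)
    return W, b

# Toy stand-in for transformed features: two separable Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (100, 20)), rng.normal(1, 1, (100, 20))])
y = np.repeat([0, 1], 100)
W, b = train_softmax_sgd(X, y, num_classes=2)
accuracy = ((X @ W + b).argmax(axis=1) == y).mean()
```

Because the optimiser only ever sees one minibatch at a time, this setup scales to datasets too large to fit a closed-form (e.g. ridge) solution in memory, which is the point of the scalability experiments.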

## Contributing

70 changes: 68 additions & 2 deletions code/demo.ipynb
@@ -56,10 +56,14 @@
"\n",
"* Python (3.7+);\n",
"* Numba (0.45.1+);\n",
"* NumPy; and\n",
"* NumPy;\n",
"* scikit-learn (or equivalent).\n",
"\n",
"All of these should be ready to go in [Anaconda](https://www.anaconda.com/distribution/)."
"All of these should be ready to go in [Anaconda](https://www.anaconda.com/distribution/).\n",
"\n",
"For `reproduce_experiments_bakeoff.py`, we also use pandas (included in Anaconda).\n",
"\n",
"For `reproduce_experiments_scalability.py`, you will also need [PyTorch](https://pytorch.org/) (1.2+)."
]
},
{
@@ -313,6 +317,20 @@
"# 5 Reproducing the Experiments"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## UCR Archive"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 'Bake Off' Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -336,6 +354,54 @@
"python reproduce_experiments_bakeoff.py -i ./Univariate_arff/ -o ./ -n 1 -k 100\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional 2018 Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Forthcoming...)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scalability"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`reproduce_experiments_scalability.py` is intended to:\n",
"\n",
"* allow for reproduction of the scalability experiments (in terms of dataset size); and\n",
"* serve as a template for integrating ROCKET with logistic / softmax regression and stochastic gradient descent (or, e.g., Adam) for other large datasets using PyTorch.\n",
"\n",
"The required arguments are:\n",
"\n",
"* `-tr` or `--training_path`, the training dataset (csv);\n",
"* `-te` or `--test_path`, the test dataset (csv);\n",
"* `-o` or `--output_path`, the directory in which to save the results;\n",
"* `-k` or `--num_kernels`, the number of kernels.\n",
"\n",
"**Note**: It may be necessary to adapt the code to your dataset in terms of dataset size and structure, regularisation, etc.\n",
"\n",
"Examples:\n",
"\n",
"```bash\n",
"python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 100\n",
"python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 1_000\n",
"python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 10_000\n",
"```"
]
}
],
"metadata": {
