
Add scalability
angus924 committed Nov 3, 2019
1 parent fe2b997 commit 84d24c0
Showing 3 changed files with 398 additions and 4 deletions.
28 changes: 26 additions & 2 deletions README.md
@@ -35,11 +35,15 @@ To use ROCKET, you will need:

* Python (3.7+);
* Numba (0.45.1+);
* NumPy; and
* NumPy;
* scikit-learn (or equivalent).

All of these should be ready to go in [Anaconda](https://www.anaconda.com/distribution/).

For `reproduce_experiments_bakeoff.py`, we also use pandas (included in Anaconda).

For `reproduce_experiments_scalability.py`, you will also need [PyTorch](https://pytorch.org/) (1.2+).

## Basic Use

The key ROCKET functions, `generate_kernels(...)` and `apply_kernels(...)`, are contained in [`rocket_functions.py`](./code/rocket_functions.py). A worked example is provided in the [demo](./code/demo.ipynb) notebook.
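The workflow those two functions support can be sketched in a few lines. The snippet below is a simplified, self-contained stand-in for the real `generate_kernels(...)` / `apply_kernels(...)` in `rocket_functions.py` (which are Numba-compiled and handle more kernel parameters); it only illustrates the core idea: convolve each series with random, dilated, mean-centred kernels and pool each convolution into two features (the maximum and the proportion of positive values).

```python
import numpy as np

def generate_kernels_sketch(input_length, num_kernels, rng):
    # Random kernels: length in {7, 9, 11}, mean-centred Gaussian weights,
    # random bias, and a dilation sampled on a log scale so the receptive
    # field never exceeds the input length.
    kernels = []
    for _ in range(num_kernels):
        length = rng.choice([7, 9, 11])
        weights = rng.normal(size=length)
        weights -= weights.mean()
        bias = rng.uniform(-1, 1)
        max_exponent = np.log2((input_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exponent))
        kernels.append((weights, bias, dilation))
    return kernels

def apply_kernels_sketch(X, kernels):
    # Two features per kernel: max and "proportion of positive values" (ppv).
    features = np.zeros((len(X), 2 * len(kernels)))
    for i, x in enumerate(X):
        for j, (w, b, d) in enumerate(kernels):
            span = (len(w) - 1) * d  # receptive field of the dilated kernel
            conv = np.array([np.dot(x[s : s + span + 1 : d], w) + b
                             for s in range(len(x) - span)])
            features[i, 2 * j] = conv.max()
            features[i, 2 * j + 1] = (conv > 0).mean()
    return features

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 150))                    # 8 series of length 150
kernels = generate_kernels_sketch(150, 100, rng)
features = apply_kernels_sketch(X, kernels)      # shape (8, 200)
```

The transformed features are then passed to a linear classifier (e.g. scikit-learn's `RidgeClassifierCV`), as in the demo notebook.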
Expand Down Expand Up @@ -99,7 +103,27 @@ python reproduce_experiments_bakeoff.py -i ./Univariate_arff/ -o ./ -n 1 -k 100

### Scalability

*(Forthcoming...)*
[`reproduce_experiments_scalability.py`](./code/reproduce_experiments_scalability.py) is intended to:

* allow for reproduction of the scalability experiments (in terms of dataset size); and
* serve as a template for integrating ROCKET with logistic / softmax regression and stochastic gradient descent (or, e.g., Adam) for other large datasets using PyTorch.

The required arguments are:

* `-tr` or `--training_path`, the training dataset (csv);
* `-te` or `--test_path`, the test dataset (csv);
* `-o` or `--output_path`, the directory in which to save the results;
* `-k` or `--num_kernels`, the number of kernels.

**Note**: It may be necessary to adapt the code to your dataset in terms of dataset size and structure, regularisation, etc.

Examples:

```bash
python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 100
python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 1_000
python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 10_000
```
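For intuition, the classifier that the scalability script pairs with the ROCKET transform — softmax regression trained by minibatch stochastic gradient descent — can be sketched in plain NumPy. The real script implements this with PyTorch (optionally with Adam); the function name, hyperparameters, and toy data below are illustrative only.

```python
import numpy as np

def train_softmax_sgd(X, y, num_classes, lr=0.1, batch_size=64, epochs=20, seed=0):
    # Softmax (multinomial logistic) regression, minibatch SGD on cross-entropy.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start : start + batch_size]
            logits = X[idx] @ W + b
            logits -= logits.max(axis=1, keepdims=True)   # numerical stability
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            probs[np.arange(len(idx)), y[idx]] -= 1       # d(loss)/d(logits) = probs - onehot
            W -= lr * (X[idx].T @ probs) / len(idx)
            b -= lr * probs.mean(axis=0)
    return W, b

# Toy stand-in for transformed features: two separable Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (100, 20)), rng.normal(1, 1, (100, 20))])
y = np.repeat([0, 1], 100)
W, b = train_softmax_sgd(X, y, num_classes=2)
accuracy = ((X @ W + b).argmax(axis=1) == y).mean()
```

Because the optimiser only ever sees one minibatch at a time, this setup scales to datasets too large to fit a closed-form (e.g. ridge) solution in memory, which is the point of the scalability experiments.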

## Contributing

70 changes: 68 additions & 2 deletions code/demo.ipynb
@@ -56,10 +56,14 @@
"\n",
"* Python (3.7+);\n",
"* Numba (0.45.1+);\n",
"* NumPy; and\n",
"* NumPy;\n",
"* scikit-learn (or equivalent).\n",
"\n",
"All of these should be ready to go in [Anaconda](https://www.anaconda.com/distribution/)."
"All of these should be ready to go in [Anaconda](https://www.anaconda.com/distribution/).\n",
"\n",
"For `reproduce_experiments_bakeoff.py`, we also use pandas (included in Anaconda).\n",
"\n",
"For `reproduce_experiments_scalability.py`, you will also need [PyTorch](https://pytorch.org/) (1.2+)."
]
},
{
@@ -313,6 +317,20 @@
"# 5 Reproducing the Experiments"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## UCR Archive"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 'Bake Off' Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -336,6 +354,54 @@
"python reproduce_experiments_bakeoff.py -i ./Univariate_arff/ -o ./ -n 1 -k 100\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional 2018 Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Forthcoming...)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scalability"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`reproduce_experiments_scalability.py` is intended to:\n",
"\n",
"* allow for reproduction of the scalability experiments (in terms of dataset size); and\n",
"* serve as a template for integrating ROCKET with logistic / softmax regression and stochastic gradient descent (or, e.g., Adam) for other large datasets using PyTorch.\n",
"\n",
"The required arguments are:\n",
"\n",
"* `-tr` or `--training_path`, the training dataset (csv);\n",
"* `-te` or `--test_path`, the test dataset (csv);\n",
"* `-o` or `--output_path`, the directory in which to save the results;\n",
"* `-k` or `--num_kernels`, the number of kernels.\n",
"\n",
"**Note**: It may be necessary to adapt the code to your dataset in terms of dataset size and structure, regularisation, etc.\n",
"\n",
"Examples:\n",
"\n",
"```bash\n",
"python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 100\n",
"python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 1_000\n",
"python reproduce_experiments_scalability.py -tr training_data.csv -te test_data.csv -o ./ -k 10_000\n",
"```"
]
}
],
"metadata": {
