Add SPOS docs and improve NAS doc structure (microsoft#1907)
* darts mutator docs

* fix docs

* update

* add docs for SPOS

* index SPOS

* restore workers
Yuge Zhang authored Dec 31, 2019
1 parent 31f545e commit c993f76
Showing 19 changed files with 395 additions and 170 deletions.
44 changes: 38 additions & 6 deletions docs/en_US/NAS/DARTS.md
@@ -1,18 +1,50 @@
# DARTS on NNI
# DARTS

## Introduction

The paper [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Their method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent
The paper [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Their method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent.

To implement, authors optimize the network weights and architecture weights alternatively in mini-batches. They further explore the possibility that uses second order optimization (unroll) instead of first order, to improve the performance.
Authors' code optimizes the network weights and architecture weights alternately in mini-batches. They further explore the possibility of using second-order optimization (unrolling) instead of first-order optimization to improve performance.
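
To illustrate the idea, here is a minimal sketch in plain PyTorch (not NNI's implementation): each edge keeps a learnable architecture weight per candidate operation, the output is a softmax-weighted mixture of the candidates, and architecture weights and network weights are updated on alternating validation/training batches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Continuous relaxation: output is a softmax-weighted sum of candidate operations."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.alpha = nn.Parameter(torch.zeros(len(ops)))  # architecture weights

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# toy model: a single mixed edge choosing between a convolution and a pooling op
model = MixedOp([nn.Conv2d(3, 3, 3, padding=1), nn.MaxPool2d(3, stride=1, padding=1)])
weight_params = [p for name, p in model.named_parameters() if name != "alpha"]
w_optim = torch.optim.SGD(weight_params, lr=0.025, momentum=0.9)
a_optim = torch.optim.Adam([model.alpha], lr=3e-4)

def loss_fn(x, y):
    return F.mse_loss(model(x), y)

for step in range(10):
    x_val, y_val = torch.randn(2, 3, 8, 8), torch.randn(2, 3, 8, 8)
    x_trn, y_trn = torch.randn(2, 3, 8, 8), torch.randn(2, 3, 8, 8)
    # first-order DARTS: architecture step on a validation batch
    a_optim.zero_grad()
    loss_fn(x_val, y_val).backward()
    a_optim.step()
    # weight step on a training batch
    w_optim.zero_grad()
    loss_fn(x_trn, y_trn).backward()
    w_optim.step()
```

The second-order (unrolled) variant additionally back-propagates through one virtual step of the weight update when computing the architecture gradient.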

Implementation on NNI is based on the [official implementation](https://github.com/quark0/darts) and a [popular 3rd-party repo](https://github.com/khanrc/pt.darts). So far, first and second order optimization and training from scratch on CIFAR10 have been implemented.
Implementation on NNI is based on the [official implementation](https://github.com/quark0/darts) and a [popular 3rd-party repo](https://github.com/khanrc/pt.darts). DARTS on NNI is designed to be general for arbitrary search spaces. A CNN search space tailored for CIFAR10, the same as in the original paper, is implemented as a use case of DARTS.

## Reproduce Results
## Reproduction Results

To reproduce the results in the paper, we do experiments with first and second order optimization. Due to the time limit, we retrain *only the best architecture* derived from the search phase and we repeat the experiment *only once*. Our results is currently on par with the results reported in paper. We will add more results later when ready.
The example described below is meant to reproduce the results in the paper; we run experiments with both first-order and second-order optimization. Due to time limits, we retrain *only the best architecture* derived from the search phase and repeat the experiment *only once*. Our results are currently on par with the results reported in the paper. We will add more results later when ready.

| | In paper | Reproduction |
| ---------------------- | ------------- | ------------ |
| First order (CIFAR10) | 3.00 +/- 0.14 | 2.78 |
| Second order (CIFAR10) | 2.76 +/- 0.09 | 2.89 |

## Examples

### CNN Search Space

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/darts)

```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git

# search the best architecture
cd examples/nas/darts
python3 search.py

# train the best architecture
python3 retrain.py --arc-checkpoint ./checkpoints/epoch_49.json
```
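
For orientation, here is a rough sketch of a search script built around `DartsTrainer`. The constructor arguments (`loss`, `metrics`, `optimizer`, `num_epochs`, `dataset_train`, `dataset_valid`, `unrolled`), the `export` call, and the `CNN` supernet import are assumptions for illustration; consult the example's `search.py` and the API reference below for the exact signatures.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms
from nni.nas.pytorch.darts import DartsTrainer

from model import CNN  # assumed: the CIFAR10 search CNN shipped with the example

def accuracy(output, target):
    # top-1 accuracy on a batch, returned as a metrics dict
    return {"acc": (output.argmax(dim=1) == target).float().mean().item()}

transform = transforms.ToTensor()
dataset_train = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
dataset_valid = datasets.CIFAR10("./data", train=False, download=True, transform=transform)

model = CNN()  # placeholder: the supernet built with NNI mutables
optimizer = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4)

trainer = DartsTrainer(
    model,
    loss=nn.CrossEntropyLoss(),
    metrics=accuracy,
    optimizer=optimizer,
    num_epochs=50,
    dataset_train=dataset_train,
    dataset_valid=dataset_valid,
    unrolled=False,  # True would switch to second-order (unrolled) optimization
)
trainer.train()                                 # search phase
trainer.export("checkpoints/final_arch.json")   # dump the selected architecture as JSON
```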

## Reference

### PyTorch

```eval_rst
.. autoclass:: nni.nas.pytorch.darts.DartsTrainer
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.darts.DartsMutator
:members:
```
43 changes: 41 additions & 2 deletions docs/en_US/NAS/ENAS.md
@@ -1,7 +1,46 @@
# ENAS on NNI
# ENAS

## Introduction

The paper [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268) uses parameter sharing between child models to accelerate the NAS process. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss.

Implementation on NNI is based on the [official implementation in Tensorflow](https://github.com/melodyguan/enas), macro and micro search space on CIFAR10 included. Since code to train from scratch on NNI is not ready yet, reproduction results are currently unavailable.
Implementation on NNI is based on the [official implementation in Tensorflow](https://github.com/melodyguan/enas), including a general-purpose reinforcement-learning controller and a trainer that trains the target network and this controller alternately. Following the paper, we have also implemented the macro and micro search spaces on CIFAR10 to demonstrate how to use these trainers. Since code to train from scratch on NNI is not ready yet, reproduction results are currently unavailable.
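
A minimal sketch of the controller's policy-gradient update, written in plain PyTorch and independent of NNI's `EnasMutator` (which uses an LSTM controller), may help make the idea concrete: the controller samples one operation per layer, the sampled subgraph's validation accuracy serves as the reward, and the sampling log-probabilities are reinforced against a moving-average baseline.

```python
import torch
import torch.nn as nn

num_layers, num_ops = 4, 5
# toy controller: independent categorical logits per layer (the real ENAS controller is an LSTM)
logits = nn.Parameter(torch.zeros(num_layers, num_ops))
optimizer = torch.optim.Adam([logits], lr=3.5e-4)
baseline = 0.0

def child_validation_accuracy(choices):
    # placeholder: evaluate the sampled subgraph (with shared weights) on a validation batch
    return torch.rand(()).item()

for step in range(100):
    dist = torch.distributions.Categorical(logits=logits)
    choices = dist.sample()                      # one operation index per layer
    log_prob = dist.log_prob(choices).sum()
    reward = child_validation_accuracy(choices)  # reward comes from the validation set
    baseline = 0.95 * baseline + 0.05 * reward   # moving-average baseline reduces variance
    loss = -(reward - baseline) * log_prob       # REINFORCE (policy gradient) objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```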

## Examples

### CIFAR10 Macro/Micro Search Space

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/enas)

```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git

# search the best architecture
cd examples/nas/enas

# search in macro search space
python3 search.py --search-for macro

# search in micro search space
python3 search.py --search-for micro

# view more options for search
python3 search.py -h
```

## Reference

### PyTorch

```eval_rst
.. autoclass:: nni.nas.pytorch.enas.EnasTrainer
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.enas.EnasMutator
:members:
.. automethod:: __init__
```
79 changes: 9 additions & 70 deletions docs/en_US/NAS/Overview.md
@@ -6,93 +6,32 @@ However, it takes great efforts to implement NAS algorithms, and it is hard to r

With this motivation, our ambition is to provide a unified architecture in NNI, to accelerate innovations on NAS, and apply state-of-art algorithms on real world problems faster.

With [the unified interface](./NasInterface.md), there are two different modes for the architecture search. [The one](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on search space, and using one shot training to generate good-performing child model. [The other](./NasInterface.md#classic-distributed-search) is the traditional searching approach, where each child model in search space runs as an independent trial, the performance result is sent to tuner and the tuner generates new child model.
With [the unified interface](./NasInterface.md), there are two different modes for architecture search. [One](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on the search space and trained with one-shot training to generate a good-performing child model. [The other](./NasInterface.md#classic-distributed-search) is the traditional searching approach, where each child model in the search space runs as an independent trial; the performance result is sent to the tuner and the tuner generates a new child model.

* [Supported One-shot NAS Algorithms](#supported-one-shot-nas-algorithms)
* [Classic Distributed NAS with NNI experiment](./NasInterface.md#classic-distributed-search)
* [NNI NAS Programming Interface](./NasInterface.md)

## Supported One-shot NAS Algorithms

NNI supports below NAS algorithms now and being adding more. User can reproduce an algorithm or use it on owned dataset. we also encourage user to implement other algorithms with [NNI API](#use-nni-api), to benefit more people.
NNI currently supports the NAS algorithms listed below and is adding more. Users can reproduce an algorithm or apply it to their own dataset. We also encourage users to implement other algorithms with the [NNI API](#use-nni-api), to benefit more people.

|Name|Brief Introduction of Algorithm|
|---|---|
| [ENAS](#enas) | Efficient Neural Architecture Search via Parameter Sharing [Reference Paper][1] |
| [DARTS](#darts) | DARTS: Differentiable Architecture Search [Reference Paper][3] |
| [P-DARTS](#p-darts) | Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation [Reference Paper](https://arxiv.org/abs/1904.12760)|
| [ENAS](ENAS.md) | [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268). In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance. |
| [DARTS](DARTS.md) | [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) introduces a novel algorithm for differentiable network architecture search based on bilevel optimization. |
| [P-DARTS](PDARTS.md) | [Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
| [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |

Note, these algorithms run **standalone without nnictl**, and supports PyTorch only. Tensorflow 2.0 will be supported in future release.
One-shot algorithms run **standalone without nnictl**. Only the PyTorch version has been implemented. TensorFlow 2.x will be supported in a future release.

### Dependencies
Here are some common dependencies needed to run the examples. PyTorch needs to be 1.2 or above to use ``BoolTensor``.

* NNI 1.2+
* tensorboard
* PyTorch 1.2+
* git

### ENAS

[Efficient Neural Architecture Search via Parameter Sharing][1]. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance.

#### Usage

ENAS in NNI is still under development and we only support search phase for macro/micro search space on CIFAR10. Training from scratch and search space on PTB has not been finished yet. [Detailed Description](ENAS.md)

```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git

# search the best architecture
cd examples/nas/enas

# search in macro search space
python3 search.py --search-for macro

# search in micro search space
python3 search.py --search-for micro

# view more options for search
python3 search.py -h
```

### DARTS

The main contribution of [DARTS: Differentiable Architecture Search][3] on algorithm is to introduce a novel algorithm for differentiable network architecture search on bilevel optimization. [Detailed Description](DARTS.md)

#### Usage

```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git

# search the best architecture
cd examples/nas/darts
python3 search.py

# train the best architecture
python3 retrain.py --arc-checkpoint ./checkpoints/epoch_49.json
```

### P-DARTS

[Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) bases on [DARTS](#DARTS). It's contribution on algorithm is to introduce an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure.

#### Usage

```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git

# search the best architecture
cd examples/nas/pdarts
python3 search.py

# train the best architecture, it's the same progress as darts.
cd ../darts
python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json
```

## Use NNI API

NOTE: we are trying to support various NAS algorithms with a unified programming interface, and it is in a very experimental stage. This means the current programming interface may be updated in the future.
@@ -104,7 +43,7 @@ The programming interface of designing and searching a model is often demanded i
1. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or combination performs best. So, it needs an easy way to express the candidate layers or sub-models.
2. When applying NAS on a neural network, it needs a unified way to express the search space of architectures, so that it doesn't need to update trial code for different searching algorithms.

NNI proposed API is [here](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch). And [here](https://github.com/microsoft/nni/tree/master/examples/nas/darts) is an example of NAS implementation, which bases on NNI proposed interface.
The NNI proposed API is [here](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch). And [here](https://github.com/microsoft/nni/tree/master/examples/nas/naive) is an example of a NAS implementation based on the NNI proposed interface.
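
As a hedged illustration of the first point, candidate layers and candidate inputs can be declared with the interface's mutables (`LayerChoice`, `InputChoice`); the exact keyword arguments may differ slightly from this release, and a mutator or one-shot trainer is needed before the module can actually be executed.

```python
import torch.nn as nn
from nni.nas.pytorch import mutables

class Cell(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # candidate operations for this layer; the search algorithm decides which one to keep
        self.op = mutables.LayerChoice([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
        ])
        # choose one of two candidate inputs, e.g. for a skip connection
        self.skip = mutables.InputChoice(n_candidates=2, n_chosen=1)

    def forward(self, x, prev):
        out = self.op(x)
        return out + self.skip([x, prev])
```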

[1]: https://arxiv.org/abs/1802.03268
[2]: https://arxiv.org/abs/1707.07012
18 changes: 18 additions & 0 deletions docs/en_US/NAS/PDARTS.md
@@ -0,0 +1,18 @@
# P-DARTS

## Examples

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/pdarts)

```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git

# search the best architecture
cd examples/nas/pdarts
python3 search.py

# train the best architecture; the retraining process is the same as DARTS
cd ../darts
python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json
```
119 changes: 119 additions & 0 deletions docs/en_US/NAS/SPOS.md
@@ -0,0 +1,119 @@
# Single Path One-Shot (SPOS)

## Introduction

Proposed in [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420), SPOS is a one-shot NAS method that addresses the difficulty of training one-shot NAS models by constructing a simplified supernet trained with a uniform path sampling method, so that all underlying architectures (and their weights) get trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architectures without any fine-tuning.
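
The single-path training idea can be summarized with a small sketch in plain PyTorch (not the NNI implementation): every block of the supernet holds several candidate operations, and each training step activates exactly one uniformly sampled candidate per block, so that all paths receive roughly equal training.

```python
import random
import torch
import torch.nn as nn

class SupernetBlock(nn.Module):
    """One supernet block holding several candidate ops; only one path is active per step."""
    def __init__(self, channels, num_choices=4):
        super().__init__()
        self.candidates = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_choices)
        )

    def forward(self, x, choice):
        return self.candidates[choice](x)

blocks = nn.ModuleList(SupernetBlock(8) for _ in range(6))
optimizer = torch.optim.SGD(blocks.parameters(), lr=0.1, momentum=0.9)

for step in range(10):
    x, target = torch.randn(4, 8, 16, 16), torch.randn(4, 8, 16, 16)
    # uniform path sampling: pick one candidate per block, independently and uniformly
    path = [random.randrange(len(block.candidates)) for block in blocks]
    out = x
    for block, choice in zip(blocks, path):
        out = block(out, choice)
    loss = nn.functional.mse_loss(out, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```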

Implementation on NNI is based on the [official repo](https://github.com/megvii-model/SinglePathOneShot). We implement a trainer that trains the supernet and an evolution tuner that leverages the power of the NNI framework to speed up the evolutionary search phase. An example use case is shown below.

## Examples

Here is a use case that reproduces the search space in the paper and shows how to use the FLOPs limit to perform uniform sampling.

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/spos)

### Requirements

NVIDIA DALI >= 0.16 is needed as we use DALI to accelerate the data loading of ImageNet. [Installation guide](https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/installation.html)

Download the flops lookup table from [here](https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN) (maintained by [Megvii](https://github.com/megvii-model)).
Put `op_flops_dict.pkl` and `checkpoint-150000.pth.tar` (if you don't want to retrain the supernet) under the `data` directory.

Prepare ImageNet in the standard format (follow the script [here](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4)). Linking it to `data/imagenet` will be more convenient.

After preparation, the code structure is expected to be as follows:

```
spos
├── architecture_final.json
├── blocks.py
├── config_search.yml
├── data
│ ├── imagenet
│ │ ├── train
│ │ └── val
│ └── op_flops_dict.pkl
├── dataloader.py
├── network.py
├── readme.md
├── scratch.py
├── supernet.py
├── tester.py
├── tuner.py
└── utils.py
```

### Step 1. Train Supernet

```
python supernet.py
```

This will export the checkpoint to the `checkpoints` directory for the next step.

NOTE: The data loading used in the official repo is [slightly different from usual](https://github.com/megvii-model/SinglePathOneShot/issues/5), as they use a BGR tensor and intentionally keep the values between 0 and 255 to align with their own DL framework. The option `--spos-preprocessing` will simulate the original behavior and enable you to use the pretrained checkpoints.
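
For reference, a transform mimicking the described behavior (BGR channel order, values kept in 0-255) could look roughly like the sketch below; this is an assumption for illustration, not the actual code behind `--spos-preprocessing`.

```python
import numpy as np
import torch
from PIL import Image

class ToBGRTensor:
    """Convert a PIL image to a float CHW tensor in BGR order, keeping values in [0, 255]."""
    def __call__(self, img: Image.Image) -> torch.Tensor:
        arr = np.asarray(img, dtype=np.uint8)   # H x W x C, RGB, values 0-255
        arr = arr[:, :, ::-1]                   # RGB -> BGR
        # note: intentionally NOT divided by 255, to match the original framework
        return torch.from_numpy(arr.copy()).permute(2, 0, 1).float()
```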

### Step 2. Evolution Search

Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. The tester, which is responsible for testing the sampled architecture, recalculates all the batch norm statistics on a subset of training images and evaluates the architecture on the full validation set.
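
Recomputing the batch norm statistics for a sampled architecture essentially means resetting the running statistics and running forward passes in training mode over a subset of training data. A minimal sketch, assuming a generic PyTorch model and data loader (not the exact logic in `tester.py`):

```python
import torch
import torch.nn as nn

def recalculate_bn_statistics(model: nn.Module, loader, num_batches: int = 200) -> None:
    """Reset BN running stats, then refresh them with forward passes on training data."""
    for module in model.modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            module.reset_running_stats()
            module.momentum = None  # None = cumulative moving average over the calibration set
    model.train()
    with torch.no_grad():
        for i, (images, _) in enumerate(loader):
            if i >= num_batches:
                break
            model(images)
```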

In order to make the tuner aware of the FLOPs limit and able to calculate FLOPs, we created a new tuner called `EvolutionWithFlops` in `tuner.py`, which inherits from the tuner in the SDK.
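
Conceptually, the FLOPs constraint is enforced by rejecting sampled candidates whose total FLOPs (looked up from `op_flops_dict.pkl`) fall outside the allowed budget. The sketch below is purely illustrative; the dictionary layout and the actual `EvolutionWithFlops` logic live in `tuner.py`.

```python
import random

def total_flops(architecture, op_flops_dict):
    # hypothetical layout: op_flops_dict maps (layer index, chosen op) to the FLOPs of that block
    return sum(op_flops_dict[(layer, op)] for layer, op in enumerate(architecture))

def sample_within_flops(num_layers, num_choices, op_flops_dict, flops_limit):
    """Uniformly sample architectures, rejecting those that exceed the FLOPs budget."""
    while True:
        candidate = [random.randrange(num_choices) for _ in range(num_layers)]
        if total_flops(candidate, op_flops_dict) <= flops_limit:
            return candidate
```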

To have a search space ready for the NNI framework, first run

```
nnictl ss_gen -t "python tester.py"
```

This will generate a file called `nni_auto_gen_search_space.json`, which is a serialized representation of your search space.

By default, it will use `checkpoint-150000.pth.tar` downloaded previously. In case you want to use the checkpoint you trained yourself in the last step, specify `--checkpoint` in the command in `config_search.yml`.

Then, search with the evolution tuner.

```
nnictl create --config config_search.yml
```

The final architecture exported from every epoch of evolution can be found in `checkpoints` under the working directory of your tuner, which, by default, is `$HOME/nni/experiments/your_experiment_id/log`.

### Step 3. Train from Scratch

```
python scratch.py
```

By default, it will use `architecture_final.json`. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in step 2) with the `--fixed-arc` option.
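
Conceptually, the retraining script loads the supernet definition and fixes its mutable choices according to the exported JSON before ordinary ImageNet training. A hedged sketch, assuming the `apply_fixed_architecture` helper behaves as below (the network class name is a placeholder; see `scratch.py` for the actual usage):

```python
from nni.nas.pytorch.fixed import apply_fixed_architecture

from network import ShuffleNetV2OneShot  # placeholder: the supernet defined in network.py

model = ShuffleNetV2OneShot()
# freeze the mutable choices according to the exported architecture, then train as usual
apply_fixed_architecture(model, "architecture_final.json")
```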

## Reference

### PyTorch

```eval_rst
.. autoclass:: nni.nas.pytorch.spos.SPOSEvolution
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.spos.SPOSSupernetTrainer
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.spos.SPOSSupernetTrainingMutator
:members:
.. automethod:: __init__
```

## Known Limitations

* Block search only. Channel search is not supported yet.
* Only GPU version is provided here.

## Current Reproduction Results

Reproduction is still in progress. Due to the gap between the official release and the original paper, we compare our current results with both the official repo (our own run) and the paper.

* The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to [this issue](https://github.com/megvii-model/SinglePathOneShot/issues/6).
* The retrain phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still leaving a gap to the 73.61% of the official release and the 74.3% reported in the original paper.
3 changes: 2 additions & 1 deletion docs/en_US/nas.rst
@@ -22,4 +22,5 @@ For details, please refer to the following tutorials:
NAS Interface <NAS/NasInterface>
ENAS <NAS/ENAS>
DARTS <NAS/DARTS>
P-DARTS <NAS/Overview>
P-DARTS <NAS/PDARTS>
SPOS <NAS/SPOS>
1 change: 1 addition & 0 deletions examples/nas/darts/README.md
@@ -0,0 +1 @@
[Documentation](https://nni.readthedocs.io/en/latest/NAS/DARTS.html)
1 change: 1 addition & 0 deletions examples/nas/enas/README.md
@@ -0,0 +1 @@
[Documentation](https://nni.readthedocs.io/en/latest/NAS/ENAS.html)
1 change: 1 addition & 0 deletions examples/nas/naive/README.md
@@ -0,0 +1 @@
This is a naive example that demonstrates how to use NNI interface to implement a NAS search space.
1 change: 1 addition & 0 deletions examples/nas/pdarts/README.md
@@ -0,0 +1 @@
[Documentation](https://nni.readthedocs.io/en/latest/NAS/PDARTS.html)