Skip to content

Commit

Permalink
Merge pull request microsoft#192 from microsoft/master
Browse files Browse the repository at this point in the history
merge master
  • Loading branch information
SparkSnail authored Jul 23, 2019
2 parents 1bd2012 + a5fa235 commit c76068f
Show file tree
Hide file tree
Showing 14 changed files with 799 additions and 399 deletions.
33 changes: 29 additions & 4 deletions docs/en_US/AdvancedFeature/MultiPhase.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,9 +32,31 @@ It is pretty simple to use multi-phase in trial code, an example is shown below:
# ...
```

__2. Modify experiment configuration__
__2. Experiment configuration__

To enable multi-phase, you should also add `multiPhase: true` in your experiment YAML configure file. If this line is not added, `nni.get_next_parameter()` would always return the same configuration. For all the built-in tuners/advisors, you can use multi-phase in your trial code without modification of tuner/advisor spec in the YAML configure file.
To enable multi-phase, you should also add `multiPhase: true` in your experiment YAML configure file. If this line is not added, `nni.get_next_parameter()` would always return the same configuration.

Multi-phase experiment configuration example:

```
authorName: default
experimentName: multiphase experiment
trialConcurrency: 2
maxExecDuration: 1h
maxTrialNum: 8
trainingServicePlatform: local
searchSpacePath: search_space.json
multiPhase: true
useAnnotation: false
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
trial:
command: python3 mytrial.py
codeDir: .
gpuNum: 0
```

### Write a tuner that leverages multi-phase:

Expand All @@ -48,6 +70,9 @@ trial_end
```
With this information, the tuner could know which trial is requesting a configuration, and which trial is reporting results. This information provides enough flexibility for your tuner to deal with different trials and different phases. For example, you may want to use the trial_job_id parameter of generate_parameters method to generate hyperparameters for a specific trial job.

Of course, to use your multi-phase tuner, __you should add `multiPhase: true` in your experiment YAML configure file__.
### Tuners support multi-phase experiments:

[TPE](../Tuner/HyperoptTuner.md), [Random](../Tuner/HyperoptTuner.md), [Anneal](../Tuner/HyperoptTuner.md), [Evolution](../Tuner/EvolutionTuner.md), [SMAC](../Tuner/SmacTuner.md), [NetworkMorphism](../Tuner/NetworkmorphismTuner.md), [MetisTuner](../Tuner/MetisTuner.md), [BOHB](../Tuner/BohbAdvisor.md), [Hyperband](../Tuner/HyperbandAdvisor.md), [ENAS tuner](https://github.com/countif/enas_nni/blob/master/nni/examples/tuners/enas/nni_controller_ptb.py).

[ENAS tuner](https://github.com/countif/enas_nni/blob/master/nni/examples/tuners/enas/nni_controller_ptb.py) is an example of a multi-phase tuner.
### Training services support multi-phase experiment:
[Local Machine](../TrainingService/LocalMode.md), [Remote Servers](../TrainingService/RemoteMachineMode.md), [OpenPAI](../TrainingService/PaiMode.md)
38 changes: 16 additions & 22 deletions docs/en_US/Release.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# ChangeLog


## Release 0.9 - 7/1/2019

### Major Features
Expand Down Expand Up @@ -95,18 +96,18 @@

### Major Features

* [Version checking](https://github.com/Microsoft/nni/blob/master/docs/en_US/PaiMode.md#version-check)
* [Version checking](TrainingService/PaiMode.md)
* check whether the version is consistent between nniManager and trialKeeper
* [Report final metrics for early stop job](https://github.com/Microsoft/nni/issues/776)
* [Report final metrics for early stop job](https://github.com/microsoft/nni/issues/776)
* If includeIntermediateResults is true, the last intermediate result of the trial that is early stopped by assessor is sent to tuner as final result. The default value of includeIntermediateResults is false.
* [Separate Tuner/Assessor](https://github.com/Microsoft/nni/issues/841)
* [Separate Tuner/Assessor](https://github.com/microsoft/nni/issues/841)
* Adds two pipes to separate message receiving channels for tuner and assessor.
* Make log collection feature configurable
* Add intermediate result graph for all trials

### Bug fix

* [Add shmMB config key for OpenPAI](https://github.com/Microsoft/nni/issues/842)
* [Add shmMB config key for OpenPAI](https://github.com/microsoft/nni/issues/842)
* Fix the bug that doesn't show any result if metrics is dict
* Fix the number calculation issue for float types in hyperband
* Fix a bug in the search space conversion in SMAC tuner
Expand All @@ -121,8 +122,8 @@

### Documentation
* Chinese version document: https://nni.readthedocs.io/zh/latest/
* Debuggability/serviceability document: https://nni.readthedocs.io/en/latest/HowToDebug.html
* Tuner assessor reference: https://nni.readthedocs.io/en/latest/sdk_reference.html#tuner
* Debuggability/serviceability document: https://nni.readthedocs.io/en/latest/Tutorial/HowToDebug.html
* Tuner assessor reference: https://nni.readthedocs.io/en/latest/sdk_reference.html

### Bug Fixes and Other Changes
* Fix a race condition bug that does not store trial job cancel status correctly.
Expand All @@ -134,8 +135,8 @@

## Release 0.5.1 - 1/31/2018
### Improvements
* Making [log directory](https://github.com/Microsoft/nni/blob/v0.5.1/docs/en_US/ExperimentConfig.md) configurable
* Support [different levels of logs](https://github.com/Microsoft/nni/blob/v0.5.1/docs/en_US/ExperimentConfig.md), making it easier for debugging
* Making [log directory](https://github.com/microsoft/nni/blob/v0.5.1/docs/ExperimentConfig.md) configurable
* Support [different levels of logs](https://github.com/microsoft/nni/blob/v0.5.1/docs/ExperimentConfig.md), making it easier for debugging

### Documentation
* Reorganized documentation & New Homepage Released: https://nni.readthedocs.io/en/latest/
Expand Down Expand Up @@ -200,16 +201,16 @@

### New examples

* [FashionMnist](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism), work together with network morphism tuner
* [Distributed MNIST example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-distributed-pytorch) written in PyTorch
* [FashionMnist](https://github.com/microsoft/nni/tree/master/examples/trials/network_morphism), work together with network morphism tuner
* [Distributed MNIST example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-distributed-pytorch) written in PyTorch

## Release 0.4 - 12/6/2018

### Major Features

* [Kubeflow Training service](TrainingService/KubeflowMode.md)
* Support tf-operator
* [Distributed trial example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-distributed/dist_mnist.py) on Kubeflow
* [Distributed trial example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-distributed/dist_mnist.py) on Kubeflow
* [Grid search tuner](Tuner/GridsearchTuner.md)
* [Hyperband tuner](Tuner/HyperbandAdvisor.md)
* Support launch NNI experiment on MAC
Expand Down Expand Up @@ -256,7 +257,7 @@
Each trial job is allocated a unique sequence number, which can be retrieved by nni.get_sequence_id() API.

```bash
git clone -b v0.3 https://github.com/Microsoft/nni.git
git clone -b v0.3 https://github.com/microsoft/nni.git
```

* **nni.report_final_result(result)** API supports more data types for result parameter.
Expand All @@ -278,20 +279,19 @@
docker pull msranni/nni:latest
```

* New trial example: [NNI Sklearn Example](https://github.com/Microsoft/nni/tree/master/examples/trials/sklearn)
* New competition example: [Kaggle Competition TGS Salt Example](https://github.com/Microsoft/nni/tree/master/examples/trials/kaggle-tgs-salt)
* New trial example: [NNI Sklearn Example](https://github.com/microsoft/nni/tree/master/examples/trials/sklearn)
* New competition example: [Kaggle Competition TGS Salt Example](https://github.com/microsoft/nni/tree/master/examples/trials/kaggle-tgs-salt)

### Others

* UI refactoring, refer to [WebUI doc](Tutorial/WebUI.md) for how to work with the new UI.
* Continuous Integration: NNI had switched to Azure pipelines
* [Known Issues in release 0.3.0](https://github.com/Microsoft/nni/labels/nni030knownissues).

## Release 0.2.0 - 9/29/2018

### Major Features

* Support [OpenPAI](https://github.com/Microsoft/pai) Training Platform (See [here](TrainingService/PaiMode.md) for instructions about how to submit NNI job in pai mode)
* Support [OpenPAI](https://github.com/microsoft/pai) Training Platform (See [here](TrainingService/PaiMode.md) for instructions about how to submit NNI job in pai mode)
* Support training services on pai mode. NNI trials will be scheduled to run on OpenPAI cluster
* NNI trial's output (including logs and model file) will be copied to OpenPAI HDFS for further debugging and checking
* Support [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) tuner (See [here](Tuner/SmacTuner.md) for instructions about how to use SMAC tuner)
Expand All @@ -301,9 +301,6 @@
* Update ga squad example and related documentation
* WebUI UX small enhancement and bug fix

### Known Issues

[Known Issues in release 0.2.0](https://github.com/Microsoft/nni/labels/nni020knownissues).

## Release 0.1.0 - 9/10/2018 (initial release)

Expand All @@ -327,6 +324,3 @@ Initial release of Neural Network Intelligence (NNI).
* Others
* Support simple GPU job scheduling

### Known Issues

[Known Issues in release 0.1.0](https://github.com/Microsoft/nni/labels/nni010knownissues).
1 change: 1 addition & 0 deletions src/nni_manager/core/nniDataStore.ts
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,7 @@ class NNIDataStore implements DataStore {

public async getTrialJob(trialJobId: string): Promise<TrialJobInfo> {
const trialJobs: TrialJobInfo[] = await this.queryTrialJobs(undefined, trialJobId);
assert(trialJobs.length <= 1);

return trialJobs[0];
}
Expand Down
6 changes: 2 additions & 4 deletions src/nni_manager/core/nnimanager.ts
Original file line number Diff line number Diff line change
Expand Up @@ -242,10 +242,8 @@ class NNIManager implements Manager {
});
}

public getTrialJob(trialJobId: string): Promise<TrialJobDetail> {
return Promise.resolve(
this.trainingService.getTrialJob(trialJobId)
);
public getTrialJob(trialJobId: string): Promise<TrialJobInfo> {
return this.dataStore.getTrialJob(trialJobId);
}

public async setClusterMetadata(key: string, value: string): Promise<void> {
Expand Down
7 changes: 6 additions & 1 deletion src/nni_manager/core/test/mockedDatastore.ts
Original file line number Diff line number Diff line change
Expand Up @@ -221,7 +221,12 @@ class MockedDataStore implements DataStore {
}

public getTrialJob(trialJobId: string): Promise<TrialJobInfo> {
throw new Error("Method not implemented.");
return Promise.resolve({
id: '1234',
status: 'SUCCEEDED',
startTime: Date.now(),
endTime: Date.now()
});
}

private async getFinalMetricData(trialJobId: string): Promise<any> {
Expand Down
6 changes: 5 additions & 1 deletion src/nni_manager/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,11 @@
},
"resolutions": {
"mem": "^4.0.0",
"handlebars": "^4.1.0"
"handlebars": "^4.1.0",
"lodash": "^4.17.13",
"lodash.merge": "^4.6.2",
"node.extend": "^1.1.7",
"hoek": "^4.2.1"
},
"engines": {
"node": ">=10.0.0"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -201,19 +201,26 @@ class FrameworkControllerTrainingService extends KubernetesTrainingService imple
throw new Error('Kubeflow Cluster config is not initialized');
}

if (this.fcTrialConfig === undefined) {
throw new Error('Kubeflow trial config is not initialized');
}

let trialJobOutputUrl: string = '';

if (this.fcClusterConfig.storageType === 'azureStorage') {
if (this.azureStorageClient === undefined) {
throw new Error('azureStorageClient is not initialized');
}
try {
//upload local files to azure storage
//upload local files, including scripts for running the trial and configuration (e.g., hyperparameters) for the trial, to azure storage
await AzureStorageClientUtility.uploadDirectory(
this.azureStorageClient, `nni/${getExperimentId()}/${trialJobId}`, this.azureStorageShare, `${trialLocalTempFolder}`);
//upload code files to azure storage
await AzureStorageClientUtility.uploadDirectory(
this.azureStorageClient, `nni/${getExperimentId()}/${trialJobId}`, this.azureStorageShare, `${this.fcTrialConfig.codeDir}`);

trialJobOutputUrl = `https://${this.azureStorageAccountName}.file.core.windows.net/\
${this.azureStorageShare}/${path.join('nni', getExperimentId(), trialJobId, 'output')}`;
trialJobOutputUrl = `https://${this.azureStorageAccountName}.file.core.windows.net/` +
`${this.azureStorageShare}/${path.join('nni', getExperimentId(), trialJobId, 'output')}`;
} catch (error) {
this.log.error(error);

Expand All @@ -226,7 +233,8 @@ class FrameworkControllerTrainingService extends KubernetesTrainingService imple
await cpp.exec(`mkdir -p ${this.trialLocalNFSTempFolder}/nni/${getExperimentId()}/${trialJobId}`);
// Copy code files from local dir to NFS mounted dir
await cpp.exec(`cp -r ${trialLocalTempFolder}/* ${this.trialLocalNFSTempFolder}/nni/${getExperimentId()}/${trialJobId}/.`);

// Copy codeDir to NFS mounted dir
await cpp.exec(`cp -r ${this.fcTrialConfig.codeDir}/* ${this.trialLocalNFSTempFolder}/nni/${getExperimentId()}/${trialJobId}/.`);
const nfsConfig: NFSConfig = nfsFrameworkControllerClusterConfig.nfs;
trialJobOutputUrl = `nfs://${nfsConfig.server}:${path.join(nfsConfig.path, 'nni', getExperimentId(), trialJobId, 'output')}`;
}
Expand Down Expand Up @@ -257,13 +265,12 @@ class FrameworkControllerTrainingService extends KubernetesTrainingService imple
throw new Error('frameworkcontroller trial config is not initialized');
}

await cpp.exec(`mkdir -p ${path.dirname(trialLocalTempFolder)}`);
await cpp.exec(`cp -r ${this.fcTrialConfig.codeDir} ${trialLocalTempFolder}`);
await cpp.exec(`mkdir -p ${trialLocalTempFolder}`);

const installScriptContent : string = CONTAINER_INSTALL_NNI_SHELL_FORMAT;
// Write NNI installation file to local tmp files
await fs.promises.writeFile(path.join(trialLocalTempFolder, 'install_nni.sh'), installScriptContent, { encoding: 'utf8' });
// Create tmp trial working folder locally.
await cpp.exec(`mkdir -p ${trialLocalTempFolder}`);

for (const taskRole of this.fcTrialConfig.taskRoles) {
const runScriptContent: string =
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,10 @@ class KubeflowTrainingService extends KubernetesTrainingService implements Kuber
throw new Error('Kubeflow Cluster config is not initialized');
}

if (this.kubeflowTrialConfig === undefined) {
throw new Error('Kubeflow Trial config is not initialized');
}

let trialJobOutputUrl: string = '';

assert(this.kubeflowClusterConfig.storage === undefined
Expand All @@ -212,13 +216,17 @@ class KubeflowTrainingService extends KubernetesTrainingService implements Kuber
throw new Error('azureStorageClient is not initialized');
}
try {
//upload local files to azure storage
//upload local files, including scripts for running the trial and configuration (e.g., hyperparameters) for the trial, to azure storage
await AzureStorageClientUtility.uploadDirectory(this.azureStorageClient,
`nni/${getExperimentId()}/${trialJobId}`, this.azureStorageShare,
`${trialLocalTempFolder}`);
//upload code files to azure storage
await AzureStorageClientUtility.uploadDirectory(this.azureStorageClient,
`nni/${getExperimentId()}/${trialJobId}`, this.azureStorageShare,
`${this.kubeflowTrialConfig.codeDir}`);

trialJobOutputUrl = `https://${this.azureStorageAccountName}.file.core.windows.net/${this.azureStorageShare}\
/${path.join('nni', getExperimentId(), trialJobId, 'output')}`;
trialJobOutputUrl = `https://${this.azureStorageAccountName}.file.core.windows.net/${this.azureStorageShare}` +
`/${path.join('nni', getExperimentId(), trialJobId, 'output')}`;
} catch (error) {
this.log.error(error);

Expand All @@ -228,9 +236,10 @@ class KubeflowTrainingService extends KubernetesTrainingService implements Kuber
const nfsKubeflowClusterConfig: KubeflowClusterConfigNFS = <KubeflowClusterConfigNFS>this.kubeflowClusterConfig;
// Creat work dir for current trial in NFS directory
await cpp.exec(`mkdir -p ${this.trialLocalNFSTempFolder}/nni/${getExperimentId()}/${trialJobId}`);
// Copy code files from local dir to NFS mounted dir
// Copy script files from local dir to NFS mounted dir
await cpp.exec(`cp -r ${trialLocalTempFolder}/* ${this.trialLocalNFSTempFolder}/nni/${getExperimentId()}/${trialJobId}/.`);

// Copy codeDir to NFS mounted dir
await cpp.exec(`cp -r ${this.kubeflowTrialConfig.codeDir}/* ${this.trialLocalNFSTempFolder}/nni/${getExperimentId()}/${trialJobId}/.`);
const nfsConfig: NFSConfig = nfsKubeflowClusterConfig.nfs;
trialJobOutputUrl = `nfs://${nfsConfig.server}:${path.join(nfsConfig.path, 'nni', getExperimentId(), trialJobId, 'output')}`;
}
Expand All @@ -255,13 +264,10 @@ class KubeflowTrainingService extends KubernetesTrainingService implements Kuber
}

//create tmp trial working folder locally.
await cpp.exec(`mkdir -p ${path.dirname(trialLocalTempFolder)}`);
await cpp.exec(`cp -r ${kubeflowTrialConfig.codeDir} ${trialLocalTempFolder}`);
await cpp.exec(`mkdir -p ${trialLocalTempFolder}`);
const runScriptContent : string = CONTAINER_INSTALL_NNI_SHELL_FORMAT;
// Write NNI installation file to local tmp files
await fs.promises.writeFile(path.join(trialLocalTempFolder, 'install_nni.sh'), runScriptContent, { encoding: 'utf8' });
// Create tmp trial working folder locally.
await cpp.exec(`mkdir -p ${trialLocalTempFolder}`);

// Write worker file content run_worker.sh to local tmp folders
if (kubeflowTrialConfig.worker !== undefined) {
Expand Down
Loading

0 comments on commit c76068f

Please sign in to comment.