NNI on Windows for NNI Remote mode (microsoft#1073)
* test python

* test python36

* debug python

* debug python

* debug

* python version

* test python

* debug

* install nni

* install nni

* test powershell

* debug python

* test

* test python

* use python

* test python

* test python

* test

* update

* test powershell

* debug python

* debug python

* debug python

* debug powershell

* debug

* debug

* debug install.ps1

* add continueOnError: true

* debug

* debug

* update

* update

* add unittest

* test node

* update

* update joi

* debug joi

* add joi

* debug joi

* Update install

* update

* update

* add unittest

* add convert command

* add example

* fix windows commands

* debug

* fix tensorflow version

* fix pipeline

* update

* add gpu logic in windows

* update

* update

* debug

* fix commands

* fix commands

* update

* update

* Fix comments

* update

* fix kill command

* fix package.json

* Update package.json

* Refactor runScript

* Fix bug

* Fix comments

* Fix execKill

* Update

* Update

* Add unittest back

* Rollback install node

* Fix gpu memory

* Update

* Rollback check process

* Update mnist-hyperband.test.yml

* Update pipelines-it-local-windows.yml

* Update uninstall.ps1

* Fix virtual environment

* Fix tar

* Fix isAlive

* change gpu index logic

* test gpu index

* fix pipeline

* add cifar10

* fix cifar10

* remove gpu in cifar10

* test mnist gpu

* update

* debug

* Fix comments

* debug

* Update install.ps1

* debug

* update gpu metrics shell

* debug

* debug

* debug

* debug

* debug

* debug sigbreak

* Preinstall node-pre-gyp

* Update Installation.md

* Update Installation.md

* Remove install node-pre-gyp

* use taskkill to stop node process

* use ctl+c event to stop process

* add sigterm signal in stop logic

* add ctl+break command

* Update isAlive

* debug sigterm

* Update pypi readme

* Update

* fix stop logic

* fix pipeline, add cifar10

* revert mnist, remove gpu

* Fix virtualenv

* Fix comments

* Update

* Update

* Fix install

* Update install.ps1

* Update install.ps1

* Fix comments

* Fix virtualenv install

* Update

* Update

* Fix comments

* Update

* Update install.ps1

* Update

* Update localTrainingService.ts

* Update

* Update

* Update

* Update

* Update

* Update util.ts

* Update utils.ts

* Fix system slash

* Update tmp dir

* Fix system slash

* Use python3 in remote

* Write tar command to file

* Update tar

* Update

* Update

* Fix stop

* Update StopSignal type

* Add removeTrialJobMetricListener

* remove Listeners

* Update listener

* Update

* Use Temp dir

* Use Temp dir

* Add remote windows pipeline

* Update pipelines-it-remote-windows.yml

* Update

* remote build wheel

* Update pipelines-it-remote-windows.yml

* debug

* debug

* Use docker source install

* Update

* Update

* Rollback remote build wheel

* Use self node and yarn

* Fix docker source install

* Rollback Makefile

* Upgrade docker pip

* Update

* Update

* Remote build wheel

* Use inline runOptions

* Hide wget output

* Add continueOnError

* Update

* Update

* Update

* Upgrade pip

* Add chmod

* Update

* debug

* Update

* Use pscp

* Update

* Download putty

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* debug

* exclude metis

* Refactor pathJoin

* Update

* debug metis

* debug metis

* Update

* Update dependency

* Fix comments

* Update

* Fix tslint

* Fix comments

* Fix comments

* add doc

* Fix comments

* Update

* Update doc
demianzhang authored and xuehui1991 committed May 27, 2019
1 parent d8e1c4a commit a1f9266
Showing 16 changed files with 210 additions and 86 deletions.
10 changes: 5 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -106,15 +106,15 @@ We encourage researchers and students leverage these projects to accelerate the

## **Install & Verify**

-If you choose NNI Windows local mode and you use PowerShell to run script for the first time, you need to **run PowerShell as administrator** with this command first:
+If you are using NNI on Windows and use PowerShell to run script for the first time, you need to **run PowerShell as administrator** with this command first:

```bash
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
```

**Install through pip**

-* We support Linux, MacOS and Windows(local mode) in current stage, Ubuntu 16.04 or higher, MacOS 10.14.1 along with Windows 10.1809 are tested and supported. Simply run the following `pip install` in an environment that has `python >= 3.5`.
+* We support Linux, MacOS and Windows(local, remote and pai mode) in current stage, Ubuntu 16.04 or higher, MacOS 10.14.1 along with Windows 10.1809 are tested and supported. Simply run the following `pip install` in an environment that has `python >= 3.5`.

Linux and MacOS

Expand All @@ -131,12 +131,12 @@ python -m pip install --upgrade nni
Note:

* `--user` can be added if you want to install NNI in your home directory, which does not require any special privileges.
-* Currently NNI on Windows only support local mode. Anaconda or Miniconda is highly recommended to install NNI on Windows.
+* Currently NNI on Windows support local, remote and pai mode. Anaconda or Miniconda is highly recommended to install NNI on Windows.
* If there is any error like `Segmentation fault`, please refer to [FAQ](docs/en_US/FAQ.md)

**Install through source code**

-* We support Linux (Ubuntu 16.04 or higher), MacOS (10.14.1) and Windows local mode (10.1809) in our current stage.
+* We support Linux (Ubuntu 16.04 or higher), MacOS (10.14.1) and Windows (10.1809) in our current stage.

Linux and MacOS

Expand All @@ -160,7 +160,7 @@ Windows

For the system requirements of NNI, please refer to [Install NNI](docs/en_US/Installation.md)

-For NNI Windows local mode, please refer to [NNI Windows local mode](docs/en_US/WindowsLocalMode.md)
+For NNI on Windows, please refer to [NNI on Windows](docs/en_US/NniOnWindows.md)

**Verify install**

Expand Down
18 changes: 12 additions & 6 deletions deployment/pypi/Makefile
Expand Up @@ -20,22 +20,28 @@ ifeq ($(version_ts), true)
NNI_VERSION_VALUE := $(NNI_VERSION_VALUE).$(TIME_STAMP)
endif
NNI_VERSION_TEMPLATE = 999.0.0-developing

+NNI_YARN_TARBALL ?= $(CWD)nni-yarn.tar.gz
+NNI_YARN_FOLDER ?= $(CWD)nni-yarn
+NNI_YARN := PATH=$(CWD)node-$(OS_SPEC)-x64/bin:$${PATH} $(NNI_YARN_FOLDER)/bin/yarn
.PHONY: build
build:
python3 -m pip install --user --upgrade setuptools wheel
-wget https://aka.ms/nni/nodejs-download/$(OS_SPEC) -O $(CWD)node-$(OS_SPEC)-x64.tar.xz
+wget -q https://aka.ms/nni/nodejs-download/$(OS_SPEC) -O $(CWD)node-$(OS_SPEC)-x64.tar.xz
rm -rf $(CWD)node-$(OS_SPEC)-x64
mkdir $(CWD)node-$(OS_SPEC)-x64
tar xf $(CWD)node-$(OS_SPEC)-x64.tar.xz -C node-$(OS_SPEC)-x64 --strip-components 1
-cd $(CWD)../../src/nni_manager && yarn && yarn build
-cd $(CWD)../../src/webui && yarn && yarn build
+wget -q https://aka.ms/yarn-download -O $(NNI_YARN_TARBALL)
+rm -rf $(NNI_YARN_FOLDER)
+mkdir $(NNI_YARN_FOLDER)
+tar -xf $(NNI_YARN_TARBALL) -C $(NNI_YARN_FOLDER) --strip-components 1
+cd $(CWD)../../src/nni_manager && $(NNI_YARN) && $(NNI_YARN) build
+cd $(CWD)../../src/webui && $(NNI_YARN) && $(NNI_YARN) build
rm -rf $(CWD)nni
cp -r $(CWD)../../src/nni_manager/dist $(CWD)nni
cp -r $(CWD)../../src/webui/build $(CWD)nni/static
cp $(CWD)../../src/nni_manager/package.json $(CWD)nni
sed -ie 's/$(NNI_VERSION_TEMPLATE)/$(NNI_VERSION_VALUE)/' $(CWD)nni/package.json
-cd $(CWD)nni && yarn --prod
+cd $(CWD)nni && $(NNI_YARN) --prod
cd $(CWD) && sed -ie 's/$(NNI_VERSION_TEMPLATE)/$(NNI_VERSION_VALUE)/' setup.py && python3 setup.py bdist_wheel -p $(WHEEL_SPEC)
cd $(CWD)

Expand All @@ -50,4 +56,4 @@ clean:
rm -rf $(CWD)dist
rm -rf $(CWD)nni
rm -rf $(CWD)nni.egg-info
-rm -rf $(CWD)node-$(OS_SPEC)-x64
+rm -rf $(CWD)node-$(OS_SPEC)-x64
4 changes: 2 additions & 2 deletions docs/en_US/FAQ.md
Expand Up @@ -36,8 +36,8 @@ Unable to open the WebUI may have the following reasons:
* If you still can't see the WebUI after you use the server IP, you can check the proxy and the firewall of your machine. Or use the browser on the machine where you start your NNI experiment.
* Another reason may be your experiment is failed and NNI may fail to get the experiment infomation. You can check the log of NNImanager in the following directory: ~/nni/experiment/[your_experiment_id] /log/nnimanager.log

-### Windows local mode problems
-Please refer to [NNI Windows local mode](WindowsLocalMode.md)
+### NNI on Windows problems
+Please refer to [NNI on Windows](NniOnWindows.md)

### Help us improve
Please inquiry the problem in https://github.com/Microsoft/nni/issues to see whether there are other people already reported the problem, create a new one if there are no existing issues been created.
2 changes: 1 addition & 1 deletion docs/en_US/Installation.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Installation of NNI

-Currently we support installation on Linux, Mac and Windows(local mode).
+Currently we support installation on Linux, Mac and Windows(local, remote and pai mode).

## **Installation on Linux & Mac**

Expand Down
10 changes: 5 additions & 5 deletions docs/en_US/WindowsLocalMode.md → docs/en_US/NniOnWindows.md
@@ -1,6 +1,6 @@
-# Windows Local Mode (experimental feature)
+# NNI on Windows (experimental feature)

-Currently we only support local mode on Windows. Windows 10.1809 is well tested and recommended.
+Currently we support local, remote and pai mode on Windows. Windows 10.1809 is well tested and recommended.

## **Installation on Windows**

Expand All @@ -25,15 +25,15 @@ Set-ExecutionPolicy -ExecutionPolicy Unrestricted
Prerequisite: `python >=3.5`, `git`, `PowerShell`

```bash
-git clone -b v0.7 https://github.com/Microsoft/nni.git
+git clone -b v0.8 https://github.com/Microsoft/nni.git
cd nni
-powershell ./install.ps1
+powershell -file install.ps1
```

When these things are done, use the **config_windows.yml** configuration to start an experiment for validation.

```bash
-nnictl create --config nni/examples/trials/mnist/config_windows.yml
+nnictl create --config nni\examples\trials\mnist\config_windows.yml
```

For other examples you need to change trial command `python3` into `python` in each example YAML.
Expand Down
6 changes: 3 additions & 3 deletions docs/en_US/QuickStart.md
Expand Up @@ -2,15 +2,15 @@

## Installation

-We support Linux MacOS and Windows(local mode) in current stage, Ubuntu 16.04 or higher, MacOS 10.14.1 and Windows 10.1809 are tested and supported. Simply run the following `pip install` in an environment that has `python >= 3.5`.
+We support Linux MacOS and Windows in current stage, Ubuntu 16.04 or higher, MacOS 10.14.1 and Windows 10.1809 are tested and supported. Simply run the following `pip install` in an environment that has `python >= 3.5`.
#### Linux and MacOS

```bash
python3 -m pip install --upgrade nni
```

#### Windows
-If you choose Windows local mode and use PowerShell to run script, you need run below PowerShell command as administrator at first time.
+If you are using NNI on Windows, you need run below PowerShell command as administrator at first time.
```bash
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
```
Expand Down Expand Up @@ -151,7 +151,7 @@ Run the **config.yml** file from your command line to start MNIST experiment.
#### Windows
Run the **config_windows.yml** file from your command line to start MNIST experiment.

-**Note**, if you're using windows local mode, it needs to change `python3` to `python` in the config.yml file, or use the config_windows.yml file to start the experiment.
+**Note**, if you're using NNI on Windows, it needs to change `python3` to `python` in the config.yml file, or use the config_windows.yml file to start the experiment.

```bash
nnictl create --config nni/examples/trials/mnist/config_windows.yml
Expand Down
12 changes: 11 additions & 1 deletion docs/en_US/RemoteMachineMode.md
Expand Up @@ -55,7 +55,8 @@ machineList:
username: bob
passwd: bob123
```
You can use different systems to run experiments on the remote machine.
#### Linux and MacOS
Simply filling the `machineList` section and then run:

```bash
Expand All @@ -64,5 +65,14 @@ nnictl create --config ~/nni/examples/trials/mnist-annotation/config_remote.yml

to start the experiment.

#### Windows
Simply filling the `machineList` section and then run:

```bash
nnictl create --config %userprofile%\nni\examples\trials\mnist-annotation\config_remote.yml
```

to start the experiment.

## version check
NNI support version check feature in since version 0.6, [refer](PaiMode.md)
2 changes: 1 addition & 1 deletion install.ps1
Expand Up @@ -15,7 +15,7 @@ $yarnUrl = "https://yarnpkg.com/latest.tar.gz"
$unzipNodeDir = "node-v*"
$unzipYarnDir = "yarn-v*"

-$NNI_DEPENDENCY_FOLDER = "C:\tmp\$env:USERNAME"
+$NNI_DEPENDENCY_FOLDER = [System.IO.Path]::GetTempPath()+$env:USERNAME

$WHICH_PYTHON = where.exe python
if($WHICH_PYTHON -eq $null){
Expand Down
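
The `install.ps1` hunk above replaces a hard-coded `C:\tmp` root with the system temporary directory plus the user name. The same idea can be sketched in TypeScript for Node.js. This is only an illustration: the helper name `dependencyFolder`, its parameters, and the environment-variable fallback chain are assumptions and are not part of NNI.

```typescript
import * as os from 'os';
import * as path from 'path';

// Hypothetical helper (not part of NNI): build a per-user folder under the
// system temp directory instead of assuming that C:\tmp exists.
function dependencyFolder(
    tmpRoot: string = os.tmpdir(),
    // Assumption: fall back from USERNAME (Windows) to USER (Unix-like systems).
    user: string = process.env.USERNAME || process.env.USER || 'unknown'
): string {
    // path.join inserts the platform separator, so it does not matter
    // whether tmpRoot ends with a separator or not.
    return path.join(tmpRoot, user);
}

console.log(dependencyFolder());
```

Using `path.join` rather than string concatenation avoids depending on whether the temp path already ends with a separator, which the PowerShell version relies on.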
16 changes: 8 additions & 8 deletions src/nni_manager/common/utils.ts
Expand Up @@ -43,11 +43,11 @@ function getExperimentRootDir(): string {
.getLogDir();
}

-function getLogDir(): string{
+function getLogDir(): string {
return path.join(getExperimentRootDir(), 'log');
}

-function getLogLevel(): string{
+function getLogLevel(): string {
return getExperimentStartupInfo()
.getLogLevel();
}
Expand Down Expand Up @@ -149,7 +149,7 @@ function parseArg(names: string[]): string {
return '';
}

-function encodeCmdLineArgs(args:any):any{
+function encodeCmdLineArgs(args: any): any {
if(process.platform === 'win32'){
return JSON.stringify(args);
}
Expand All @@ -158,7 +158,7 @@ function encodeCmdLineArgs(args:any):any{
}
}

-function getCmdPy():string{
+function getCmdPy(): string {
let cmd = 'python3';
if(process.platform === 'win32'){
cmd = 'python';
Expand Down Expand Up @@ -390,7 +390,7 @@ async function getVersion(): Promise<string> {
/**
* run command as ChildProcess
*/
-function getTunerProc(command: string, stdio: StdioOptions, newCwd: string, newEnv: any): ChildProcess{
+function getTunerProc(command: string, stdio: StdioOptions, newCwd: string, newEnv: any): ChildProcess {
let cmd: string = command;
let arg: string[] = [];
let newShell: boolean = true;
Expand All @@ -411,7 +411,7 @@ function getTunerProc(command: string, stdio: StdioOptions, newCwd: string, newE
/**
* judge whether the process is alive
*/
-async function isAlive(pid:any): Promise<boolean>{
+async function isAlive(pid:any): Promise<boolean> {
let deferred : Deferred<boolean> = new Deferred<boolean>();
let alive: boolean = false;
if(process.platform ==='win32'){
Expand Down Expand Up @@ -439,7 +439,7 @@ async function isAlive(pid:any): Promise<boolean>{
/**
* kill process
*/
-async function killPid(pid:any): Promise<void>{
+async function killPid(pid:any): Promise<void> {
let deferred : Deferred<void> = new Deferred<void>();
try {
if (process.platform === "win32") {
Expand All @@ -455,7 +455,7 @@ async function killPid(pid:any): Promise<void>{
return deferred.promise;
}

-function getNewLine(): string{
+function getNewLine(): string {
if (process.platform === "win32") {
return "\r\n";
}
Expand Down
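
Several helpers in this file (`getCmdPy`, `getNewLine`, `encodeCmdLineArgs`) branch on `process.platform === 'win32'`. A minimal sketch of that pattern follows, with the platform passed in as a parameter so both branches can be exercised on any machine. The name `platformDefaults` and the `PlatformDefaults` type are hypothetical, and the `'\n'` value for non-Windows systems is an assumption because that branch is collapsed in the diff.

```typescript
// Hypothetical grouping of the per-platform values seen in utils.ts.
type PlatformDefaults = { pythonCmd: string; newLine: string };

function platformDefaults(platform: string = process.platform): PlatformDefaults {
    if (platform === 'win32') {
        // The diff shows 'python' and "\r\n" being used on Windows.
        return { pythonCmd: 'python', newLine: '\r\n' };
    }
    // 'python3' is shown in the diff; '\n' is an assumption for other platforms.
    return { pythonCmd: 'python3', newLine: '\n' };
}

console.log(platformDefaults().pythonCmd);
```

Taking the platform as an argument with a default keeps call sites unchanged while making the Windows branch testable on Linux or macOS.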
15 changes: 9 additions & 6 deletions src/nni_manager/core/nnimanager.ts
Expand Up @@ -58,7 +58,8 @@ class NNIManager implements Manager {
private status: NNIManagerStatus;
private waitingTrials: string[];
private trialJobs: Map<string, TrialJobDetail>;

+private trialJobMetricListener: (metric: TrialJobMetric) => void;

constructor() {
this.currSubmittedTrialNum = 0;
this.trialConcurrencyChange = 0;
Expand All @@ -76,6 +77,11 @@ class NNIManager implements Manager {
status: 'INITIALIZED',
errors: []
};
+this.trialJobMetricListener = (metric: TrialJobMetric) => {
+this.onTrialJobMetrics(metric).catch((err: Error) => {
+this.criticalError(NNIError.FromError(err, 'Job metrics error: '));
+});
+};
}

public updateExperimentProfile(experimentProfile: ExperimentProfile, updateType: ProfileUpdateType): Promise<void> {
Expand Down Expand Up @@ -342,6 +348,7 @@ class NNIManager implements Manager {
if (this.dispatcher === undefined) {
throw new Error('Error: tuner has not been setup');
}
+this.trainingService.removeTrialJobMetricListener(this.trialJobMetricListener);
this.dispatcher.sendCommand(TERMINATE);
let tunerAlive: boolean = true;
// gracefully terminate tuner and assessor here, wait at most 30 seconds.
Expand Down Expand Up @@ -589,11 +596,7 @@ class NNIManager implements Manager {
if (this.dispatcher === undefined) {
throw new Error('Error: tuner or job maintainer have not been setup');
}
-this.trainingService.addTrialJobMetricListener((metric: TrialJobMetric) => {
-this.onTrialJobMetrics(metric).catch((err: Error) => {
-this.criticalError(NNIError.FromError(err, 'Job metrics error: '));
-});
-});
+this.trainingService.addTrialJobMetricListener(this.trialJobMetricListener);

this.dispatcher.onCommand((commandType: string, content: string) => {
this.onTunerCommand(commandType, content).catch((err: Error) => {
Expand Down
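
The `nnimanager.ts` change moves the metric callback from an inline arrow function into a field, so that the same function reference can later be passed to `removeTrialJobMetricListener`. The sketch below illustrates why the stored reference matters, using Node's `EventEmitter`. The classes `MetricSource` and `Manager` are hypothetical stand-ins, not NNI's actual training service or manager.

```typescript
import { EventEmitter } from 'events';

// Hypothetical stand-in for a training service that emits metrics.
class MetricSource {
    private readonly emitter: EventEmitter = new EventEmitter();
    public addListener(listener: (metric: number) => void): void {
        this.emitter.on('metric', listener);
    }
    public removeListener(listener: (metric: number) => void): void {
        // Removal only works when given the same function reference that was added.
        this.emitter.removeListener('metric', listener);
    }
    public emit(metric: number): void {
        this.emitter.emit('metric', metric);
    }
}

// Hypothetical stand-in for the manager: the listener is created once and kept.
class Manager {
    public received: number[] = [];
    private readonly listener: (metric: number) => void;
    constructor(private readonly source: MetricSource) {
        this.listener = (metric: number) => { this.received.push(metric); };
    }
    public start(): void { this.source.addListener(this.listener); }
    public stop(): void { this.source.removeListener(this.listener); }
}

const source = new MetricSource();
const manager = new Manager(source);
manager.start();
source.emit(1);   // delivered to the manager
manager.stop();
source.emit(2);   // not delivered: the listener was removed
console.log(manager.received);
```

Had `start()` registered a fresh inline arrow function, `stop()` would have no reference to hand back, and the listener would stay attached after the experiment stops.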