Keras Classification of Deep Sea Imagery

This repository contains code for training a Keras model to predict classes from labeled training data. The code uses transfer learning: pre-trained deep neural network models such as Inception and ResNet are fine-tuned on a specific data set. The repository is primarily set up to run on a Google Compute Engine VM, but simple adjustments can be made to run on AWS, other cloud services, or locally.

Building Google Cloud VM Environment

The steps to set up a Google Cloud environment are the following:

  • Create a new Google Cloud Project
  • Create a cloud storage bucket
  • Split data appropriately
  • Upload data to cloud storage
  • Create a Compute VM
  • Update drivers and Python version
  • Update Environment Variables and Modules

Create a Google Cloud Project

Go to https://cloud.google.com/
In the upper right, sign in to your Google account
In the upper right, click Console
Create a new project by selecting the drop-down menu in the upper left
The pop-up window has a New Project button in the upper right
Select your new project from the drop-down menu to make it your active project
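
If you prefer the command line, the same step can also be done with the Cloud SDK (installed in a later step); the project ID and name below are placeholders:

gcloud projects create my-ml-classify-project --name="ML Classify"
gcloud config set project my-ml-classify-project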

Create Cloud Storage Bucket

In the search bar, search for "bucket" and select "Create bucket"
Assign your bucket a unique name
Select the Regional storage class
Set object-level and bucket-level permissions
Create the bucket
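
Alternatively, the bucket can be created with gsutil once the Cloud SDK is installed; the bucket name and location here are placeholders:

gsutil mb -c regional -l us-west1 gs://google-bucket-name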

Split Data

Assuming your data is separated by class into folders, but not yet split into train/test/val sets:
Run pip install split-folders
Open a terminal and run Python interactively
$ python3
>>> import split_folders
>>> split_folders.ratio('input_folder', output='output_folder', seed=1337, ratio=(.8, .1, .1))
Read about split-folders

Uploading Data to Buckets

There are two options here, one GUI-based and one command-line
For the command line, you need the Google Cloud SDK
After installation, you can copy items to and from buckets using the gsutil cp command
gsutil -m cp -r training/data/folder gs://google-bucket-name
The GUI option is a drag-and-drop interface shown when you select your bucket from the console home page
After your data exists in a bucket, you must adjust permissions as needed
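
The training scripts below read tar archives (e.g. train.tar.gz, val.tar.gz), so each split can be packed and uploaded in one pass; the folder and bucket names below are placeholders matching the split-folders output layout:

cd output_folder
tar -czf train.tar.gz train
tar -czf val.tar.gz val
gsutil -m cp train.tar.gz val.tar.gz gs://google-bucket-name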

Creating a Compute VM

Search for Compute VM in the search bar and select Add VM instance
Choose the name, region, and zone as you like
Select enough CPU power to avoid a bottleneck, likely 4-8 vCPUs
Click the "CPU platform and GPU" drop-down
You may not have any GPUs available if you do not have a GPU quota
Instructions to increase your GPU quota are at the end of this section
Change the boot disk to select your preferred OS environment
We used 'Deep Learning Image: Base m31 (with CUDA 10.0)', as others had trouble with CUDA
The defaults for Identity and API access were used
At the bottom there is a gcloud command-line option showing how to create the above VM from the command line (a sketch is shown below)
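
As a rough sketch, the equivalent gcloud command looks something like the following; the instance name, zone, machine type, GPU type, and disk size are assumptions to adjust for your quota and region, and the image family/project should correspond to the CUDA 10.0 base deep learning image:

gcloud compute instances create ml-classify-vm \
  --zone=us-west1-b \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-k80,count=1 \
  --maintenance-policy=TERMINATE \
  --image-family=common-cu100 \
  --image-project=deeplearning-platform-release \
  --boot-disk-size=200GB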

To increase your GPU quota, click the drop-down menu in the top left corner
Hover over 'IAM & admin' and select Quotas from the new list
From the Metric drop-down box, click None to deselect all, then search for GPU
Select GPUs (all regions), check the box next to it, and click EDIT QUOTAS at the top
Fill out the short form with your information and wait 1-3 days for the quota increase

Update drivers and Python version

To connect to the VM, go to the console home page, select Compute Engine, select your instance, and click Start
Connect to the instance through SSH, either in a terminal or via the Google Cloud Shell
The Base m31 image chosen above prompts you to install the NVIDIA drivers and CUDA on startup
If this is the first time you start the VM, accept and install the drivers
The Python 3 version that comes pre-installed with this image is 3.5.3
To install 3.7.3, use the following commands
sudo apt update

sudo apt install build-essential tk-dev libbz2-dev zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev wget

sudo curl -O https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tar.xz
sudo tar -xf Python-3.7.3.tar.xz
cd Python-3.7.3
sudo ./configure --enable-optimizations
sudo make -j 8   # assumes 8 cores; check your count with nproc
sudo make altinstall
sudo update-alternatives --install /usr/bin/python python /usr/local/bin/python3.7 3
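
To confirm the drivers and the new interpreter are in place, these checks can be run (exact output will vary by machine):

nvidia-smi              # should list the attached GPU once the drivers are installed
python3.7 --version     # should print Python 3.7.3
python --version        # should also report 3.7.3 via the update-alternatives link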

Update Environment Variables and Modules

cd ~
git clone https://github.com/AtlasHale/ml_classify
cd ml_classify
Upload your Google service account credentials as credentials.json
mv ~/credentials.json ~/ml_classify/
See https://cloud.google.com/storage/docs/reference/libraries for how to generate an API key

Export the project home as $PWD
Export the tar bucket as the name of the bucket containing your tar files
Export the wandb run group, user, and API key
Export the Google service credentials location ($PWD/credentials.json)
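
The exact variable names depend on how src/train.py reads its configuration, so PROJECT_HOME and TAR_BUCKET_NAME below are illustrative assumptions; GOOGLE_APPLICATION_CREDENTIALS and the WANDB_* names are the standard variables used by the Google client libraries and Weights & Biases:

export PROJECT_HOME=$PWD                                      # assumed name: project home directory
export TAR_BUCKET_NAME=google-bucket-name                     # assumed name: bucket holding the tar archives
export WANDB_RUN_GROUP=your-run-group                         # wandb run group
export WANDB_ENTITY=your-wandb-username                       # wandb user
export WANDB_API_KEY=your-wandb-api-key                       # wandb API key
export GOOGLE_APPLICATION_CREDENTIALS=$PWD/credentials.json   # service account credentials file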

Running Inference

To run general inference with basic metrics:

python3 src/train.py --horizontal_flip True --augment_range 0.2 \
--train_tar train.tar.gz --val_tar val.tar.gz --lr 0.001 --base_model inceptionv3 \
--project inception_training --batch_size 4 --epoch 50

To run training while incrementally increasing the size of the training data set per class:

python3 src/learning_curve.py --horizontal_flip True --augment_range 0.2 \
--train_tar 100_train.tar.gz --val_tar val.tar.gz --lr 0.001 --base_model inceptionv3 \
--project inception_learning_curve --batch_size 4 --epoch 10
