This repository aims to provide the world's first open-source toolbox for predicting the training time (TT) of your machine learning model. It includes training time data generation and TCC models. Our current predictions focus on models built with Tensorflow.
If you are training a model on a cloud instance, you can estimate the training cost from the predicted training time and the instance's unit cost; this is why we named the toolbox Training Cost Calculator (TCC). You can apply TCC to multiple servers to find out which one is the best fit for your training task.
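For example, once a training time is predicted for each candidate server, converting it into a dollar cost is a single multiplication. A minimal sketch (the hourly rates and predicted times below are made-up placeholders, not real quotes):
# Hypothetical example: pick the cheapest server for a predicted training time.
# The rates and predicted hours below are placeholders.
predicted_hours = {'server_a': 3.2, 'server_b': 1.1, 'server_c': 0.7}
hourly_rate_usd = {'server_a': 0.526, 'server_b': 3.06, 'server_c': 8.00}
costs = {name: predicted_hours[name] * hourly_rate_usd[name] for name in predicted_hours}
best = min(costs, key=costs.get)
print(f"Best fit: {best} at ${costs[best]:.2f}")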
As of June 1st, 2022, TCC's main maintainers are Yuqi Li and Xin Zhang. We welcome more contributors and users to join our Discord Community.
Developed | Developing | Ideas |
---|---|---|
Keras DNN TT Prediction | TT data generation pipeline | Support PyTorch models |
Keras CNN TT Prediction | Keras RNN TT Prediction | Add Huggingface models to TT database |
Public TT Database | A Colab demo for the entire workflow | Connect TT data pipeline to clouds |
 | | A Hackathon for TT prediction |
 | | A scheduler to validate old TT measurements |
See the Research Docs for the full progress, motivation, and related research.
Install
Clone the repo and install requirements.txt in a Python>=3.7.0 environment with Tensorflow>=2.3 and Keras>=2.8.0.
git clone https://github.com/aipaca-mlops/ML-training-cost-predictor.git # clone
cd ML-training-cost-predictor
pip install -r requirements.txt # install
GPU Features
Model training time depends on three factors: environment, model structure, and data. An Environment Check should therefore be the first step before generating any data points.
If you run experiments across multiple environments, the environment configuration should be included as features for training time prediction.
See here for all feature names.
from env_detect import gpu_features
# need to run with GPU
# only show default features
gpufeature = gpu_features()
>>>print(gpufeature.get_features())
{'timestamp': '2022/05/27 17:39:01.851', 'driver_version': '460.73.01', 'count': '1', 'name': 'Tesla T4', 'pcie.link.width.max': '16', 'vbios_version': '90.04.96.00.01', 'memory.total [MiB]': '15109 MiB', 'temperature.gpu': '46'}
# with additional features
gpufeature = gpu_features(
features=['power.management', 'power.limit'], with_dafault_features=True
)
>>>print(gpufeature.get_features())
{'timestamp': '2022/05/27 17:39:01.896', 'driver_version': '460.73.01', 'count': '1', 'name': 'Tesla T4', 'pcie.link.width.max': '16', 'vbios_version': '90.04.96.00.01', 'memory.total [MiB]': '15109 MiB', 'temperature.gpu': '46', 'power.management': 'Enabled', 'power.limit [W]': '70.00 W'}
# only use wanted features
gpufeature = gpu_features(
features=['power.management', 'power.limit'], with_dafault_features=False
)
>>>print(gpufeature.get_features())
{'power.management': 'Enabled', 'power.limit [W]': '70.00 W'}
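The default feature names mirror nvidia-smi query fields, so you can cross-check any reported value against the tool directly. A minimal sketch (assuming nvidia-smi is on your PATH, which gpu_features presumably wraps):
import subprocess
# query the same fields nvidia-smi exposes; output is plain CSV text
out = subprocess.run(
    ['nvidia-smi', '--query-gpu=driver_version,name,memory.total,temperature.gpu',
     '--format=csv'],
    capture_output=True, text=True,
)
print(out.stdout)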
Generate Dense Model
Generate model configs for a feed-forward network.
import random
import matplotlib.pyplot as plt
from model_level_utils import gen_nn, model_train_data
# generate model configurations as data points
data_points = 1000
gnn = gen_nn(
    hidden_layers_num_lower=1, # lower bound for the number of hidden layers
    hidden_layers_num_upper=51, # upper bound for the number of hidden layers
    hidden_layer_size_lower=1, # lower bound for the size (units) of each layer
    hidden_layer_size_upper=1001, # upper bound for the size (units) of each layer
    activation='random',
    optimizer='random',
    loss='random'
)
model_configs = gnn.generate_model_configs(num_model_data=data_points)
Check out what is inside model_configs.
>>>print(type(model_configs))
<class 'list'>
>>>print(model_configs[0])
{'layer_sizes': [966, 624, 193],
'activations': ['relu', 'selu', 'exponential'],
'optimizer': 'adagrad',
'loss': 'poisson'}
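Each config is plain data, so it can be materialized as a Keras model directly. A minimal sketch with a hypothetical helper (the toolbox does this internally in model_train_data; input_dim and the output layer here are arbitrary choices):
from tensorflow import keras

def build_from_config(cfg, input_dim=100, out_dim=1):
    # hypothetical helper: turn one generated config into a compiled Keras model
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for size, act in zip(cfg['layer_sizes'], cfg['activations']):
        model.add(keras.layers.Dense(size, activation=act))
    model.add(keras.layers.Dense(out_dim))
    model.compile(optimizer=cfg['optimizer'], loss=cfg['loss'])
    return model

model = build_from_config(model_configs[0])
model.summary()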
Next we can use generated model configurations to get training times. This might take a while depending on your GPU.
# train generated model configurations to get training time
mtd = model_train_data(
model_configs,
input_dims=list(range(1, 1001)),
batch_sizes=[2**i for i in range(1, 9)],
epochs=5,
truncate_from=1,
trials=2,
batch_strategy='random',
)
model_data = mtd.get_train_data()
Check out what is inside model_data.
>>>print(type(model_data))
<class 'list'>
>>>print(model_data[0])
{'layer_sizes': [966, 624, 193],
'activations': ['relu', 'selu', 'exponential'],
'optimizer': 'adagrad',
'loss': 'poisson',
'batch_size_8': {'batch_time': 1.8298625946044922, # median training time for one batch
'epoch_time': 2.166271209716797, # median training time for one epoch
'setup_time': 449.7561454772949, # Tensorflow graph build time
'input_dim': 196}
}
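From these fields, a back-of-envelope total training time is roughly setup_time + epochs * epoch_time. A sketch (times appear to be in milliseconds, and the epoch count below is a made-up target; the formula ignores data loading and validation overhead):
record = model_data[0]['batch_size_8']
epochs = 100 # hypothetical target epoch count
# rough estimate: one-off graph setup plus per-epoch cost
total_ms = record['setup_time'] + epochs * record['epoch_time']
print(f"~{total_ms / 1000:.2f} s for {epochs} epochs")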
Now we can convert the generated data into a pandas DataFrame.
# convert raw data as dataframe and scaler
df, scaler = mtd.convert_config_data(
model_data, layer_num_upper=50, layer_na_fill=0, act_na_fill=0, min_max_scaler=True
)
>>>display(df.head())
| | layer_1_size | layer_2_size | layer_3_size | layer_4_size | layer_5_size | layer_6_size | layer_7_size | layer_8_size | layer_9_size | layer_10_size | layer_11_size | layer_12_size | layer_13_size | layer_14_size | layer_15_size | layer_16_size | layer_17_size | layer_18_size | layer_19_size | layer_20_size | layer_21_size | layer_22_size | layer_23_size | layer_24_size | layer_25_size | layer_26_size | layer_27_size | layer_28_size | layer_29_size | layer_30_size | layer_31_size | layer_32_size | layer_33_size | layer_34_size | layer_35_size | layer_36_size | layer_37_size | layer_38_size | layer_39_size | layer_40_size | layer_41_size | layer_42_size | layer_43_size | layer_44_size | layer_45_size | layer_46_size | layer_47_size | layer_48_size | layer_49_size | layer_50_size | layer_1_activation | layer_2_activation | layer_3_activation | layer_4_activation | layer_5_activation | layer_6_activation | layer_7_activation | layer_8_activation | layer_9_activation | layer_10_activation | layer_11_activation | layer_12_activation | layer_13_activation | layer_14_activation | layer_15_activation | layer_16_activation | layer_17_activation | layer_18_activation | layer_19_activation | layer_20_activation | layer_21_activation | layer_22_activation | layer_23_activation | layer_24_activation | layer_25_activation | layer_26_activation | layer_27_activation | layer_28_activation | layer_29_activation | layer_30_activation | layer_31_activation | layer_32_activation | layer_33_activation | layer_34_activation | layer_35_activation | layer_36_activation | layer_37_activation | layer_38_activation | layer_39_activation | layer_40_activation | layer_41_activation | layer_42_activation | layer_43_activation | layer_44_activation | layer_45_activation | layer_46_activation | layer_47_activation | layer_48_activation | layer_49_activation | layer_50_activation | batch_size | input_dim | optimizer_adadelta | optimizer_adagrad | optimizer_adam | optimizer_adamax | optimizer_ftrl | optimizer_nadam | optimizer_rmsprop | optimizer_sgd | loss_categorical_crossentropy | loss_mae | loss_mape | loss_mse | loss_msle | loss_poisson | batch_time | epoch_time | setup_time |
|---:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|----------------:|---------------------:|---------------------:|---------------------:|---------------------:|---------------------:|---------------------:|---------------------:|---------------------:|---------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|----------------------:|-------------:|------------:|---------------------:|--------------------:|-----------------:|-------------------:|-----------------:|------------------:|--------------------:|----------------:|--------------------------------:|-----------:|------------:|-----------:|------------:|---------------:|-------------:|-------------:|-------------:|
| 0 | 0.54509 | 0.202608 | 0.823824 | 0.468468 | 0.348 | 0.591 | 0.102 | 0.0460922 | 0.770771 | 0.721163 | 0.608 | 0.304304 | 0.956871 | 0.172 | 0.403 | 0.306613 | 0.640281 | 0.471 | 0.108108 | 0.956871 | 0.94995 | 0.413413 | 0.384384 | 0.126 | 0.782347 | 0.536537 | 0.744745 | 0.531595 | 0.544 | 0.947896 | 0.50303 | 0.868 | 0.155936 | 0.229229 | 0.932 | 0.951904 | 0.276104 | 0.963855 | 0.611836 | 0.291291 | 0.284422 | 0.00902708 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.875 | 0.333333 | 0.111111 | 0.333333 | 0.222222 | 0.777778 | 0.444444 | 0.888889 | 0.444444 | 0.888889 | 0.444444 | 0.111111 | 0.777778 | 0.222222 | 0.333333 | 0.444444 | 0.666667 | 0.777778 | 1 | 0.222222 | 0.888889 | 0.666667 | 0.222222 | 0.555556 | 0.444444 | 0.555556 | 1 | 0.444444 | 0.888889 | 1 | 0.111111 | 0.222222 | 0.111111 | 0.888889 | 0.222222 | 0.777778 | 0.444444 | 0.444444 | 1 | 0.555556 | 0.222222 | 0.777778 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.496063 | 0.534068 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 12.0096 | 12.3415 | 5442.04 |
| 1 | 0.966934 | 0.625878 | 0.193193 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.777778 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.023622 | 0.194389 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1.82986 | 2.16627 | 449.756 |
| 2 | 0.433868 | 0.212638 | 0.744745 | 0.0970971 | 0.473 | 0.67 | 0.109 | 0.845691 | 0.49049 | 0.375125 | 0.41 | 0.854855 | 0.88666 | 0.336 | 0.211 | 0.5501 | 0.354709 | 0.484 | 0.725726 | 0.766299 | 0.577578 | 0.119119 | 0.376376 | 0.288 | 0.822467 | 0.814815 | 0.570571 | 0.438315 | 0.241 | 0.477956 | 0.780808 | 0.943 | 0.151911 | 0.251251 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.888889 | 0.555556 | 0.777778 | 0.111111 | 0.777778 | 0.444444 | 0.555556 | 0.333333 | 0.888889 | 0.333333 | 0.777778 | 0.555556 | 0.333333 | 0.777778 | 0.222222 | 0.444444 | 0.444444 | 0.777778 | 0.888889 | 0.555556 | 0.333333 | 0.888889 | 0.555556 | 0.555556 | 0.777778 | 0.222222 | 0.444444 | 0.555556 | 0.888889 | 1 | 0.555556 | 0.222222 | 0.444444 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.244094 | 0.837675 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 18.2607 | 18.6019 | 4921.17 |
| 3 | 0.562124 | 0.487462 | 0.626627 | 0.275275 | 0.631 | 0.817 | 0.875 | 0.383768 | 0.225225 | 0.842528 | 0.509 | 0.477477 | 0.131394 | 0.303 | 0.066 | 0.641283 | 0.845691 | 0.633 | 0.295295 | 0.415246 | 0.8999 | 0.655656 | 0.351351 | 0.566 | 0.489468 | 0.492492 | 0.947948 | 0.0802407 | 0.253 | 0.140281 | 0.458586 | 0.598 | 0.587525 | 0.274274 | 0.085 | 0.96493 | 0.172691 | 0.348394 | 0.805416 | 0.745746 | 0.954774 | 0.907723 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.555556 | 0.111111 | 0.555556 | 0.333333 | 0.555556 | 0.111111 | 1 | 0.222222 | 0.222222 | 0.888889 | 0.444444 | 1 | 0.111111 | 0.444444 | 0.222222 | 0.888889 | 0.777778 | 0.777778 | 0.333333 | 0.888889 | 0.222222 | 0.777778 | 1 | 0.555556 | 0.888889 | 0.666667 | 0.555556 | 0.666667 | 0.777778 | 0.888889 | 0.333333 | 0.888889 | 1 | 0.333333 | 0.222222 | 0.888889 | 0.111111 | 0.777778 | 0.888889 | 1 | 0.111111 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0551181 | 0.311623 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 8.15439 | 8.47578 | 1590.8 |
| 4 | 0.893788 | 0.11334 | 0.148148 | 0.947948 | 0.858 | 0.797 | 0.268 | 0.833667 | 0.575576 | 0.721163 | 0.607 | 0.001001 | 0.834504 | 0.133 | 0.064 | 0.169339 | 0.655311 | 0.206 | 0.913914 | 0.0832497 | 0.36036 | 0.51952 | 0.331331 | 0.279 | 0.2668 | 0.4004 | 0.301301 | 0.728185 | 0.4 | 0.891784 | 0.240404 | 0.474 | 0.349095 | 0.91992 | 0.898 | 0.163327 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.333333 | 0.333333 | 1 | 0.111111 | 0.111111 | 1 | 0.222222 | 1 | 1 | 0.777778 | 0.888889 | 0.666667 | 0.888889 | 0.222222 | 0.777778 | 0.666667 | 0.333333 | 0.111111 | 0.444444 | 0.555556 | 0.444444 | 0.222222 | 0.777778 | 0.888889 | 0.777778 | 0.111111 | 1 | 1 | 0.777778 | 0.777778 | 0.444444 | 0.777778 | 0.555556 | 0.222222 | 0.444444 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0551181 | 0.0751503 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 6.92463 | 7.37071 | 1709.9 |
Generate CNN Model
Generate model configs for a convolutional network.
from model_trainingtime_prediction.model_level_utils_cnn import gen_cnn2d, cnn2d_model_train_data
gen = gen_cnn2d(
input_shape_lower=20,
input_shape_upper=101,
conv_layer_num_lower=1,
conv_layer_num_upper=51,
filter_lower=1,
filter_upper=101,
dense_layer_num_lower=1,
dense_layer_num_upper=6,
dense_size_lower=1,
dense_size_upper=1001,
max_pooling_prob=.5,
input_channels=None,
paddings=None,
activations=None,
optimizers=None,
losses=None
)
model_configs = gen.generate_model_configs(num_model_data=data_points, progress=True)
Check out what is inside model_configs.
>>>print(type(model_configs))
<class 'list'>
>>>print(model_configs[0])
[[{'filters': 83, 'padding': 'valid', 'activation': 'softmax', 'kernel_size': (53, 53), 'strides': (1, 1)},
{'filters': 51, 'padding': 'same', 'activation': 'softsign', 'kernel_size': (2, 2), 'strides': (1, 1)},
{'padding': 'same', 'pool_size': (2, 2), 'strides': (1, 1)},
{'filters': 17, 'padding': 'same', 'activation': 'selu', 'kernel_size': (3, 3), 'strides': (1, 1)},
{'padding': 'same', 'pool_size': (3, 3), 'strides': (1, 1)},
{'filters': 32, 'padding': 'valid', 'activation': 'tanh', 'kernel_size': (2, 2), 'strides': (2, 2)},
{'padding': 'valid', 'pool_size': (1, 1), 'strides': (1, 1)},
{},
{'units': 702, 'activation': 'sigmoid'},
{'units': 471, 'activation': 'elu'},
{'units': 906, 'activation': 'elu'},
{'Compile': {'optimizer': 'rmsprop', 'loss': 'categorical_crossentropy'}, 'Fit': {}}],
['Conv2D', 'Conv2D', 'MaxPooling2D', 'Conv2D', 'MaxPooling2D', 'Conv2D', 'MaxPooling2D', 'Flatten', 'Dense', 'Dense', 'Dense'],
(56, 56, 3)]
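As before, each config is plain data: a list of layer kwargs (ending with compile/fit arguments), the matching Keras layer class names, and an input shape. A minimal sketch of a hypothetical helper that rebuilds the model (not the toolbox's internal builder):
from tensorflow import keras

def build_cnn_from_config(cfg):
    # hypothetical helper: rebuild a Keras model from one generated CNN config
    layer_kwargs, layer_names, input_shape = cfg
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    # the last dict carries compile/fit arguments, not a layer
    for kwargs, name in zip(layer_kwargs, layer_names):
        model.add(getattr(keras.layers, name)(**kwargs))
    model.compile(**layer_kwargs[-1]['Compile'])
    return model

model = build_cnn_from_config(model_configs[0])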
Next we can use generated model configurations to get training times. This might take a while depending on your GPU.
# train generated model configurations to get training time
mtd = cnn2d_model_train_data(
model_configs, batch_sizes=None, epochs=None, truncate_from=None, trials=None
)
model_data = mtd.get_train_data(progress=True)
Check out what is inside model_data.
>>>print(type(model_data))
<class 'list'>
>>>print(model_data[0])
[[{'filters': 83, 'padding': 'valid', 'activation': 'softmax', 'kernel_size': (53, 53), 'strides': (1, 1)},
{'filters': 51, 'padding': 'same', 'activation': 'softsign', 'kernel_size': (2, 2), 'strides': (1, 1)},
{'padding': 'same', 'pool_size': (2, 2), 'strides': (1, 1)},
{'filters': 17, 'padding': 'same', 'activation': 'selu', 'kernel_size': (3, 3), 'strides': (1, 1)},
{'padding': 'same', 'pool_size': (3, 3), 'strides': (1, 1)},
{'filters': 32, 'padding': 'valid', 'activation': 'tanh', 'kernel_size': (2, 2), 'strides': (2, 2)},
{'padding': 'valid', 'pool_size': (1, 1), 'strides': (1, 1)},
{},
{'units': 702, 'activation': 'sigmoid'},
{'units': 471, 'activation': 'elu'},
{'units': 906, 'activation': 'elu'},
{'Compile': {'optimizer': 'rmsprop', 'loss': 'categorical_crossentropy'}, 'Fit': {}}],
['Conv2D', 'Conv2D', 'MaxPooling2D', 'Conv2D', 'MaxPooling2D', 'Conv2D', 'MaxPooling2D', 'Flatten', 'Dense', 'Dense', 'Dense'],
(56, 56, 3),
{'batch_size': 4,
'batch_time': 3.8396120071411133,
'epoch_time': 4.180788993835449,
'setup_time': 930.9759140014648,
'input_dim': (56, 56, 3)}]
Now we can convert generated data into pandas dataframes.
# max_layer_num should cover the longest possible model:
# conv_layer_num_upper * 2 (each conv layer may be followed by a max-pooling layer) + dense_layer_num_upper
model_data_dfs, time_df, scaler = mtd.convert_config_data(
model_data, max_layer_num=15, num_fill_na=0, name_fill_na=None, min_max_scaler=True
)
>>>display(model_data_dfs[0])
| | layer_size | kernel_size | strides | batch_size | input_shape | channels | layer_type_Conv2D | layer_type_Dense | layer_type_MaxPooling2D | padding_same | padding_valid | activation_elu | activation_exponential | activation_relu | activation_selu | activation_sigmoid | activation_softmax | activation_softplus | activation_softsign | activation_tanh | optimizer_adadelta | optimizer_adagrad | optimizer_adam | optimizer_adamax | optimizer_ftrl | optimizer_nadam | optimizer_rmsprop | optimizer_sgd | loss_categorical_crossentropy | loss_mae | loss_mape | loss_mse | loss_msle | loss_poisson |
|---:|-------------:|--------------:|----------:|-------------:|--------------:|-----------:|--------------------:|-------------------:|--------------------------:|---------------:|----------------:|-----------------:|-------------------------:|------------------:|------------------:|---------------------:|---------------------:|----------------------:|----------------------:|------------------:|---------------------:|--------------------:|-----------------:|-------------------:|-----------------:|------------------:|--------------------:|----------------:|--------------------------------:|-----------:|------------:|-----------:|------------:|---------------:|
| 0 | 0.083 | 0.557895 | 0.5 | 0.015625 | 0.565657 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0.051 | 0.0210526 | 0.5 | 0.015625 | 0.565657 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0.0210526 | 0.5 | 0.015625 | 0.565657 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0.017 | 0.0315789 | 0.5 | 0.015625 | 0.565657 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0.0315789 | 0.5 | 0.015625 | 0.565657 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0.032 | 0.0210526 | 1 | 0.015625 | 0.565657 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0.0105263 | 0.5 | 0.015625 | 0.565657 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0.702 | 0 | 0 | 0.015625 | 0.565657 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0.471 | 0 | 0 | 0.015625 | 0.565657 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0.906 | 0 | 0 | 0.015625 | 0.565657 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
>>>display(time_df.head())
| | batch_time | epoch_time | setup_time |
|---:|-------------:|-------------:|-------------:|
| 0 | 3.83961 | 4.18079 | 930.976 |
| 1 | 13.6241 | 13.9616 | 405.777 |
| 2 | 152.279 | 152.747 | 10592.4 |
| 3 | 364.004 | 364.5 | 26533.8 |
| 4 | 30.2401 | 30.6295 | 1840.11 |
Generate Classic CNN Model Data
People normally use pre-defined classic CNN structures instead of creating their own. We can also generate data for these specific models rather than for randomly generated CNN models, which helps increase prediction accuracy.
For valid pre-defined model names check here.
For each pre-trained model, the input shape cannot be changed if we want to fine-tune the pre-trained weights.
from model_level_utils_cnn import ClassicModelTrainData
cmtd = ClassicModelTrainData(
batch_sizes=[2, 4],
optimizers=["sgd", "rmsprop", "adam"],
losses=["mse", "msle", "poisson", "categorical_crossentropy"])
model_data = cmtd.get_train_data('VGG16', input_shape=None, output_size=1000, progress=True)
batch_sizes = [i['batch_size'] for i in model_data]
optimizers = [i['optimizer'] for i in model_data]
losses = [i['loss'] for i in model_data]
times = [i['batch_time'] for i in model_data]
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# set plot style: grey grid in the background:
sns.set(style="darkgrid")
data = pd.DataFrame(list(zip(batch_sizes, optimizers, losses, times)), columns = ['batch_size', 'optimizer', 'loss', 'batch_time'])
sns.set(font_scale = 2)
ax = sns.catplot(x="optimizer", y="batch_time",
hue="loss", col="batch_size",
data=data, kind="bar",
height=10, aspect=1)
plt.show()
It is interesting to observe that the loss function plays a less important role in training time, whereas batch size and optimizer have a much more apparent impact, as we expected.
Generate RNN Model
Generate training time data for a recurrent network.
put python steps here
Predict Dense Model Training Time Using Model Level Structure
Make a prediction of a feed-forward network's training time.
Build a regression model with the DataFrame generated in Dense Model Data Generation.
import numpy as np
# use data to train a ML model
test_ratio = 0.2
df_index = df.index.tolist()
np.random.shuffle(df_index)
middle_index = int(df.shape[0] * test_ratio)
test_idx = df_index[:middle_index]
train_idx = df_index[middle_index:]
df_train = df.iloc[train_idx]
df_test = df.iloc[test_idx]
# we need to train 2 models: one to predict batch runtime, one to predict setup time
# combining both gives the true training time of a model
feature_cols = df.columns.tolist()[:-3] #last 3 columns are target columns
target_col = 'batch_time'
setup_col = 'setup_time'
x_train = df_train[feature_cols].to_numpy()
y_batch_train = np.array(df_train[target_col].tolist())
y_setup_train = np.array(df_train[setup_col].tolist())
x_test = df_test[feature_cols].to_numpy()
y_batch_test = np.array(df_test[target_col].tolist())
y_setup_test = np.array(df_test[setup_col].tolist())
# build a regression dense model for batch time prediction
from keras.models import Sequential
from keras.layers import Dense
batch_model = Sequential()
batch_model.add(
Dense(200, input_dim=x_train.shape[1], kernel_initializer='normal', activation='relu')
)
batch_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
batch_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
batch_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
batch_model.add(Dense(1, kernel_initializer='normal'))
# Compile model
batch_model.compile(loss='mean_squared_error', optimizer='adam')
history_batch = batch_model.fit(
x_train,
y_batch_train,
batch_size=16,
epochs=50,
validation_data=(x_test, y_batch_test),
verbose=True
)
Plot training history.
# summarize history for loss
plt.plot(history_batch.history['loss'])
plt.plot(history_batch.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
Plot prediction vs test.
# plot predictions vs true for batch model
batch_y_pred = batch_model.predict(x_test)
batch_y_pred = batch_y_pred.reshape(batch_y_pred.shape[0], )
plt.scatter(batch_y_pred, y_batch_test)
plt.show()
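Beyond the scatter plot, a quick quantitative check is useful (a sketch using standard sklearn regression metrics):
from sklearn.metrics import mean_absolute_error, r2_score
# compare predictions against the held-out test targets
print('MAE:', mean_absolute_error(y_batch_test, batch_y_pred))
print('R^2:', r2_score(y_batch_test, batch_y_pred))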
Now we train a regression model for setup time.
# build a dense model for setup time prediction
setup_model = Sequential()
setup_model.add(
Dense(200, input_dim=x_train.shape[1], kernel_initializer='normal', activation='relu')
)
setup_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
setup_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
setup_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
setup_model.add(Dense(1, kernel_initializer='normal'))
# Compile model
setup_model.compile(loss='mean_squared_error', optimizer='adam')
history_setup = setup_model.fit(
x_train,
y_setup_train,
batch_size=16,
epochs=45,
validation_data=(x_test, y_setup_test),
verbose=True
)
Plot training history.
# summarize history for loss
plt.plot(history_setup.history['loss'])
plt.plot(history_setup.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
Plot prediction vs test.
# plot predictions vs true for setup time model
setup_y_pred = setup_model.predict(x_test)
setup_y_pred = setup_y_pred.reshape(setup_y_pred.shape[0], )
plt.scatter(setup_y_pred, y_setup_test)
plt.show()
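As noted above, combining the two predictions yields the full training time estimate. A minimal sketch, assuming times are in milliseconds (the batch and epoch counts below are made up):
# total time ~ setup + batches_per_epoch * epochs * batch_time
batches_per_epoch = 50 # e.g. 800 samples at batch size 16
epochs = 10
total_pred_ms = setup_y_pred + batches_per_epoch * epochs * batch_y_pred
print(total_pred_ms[:5] / 1000, 'seconds')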
Predict Dense Model Training Time Using Unit Counts
This time we do not convert the model structure into per-layer features; instead, we simply use the sum of units across all dense layers as the model feature. This turns out to work well and gives much more flexibility.
from model_level_utils import convert_dense_data
cdd = convert_dense_data()
# model_data from data generation step
dense_data, times_data, scaler = cdd.convert_model_config(model_data, data_type='Units', min_max_scaler=True)
Take a look at dense_data and times_data.
>>>print(dense_data)
[[0.20932152 0.05511811 0. ... 0. 0. 0. ]
[0.90319687 0.11811024 0. ... 0. 1. 0. ]
[0.68745291 0.02362205 0. ... 0. 0. 0. ]
...
[0.54267877 0.05511811 0. ... 0. 0. 1. ]
[0.09124179 0. 0. ... 0. 0. 0. ]
[0.46047863 1. 0. ... 1. 0. 0. ]]
>>>print(times_data)
[ 4.50062752 15.86818695 9.13286209
...
4.87327576 2.39396095 18.14508438]
Train a regression model with dense_data and times_data.
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization
x_train, x_test, y_train, y_test = train_test_split(dense_data, times_data, test_size=0.2, random_state=42)
batch_model = Sequential()
batch_model.add(Dense(2000, input_dim=x_train.shape[1], kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(1, kernel_initializer='normal'))
# Compile model
batch_model.compile(loss='mean_squared_error', optimizer='adam')
history_batch = batch_model.fit(x_train, y_train, batch_size=16, epochs=20, validation_data=(x_test, y_test), verbose=True)
# summarize history for loss
plt.plot(history_batch.history['loss'])
plt.plot(history_batch.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
batch_y_pred = batch_model.predict(x_test)
batch_y_pred = batch_y_pred.reshape(batch_y_pred.shape[0], )
plt.scatter(batch_y_pred, y_test)
plt.show()
Predict CNN Model Training Time Using Model Level Structure
Make a prediction of a convolutional network's training time.
Build a regression model with the DataFrames generated in CNN Model Data Generation.
We first flatten the per-model DataFrames into training data.
import numpy as np
# every per-model DataFrame is padded to the same shape, so flatten each into one row
x = np.array([
    data_df.to_numpy().reshape(
        model_data_dfs[0].shape[0] * model_data_dfs[0].shape[1],
    ) for data_df in model_data_dfs
])
y = np.array(time_df.batch_time.tolist())
Build and train regression model.
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
batch_model = Sequential()
batch_model.add(
Dense(2000, input_dim=x_train.shape[1], kernel_initializer='normal', activation='relu')
)
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(2000, kernel_initializer='normal', activation='relu'))
batch_model.add(BatchNormalization())
batch_model.add(Dense(1, kernel_initializer='normal'))
# Compile model
batch_model.compile(loss='mean_squared_error', optimizer='adam')
history_batch = batch_model.fit(
x_train, y_train, batch_size=16, epochs=20, validation_data=(x_test, y_test), verbose=True
)
Predict CNN Model Training Time Using FLOPs
We first get FLOPs features.
# Convert raw data into data points
# model_data comes from the CNN data generation step;
# ccd is the CNN counterpart of convert_dense_data (see model_level_utils_cnn)
scaler_conv_data, times_data_conv2d, scaler = ccd.convert_model_config(
    model_data, layer_num_upper=105, data_type='FLOPs', min_max_scaler=True
)
Build and train the regression model.
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
# train/test split
x_train, x_test, y_train, y_test = train_test_split(
    scaler_conv_data, times_data_conv2d, test_size=0.1, random_state=0)
batch_model = Sequential()
batch_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
batch_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
batch_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
batch_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
batch_model.add(Dense(200, kernel_initializer='normal', activation='relu'))
batch_model.add(Dense(1, kernel_initializer='normal'))
# Compile model
batch_model.compile(loss='mean_squared_error', optimizer='adam')
history_batch = batch_model.fit(
x_train, y_train, batch_size=16, epochs=15, validation_data=(x_test, y_test), verbose=True)
Predict RNN Model Training Time
Make a prediction of a recurrent network's training time.
put python steps here