# convnet-benchmarks

Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.

Machine: 6-core Intel Core i7-5930K CPU @ 3.50GHz + NVIDIA Titan X + Ubuntu 14.04 x86_64

## ImageNet Winners Benchmarking

I pick some popular ImageNet models and clock the time for a full forward + backward pass. I average my times over 10 runs and ignore dropout and softmax layers.
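
The timing protocol above can be sketched as a small harness. This is a hypothetical illustration, not the script that produced these tables: `benchmark`, `forward`, `backward` and `sync` are placeholder names, and the untimed warm-up run is an assumption (the text only says times are averaged over 10 runs). The `sync` hook matters on a GPU because kernel launches are asynchronous; without a device synchronization the timer would mostly measure launch overhead.

```python
import time
from typing import Callable, Tuple


def benchmark(forward: Callable[[], None],
              backward: Callable[[], None],
              runs: int = 10,
              warmup: int = 1,
              sync: Callable[[], None] = lambda: None) -> Tuple[float, float, float]:
    """Return the mean (total, forward, backward) time per run in milliseconds.

    `forward`, `backward` and `sync` are hypothetical hooks: one forward pass
    of the model, one backward pass, and a call that blocks until queued GPU
    work has finished.
    """
    for _ in range(warmup):  # untimed runs (assumption): allocation, autotuning
        forward()
        backward()
        sync()

    fwd_s = bwd_s = 0.0
    for _ in range(runs):
        t0 = time.perf_counter()
        forward()
        sync()  # otherwise only the kernel launches would be timed
        t1 = time.perf_counter()
        backward()
        sync()
        t2 = time.perf_counter()
        fwd_s += t1 - t0
        bwd_s += t2 - t1

    fwd_ms = 1000.0 * fwd_s / runs
    bwd_ms = 1000.0 * bwd_s / runs
    return fwd_ms + bwd_ms, fwd_ms, bwd_ms


# Example with stand-in workloads instead of a real network.
total, fwd, bwd = benchmark(lambda: time.sleep(0.002), lambda: time.sleep(0.004), runs=3)
print(f"total {total:.1f} ms = forward {fwd:.1f} ms + backward {bwd:.1f} ms")
```

In a real run, `sync` would be bound to the framework's device-synchronize call (for example `cutorch.synchronize()` in Torch7); leaving it as a no-op on a GPU will under-report times.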

AlexNet (One Weird Trick paper) - Input 128x3x224x224

| Library | Class | Total (ms) | Forward (ms) | Backward (ms) |
|:---|:---|---:|---:|---:|
| Nervana-fp16 | `ConvLayer` | 92 | 29 | 62 |
| CuDNN[R3]-fp16 | `cudnn.SpatialConvolution` | 96 | 30 | 66 |
| CuDNN[R3]-fp32 | `cudnn.SpatialConvolution` | 96 | 32 | 64 |
| Nervana-fp32 | `ConvLayer` | 101 | 32 | 69 |
| fbfft | `fbnn.SpatialConvolution` | 104 | 31 | 72 |
| cudaconvnet2\* | `ConvLayer` | 177 | 42 | 135 |
| CuDNN[R2] \* | `cudnn.SpatialConvolution` | 231 | 70 | 161 |
| Caffe (native) | `ConvolutionLayer` | 324 | 121 | 203 |
| TensorFlow | `conv2d` | 326 | 96 | 230 |
| Torch-7 (native) | `SpatialConvolutionMM` | 342 | 132 | 210 |
| CL-nn (Torch) | `SpatialConvolutionMM` | 963 | 388 | 574 |
| Caffe-CLGreenTea | `ConvolutionLayer` | 1442 | 210 | 1232 |

Overfeat [fast] - Input 128x3x231x231

| Library | Class | Total (ms) | Forward (ms) | Backward (ms) |
|:---|:---|---:|---:|---:|
| CuDNN[R3]-fp16 | `cudnn.SpatialConvolution` | 313 | 107 | 206 |
| CuDNN[R3]-fp32 | `cudnn.SpatialConvolution` | 326 | 113 | 213 |
| fbfft | `SpatialConvolutionCuFFT` | 342 | 114 | 227 |
| Nervana-fp16 | `ConvLayer` | 355 | 112 | 242 |
| Nervana-fp32 | `ConvLayer` | 398 | 124 | 273 |
| cudaconvnet2\* | `ConvLayer` | 723 | 176 | 547 |
| CuDNN[R2] \* | `cudnn.SpatialConvolution` | 810 | 234 | 576 |
| Caffe | `ConvolutionLayer` | 823 | 355 | 468 |
| Torch-7 (native) | `SpatialConvolutionMM` | 878 | 379 | 499 |
| CL-nn (Torch) | `SpatialConvolutionMM` | 963 | 388 | 574 |
| TensorFlow | `conv2d` | 1084 | 316 | 768 |
| Caffe-CLGreenTea | `ConvolutionLayer` | 2857 | 616 | 2240 |

OxfordNet [Model-A] - Input 64x3x224x224

| Library | Class | Total (ms) | Forward (ms) | Backward (ms) |
|:---|:---|---:|---:|---:|
| Nervana-fp16 | `ConvLayer` | 529 | 167 | 362 |
| Nervana-fp32 | `ConvLayer` | 590 | 180 | 410 |
| CuDNN[R3]-fp16 | `cudnn.SpatialConvolution` | 615 | 179 | 436 |
| CuDNN[R3]-fp32 | `cudnn.SpatialConvolution` | 615 | 196 | 418 |
| fbfft | `SpatialConvolutionCuFFT` | 1092 | 355 | 737 |
| cudaconvnet2\* | `ConvLayer` | 1229 | 408 | 821 |
| CuDNN[R2] \* | `cudnn.SpatialConvolution` | 1099 | 342 | 757 |
| Caffe | `ConvolutionLayer` | 1068 | 323 | 745 |
| Torch-7 (native) | `SpatialConvolutionMM` | 1105 | 350 | 755 |
| CL-nn (Torch) | `SpatialConvolutionMM` | 3437 | 875 | 2562 |
| Caffe-CLGreenTea | `ConvolutionLayer` | 5620 | 988 | 4632 |
| TensorFlow | `conv2d` | OOM | OOM | OOM |

GoogleNet V1 - Input 128x3x224x224

| Library | Class | Total (ms) | Forward (ms) | Backward (ms) |
|:---|:---|---:|---:|---:|
| Nervana-fp16 | `ConvLayer` | 283 | 85 | 197 |
| Nervana-fp32 | `ConvLayer` | 322 | 90 | 232 |
| CuDNN[R3]-fp32 | `cudnn.SpatialConvolution` | 431 | 117 | 313 |
| CuDNN[R3]-fp16 | `cudnn.SpatialConvolution` | 501 | 109 | 392 |
| Caffe | `ConvolutionLayer` | 1935 | 786 | 1148 |
| CL-nn (Torch) | `SpatialConvolutionMM` | 7016 | 3027 | 3988 |
| Caffe-CLGreenTea | `ConvolutionLayer` | 9462 | 746 | 8716 |
| TensorFlow | `conv2d` | OOM | OOM | OOM |

## Layer-wise Benchmarking (Last Updated April 2015)

### Spatial Convolution layer (3D input, 3D output, densely connected)

forward + backprop (wrt input and weights)

| Original Library | Class/Function Benchmarked | Total (ms) | Forward (ms) | Backward (ms) |
|:---|:---|---:|---:|---:|
| fbfft | `SpatialConvolutionCuFFT` | 256 | 101 | 155 |
| cuda-convnet2 \* | `ConvLayer` | 977 | 201 | 776 |
| cuda-convnet\*\* | `pylearn2.cuda_convnet` | 1077 | 312 | 765 |
| CuDNN R2 \* | `cudnn.SpatialConvolution` | 1019 | 269 | 750 |
| Theano | `CorrMM` | 1225 | 407 | 818 |
| Caffe | `ConvolutionLayer` | 1231 | 396 | 835 |
| Torch-7 | `SpatialConvolutionMM` | 1295 | 418 | 877 |
| DeepCL | `ConvolutionLayer` | 6280 | 2648 | 3632 |
| cherry-picking\*\*\*\* | best per layer | 235 | 79 | 155 |

This table is NOT UPDATED for Titan X. The numbers below were measured on a Titan Black and are kept only for informational and legacy purposes.

| Original Library | Class/Function Benchmarked | Total (ms) | Forward (ms) | Backward (ms) |
|:---|:---|---:|---:|---:|
| Theano (experimental)\*\*\* | `conv2d_fft` | 1178 | 304 | 874 |
| Torch-7 | `nn.SpatialConvolutionBHWD` | 1892 | 581 | 1311 |
| ccv | `ccv_convnet_layer` | 809 + bw | 809 | |
| Theano (legacy) | `conv2d` | 70774 | 3833 | 66941 |

- `*` The library was tested with Torch bindings of the specific kernels.
- `**` The library was tested with Pylearn2 bindings.
- `***` An experimental module that computes convolutions with FFTs. According to @benanne, it uses a lot of memory.
- `****` The last row shows the results obtainable by choosing the best-performing library for each layer.
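
To see why an FFT-based convolution can be memory-hungry, here is a back-of-the-envelope estimate for the kernel spectra alone, applied to the L5 layer (13x13 input, 384->384 feature maps, 3x3 kernel). It assumes, as a simplification of our own rather than a description of `conv2d_fft` or fbfft internals, that every k x k kernel is zero-padded to a power-of-two FFT grid covering the n x n input and stored as a full complex64 spectrum; input and output spectra, which add more, are ignored. The function names are hypothetical.

```python
def fft_grid(n: int) -> int:
    """Smallest power of two >= n (assumed FFT padding policy for an n x n input)."""
    size = 1
    while size < n:
        size *= 2
    return size


def direct_weight_bytes(c_in: int, c_out: int, k: int, bytes_per_float: int = 4) -> int:
    """Spatial-domain weights: c_out x c_in x k x k float32 values."""
    return c_in * c_out * k * k * bytes_per_float


def fft_weight_bytes(c_in: int, c_out: int, n: int, bytes_per_complex: int = 8) -> int:
    """Frequency-domain weights: each kernel padded to the full FFT grid and
    stored as complex64 (assumed layout; a real-input FFT would store about half)."""
    s = fft_grid(n)
    return c_in * c_out * s * s * bytes_per_complex


# L5: 13x13 input, 384 -> 384 feature maps, 3x3 kernel.
direct = direct_weight_bytes(384, 384, 3)
spectral = fft_weight_bytes(384, 384, 13)
print(f"direct: {direct / 2**20:.1f} MiB, FFT: {spectral / 2**20:.1f} MiB, "
      f"ratio: {spectral / direct:.1f}x")
```

Under these assumptions the padded 16x16 spectra take 288 MiB against roughly 5 MiB for the raw 3x3 weights, because each 9-tap kernel becomes 256 complex values.
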

| Layer | Input | Batch size | Feature maps | Kernel size | Stride |
|:---|:---|---:|:---|:---|:---|
| L1 | 128x128 | 128 | 3->96 | 11x11 | 1x1 |
| L2 | 64x64 | 128 | 64->128 | 9x9 | 1x1 |
| L3 | 32x32 | 128 | 128->128 | 9x9 | 1x1 |
| L4 | 16x16 | 128 | 128->128 | 7x7 | 1x1 |
| L5 | 13x13 | 128 | 384->384 | 3x3 | 1x1 |

The table is ranked by the total time of the forward + backward calls, summed over all five layers (L1 + L2 + L3 + L4 + L5).
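
The layer configurations above fix the output sizes and the amount of arithmetic a direct convolution performs, which helps put the timings in context. The sketch below assumes no zero padding ("valid" convolution), which the configurations do not state, and counts a multiply-add as two FLOPs; `ConvLayerSpec` and the helper functions are hypothetical names introduced here.

```python
from typing import NamedTuple


class ConvLayerSpec(NamedTuple):
    name: str
    size: int      # square input: size x size
    batch: int
    c_in: int
    c_out: int
    k: int         # square kernel: k x k
    stride: int = 1


# Layer configurations L1-L5 as listed above.
LAYERS = [
    ConvLayerSpec("L1", 128, 128, 3, 96, 11),
    ConvLayerSpec("L2", 64, 128, 64, 128, 9),
    ConvLayerSpec("L3", 32, 128, 128, 128, 9),
    ConvLayerSpec("L4", 16, 128, 128, 128, 7),
    ConvLayerSpec("L5", 13, 128, 384, 384, 3),
]


def output_size(layer: ConvLayerSpec) -> int:
    # Assumes no zero padding ("valid" convolution).
    return (layer.size - layer.k) // layer.stride + 1


def forward_flops(layer: ConvLayerSpec) -> int:
    # One multiply and one add per kernel tap, per input channel, per output element.
    o = output_size(layer)
    return 2 * layer.batch * layer.c_out * o * o * layer.c_in * layer.k * layer.k


def effective_gflops(layer: ConvLayerSpec, ms: float) -> float:
    """Direct-convolution FLOPs divided by a measured time in ms, as GFLOP/s."""
    return forward_flops(layer) / (ms * 1e-3) / 1e9


for layer in LAYERS:
    o = output_size(layer)
    print(f"{layer.name}: output {o}x{o}, "
          f"{forward_flops(layer) / 1e9:.1f} GFLOP per forward pass")
```

For example, `effective_gflops(LAYERS[1], 27)` turns fbfft's 27 ms forward time on L2 into roughly 19,700 GFLOP/s of direct-convolution-equivalent work. A figure that high is best read as a sign that the FFT path performs far fewer operations than the direct count, not as raw hardware throughput.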

##### Breakdown

forward

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---|:---|---:|---:|---:|---:|---:|---:|
| fbfft | `SpatialConvolutionCuFFT` | 57 | 27 | 6 | 2 | 9 | 101 |
| cuda-convnet2 \* | `ConvLayer` | 36 | 113 | 40 | 4 | 8 | 201 |
| cuda-convnet\*\* | `pylearn2.cuda_convnet` | 38 | 183 | 68 | 7 | 16 | 312 |
| CuDNN R2 | `cudnn.SpatialConvolution` | 56 | 143 | 53 | 6 | 11 | 269 |
| Theano | `CorrMM` | 91 | 143 | 121 | 24 | 28 | 407 |
| Caffe | `ConvolutionLayer<Dtype>` | 93 | 136 | 116 | 24 | 27 | 396 |
| Torch-7 | `nn.SpatialConvolutionMM` | 94 | 149 | 123 | 24 | 28 | 418 |
| DeepCL | `ConvolutionLayer` | 738 | 1241 | 518 | 47 | 104 | 2648 |
| cherry-picking\*\*\*\* | best per layer | 36 | 27 | 6 | 2 | 8 | 79 |
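
The cherry-picking row can be reproduced from the per-layer numbers: for each layer take the fastest library, then sum. A short sketch over the forward timings in the table above; the dictionary keys are shortened library names chosen here for readability, and `cherry_pick` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Forward times in ms for layers L1..L5, copied from the table above.
forward_ms: Dict[str, List[int]] = {
    "fbfft": [57, 27, 6, 2, 9],
    "cuda-convnet2": [36, 113, 40, 4, 8],
    "cuda-convnet": [38, 183, 68, 7, 16],
    "CuDNN R2": [56, 143, 53, 6, 11],
    "Theano CorrMM": [91, 143, 121, 24, 28],
    "Caffe": [93, 136, 116, 24, 27],
    "Torch-7": [94, 149, 123, 24, 28],
    "DeepCL": [738, 1241, 518, 47, 104],
}


def cherry_pick(times: Dict[str, List[int]]) -> Tuple[List[Tuple[str, int]], int]:
    """For every layer, pick the library with the smallest time; return picks and their sum."""
    n_layers = len(next(iter(times.values())))
    picks = []
    for layer in range(n_layers):
        best = min(times, key=lambda name: times[name][layer])
        picks.append((best, times[best][layer]))
    return picks, sum(t for _, t in picks)


picks, total = cherry_pick(forward_ms)
print(picks, total)
```

Applied to the backward table, the same function picks fbfft for every layer and sums to 155 ms. Choosing forward and backward winners independently (79 + 155 = 234 ms, against the reported 235 ms, presumably a rounding difference) assumes libraries can be mixed between passes; choosing a single library per layer for both passes would select fbfft everywhere and give its 256 ms total.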

backward (gradInput + gradWeight)

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---|:---|---:|---:|---:|---:|---:|---:|
| fbfft | `SpatialConvolutionCuFFT` | 76 | 45 | 12 | 4 | 18 | 155 |
| cuda-convnet2 \* | `ConvLayer` | 103 | 467 | 162 | 15 | 29 | 776 |
| cuda-convnet\*\* | `pylearn2.cuda_convnet` | 136 | 433 | 147 | 15 | 34 | 765 |
| CuDNN R2 | `cudnn.SpatialConvolution` | 139 | 401 | 159 | 19 | 32 | 750 |
| Theano | `CorrMM` | 179 | 405 | 174 | 29 | 31 | 818 |
| Caffe | `ConvolutionLayer<Dtype>` | 200 | 405 | 172 | 28 | 30 | 835 |
| Torch-7 | `nn.SpatialConvolutionMM` | 206 | 432 | 178 | 29 | 32 | 877 |
| DeepCL | `ConvolutionLayer` | 484 | 2144 | 747 | 59 | 198 | 3632 |
| cherry-picking\*\*\*\* | best per layer | 76 | 45 | 12 | 4 | 18 | 155 |
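
The backward column covers two separate computations: gradInput (dL/dx, needed to keep backpropagating to earlier layers) and gradWeight (dL/dw, needed for the weight update). The naive pure-Python sketch below, for a single image and a valid, stride-1 cross-correlation (the convention deep-learning frameworks commonly call "convolution"), shows that both come out of the same loop over the output gradient. It illustrates the arithmetic only and is not how any benchmarked library implements it; with a batch, gradWeight would additionally be summed over images.

```python
from itertools import product
from typing import List, Tuple

Tensor3 = List[List[List[float]]]          # [channel][row][col]
Tensor4 = List[List[List[List[float]]]]    # [out_channel][in_channel][row][col]


def zeros3(a: int, b: int, c: int) -> Tensor3:
    return [[[0.0] * c for _ in range(b)] for _ in range(a)]


def conv_forward(x: Tensor3, w: Tensor4) -> Tensor3:
    """y[o][i][j] = sum over c, u, v of x[c][i+u][j+v] * w[o][c][u][v]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    ho, wo = h - k + 1, wd - k + 1
    y = zeros3(c_out, ho, wo)
    for o, c, i, j, u, v in product(range(c_out), range(c_in), range(ho),
                                    range(wo), range(k), range(k)):
        y[o][i][j] += x[c][i + u][j + v] * w[o][c][u][v]
    return y


def conv_backward(x: Tensor3, w: Tensor4, grad_y: Tensor3) -> Tuple[Tensor3, Tensor4]:
    """Given dL/dy, return (gradInput, gradWeight) = (dL/dx, dL/dw)."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    ho, wo = len(grad_y[0]), len(grad_y[0][0])
    grad_x = zeros3(c_in, h, wd)
    grad_w = [zeros3(c_in, k, k) for _ in range(c_out)]
    for o, c, i, j, u, v in product(range(c_out), range(c_in), range(ho),
                                    range(wo), range(k), range(k)):
        g = grad_y[o][i][j]
        grad_x[c][i + u][j + v] += g * w[o][c][u][v]  # gradInput: scatter through the kernel
        grad_w[o][c][u][v] += g * x[c][i + u][j + v]  # gradWeight: correlate input with grad
    return grad_x, grad_w
```

Both gradients touch every (output, input channel, position, kernel tap) combination once, so each costs about as much arithmetic as the forward pass; this is consistent with the backward times in these tables being roughly two to four times the forward times.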
