This repository contains implementations of several CNN architectures for CIFAR-10.
All of the models are implemented with Keras and TensorFlow.
- Python (3.5.2)
- Keras (2.0.8)
- tensorflow-gpu (1.3.0)
- The first CNN model: LeNet
- Network in Network
- Vgg19 Network
- Residual Network
- Wide Residual Network
- ResNeXt
- DenseNet
- SENet
network | dropout | preprocess | GPU | params | training time | accuracy(%) |
---|---|---|---|---|---|---|
Lecun-Network | - | meanstd | GTX980TI | 62k | 30 min | 76.27 |
Network-in-Network | 0.5 | meanstd | GTX1060 | 0.96M | 1 h 30 min | 91.25 |
Network-in-Network_bn | 0.5 | meanstd | GTX980TI | 0.97M | 2 h 20 min | 91.75 |
Vgg19-Network | 0.5 | meanstd | GTX980TI | 45M | 4 hours | 93.53 |
Residual-Network50 | - | meanstd | GTX980TI | 1.7M | 8 h 58 min | 94.10 |
Wide-resnet 16x8 | - | meanstd | GTX1060 | 11.3M | 11 h 32 min | 95.14 |
DenseNet-100x12 | - | meanstd | GTX980TI | 0.85M | 30 h 40 min | 95.15 |
ResNeXt-4x64d | - | meanstd | GTX1080TI | 20M | 22 h 50 min | 95.51 |
SENet(ResNeXt-4x64d) | - | meanstd | GTX1080 | 20M | - | - |
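The `meanstd` entry in the preprocess column refers to mean/std normalization of the input images. Here is a minimal sketch of that idea, assuming channels-last image arrays and per-channel statistics computed from the training set only. The function name `meanstd_normalize` and the epsilon value are illustrative, not taken from this repository's code.

```python
import numpy as np


def meanstd_normalize(x_train, x_test, eps=1e-7):
    """Normalize images per channel with training-set mean and std.

    Assumes arrays of shape (N, H, W, C); `eps` is an illustrative
    guard against division by zero.
    """
    x_train = x_train.astype("float32")
    x_test = x_test.astype("float32")
    # One mean and one std per color channel, from training data only.
    mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
    std = x_train.std(axis=(0, 1, 2), keepdims=True)
    return (x_train - mean) / (std + eps), (x_test - mean) / (std + eps)


if __name__ == "__main__":
    # Random stand-in data with CIFAR-10-like shapes (not the real dataset).
    rng = np.random.RandomState(0)
    fake_train = rng.randint(0, 256, size=(64, 32, 32, 3)).astype("uint8")
    fake_test = rng.randint(0, 256, size=(16, 32, 32, 3)).astype("uint8")
    train_n, test_n = meanstd_normalize(fake_train, fake_test)
    print(train_n.mean(axis=(0, 1, 2)), train_n.std(axis=(0, 1, 2)))
```

Reusing the training statistics for the test set keeps both splits on the same scale without leaking test data into preprocessing.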
I have since fixed some bugs and retrained all of the following models on a GTX1080TI.
In particular:
Change the batch size according to your GPU's memory.
Modifying the learning rate schedule may improve the accuracy!
network | GPU | params | batch size | epoch | training time | accuracy(%) |
---|---|---|---|---|---|---|
Lecun-Network | GTX1080TI | 62k | 128 | 200 | 30 min | 76.25 |
Network-in-Network | GTX1080TI | 0.97M | 128 | 200 | 1 h 40 min | 91.63 |
Vgg19-Network | GTX1080TI | 45M | 128 | 200 | 2 h 17 min | 93.40 |
Residual-Network50 | GTX1080TI | 1.7M | 128 | 200 | 4 h 29 min | 94.44 |
Wide-resnet 16x8 | GTX1080TI | 11.3M | 128 | 200 | 5 h 1 min | 95.13 |
DenseNet-100x12 | GTX1080TI | 0.85M | 64 | 250 | 19 h 2 min | 94.91 |
ResNeXt-4x64d | GTX1080TI | 20M | 120 | 250 | 21 h 3 min | 95.19 |
SENet(ResNeXt-4x64d) | GTX1080TI | 20M | 120 | 250 | 21 h 57 min | 95.60 |
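As an illustration of the learning rate schedule note above, here is a minimal sketch of a step-decay schedule for a 200-epoch run. The base rate, the boundary epochs, and the decay factor are hypothetical values chosen for the example, not the settings used for the results in the table, and `make_step_schedule` is an illustrative helper name.

```python
def make_step_schedule(base_lr=0.1, boundaries=(100, 150), factor=0.1):
    """Return a function mapping an epoch index to a learning rate.

    All default values are hypothetical; tune them for your model.
    """
    def schedule(epoch):
        lr = base_lr
        # Multiply by `factor` once for every boundary already reached.
        for boundary in boundaries:
            if epoch >= boundary:
                lr *= factor
        return lr
    return schedule


if __name__ == "__main__":
    schedule = make_step_schedule()
    for epoch in (0, 99, 100, 149, 150, 199):
        print(epoch, schedule(epoch))
```

A function like this can be wrapped in Keras's `LearningRateScheduler` callback and passed to `fit` through `callbacks`. The batch size is set separately through the `batch_size` argument, so it can be lowered to fit your GPU's memory without touching the schedule.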
Because I don't have enough machines to train the larger networks, I only trained the smallest network described in the paper.
You can see the results in liuzhuang13/DenseNet and prlz77/ResNeXt.pytorch.