Implementations of KAN variations.
If Python 3.10 and nvcc are already installed:

pip install .

Otherwise, install conda (https://conda.io/projects/conda/en/latest/user-guide/install/index.html) and run:

conda create -n lkan python==3.10
conda activate lkan
conda install cuda-nvcc
pip install .
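After installation, a quick sanity check (a minimal sketch, not part of the repo) can confirm that PyTorch and CUDA are visible:

```python
# Illustrative environment check - not part of the repo.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```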
To run MNIST, select a config in main.py and run main.py.
To view charts, run tensorboard --logdir .experiments/
MLP (31.8M parameters) - 51 it/s
KANLinear0 (32.3M parameters) - 4.3 it/s
KANLinear (31M parameters) - 36.5 it/s
KANLinearFFT (33M parameters) - 40 it/s
KANLinearFFT CUDA (50% of the memory of KANLinearFFT for forward and backward) - 23 it/s
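The it/s figures are iterations per second. As a rough illustration of how such a throughput number can be measured (a sketch with a stand-in MLP and a dummy batch, not the repo's benchmark code):

```python
# Illustrative throughput measurement (stand-in MLP, dummy MNIST-sized batch).
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(
    nn.Flatten(), nn.Linear(28 * 28, 1024), nn.ReLU(), nn.Linear(1024, 10)
).to(device)
optimizer = torch.optim.AdamW(model.parameters())
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 1, 28, 28, device=device)
y = torch.randint(0, 10, (64,), device=device)

steps = 200
if device == "cuda":
    torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(steps):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
if device == "cuda":
    torch.cuda.synchronize()
print(f"{steps / (time.perf_counter() - start):.1f} it/s")
```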
See examples/: continual_training_adam.ipynb, continual_training_lbfgs.ipynb - continual training.
- update_grid on CUDA raises an error (torch.linalg.lstsq assumes full rank on CUDA, where only one driver is available) - temporarily worked around by moving the lstsq computation to the CPU (see the sketch after this list)
- update_grid_from_samples in the original KAN runs the model multiple times; is that necessary?
- parameter counting: does the grid count as a parameter or not?
- MLP training starts almost immediately, but KAN training is slow at the start
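A minimal sketch of the lstsq CPU fallback mentioned above (the function name and tensor shapes are illustrative; the repo's actual code may differ):

```python
# Illustrative CPU fallback for torch.linalg.lstsq (name and shapes are assumptions).
import torch

def lstsq_cpu_fallback(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    # On CUDA, torch.linalg.lstsq only supports the 'gels' driver, which assumes
    # A has full rank. The CPU default driver handles rank-deficient systems,
    # so solve on CPU and move the solution back to the original device.
    solution = torch.linalg.lstsq(A.cpu(), B.cpu()).solution
    return solution.to(A.device)
```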
- Base structure
- KAN simple implementation
- KAN trainer
- train KAN on test dataset
- remove unnecessary dependencies in requirements.txt
- test update_grid and "Other possibilities are: (a) the grid is learnable with gradient descent" from the paper.
- Regularization
- Compare with MLP
- Grid extension
- MNIST
- CIFAR10
- KAN ResNet?
- KAN as CNN filter?
- KAN in ViT?
- Fourier KAN?
- GraphKAN
- Mixing KAN and standard layers.
- pruning
- test continual learning
- docs and examples - write notebooks like those in the KAN repo.
- KAN vs MLP in "LLM" - test?
- CUDA kernel for b_splines?
- unit tests?
@misc{liu2024kan,
title={KAN: Kolmogorov-Arnold Networks},
author={Ziming Liu and Yixuan Wang and Sachin Vaidya and Fabian Ruehle and James Halverson and Marin Soljačić and Thomas Y. Hou and Max Tegmark},
year={2024},
eprint={2404.19756},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Original KAN repo - base idea
efficient-kan - KANLinear and optimizations