HPC dev, web3 dev, CUDA dev.
Use C++/CUDA/Python programming, for:
- SIMD instruction set programming for CPU-side optimization (AVX/AVX2.0/AMX/SVML).
- CUDA low-level operators dev (general cuda-core kernels, cuBLAS, wmma, PTX).
- LLM inference system, especially hpc operator support.
Currently High Performance Computing interning in the Paddle R&D team at Baidu.
If there is any issue or project I can help you with, email me: [email protected].