tccl

Extensible collectives library in Triton

Based on https://github.com/yifuwang/symm-mem-recipes

Built for CUDA MODE IRL (Sep 21, 2024) by Eric Zhang, Sage Moore, and Clive Chan

Presentation slides: link