02-07[MAPL@PLDI'19] Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations (Reading Notes)
04-06[OSDI'20] A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters (Paper Reading)
04-05[arXiv'19] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (Paper Reading)