02-22[ICDE'22] PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems 论文阅读
02-07[MAPL@PLDI'19] Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations 阅读笔记
04-06[OSDI'20] A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters (BytePS) 论文阅读
04-05[arXiv'19] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism 论文阅读