Cited By
View all- Cheng SLin JEmani MRaskar SForeman SXie ZVishwanath VKandemir M(2024)Thorough Characterization and Analysis of Large Transformer Model Training At-ScaleProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/36390348:1(1-25)Online publication date: 21-Feb-2024
- Jangda AYadav MLee IChabbi MSteuwer M(2024)Fast Kronecker Matrix-Matrix Multiplication on GPUsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638489(390-403)Online publication date: 2-Mar-2024
- Chen CLi XZhu QDuan JSun PZhang XYang CTsafrir DMusuvathi MGupta RAbu-Ghazaleh N(2024)Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication PartitioningProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651379(178-191)Online publication date: 27-Apr-2024
- Show More Cited By