Machine Learning Systems
Efficient ML Inference and Training
Over the past few years, practitioners have steadily scaled up machine learning models to increase their capacity for human-like tasks. We have reached the point where most state-of-the-art models for computer vision and natural language processing (e.g., large language models (LLMs) and large vision transformers (LViTs)) contain billions of parameters, which poses new challenges first for training them and then for deploying them on cloud and edge devices. In this project, we explore and develop scientific methods for accelerated, memory-efficient training, fine-tuning, and inference of such models. The relevant publications are listed below, followed by a small illustrative code sketch of one recurring idea.
SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics (Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, Costin Iancu, Koushik Sen, Preprint 2023)
Partially-Random Initialization: A Smoking Gun for Binarization Hypothesis of BERT (Arash Ardakani, Findings of the Association for Computational Linguistics: EMNLP 2022)
Standard Deviation-Based Quantization for Deep Neural Networks (Amir Ardakani, Arash Ardakani, Brett Meyer, James J Clark, Warren J Gross, Preprint 2022)
Efficient Two-Stage Progressive Quantization of BERT (Charles Le, Arash Ardakani, Amir Ardakani, Hang Zhang, Yuyan Chen, James Clark, Brett Meyer, Warren Gross, SustaiNLP 2022)
Training Binarized Neural Networks Using Ternary Multipliers (Amir Ardakani, Arash Ardakani, Warren Gross, IEEE Design & Test 2021)
The Synthesis of XNOR Recurrent Neural Networks with Stochastic Logic (Arash Ardakani, Zhengyun Ji, Amir Ardakani, Warren Gross, NeurIPS 2019)
Learning Recurrent Binary/Ternary Weights (Arash Ardakani, Zhengyun Ji, Sean C Smithson, Brett H Meyer, Warren J Gross, ICLR 2019)
Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks (Arash Ardakani, Carlo Condo, Warren J Gross, ICLR 2017)
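To make one of the recurring themes above concrete, here is a minimal, self-contained PyTorch sketch of memory-efficient fine-tuning by freezing parameters that have stopped changing much, so that they no longer consume gradient or optimizer-state memory. The freezing criterion used here (mean absolute parameter drift between periodic snapshots) and all names in the snippet are illustrative assumptions for this page, not the training-dynamics algorithm from SlimFit or any of the papers listed above.

```python
import torch
import torch.nn as nn


def freeze_slow_moving_params(model: nn.Module, snapshot: dict, threshold: float = 1e-4):
    """Freeze parameter tensors whose mean absolute change since the last
    snapshot is below `threshold`. Frozen tensors stop receiving gradients
    and optimizer state, which is where the memory savings come from.

    Illustrative heuristic only; not the criterion used in SlimFit.
    """
    new_snapshot = {}
    for name, param in model.named_parameters():
        new_snapshot[name] = param.detach().clone()
        if name in snapshot:
            drift = (new_snapshot[name] - snapshot[name]).abs().mean().item()
            if drift < threshold:
                param.requires_grad_(False)
    return new_snapshot


# Toy usage: fine-tune a small model and periodically re-check which tensors to freeze.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
snapshot = {}
for step in range(100):
    x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % 20 == 19:
        snapshot = freeze_slow_moving_params(model, snapshot)
        trainable = [p for p in model.parameters() if p.requires_grad]
        if trainable:  # rebuild the optimizer over the remaining trainable parameters only
            optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```

A real system would also bound how much of the model can be frozen at any time and would typically combine freezing with techniques such as quantization; the sketch only captures the basic mechanism of dropping gradient and optimizer state for inactive tensors.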