TY - GEN
T1 - Faster and Lighter LLMs
T2 - 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
AU - Chavan, Arnav
AU - Magazine, Raghav
AU - Kushwaha, Shubham
AU - Debbah, Mérouane
AU - Gupta, Deepak
N1 - Publisher Copyright:
© 2024 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluate various compression techniques, providing practical insights for efficient LLM deployment in a unified setting. The empirical analysis on LLaMA(/2)-7B highlights the effectiveness of these methods. Drawing from survey insights, we identify current limitations and discuss potential future directions to improve LLM inference efficiency. We release the codebase to reproduce the results presented in this paper at https://github.com/nyunAI/Faster-LLM-Survey.
AB - Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluate various compression techniques, providing practical insights for efficient LLM deployment in a unified setting. The empirical analysis on LLaMA(/2)-7B highlights the effectiveness of these methods. Drawing from survey insights, we identify current limitations and discuss potential future directions to improve LLM inference efficiency. We release the codebase to reproduce the results presented in this paper at https://github.com/nyunAI/Faster-LLM-Survey.
UR - https://www.scopus.com/pages/publications/85204301028
M3 - Conference contribution
AN - SCOPUS:85204301028
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 7980
EP - 7988
BT - Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
A2 - Larson, Kate
Y2 - 3 August 2024 through 9 August 2024
ER -