TY - GEN
T1 - A High-performance RNS LSTM block
AU - Sakellariou, Vasilis
AU - Paliouras, Vassilis
AU - Kouretas, Ioannis
AU - Saleh, Hani
AU - Stouraitis, Thanos
N1 - Funding Information:
This work was supported by SRC project 2020-AH-2983 and the CIRA-2020-053 fund, and was conducted at the SoC center of Khalifa University.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The Residue Number System (RNS) has been proposed as an alternative to conventional binary representations for use in AI hardware accelerators. While it has been successfully utilized in applications targeting Convolutional Neural Networks (CNNs), its adoption in other network models, such as Recurrent Neural Networks (RNNs), has been hindered by the difficulty of implementing more complex activation functions, such as tanh and sigmoid (sigma), in the RNS domain. In this paper, we seek to extend its use to such models, in particular LSTM networks, by providing efficient RNS implementations of these activation functions. To this end, we derive improved-accuracy piecewise-linear approximations of the tanh and sigma functions using the minimax approach and propose a fully RNS-based hardware realization. We show that our approximations can effectively mitigate accuracy degradation in LSTM networks compared to naive approximations, while the RNS LSTM block can be up to 40% more efficient in terms of performance per unit area than a binary counterpart when used in accelerators targeting high performance.
AB - The Residue Number System (RNS) has been proposed as an alternative to conventional binary representations for use in AI hardware accelerators. While it has been successfully utilized in applications targeting Convolutional Neural Networks (CNNs), its adoption in other network models, such as Recurrent Neural Networks (RNNs), has been hindered by the difficulty of implementing more complex activation functions, such as tanh and sigmoid (sigma), in the RNS domain. In this paper, we seek to extend its use to such models, in particular LSTM networks, by providing efficient RNS implementations of these activation functions. To this end, we derive improved-accuracy piecewise-linear approximations of the tanh and sigma functions using the minimax approach and propose a fully RNS-based hardware realization. We show that our approximations can effectively mitigate accuracy degradation in LSTM networks compared to naive approximations, while the RNS LSTM block can be up to 40% more efficient in terms of performance per unit area than a binary counterpart when used in accelerators targeting high performance.
KW - AI Hardware Accelerator
KW - LSTM
KW - Residue Number System (RNS)
UR - http://www.scopus.com/inward/record.url?scp=85142468127&partnerID=8YFLogxK
U2 - 10.1109/ISCAS48785.2022.9937633
DO - 10.1109/ISCAS48785.2022.9937633
M3 - Conference contribution
AN - SCOPUS:85142468127
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
SP - 1264
EP - 1268
BT - IEEE International Symposium on Circuits and Systems, ISCAS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Symposium on Circuits and Systems, ISCAS 2022
Y2 - 27 May 2022 through 1 June 2022
ER -