AI Hardware Accelerators for Deep Neural Networks Based on the Residue Number System

Student thesis: Doctoral Thesis

Abstract

The proliferation of Artificial Intelligence (AI) systems in recent years has introduced an unprecedented demand for computing power. The huge computational requirements of modern Deep Neural Networks (DNNs) have pushed traditional computing architectures to their limits. In the era of pervasive AI, where more and more embedded and Internet-of-Things (IoT) edge devices, such as sensors, microcontrollers, and smartphones, are becoming hosts of advanced AI systems, new challenging trade-offs among energy, latency, and accuracy are imposed. General-purpose processors have thus become ineffective in terms of energy efficiency for processing at the edge, and, as a result, a plethora of innovative domain-specific processing architectures have recently been developed. To enable these specialized hardware systems to keep up with the unprecedented computational requirements introduced by modern AI models, new computing paradigms that broaden our capacity to extract performance from the available hardware resources need to be sought.

Among the various design choices when implementing custom AI hardware accelerators, the utilized number representation scheme directly impacts the system’s accuracy, speed, area, and energy dissipation. This thesis explores the use of alternative Number Systems (NSs) to design efficient AI hardware accelerators, focusing on the Residue Number System (RNS). Initially, the benefits and challenges of utilizing various NSs in AI hardware systems are identified, and a comparative analysis among them is conducted. Our findings suggest that integer data formats coupled with advanced quantization techniques provide the best compromise between model performance and hardware cost for AI inference. Motivated by these results, the remainder of the thesis delves into DNN processing architectures that utilize RNS, an energy-efficient alternative to conventional fixed-point arithmetic, as the underlying data representation.

A silicon-implemented RNS DNN accelerator targeting edge-AI devices is a core contribution of this research work. The proposed architecture achieves end-to-end RNS-domain processing through the innovative use of activation functions, scaling and overflow-control techniques, and bespoke RNS low-power techniques, which translate the inherent arithmetic advantages of RNS into system-level performance gains. A peak energy efficiency of 4.92 TOPS/W has been measured on the prototype chips, marking a 1.33× increase in power efficiency compared to the conventional fixed-point counterpart. Furthermore, this thesis contributes towards extending RNS usage to more complex NN models, such as Recurrent Neural Networks (RNNs), and proposes novel techniques to exploit or increase bit- and residue-level sparsity within the model’s parameters. A multiplier-free DNN processing architecture that leverages the properties of RNS and signed-digit encoding to increase effective throughput by exploiting bit-level sparsity is also developed. Finally, a mixed-precision RNS architecture is designed. Powered by a flexible arithmetic circuit implementation, the proposed RNS system is shown, for the first time, to achieve superior performance-cost trade-offs compared to the conventional binary representation across all operating points. Using the RNS-aware mixed-precision optimization methodology, a strategy to systematically identify optimal RNS bases is introduced.
Overall, the findings of this research prove that RNS, coupled with other architectural and NN model-level optimizations that can be applied orthogonally, can push the performance limits of digital DNN processing systems.
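For readers unfamiliar with RNS arithmetic, the sketch below illustrates the basic idea behind the residue-domain processing described above: an integer is split into independent residue channels, multiply-accumulate operations are carried out channel-wise without carries propagating between channels, and the final value is recovered via the Chinese Remainder Theorem. This is only a minimal Python illustration using the classic {2^n − 1, 2^n, 2^n + 1} base for n = 8; the actual moduli sets, activation handling, scaling, and overflow-control techniques developed in the thesis are not reproduced here.

# Minimal RNS arithmetic sketch (illustrative only; not the thesis implementation).
from math import prod

MODULI = (255, 256, 257)          # classic {2^n - 1, 2^n, 2^n + 1} base for n = 8, pairwise co-prime
M = prod(MODULI)                  # dynamic range of the representation

def to_rns(x):
    # Encode an integer as its residues modulo each channel.
    return tuple(x % m for m in MODULI)

def from_rns(residues):
    # Recover the integer with the Chinese Remainder Theorem.
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m): modular inverse of Mi mod m (Python 3.8+)
    return x % M

def rns_mac(acc, a, b):
    # Multiply-accumulate performed independently per channel (carry-free across channels).
    return tuple((ai * bi + ci) % m
                 for ai, bi, ci, m in zip(to_rns(a), to_rns(b), acc, MODULI))

# Tiny dot product computed entirely in the residue domain.
acc = to_rns(0)
for a, b in [(17, 23), (5, 91), (42, 3)]:
    acc = rns_mac(acc, a, b)
print(from_rns(acc))              # 972 == 17*23 + 5*91 + 42*3

In hardware, each residue channel maps to a narrow, independent arithmetic unit, which is the origin of the energy and latency advantages that the accelerators summarized above seek to exploit.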
Date of Award: 6 Dec 2024
Original language: American English
Supervisor: Athanasios Stouraitis

Keywords

  • Artificial intelligence
  • Deep neural networks
  • Hardware accelerators
  • Alternative arithmetic systems
  • Residue number system
  • Edge-AI
  • ASIC
