Reduce Computing Complexity of Deep Neural Networks Through Weight Scaling

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Scopus citations

Abstract

Large deep neural network (DNN) models are computation- and memory-intensive, which limits their deployment, especially on edge devices. Pruning, quantization, data sparsity and data reuse have therefore been applied to DNNs to reduce memory and computation complexity at the expense of some accuracy loss. Reducing the bit precision loses information, and aggressive bit-width reduction can cause noticeable accuracy loss. This paper introduces the Scaling-Weight-based Convolution (SWC) technique to reduce the DNN model size as well as the complexity and number of arithmetic operations. This is achieved by using a small set of high-precision weights (maximum absolute weights, 'MAWs') and a large set of low-precision weights (scaling weights, 'SWs'). The result is a smaller model with minimal loss in accuracy compared to simply reducing the precision. Moreover, a scaling and quantized network-acceleration processor (SQNAP) based on the SWC method is proposed to achieve high speed and low power with reduced memory accesses. The proposed SWC eliminates >90% of the multiplications in the network. In addition, the less important SWs, those that are only a small fraction of the MAW, are pruned, and retraining is applied to maintain accuracy. A full analysis on the MNIST, Fashion-MNIST, CIFAR-10 and CIFAR-100 datasets is presented for image recognition, using different DNN models including LeNet, ResNet, AlexNet and VGG-16.
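The abstract does not give the exact SWC formulation, so the following is only a minimal sketch of the general idea under stated assumptions: each filter keeps one full-precision MAW, every weight is stored as a signed low-bit integer SW equal to the weight's rounded ratio to the MAW, and SWs below a threshold are pruned. The function names, the 4-bit default and the rounding scheme are hypothetical and are not taken from the paper.

```python
def swc_quantize(weights, bits=4):
    """Split one filter into a full-precision MAW and low-precision integer SWs.

    Assumption: SWs are signed integers in [-levels, levels]; the paper's
    actual encoding is not described in the abstract.
    """
    maw = max(abs(w) for w in weights)   # maximum absolute weight, kept in high precision
    levels = 2 ** (bits - 1) - 1         # e.g. 7 for 4-bit signed values
    if maw == 0.0:
        return 0.0, [0] * len(weights), levels
    sws = [round(w / maw * levels) for w in weights]
    return maw, sws, levels


def prune_small(sws, threshold):
    """Zero out SWs whose magnitude is below `threshold` (hypothetical pruning rule)."""
    return [s if abs(s) >= threshold else 0 for s in sws]


def swc_dot(inputs, maw, sws, levels):
    """Accumulate with the small integer SWs, then rescale once by the MAW."""
    acc = sum(x * s for x, s in zip(inputs, sws))
    return acc * maw / levels            # a single high-precision multiply per filter


weights = [0.8, -0.4, 0.1, 0.02]
maw, sws, levels = swc_quantize(weights, bits=4)
approx = swc_dot([1.0, 2.0, 3.0, 4.0], maw, prune_small(sws, 1), levels)
```

In this sketch the high-precision multiplication happens once per filter rather than once per weight; how the proposed SQNAP hardware removes >90% of multiplications is not stated in the abstract and is not modeled here.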

Original language: British English
Title of host publication: IEEE International Symposium on Circuits and Systems, ISCAS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1249-1253
Number of pages: 5
ISBN (Electronic): 9781665484855
DOIs
State: Published - 2022
Event: 2022 IEEE International Symposium on Circuits and Systems, ISCAS 2022 - Austin, United States
Duration: 27 May 2022 to 1 Jun 2022

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
Volume: 2022-May
ISSN (Print): 0271-4310

Conference

Conference: 2022 IEEE International Symposium on Circuits and Systems, ISCAS 2022
Country/Territory: United States
City: Austin
Period: 27/05/22 to 1/06/22

Keywords

  • Deep neural network
  • Hardware acceleration
  • Low-bit precision
  • Quantization
