TY - GEN
T1 - Reduce Computing Complexity of Deep Neural Networks Through Weight Scaling
AU - Tolba, Mohammed F.
AU - Saleh, Hani
AU - Al-Qutayri, Mahmoud
AU - Mohammad, Baker
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Large deep neural network (DNN) models are computation and memory intensive, which limits their deployment, especially on edge devices. Therefore, pruning, quantization, data sparsity, and data reuse have been applied to DNNs to reduce memory and computation complexity at the expense of some accuracy loss. Reducing the bit precision results in loss of information, and aggressive bit-width reduction can cause noticeable accuracy loss. This paper introduces the Scaling-Weight-based Convolution (SWC) technique to reduce the DNN model size as well as the complexity and number of arithmetic operations. This is achieved by using a small set of high-precision weights (maximum absolute weights, 'MAW') and a large set of low-precision weights (scaling weights, 'SWs'). This decreases the model size with minimal loss in accuracy compared to simply reducing the precision. Moreover, a scaling and quantized network-acceleration processor (SQNAP) based on the SWC method is proposed to achieve high speed and low power with reduced memory accesses. The proposed SWC eliminates >90% of the multiplications in the network. In addition, the less important SWs, which represent only a small fraction of the MAW, are pruned, and retraining is applied to maintain accuracy. A full analysis on the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets is presented for image recognition, using different DNN models including LeNet, ResNet, AlexNet, and VGG-16.
AB - Large deep neural network (DNN) models are computation and memory intensive, which limits their deployment, especially on edge devices. Therefore, pruning, quantization, data sparsity, and data reuse have been applied to DNNs to reduce memory and computation complexity at the expense of some accuracy loss. Reducing the bit precision results in loss of information, and aggressive bit-width reduction can cause noticeable accuracy loss. This paper introduces the Scaling-Weight-based Convolution (SWC) technique to reduce the DNN model size as well as the complexity and number of arithmetic operations. This is achieved by using a small set of high-precision weights (maximum absolute weights, 'MAW') and a large set of low-precision weights (scaling weights, 'SWs'). This decreases the model size with minimal loss in accuracy compared to simply reducing the precision. Moreover, a scaling and quantized network-acceleration processor (SQNAP) based on the SWC method is proposed to achieve high speed and low power with reduced memory accesses. The proposed SWC eliminates >90% of the multiplications in the network. In addition, the less important SWs, which represent only a small fraction of the MAW, are pruned, and retraining is applied to maintain accuracy. A full analysis on the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets is presented for image recognition, using different DNN models including LeNet, ResNet, AlexNet, and VGG-16.
KW - Deep neural network
KW - Hardware acceleration
KW - Low-bit precision
KW - Quantization
UR - http://www.scopus.com/inward/record.url?scp=85142478757&partnerID=8YFLogxK
U2 - 10.1109/ISCAS48785.2022.9938015
DO - 10.1109/ISCAS48785.2022.9938015
M3 - Conference contribution
AN - SCOPUS:85142478757
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
SP - 1249
EP - 1253
BT - IEEE International Symposium on Circuits and Systems, ISCAS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Symposium on Circuits and Systems, ISCAS 2022
Y2 - 27 May 2022 through 1 June 2022
ER -
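
Editor's note: the abstract describes representing each filter with one high-precision maximum absolute weight (MAW) plus low-bit scaling weights (SWs). The short Python sketch below illustrates that general weight-scaling idea only; the function names, the 4-bit width, and the per-filter decomposition are illustrative assumptions and not the authors' SWC/SQNAP implementation.

import numpy as np

def swc_quantize(weights, bits=4):
    # Keep one high-precision value per filter (the MAW) and store the
    # remaining weights as signed low-bit scaling factors relative to it.
    # Hypothetical helper for illustration, not the paper's exact method.
    maw = np.max(np.abs(weights))          # single high-precision weight
    levels = 2 ** (bits - 1) - 1           # e.g. 7 levels for 4-bit signed
    sws = np.round(weights / maw * levels).astype(np.int8)
    return maw, sws

def swc_dequantize(maw, sws, bits=4):
    # Reconstruct approximate weights from the MAW and the scaling weights.
    levels = 2 ** (bits - 1) - 1
    return sws.astype(np.float32) * maw / levels

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(3, 3)).astype(np.float32)  # toy 3x3 filter
maw, sws = swc_quantize(w)
w_hat = swc_dequantize(maw, sws)
print("max reconstruction error:", np.max(np.abs(w - w_hat)))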