TY - JOUR
T1 - Deep Neural Networks-Based Weight Approximation and Computation Reuse for 2-D Image Classification
AU - Tolba, Mohammed F.
AU - Tesfai, Huruy Tekle
AU - Saleh, Hani
AU - Mohammad, Baker
AU - Al-Qutayri, Mahmoud
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2022
Y1 - 2022
N2 - Deep Neural Networks (DNNs) are computationally and memory intensive, which presents a major challenge for hardware, especially for resource-constrained devices such as Internet-of-Things (IoT) nodes. This paper introduces a new method to improve DNN performance by fusing approximate computing with data reuse techniques for image recognition applications. First, starting from the pre-trained network, the DNN weights are approximated using linear and quadratic approximation methods during the retraining phase to reduce the DNN model size and the number of arithmetic operations. Then, the DNN weights are replaced with the linear/quadratic coefficients to execute the inference, so that different DNN weights can be computed using the same coefficients. This leads to repetition of the weights, which enables reuse of DNN sub-computations (computational reuse) and of the same data (data reuse) to reduce DNN computations and memory accesses and to improve energy efficiency, albeit at the cost of increased training time. A complete analysis for the MNIST, Fashion MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet datasets is presented for image recognition, using different DNN models including LeNet, ResNet, AlexNet, and VGG16. Our results show that the linear approximation achieves 1211.3×, 21.8×, 700×, and 19.3× on LeNet-5 MNIST, LeNet Fashion MNIST, VGG16, and ResNet-20, respectively, with small accuracy loss. Compared to the state-of-the-art Row Stationary (RS) method, the proposed architecture saves 54% of the total number of adders and multipliers needed. Overall, the proposed approach is suitable for IoT edge devices as it reduces computing complexity, memory size, and memory accesses with a small impact on accuracy.
KW - Approximate computing
KW - Computational reuse
KW - Data reuse
KW - DNN
KW - IoT
UR - http://www.scopus.com/inward/record.url?scp=85127065117&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2022.3161738
DO - 10.1109/ACCESS.2022.3161738
M3 - Article
AN - SCOPUS:85127065117
SN - 2169-3536
VL - 10
SP - 41551
EP - 41563
JO - IEEE Access
JF - IEEE Access
ER -