Abstract
The challenges of convolutional neural network (CNN)-based AI inference on edge devices include computational complexity, large memory requirements, and high power consumption. Researchers have pursued efficient hardware, dataflow optimization, and new algorithms to tackle these obstacles. This paper introduces a novel acceleration methodology designed to significantly improve the processing speed and energy efficiency of existing CNN architectures. The proposed approach exploits the inherently static nature of pre-trained CNN weights through the prism of linear approximation. Several weight approximation options are evaluated, including approximation within kernels and across kernels. Approximating the weights at the same index in different kernels makes computation reuse possible, because the weights at a given index in every kernel are multiplied by the same input-feature-map value. The proposed approach is evaluated through the design and analysis of four CNN architectures in terms of hardware resources, power consumption, and latency. The proposed algorithms are implemented on a gem5-based RISC-V simulator, achieving a speedup of approximately 2× over the baseline. Additionally, the proposed CNN accelerators are implemented on a Xilinx Kintex-7 Field Programmable Gate Array (FPGA), reducing FPGA hardware resource usage by 50%. When benchmarked with AlexNet, VGG16, SqueezeNet, and ResNet, the proposed approach requires 50% fewer multiplications than previous works while limiting the accuracy loss to 0.9% for VGG16 and 3.1% for AlexNet.
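The cross-kernel computation-reuse idea described above can be illustrated with a small Python sketch. This is not the authors' method: the abstract does not specify the linear-approximation scheme, so the approximation step is stood in for by snapping every weight to a coarse uniform grid (the hypothetical helper `approximate_across_kernels` and its assumed `step` parameter). The sketch computes one output pixel for K kernels and counts multiplications: since every kernel's weight at index i multiplies the same input value, each distinct weight value at that index is multiplied only once and the product is reused by all kernels that share it.

```python
import numpy as np


def approximate_across_kernels(weights, step):
    """Hypothetical stand-in for weight approximation: snap every weight to a
    uniform grid of spacing `step`. After snapping, same-index weights of
    different kernels frequently take identical values."""
    return np.round(weights / step) * step


def conv_patch_with_reuse(patch, weights):
    """Compute one output pixel for every kernel from a flattened input patch.

    patch:   shape (N,)   -- input-feature-map values under the sliding window
    weights: shape (K, N) -- K kernels, each flattened to N weights
    Returns (outputs of shape (K,), number of multiplications performed).
    """
    num_kernels, num_taps = weights.shape
    outputs = np.zeros(num_kernels)
    mults = 0
    for i in range(num_taps):
        # Every kernel samples the same input value patch[i] at index i, so
        # each distinct weight value in this column needs one multiplication.
        values, inverse = np.unique(weights[:, i], return_inverse=True)
        products = patch[i] * values
        mults += len(values)
        # Reuse: scatter the shared products back to all kernels.
        outputs += products[inverse]
    return outputs, mults


# Usage: 64 kernels of size 3x3x3 (27 weights each) with random values.
rng = np.random.default_rng(0)
K, N = 64, 27
weights = approximate_across_kernels(rng.normal(0.0, 0.1, size=(K, N)), step=0.05)
patch = rng.normal(size=N)
out, mults = conv_patch_with_reuse(patch, weights)
assert np.allclose(out, weights @ patch)  # same result as a plain dot product
print(f"multiplications: {mults} with reuse vs {K * N} without")
```

In this sketch the `step` parameter exposes the trade-off: a coarser grid makes more same-index weights coincide, so fewer multiplications are needed, but it also moves the weights further from their trained values, which is where an accuracy loss would come from.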
| Original language | British English |
|---|---|
| Pages (from-to) | 1-15 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Artificial Intelligence |
| DOIs | |
| State | Accepted/In press - 2023 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- approximate computing
- Approximation algorithms
- computational reuse
- Convolutional neural networks
- Deep neural network
- Energy efficiency
- Hardware
- Hardware acceleration
- Kernel
- Linear approximation
- Memory management
Fingerprint
Dive into the research topics of 'Energy Efficient and Fast CNN Inference by Exploring Weight Approximation and Computational Reuse'. Together they form a unique fingerprint.