Abstract
Artificial Intelligence (AI) is transforming sectors ranging from smart healthcare systems to autonomous vehicles. Algorithmic advances such as Vision Transformers (ViT), Hyperdimensional Computing (HDC), and Convolutional Neural Networks (CNN) play a crucial role in improving efficiency and performance. However, the widespread deployment of AI applications, and the big data they require, is hindered by limited computing and memory efficiency, especially on edge devices. Traditional hardware technologies and architectures face many limitations, necessitating innovative approaches to address these hurdles. The main contribution of this thesis is the demonstration of efficient hardware implementations of image processing algorithms targeted at the Internet of Things (IoT). Building on that, this thesis employs different hardware implementations and data preprocessing techniques to reduce computational complexity at the edge.

In the first contribution, the impact of quantization on the Random Spray Retinex (RSR) algorithm is studied for image enhancement, and RSR is used as a preprocessing filter before semantic segmentation of low-quality urban road scenes. In addition, an efficient implementation of RSR using Resistive Random Access Memory (RRAM) technology is proposed to address the computational complexity of RSR and test its suitability for edge devices. Results show that the RRAM-CMOS implementation reduces memory accesses by 99.6% and the number of arithmetic operations by 1.94x compared to its digital counterpart.

Given the importance of data optimization for resource-limited embedded systems, the second contribution exploits data reuse through Spatial Locality Input Data (SLID) during ViT inference. Deployed on a Raspberry Pi 4, this approach achieves up to a 50% reduction in Multiply-Accumulate (MAC) operations and reduces inference latency and energy by 40% compared to a base ViT, at a marginal accuracy loss of 1-2%.
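The data-reuse idea behind SLID can be illustrated with a delta-computation sketch: when successive inputs are spatially similar, only the contributions of the elements that changed need fresh MAC operations, and the rest of the previous output is reused. This is an illustrative reconstruction under that general principle, not the thesis's actual SLID implementation; the function name and threshold are hypothetical.

```python
import numpy as np

def delta_reuse_matmul(W, x_prev, y_prev, x_new, threshold=1e-2):
    """Reuse the previous output y_prev = W @ x_prev: recompute only the
    contributions of input elements that changed beyond `threshold`."""
    delta = x_new - x_prev
    changed = np.abs(delta) > threshold     # inputs that differ noticeably
    # Only the columns of W matching changed inputs cost new MACs.
    y_new = y_prev + W[:, changed] @ delta[changed]
    macs_saved = 1.0 - changed.mean()       # fraction of MAC columns skipped
    return y_new, macs_saved
```

If neighbouring patches of an image share most pixel values, `macs_saved` approaches 1 and the product is recovered almost for free, which mirrors the reported reduction in MAC operations.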
The third contribution fuses the Spatial Transformer Network (STN) with HDC in "SpatialHD" to enhance HDC classification performance for image classification with minimal computational overhead. SpatialHD improves on the base HDC model for 2-dimensional data, achieving up to 9% higher accuracy while using only 30% of the dataset for training, and reduces computational complexity by 2.5x by optimizing the size of the STN feature maps. When implemented on resource-constrained platforms such as the Raspberry Pi 4, SpatialHD accelerates inference by 3.4x and improves energy efficiency by 3.2x, with an accuracy loss of ∼3% compared to conventional CNN networks.
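The HDC half of SpatialHD follows the standard hyperdimensional classification recipe: encode each feature vector into a high-dimensional bipolar hypervector, bundle the hypervectors of each class into a prototype, and classify by similarity. The sketch below shows that generic recipe with a random-projection encoder; it is a minimal illustration, not the SpatialHD pipeline (which additionally feeds STN feature maps into the encoder), and all names are hypothetical.

```python
import numpy as np

def encode(x, proj):
    """Random-projection encoding: map a real feature vector to a
    bipolar {-1, +1} hypervector of dimension proj.shape[0]."""
    return np.sign(proj @ x)

def train(X, y, proj, n_classes):
    """Bundle (element-wise sum) the hypervectors of each class
    into one prototype per label."""
    protos = np.zeros((n_classes, proj.shape[0]))
    for xi, yi in zip(X, y):
        protos[yi] += encode(xi, proj)
    return protos

def classify(x, proj, protos):
    """Predict the class whose prototype has the highest cosine
    similarity with the query hypervector."""
    h = encode(x, proj)
    sims = protos @ h / (np.linalg.norm(protos, axis=1) * np.linalg.norm(h) + 1e-9)
    return int(np.argmax(sims))
```

Because training is a single accumulation pass and inference is one projection plus a similarity search, this style of model is attractive for the resource-constrained platforms discussed above.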
Recognizing the importance of hardware implementation and optimization, the fourth contribution introduces an efficient hardware accelerator for the ViT network, leveraging memristor-based In-Memory Computing (IMC). The design targets the memory bottleneck of the Matrix-Matrix Multiplication (MatMul) operations in the self-attention stage of the ViT by exploiting the approximate analog, highly parallel computation of the memristor crossbar architecture. This approach reduces the number of MAC operations in transformer networks by approximately 10x with a ∼4.53% drop in accuracy, as validated with the NeuroSim 3.0 circuit simulator. Overall, this thesis explores data preprocessing techniques, algorithm optimization methods, and hardware implementations for image recognition applications, all aimed at addressing the critical need for on-device intelligence and enhancing the efficiency of the IoT.
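The crossbar's appeal is that a whole matrix-vector product happens in one analog step: weights are stored as conductances, and column currents sum the per-row contributions by Kirchhoff's law, at the cost of finite conductance precision and analog noise. The behavioral sketch below models only those two non-idealities; it is an idealized illustration under stated assumptions, not the thesis's circuit model or NeuroSim itself, and the level/noise parameters are hypothetical.

```python
import numpy as np

def crossbar_matvec(W, x, levels=16, noise_std=0.01, rng=None):
    """Idealized memristor-crossbar matrix-vector product: weights are
    quantized to `levels` conductance states, and the analog current
    summation is modeled with additive Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    w_max = np.abs(W).max()
    # Quantize weights to the available conductance levels.
    Wq = np.round(W / w_max * (levels - 1)) / (levels - 1) * w_max
    # One analog step: column currents sum all row contributions.
    y = Wq @ x
    return y + rng.normal(0.0, noise_std * np.abs(y).max() + 1e-12, y.shape)
```

Sweeping `levels` and `noise_std` in such a model is one simple way to see how approximate analog computation trades a small accuracy drop for massive parallelism, which is the trade-off the accelerator exploits.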
| Date of Award | 22 May 2024 |
|---|---|
| Original language | American English |
| Supervisor | Baker Mohammad (Supervisor) |
Keywords
- Memristor crossbar
- Multiply and accumulate operations
- MAC
- Artificial intelligence
- AI
- In-memory computing
- IMC
- RRAM
- Hardware accelerator
- Internet of Things
- Edge computing
- Vision transformer
- Attention
- Data reuse
- Data sparsity
- Efficient hardware for AI
- Input similarity
- Computational reuse
- Brain-inspired computing
- Spatial transformers
- Hyperdimensional computing
- Image classification
- Hardware implementation
- Random spray retinex
- Image enhancement
- Feature extraction
- ASIC
- Low-power digital design