Abstract
Recent advancements have demonstrated the effectiveness of retentive networks in natural language processing and high-level vision tasks. However, their potential in low-level vision, such as image super-resolution (SR), remains underexplored. In this paper, we introduce RetNetSR, a novel retentive network architecture designed specifically for image super-resolution. Our approach leverages a spatial prior derived from the Manhattan distance to enhance the Self-Attention mechanism, effectively translating the temporal decay concept of RetNet into the spatial domain. At the core of RetNetSR is the Manhattan Self-Attention module, which integrates Self-Attention with multi-layer perceptrons and depthwise convolution. Building on this, we propose the Manhattan Self-Attention Block and the Manhattan Self-Attention Group, the latter further enriched with 3×3 convolutions and Enhanced Spatial Attention modules for more effective deep feature extraction. Experimental evaluations across multiple runs on standard image SR benchmarks show that RetNetSR consistently achieves competitive or superior results. Statistical analysis (paired t-test, p < 0.05) confirms that the improvements are statistically significant, for example raising PSNR from 32.76 to 32.86 on Urban100 (×2 scale) and SSIM from 0.9151 to 0.9156 on Manga109 (×4 scale).
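The spatial prior described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formula, so the sketch assumes the decay takes the form `gamma ** d`, where `d` is the Manhattan distance between two pixel positions, and that the resulting mask is multiplied element-wise into the softmax attention weights. The function names, the `gamma` parameter, and its default value are hypothetical.

```python
import numpy as np


def manhattan_decay_mask(height, width, gamma=0.9):
    """Build an (H*W, H*W) mask whose entries are gamma ** Manhattan distance.

    Assumption: the decay is exponential in the distance; gamma is a
    hypothetical hyperparameter in (0, 1).
    """
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)  # (N, 2) pixel positions
    # Pairwise |y_n - y_m| + |x_n - x_m| for every pair of pixels.
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
    return gamma ** dist


def manhattan_self_attention(q, k, v, mask):
    """Softmax attention whose weights are scaled by the spatial decay mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Assumption: the prior is applied element-wise after the softmax.
    return (weights * mask) @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, dim = 4, 4, 8
    tokens = rng.standard_normal((h * w, dim))
    out = manhattan_self_attention(tokens, tokens, tokens,
                                   manhattan_decay_mask(h, w))
    print(out.shape)
```

Under these assumptions, each pixel attends most strongly to itself and its near neighbours, with influence falling off as the horizontal plus vertical distance grows, which is the spatial analogue of the temporal decay in RetNet.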
| Original language | British English |
|---|---|
| Article number | 130 |
| Journal | Neural Computing and Applications |
| Volume | 38 |
| Issue number | 5 |
| DOIs | |
| State | Published - Mar 2026 |
Keywords
- Explicit spatial prior
- Image super-resolution
- Manhattan self-attention
- Retentive network
- Transformer model
Fingerprint
Dive into the research topics of 'Lightweight image super-resolution based on retentive network'. Together they form a unique fingerprint.