
MadFormer: multi-attention-driven image super-resolution method based on Transformer

  • Regular Paper
  • Published: 2024
  • Journal: Multimedia Systems

Abstract

Although Transformer-based methods have demonstrated exceptional performance on low-level vision tasks, their modeling ability remains largely confined to local windows, neglecting the spatial feature information and high-frequency channel details that matter for super-resolution. To enhance feature information and improve the visual experience, we propose MadFormer, a multi-attention-driven image super-resolution method based on a Transformer network. First, the low-resolution image undergoes an initial convolution to extract shallow features and is fed into residual multi-attention blocks that incorporate channel attention, spatial attention, and self-attention mechanisms. Multi-head self-attention captures global and local feature information, while channel attention and spatial attention capture high-frequency features in the channel and spatial domains, respectively. The deep features are then passed to a dynamic fusion block that adaptively fuses the multi-attention features, aggregating information across windows. Finally, the shallow and deep features are fused via convolution, and a high-quality reconstruction stage produces the high-resolution image. Comprehensive quantitative and qualitative comparisons with other advanced algorithms demonstrate the substantial advantages of the proposed approach in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
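To make the pipeline concrete, the following is a minimal PyTorch sketch of the architecture as described in the abstract. It is not the authors' implementation (see the project repository linked under Data availability): SE-style channel attention, CBAM-style spatial attention, and a plain convolutional stand-in for the dynamic fusion block are illustrative assumptions, as are all module names and channel counts.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention, a stand-in for the paper's channel-attention branch."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: a per-pixel gate from pooled channel statistics."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class MultiAttentionBlock(nn.Module):
    """One residual block combining self-attention with channel and spatial attention."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        b, c, h, w = x.shape
        # Multi-head self-attention over flattened spatial positions (global context).
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        t = self.norm(tokens)
        attn_out, _ = self.attn(t, t, t)
        feat = (tokens + attn_out).transpose(1, 2).reshape(b, c, h, w)
        # Channel and spatial attention recover high-frequency detail.
        return x + self.sa(self.ca(feat))

class MadFormerSketch(nn.Module):
    """Shallow conv -> stacked multi-attention blocks -> fuse shallow + deep -> upsample."""
    def __init__(self, channels=64, blocks=4, scale=4):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[MultiAttentionBlock(channels) for _ in range(blocks)])
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)  # simplified fusion stage
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, lr):
        shallow = self.shallow(lr)
        deep = self.fuse(self.body(shallow))
        return self.upsample(shallow + deep)  # fuse shallow and deep features, then reconstruct

sr = MadFormerSketch()(torch.randn(1, 3, 48, 48))  # -> (1, 3, 192, 192)
```

Note that this sketch applies full self-attention over all spatial positions for brevity; the paper's window-based design and dynamic fusion block would replace the global attention and the plain `fuse` convolution, respectively.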


Data availability statement

The datasets used in this study are publicly available online. The code and datasets from the current study are available on the project website: https://github.com/L-beibei/MadFormer.


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 62172137 and 61976042, the Innovative Talents Program for Liaoning Universities under Grant LR2019020, the Liaoning Revitalization Talents Program under Grant XLYC2007023, and the Dalian Youth Science and Technology Star Program under Grant 2022RQ086.

Author information


Contributions

F. Sun and B. Zhu conceived this study. B. Liu conducted experiments and wrote the initial manuscript. T. Li and J. Sun reviewed and edited it.

Corresponding author

Correspondence to Jing Sun.

Ethics declarations

Ethical approval

Ethical approval was not required for the present study.

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Communicated by X. Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, B., Sun, J., Zhu, B. et al. MadFormer: multi-attention-driven image super-resolution method based on Transformer. Multimedia Systems 30, 78 (2024). https://doi.org/10.1007/s00530-024-01276-1
