A Novel Transformer Network with Shifted Window Cross-Attention for Spatiotemporal Weather Forecasting

    Research output: Contribution to journal, Article, peer-review

    21 Scopus citations

    Abstract

    Earth observation is a growing research area that can capitalize on artificial intelligence for short-term forecasting, i.e., now-casting. In this work, we tackle the challenge of weather forecasting using a video transformer network. Vision transformer architectures have been explored in various applications, with the major constraints being the computational complexity of attention and data-hungry training. To address these issues, we propose the use of the video Swin-Transformer (VST), coupled with a dedicated augmentation scheme. Moreover, we employ gradual spatial reduction on the encoder side and cross-attention on the decoder side. The proposed approach is tested on the Weather4Cast2021 weather forecasting challenge data, which requires predicting future frames up to 8 h ahead (4 per hour) from an hourly weather product sequence. The dataset was normalized to the range 0-1 to facilitate the use of the evaluation metrics across different datasets. The model achieves a mean squared error (MSE) of 0.4750 when provided with training data, and 0.4420 in the transfer-learning setting, where no training data are used.
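The normalization and scoring steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which normalization was used, so min-max scaling is an assumption here, and the names `min_max_normalize`, `mse`, and `N_FUTURE_FRAMES` are hypothetical, chosen only for illustration. This is not the authors' implementation.

```python
# Illustrative sketch only: min-max scaling is an ASSUMPTION; the abstract
# states that data were normalized to 0-1 but not how.

HOURS_AHEAD = 8        # forecast horizon stated in the abstract
FRAMES_PER_HOUR = 4    # temporal resolution of the predicted frames
N_FUTURE_FRAMES = HOURS_AHEAD * FRAMES_PER_HOUR  # 8 h x 4 per hour = 32 frames


def min_max_normalize(values, lo=None, hi=None):
    """Rescale a flat list of numbers to the range [0, 1].

    lo/hi default to the observed min/max; passing fixed bounds lets
    several datasets share one scale.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # Constant input: avoid division by zero, map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mse(pred, target):
    """Mean squared error between two equal-length flat sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    raw = [0, 5, 10]
    print(min_max_normalize(raw))
    print(mse([0.0, 1.0], [0.0, 0.0]))
    print(N_FUTURE_FRAMES)
```

With every variable on the same 0-1 scale, squared errors from different weather products or regions are of comparable magnitude, which is the motivation the abstract gives for normalizing.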

    Original language: British English
    Pages (from-to): 45-55
    Number of pages: 11
    Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
    Volume: 17
    State: Published - 2024

    Keywords

    • Encoder-decoder video architecture
    • now-casting
    • shifted window cross attention
    • video Swin-Transformer (VST)
    • weather forecasting
