Forest disturbance due to natural events, such as wildfires, represents an increasing global challenge that demands advanced analytical methods for effective detection and mitigation. To this end, integrating satellite imagery with deep learning (DL) has emerged as a powerful approach for forest wildfire detection; however, its practical use remains limited by the scarcity of large, well-labeled satellite imagery datasets. In this study, we address this issue by presenting the California Wildfire GeoImaging Dataset (CWGID), a high-resolution bi-temporal collection of over 100,000 labeled RGB "before" and "after" Sentinel-2 wildfire satellite image pairs. We build and label the dataset programmatically, significantly reducing the time and manual effort usually required to create labeled datasets suitable for DL applications. Our methods include data acquisition from authoritative sources, systematic preprocessing, and an initial analysis using three pre-trained Convolutional Neural Network (CNN) architectures on two classification tasks: labeling uni-temporal and bi-temporal inputs, respectively, as damaged or not damaged by fire. Our results show that using bi-temporal imagery as input during model training and testing can improve model performance, with the Early Fusion (EF) EfficientNet-B0 model achieving the highest wildfire detection accuracy, at over 92%. These findings suggest that the CWGID, and the streamlined programmatic methodology used to build it, may help address the scarcity of labeled data for DL-based forest wildfire detection, while providing a scalable resource that could support other DL applications in environmental monitoring.
Institute for Human and Machine Cognition; Intelligent Systems and Robotics; Earth and Environmental Sciences; Hal Marcus College of Science and Engineering
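The Early Fusion (EF) setup mentioned above can be sketched minimally: the "before" and "after" RGB images of a pair are concatenated channel-wise into a single six-channel input, which a single CNN (such as an EfficientNet-B0 whose input stem is widened from 3 to 6 channels) then classifies. The function below is an illustrative assumption about that preprocessing step, not the authors' exact implementation.

```python
import numpy as np


def early_fusion_pair(before_rgb: np.ndarray, after_rgb: np.ndarray) -> np.ndarray:
    """Stack a bi-temporal RGB pair channel-wise into one 6-channel array.

    In Early Fusion, both acquisition dates are merged before the first
    convolution, so one network sees the pair jointly rather than encoding
    each image separately. (Hypothetical helper for illustration.)
    """
    if before_rgb.shape != after_rgb.shape:
        raise ValueError("bi-temporal pair must share height, width, and channels")
    # Concatenate along the channel axis: (H, W, 3) + (H, W, 3) -> (H, W, 6)
    return np.concatenate([before_rgb, after_rgb], axis=-1)


# Example: a 64x64 "before"/"after" chip pair becomes one (64, 64, 6) input.
pair = early_fusion_pair(np.zeros((64, 64, 3)), np.ones((64, 64, 3)))
```

A separate-encoder (late fusion) alternative would instead run each image through its own feature extractor and merge the embeddings; the abstract's results suggest fusing at the input performed best here.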