AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification

Gui-Song Xia1, Jingwen Hu1,2, Fan Hu1,2, Baoguang Shi3, Xiang Bai3, Yanfei Zhong1, Liangpei Zhang1

1. State Key Lab. LIESMARS, Wuhan University, Wuhan 430079, China
2. EIS, Wuhan University, Wuhan, 430079, China
3. EIC, Huazhong University of Science and Technology, Wuhan 430074, China.


Published in IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 3965-3981, 2017.

- Abstract -

Aerial scene classification, which aims to automatically label an aerial image with a specific semantic category, is a fundamental problem in understanding high-resolution remote sensing imagery. In recent years it has become an active task in the remote sensing area, and numerous algorithms have been proposed for it, including many machine learning and data-driven approaches. However, the existing datasets for aerial scene classification, such as the UC-Merced dataset and WHU-RS19, are relatively small in size, and the results on them are already saturated. This largely limits the development of scene classification algorithms. This paper describes the Aerial Image Dataset (AID): a large-scale dataset for aerial scene classification. The goal of AID is to advance the state of the art in scene classification of remote sensing images. To create AID, we collected and annotated more than ten thousand aerial scene images. In addition, a comprehensive review of the existing aerial scene classification techniques as well as recent widely used deep learning methods is given. Finally, we provide a performance analysis of typical aerial scene classification and deep learning approaches on AID, which can serve as baseline results on this benchmark.

- AID: a new dataset - (download on OneDrive or BaiduPan)

AID is a new large-scale aerial image dataset built by collecting sample images from Google Earth imagery. Note that although Google Earth images are post-processed using RGB renderings of the original optical aerial images, it has been shown that there is no significant difference between Google Earth images and real optical aerial images, even for pixel-level land use/cover mapping. Thus, Google Earth images can also be used as aerial images for evaluating scene classification algorithms.

The new dataset is made up of the following 30 aerial scene types: airport, bare land, baseball field, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks and viaduct. All the images were labelled by specialists in the field of remote sensing image interpretation, and some samples of each class are shown in Fig.1. In all, the AID dataset contains 10000 images across the 30 classes.

The images in AID are multi-source, as Google Earth images come from different remote sensing sensors. This brings more challenges for scene classification than single-source datasets such as UC-Merced. Moreover, the sample images of each class in AID are carefully chosen from different countries and regions around the world, mainly China, the United States, England, France, Italy, Japan and Germany, and they are extracted at different times and seasons under different imaging conditions, which increases the intra-class diversity of the data.
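AID is commonly distributed with one sub-folder per scene class. Below is a minimal indexing sketch in Python under that assumption; the root path and the .jpg extension are placeholders that may need adjusting to the downloaded archive:

    from pathlib import Path

    AID_ROOT = Path("./AID")  # hypothetical location of the extracted dataset

    def index_aid(root):
        """Map each scene class (one sub-folder per class) to its image paths."""
        return {d.name: sorted(d.glob("*.jpg"))
                for d in sorted(root.iterdir()) if d.is_dir()}

    if __name__ == "__main__":
        index = index_aid(AID_ROOT)
        total = sum(len(paths) for paths in index.values())
        # For the full dataset this should report 30 classes and 10000 images.
        print(f"{len(index)} classes, {total} images")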


Figure 1: Samples of the AID dataset: three examples of each semantic scene class are shown.


- Experimental study -

- Baseline methods (code download on OneDrive or BaiduPan)

We evaluate the following three kinds of scene classification methods:

  1. Low-level methods: Scale Invariant Feature Transform (SIFT), Local Binary Pattern (LBP), Color Histogram (CH) and GIST.
  2. Mid-level methods: Bag of Visual Words (BoVW), Spatial Pyramid Matching (SPM), Locality-constrained Linear Coding (LLC), Probabilistic Latent Semantic Analysis (pLSA), Latent Dirichlet Allocation (LDA), Improved Fisher Kernel (IFK) and Vector of Locally Aggregated Descriptors (VLAD), each combined with three local feature descriptors (i.e., SIFT, LBP and CH); a minimal BoVW sketch is given after this list.
  3. High-level methods: CaffeNet, VGG-VD-16 and GoogLeNet.
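To make the mid-level pipeline in item 2 concrete, here is a minimal BoVW sketch in Python built on SIFT descriptors with OpenCV and scikit-learn. The codebook size and k-means settings are illustrative placeholders, not the exact configuration used in the paper:

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def extract_sift(image_paths):
        """Collect SIFT descriptors from each image in a list of file paths."""
        sift = cv2.SIFT_create()
        per_image = []
        for path in image_paths:
            gray = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
            _, desc = sift.detectAndCompute(gray, None)
            if desc is not None:
                per_image.append(desc)
        return per_image

    def build_codebook(descriptor_list, k=1024, seed=0):
        # Learn the visual vocabulary by clustering the pooled descriptors.
        pooled = np.vstack(descriptor_list)
        return KMeans(n_clusters=k, random_state=seed, n_init=4).fit(pooled)

    def bovw_histogram(descriptors, codebook):
        # Encode one image as an L1-normalized histogram of visual words.
        words = codebook.predict(descriptors)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)

The resulting per-image histograms are then fed to the linear SVM described in the evaluation protocol below.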

- Testing datasets

We conduct the experiments on three popular scene classification datasets as well as our AID dataset:
  1. The UC-Merced dataset contains 21 scene classes, with 100 samples of size 256x256 in each class.
  2. The WHU-RS19 dataset has 19 different scene classes, with 50 samples of size 600x600 in each class.
  3. The RSSCN7 dataset contains 7 scene classes, with 400 samples of size 400x400 in each class.
  4. The AID dataset has 30 different scene classes, with about 200 to 400 samples of size 600x600 in each class.
The AID dataset can be downloaded here.

- Evaluation protocols

To compare the classification results quantitatively, we compute a commonly used measure: overall accuracy (OA), defined as the number of correctly predicted images divided by the total number of predicted images. It is a direct measure of classification performance on the whole dataset.
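In symbols, with y_i the ground-truth label and \hat{y}_i the predicted label of the i-th of N test images, this definition reads:

    \mathrm{OA} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ \hat{y}_i = y_i \right]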

To compute OA, we adopt two different training ratios for each tested dataset in the supervised classification process. To compare the performances fairly, we adopt a Support Vector Machine (SVM) with a linear kernel as the classifier. For the RSSCN7 dataset and our benchmark dataset, we fix the training ratios at 20% and 50% respectively, leaving the rest for testing, while for the UC-Merced dataset the ratios are set to 50% and 80%, and for the WHU-RS19 dataset they are fixed at 40% and 60%. To compute the overall accuracy, we randomly split each dataset into training and testing sets, and repeat this procedure ten times to reduce the influence of randomness and obtain reliable results. The OA is computed for each run, and the final result is reported as the mean and standard deviation over the individual runs.
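A minimal sketch of this protocol in Python with scikit-learn, assuming features have already been extracted into a matrix X with labels y (LinearSVC stands in for the linear-kernel SVM, and stratified splitting is an assumption; the text above only specifies random splits):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC
    from sklearn.metrics import accuracy_score

    def evaluate_oa(X, y, train_ratio=0.5, n_runs=10):
        """Mean and std of overall accuracy over repeated random splits."""
        scores = []
        for run in range(n_runs):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, train_size=train_ratio, stratify=y, random_state=run)
            clf = LinearSVC().fit(X_tr, y_tr)
            scores.append(accuracy_score(y_te, clf.predict(X_te)))
        return float(np.mean(scores)), float(np.std(scores))

For example, evaluate_oa(X, y, train_ratio=0.2) corresponds to the 20% training setting used for RSSCN7 and AID.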

- Baseline results

Table 1: Overall accuracy (OA) of different low-level methods on the UC-Merced dataset, the WHU-RS19 dataset, the RSSCN7 dataset and the AID dataset.

Table 2: Overall accuracy (OA) of different mid-level methods on the UC-Merced dataset, the WHU-RS19 dataset, the RSSCN7 dataset and the AID dataset.

Table 3: Overall accuracy (OA) of different high-level methods on the UC-Merced dataset, the WHU-RS19 dataset, the RSSCN7 dataset and the AID dataset.


- Reference -

  1. G.-S. Xia et al., AID: A benchmark dataset for performance evaluation of aerial scene classification, IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 3965-3981, 2017.
  2. Q. Hu, W. Wu, T. Xia, Q. Yu, P. Yang, Z. Li, and Q. Song, Exploring the use of Google Earth imagery and object-based methods in land use/cover mapping, Remote Sensing, vol. 5, no. 11, pp. 6026-6042, 2013.
  3. G. Cheng, J. Han, L. Guo, Z. Liu, S. Bu, and J. Ren, Effective and efficient midlevel visual elements-oriented land-use classification using VHR remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 8, pp. 4238-4249, 2015.
  4. G. Cheng, J. Han, P. Zhou, and L. Guo, Multi-class geospatial object detection and geographic image classification based on collection of part detectors, ISPRS Journal of Photogrammetry and Remote Sensing, vol. 98, pp. 119-132, 2014.
  5. F. Hu, G.-S. Xia, J. Hu, and L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing, vol. 7, no. 11, pp. 14680-14707, 2015.
  6. Y. Yang and S. Newsam, Spatial pyramid co-occurrence for image classification, in IEEE International Conference on Computer Vision (ICCV), 2011, pp. 1465-1472.
  7. V. Risojevic and Z. Babic, Fusion of global and local descriptors for remote sensing image classification, IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 4, pp. 836-840, 2013.
  8. Y. Yang and S. Newsam, Geographic image retrieval using local invariant features, IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 2, pp. 818-832, 2013.
  9. A. M. Cheriyadat, Unsupervised feature learning for aerial scene classification, IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 1, pp. 439-451, 2014.
  10. L.-J. Zhao, P. Tang, and L.-Z. Huo, Land-use scene classification using a concentric circle-structured multiscale bag-of-visual-words model, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 12, pp. 4620-4631, 2014.
  11. S. Chen and Y. Tian, Pyramid of spatial relatons for scene-level land use classification, IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 1947-1957, 2015.
  12. J. Hu, T. Jiang, X. Tong, G.-S. Xia, and L. Zhang, A benchmark for scene classification of high spatial resolution remote sensing imagery, in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2015, pp. 5003-5006.
  13. F. Hu, G.-S. Xia, Z. Wang, X. Huang, L. Zhang, and H. Sun, Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 5, pp. 2015-2030, 2015.
  14. M. Castelluccio, G. Poggi, C. Sansone, and L. Verdoliva, Land use classification in remote sensing images by convolutional neural networks, arXiv preprint arXiv:1508.00092, 2015.
  15. O. A. B. Penatti, K. Nogueira, and J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in Proc. IEEE Conference on Computer Vision and Pattern Recognition, June 2015.