As a fundamental problem in Earth observation, aerial scene classification aims to assign a specific semantic label to an aerial image. In recent years, deep convolutional neural networks (CNNs) have achieved strong performance in aerial scene classification, and successful pretrained CNNs can be transferred to aerial images. However, global CNN activations may lack geometric invariance, which limits further improvement in aerial scene classification. To address this problem, Xiaoqiang Lu's research team proposes a deep scene representation that achieves invariance of CNN features and further enhances their discriminative power. The proposed method: 1) extracts CNN activations from the last convolutional layer of a pretrained CNN; 2) performs multiscale pooling (MSP) on these activations; and 3) builds a holistic representation with the Fisher vector method. MSP is a simple and effective multiscale strategy that enriches multiscale spatial information at an affordable computational cost. The proposed representation is particularly suited to aerial scenes and consistently outperforms global CNN activations without requiring feature adaptation. Extensive experiments on five aerial scene data sets show that the proposed method, even with a simple linear classifier, achieves state-of-the-art performance.
(Original research article: IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 7, pp. 4799-4809, 2019. https://ieeexplore.ieee.org/document/8636541)
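To make step 2 of the pipeline concrete, the following is a minimal NumPy sketch of multiscale pooling over a conv feature map. The feature map here is randomly simulated (a real pipeline would take it from the last convolutional layer of a pretrained CNN), and the scale set (1, 2, 3) and average pooling are illustrative assumptions, not the paper's exact configuration; the resulting descriptor set is what would then be encoded by the Fisher vector method in step 3.

```python
import numpy as np

def multiscale_pool(fmap, scales=(1, 2, 3)):
    """Multiscale pooling (MSP) sketch: average-pool a conv feature map
    over grids of several scales, yielding a set of C-dim local descriptors.
    fmap: (C, H, W) activations from the last conv layer (simulated below).
    Returns an (N, C) array, N = sum(s*s for s in scales)."""
    C, H, W = fmap.shape
    descriptors = []
    for s in scales:
        # split the spatial axes into an s x s grid and average each cell
        row_groups = np.array_split(np.arange(H), s)
        col_groups = np.array_split(np.arange(W), s)
        for rows in row_groups:
            for cols in col_groups:
                cell = fmap[:, rows[:, None], cols[None, :]]
                descriptors.append(cell.mean(axis=(1, 2)))
    return np.stack(descriptors)

# Simulated 512-channel, 14x14 feature map (placeholder for real activations).
fmap = np.random.default_rng(0).normal(size=(512, 14, 14))
desc = multiscale_pool(fmap)
print(desc.shape)  # (14, 512): 1 + 4 + 9 pooled descriptors
```

Pooling over coarse and fine grids captures spatial context at several resolutions while keeping the cost of a single forward pass, which matches the abstract's claim that MSP enriches multiscale spatial information in affordable computational time.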