Land Cover and Land Cover Dynamics - Classifying global biomes and monitoring their dynamics ...

The MODIS Global Land Cover Product (MOD12Q1)

The MODIS global land cover product is designed to provide information about the state of global land cover and its dynamics at seasonal to decadal scales. The product consists of two suites of science data sets (SDSs). MODIS land cover type (MOD12Q1) includes five main layers, in which land cover is mapped using different classification systems. MODIS land cover dynamics (MOD12Q2) includes seven layers and has been developed to support studies of seasonal and interannual variation (phenology) in land surface and ecosystem properties. Both products are global. In collections 1, 3, and 4, MOD12 was produced at a spatial resolution of 1 km; in collection 5, the spatial resolution was increased to 500 m.

The MOD12Q1 global land cover product includes a set of internally consistent layers depicting different land cover classifications. These layers include the International Geosphere-Biosphere Programme (IGBP; Loveland and Belward, 1997) classification; a 14-class system developed at the University of Maryland (UMD; Hansen et al., 2000); a 6-biome system used by the MODIS LAI/FPAR algorithm (Myneni et al., 1997; Lotsch et al., 2001); the biome classification proposed by Running et al. (1995); and the plant functional type classification described by Bonan et al. (2002). Secondary labels (the most likely alternative IGBP class) and classification confidences (McIver and Friedl, 2001) are also provided for each pixel, and a lower spatial-resolution climate modeling grid (CMG) is produced for users who do not require the spatial detail afforded by the main land cover product. The CMG provides both the dominant land cover type in each cell and the sub-grid scale frequency distribution of land cover classes within each cell.
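
The CMG aggregation described above can be illustrated with a minimal Python sketch that reduces a fine-resolution class map to coarse cells, recording each cell's dominant class and the fraction of pixels in every class. It assumes a map of integer class codes whose dimensions are exact multiples of a hypothetical aggregation factor `block`; the function name `aggregate_to_cmg`, the class codes, and the tie-breaking rule (first class encountered wins) are illustrative assumptions, not the production MOD12 code.

```python
from collections import Counter


def aggregate_to_cmg(class_map, block):
    """Aggregate a fine-resolution class map into coarse grid cells.

    Returns a 2-D list of (dominant_class, fractions) tuples, where
    `fractions` maps each class present in a cell to its share of pixels.
    Assumes the map dimensions are exact multiples of `block`.
    """
    rows, cols = len(class_map), len(class_map[0])
    cells = []
    for r0 in range(0, rows, block):
        cell_row = []
        for c0 in range(0, cols, block):
            # Count class codes among the block x block fine pixels in this cell.
            counts = Counter(
                class_map[r][c]
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            )
            total = sum(counts.values())
            fractions = {code: n / total for code, n in counts.items()}
            # most_common() lists equal counts in first-encountered order.
            dominant = counts.most_common(1)[0][0]
            cell_row.append((dominant, fractions))
        cells.append(cell_row)
    return cells


# Hypothetical 4 x 4 map of integer class codes, aggregated by a factor of 2.
fine = [
    [1, 1, 5, 5],
    [1, 12, 5, 5],
    [12, 12, 0, 0],
    [12, 12, 0, 1],
]
cmg = aggregate_to_cmg(fine, 2)
print(cmg[0][0])  # (1, {1: 0.75, 12: 0.25})
```

Keeping the full fraction dictionary, rather than only the dominant label, is what lets a coarse cell retain the sub-grid mixture of classes.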


Algorithm Description

The classification strategy used by the MODIS land cover product employs a supervised decision tree classification algorithm called C4.5 (Quinlan 1993). This approach is supported by a variety of recent work demonstrating the utility of decision trees for land cover classification problems in remote sensing (DeFries et al., 1998; Friedl and Brodley, 1997; Friedl et al., 1999; 2000; 2002; Hansen et al., 1996; 2000; McIver and Friedl, 2001; 2002). C4.5 is a univariate decision tree that makes no assumptions regarding the frequency distribution of the data being classified. This attribute is particularly important at global scales, because virtually all classes of interest exhibit multimodal frequency distributions and therefore violate assumptions required by parametric supervised approaches such as the maximum likelihood classifier (Schowengerdt, 1997). In addition to being nonparametric, C4.5 possesses several other traits that make it particularly useful for classifying land cover from MODIS data at global scales. First, C4.5 includes elegant and robust solutions for dealing with missing data. This attribute is especially crucial at high latitudes, where a substantial proportion of the input MODIS data are missing because of high solar zenith angles, and in the tropics, where missing data are frequent because of cloud cover. Second, C4.5 includes mature methods for “pruning” the estimated trees, thereby avoiding classifications that are overfit to training data.

A key feature of the MODIS land cover classification algorithm is a technique known as “boosting” (Freund 1995). Boosting is one of numerous ensemble classification methods developed in the mid- to late 1990s that have been widely shown to enhance classification accuracy (Bauer and Kohavi, 1999; Dietterich, 2000). Boosting also serves to minimize the sensitivity of the classification algorithm to both noise in feature data and labeling errors in training data.
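
C4.5's treatment of missing values when classifying a new case can be sketched as follows, following Quinlan's approach: a case whose tested value is unknown is passed down every branch, weighted by the fraction of training cases that took each branch, and the resulting class distributions are summed. This is a minimal illustration, not the MOD12Q1 implementation. The `Node` structure, the feature names `evi_max` and `lst_min`, the thresholds, and the class counts are all hypothetical, and the handling of missing values during tree growth is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Decision tree node. A node with feature=None is a leaf."""
    class_counts: Dict[str, float] = field(default_factory=dict)  # leaf only
    feature: Optional[str] = None        # test is `feature <= threshold`
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    left_fraction: float = 0.5           # share of training cases sent left


def class_distribution(node: Node, sample: Dict[str, Optional[float]],
                       weight: float = 1.0) -> Dict[str, float]:
    """Weighted class distribution for one sample.

    A missing value (None) for the tested feature sends the sample down both
    branches, weighted by the training fractions, and sums the results.
    """
    if node.feature is None:
        total = sum(node.class_counts.values())
        return {c: weight * n / total for c, n in node.class_counts.items()}
    value = sample.get(node.feature)
    if value is None:
        combined: Dict[str, float] = {}
        for child, share in ((node.left, node.left_fraction),
                             (node.right, 1.0 - node.left_fraction)):
            for c, p in class_distribution(child, sample, weight * share).items():
                combined[c] = combined.get(c, 0.0) + p
        return combined
    child = node.left if value <= node.threshold else node.right
    return class_distribution(child, sample, weight)


# Hypothetical tree: annual maximum EVI first, then minimum LST (kelvin).
tree = Node(
    feature="evi_max", threshold=0.3, left_fraction=0.4,
    left=Node(class_counts={"barren": 8, "grassland": 2}),
    right=Node(
        feature="lst_min", threshold=270.0, left_fraction=0.5,
        left=Node(class_counts={"evergreen_needleleaf": 9, "mixed_forest": 1}),
        right=Node(class_counts={"evergreen_broadleaf": 10}),
    ),
)
# LST missing (e.g. persistent cloud): both LST branches contribute.
print(class_distribution(tree, {"evi_max": 0.6, "lst_min": None}))
# {'evergreen_needleleaf': 0.45, 'mixed_forest': 0.05, 'evergreen_broadleaf': 0.5}
```

Because the missing case still yields a full probability distribution, the pixel is classified rather than discarded, which matters where cloud or low sun removes large parts of the input record.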

Training Data

The MOD12Q1 algorithm relies heavily on a database of land cover exemplars for classification estimation. Because global land cover is highly diverse, a key requirement of this database is that it be geographically and ecologically comprehensive, thereby capturing the global variability of land cover. To meet these needs, the System for Terrestrial Ecosystem Parameterization (STEP) was developed (Muchoney et al. 1999). STEP is designed to provide a classification-free and versatile database for site-based characterization of global land cover.

The current STEP database (i.e., for collection 5) consists of roughly 2000 sites distributed globally. However, the database is dynamic and requires ongoing maintenance and augmentation to meet the needs of the MODIS global land cover mapping effort. Sites included in the database are derived from manual interpretation of Landsat Thematic Mapper (TM) data, augmented by ancillary map data, as available.

Input Features

The input features to C4.5 are designed to exploit two main dimensions of information. First, spectral information is provided by the seven MODIS land bands (channels 1-7), supplemented by the enhanced vegetation index (EVI) product (Huete et al., 2002) and, in collection 5, by the MODIS land surface temperature (LST) product (MOD11; Wan et al., 2002). All of these input data are cloud-cleared and atmospherically corrected, and are representative of 8-day periods. To minimize artifacts introduced by variation in view geometry, we use surface reflectance and EVI data computed from nadir BRDF-adjusted reflectance (NBAR) provided by the MODIS BRDF/albedo product (Schaaf et al., 2002). Second, to exploit temporal information related to land surface phenology, the algorithm ingests an annual time series of NBAR, LST, and EVI data. In addition, we include annual metrics, namely the annual mean, minimum, and maximum, of each of these input features.
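
The combination of time series and annual metrics can be illustrated with a short sketch that flattens each feature's series into one record per pixel and appends its mean, minimum, and maximum. It assumes that missing composites are marked with `None` and skipped when computing the metrics; the feature names (`evi`, `lst`), the record keys, and the function names are hypothetical. A full year of 8-day composites has 46 time steps; the example uses shorter series for brevity.

```python
from statistics import mean
from typing import Dict, List, Optional

Series = List[Optional[float]]


def annual_metrics(series: Series) -> Dict[str, Optional[float]]:
    """Annual mean, minimum and maximum of one feature's time series.

    None marks an 8-day composite with no valid observation; such
    composites are skipped rather than treated as zero.
    """
    valid = [v for v in series if v is not None]
    if not valid:
        return {"mean": None, "min": None, "max": None}
    return {"mean": mean(valid), "min": min(valid), "max": max(valid)}


def build_feature_vector(features: Dict[str, Series]) -> Dict[str, Optional[float]]:
    """One record per pixel: every composite value plus the annual metrics."""
    record: Dict[str, Optional[float]] = {}
    for name, series in features.items():
        for i, value in enumerate(series):
            record[f"{name}_t{i:02d}"] = value
        for metric, value in annual_metrics(series).items():
            record[f"{name}_{metric}"] = value
    return record


# Hypothetical pixel with short EVI and LST (kelvin) series; the second
# composite is missing in both.
pixel = {
    "evi": [0.2, None, 0.5, 0.8],
    "lst": [275.0, None, 290.0, 301.0],
}
features = build_feature_vector(pixel)
print(features["evi_max"], features["lst_min"])  # 0.8 275.0
```

The per-composite values carry the timing of phenological events, while the annual metrics summarize their range, so a tree can split on either kind of feature.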

Ensemble Decision Tree Classification

Classifications are estimated using ensembles of decision trees, built with the C4.5 decision tree classification algorithm in combination with a technique called "boosting." C4.5 estimates a supervised classification using training data from the STEP database. To do this, the training data are recursively partitioned, and decision surfaces are estimated using a metric called the information gain ratio: the reduction in entropy produced by a split (the information gain), normalized by the entropy of the split itself (the split information). Using this metric, the test at each node within a tree is chosen to maximize the reduction in entropy in the descendant nodes relative to the split information.
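
The information gain ratio can be computed directly from class labels. The sketch below assumes a candidate split has already been applied and simply scores it; the function names and labels are illustrative. The usage lines show why C4.5 normalizes by split information: splitting four samples into four singletons achieves the same information gain as a clean two-way split but receives half the score.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a set of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def gain_ratio(parent: Sequence[Hashable],
               children: Sequence[Sequence[Hashable]]) -> float:
    """Information gain of a split divided by its split information."""
    n = len(parent)
    nonempty = [c for c in children if c]
    # Information gain: parent entropy minus size-weighted child entropy.
    gain = entropy(parent) - sum(len(c) / n * entropy(c) for c in nonempty)
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(len(c) / n * log2(len(c) / n) for c in nonempty)
    return gain / split_info if split_info > 0 else 0.0


labels = ["forest", "forest", "grass", "grass"]
two_way = [["forest", "forest"], ["grass", "grass"]]
singletons = [[label] for label in labels]
print(gain_ratio(labels, two_way))     # 1.0
print(gain_ratio(labels, singletons))  # 0.5
```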

The goal of boosting is to improve the classification accuracy of a base classification algorithm (in this case C4.5). Studies conducted using remotely sensed data have shown that boosting tends to reduce misclassification error rates by about 25 percent on average (Friedl et al., 1999; McIver and Friedl, 2001). Boosting algorithms operate by estimating multiple classifications in an iterative fashion. At each iteration, a weight is assigned to each training observation. Those observations that were misclassified in the previous iteration are assigned a heavier weight in the next iteration, thereby forcing the classification algorithm to concentrate on those observations that are more difficult to classify. Each iteration therefore produces a new classification tree, with the intent of correcting misclassification errors committed in the previous iteration.
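
The reweighting loop can be illustrated with AdaBoost.M1 (Freund and Schapire), one common boosting variant; the text above does not specify which variant MOD12Q1 uses. To keep the example short, single-threshold decision stumps stand in for full C4.5 trees, and all function names are hypothetical. Each round shrinks the weights of correctly classified samples by beta = err / (1 - err), so after renormalization the misclassified samples carry half of the total weight, and each tree votes with weight log(1 / beta).

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# A stump is (feature index, threshold, label if <= threshold, label otherwise).
Stump = Tuple[int, float, str, str]


def predict_stump(stump: Stump, row: Sequence[float]) -> str:
    j, t, left, right = stump
    return left if row[j] <= t else right


def fit_stump(X, y, w) -> Tuple[Optional[Stump], float]:
    """Weighted decision stump (a stand-in for C4.5 in this sketch)."""
    best: Optional[Stump] = None
    best_err = float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            # Weighted class totals on each side of the threshold.
            sides: Dict[bool, Dict[str, float]] = {True: {}, False: {}}
            for row, label, wi in zip(X, y, w):
                side = sides[row[j] <= t]
                side[label] = side.get(label, 0.0) + wi
            stump = (j, t, max(sides[True], key=sides[True].get),
                     max(sides[False], key=sides[False].get))
            err = sum(wi for row, label, wi in zip(X, y, w)
                      if predict_stump(stump, row) != label)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err


def reweight(w: List[float], correct: List[bool], err: float) -> List[float]:
    """Shrink weights of correct samples by beta = err / (1 - err), renormalise."""
    beta = err / (1.0 - err)
    new = [wi * beta if ok else wi for wi, ok in zip(w, correct)]
    total = sum(new)
    return [wi / total for wi in new]


def adaboost_m1(X, y, rounds: int = 10) -> List[Tuple[float, Stump]]:
    w = [1.0 / len(y)] * len(y)
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if stump is None or err >= 0.5:
            break  # AdaBoost.M1 stops if the base learner is no better than 50%
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        ensemble.append((math.log((1.0 - err) / err), stump))  # log(1 / beta)
        correct = [predict_stump(stump, row) == label for row, label in zip(X, y)]
        if all(correct):
            break
        w = reweight(w, correct, err)
    return ensemble


def predict(ensemble, row) -> str:
    """Weighted vote of all stumps in the ensemble."""
    votes: Dict[str, float] = {}
    for alpha, stump in ensemble:
        label = predict_stump(stump, row)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)


# Hypothetical one-feature training set (e.g. annual maximum EVI).
X = [[0.1], [0.2], [0.6], [0.7]]
y = ["barren", "barren", "forest", "forest"]
model = adaboost_m1(X, y)
print([predict(model, row) for row in X])
```

On this separable toy set a single stump suffices; on harder data the loop keeps adding trees, each concentrating on the samples its predecessors misclassified.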

Estimating Class-Conditional and Posterior Probabilities

The final step in the classification process estimates class-conditional and posterior probabilities for each class at each pixel. To do this, we use results from a statistical examination of boosting (Friedman et al., 2000). Using this result, class-conditional membership probabilities can be obtained from boosting in the same way that logistic regression provides the probability of a binary response variable from one or more predictor variables. Because we have estimates of the class-conditional probabilities at each pixel, we can use Bayes’ rule with prior probabilities for specific classes to improve our classification results. The resulting posterior probability associated with the most likely class is included in MOD12Q1 and provides a spatially explicit measure of classification quality at each pixel. See McIver and Friedl (2001, 2002) and Friedl et al. (2002) for details.
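
The two steps above can be sketched as follows. `boosted_score_to_probability` implements the binary relation from Friedman et al. (2000), under which the boosted score estimates half the log-odds; how MOD12Q1 extends this to many classes is not described here, so the multiclass class-conditional estimates in the example are simply given. The class names, estimates, and priors are hypothetical.

```python
import math
from typing import Dict, Tuple


def boosted_score_to_probability(score: float) -> float:
    """Binary case (Friedman et al., 2000): if the boosted score F(x) estimates
    half the log-odds, then p(y = 1 | x) = 1 / (1 + exp(-2 F(x)))."""
    return 1.0 / (1.0 + math.exp(-2.0 * score))


def posterior(likelihoods: Dict[str, float],
              priors: Dict[str, float]) -> Tuple[Dict[str, float], str, float]:
    """Bayes' rule: posterior_k is proportional to likelihood_k * prior_k.

    Returns all normalised posteriors, the most likely class, and that
    class's posterior (the per-pixel confidence value).
    """
    joint = {k: likelihoods[k] * priors[k] for k in likelihoods}
    evidence = sum(joint.values())
    post = {k: v / evidence for k, v in joint.items()}
    best = max(post, key=post.get)
    return post, best, post[best]


# Hypothetical class-conditional estimates for one pixel and regional priors.
likelihoods = {"forest": 0.6, "grass": 0.3, "water": 0.1}
priors = {"forest": 0.2, "grass": 0.6, "water": 0.2}
post, label, confidence = posterior(likelihoods, priors)
print(label, round(confidence, 4))  # grass 0.5625
```

With these numbers the priors move the most likely label from `forest`, which has the largest class-conditional estimate, to `grass`.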


© 2009 Boston University