191.
Monitoring vegetation dynamics in Zhongwei, an arid city of Northwest China. Wang, Haitao. 10 June 2014
This case study used Zhongwei City in northwest China to quantify urbanization and revegetation processes (1990-2011) through a unified sub-pixel measure of vegetation cover. Research strategies included: (1) conducting sub-pixel vegetation mapping (1990, 1996, 2004, and 2011) with the Random Forest (RF) algorithm by integrating high (OrbView-3) and medium (Landsat TM) spatial resolution data; (2) examining the simple Dark Object Subtraction (DOS) atmospheric correction method to support temporal generalization of the sub-pixel mapping algorithm; and (3) characterizing patterns of vegetation cover dynamics based on change detection analysis.
We found the RF algorithm, combined with simple DOS, showed good generalization capability for sub-pixel vegetation mapping. Predicted sub-pixel vegetation proportions were consistent for "pseudo-invariant" pixels. Vegetation change analysis suggested persistent urban development within the city boundary, accompanied by a continuous expansion of revegetated area at the city fringe. Urban development occurred at both the suburban and urban core areas, and was mainly shaped by transportation networks. A transition in revegetation practices was documented: the large-scale governmental revegetation programs were replaced by the commercial afforestation conducted by industries. This study showed a slight increase in vegetation cover over the time period, balanced by losses to urban expansion, and a likely severe degradation of vegetation cover due to conversion of arable land to desert vegetation. The loss of arable land and the growth of artificial desert vegetation have yielded a dynamic equilibrium in terms of overall vegetation cover during 1990 to 2011, but in the long run vegetation quality is certainly reduced. / Master of Science
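The workflow described above (simple DOS correction followed by random-forest estimation of sub-pixel vegetation fractions) can be sketched roughly as follows. This is a hedged Python/scikit-learn illustration on synthetic reflectance data; the band coefficients, haze offsets, and vegetation fractions are all invented for demonstration and do not come from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def dark_object_subtraction(bands):
    # simple DOS: subtract each band's scene minimum, assumed to be
    # additive atmospheric path radiance from haze
    return bands - bands.min(axis=0, keepdims=True)

rng = np.random.default_rng(0)
# 500 synthetic pixels x 6 Landsat-TM-like bands; reflectance rises
# linearly with the (invented) sub-pixel vegetation fraction
true_fraction = rng.uniform(0, 1, 500)
coeff = rng.uniform(0.2, 0.8, 6)
bands = np.outer(true_fraction, coeff) + rng.normal(0, 0.02, (500, 6))
haze = rng.uniform(0.05, 0.15, 6)               # per-band atmospheric offset
corrected = dark_object_subtraction(bands + haze)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(corrected[:300], true_fraction[:300])    # train on one set of pixels
r2 = rf.score(corrected[300:], true_fraction[300:])  # check generalization
```

Because DOS removes the additive offset before training, the same fitted model can in principle be applied to DOS-corrected imagery from another date, which is the generalization property the study tested on "pseudo-invariant" pixels.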
192.
Mapping Smallholder Forest Plantations in Andhra Pradesh, India using Multitemporal Harmonized Landsat Sentinel-2 S10 Data. Williams, Paige T. 27 January 2020
The objective of this study was to develop a method by which smallholder forest plantations can be mapped accurately in Andhra Pradesh, India using multitemporal (intra- and inter-annual) visible and near-infrared (VNIR) bands from the Sentinel-2 MultiSpectral Instruments (MSIs). Dependency on and scarcity of wood products have driven the deforestation and degradation of natural forests in Southeast Asia. At the same time, forest plantations have been established both within and outside of forests, with the latter (as contiguous blocks) being the focus of this study. The ecosystem services provided by natural forests are different from those of plantations. As such, being able to separate natural forests from plantations is important. Unfortunately, there are constraints to accurately mapping planted forests in Andhra Pradesh (and other similar landscapes in South and Southeast Asia) using remotely sensed data due to the plantations' small size (average 2 hectares), short rotation ages (often 4-7 years for timber species), and spectral similarities to croplands and natural forests. The East and West Godavari districts of Andhra Pradesh were selected as the area for a case study. Cloud-free Harmonized Landsat Sentinel-2 (HLS) S10 data were acquired over six dates, from different seasons, as follows: December 28, 2015; November 22, 2016; November 2, 2017; December 22, 2017; March 1, 2018; and June 15, 2018. Cloud-free satellite data are not available during the monsoon season (July to September) in this coastal region. In situ data on forest plantations, provided by collaborators, were supplemented with additional training data representing other land cover subclasses in the region: agriculture, water, aquaculture, mangrove, palm, forest plantation, ground, natural forest, shrub/scrub, sand, and urban, with a total sample size of 2,230. These high-quality samples were then aggregated into three land use classes: non-forest, natural forest, and forest plantations.
Image classification used random forests within the Julia DecisionTree package on a thirty-band stack comprising the VNIR bands and NDVI images for all dates. The median classification accuracy from the 5-fold cross-validation was 94.3%. Our results, predicated on high-quality training data, demonstrate that (mostly smallholder) forest plantations can be separated from natural forests even using only the Sentinel-2 VNIR bands when multitemporal data (across both years and seasons) are used.
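The classification step can be sketched as below. The study used the Julia DecisionTree package; this hedged Python/scikit-learn analogue substitutes synthetic class signatures for the real 30-band HLS stack and simply reports the median 5-fold cross-validation accuracy, mirroring the evaluation above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 300
# 0 = non-forest, 1 = natural forest, 2 = forest plantation (simulated)
labels = rng.integers(0, 3, n)
# stand-in for the 30-band stack (4 VNIR bands + NDVI over 6 dates):
# each class gets its own multitemporal "spectral" signature plus noise
centers = rng.normal(0, 1, (3, 30))
X = centers[labels] + rng.normal(0, 0.5, (n, 30))

clf = RandomForestClassifier(n_estimators=200, random_state=1)
scores = cross_val_score(clf, X, labels, cv=5)    # 5-fold cross-validation
median_acc = float(np.median(scores))
```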
193.
Not All Biomass is Created Equal: An Assessment of Social and Biophysical Factors Constraining Wood Availability in Virginia. Braff, Pamela Hope. 19 May 2014
Most estimates of wood supply do not reflect the true availability of wood resources. The availability of wood resources ultimately depends on collective wood harvesting decisions across the landscape. Both social and biophysical constraints impact harvesting decisions and thus the availability of wood resources. While most constraints do not completely inhibit harvesting, they may significantly reduce the probability of harvest. Realistic assessments of wood availability and distribution are needed for effective forest management and planning. This study focuses on predicting the probability of harvest at forested FIA plot locations in Virginia. Classification and regression tree, conditional inference tree, random forest, balanced random forest, conditional random forest, and logistic regression models were built to predict harvest as a function of social and biophysical availability constraints. All of the models were evaluated and compared to identify important variables constraining harvest, predict future harvests, and estimate the available wood supply. Variables related to population and resource quality seem to be the best predictors of future harvest. The balanced random forest and logistic regression models are recommended for predicting future harvests. The balanced random forest model is the best predictor, while the logistic regression model can be most easily shared and replicated. Both models were applied to predict harvest at recently measured FIA plots. Based on the probability of harvest, we estimate that between 2012 and 2017, 10 to 21 percent of total wood volume on timberland will be available for harvesting. / Master of Science
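A minimal sketch of the two recommended model types follows, using scikit-learn and fabricated plot-level predictors (population density, slope, road distance, and stand volume are hypothetical stand-ins, not the study's actual variables). Here `class_weight='balanced'` approximates a balanced random forest by reweighting the rare harvested class rather than reimplementing balanced bootstrap sampling exactly.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 1000
# hypothetical plot-level constraints (not the study's actual variables):
# columns: population density, slope, distance to road, stand volume
X = rng.normal(0, 1, (n, 4))
# invented rule: resource quality raises, and population lowers, harvest odds
logit = -1.5 + 1.2 * X[:, 3] - 0.8 * X[:, 0]
harvested = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))).astype(int)

# class_weight='balanced' upweights the rarer (harvested) class during
# tree growing, approximating a balanced random forest
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                            random_state=2).fit(X, harvested)
lr = LogisticRegression().fit(X, harvested)

p_rf = rf.predict_proba(X)[:, 1]    # per-plot harvest probabilities
p_lr = lr.predict_proba(X)[:, 1]    # the easily shared/replicated model
```

Summing the predicted probabilities weighted by plot volume is one way such per-plot probabilities could roll up into an available-supply estimate.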
194.
An evaluation of a data-driven approach to regional scale surface runoff modelling. Zhang, Ruoyu. 03 August 2018
Modelling surface runoff can be beneficial to operations within many fields, such as agriculture planning, flood and drought risk assessment, and water resource management. In this study, we built a data-driven model that can reproduce monthly surface runoff at a 4-km grid network covering 13 watersheds in the Chesapeake Bay area. We used a random forest algorithm to build the model, where monthly precipitation, temperature, land cover, and topographic data were used as predictors, and monthly surface runoff generated by the SWAT hydrological model was used as the response. A sub-model was developed for each of the 12 months, independent of one another. Accuracy statistics and variable importance measures from the random forest algorithm reveal that precipitation was the most important variable in the model, but including climatological data from multiple months as predictors significantly improves model performance. Using 3-month climatological data, land cover, and DEM derivatives from 40% of the 4-km grids as the training dataset, our model successfully predicted surface runoff for the remaining 60% of the grids (mean R2 (RMSE) for the 12 monthly models is 0.83 (6.60 mm)). The lowest R2 was associated with the model for August, when surface runoff values are the lowest of the year. Among the studied watersheds, the highest predictive errors were found in the watershed with the greatest topographic complexity, for which the model tended to underestimate surface runoff. For the other 12 watersheds studied, the data-driven model produced smaller and more spatially consistent predictive errors. / Master of Science / Surface runoff data can be valuable to many fields, such as agriculture planning, water resource management, and flood and drought risk assessment. The traditional approach to acquiring surface runoff data is through hydrological model simulation. However, running such models requires advanced knowledge of watersheds and computational technologies. In this study, we build a statistical model that can reproduce monthly surface runoff at a 4-km grid covering 13 watersheds in the Chesapeake Bay area. This model uses publicly accessible climate, land cover, and topographic datasets as predictors, and monthly surface runoff from the SWAT model as the response. We develop 12 monthly models, one for each month, independent of one another. To test whether the model can generalize surface runoff across the entire study area, we use 40% of the grid data as the training sample and the remainder as validation. The accuracy statistics (annual mean R2 of 0.83 and RMSE of 6.60 mm) show that our model can accurately reproduce monthly surface runoff across the study area. The statistics for the August model are not as good as those for the other months' models. A possible reason is that August surface runoff is the lowest of the year, so there is not enough variation for the algorithm to distinguish minor differences in the response during model building. When applying the model to watersheds with steep terrain, the results should be interpreted with caution, as errors may be relatively large.
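The train-on-40%, validate-on-60% design can be sketched as below with synthetic grid data. The predictors and the runoff relationship are invented for illustration, and a toy formula replaces the SWAT-simulated response.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(3)
n = 2000
# 8 invented grid-cell predictors: 3 months of precipitation, 3 of
# temperature, an impervious-cover fraction, and slope
X = rng.uniform(0, 1, (n, 8))
precip = X[:, :3].sum(axis=1)
# toy response standing in for SWAT-simulated monthly runoff (mm)
runoff = 20 * precip * (0.3 + 0.7 * X[:, 6]) + rng.normal(0, 1, n)

# 40% of grid cells train the model; the remaining 60% test generalization
X_tr, X_te, y_tr, y_te = train_test_split(X, runoff, train_size=0.4,
                                          random_state=3)
rf = RandomForestRegressor(n_estimators=100, random_state=3).fit(X_tr, y_tr)
pred = rf.predict(X_te)
r2 = r2_score(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
```

In practice one such model would be fit per calendar month, giving the 12 independent sub-models described above.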
195.
Application and feasibility of visible-NIR-MIR spectroscopy and classification techniques for wetland soil identification. Whatley, Caleb. 10 May 2024
Wetland determinations require the visual identification of anaerobic soil indicators by an expert, which is a complex and subjective task. To eliminate bias, an objective method is needed to identify wetland soil. Currently, no such method exists that is rapid and easily interpretable. This study proposes a method for wetland soil identification using visible through mid-infrared (MIR) spectroscopy and classification algorithms. Wetland and non-wetland soils (n = 440) were collected across Mississippi. Spectra were measured from fresh and dried soil. Support Vector Classification and Random Forest modeling techniques were used to classify spectra with a 75%/25% calibration/validation split. POWERSHAP Shapley feature selection and Gini importance were used to locate the highest-contributing spectral features. Average classification accuracy was ~91%, with a maximum accuracy of 99.6% on MIR spectra. The most important features were related to iron compounds, nitrates, and soil texture. By providing an objective and rapid means of wetland soil identification, this method improves the reliability of wetland determinations and reduces reliance on expert judgment.
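A hedged sketch of the classification comparison on simulated "spectra" follows: the informative band, class structure, and accuracies are synthetic, and plain Gini importance stands in for the POWERSHAP feature selection used in the study.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n, n_bands = 440, 60                  # 440 soils, coarse stand-in "spectra"
wetland = rng.integers(0, 2, n)       # 0 = non-wetland, 1 = wetland
X = rng.normal(0, 1, (n, n_bands))
X[:, 10] += 2.0 * wetland             # pretend band 10 tracks iron reduction

# 75%/25% calibration/validation split, stratified by class
X_tr, X_te, y_tr, y_te = train_test_split(
    X, wetland, train_size=0.75, random_state=4, stratify=wetland)
svc = SVC().fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=4).fit(X_tr, y_tr)
acc_svc = svc.score(X_te, y_te)
acc_rf = rf.score(X_te, y_te)
top_band = int(np.argmax(rf.feature_importances_))  # Gini importance
```

Because the simulated signal sits in band 10, the Gini importances recover it; in the study the analogous top features mapped to iron compounds, nitrates, and texture.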
196.
Improving Osteological Sex Estimation Methods for the Skull: Combining Morphological Traits and Measurements Utilizing Decision Trees and Random Forest Modeling. Ferrell, Morgan. 01 January 2024
Osteological sex estimation is a key component of the biological profile in forensic anthropological casework. However, there are still limitations with current methodologies for the skull as well as inadequate classification accuracies. Therefore, the purpose of this research is to improve osteological sex classification accuracies for the skull by combining morphological and metric variables into multiple models using decision trees and random forest (RF) modeling. The sample was derived from four U.S.-based skeletal collections and consisted of 403 individuals of European American and African American population affinities. Twenty-one morphological traits and 21 metric variables of the skull were selected for analysis, and intraobserver error was assessed to determine which variables should be incorporated into the models. Additionally, two-way ANOVAs and aligned rank transformation were utilized to examine the effects of sex, age, population affinity, and secular change on the variables. To generate the trees and RF models, 80% of the sample was used for model training and 20% of the sample was used for holdout validation testing. Multiple decision trees and RF models were generated that incorporated morphological, metric, and combined variables. Models were generated for the African American and European American samples, as well as for the pooled populations. The predictive accuracy of the models was assessed utilizing the holdout validation sample and the out-of-bag error. Overall, the majority of the combined data decision trees and RF models achieved higher classification accuracies compared to the separate morphological and metric models. Additionally, the pooled and European American models frequently achieved higher accuracies compared to the African American models. The combined data models also resulted in higher accuracies compared to popular osteological sex estimation methods for the skull. 
Therefore, the combined data models have great potential for use by forensic anthropologists and bioarchaeologists for estimating osteological sex from the skull.
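The combined-data modeling strategy might be sketched as follows with simulated trait scores and measurements (the sex-related shifts and sample structure are invented, not the collections' data); it shows the 80/20 holdout split plus the out-of-bag error reported by the random forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 403                                   # matches only the study's sample size
sex = rng.integers(0, 2, n)               # 0 = female, 1 = male (simulated)
# ordinal morphological trait scores (1-5) and cranial measurements (mm),
# both shifted by sex -- the shifts are invented for illustration
morph = rng.integers(1, 6, (n, 5)).astype(float) + 0.8 * sex[:, None]
metric = rng.normal(100.0, 5.0, (n, 5)) + 4.0 * sex[:, None]
X = np.hstack([morph, metric])            # the "combined data" model

X_tr, X_te, y_tr, y_te = train_test_split(X, sex, train_size=0.8,
                                          random_state=5)
tree = DecisionTreeClassifier(max_depth=4, random_state=5).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                            random_state=5).fit(X_tr, y_tr)
holdout_acc = rf.score(X_te, y_te)        # holdout validation accuracy
oob_err = 1.0 - rf.oob_score_             # out-of-bag error
```

Tree-based models handle the mix of ordinal trait scores and continuous measurements without rescaling, which is one practical reason to combine the two data types in a single model.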
197.
Identification and physical characterisation of sarcomere pattern formation using supervised machine learning. Sbosny, Leon. 16 May 2024
Machine learning algorithms have become increasingly popular for analysing the large amounts of image data generated by biologists with modern microscopes.
In collaboration with Frank Schnorrer and Clément Rodier at the Institut de Biologie du Développement de Marseille, as well as Ian Estabrook at Physics of Life, TU Dresden, this thesis applies the supervised machine learning algorithms 'Support Vector Machine' and 'Random Forest' to data obtained from fluorescence microscope images of myofibrillogenesis in Drosophila pupae, with the aim of identifying sarcomeres, the structures that make up the highly regular myofibrils.
For the implementation in MATLAB, methods such as 'feature engineering' are used to increase performance by reinterpreting the input data and drawing on physical characteristics of the sample system. The project also identifies the problem of class imbalance between positive and negative examples in the input data and counters it with a redefined learning cost. In conclusion, the use of machine learning algorithms for image analysis in biophysics is a very promising way to reduce manual labour. The choice of the best learning algorithm depends on the purpose the obtained output data should serve.
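The thesis's MATLAB implementation is not reproduced here, but the class-imbalance remedy it describes, reweighting the learning cost of the rare positive class, can be sketched in Python on synthetic pixel features:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
n = 1000
# ~5% positive "sarcomere" examples: a strong class imbalance
y = (rng.uniform(0, 1, n) < 0.05).astype(int)
# 4 invented pixel features, mildly shifted for the positive class
X = rng.normal(0, 1, (n, 4)) + 1.0 * y[:, None]

# class_weight='balanced' raises the misclassification cost of the rare
# class in proportion to its scarcity -- one way to redefine the learning cost
plain = SVC().fit(X, y)
weighted = SVC(class_weight="balanced").fit(X, y)

recall_plain = float(plain.predict(X)[y == 1].mean())
recall_weighted = float(weighted.predict(X)[y == 1].mean())
```

Without the reweighting, the classifier can score high accuracy while missing most positive examples; the cost-adjusted model recovers far more of the rare class.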
198.
LAND COVER CHANGE AND ITS IMPLICATIONS FOR ECOSYSTEM SERVICES IN THE GREATER SHAWNEE NATIONAL FOREST. Thapa, Saroj. 01 August 2024
This dissertation employed a random forest algorithm for Land Use Land Cover (LULC) classification and proposed and tested a modified forest transition framework in the Greater Shawnee National Forest (GSNF), Illinois. Subsequently, a machine learning-based multilayer artificial neural network was used to assess the LULC of the GSNF between 2019 and 2050 utilizing IPCC-based projected climate data. The accuracy of LULC classification was evaluated using Kappa statistics and Producer and User accuracies. The Stepwise Regression, Support Vector Machine, Random Forest, and Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) models were compared to quantify terrestrial carbon stock. Similarly, InVEST, FRAGSTAT, and Maxent models were used for habitat quality analysis and to estimate the probability of bobcat distribution. The terrestrial carbon stock, habitat quality, and bobcat distribution were quantified across three spatial resolutions, 30, 60, and 90 meters, to assess whether there were substantial differences in the represented trends of these measures of Ecosystem Services (ES). The LULC analysis showed varying levels of temporal and spatial variability with increased deciduous forest (1.35%), mixed forest (26.40%), agricultural land (2.15%), and urbanized areas (6.70%) between 1990 and 2019. Notably, the LULC intensity analysis exhibited stability from 2001 to 2019, consistent with the forest transition framework proposed in the study. However, when integrating temperature and precipitation projections derived from the IPCC, notable changes in forest cover were observed from the western to eastern sectors within the central region of the GSNF. In all IPCC-based scenarios, overall forest cover (deciduous, evergreen, and mixed) declined. The classification accuracy of the LULC assessment ranged from 92.9% to 95.9%, accompanied by kappa statistics ranging from 0.89 to 0.94.
The prediction accuracy of LULC change was validated for the year 2019, ranging from 77.99% to 84.67%, with kappa statistics between 0.79 and 0.81, depending on the scenario, and predictions were extended to the year 2050. The terrestrial carbon stock in GSNF varied from 15 to 212 MgC per hectare across different models. The RF model performed best at 90 meters resolution with FIA-based data, with RMSE values of 17.45, 18.73, and 20.05, and R-squared values of 0.53, 0.48, and 0.43 for 2001, 2010, and 2019, respectively. The findings indicated that while the InVEST model provides a broad and generalized approach to quantifying carbon storage, the random forest (RF) model is essential for obtaining more accurate and precise estimations. LULC has gradually become more fragmented over time, leading to a decline in average habitat quality from 1990 (0.724±0.215) to 2019 (0.689±0.192). Despite increased forest density, the proportion of high-quality habitats (habitat quality score above 0.83) decreased by 5% during the study period. Interestingly, there was a notable increase in the probability score of bobcat distribution from 1990 (0.327±0.123) to 2019 (0.347±0.084). The study revealed a strong correlation between habitat quality and the probability of bobcat distributions, indicating a mutual influence between the two factors. This dissertation suggests that the LULC change of the GSNF follows the forest transition framework and has significant implications for ecosystem services, such as carbon storage and habitat quality. These results are instrumental for sustainable land management to optimize terrestrial carbon stock and habitat quality, thereby mitigating the impacts of climate change.
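The overall accuracy and kappa statistics quoted above can be computed as follows; the reference and classified labels here are simulated, not the GSNF maps.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(7)
# simulated reference vs. classified labels for 1,000 validation pixels
# drawn from six hypothetical LULC classes
reference = rng.integers(0, 6, 1000)
classified = reference.copy()
flip = rng.uniform(0, 1, 1000) < 0.06      # ~6% of pixels misclassified
classified[flip] = rng.integers(0, 6, int(flip.sum()))

overall_acc = accuracy_score(reference, classified)
kappa = cohen_kappa_score(reference, classified)  # agreement beyond chance
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside (and runs slightly below) the raw classification accuracy in the assessment above.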
199.
Post-Processing National Water Model Long-Range Forecasts with Random Forest Regression in the Cloud to Improve Forecast Accuracy for Decision-Makers and Water Managers. Anderson, Jacob Matthew. 19 December 2024
Post-processing bias correction of streamflow forecasts can be useful in the hydrologic modeling workflow to fine-tune forecasts for operations, water management, and decision-making. Hydrologic model runoff simulations include errors, uncertainties, and biases, leading to less accuracy and precision for applications in real-world scenarios. We used random forest regression to correct biases and errors in streamflow predictions from the U.S. National Water Model (NWM) long-range streamflow forecasts, considering U.S. Geological Survey (USGS) gauge station measurements as a proxy for true streamflow. We used other features in model training, including watershed characteristics, time fraction of year, and lagged streamflow values, to help the model perform better in gauged and ungauged areas. We assessed the effectiveness of the bias correction technique by comparing the difference between forecast and actual streamflow before and after the bias correction model was employed. We also explored advances in hydroinformatics and cloud computing by creating and testing this bias correction capability within the Google Cloud Console environment to avoid slow and unnecessary data downloads to local devices, thereby streamlining data processing and storage within the cloud. This demonstrates the possibility of integrating our method into the NWM real-time forecasting workflow. Results indicate reasonable bias correction is possible using the random forest regression machine learning technique. After being run through the random forest model, differences between NWM forecasts and USGS discharge are smaller than the original differences. The main issue concerning the forecasts from the NWM is that the error increases with distance from the reference time, or start, of the forecast period. The model we created shows the greatest improvement in streamflow forecasts at lead times furthest from the reference time.
The error is reduced and more uniform throughout all the time steps of the 30-day long-range forecasts.
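A toy version of the bias-correction step: synthetic "NWM" forecasts whose bias grows with lead time are corrected by a random forest trained against synthetic "observed" flows. The persistence feature is a stand-in for the lagged streamflow values used in the study, and all numbers are fabricated for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(8)
n = 3000
lead_days = rng.integers(1, 31, n)             # lead time in a 30-day forecast
observed = rng.gamma(2.0, 50.0, n)             # stand-in "USGS" discharge
# synthetic "NWM" forecast whose bias grows with lead time
forecast = observed * (1 + 0.02 * lead_days) + rng.normal(0, 5, n)
# persistence proxy standing in for lagged streamflow (built from the
# target here purely for illustration -- a real workflow would use
# antecedent gauge observations)
persistence = observed * rng.normal(1.0, 0.1, n)

# features: raw forecast, lead time, time fraction of year, persistence
X = np.column_stack([forecast, lead_days, rng.uniform(0, 1, n), persistence])
rf = RandomForestRegressor(n_estimators=100, random_state=8)
rf.fit(X[:2000], observed[:2000])
corrected = rf.predict(X[2000:])

err_raw = float(np.abs(forecast[2000:] - observed[2000:]).mean())
err_corr = float(np.abs(corrected - observed[2000:]).mean())
```

Including lead time as a feature lets the model learn the lead-dependent bias directly, which matches the finding that the largest gains come at the longest lead times.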
200.
Comparação entre métodos de imputação de dados em diferentes intensidades amostrais na série homogênea de precipitação pluvial da ESALQ / Comparison between data imputation methods at different sample intensities in the ESALQ homogeneous rainfall series. Gasparetto, Suelen Cristina. 07 June 2019
Frequent problems in statistical analyses of meteorological information are the occurrence of missing data and a lack of knowledge about the homogeneity of the information contained in the database. The objective of this work was to test and classify the homogeneity of the rainfall series of the conventional climatological station of ESALQ from 1917 to 1997, and to compare three methods of data imputation at different sample intensities (5%, 10%, and 15%) of missing information, generated at random. Three homogeneity tests were used: Pettitt, Buishand, and standard normal. For the "filling" of missing information, three multiple imputation methods were compared: PMM (Predictive Mean Matching), random forest, and linear regression via the bootstrap method, at each sampling intensity of missing information. The methods were applied by means of the MICE (Multivariate Imputation by Chained Equations) package in R. Each imputation procedure was compared using the root mean square error, Willmott's index of agreement, and the performance index. The rainfall series was classified as class 1, "useful": no clear sign of lack of homogeneity was apparent. The method that resulted in the smallest root mean square errors and the highest indexes was PMM, particularly at the 10% intensity of missing information. The performance index for the three data imputation methods at all intensities of missing observations was considered "terrible".
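The study used the R MICE package; as a hedged Python analogue, scikit-learn's IterativeImputer with a random-forest estimator can play the role of mice's 'rf' method, and the RMSE and Willmott index can be computed on artificially deleted values. The rainfall columns below are synthetic stand-ins for the ESALQ series.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(9)
n = 600
# four correlated toy "rainfall" columns (target month plus neighbours)
base = rng.gamma(2.0, 40.0, n)
data = np.column_stack([base + rng.normal(0, 10, n) for _ in range(4)])

complete = data.copy()
mask = rng.uniform(0, 1, n) < 0.10         # delete ~10% of the target column
data[mask, 0] = np.nan

# random-forest-based iterative imputation, a stand-in for mice's 'rf' method
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=9),
    random_state=9, max_iter=5)
filled = imputer.fit_transform(data)

o = complete[mask, 0]                      # true (deleted) values
p = filled[mask, 0]                        # imputed values
rmse = float(np.sqrt(np.mean((p - o) ** 2)))
# Willmott's index of agreement
d = 1 - np.sum((p - o) ** 2) / np.sum(
    (np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2)
```

Repeating the deletion-and-imputation experiment at 5%, 10%, and 15% missingness, and swapping in PMM-style or bootstrap-regression estimators, would reproduce the comparison design described above.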