  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
191

Monitoring vegetation dynamics in Zhongwei, an arid city of Northwest China

Wang, Haitao 10 June 2014 (has links)
This case study used Zhongwei City in northwest China to quantify the urbanization and revegetation processes (1990-2011) through a unified sub-pixel measure of vegetation cover. Research strategies included: (1) Conduct sub-pixel vegetation mapping (1990, 1996, 2004, and 2011) with Random Forest (RF) algorithm by integrating high (OrbView-3) and medium spatial resolution (Landsat TM) data; (2) Examine simple Dark Object Subtraction (DOS) atmospheric correction method to support temporal generalization of sub-pixel mapping algorithm; (3) And characterize patterns of vegetation cover dynamics based on change detection analysis. We found the RF algorithm, combined with simple DOS, showed good generalization capability for sub-pixel vegetation mapping. Predicted sub-pixel vegetation proportions were consistent for "pseudo-invariant" pixels. Vegetation change analysis suggested persistent urban development within the city boundary, accompanied by a continuous expansion of revegetated area at the city fringe. Urban development occurred at both the suburban and urban core areas, and was mainly shaped by transportation networks. A transition in revegetation practices was documented: the large-scale governmental revegetation programs were replaced by the commercial afforestation conducted by industries. This study showed a slight increase in vegetation cover over the time period, balanced by losses to urban expansion, and a likely severe degradation of vegetation cover due to conversion of arable land to desert vegetation. The loss of arable land and the growth of artificial desert vegetation have yielded a dynamic equilibrium in terms of overall vegetation cover during 1990 to 2011, but in the long run vegetation quality is certainly reduced. / Master of Science
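The simple Dark Object Subtraction (DOS) step named in the abstract can be sketched as follows. This is a minimal illustration, assuming the darkest pixel value in a band approximates the additive atmospheric haze. The function name `dark_object_subtraction` and the sample digital numbers are hypothetical and are not taken from the thesis, whose exact DOS variant the abstract does not specify.

```python
def dark_object_subtraction(band):
    """Subtract the darkest value in the band from every pixel (floored at 0)."""
    # Assumption: the minimum value stands in for the haze contribution.
    dark_value = min(min(row) for row in band)
    return [[max(pixel - dark_value, 0) for pixel in row] for row in band]


# Hypothetical 2x3 band of digital numbers
band = [[12, 15, 40], [9, 22, 31]]
corrected = dark_object_subtraction(band)
print(corrected)
```

Applying the same correction to each acquisition date puts the bands on a comparable footing, which is the property a model trained on one date needs when it is applied to another.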
192

Mapping Smallholder Forest Plantations in Andhra Pradesh, India using Multitemporal Harmonized Landsat Sentinel-2 S10 Data

Williams, Paige T. 27 January 2020 (has links)
The objective of this study was to develop a method by which smallholder forest plantations can be mapped accurately in Andhra Pradesh, India using multitemporal (intra- and inter-annual) visible and near-infrared (VNIR) bands from the Sentinel-2 MultiSpectral Instruments (MSIs). Dependency on and scarcity of wood products have driven the deforestation and degradation of natural forests in Southeast Asia. At the same time, forest plantations have been established both within and outside of forests, with the latter (as contiguous blocks) being the focus of this study. The ecosystem services provided by natural forests are different from those of plantations. As such, being able to separate natural forests from plantations is important. Unfortunately, there are constraints to accurately mapping planted forests in Andhra Pradesh (and other similar landscapes in South and Southeast Asia) using remotely sensed data due to the plantations' small size (average 2 hectares), short rotation ages (often 4-7 years for timber species), and spectral similarities to croplands and natural forests. The East and West Godavari districts of Andhra Pradesh were selected as the area for a case study. Cloud-free Harmonized Landsat Sentinel-2 (HLS) S10 data was acquired over six dates, from different seasons, as follows: December 28, 2015; November 22, 2016; November 2, 2017; December 22, 2017; March 1, 2018; and June 15, 2018. Cloud-free satellite data are not available during the monsoon season (July to September) in this coastal region. In situ data on forest plantations, provided by collaborators, was supplemented with additional training data representing other land cover subclasses in the region: agriculture, water, aquaculture, mangrove, palm, forest plantation, ground, natural forest, shrub/scrub, sand, and urban, with a total sample size of 2,230. These high-quality samples were then aggregated into three land use classes: non-forest, natural forest, and forest plantations. 
Image classification used random forests within the Julia Decision Tree package on a thirty-band stack that was comprised of the VNIR bands and NDVI images for all dates. The median classification accuracy from the 5-fold cross validation was 94.3%. Our results, predicated on high quality training data, demonstrate that (mostly smallholder) forest plantations can be separated from natural forests even using only the Sentinel 2 VNIR bands when multitemporal data (across both years and seasons) are used. / The objective of this study was to develop a method by which smallholder forest plantations can be mapped accurately in Andhra Pradesh, India using multitemporal (intra- and inter-annual) visible (red, green, blue) and near-infrared (VNIR) bands from the European Space Agency satellite Sentinel-2. Dependency on and scarcity of wood products have driven the deforestation and degradation of natural forests in Southeast Asia. At the same time, forest plantations have been established both within and outside of forests, with the latter (as contiguous blocks) being the focus of this study. The ecosystem services provided by natural forests are different from those of plantations. As such, being able to separate natural forests from plantations is important. Unfortunately, there are constraints to accurately mapping planted forests in Andhra Pradesh (and other similar landscapes in South and Southeast Asia) using remotely sensed data due to the plantations' small size (average 2 hectares), short rotation ages (often 4-7 years for timber species), and spectral (reflectance from satellite imagery) similarities to croplands and natural forests. The East and West Godavari districts of Andhra Pradesh were selected as the area for a case study. Cloud-free Harmonized Landsat Sentinel-2 (HLS) S10 images were acquired over six dates, from different seasons, as follows: December 28, 2015; November 22, 2016; November 2, 2017; December 22, 2017; March 1, 2018; and June 15, 2018. 
Cloud-free satellite data are not available during the monsoon season (July to September) in this coastal region. In situ data on forest plantations, provided by collaborators, was supplemented with additional training data points (X and Y locations with land cover class) representing other land cover subclasses in the region: agriculture, water, aquaculture, mangrove, palm, forest plantation, ground, natural forest, shrub/scrub, sand, and urban, with a total of 2,230 training points. These high-quality samples were then aggregated into three land use classes: non-forest, natural forest, and forest plantations. Image classification used random forests within the Julia DecisionTree package on a thirty-band stack that was comprised of the VNIR bands and NDVI (calculation related to greenness, i.e. higher value = more vegetation) images for all dates. The median classification accuracy from the 5-fold cross validation was 94.3%. Our results, predicated on high quality training data, demonstrate that (mostly smallholder) forest plantations can be separated from natural forests even using only the Sentinel 2 VNIR bands when multitemporal data (across both years and seasons) are used.
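The thirty-band stack described above can be illustrated with a short sketch. It assumes that each of the six dates contributes four VNIR bands (blue, green, red, near-infrared) plus one NDVI layer, which is one reading consistent with 6 × 5 = 30. The function names and reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def features_for_date(blue, green, red, nir):
    """Four VNIR reflectances plus NDVI: five features per acquisition date."""
    return [blue, green, red, nir, ndvi(nir, red)]


dates = ["2015-12-28", "2016-11-22", "2017-11-02",
         "2017-12-22", "2018-03-01", "2018-06-15"]

# Hypothetical reflectances for one pixel, reused on every date for brevity
pixel = {"blue": 0.04, "green": 0.07, "red": 0.05, "nir": 0.35}

stack = []
for _ in dates:
    stack.extend(features_for_date(**pixel))

print(len(stack))  # number of features per pixel fed to the classifier
```

In a real workflow the reflectances would differ by date, and that seasonal and inter-annual variation is what the abstract credits for separating plantations from natural forest.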
193

Not All Biomass is Created Equal: An Assessment of Social and Biophysical Factors Constraining Wood Availability in Virginia

Braff, Pamela Hope 19 May 2014 (has links)
Most estimates of wood supply do not reflect the true availability of wood resources. The availability of wood resources ultimately depends on collective wood harvesting decisions across the landscape. Both social and biophysical constraints impact harvesting decisions and thus the availability of wood resources. While most constraints do not completely inhibit harvesting, they may significantly reduce the probability of harvest. Realistic assessments of woody availability and distribution are needed for effective forest management and planning. This study focuses on predicting the probability of harvest at forested FIA plot locations in Virginia. Classification and regression trees, conditional inferences trees, random forest, balanced random forest, conditional random forest, and logistic regression models were built to predict harvest as a function of social and biophysical availability constraints. All of the models were evaluated and compared to identify important variables constraining harvest, predict future harvests, and estimate the available wood supply. Variables related to population and resource quality seem to be the best predictors of future harvest. The balanced random forest and logistic regressions models are recommended for predicting future harvests. The balanced random forest model is the best predictor, while the logistic regression model can be most easily shared and replicated. Both models were applied to predict harvest at recently measured FIA plots. Based on the probability of harvest, we estimate that between 2012 and 2017, 10 – 21 percent of total wood volume on timberland will be available for harvesting. / Master of Science
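The balanced random forest mentioned above differs from a standard random forest mainly in how each tree's training sample is drawn. The sketch below shows only that sampling step, assuming the common scheme in which every class is bootstrapped down to the size of the rarest class. The function name and the plot counts are hypothetical, and the thesis may have used a different implementation.

```python
import random


def balanced_bootstrap(labels, rng):
    """Return indices for one tree: equal-size bootstrap draws from each class."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_minority = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        # Draw with replacement so every class contributes n_minority cases.
        sample.extend(rng.choice(indices) for _ in range(n_minority))
    return sample


# Hypothetical data: 3 harvested plots (1) and 17 unharvested plots (0)
labels = [1] * 3 + [0] * 17
sample = balanced_bootstrap(labels, random.Random(0))
counts = {label: sum(labels[i] == label for i in sample) for label in (0, 1)}
print(counts)
```

Because harvest is a relatively rare event at plot level, balancing each tree's sample keeps the rare class from being swamped by unharvested plots.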
194

An evaluation of a data-driven approach to regional scale surface runoff modelling

Zhang, Ruoyu 03 August 2018 (has links)
Modelling surface runoff can be beneficial to operations within many fields, such as agriculture planning, flood and drought risk assessment, and water resource management. In this study, we built a data-driven model that can reproduce monthly surface runoff at a 4-km grid network covering 13 watersheds in the Chesapeake Bay area. We used a random forest algorithm to build the model, where monthly precipitation, temperature, land cover, and topographic data were used as predictors, and monthly surface runoff generated by the SWAT hydrological model was used as the response. A sub-model was developed for each of 12 monthly surface runoff estimates, independent of one another. Accuracy statistics and variable importance measures from the random forest algorithm reveal that precipitation was the most important variable to the model, but including climatological data from multiple months as predictors significantly improves the model performance. Using 3-month climatological, land cover, and DEM derivatives from 40% of the 4-km grids as the training dataset, our model successfully predicted surface runoff for the remaining 60% of the grids (mean R2 (RMSE) for the 12 monthly models is 0.83 (6.60 mm)). The lowest R2 was associated with the model for August, when the surface runoff values are least in a year. In all studied watersheds, the highest predictive errors were found within the watershed with greatest topographic complexity, for which the model tended to underestimate surface runoff. For the other 12 watersheds studied, the data-driven model produced smaller and more spatially consistent predictive errors. / Master of Science / Surface runoff data can be valuable to many fields, such as agriculture planning, water resource management, and flood and drought risk assessment. The traditional approach to acquire the surface runoff data is by simulating hydrological models. 
However, running such models requires advanced knowledge of watersheds and of computational techniques. In this study, we build a statistical model that can reproduce monthly surface runoff on a 4-km grid covering 13 watersheds in the Chesapeake Bay area. This model uses publicly accessible climate, land cover, and topographic datasets as predictors, and monthly surface runoff from the SWAT model as the response. We develop 12 models, one for each month, independent of each other. To test whether the model can be generalized to the entire study area, we use 40% of the grid data as the training sample and the remainder for validation. The accuracy statistics (annual mean R² of 0.83 and RMSE of 6.60 mm) show that our model is capable of accurately reproducing monthly surface runoff in our study area. The statistics for the August model are not as satisfactory as those for the other months. A possible reason is that surface runoff in August is the lowest of the year, so there is not enough variation for the algorithm to distinguish minor differences in the response during model building. When applying the model to watersheds with steep terrain, the results should be treated with caution because the error may be relatively large.
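The two accuracy statistics quoted above, R² and RMSE, can be computed as in the following sketch. It assumes R² is the coefficient of determination (1 minus residual over total sum of squares); the thesis may instead report a squared correlation. The runoff values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the response (here mm)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical monthly runoff (mm) at four grid cells
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(round(rmse(observed, predicted), 3), round(r_squared(observed, predicted), 3))
```

The denominator of R² is the variance of the observations, which is why a month with very little runoff variation, such as August in this study, can show a low R² even when absolute errors are small.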
195

Identification and physical characterisation of sarcomere pattern formation using supervised machine learning

Sbosny, Leon 16 May 2024 (has links)
Machine learning algorithms have become increasingly popular for analysing the large amounts of image data that biologists generate with modern microscopes. In collaboration with Frank Schnorrer and Clément Rodier at Institut de Biologie du Developpement de Marseille, as well as Ian Estabrook at Physics of Life, TU Dresden, this thesis applies the supervised machine learning algorithms ‘Support Vector Machine’ and ‘Random Forest’ to data obtained from fluorescence microscope images of myofibrillogenesis in Drosophila pupae, with the aim of identifying sarcomeres, the structures that make up the highly regular myofibrils. For the implementation in MATLAB, methods such as ‘feature engineering’ are used to increase performance by reinterpreting the input data and exploiting physical characteristics of the sample system. The project also identifies the problem of class imbalance between positive and negative examples in the input data and counters it with a redefined learning cost. In conclusion, the use of machine learning algorithms for image analysis in biophysics is a very promising way to reduce manual labour. The choice of the best learning algorithm depends on the purpose the output data should serve.
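The idea of countering class imbalance with a redefined learning cost can be sketched with class weights. The abstract does not give the thesis's actual cost definition, and the thesis was implemented in MATLAB, so the following Python snippet is only an illustrative stand-in: it weights each class inversely to its frequency and applies those weights to a log loss. All names and values are hypothetical.

```python
import math


def class_weights(labels):
    """Weight each class inversely to its frequency (weights sum to n over all examples)."""
    n = len(labels)
    classes = set(labels)
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def weighted_log_loss(labels, probabilities, weights):
    """Mean cross-entropy in which each example is scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        likelihood = p if y == 1 else 1 - p
        total += -weights[y] * math.log(likelihood)
    return total / len(labels)


# Hypothetical data: one positive (sarcomere) example among eight candidates
labels = [1, 0, 0, 0, 0, 0, 0, 0]
weights = class_weights(labels)
loss = weighted_log_loss(labels, [0.5] * len(labels), weights)
print(weights, loss)
```

With these weights a misclassified positive example costs seven times as much as a misclassified negative one, so a classifier can no longer achieve a low cost by predicting the majority class everywhere.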
196

Application and feasibility of visible-NIR-MIR spectroscopy and classification techniques for wetland soil identification

Whatley, Caleb 10 May 2024 (has links) (PDF)
Wetland determinations require the visual identification of anaerobic soil indicators by an expert, which is a complex and subjective task. To eliminate bias, an objective method is needed to identify wetland soil. Currently, no such method exists that is rapid and easily interpretable. This study proposes a method for wetland soil identification using visible through mid-infrared (MIR) spectroscopy and classification algorithms. Wetland and non-wetland soils (n = 440) were collected across Mississippi. Spectra were measured from fresh and dried soil. Support Vector Classification and Random Forest modeling techniques were used to classify spectra with 75%/25% calibration and validation split. POWERSHAP Shapley feature selection and Gini importance were used to locate highest-contributing spectral features. Average classification accuracy was ~91%, with a maximum accuracy of 99.6% on MIR spectra. The most important features were related to iron compounds, nitrates, and soil texture. This study improves the reliability of wetland determinations as an objective and rapid wetland soil identification method while eliminating the need for an expert for determination.
197

Comparação entre métodos de imputação de dados em diferentes intensidades amostrais na série homogênea de precipitação pluvial da ESALQ / Comparison between data imputation methods at different sample intensities in the ESALQ homogeneous rainfall series

Gasparetto, Suelen Cristina 07 June 2019 (has links)
Frequent problems in the statistical analysis of meteorological information are the occurrence of missing data and the lack of knowledge about the homogeneity of the information contained in the database. The objective of this work was to test and classify the homogeneity of the rainfall series of the conventional climatological station of ESALQ from 1917 to 1997, and to compare three data imputation methods at different sample intensities (5%, 10% and 15%) of randomly generated missing data. Three homogeneity tests were applied to the series: Pettitt, Buishand and standard normal. To fill in the missing information, three multiple imputation methods were compared at each intensity of missing data: PMM (Predictive Mean Matching), random forest and linear regression via the bootstrap method. The methods were applied using the MICE (Multivariate Imputation by Chained Equations) package in R. The imputation procedures were compared using the root mean square error, Willmott's accuracy index and the performance index. The rainfall series was classified as class 1, that is, "useful": no clear sign of inhomogeneity was apparent. The method that yielded the lowest root mean square errors and the highest index values was PMM, in particular at the 10% intensity of missing information. The performance index for all three imputation methods at all intensities of missing observations was rated "Terrible".
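One of the comparison measures named above, Willmott's index, can be sketched as follows. This assumes the original form of the index, d = 1 − Σ(Pᵢ − Oᵢ)² / Σ(|Pᵢ − Ō| + |Oᵢ − Ō|)², where O are observed values, P are imputed values and Ō is the observed mean. The thesis itself works in R, and the rainfall values here are hypothetical.

```python
def willmott_d(observed, predicted):
    """Willmott's index of agreement: 1 is perfect agreement, 0 is none."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1 - numerator / denominator


# Hypothetical rainfall values (mm): withheld observations vs. imputed values
observed = [0.0, 5.0, 10.0, 15.0]
imputed = [1.0, 4.0, 12.0, 13.0]

d = willmott_d(observed, imputed)
print(round(d, 4))
```

In an experiment of this kind the index is evaluated only on the values that were deliberately removed, since those are the only positions where the true observation is known.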
198

Remote Sensing of Woodland Structure and Composition in the Sudano-Sahelian zone : Application of WorldView-2 and Landsat 8

Karlson, Martin January 2015 (has links)
Woodlands constitute the subsistence base of the majority of people in the Sudano-Sahelian zone (SSZ), but low availability of in situ data on vegetation structure and composition hampers research and monitoring. This thesis explores the utility of remote sensing for mapping and analysing vegetation, primarily trees, in the SSZ. A comprehensive literature review was first conducted to describe how the application of remote sensing has developed in the SSZ between 1975 and 2014, and to identify important research gaps. Based on the gaps identified in the literature review, the capabilities of two new satellite systems (WorldView-2 and Landsat 8) for mapping woodland structure and composition were tested in an area in central Burkina Faso. The results show that WorldView-2 represents a useful data source for mapping individual trees: 85.4% of the reference trees were detected in the WorldView-2 data and tree crown area was estimated with an average error of 45.6%. In addition, WorldView-2 data produced high classification accuracies for five locally important tree species. The highest overall classification accuracy (82.4%) was produced using multi-temporal WorldView-2 data. Landsat 8 data proved more suitable for mapping tree canopy cover than aboveground biomass in the woodland landscape. Tree canopy cover and aboveground biomass were predicted with 41% and 66% root mean square error, respectively, at pixel level. This thesis demonstrates the potential of easily accessible data from two satellite systems for mapping important tree attributes in woodland areas, and discusses how the usefulness of remote sensing for analysing vegetation can be further enhanced in the SSZ.
199

Modellierung des Unfallgeschehens im Radverkehr am Beispiel der Stadt Dresden / Modelling bicycle accident occurrence using the city of Dresden as an example

Martin, Jacqueline 25 January 2021 (has links)
Cycling traffic in Germany has grown in recent years, which is in turn reflected in a rise in accidents involving cyclists. To counteract the rising accident numbers, policymakers and associations recommend infrastructure measures above all. On this basis, this thesis examines, using the city of Dresden as an example, how individual infrastructure features affect accidents between cyclists and motorized traffic. The data basis comprises 548 accidents involving cyclists from 2015 to 2019 and the features of 484 intersection approaches. Because infrastructure alone does not determine accident occurrence, indicators of traffic volume are also included. To investigate accident occurrence, the Random Forest method and negative binomial regression in the form of 'Accident Prediction Models', with prior variable selection using the LASSO method, are used. Each method is applied to two specific intersection accident types in order to obtain differentiated results. The first accident type, 'turning accident' (Abbiege-Unfall), covers collisions between a right-turning party and a party travelling straight ahead in the same or the opposite direction, while the second accident type, 'turning-in/crossing accident' (Einbiegen-/Kreuzen-Unfall), covers collisions between a road user with right of way and a turning-in or crossing road user who is required to yield. For the 'turning accident' type, the methods show, for example, that a cycle crossing coloured red completely or partly across the intersection, as well as indirect routing of left-turning cyclists instead of routing them in mixed traffic, leads to higher expected accident numbers; the latter appears irrelevant to the situation examined and thus points to a weakness in the inclusion of variables. In contrast, for the 'turning-in/crossing accident' type the methods estimate, for example, higher accident numbers when the number of straight-ahead lanes on an approach increases and when the intersection is regulated by traffic sign Z205 or by partial traffic signals instead of the priority-to-the-right rule. In addition, for both accident types the methods mostly show that the number of accidents rises less steeply above a certain traffic volume. This phenomenon is known in the literature as the 'Safety in Numbers effect'. A comparison of model quality between the accident types also shows that both methods generate better predictions with their model of the 'turning accident' type than with their model of the 'turning-in/crossing accident' type. Furthermore, model quality for each accident type differs only slightly between the two methods, so it can be assumed that both methods yield qualitatively similar models of the respective accident type.
Contents: 1 Introduction; 2 Literature review; 2.1 Safety in Numbers effect; 2.2 Factors influencing bicycle accidents; 3 Fundamentals of accident research; 3.1 Accident categories; 3.2 Accident types; 4 Data; 4.1 Accident data; 4.2 Infrastructure features; 4.3 Overview of variables used; 5 Methodology; 5.1 Correlation analysis; 5.2 Random Forest; 5.2.1 Fundamentals; 5.2.2 Random Forest method; 5.2.3 Model quality criteria; 5.2.4 Variable importance; 5.3 Negative binomial regression; 5.3.1 Fundamentals; 5.3.2 Accident Prediction Models; 5.3.3 Variable selection; 5.3.4 Model quality criteria; 5.3.5 Variable importance; 5.3.6 Model diagnostics; 6 Implementation and results; 6.1 Correlation analysis; 6.2 Random Forest; 6.2.1 Model quality criteria; 6.2.2 Variable importance; 6.3 Negative binomial regression; 6.3.1 Variable selection; 6.3.2 Model quality criteria; 6.3.3 Variable importance; 6.3.4 Model diagnostics; 6.4 Comparison of the two methods; 6.4.1 Model quality criteria; 6.4.2 Variable importance and recommendations for action; 6.5 Comparison with findings from the literature; 7 Critical appraisal; 8 Summary and outlook
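The 'Safety in Numbers' pattern described in this record can be illustrated with a power-law exposure term of the kind often used in accident prediction models. This is only a sketch under stated assumptions: the functional form, the function name and the parameter values below are hypothetical and are not the model fitted in the thesis.

```python
def expected_accidents(volume, scale=0.01, exponent=0.5):
    """Expected accident count as scale * volume**exponent.

    Assumption: an exponent below 1 means accidents grow less than
    proportionally with traffic volume.
    """
    return scale * volume ** exponent


low = expected_accidents(1000)
high = expected_accidents(2000)
ratio = high / low

# Doubling the volume multiplies expected accidents by 2**0.5, not by 2.
print(round(ratio, 3))
```

Under this form the risk per road user falls as volume rises, which is the effect the abstract reports for both accident types.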
200

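Analyse und Vergleich des Modal Splits in den Jahren 2013 und 2018 auf Basis der SrV-Daten mithilfe von Random Forest / Analysis and comparison of the modal split in 2013 and 2018 based on SrV data using Random Forest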

Lins, Stefan Martin 04 March 2021 (has links)
Der hohe Anteil des Verkehrs an den Gesamtemissionen, dem damit verbundenen Beitrag zum Klimawandel sowie der extensive Flächenverbrauch des Individualverkehrs verstärken die politischen Forderungen nach einer Verkehrswende. Das Ziel dieser Arbeit ist es, mithilfe ausführlich methodisch dargestellter Verfahren des maschinellen Lernens ein optimales Klassifikationsmodell zu entwickeln. Dieses ermöglicht die Evaluation und Prognose der Verkehrsmittelwahl und damit den Modal Split auf Basis verschiedener Einflussfaktoren insbesondere im Zeitverlauf zwischen 2013 und 2018. Bisherige Untersuchungen konzentrieren sich auf außereuropäische Gebiete und einmalige Erhebungsdurchläufe. Für die Analyse wird auf die von der Technischen Universität Dresden durchgeführte Mobilitätsbefragung 'SrV - Mobilität in Städten' für die 25 großen deutschen Vergleichsstädte der Jahre 2013 und 2018 zurückgegriffen. Nach der Datenaufbereitung werden unter Verwendung deskriptiver Methoden und Zusammenhangsmaße die einzelnen Merkmalsvariablen auf die Eignung in der Modellbildung beurteilt, um möglichst aussagekräftige Modellergebnisse zu erhalten. Basierend auf CART-Entscheidungsbäumen werden Modelle mit dem Bagging-, Random Forest- und dem Boosting-Algorithmus für beide Jahre erstellt. Zur Einordnung der Effektivität der Modelle werden ebenfalls Modelle für Künstliche Neuronale Netzwerke und der Multinomialen Logistischen Regression für beide Jahre untersucht. Auf Basis von Random Forest, das insgesamt in der Untersuchung mit einer Gesamttrefferquote von 82,9 % (AUC-Wert 0,9458) für 2013 und 79,8 % (AUC-Wert 0,9377) für 2018 die besten Gütemaße erzielt, werden die Einflussfaktoren mittels eines Variable Importance Plots und des Partial Dependence Plots beschrieben und ausgewertet. Insbesondere wird festgestellt, dass Länge und Dauer des Weges und die Verfügbarkeit einer Dauerkarte für den öffentlichen Verkehr den größten Einfluss auf die Verkehrsmittelwahl haben. 
Im Zeitverlauf fällt auf, dass insbesondere MIV-Wege durch Rad- und ÖV-Fahrten substituiert werden, während bei den Fußwegen nur geringe Veränderungen auffallen. Die geschätzten Klassifikationsmodelle erreichen überwiegend herausragende Vorhersagen der Verkehrsmittelwahl, wobei diese Prognosen für das Fahrrad sich am schwierigsten gestalten.:Inhaltsverzeichnis Abbildungsverzeichnis VII Tabellenverzeichnis XI Abkürzungsverzeichnis XIII Symbolverzeichnis XV 1 Einleitung 1 2 Literaturübersicht 3 3 Methodik 5 3.1 Entscheidungsbäume 5 3.1.1 Notation der Baumstruktur 5 3.1.2 Regressionsbäume 6 3.1.3 Klassifikationsbäume 6 3.1.4 Stutzen eines Baumes und Abbruchkriterien 9 3.1.5 Bewertung des Verfahrens 10 3.2 Bagging 11 3.2.1 Idee 11 3.2.2 Bootstrap 12 3.2.3 Subsampling 12 3.2.4 Prinzip des Bagging-Algorithmus 12 3.2.5 Bewertung des Verfahrens und Anpassung 15 3.3 Random Forest 16 3.3.1 Idee 16 3.3.2 Prinzip des Random-Forest-Algorithmus 17 3.3.3 Bewertung des Verfahrens und Anpassung 20 3.3.4 Bewertung der Einflussfaktoren 21 3.4 Boosting 23 3.4.1 Idee 23 3.4.2 Prinzip des AdaBoost-Verfahrens 24 3.4.3 Evaluation 25 3.5 Künstliches Neuronales Netzwerk 25 3.5.1 Idee 26 3.5.2 Prinzip des Künstlichen Neuronalen Netzwerks 26 3.5.3 Evaluation und Anpassungsparameter 29 3.6 Multinomiale Logistische Regression 30 3.7 Gütemaße 30 3.7.1 Trefferquote 30 3.7.2 ROC-Kurve und AUC 30 4 Daten 33 4.1 Datensatz 33 4.2 Datenaufbereitung 34 4.2.1 Auflösung der Multilevelstruktur 34 4.2.2 Daten in der Haushaltsebene 35 4.2.3 Daten in der Personenebene 36 4.2.4 Daten in der Wegeebene 37 4.2.5 Ausreißer und fehlende Werte 37 5 Deskriptive Analyse 39 5.1 Auswertung der kategorialen abhängigen Variablen 39 5.2 Auswertung der kardinalen Variablen 40 5.2.1 Streu- und Lagemaße 40 5.2.2 Korrelation zwischen den kardinalen Variablen 42 5.3 Auswertung der ordinalen und nominalen Variablen 43 5.3.1 Relative Häufigkeiten 43 5.3.2 Beurteilung der ordinalen und nominalen Variablen mithilfe des 
korrigierten Kontingenzkoeffizienten nach Pearson 46 5.4 Analyse statistischer Unterschiede der beiden untersuchten Stichproben 47 6 Ergebnisse der Modelle 49 6.1 Baumbasierte Klassifikationsverfahren 49 6.1.1 CART-Entscheidungsbäume 49 6.1.2 Bagging 52 6.1.3 Random Forest 53 6.1.4 Boosting 66 6.2 Künstliches Neuronales Netzwerk 69 6.3 Multinomiale Logistische Regression 71 7 Fazit 73 8 Kritische Würdigung und Ausblick 75 Literaturverzeichnis XIX Anhang XXV Danksagung LXI / The high share of traffic in total emissions, the associated contribution to climate change and the extensive land consumption of individual traffic reinforce the political demands for a traffic turnaround. The aim of this thesis is to develop an optimal classification model with the help of detailed methodical presented methods of machine learning. This enables the evaluation and forcast of the choice of means of transport and thus the modal split on the basis of various influencing factors, particularly over the course of time between 2013 and 2018. Previous studies have focused on non-European areas and one-off surveys. For the analysis, the mobility survey 'SrV-Mobilität in Städten' carried out by the Technische Universität Dresden for the 25 large German cities in 2013 and 2018 is used. After the data processing, the individual feature variables are assessed for their suitability in the modeling process using descriptive methods and correlation measures in order to obtain the most meaningful model results possible. Based on CART Decision Trees, models with the Bagging, Random Forest and Boosting algorithms are created for both years. To classify the effectiveness of the models, models for Artificial Neural Networks and Multinomial Logistic Regression are also examined for both years. 
Random Forest achieved the best quality measures in the study, with an overall accuracy of 82.9 % (AUC 0.9458) for 2013 and 79.8 % (AUC 0.9377) for 2018; on this basis, the influencing factors are described and evaluated using Variable Importance and Partial Dependence Plots. The length and duration of the trip and the availability of a public transport season ticket are found to have the greatest influence on mode choice. Over time, trips by private motorized transport in particular are replaced by cycling and public transport trips, while walking shows only minor changes. Most of the estimated classification models achieve excellent predictions of mode choice, although these predictions are most difficult for the bicycle.
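
The two quality measures named in the abstract, overall accuracy (Trefferquote) and AUC, can be illustrated with a minimal sketch. The abstract does not say how the multiclass AUC was aggregated, so the unweighted one-vs-rest average below is an assumption, and the mode labels, trips, and class scores are hypothetical toy data, not values from the SrV survey.

```python
def accuracy(y_true, y_pred):
    """Share of trips whose predicted mode equals the observed mode."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def binary_auc(labels, scores):
    """AUC as the probability that a random positive is scored above a
    random negative; ties count as half a win."""
    pos = [s for is_pos, s in zip(labels, scores) if is_pos]
    neg = [s for is_pos, s in zip(labels, scores) if not is_pos]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """Unweighted mean of one-vs-rest AUCs (an assumed aggregation).
    proba[i][k] is the model score of classes[k] for trip i."""
    aucs = []
    for k, cls in enumerate(classes):
        labels = [t == cls for t in y_true]
        scores = [row[k] for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: four transport modes, six trips.
MODES = ["car", "bike", "public_transport", "walk"]
y_true = ["car", "car", "bike", "public_transport", "walk", "bike"]
proba = [
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.5, 0.3, 0.1, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.1, 0.2, 0.1, 0.6],
    [0.1, 0.6, 0.2, 0.1],
]
# Predicted mode = class with the highest score for each trip.
y_pred = [MODES[max(range(len(MODES)), key=row.__getitem__)] for row in proba]

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro one-vs-rest AUC:", macro_ovr_auc(y_true, proba, MODES))
```

In the toy data the one misclassified trip is a bicycle trip scored as a car trip, which loosely mirrors the abstract's remark that the bicycle is the hardest mode to predict; the accuracy drops while the AUC still credits the model for ranking most bike trips above non-bike trips.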
