Global ETD Search

371	Remote Sensing of Urbanization and Environmental Impacts Haas, Jan January 2013 The unprecedented growth of urban areas across the globe is perhaps most apparent in China, which has undergone rapid urbanization since the late 1970s. The need for new residential, commercial and industrial areas leads to new urban regions that challenge sustainable development, the maintenance and creation of a high living standard, and the preservation of ecological functionality. Therefore, timely and reliable information on land-cover changes and their consequent environmental impacts is needed to support sustainable urban development. The objective of this research is to analyse land-cover changes, especially the development of urban areas in terms of speed, magnitude and resulting implications for the natural and rural environment, using satellite imagery, and to quantify environmental impacts with the concepts of ecosystem services and landscape metrics. The study areas are the cities of Shanghai and Stockholm and the three highly urbanized Chinese regions Jing-Jin-Ji, the Yangtze River Delta and the Pearl River Delta. The analyses are based on classification of optical satellite imagery (Landsat TM/ETM+ and HJ-1A/B) over the past two decades. The images were first co-registered and mosaicked, whereupon GLCM texture features were generated and tasseled cap transformations performed to improve class separabilities. The mosaics were classified with a pixel-based SVM and a random forest decision tree ensemble classifier. Based on the classification results, two urbanization indices were derived that indicate both the absolute amount of urban land and the speed of urban development. The spatial composition and configuration of the landscape were analysed using landscape metrics. Environmental impacts were quantified by attributing ecosystem service values to the classifications and observing value changes over time. 
The results from the comparative study between Shanghai and Stockholm show a decrease in all natural land-cover classes and agricultural areas, whereas urban areas increased by approximately 120% in Shanghai, nearly ten times as much as in Stockholm, where no significant land-cover changes other than a 12% urban expansion could be observed. The landscape metrics analysis suggests that fragmentation in both study regions occurred mainly due to the growth of high-density built-up areas in previously more natural environments, while the expansion of low-density built-up areas was for the most part in conjunction with pre-existing patches. Urban growth resulted in ecosystem service value losses of ca. 445 million US dollars in Shanghai, mostly due to a decrease in natural coastal wetlands. In Stockholm, a 4 million US dollar increase in ecosystem service values could be observed, which can be explained by the maintenance and development of urban green spaces. Total urban growth in Shanghai was 1,768 km² compared to 100 km² in Stockholm. Regarding the comparative study of urbanization in the three Chinese regions, a total increase in urban land of about 28,000 km² could be detected, with a simultaneous decrease in ecosystem service values corresponding to ca. 18.5 billion Chinese Yuan Renminbi. The speed and relative urban growth in Jing-Jin-Ji was highest, followed by the Yangtze River Delta and the Pearl River Delta. The increase in urban land occurred predominantly at the expense of cropland. Wetlands decreased due to land reclamation in all study areas. An increase in landscape complexity in terms of land-cover composition and configuration could be detected. Urban growth in Jing-Jin-Ji contributed most to the decrease in ecosystem service values, closely followed by the Yangtze River Delta and the Pearl River Delta. 
/ QC 20130610 Remote Sensing Classification Land Use/Land-Cover Support Vector Machine Random Forest Urbanization Environmental Impact Landscape Metrics Ecosystem Services Remote Sensing Fjärranalysteknik
372	Detection of Vulnerability Scanning Attacks using Machine Learning : Application Layer Intrusion Detection and Prevention by Combining Machine Learning and AppSensor Concepts / Detektering av sårbarhetsscanning med maskininlärning : Detektering och förhindrande av attacker i applikationslagret genom kombinationen av maskininlärning och AppSensor koncept Shahrivar, Pojan January 2022 It is well established that machine learning techniques have been used with great success in other domains and have been leveraged to deal with sources of evolving abuse, such as spam. This study aims to determine whether machine learning techniques can be used to create a model that detects vulnerability scanning attacks using proprietary real-world data collected from tCell, a web application firewall. In this context, a vulnerability scanning attack is defined as an automated process that detects and classifies security weaknesses and flaws in the web application. To test the hypothesis that machine learning techniques can be used to create a detection model, twenty-four models were trained. The models showed a high level of precision and recall, ranging from 91% to 96% and from 85% to 93%, respectively. Although the classification performance was strong, the models were not sufficiently calibrated, which resulted in underconfident predictions. The results can therefore be viewed as a performance baseline. Nevertheless, the results demonstrate an advancement over the simplistic threshold-based techniques developed in the early days of the internet, but further research and development is required to tune and calibrate the models. / It is well established that machine learning techniques have been used with great success in other domains and have been leveraged to deal with sources of growing abuse, such as spam. 
This study aims to determine whether machine learning techniques can be applied to create a model that detects vulnerability scanning attacks using proprietary data collected from tCell, a web application firewall. In this context, a vulnerability scanning attack is defined as an automated process that detects and classifies security flaws and weaknesses in the web application. To test the hypothesis that machine learning techniques can be used to create a detection model, twenty-four models were trained. The models showed a high level of precision and sensitivity, from 91% to 96% and from 85% to 93%, respectively. Although the classification performance was good, the models were not sufficiently calibrated, which resulted in weak confidence in the predictions. The presented results can therefore be seen as a performance baseline. The results show an advance over the simplistic threshold-based techniques developed in the early days of the internet, but further research and development is required to calibrate the models. Vulnerability Scanning Random Forest Web application security Next-Gen Web application Firewall Machine learning Dynamic application security testing Intrusion detection & prevention Computer and Information Sciences Data- och informationsvetenskap
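Record 372 reports precision and recall and notes that the models were underconfident. The following is a minimal sketch of how these quantities could be computed from labels and predicted probabilities. It is not the thesis's code: the function names, the 0.5 threshold, and the simple "accuracy minus mean confidence" gap are assumptions chosen for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def confidence_gap(y_true, prob_positive, threshold=0.5):
    """Accuracy minus mean confidence in the predicted class.

    A positive gap means the model is right more often than its own
    probabilities suggest (underconfidence); a negative gap means
    overconfidence. The threshold is a hypothetical default.
    """
    correct = 0
    confidence_sum = 0.0
    for t, p in zip(y_true, prob_positive):
        pred = 1 if p >= threshold else 0
        # Confidence is the probability assigned to the predicted class.
        confidence_sum += p if pred == 1 else 1.0 - p
        correct += int(pred == t)
    n = len(y_true)
    return correct / n - confidence_sum / n
```

A model that is always correct but only ever outputs probabilities of 0.6 or 0.4 would show a gap of about 0.4, which is the pattern the abstract describes as underconfidence.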
373	Assessing Machine Learning Algorithms to Develop Station-based Forecasting Models for Public Transport : Case Study of Bus Network in Stockholm Movaghar, Mahsa January 2022 Public transport is essential for both residents and city planners because of its environmentally and economically beneficial characteristics. During the past decade, climate change, coupled with fuel and energy crises, has attracted significant attention toward public transportation. Increasing demand for public transport on the one hand and its complexity on the other have made optimum network design quite challenging for city planners. Ridership is affected by numerous variables and features such as space and time. These fluctuations, coupled with inherent uncertainties due to different travel behaviors, make this procedure challenging. Any mismatch between demand and supply can eventually result in great user dissatisfaction and wasted energy. In recent years, owing to new technologies for recording and storing data and advances in data analysis techniques, finding patterns and predicting ridership based on historical data have improved significantly. This study aims to develop forecasting models by regressing boardings on population, time of day, month, and station. Using the available boarding dataset for blue bus line number 4 in Stockholm, Sweden, seven different machine learning algorithms were assessed for prediction: Multiple Linear Regression, Decision Tree, Random Forest, Bayesian Ridge Regression, Neural Networks, Support Vector Machines, and K-Nearest Neighbors. The models were trained and tested on the dataset from 2012 to 2019, before the start of the pandemic. KNN, with an average R-squared of 0.65 in 10-fold cross-validation, was accepted as the best model. This model was then used to predict the reduction in ridership during the pandemic in 2020 and 2021. 
The results showed a reduction of 48.93% in 2020 and 82.24% in 2021 for the studied bus line. Public transport ridership machine learning Multiple Linear Regression Decision Tree Random Forest Bayesian Ridge Regression Neural Networks Support Vector Machines K-Nearest Neighbors Engineering and Technology Teknik och teknologier
374	Proteomics and Machine Learning for Pulmonary Embolism Risk with Protein Markers Awuah, Yaa Amankwah 01 December 2023 This thesis investigates protein markers linked to pulmonary embolism risk using proteomics and statistical methods, employing unsupervised and supervised machine learning techniques. The research analyzes existing datasets, identifies significant features, and examines gender differences through MANOVA. Principal Component Analysis reduces the number of variables from 378 to 59, and Random Forest achieves 70% accuracy. These findings contribute to our understanding of pulmonary embolism and may lead to diagnostic biomarkers. MANOVA reveals significant gender differences, and applying proteomics holds promise for clinical practice and research. Proteomics Dimension Reduction Random Forest Features Extraction Feature Selection MANOVA Lawley-Hotelling’s Test Pillai’s Test Wilks’ Lambda Roy’s Largest Root. Applied Statistics Biostatistics Statistical Models
375	Marginal agricultural land identification in the Lower Mississippi Alluvial Valley Tiwari, Prakash 12 May 2023 This study identified marginal agricultural lands in the Lower Mississippi Alluvial Valley using crop yield prediction models. The Random Forest Regression (RFR) and Multiple Linear Regression (MLR) models were trained and validated using county-level crop yield data, climate data, soil properties, and Normalized Difference Vegetation Index (NDVI). The RFR model outperformed the MLR model in estimating soybean and corn yields, with an index of agreement (d) of 0.98 and 0.96, Nash-Sutcliffe model efficiency (NSE) of 0.88 and 0.93, and root mean square error (RMSE) of 9.34% and 5.84%, respectively. Marginal agricultural lands were estimated at 26,366 hectares using cost and sales price in 2021, while they were estimated at 623,566 hectares using average cost and sales price from 2016 to 2021. The results provide valuable information for land use planners and farmers to update field crops and plan alternative land uses that can generate higher returns while conserving these marginal lands. Marginal Agricultural Land Machine Learning Remote Sensing Soybean Corn Crop Yield Prediction Lower Mississippi Alluvial Valley Random Forest Model Agriculture Food Science
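Record 375 evaluates its yield models with the index of agreement (d), the Nash-Sutcliffe efficiency (NSE), and RMSE expressed as a percentage. A minimal sketch of the standard definitions follows. Expressing RMSE relative to the mean observed value is an assumption here, since the abstract does not say how the percentage was normalized, and the function names are illustrative.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nash_sutcliffe(obs, pred):
    """NSE = 1 - sum((o - p)^2) / sum((o - mean(o))^2)."""
    mean_obs = _mean(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def index_of_agreement(obs, pred):
    """Willmott's d = 1 - sum((o - p)^2) / sum((|p - mean(o)| + |o - mean(o)|)^2)."""
    mean_obs = _mean(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


def rmse_percent(obs, pred):
    """RMSE as a percentage of the mean observation (assumed normalization)."""
    mse = _mean([(o - p) ** 2 for o, p in zip(obs, pred)])
    return 100.0 * math.sqrt(mse) / _mean(obs)
```

A model that always predicts the observed mean scores NSE = 0, so the reported values of 0.88 and 0.93 indicate predictions far better than that baseline.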
376	Effects of Tree Composition and Soil Depth on Structure and Functionality of Belowground Microbial Communities in Temperate European Forests Prada-Salcedo, Luis Daniel, Prada-Salcedo, Juan Pablo, Heintz-Buschart, Anna, Buscot, François, Goldmann, Kezia 19 October 2023 Depending on their tree species composition, forests recruit different soil microbial communities. Likewise, the vertical nutrient gradient along soil profiles impacts these communities and their activities. In forest soils, bacteria and fungi commonly compete, coexist, and interact, which is challenging for understanding the complex mechanisms behind microbial structuring. Using amplicon sequencing, we analyzed bacterial and fungal diversity in relation to forest composition and soil depth. Moreover, employing random forest models, we identified microbial indicator taxa of forest plots composed of either deciduous or evergreen trees, or their mixtures, as well as of three soil depths. We expected that forest composition and soil depth affect bacterial and fungal diversity and community structure differently. Indeed, relative abundances of microbial communities changed more across soil depths than in relation to forest composition. The microbial Shannon diversity was particularly affected by soil depth and by the proportion of evergreen trees. Our results also indicated that bacterial communities are primarily shaped by soil depth, while fungi are influenced by forest tree species composition. An increasing proportion of evergreen trees did not provoke differences in main bacterial metabolic functions, e.g., carbon fixation, degradation, or photosynthesis. However, significant responses related to specialized bacterial metabolisms were detected. Saprotrophic, arbuscular mycorrhizal, and plant pathogenic fungi were related to the proportion of evergreen trees, particularly in topsoil. 
Prominent microbial indicator taxa in the deciduous forests were characterized as r-strategists, whereas K-strategists dominated evergreen plots. Considering forest composition and soil depth simultaneously to unravel differences in microbial communities, metabolic pathways and functional guilds has the potential to shed light on mechanisms that maintain forest soil functionality and provide resistance against disturbances. info:eu-repo/classification/ddc/570 ddc:570
377	A Machine Learning Framework for the Classification of Natura 2000 Habitat Types at Large Spatial Scales Using MODIS Surface Reflectance Data Sittaro, Fabian, Hutengs, Christopher, Semella, Sebastian, Vohland, Michael 02 June 2023 Anthropogenic climate and land use change is causing rapid shifts in the distribution and composition of habitats with profound impacts on ecosystem biodiversity. The sustainable management of ecosystems requires monitoring programmes capable of detecting shifts in habitat distribution and composition at large spatial scales. Remote sensing observations facilitate such efforts as they enable cost-efficient modelling approaches that utilize publicly available datasets and can assess the status of habitats over extended periods of time. In this study, we introduce a modelling framework for habitat monitoring in Germany using readily available MODIS surface reflectance data. We developed supervised classification models that allocate (semi-)natural areas to one of 18 classes based on their similarity to Natura 2000 habitat types. Three machine learning classifiers, i.e., Support Vector Machines (SVM), Random Forests (RF), and C5.0, and an ensemble approach were employed to predict habitat type using spectral signatures from MODIS in the visible-to-near-infrared and short-wave infrared. The models were trained on homogeneous Special Areas of Conservation that are predominantly covered by a single habitat type, with reference data from 2013, 2014, and 2016, and tested against ground truth data from 2010 and 2019 for independent model validation. Individually, the SVM and RF methods achieved better overall classification accuracies (SVM: 0.72–0.93, RF: 0.72–0.94) than the C5.0 algorithm (0.66–0.93), while the ensemble classifier developed from the individual models gave the best performance, with overall accuracies of 94.23% for 2010 and 80.34% for 2019, and also allowed a robust detection of non-classifiable pixels. 
We detected strong variability in the cover of individual habitat types, which was reduced when habitat types were aggregated based on their similarity. Our methodology is capable of providing quantitative information on the spatial distribution of habitats, can differentiate between disturbance events and gradual shifts in ecosystem composition, and could successfully allocate natural areas to Natura 2000 habitat types. info:eu-repo/classification/ddc/620 ddc:620
378	Predicting the Impact of Supply Chain Disruptions Using Statistical Analysis and Machine Learning / Prediktering av följderna från störningar i en försörjningskedja med användning av statistisk analys och maskininlärning Andersson, Hannes, Sjöberg, John January 2023 The dairy business is vulnerable to supply chain disruptions since large safety stocks to cover losses are not always a viable option; it is therefore crucial to maintain a smooth supply chain to ensure stable delivery accuracies. Disruptions are unpredictable and hard to avoid in the supply chain, especially in cases where production errors cause lost production volume. This thesis proposes the use of machine learning and statistical modelling together with data from Arla to predict when a shortage will occur and how long it will last, to allow proactive decision making that mitigates the consequences of the disruption. The aim of this thesis is to create one predictive model for delay and one for duration based on data from multiple products, and to explore how the features and methods used can capture the product-specific characteristics in the data and thereby improve the models. The model used for evaluating these factors was a random forest classifier, and permutation feature importance was used to determine the relevant features for the models. The issue of imbalanced data was handled by first grouping the data and then applying the oversampling method SMOTE. The two models were trained on different datasets: the duration model was trained on all disruptions, and the delay model was trained only on the subset where a shortage had occurred. One finding was that applying SMOTE yielded the best results. The best duration model had an accuracy of 62% with precision and recall of 79% and 76% respectively for the majority class, but very low values for the other classes, with a combined average of 21% and 24%. 
The most important feature for the duration was the quotient describing the lost production. The best delay model had an accuracy of 62% with more accurate predictions over all classes and an average precision and recall of 59% and 57%. The most important feature for the delay was how often a product is produced. / The dairy industry is vulnerable to supply chain disruptions because large safety stocks to cover losses are not always a feasible option; it is therefore crucial to maintain a smooth supply chain to ensure stable delivery levels. Disruptions are unpredictable and hard to avoid in a supply chain, especially in cases where production errors cause reduced production volume. This thesis proposes the use of machine learning and statistical modelling together with data from Arla to predict when a shortage will occur relative to the disruption, as well as the duration of the shortage, to enable proactive decisions that mitigate the consequences of the disruption. The aim of this thesis is to create one predictive model for delay and one for duration based on data from several products, and to investigate how the variables and methods used can capture product-specific characteristics in the data and thereby improve the model. The model used to evaluate these factors was a random forest classifier, and permutation feature importance was used to evaluate the variables used for the models. Imbalanced data was handled by first grouping the data and then applying the oversampling method SMOTE. The two models were trained on different data: the duration model was trained on all disruptions, and the delay model was trained only on the cases where a shortage had occurred. One conclusion was that applying SMOTE gave the best results. 
The best duration model had an accuracy of 62% with precision and recall of 79% and 76% respectively for the majority class, but much lower values for the other classes, with an average precision and recall of 21% and 24%. The most important variable for the duration was the quotient describing the lost production. The best delay model had an accuracy of 62% with more stable predictions across all classes and an average precision and recall of 59% and 57%. The most important variable for the delay was how often a product is produced. Supply chain disruption SMOTE feature engineering machine learning random forest statistics applied mathematics Störning i försörjningskedja maskininlärning matematik statistik Other Mathematics Annan matematik
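Record 378 handles class imbalance with the oversampling method SMOTE. The core idea of SMOTE is to create a synthetic minority sample on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step in plain Python; it is not the thesis's pipeline, and the function name, the default `k`, and the fixed seed are assumptions.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority samples.

    Each synthetic point is x + u * (neighbour - x), where the neighbour is
    drawn from the k nearest minority neighbours of x and u is uniform in
    [0, 1). Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

Because every synthetic point lies between two existing minority samples, oversampling of this kind should be applied to the training split only, so that no interpolated copy of a test sample leaks into training.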
379	Neonatal Sepsis Detection Using Decision Tree Ensemble Methods: Random Forest and XGBoost Al-Bardaji, Marwan, Danho, Nahir January 2022 Neonatal sepsis is a potentially fatal medical condition caused by an infection and is responsible for about 200,000 deaths annually worldwide. With healthcare systems facing constant challenges, there is potential for introducing machine learning models as a diagnostic tool that can be automated within existing workflows and would not entail more work for healthcare personnel. The Herlenius Research Team at Karolinska Institutet has collected neonatal sepsis data that has been used for the development of many machine learning models across several papers. However, none have tried to study decision tree ensemble methods. In this paper, random forest and XGBoost models are developed and evaluated in order to assess their feasibility for clinical practice. The data contained 24 features of vital parameters that are easily collected through a patient monitoring system. The validation and evaluation procedure needed special consideration because the data is grouped at the patient level and is imbalanced. The proposed methods developed in this paper have the potential to be generalized to other similar applications. Finally, using the receiver-operating-characteristic area-under-curve (ROC AUC) measure, both models achieved a ROC AUC of around 0.84. Such results suggest that the random forest and XGBoost models are potentially feasible for clinical practice. Another insight gained was that both methods seemed to perform better with simpler models, suggesting that future work could create a more explainable model. / Neonatal sepsis is a potentially fatal medical condition resulting from an infection and is reported to cause 200,000 deaths annually worldwide. 
With healthcare systems constantly facing challenges, there is potential for machine learning models as diagnostic tools automated within existing workflows without entailing more work for healthcare staff. The Herlenius research team at Karolinska Institutet has collected neonatal sepsis data that has been used to develop many machine learning models across several studies. However, none has tried to examine decision tree ensemble methods. The purpose of this study is to develop and evaluate random forest and XGBoost models in order to assess their potential in clinical practice. The data contains 24 features of vital parameters that are easily collected through patient monitoring systems. The validation and evaluation procedure required special consideration given that the data was grouped at the patient level and was imbalanced. The proposed method has the potential to be generalized to other similar applications. Finally, using the receiver-operating-characteristic area-under-curve (ROC AUC) measure, we could show that both models achieved a ROC AUC of 0.84. Such results suggest that both the random forest and XGBoost models can potentially be used in clinical practice. Another insight was that both methods seemed to perform better with simpler models, which suggests that future work could be to create a more explainable machine learning model. / Bachelor's degree project in electrical engineering 2022, KTH, Stockholm Machine Learning Sepsis Neonatal Sepsis Random Forest XGBoost Imbalanced Data Binary Classification Cross-Validation Hyperparameter Tuning Elektroteknik och elektronik
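Record 379 notes that validation needed special care because the data is grouped at the patient level. One common way to respect such grouping is to split by patient rather than by row, so that no patient contributes samples to both the training and the test fold. The sketch below shows that idea in plain Python; the round-robin assignment and the function name are assumptions, and it does not balance class proportions across folds, which imbalanced data would also call for.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_indices, test_indices) with whole groups held out together.

    `groups` gives one group label (e.g. a patient identifier) per sample.
    Groups are assigned to folds round-robin in sorted order.
    """
    unique_groups = sorted(set(groups))
    if n_splits > len(unique_groups):
        raise ValueError("n_splits cannot exceed the number of groups")
    folds = [set() for _ in range(n_splits)]
    for i, g in enumerate(unique_groups):
        folds[i % n_splits].add(g)
    for held_out in folds:
        test_idx = [i for i, g in enumerate(groups) if g in held_out]
        train_idx = [i for i, g in enumerate(groups) if g not in held_out]
        yield train_idx, test_idx


# Usage: six samples from four hypothetical patients, two folds.
patients = ["a", "a", "b", "c", "c", "d"]
for train_idx, test_idx in group_k_fold(patients, n_splits=2):
    print(train_idx, test_idx)
```

Splitting by row instead would let near-identical measurements from one patient appear on both sides of the split and inflate scores such as ROC AUC.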
380	Neonatal Sepsis Detection With Random Forest Classification for Heavily Imbalanced Data Osman Abubaker, Ayman January 2022 Neonatal sepsis is associated with most cases of mortality in the neonatal intensive care unit. Major challenges in detecting sepsis using suitable biomarkers have led people to look for alternative approaches in the form of Machine Learning techniques. In this project, Random Forest classification was performed on a sepsis data set provided by Karolinska Hospital. We particularly focused on tackling class imbalance in the data using sampling and cost-sensitive techniques. We compare the classification performance of Random Forests in six different setups: four using oversampling and undersampling techniques, one using cost-sensitive learning, and one basic Random Forest. The performance with the oversampling techniques was better, and these setups could identify more sepsis patients than the others. The overall performance was also good, making the methods potentially useful in practice. / Neonatal sepsis is the cause of the majority of mortality in neonatal intensive care. The difficulty of detecting sepsis using biomarkers has led many to look for alternative methods. Machine learning techniques are one such alternative method whose use has recently increased in healthcare and other sectors. In this project, the Random Forest classification algorithm was applied to a sepsis dataset provided by Karolinska Hospital. We focused on handling the class imbalance in the data by using various sampling and cost-sensitive methods. We compared the classification performance of Random Forest with six different setups: four of them used the sampling methods, one used a cost-sensitive method, and one was a plain Random Forest. It turned out that the model's performance increased the most with the oversampling methods. 
The overall classification performance was also good, which makes Random Forests together with the sampling methods potentially useful in practice. / Bachelor's degree project in electrical engineering 2022, KTH, Stockholm Random Forest Neonatal Sepsis Imbalanced Classification Cost-sensitive SMOTE ADASYN CNN Tomek Links Elektroteknik och elektronik
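Record 380 compares sampling methods with cost-sensitive learning for imbalanced sepsis data. In cost-sensitive learning, errors on the rare class are penalized more heavily, often through per-class weights. The sketch below computes the widely used inverse-frequency ("balanced") weights, n_samples / (n_classes × class_count). Whether the thesis used this particular weighting is not stated in the abstract, so treat the choice and the function name as assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Usage: nine non-sepsis samples (0) and one sepsis sample (1).
weights = balanced_class_weights([0] * 9 + [1])
print(weights)
```

With a 9:1 imbalance the rare class receives nine times the weight of the common class, so a misclassified sepsis case costs as much as nine misclassified non-sepsis cases.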
