Global ETD Search

321	Comparison of different models for forecasting of Czech electricity market Kunc, Vladimír January 2017 There is a demand for decision support tools that can model electricity markets and forecast the hourly electricity price. Many different approaches, such as artificial neural networks or support vector regression, are used in the literature. This thesis compares several different estimators under a single setting, using available data from the Czech electricity market. The comparison of over 5000 different estimators led to the selection of several best-performing models. The role of historical weather data (temperature, dew point and humidity) is also assessed within the comparison: while the inclusion of weather data might lead to overfitting, it was found to be beneficial under the right circumstances. The best-performing approach was Lasso regression estimated using the modified LARS (Least Angle Regression) algorithm.
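The thesis estimates the Lasso with a modified LARS algorithm. LARS and cyclic coordinate descent minimize the same L1-penalized least-squares objective. The minimal pure-Python sketch below therefore uses coordinate descent instead; this is a swapped-in solver, not the thesis's code. It shows how the penalty can set the weight of an uninformative regressor, such as a weather variable, exactly to zero. The objective scaling (1/(2n)), the penalty `alpha` and the toy data are illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n))*||y - Xw - b||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Coordinate descent is assumed here in place of the (modified) LARS solver.
    X is a list of rows (lists of floats); y is a list of floats.
    Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    # Center features and target; the intercept is recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    w = [0.0] * p
    residual = yc[:]  # residual = yc - Xc @ w (w starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant feature: keep its weight at zero
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated weight.
                residual = [r - c * delta for c, r in zip(col, residual)]
            w[j] = new_w
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, intercept
```

On a toy dataset where the target depends only on the first column, the second column's weight ends at exactly 0.0. The first weight is shrunk slightly (to 1.95 instead of 2 for `alpha=0.1`). Raising `alpha` drives more weights to zero, so in practice it would be tuned on held-out data.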
322	Assessing and Improving Methods for the Effective Use of Landsat Imagery for Classification and Change Detection in Remote Canadian Regions He, Juan Xia January 2016 Canadian remote areas are characterized by a minimal human footprint, restricted accessibility, ubiquitous lichen/snow cover (e.g. Arctic) or continuous forest with water bodies (e.g. Sub-Arctic). Effective mapping of earth surface cover and land cover changes in remote environments using free medium-resolution Landsat images is a challenge. The challenge arises from spectrally mixed pixels, restricted field sampling and ground truthing, and the often relatively homogeneous cover in some areas. This thesis investigates how remote sensing methods can be applied to improve the capability of Landsat images for mapping earth surface features and land cover changes in Canadian remote areas. The investigation is conducted from the following four perspectives:
1) determining the continuity of Landsat-8 images for mapping surficial materials;
2) selecting classification algorithms that best address challenges involving mixed pixels;
3) applying advanced image fusion algorithms to improve Landsat spatial resolution while maintaining spectral fidelity and reducing the effects of mixed pixels on image classification and change detection;
4) examining different change detection techniques, including post-classification comparisons and threshold-based methods employing PCA (Principal Component Analysis)-fused multi-temporal Landsat images, to detect changes in Canadian remote areas.
Three typical landscapes in Canadian remote areas are chosen in this research. The first is located in the Canadian Arctic and is characterized by ubiquitous lichen and snow cover. The second is located in the Canadian sub-Arctic and is characterized by well-defined land features such as highlands, ponds and wetlands. The last is located in a forested highland region with minimal built-environment features.
The thesis research demonstrates that the newly available Landsat-8 images can be a major data source for mapping Canadian geological information in Arctic areas once Landsat-7 is decommissioned. In addition, advanced classification techniques such as a Support Vector Machine (SVM) can generate satisfactory classification results despite mixed training data and minimal field sampling and ground truthing. This research also systematically investigates how geostatistical image fusion can improve the performance of Landsat images in identifying surface features. Two approaches are shown to be effective in detecting different aspects of vegetation change in a remote forested region in Ontario: SVM-based post-classification comparison of multi-temporal images, and threshold-based analysis of PCA-fused bi-temporal Landsat images. Overall, this research provides a comprehensive methodology for using free Landsat images for image classification and change detection in Canadian remote regions. Keywords: image classification, Neural Network, Support Vector Machine, Random Forest, lithology, geostatistical regression kriging, data fusion, land cover change detection, post-classification, threshold-based PCA
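The abstract does not spell out how the PCA-fused change detection is thresholded. One common variant, assumed here, treats the same band at two dates as a two-variable dataset. Unchanged pixels fall along the major principal axis, so large scores on the minor component (PC2) indicate change. The sketch below flags pixels whose PC2 score exceeds `k` standard deviations. The single-band input, `k = 2` and the toy values are hypothetical.

```python
import math


def pca_change_mask(band_t1, band_t2, k=2.0):
    """Flag changed pixels from one band observed at two dates (hypothetical helper).

    The two dates form a 2-variable dataset; the minor principal component
    (PC2) captures departures from the dominant no-change relationship.
    Pixels whose |PC2| exceeds k standard deviations are flagged.
    """
    n = len(band_t1)
    m1 = sum(band_t1) / n
    m2 = sum(band_t2) / n
    a = [v - m1 for v in band_t1]
    b = [v - m2 for v in band_t2]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(v * v for v in a) / (n - 1)
    syy = sum(v * v for v in b) / (n - 1)
    sxy = sum(u * v for u, v in zip(a, b)) / (n - 1)
    # Smaller eigenvalue of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_minor = half_trace - radius
    # Eigenvector for lam_minor; fall back to a coordinate axis if sxy == 0.
    if sxy != 0:
        vx, vy = sxy, lam_minor - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx <= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # PC2 scores of the centered pixels (their mean is zero).
    pc2 = [vx * u + vy * v for u, v in zip(a, b)]
    sd = math.sqrt(sum(s * s for s in pc2) / (n - 1))
    return [abs(s) > k * sd for s in pc2]
```

Consider ten pixels where only the fifth drops from 50 to 10 between dates (all others unchanged). Only that pixel is flagged: its PC2 score is about 23.7, against a threshold of about 17.3.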
323	Predictive models for side effects following radiotherapy for prostate cancer Ospina Arango, Juan David 16 June 2014 External beam radiotherapy (EBRT) is one of the cornerstones of prostate cancer treatment. The objectives of radiotherapy are, firstly, to deliver a high dose of radiation to the tumor (prostate and seminal vesicles) in order to achieve maximal local control and, secondly, to spare the neighboring organs (mainly the rectum and the bladder) to avoid normal tissue complications. Normal tissue complication probability (NTCP) models are therefore needed for several purposes: to assess the feasibility of the treatment, to inform the patient about the risk of side effects, to derive dose-volume constraints and to compare different treatments.
In the context of EBRT, this thesis had four objectives:
- to find predictors of bladder and rectal complications following treatment;
- to develop new NTCP models that integrate both dosimetric and patient parameters;
- to compare the predictive capabilities of these new models with those of the classic NTCP models;
- to develop new methodologies to identify dose patterns correlated with normal tissue complications following EBRT for prostate cancer.
A large cohort of patients treated with conformal EBRT for prostate cancer within several prospective French clinical trials was used for the study. In a first step, the incidence of the main genitourinary and gastrointestinal symptoms was described using nonparametric Kaplan-Meier estimation. With another classical approach, logistic regression, several predictors of genitourinary and gastrointestinal complications were identified. The logistic regression models were then represented graphically as nomograms, a tool that enables clinicians to rapidly assess the complication risks associated with a treatment and to inform patients. Patients and clinicians can use this information to choose among several treatment options (e.g. EBRT or radical prostatectomy). In a second step, we proposed the use of random forest, a machine-learning technique, to predict the risk of complications following EBRT for prostate cancer. The random forest NTCP model includes clinical and patient parameters. Assessed by the area under the receiver operating characteristic (ROC) curve (AUC), it outperformed both the Lyman-Kutcher-Burman (LKB) NTCP model and logistic regression. In a third step, the 3D dose distribution was studied. A 2D population value decomposition (PVD) technique was extended to a tensorial framework so that it could be applied to 3D volume image analysis. Using this tensorial PVD, a population analysis was carried out to find a dose pattern possibly correlated with a normal tissue complication following EBRT.
Also in the context of 3D image population analysis, a spatio-temporal nonparametric mixed-effects model was developed. This model was applied to find an anatomical region where the dose could be correlated with a normal tissue complication following EBRT. Keywords: Prostate radiotherapy, Side effects, Predictive models, Random forest, Mixed-effects models, Nonparametric models
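Two of the tools named in this abstract are simple enough to sketch directly. The first is the Kaplan-Meier estimator used to describe symptom incidence. The second is the ROC AUC used to compare the random forest NTCP model with LKB and logistic regression. The pure-Python functions below are minimal illustrations under the usual conventions: patients censored at a time are still counted at risk at that time, and tied scores count one half. They are not the thesis's code, and the toy inputs are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the probability of remaining symptom-free.

    times: follow-up time per patient; events: 1 if the symptom occurred at
    that time, 0 if the patient was censored (symptom-free when last seen).
    Returns [(time, probability)] at each time where at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # events and censored patients leave the risk set
    return curve


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen patient with a complication
    (label 1) gets a higher predicted risk than one without (label 0).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])` gives symptom-free probabilities of about 0.8, 0.6 and 0.3 at times 2, 3 and 5. The cumulative incidence at each time is one minus that value.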
324	Analysis of performance and prediction variables in football Ulriksson, Marcus, Armaki, Shahin January 2017 This thesis aims to explain how different variables describing the course of a football match affect the final result. These variables are divided into performance variables and quality variables. The performance variables are based on performance indicators inspired by Hughes and Bartlett (2002). The quality variables describe how good the different teams are. To achieve this aim, various classification models are built from both the performance variables and the quality variables. First, the most important performance indicators were investigated. The best model classified about 60 % of matches correctly, and clearances and shots on target were the most important performance variables. Next, the best prediction variables were investigated. The best model classified the final result correctly in about 88 % of matches. A prediction model with fewer variables was then built from what the authors considered the most important prediction variables. This model classified about 86 % of matches correctly. The prediction model was built from player ratings, the odds on a draw, and the referee. Keywords: football, Premier League, performance variables, prediction variables, prediction, classification, machine learning, ensemble models, decision trees, bagging, random forest, boosting, AdaBoost, final result. Subject: Probability Theory and Statistics
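Bagging, one of the ensemble methods listed for this thesis, fits many weak classifiers on bootstrap resamples of the matches and predicts by majority vote. The sketch below is a minimal illustration, not the authors' model. It assumes one-level decision trees (stumps) as base learners and a single hypothetical numeric feature, for example a player-rating difference.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-level decision tree (hypothetical base learner).

    Returns (feature, threshold, left_label, right_label) with the fewest
    training errors for the rule `row[feature] <= threshold`.
    """
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[j] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            # An empty right side (threshold = largest value) reuses the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_bagging(X, y, n_models=25, seed=0):
    """Fit one stump per bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Predict by majority vote over the ensemble."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]
```

Random forests extend this idea by also sampling a random subset of features at each split. Boosting instead fits learners sequentially, reweighting the examples the previous learners got wrong.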
325	Quantification and mapping of forest structure from Very High Resolution (VHR) satellite images Beguet, Benoît 06 October 2014 Very High spatial Resolution (VHR) images such as Pléiades imagery (50 cm panchromatic, 2 m multispectral) allow a detailed description of forest structure (tree distribution and size) at the stand level. They do so by exploiting the relationship between the spatial structure of trees and image texture when the pixel size is smaller than the tree dimensions. This information meets the strong need for a spatial inventory of forest resources at the stand level, and of changes due to forest management, land use or catastrophic events. The aim is twofold:
(1) to assess the potential of VHR satellite image texture for estimating the main forest structure variables (crown diameter, stem diameter, height, density or tree spacing) at the stand level;
(2) on this basis, to classify the images at the pixel level by forest structure type in order to produce the finest possible spatial information.
The main developments concern automated parameter optimization, variable selection, multivariate regression modelling and ensemble-based classification (Random Forests). They are tested and evaluated on two sites of the Landes maritime pine forest using three Pléiades images and one Quickbird image acquired under different conditions (season, sun angle, view angle). The method is generic, and its robustness to image acquisition conditions is evaluated.
Results show that fine variations of texture characteristics related to those of forest structure are clearly identifiable. Forest variable estimation reaches an RMSE of ~1.1 m for crown diameter, ~3 m for tree height and ~0.9 m for tree spacing. Forest structure mapping reaches ~82 % overall accuracy for the classification of the five main forest structure classes. Both results are satisfactory from an operational perspective. Applying the method to multi-annual images will assess its ability to detect and map forest changes such as clear cuts, urban sprawl or storm damage. Keywords: Classification, Feature selection, Random forest, Texture, Forestry, Pléiades, Very high spatial resolution
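The two figures of merit quoted in this abstract are straightforward to compute. RMSE compares field-measured and estimated forest variables (in metres). Overall accuracy is the share of correctly classified pixels in a confusion matrix. The helper names and toy numbers below are illustrative assumptions.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between field measurements and estimates (same units, e.g. metres)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def overall_accuracy(confusion):
    """Share of correctly classified pixels: trace of the confusion matrix over its total.

    confusion[i][j] counts pixels of reference class i assigned to class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total
```

For a hypothetical two-class confusion matrix `[[40, 5], [10, 45]]`, 85 of 100 pixels lie on the diagonal, giving an overall accuracy of 0.85.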
326	Data Mining and use of decision trees by creation of Scorecards Straková, Kristýna January 2014 The thesis compares several selected modeling methods that financial institutions use for decision-making processes, among other purposes. The first, theoretical part describes well-known modeling methods such as logistic regression, decision trees, neural networks and alternating decision trees, as well as a relatively new method called "Random forest". The practical part outlines some processes within financial institutions in which the selected modeling methods are used. Logistic regression, decision trees and decision forests are compared with each other on real data from two financial institutions. The neural network method is not included because its results are difficult to interpret. Finally, based on the resulting models, the thesis tries to answer whether logistic regression (the method most widely used by financial institutions) remains the most suitable.
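Decision trees, one of the methods compared here, grow by repeatedly choosing the split that makes the child nodes purest. The abstract does not name a split criterion. The sketch below assumes the Gini impurity used by CART. It searches one hypothetical numeric applicant attribute for the threshold that best separates defaulting (1) from non-defaulting (0) clients.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = default): 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the rule `value <= threshold`.

    The rule minimizes the size-weighted Gini impurity of the two child nodes.
    """
    n = len(values)
    best_threshold, best_score = None, float("inf")
    # Skip the largest value: it would leave the right child empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

A full tree applies this search recursively to each child node. In a scorecard setting, the resulting leaves (or binned attribute ranges) can then feed a logistic regression.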
327	Machine learning methods for seasonal allergic rhinitis studies Feng, Zijie January 2021 Seasonal allergic rhinitis (SAR) is a disease caused by allergens and influenced by both environmental and genetic factors. Some researchers have studied SAR using traditional genetic methodologies. With technological progress, a new technique called single-cell RNA sequencing (scRNA-seq) has been developed, which can generate high-dimensional data. We apply two machine learning (ML) algorithms, random forest (RF) and partial least squares discriminant analysis (PLS-DA), for cell source classification and gene selection. The analysis is based on SAR scRNA-seq time-series data from three allergic patients and four healthy controls, denoised by single-cell variational inference (scVI). We additionally propose a new fitting method, combining the bootstrap and cubic smoothing splines, to fit the averaged gene expression per cell from different populations. In summary, we find that both RF and PLS-DA provide high classification accuracy. RF is preferable, given its stable performance and strong gene-selection ability. Based on our analysis, 10 genes have the discriminatory power to classify cells of allergic patients and healthy controls at any time point. No literature was found showing direct connections between these 10 genes and SAR, but some studies indirectly support potential associations. This suggests that it may be possible to alert allergic patients before a disease outbreak based on their genetic information. Our experimental results also indicate that ML algorithms may reveal relationships between genes and SAR beyond those found with traditional techniques; these need to be analyzed genetically in future work. Keywords: Machine learning, Seasonal allergic rhinitis, Random forest, Bootstrap, Cubic smoothing splines, Single-cell RNA sequencing. Subject: Probability Theory and Statistics
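The fitting method described here combines the bootstrap with cubic smoothing splines. The sketch below covers the bootstrap half only, assuming percentile intervals for the mean expression of one gene at one time point. The spline smoothing across time points is omitted, and the function name and inputs are hypothetical.

```python
import random


def bootstrap_mean_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values` (hypothetical helper).

    Returns (sample_mean, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample n cells with replacement and record the resampled mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    tail = int(n_boot * (1 - level) / 2)  # number of resamples cut from each tail
    return sum(values) / n, means[tail], means[n_boot - 1 - tail]
```

Repeating this for each time point and population gives point estimates and intervals, to which a smoothing spline over time could then be fitted.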
328	Binary Classification for Predicting Customer Churn Axén, Maja, Karlberg, Jennifer January 2020 Predicting when a customer is about to turn to a competitor can be difficult, yet extremely valuable from a business perspective. The moment a customer stops being considered a customer is known as churn. Churn is widely researched in several industries, mostly for subscription services. However, in industries with non-subscription services and products, defining churn can be a daunting task, and the existing literature does not fully cover this field. This thesis can therefore be seen as a contribution to current research, especially for cases without an established definition of churn. A definition of churn adjusted to DIAKRIT's business is created. DIAKRIT is a company working in the real estate industry, which faces many challenges, such as strong seasonality. The prediction was approached as a supervised problem using three machine learning methods: Logistic Regression, Random Forest and Support Vector Machine. The variables used in the predictions are predominantly activity data. With a relatively high accuracy and AUC score, Random Forest was concluded to be the most reliable model. However, the model clearly cannot separate the classes perfectly. The Random Forest model also produced a relatively high precision. Thus, even though the model is not flawless, the customers it predicts to churn are very likely to churn. Keywords: Churn, Machine Learning, Prediction, Logistic Regression, Random Forest, Support Vector Machine, Customer Profitability, Customer Attrition, User Churn, User Retention, Real estate industry. Subject: Mathematics
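The conclusion that predicted churners are very likely to churn, even though the classes are not perfectly separated, rests on the distinction between precision and recall. The sketch below computes accuracy, precision and recall from hypothetical churn labels (1 = churned). It is illustrative, not the authors' evaluation code.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for churn labels (1 = churned, 0 = stayed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Precision: of the customers flagged as churners, how many churned.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the customers who churned, how many were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}
```

Take `y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]`. Precision is 1.0 (every flagged customer churned), while recall is only 0.5.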
329	Recognition of vehicles using signals sensed by smartphone Nevěčná, Leona January 2018 Owing to developments in recent years, miniaturized sensors are increasingly built into commercially sold smartphones. These include accelerometers, gyroscopes, magnetometers, global positioning system (GPS) receivers and microphones. Using these built-in sensors for human activity recognition, with a view to improving health care, is a widely discussed topic. The advantage of using a smartphone to monitor human movement is that the monitored person already carries the device, so there are no additional costs. The disadvantages are limited storage and battery capacity. Therefore, only the accelerometer, gyroscope, magnetometer and microphone were used, because their combination achieves the best results. The GPS sensor was excluded because of its unreliable sampling and high energy demand. Features were computed from the measured data and used to train the classification model. The highest accuracy was achieved with the Random Forest machine learning method. The main goal of this work was to create an algorithm for transportation mode recognition using signals sensed by a smartphone. The resulting algorithm classifies walking, car, bus, tram, train and bike with 97.4 % accuracy under 20 % holdout validation. It was also tested on a new data set from a tenth volunteer. There, accuracy was computed as the average of the classification recall for each transportation mode, and it reached 90.49 %.
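The abstract reports two evaluation settings: a 20 % holdout split, and accuracy on a new volunteer computed as the average of per-class recall. The sketch below shows both, assuming a simple random shuffle for the split and hypothetical labels. Macro-averaged recall differs from plain accuracy when some transport modes are rarer than others.

```python
import random
from collections import defaultdict


def holdout_split(samples, test_fraction=0.2, seed=0):
    """Shuffle and split samples into (train, test), holding out test_fraction."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def macro_recall(y_true, y_pred):
    """Average of per-class recall, so rare transport modes weigh as much as common ones."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

Suppose the labels are four walk, two bus and one tram sample, and the model gets every walk, one bus and no tram right. Plain accuracy is 5/7 (about 0.71), but macro recall is (1 + 0.5 + 0) / 3 = 0.5.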
330	Novel methods for sleep analysis and classification Navrátilová, Markéta January 2020 This master's thesis deals with methods for sleep analysis and classification. It describes the individual sleep stages, the patterns of biosignals during sleep, and classification methods. Features are extracted from the provided ECG (electrocardiogram), EDA (electrodermal activity) and RIP (respiratory inductance plethysmography) biosignals. Based on these features, the individual sleep stages are classified using a random forest classifier. The classifier parameters are optimized, and the results obtained are then evaluated. The feature set is analyzed using dimensionality reduction methods, and the results are compared with those of the standard classification. A solution for visualizing both the raw signals and the extracted features is designed and implemented. The results achieved are compared with published methods.
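The thesis does not list its features here. The sketch below shows a hypothetical example of features that can be extracted from the ECG for each scoring epoch (conventionally 30 s in sleep staging). It computes mean heart rate and RMSSD, a standard heart-rate-variability measure, from RR intervals in milliseconds. The function names and inputs are assumptions.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def epoch_features(rr_intervals_ms):
    """Hypothetical per-epoch ECG features: mean heart rate (bpm) and RMSSD (ms)."""
    mean_rr = sum(rr_intervals_ms) / len(rr_intervals_ms)
    return {"mean_hr_bpm": 60000.0 / mean_rr, "rmssd_ms": rmssd(rr_intervals_ms)}
```

Each epoch's feature dictionary, combined with EDA and RIP features, would form one row of the input to a random forest that predicts the sleep stage.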
