  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Bedömning av fakturor med hjälp av maskininlärning / Invoice Classification using Machine Learning

Hjalmarsson, Martin, Björkman, Mikael January 2017
Today, companies can sell their invoices to a third party in order to capitalize them quickly. This is called factoring, and it has become increasingly popular. For the financial institution that serves as the third party, purchasing an invoice carries a credit risk in case the invoice is not paid, a risk the institution would like to minimize. Aros Kapital is a financial institution that offers factoring as one of its services. This project at Aros Kapital evaluated the possibility of using machine learning to determine whether an invoice is a good or a bad investment. If machine learning performs better than manual handling, it could reduce credit losses and allow more invoices to be purchased, leading to increased profit for Aros. Four machine learning algorithms were compared: decision trees, random forest, AdaBoost and deep neural networks. Beyond the comparison among the four algorithms, they were also compared with Aros's actual decisions and Aros's current rule engine. The results show that random forest was the best-performing algorithm and that it gave a slight improvement over Aros's actual decisions on the tested invoices: random forest achieved an F1 score of 0.35, against 0.22 for Aros.
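The F1 score quoted above is the harmonic mean of precision and recall for the positive class. The following is a minimal sketch of that computation in plain Python; the function name, the label encoding (1 for the positive class) and the toy labels are illustrative assumptions and are not taken from the thesis.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score of the `positive` class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 1, 0, 0, 0, 0]
toy_f1 = f1_score(toy_true, toy_pred)
```

Because F1 ignores true negatives, it is a common choice when the class of interest is rare, which is plausibly the case for unpaid invoices.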

Strategies for Combining Tree-Based Ensemble Models

Zhang, Yi 01 January 2017
Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. Base models are typically trained using different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal is to develop a strategy for combining ensemble classifiers that results in higher classification accuracy than the constituent ensemble models. Three of the best performing tree-based ensemble methods – random forest, extremely randomized tree, and eXtreme gradient boosting model – were used to generate a set of base models. Outputs from classifiers generated by these methods were then combined to create an ensemble classifier. This dissertation systematically investigated methods for (1) selecting a set of diverse base models, and (2) combining the selected base models. The methods were evaluated using public domain data sets which have been extensively used for benchmarking classification models. The research established that applying random forest as the final ensemble method to integrate selected base models and factor scores of multiple correspondence analysis turned out to be the best ensemble approach.
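As a rough illustration of combining the outputs of several base classifiers, the sketch below averages class probabilities across models and picks the most probable class (soft voting). This is a deliberately simpler combiner than the one the dissertation found best (a random forest applied to selected base models and MCA factor scores); the function name and the probability values are assumptions made for the example.

```python
def soft_vote(model_probs):
    """Average class probabilities over models and return the argmax per sample.

    model_probs[m][i][c] is the probability that model m assigns to class c
    for sample i (hypothetical layout chosen for this sketch).
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    predictions = []
    for i in range(n_samples):
        n_classes = len(model_probs[0][i])
        averaged = [
            sum(model[i][c] for model in model_probs) / n_models
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=lambda c: averaged[c]))
    return predictions


# Invented probabilities standing in for three tree-based ensembles.
rf_probs = [[0.6, 0.4], [0.2, 0.8]]
et_probs = [[0.4, 0.6], [0.3, 0.7]]
xgb_probs = [[0.7, 0.3], [0.6, 0.4]]
combined = soft_vote([rf_probs, et_probs, xgb_probs])
```

Averaging helps most when the base models make different errors, which is why the abstract stresses selecting a diverse set of base models before combining them.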

Unconstrained Gaze Estimation Using RGB-D Camera. / Estimation du regard avec une caméra RGB-D dans des environnements utilisateur non-contraints

Kacete, Amine 15 December 2016
In this thesis, we tackle the problem of automatic gaze estimation in unconstrained user environments. The work belongs to the field of computer vision applied to the perception of humans and their behavior. Many industrial solutions are commercially available and provide acceptable gaze estimation accuracy. These solutions often rely on complex hardware, such as arrays of infrared cameras (head-mounted or in a remote system), which makes them intrusive, tightly constrained by the user's environment and unsuitable for large-scale public use. We focus on estimating gaze with cheap, low-resolution, non-intrusive devices such as the Kinect sensor, with the aim of increasing the user's freedom of movement relative to the camera (head movement, user-sensor distance). We develop new methods to address challenging conditions such as head pose changes, varying illumination and large user-sensor distances. We investigated several gaze estimation paradigms. We first developed two automatic gaze estimation systems following two classical approaches: feature-based and semi-appearance-based. The major limitation of these paradigms is that they assume total independence between the eye appearance and head pose blocks. To overcome this limitation, we converged on a novel paradigm that unifies the two components by building a global gaze manifold, and we explored two global approaches, using real and synthetic RGB-D gaze samples respectively.

Comparison of different models for forecasting of Czech electricity market

Kunc, Vladimír January 2017
There is a demand for decision support tools that can model electricity markets and forecast the hourly electricity price. Many different approaches, such as artificial neural networks or support vector regression, are used in the literature. This thesis compares several different estimators under one setting, using available data from the Czech electricity market. The comparison of over 5000 different estimators led to the selection of several best-performing models. The role of historical weather data (temperature, dew point and humidity) is also assessed within the comparison, and it was found that while the inclusion of weather data might lead to overfitting, it is beneficial under the right circumstances. The best-performing approach was Lasso regression estimated using modified LARS.
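To make the Lasso idea concrete, here is a minimal sketch that minimizes (1/(2n))·||y − Xw||² + λ·||w||₁ without an intercept. Note that it uses cyclic coordinate descent with soft-thresholding, not the modified LARS solver named in the abstract; the function names, the regularization value and the toy data are assumptions for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Fit Lasso weights by cyclic coordinate descent (no intercept term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Toy design matrix: the second feature carries almost no signal.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, 0.1, 2.0, 0.1]
weights = lasso_coordinate_descent(X_toy, y_toy, lam=0.1)
```

The L1 penalty sets weakly informative coefficients exactly to zero, which is one way a model can drop inputs (for example weather variables) that would otherwise encourage overfitting.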

Assessing and Improving Methods for the Effective Use of Landsat Imagery for Classification and Change Detection in Remote Canadian Regions

He, Juan Xia January 2016
Canadian remote areas are characterized by a minimal human footprint, restricted accessibility, ubiquitous lichen/snow cover (e.g. Arctic) or continuous forest with water bodies (e.g. Sub-Arctic). Effective mapping of earth surface cover and land cover changes using free medium-resolution Landsat images in remote environments is a challenge due to the presence of spectrally mixed pixels, restricted field sampling and ground truthing, and the often relatively homogenous cover in some areas. This thesis investigates how remote sensing methods can be applied to improve the capability of Landsat images for mapping earth surface features and land cover changes in Canadian remote areas. The investigation is conducted from the following four perspectives: 1) determining the continuity of Landsat-8 images for mapping surficial materials, 2) selecting classification algorithms that best address challenges involving mixed pixels, 3) applying advanced image fusion algorithms to improve Landsat spatial resolution while maintaining spectral fidelity and reducing the effects of mixed pixels on image classification and change detection, and, 4) examining different change detection techniques, including post-classification comparisons and threshold-based methods employing PCA(Principal Components Analysis)-fused multi-temporal Landsat images to detect changes in Canadian remote areas. Three typical landscapes in Canadian remote areas are chosen in this research. The first is located in the Canadian Arctic and is characterized by ubiquitous lichen and snow cover. The second is located in the Canadian sub-Arctic and is characterized by well-defined land features such as highlands, ponds, and wetlands. The last is located in a forested highlands region with minimal built-environment features. 
The thesis research demonstrates that the newly available Landsat-8 images can be a major data source for mapping Canadian geological information in Arctic areas when Landsat-7 is decommissioned. In addition, advanced classification techniques such as a Support-Vector-Machine (SVM) can generate satisfactory classification results in the context of mixed training data and minimal field sampling and truthing. This thesis research provides a systematic investigation on how geostatistical image fusion can be used to improve the performance of Landsat images in identifying surface features. Finally, SVM-based post-classified multi-temporal, and threshold-based PCA-fused bi-temporal Landsat images are shown to be effective in detecting different aspects of vegetation change in a remote forested region in Ontario. This research provides a comprehensive methodology to employ free Landsat images for image classification and change detection in Canadian remote regions.

Predictive models for side effects following radiotherapy for prostate cancer / Modèles prédictifs pour les effets secondaires du traitement du cancer de la prostate par radiothérapie

Ospina Arango, Juan David 16 June 2014
External beam radiotherapy (EBRT) is one of the cornerstones of prostate cancer treatment. The objectives of radiotherapy are, first, to deliver a high dose of radiation to the tumor target (prostate and seminal vesicles) in order to achieve maximal local control and, second, to spare the neighboring organs at risk (mainly the rectum and the bladder) in order to limit side effects. Normal tissue complication probability (NTCP) models are therefore needed to assess the feasibility of a treatment, to inform the patient about the risk of side effects, to derive dose-volume constraints and to compare different treatments. In the context of EBRT, the objectives of this thesis were to find predictors of bladder and rectal complications following treatment; to develop new NTCP models that integrate both dosimetric and patient parameters; to compare the predictive capabilities of these new models with those of the classic NTCP models; and to develop new methodologies for identifying dose patterns correlated with normal tissue complications following EBRT for prostate cancer. A large cohort of patients treated with conformal EBRT for prostate cancer in several prospective French clinical trials was used for the study. In a first step, the incidence of the main genitourinary and gastrointestinal symptoms was described using nonparametric Kaplan-Meier estimation. With another classical approach, logistic regression, some predictors of genitourinary and gastrointestinal complications were identified. The logistic regression models were then represented graphically as nomograms, tools that enable clinicians to rapidly assess the complication risk associated with a treatment and to inform patients. This information can be used by patients and clinicians to select a treatment among several options (e.g. EBRT or radical prostatectomy). In a second step, we proposed the use of random forest (RF), a machine learning technique, to predict the risk of complications following EBRT for prostate cancer. The random forest NTCP model, which includes clinical and patient parameters, outperformed the Lyman-Kutcher-Burman (LKB) NTCP model and logistic regression, as assessed by the area under the receiver operating characteristic (ROC) curve (AUC). In a third step, the 3D dose distribution was studied. A 2D population value decomposition (PVD) technique was extended to a tensorial framework so that it could be applied to 3D volume image analysis. Using this tensorial PVD, a population analysis was carried out to find a dose pattern possibly correlated with a normal tissue complication following EBRT. Also in the context of 3D image population analysis, a spatio-temporal nonparametric mixed-effects model was developed and applied to find an anatomical region where the dose could be correlated with a normal tissue complication following EBRT.
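The area under the ROC curve used to compare the NTCP models can be read as the probability that a randomly chosen patient with a complication receives a higher predicted risk than a randomly chosen patient without one. The sketch below computes it directly from that pairwise definition; the function name, the 0/1 outcome encoding and the scores are hypothetical and unrelated to the thesis data.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented outcomes (1 = complication) and predicted risks.
toy_outcomes = [0, 0, 1, 1]
toy_risks = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_outcomes, toy_risks)
```

This quadratic-time form is fine for small cohorts; a rank-based formulation gives the same value more efficiently on large ones.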

Analys av prestations- och prediktionsvariabler inom fotboll / Analysis of performance and prediction variables in football

Ulriksson, Marcus, Armaki, Shahin January 2017
This thesis aims to explain how different variables describing the run of play in a football match affect the final result. The variables are divided into performance variables and quality variables. The performance variables are based on performance indicators inspired by Hughes and Bartlett (2002). The quality variables describe how good the teams are. To achieve this aim, various classification models are built from both the performance variables and the quality variables. First, we examined which performance indicators were most important. The best model classified about 60% correctly, and clearances and shots on target were the most important performance variables. Next, we examined which prediction variables were best. The best model classified the correct final result in about 88% of the matches. Based on what the authors considered the most important prediction variables, a prediction model with fewer variables was created. It classified about 86% of the matches correctly. The prediction model was built from player ratings, odds on a draw and the referee.

Caractérisation et cartographie de la structure forestière à partir d'images satellitaires à très haute résolution spatiale / Quantification and mapping of forest structure from Very High Resolution (VHR) satellite images

Beguet, Benoît 06 October 2014
Very high spatial resolution (VHR) images such as Pléiades imagery (50 cm panchromatic, 2 m multispectral) allow a detailed description of forest structure (tree distribution and size) at stand level, by exploiting the relationship between the spatial structure of trees and image texture when the pixel size is smaller than the tree dimensions. This information meets the need for a spatial inventory of forest resources at stand level and of their changes due to forest management, land use planning or catastrophic events. The aim is twofold: (1) to assess the potential of VHR image texture for estimating the main forest structure variables (crown diameter, stem diameter, height, density or tree spacing) at stand level; (2) on this basis, to classify the image data at pixel level by forest structure type in order to produce the finest possible spatial information. The main developments concern automated parameter setting, variable selection, multivariate regression modelling and ensemble-based classification (Random Forests). They are tested and evaluated on two sites of the Landes maritime pine forest with three Pléiades images and one Quickbird image acquired under different conditions (season, sun position, viewing angles). The proposed method is generic, and its robustness to image acquisition conditions is evaluated. Results show that fine variations in texture that reflect variations in forest structure are clearly identifiable. Performance in forest variable estimation (RMSE of ~1.1 m for crown diameter, ~3 m for tree height and ~0.9 m for tree spacing) and in forest structure mapping (~82% overall accuracy for the classification of the five main forest structure classes) is satisfactory from an operational perspective. Applying the method to multi-annual images will make it possible to assess its ability to detect and map forest changes such as clear cuts, urban sprawl or storm damage.
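The two evaluation measures reported above, root-mean-square error for the regression of forest variables and overall accuracy for the structure classes, can be sketched in a few lines. The function names, the class labels and all numbers below are invented for illustration and do not come from the thesis.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the reference class."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / len(true_labels)


# Hypothetical crown diameters in metres and hypothetical structure classes.
crown_rmse = rmse([4.0, 6.0], [5.0, 5.0])
class_accuracy = overall_accuracy(
    ["young", "mature", "clearcut", "young", "mature"],
    ["young", "mature", "young", "young", "mature"],
)
```

RMSE is expressed in the unit of the estimated variable, which is why the abstract can report it in metres alongside a unitless classification rate.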

Datamining a využití rozhodovacích stromů při tvorbě Scorecards / Data Mining and use of decision trees by creation of Scorecards

Straková, Kristýna January 2014
The thesis presents a comparison of several selected modeling methods used by financial institutions for (not exclusively) decision-making processes. The first, theoretical part describes well-known modeling methods such as logistic regression, decision trees, neural networks and alternating decision trees, as well as the relatively new method called "Random forest". The practical part of the thesis outlines some processes within financial institutions in which the selected modeling methods are used. Logistic regression, decision trees and decision forests are compared with each other on real data from two financial institutions. Neural networks are not included because their results are difficult to interpret. In conclusion, based on the resulting models, the thesis tries to answer whether logistic regression (the method most widely used by financial institutions) remains the most suitable.

Machine learning methods for seasonal allergic rhinitis studies

Feng, Zijie January 2021
Seasonal allergic rhinitis (SAR) is a disease triggered by allergens and influenced by both environmental and genetic factors. Some researchers have studied SAR using traditional genetic methodologies. A newer technique, single-cell RNA sequencing (scRNA-seq), generates high-dimensional data. We apply two machine learning (ML) algorithms, random forest (RF) and partial least squares discriminant analysis (PLS-DA), for cell source classification and gene selection based on SAR scRNA-seq time-series data from three allergic patients and four healthy controls, denoised by single-cell variational inference (scVI). We additionally propose a new fitting method consisting of bootstrap and cubic smoothing splines to fit the averaged gene expressions per cell from different populations. In summary, we find that both RF and PLS-DA provide high classification accuracy, and that RF is preferable given its stable performance and strong gene-selection ability. Based on our analysis, 10 genes have discriminatory power to classify cells of allergic patients and healthy controls at any time point. Although no literature was found showing direct connections between these 10 genes and SAR, the potential associations are indirectly supported by some studies. This suggests that it may be possible to alert allergic patients before a disease outbreak based on their genetic information. Meanwhile, our experimental results indicate that ML algorithms may uncover relationships between genes and SAR that traditional techniques miss, which will need to be analyzed from a genetics perspective in the future.
