Global ETD Search

441	News Value Modeling and Prediction using Textual Features and Machine Learning / Modellering och prediktion av nyhetsvärde med textattribut och maskininlärning Lindblom, Rebecca January 2020 News value assessment has long been part of the news media industry and today is often done in real time without any documentation. Editors take many qualitative aspects into consideration when deciding which news stories make it to the front page. This thesis explores how the complex news value assessment process can be translated into a quantitative model, and how those news values can be predicted effectively using machine learning and NLP. Two models for news value were constructed, the correlation between modeled and manual news values was measured for each, and the results show that the more complex model gives a higher correlation. For prediction, different types of features are extracted, Random Forest and SVM are used, and the predictions are evaluated with accuracy, F1-score, RMSE, and MAE. Random Forest shows the best results for all metrics on all datasets, with the best result on the largest dataset, probably because the smaller datasets have a less even distribution between classes. NLP machine learning news value popularity random forest prediction classification modeling read time clicks newspaper swedish nyhetsvärde nyhetsvärdering popularitet tidning maskininlärning Computer and Information Sciences Data- och informationsvetenskap
442	Machine Learning Applications for Downscaling Groundwater Storage Changes Integrating Satellite Gravimetry and Other Observations Agarwal, Vibhor January 2021 No description available. Geographic Information Science Geography Remote Sensing Geophysical Geological Machine Learning GRACE Downscaling Central Valley North China Plain Random Forest Artificial Neural Network Groundwater Depletion Groundwater Storage Iterative forward modeling Leakage correction cross-validation
443	Statistical and Machine Learning for assessment of Traumatic Brain Injury Severity and Patient Outcomes Rahman, Md Abdur January 2021 Traumatic brain injury (TBI) is a leading cause of death in all age groups and a concern for society. However, TBI diagnostics and the prediction of patient outcomes are still lacking in medical science. In this thesis, I used a subset of TBIcare data from Turku University Hospital in Finland to classify severity, patient outcomes, and CT (computerized tomography) findings as positive/negative. The dataset was derived from the comprehensive metabolic profiling of serum samples from TBI patients. The study included 96 TBI patients, of whom 7 were diagnosed as severe (sTBI=7), 10 as moderate (moTBI=10), and 79 as mild (mTBI=79). Among them, there were 85 good recoveries (Good_Recovery=85) and 11 bad recoveries (Bad_Recovery=11), as well as 49 CT positive (CT. Positive=49) and 47 CT negative (CT. Negative=47). There was a total of 455 metabolites (features), excluding the three response variables. Feature selection techniques were applied to retain the most important features while discarding the rest. Subsequently, four methods were used for classification: Ridge regression, Lasso regression, Neural network, and Deep learning. Ridge regression yielded the best results for the binary classifications, namely patient outcomes and CT positive/negative. The accuracy for CT positive/negative was 74% (AUC of 0.74), while the accuracy for patient outcomes was 91% (AUC of 0.91). For severity classification (multi-class classification), neural networks performed well, with a total accuracy of 90%. Despite the limited number of data points, the overall result was satisfactory. TBI (Traumatic brain injury) Metabolites Glasgow coma scale Severity Patient outcomes CT positive /negative Random Forest Boruta Lasso regression Ridge regression Neural network Deep learning. Social Sciences Interdisciplinary
444	Detecting anomalies in data streams driven by a jump-diffusion process / Anomalidetektion i dataströmmar för hopp-diffusionsprocesser Paulin, Carl January 2021 Jump-diffusion processes are often used to model financial time series because they can simulate the random jumps that such series frequently exhibit. These jumps can be seen as anomalies and are essential for financial analysis and model building, which makes them vital to detect. The realized variation, realized bipower variation, and realized semi-variation were tested to see whether they can be used to detect jumps in a jump-diffusion process and whether anomaly detection algorithms can use them as features to improve their accuracy. The algorithms tested were Isolation Forest, Robust Random Cut Forest, and Isolation Forest Algorithm for Streaming Data, where the latter two use streaming data. Data were generated from a Merton jump-diffusion process with a varying jump rate, and each algorithm was tested with each feature and with combinations of features. The performance of each algorithm was measured using the F1-score to compare features and algorithms. The algorithms improved when using the features: Isolation Forest improved with one or more of the named features and reached a higher F1-score than using the features alone to detect jumps. Among the streaming algorithms, Robust Random Cut Forest performed best for every jump rate except the lowest, and a combination of the features gave the highest F1-score for both streaming algorithms. These results show that the features can be used to extract jumps, as anomaly scores, and to improve the accuracy of the algorithms in both batch and stream settings. machine learning ML random forest anomaly detection outlier analysis financial modelling merton jump-diffusion process stochastic process isolation forest IF robust random cut forest RRCF Mathematical Analysis Matematisk analys Probability Theory and Statistics Sannolikhetsteori och statistik Computer Sciences Datavetenskap (datalogi)
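As a rough illustration of the features named in entry 444, the sketch below simulates log-returns of a Merton jump-diffusion process and computes the realized variance and realized bipower variation with their standard textbook definitions. It is not taken from the thesis: the function names, parameter names, and the clipped difference used as a jump indicator are assumptions made for this example, and only the Python standard library is used.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def merton_log_returns(n_steps, dt, mu, sigma, jump_rate, jump_mean, jump_std, seed=0):
    """Simulate log-returns of a Merton jump-diffusion (hypothetical helper).

    Returns the list of log-returns and a list of flags marking steps with a jump.
    """
    rng = random.Random(seed)
    # Compensator: keeps the expected price drift equal to mu despite the jumps.
    k = math.exp(jump_mean + 0.5 * jump_std ** 2) - 1.0
    drift = (mu - 0.5 * sigma ** 2 - jump_rate * k) * dt
    returns, jump_flags = [], []
    for _ in range(n_steps):
        n_jumps = poisson(rng, jump_rate * dt)
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        jumps = sum(rng.gauss(jump_mean, jump_std) for _ in range(n_jumps))
        returns.append(drift + diffusion + jumps)
        jump_flags.append(n_jumps > 0)
    return returns, jump_flags


def realized_variance(returns):
    """Sum of squared returns; captures both diffusion and jump variation."""
    return sum(r * r for r in returns)


def bipower_variation(returns):
    """(pi/2) * sum of |r_i| * |r_{i-1}|; robust to rare large jumps."""
    pairs = zip(returns[1:], returns[:-1])
    return (math.pi / 2.0) * sum(abs(a) * abs(b) for a, b in pairs)


def jump_feature(returns):
    """Clipped difference RV - BV, used here as a simple jump indicator."""
    return max(realized_variance(returns) - bipower_variation(returns), 0.0)
```

Computed over a rolling window, a value such as `jump_feature` could be passed to an anomaly detector as an extra input column. Which windowing and which detector settings the thesis used is not stated in the abstract.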
445	Spatial patterns of humus forms, soil organisms and soil biological activity at high mountain forest sites in the Italian Alps Hellwig, Niels 24 October 2018 The objective of the thesis is the model-based analysis of spatial patterns of decomposition properties on the forested slopes of the montane level (ca. 1200-2200 m a.s.l.) in a study area in the Italian Alps (Val di Sole / Val di Rabbi, Autonomous Province of Trento). The analysis includes humus forms and enchytraeid assemblages as well as pH values, activities of extracellular enzymes and C/N ratios of the topsoil. The first aim is to develop, test and apply data-based techniques for spatial modelling of soil ecological parameters. This methodological approach is based on the concept of digital soil mapping. The second aim is to reveal the relationships between humus forms, soil organisms and soil microbiological parameters in the study area. The third aim is to analyze if the spatial patterns of indicators of decomposition differ between the landscape scale and the slope scale. At the landscape scale, sample data from six sites are used, covering three elevation levels at both north- and south-facing slopes. A knowledge-based approach that combines a decision tree analysis with the construction of fuzzy membership functions is introduced for spatial modelling. According to the sampling design, elevation and slope exposure are the explanatory variables. The investigations at the slope scale refer to one north-facing and one south-facing slope, with 30 sites occurring on each slope. These sites have been derived using conditioned Latin Hypercube Sampling, and thus reasonably represent the environmental conditions within the study area. Predictive maps have been produced in a purely data-based approach with random forests. At both scales, the models indicate a high variability of spatial decomposition patterns depending on the elevation and the slope exposure. 
In general, sites at high elevation on north-facing slopes almost exclusively exhibit the humus forms Moder and Mor. Sites on south-facing slopes and at low elevation also exhibit Mull and Amphimull. The predictions of those enchytraeid species characterized as Mull and Moder indicators match the occurrence of the corresponding humus forms well. Furthermore, for the mineral topsoil, the predictive models show increasing pH values, increasing leucine-aminopeptidase activity, an increasing ratio of alkaline to acid phosphomonoesterase activity and a decreasing C/N ratio from north-facing to south-facing slopes and from high to low elevation. The predicted spatial patterns of indicators of decomposition are basically similar at both scales. However, the patterns are predicted in more detail at the slope scale because of a larger data basis and a higher spatial precision of the environmental covariates. These factors enable the observation of additional correlations between the spatial patterns of indicators of decomposition and environmental influences, for example slope angle and curvature. Both the corresponding results and broad model evaluations have shown that the applied methods are generally suitable for modelling spatial patterns of indicators of decomposition in a heterogeneous high mountain environment. The overall results suggest that the humus form can be used as an indicator of organic matter decomposition processes in the investigated high mountain area. Soil ecological mechanisms Organic matter decomposition Digital soil mapping Spatial modeling Enchytraeids Fuzzy logic Decision tree analysis Random forest 38.60 - Bodenkunde: Allgemeines 42.91 - Terrestrische Ökologie 38.95 - Umweltgeologie, Geoökologie ddc:550
446	Forêt aléatoire pour l'apprentissage multi-vues basé sur la dissimilarité : Application à la Radiomique / Random forest for dissimilarity based multi-view learning : application to radiomics Cao, Hongliu 02 December 2019 The work of this thesis was initiated by a Radiomic learning problem. Radiomics is a medical discipline that aims at the large-scale analysis of data from traditional medical imaging to assist in the diagnosis and treatment of cancer. The main hypothesis of this discipline is that by extracting a large amount of information from the images, we can characterize the specificities of this pathology much better than the human eye can. To achieve this, Radiomics data generally consist of several types of images and/or several types of features (image, clinical, genomic). This thesis approaches this problem from the perspective of Machine Learning (ML) and aims to propose a generic solution, adapted to any learning problem of the same type. To do this, we identify two types of ML problems behind Radiomics: (i) learning from high dimension, low sample size (HDLSS) data and (ii) multi-view learning. The solutions proposed in this manuscript exploit dissimilarity representations obtained using the Random Forest method. The use of dissimilarity representations makes it possible to overcome the well-known difficulties of learning from high dimensional data, and to facilitate the joint analysis of the multiple descriptions, i.e. the views. The contributions of this thesis focus on the use of the dissimilarity measurement embedded in the Random Forest method for HDLSS multi-view learning. In particular, we present three main results: (i) the demonstration and analysis of the effectiveness of this measure for HDLSS multi-view learning; (ii) a new method for measuring dissimilarities from Random Forests, better adapted to this type of learning problem; and (iii) a new way to exploit the heterogeneity of views, using a dynamic combination mechanism. These results have been obtained on radiomic data but also on classical multi-view learning problems. Espace de dissimilarité Forêt aléatoire Apprentissage multi-vue Dimension élevée Taille réduite de l'échantillon Apprentissage de dissimilarité Sélection dynamique Dissimilarity space Random forest Multi-view learning High dimension Low sample size Dissimilarity learning Dynamic selection 006.3
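To make the idea of a dissimilarity "embedded" in a random forest concrete, here is a minimal sketch of the classical proximity-based version: two samples are similar in proportion to the number of trees in which they reach the same leaf. This is an assumption for illustration only. The abstract of entry 446 does not spell out the thesis's own measure or its dynamic combination mechanism, so the function names and the plain averaging across views below are hypothetical stand-ins. The leaf indices are assumed to come from an already trained forest (for example, scikit-learn's `RandomForestClassifier.apply` returns such a table).

```python
def forest_dissimilarity(leaves):
    """Classical random-forest dissimilarity from leaf assignments.

    leaves[i][t] is the index of the leaf reached by sample i in tree t.
    The dissimilarity of two samples is one minus the fraction of trees
    in which both samples fall into the same leaf.
    """
    n_samples = len(leaves)
    n_trees = len(leaves[0])
    matrix = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            matrix[i][j] = matrix[j][i] = 1.0 - shared / n_trees
    return matrix


def average_views(matrices):
    """Combine per-view dissimilarity matrices by plain averaging.

    A static baseline chosen for this sketch, not the dynamic
    combination proposed in the thesis.
    """
    n_views = len(matrices)
    n_samples = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / n_views for j in range(n_samples)]
        for i in range(n_samples)
    ]
```

Because every view is reduced to an n-by-n matrix regardless of how many features it has, the joint representation depends on the number of samples rather than on the feature dimension, which is the property that makes this kind of representation attractive for HDLSS data.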
447	Introduction à l’apprentissage automatique en pharmacométrie : concepts et applications Leboeuf, Paul-Antoine 05 1900 Machine learning offers tools to address the problems of today and tomorrow. Recent breakthroughs in computational sciences and the emergence of big data have brought machine learning to the forefront in both academia and society. The recent achievements of machine learning in natural language, computer vision and medicine speak for themselves. The list of sciences and fields that benefit from machine learning techniques is long. However, attempts at cooperation with pharmacometrics and related sciences remain tentative and few. The aim of this Master's thesis is to explore the potential of machine learning in pharmaceutical sciences. This was done by applying machine learning techniques and methods to situations in clinical pharmacology and pharmacometrics. The project was divided into three parts. The first part proposes an algorithm to strengthen the reliability of the covariate pre-selection step of a population pharmacokinetic model. Random forest and XGBoost were used to support the pre-selection of covariates. The relative variable importance indicators for random forest and XGBoost correctly identified the importance of all the covariates that had an effect on the various parameters of the reference PK model. The second part confirms that plasma concentrations can be estimated with methods other than those currently used in pharmacokinetics. The same algorithms were selected, and their fit for the task was appreciable. The third part confirms that machine learning methods can be used to predict complex relationships typical of clinical pharmacology. Once again, random forest and XGBoost yielded an appreciable fit. Apprentissage automatique Méthodes ensemblistes Pharmacométrie Sciences pharmaceutiques Forêts aléatoires eXtreme Gradient Boosting Machine learning Ensemble methods Pharmacometrics Pharmaceutical sciences Random forest
448	Přehrávač hudby pro Android s výběrem skladem dle kontextu zařízení / Android Music Player with the Song Selection by a Device Context Chmelařová, Gabriela January 2021 This thesis deals with the creation of a context-aware mobile application that selects and recommends music tracks according to the current state of the device context. The context is derived from measured values obtained from the mobile device's built-in sensors and from other system values of the device. The selection of a specific track is then based on the output of a machine learning model, which classifies the context from the currently acquired data and then chooses the track corresponding to that context.
449	Development of Adaptive Computational Algorithms for Manned and Unmanned Flight Safety Elkin, Colin P. January 2018 No description available. Computer Science Computer Engineering flight safety machine learning data fusion human effectiveness unmanned aerial vehicles sliding-scale autonomy cognitive workload artificial neural networks support vector machines decision trees random forest Dempster-Shafer Evidence Theory
450	Effect of Supply Chain Uncertainties on Inventory and Fulfillment Decision Making: An Empirical Investigation Paul, Somak 02 October 2019 No description available. Business Administration
