  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Result Prediction by Mining Replays in Dota 2

Johansson, Filip, Wikström, Jesper January 2015
Context: Real-time games like Dota 2 lack the extensive mathematical modeling of turn-based games that can be used to make objective statements about how best to play them; understanding a real-time computer game through the same kind of modeling is practically impossible. Objectives: In this thesis, an attempt was made to create a machine learning model that can predict the winning team of a Dota 2 game given partial data collected as the game progressed. Several different classifiers were tested; of these, Random Forest was chosen for more in-depth study. Methods: A method was devised for retrieving Dota 2 replays and parsing them into a format that can be used to train classifier models. An experiment was conducted comparing the accuracy of several machine learning algorithms against the Random Forest algorithm in predicting the outcome of Dota 2 games. A further experiment compared the average accuracy of 25 Random Forest models using different settings for the number of trees and attributes. Results: Random Forest had the highest accuracy of the algorithms tested, with the best parameter setting averaging 88.83% accuracy overall and an 82.23% accuracy at the five-minute point. Conclusions: Given the results, it was concluded that partial game-state data can be used to accurately predict the result of an ongoing game of Dota 2 in real time with the application of machine learning techniques.
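The experiment described above can be sketched as follows. This is a minimal illustration, not the thesis code: the feature names and the synthetic data are assumptions standing in for the parsed replay snapshots.

```python
# Hypothetical sketch of the core experiment: train a Random Forest on
# partial game-state snapshots and predict the winning team. The features
# (gold lead, XP lead, kill difference) and the data are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_games = 500
X = rng.normal(size=(n_games, 3))  # illustrative per-snapshot features
# Synthetic label: one team wins when the weighted leads are positive.
y = (X @ np.array([0.8, 0.5, 0.3]) + rng.normal(scale=0.3, size=n_games) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
# The thesis tuned the number of trees and attributes; 100 trees is a guess.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)  # held-out prediction accuracy
```

In the thesis, one such model would be trained per time point (e.g. the five-minute mark) so that accuracy can be tracked as the game progresses.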
112

A knowledge based approach of toxicity prediction for drug formulation : modelling drug vehicle relationships using soft computing techniques

Mistry, Pritesh January 2015
This multidisciplinary thesis is concerned with the prediction of drug formulations for the reduction of drug toxicity. Both scientific and computational approaches are utilised to make original contributions to the field of predictive toxicology. The first part of this thesis provides a detailed scientific discussion of all aspects of drug formulation and toxicity. Discussions focus on the principal mechanisms of drug toxicity and how drug toxicity is studied and reported in the literature. Furthermore, a review of the current technologies available for formulating drugs for toxicity reduction is provided, along with examples of studies that have used these technologies to reduce drug toxicity. The thesis also provides an overview of the computational approaches currently employed in the field of in silico predictive toxicology. This overview focuses on the machine learning approaches used to build predictive QSAR classification models, with examples from the literature. Two methodologies have been developed as part of the main work of this thesis. The first focuses on the use of directed bipartite graphs and Venn diagrams for the visualisation and extraction of drug-vehicle relationships from large un-curated datasets which show changes in the patterns of toxicity. These relationships can be rapidly extracted and visualised using the methodology proposed in chapter 4. The second methodology involves mining large datasets for the extraction of drug-vehicle toxicity data. The methodology uses an area-under-the-curve principle to make pairwise comparisons of vehicles, which are classified according to the toxicity protection they offer, from which predictive classification models based on random forests and decision trees are built. The results of this methodology are reported in chapter 6.
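The area-under-the-curve principle mentioned above can be illustrated with a short sketch. This is not the thesis code; the curve values, dose points, and the "lower area means more protection" labelling convention are assumptions for illustration.

```python
# Illustrative sketch of pairwise vehicle comparison by area under a
# toxicity-response curve: the vehicle whose curve encloses less area is
# taken to offer more toxicity protection. All values are hypothetical.

def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))

def more_protective(curve_a, curve_b):
    """Return 'A' if vehicle A shows a lower toxicity area than B, else 'B'."""
    return "A" if trapezoid_auc(*curve_a) < trapezoid_auc(*curve_b) else "B"

# Hypothetical toxicity readings at four dose points for two vehicles.
vehicle_a = ([0, 1, 2, 3], [0.0, 0.2, 0.3, 0.5])
vehicle_b = ([0, 1, 2, 3], [0.0, 0.4, 0.7, 0.9])
label = more_protective(vehicle_a, vehicle_b)
```

Each pairwise outcome then becomes a class label from which random forest and decision tree models can be trained.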
113

Evaluering och optimering av automatisk beståndsindelning

Brehmer, Dan January 2016
Stand delineation of forests is largely a manual process that requires a great deal of time. Over the past 20 years, techniques such as Airborne Laser Scanning (ALS) have helped streamline the process by generating laser data that enables the creation of easily interpreted images of forest areas. Forest attributes such as tree height, tree density and ground elevation can be extracted from the laser and image data. The aim of the study was to evaluate which attributes were most relevant for distinguishing forest stands in a system that delineated stands automatically. Classification models were used to analyse the relevance of the attributes. Professionals were interviewed and the literature was studied. During the study, the system's algorithms were modified with the ambition of raising its results to a satisfactory level. The study showed that attributes related to silviculture have the greatest relevance for automatic stand delineation. Despite the modifications and the use of relevant attributes, the study could not demonstrate that the system could serve as a stand-alone solution for stand delineation. However, the resulting delineation was suitable as a complement to manual stand delineation.
114

Modelling of patterns between operational data, diagnostic trouble codes and workshop history using big data and machine learning

Virkkala, Linda, Haglund, Johanna January 2016
The work presented in this thesis is part of a large research and development project on condition-based maintenance for heavy trucks and buses at Scania. The aim of this thesis was to predict the status of a component (the starter motor) using data mining methods and to create models that can predict the failure of that component. Based on workshop history data, error codes and operational data, three sets of classification models were built and evaluated. The first model aims to find patterns in a set of error codes, to see which codes are related to a starter motor failure. The second model aims to see if there are patterns in operational data that lead to the occurrence of an error code. Finally, the two data sets were merged and a classifier was trained and evaluated on this larger data set. Two machine learning algorithms were used and compared throughout the model building: AdaBoost and random forest. There is no statistically significant difference in their performance, with error rates of roughly 13%, 5% and 13% for the three classification models respectively. However, random forest is much faster, and is therefore the preferable option for an industrial implementation. Variable analysis was conducted for the error codes and operational data, resulting in rankings of informative variables. From the evaluation metric precision, it can be derived that if our random forest model predicts a starter motor failure, there is an 85.7% chance that it actually has failed. This model finds 32% (the model's recall) of the failed starter motors. It is also shown that four error codes (2481, 2639, 2657 and 2597) have the highest predictive power for starter motor failure classification. For the operational data, variables that concern the starter motor lifetime and battery health are generally ranked as important by the models. The random forest model finds 81.9% of the cases where the 2481 error code occurs. If the random forest model predicts that the error code 2481 will occur, there is an 88.2% chance that it will. The classification performance was not increased when the two data sets were merged, indicating that the patterns detected by the two first classification models do not add value to one another.
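The relationship between the quoted precision and recall figures and a confusion matrix can be made concrete. The counts below are hypothetical, chosen only so that the formulas reproduce the 85.7% precision and 32% recall reported for the starter-motor model.

```python
# Hedged sketch: how precision and recall follow from confusion-matrix
# counts. tp/fp/fn values are hypothetical, picked to match the quoted
# figures (85.7% precision, 32% recall); they are not the thesis data.

def precision(tp, fp):
    """Of all predicted failures, the fraction that actually failed."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all actual failures, the fraction the model finds."""
    return tp / (tp + fn)

tp, fp, fn = 48, 8, 102  # hypothetical counts
p = precision(tp, fp)    # chance a predicted failure is real
r = recall(tp, fn)       # share of real failures found
```

High precision with modest recall, as here, means the model's alarms are trustworthy but many failures go undetected, which is a common trade-off when failures are rare.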
115

Time Series Online Empirical Bayesian Kernel Density Segmentation: Applications in Real Time Activity Recognition Using Smartphone Accelerometer

Na, Shuang 28 June 2017
Time series analysis has been explored by researchers in many areas, such as statistical research, engineering applications, medical analysis, and finance. To represent the data more efficiently, the mining process is supported by time series segmentation. A time series segmentation algorithm looks for the change points between two different patterns and develops a suitable model depending on the data observed in each segment. Given limited computing and storage capacity, it is necessary to consider an adaptive and incremental online segmentation method. In this study, we propose Online Empirical Bayesian Kernel Segmentation (OBKS), which combines Online Multivariate Kernel Density Estimation (OMKDE) and the Online Empirical Bayesian Segmentation (OBS) algorithm. This method uses the online multivariate kernel density as the predictive distribution derived by online empirical Bayesian segmentation, instead of the posterior predictive distribution. The benefit of Online Multivariate Kernel Density Estimation is that it does not require a pre-defined prior, which makes OMKDE more adaptive and adjustable than the posterior predictive distribution. Human Activity Recognition (HAR) with the sensors embedded in smartphones is a modern time series application used in many areas, such as therapeutic applications and automotive sensing. The main procedures in a HAR problem include classification, clustering, feature extraction, dimension reduction, and segmentation. Segmentation, as the first step of HAR analysis, attempts to represent the time interval more effectively and efficiently. The traditional segmentation approach in HAR is to partition the time series into short, fixed-length segments. However, these segments might not be long enough to capture sufficient information for the entire activity interval.
In this research, we first segment the observations of an entire activity as a single interval using the Online Empirical Bayesian Kernel Segmentation algorithm. A smartphone with a built-in accelerometer generates the observations of these activities. Based on the segmentation result, we introduce a two-layer random forest classification method: the first layer identifies the main group, and the second layer analyzes the subgroups within each main group. We evaluate the performance of our method on six activities (sitting, standing, lying, walking, walking_upstairs, and walking_downstairs) from 30 volunteers. Detecting walking_upstairs and walking_downstairs automatically requires more information and more complex features, since these two activities are very similar. For real-time activity recognition on smartphones with embedded accelerometers, the first layer classifies activities as static or dynamic, and the second layer classifies each main group into its sub-classes based on the first layer's result. For the collected data, we obtain an overall accuracy of 91.4% on the six activities, and an overall accuracy of 100% when classifying only between the dynamic activities (walking, walking_upstairs, walking_downstairs) and the static activities (sitting, standing, lying).
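The two-layer classification idea can be sketched briefly. This is an assumption-laden illustration, not the thesis implementation: the features are synthetic stand-ins for accelerometer statistics, and the label coding (0-2 static, 3-5 dynamic) is invented.

```python
# Minimal sketch of two-layer random forest classification: layer 1
# separates static from dynamic activities, layer 2 resolves subclasses
# within the dynamic group. All data below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 4))            # illustrative accelerometer features
activity = rng.integers(0, 6, size=n)  # hypothetical coding: 0-2 static, 3-5 dynamic
is_dynamic = (activity >= 3).astype(int)

layer1 = RandomForestClassifier(n_estimators=50, random_state=0)
layer1.fit(X, is_dynamic)              # layer 1: static vs dynamic

dyn = is_dynamic == 1
layer2 = RandomForestClassifier(n_estimators=50, random_state=0)
layer2.fit(X[dyn], activity[dyn])      # layer 2: which dynamic activity

def predict_two_layer(x):
    """Route a sample through layer 1; consult layer 2 only if dynamic."""
    if layer1.predict(x.reshape(1, -1))[0] == 0:
        return "static"
    return int(layer2.predict(x.reshape(1, -1))[0])

pred = predict_two_layer(X[0])
```

In the real system a static prediction would likewise be refined by a second forest trained on the static subclasses; it is collapsed to a single label here only to keep the sketch short.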
116

Lending Sociodynamics and Drivers of the Financial Business Cycle

J. Hawkins, Raymond, Kuang, Hengyu January 2017
We extend sociodynamic modeling of the financial business cycle to the Euro Area and Japan. Using an opinion-formation model and machine learning techniques, we obtain stable model estimates of the financial business cycle from central bank lending surveys and a few selected macroeconomic variables. We find that banks respond asymmetrically to good and bad economic information, and that banks adapt to their peers' opinions when changing lending policies.
117

Assessing biofilm development in drinking water distribution systems by Machine Learning methods

Ramos Martínez, Eva 02 May 2016
One of the main challenges for drinking water utilities is to ensure a high-quality supply, in particular in chemical and microbiological terms. However, biofilms invariably develop in all drinking water distribution systems (DWDSs), despite the presence of residual disinfectant. As a result, water utilities cannot ensure total bacteriological control. Biofilms currently represent a real paradigm in water quality management for all DWDSs. Biofilms are complex communities of microorganisms bound by an extracellular polymer that provides them with structure, protects them from toxic agents and helps them retain nutrients. Besides the health risk that biofilms pose through their role as a pathogen shelter, a number of additional problems associated with biofilm development in DWDSs can be identified; among others, aesthetic deterioration of the water, biocorrosion and disinfectant decay are universally recognized. A large amount of research has been conducted in this field since the early 1980s. However, owing to the complexity of the environment and the community studied, most studies have been carried out under certain simplifications. We build on this prior work and the knowledge acquired about biofilm growth in DWDSs to change the usual approach of these studies. Our proposal is based on thorough preprocessing and subsequent analysis with Machine Learning approaches. A multi-disciplinary procedure is undertaken as a practical approach to developing a decision-making tool that helps DWDS managers keep biofilm at the lowest possible level and mitigate its negative effects on the service. A methodology is proposed to detect the areas most susceptible to biofilm development in DWDSs. Knowing the location of these hot-spots in the network, mitigation actions can be targeted more specifically, saving resources and money. Prevention programmes can also be developed, acting before the consequences of biofilm are noticed by consumers. In this way, the economic cost would be reduced and the service quality would improve, eventually increasing consumers' satisfaction. / Ramos Martínez, E. (2016). Assessing biofilm development in drinking water distribution systems by Machine Learning methods [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/63257
118

A pipeline for the identification and examination of proteins implicated in frontotemporal dementia

Waury, Katharina January 2020
Frontotemporal dementia is a neurodegenerative disorder with high heterogeneity at the genetic, pathological and clinical levels. The familial form of the disease is mainly caused by pathogenic variants of three genes: C9orf72, MAPT and GRN. As there is no clear correlation between the mutation and the clinical phenotype, symptom severity or age of onset, the demand for predictive biomarkers is high. While no fluid biomarker for frontotemporal dementia is in use yet, there is strong hope that changes in protein concentrations in the blood or cerebrospinal fluid can aid prognosis many years before symptoms develop. Increasing amounts of data are becoming available from long-term studies of families affected by familial frontotemporal dementia, but their analysis is time-consuming and labour-intensive. Within the scope of this project, a pipeline was built for the automated analysis of proteomics data. Specifically, it aims to identify proteins useful for differentiating between two groups using random forest, a supervised machine learning method. The pipeline's results for a data set containing blood plasma protein concentrations of healthy controls and participants affected by frontotemporal dementia were promising, and the generalized functioning of the pipeline was demonstrated on an independent breast cancer proteomics data set.
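The pipeline's core step, ranking proteins by their usefulness for separating two groups, can be sketched with random forest feature importances. The protein panel and the synthetic concentrations below are assumptions; they are not the study's data.

```python
# Hedged sketch: rank proteins by random forest feature importance for
# separating controls from cases. Panel names and data are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
proteins = ["GRN", "NfL", "protein_X", "protein_Y"]  # hypothetical panel
n = 200
X = rng.normal(size=(n, len(proteins)))  # synthetic plasma concentrations
y = rng.integers(0, 2, size=n)           # 0 = control, 1 = case
X[y == 1, 0] += 2.0                      # make the first protein informative

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)
ranking = sorted(zip(proteins, rf.feature_importances_),
                 key=lambda t: t[1], reverse=True)
top_protein = ranking[0][0]  # the protein the forest finds most separating
```

A ranking like this is a screening aid, not a biomarker claim: candidate proteins still need validation on independent cohorts, as the breast cancer data set served to do for the pipeline itself.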
119

Time prediction and process discovery of administration process

Öberg, Johanna January 2020
Machine learning and process mining are two techniques that are becoming increasingly popular among organisations for business intelligence purposes. Results from these techniques can be very useful for organisations' decision-making. The Swedish National Forensic Centre (NFC), an organisation that performs forensic analyses, needs a way to visualise and understand its administration process. In addition, the organisation would like to be able to predict how long analyses will take to perform. In this project, it was evaluated whether machine learning and process mining could be applied to NFC's administration process data to satisfy the organisation's needs. Using the process mining tool Mehrwerk Process Mining, implemented in the software Qlik Sense, different process variants were discovered from the data and visualised in a comprehensible way. The process variants were easy to interpret and useful for NFC. Machine learning regression models were trained on the data to predict analysis length. Two different datasets were tried: a large dataset with few features and a smaller dataset with more features. The models were then evaluated on test datasets. The models did not predict the length of analyses with acceptable accuracy. One reason could be that the information in the data was not sufficient for this prediction.
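The regression experiment described above follows a standard train/evaluate pattern, sketched here under stated assumptions: the features, the random forest regressor, and the synthetic data are stand-ins for NFC's administration records and whichever models the thesis actually used.

```python
# Hedged sketch of the time-prediction experiment: train a regression
# model on case features and evaluate it on a held-out test set.
# Features and data are synthetic; the model choice is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 5))                            # illustrative case features
y = 10 + 3 * X[:, 0] + rng.normal(scale=1.0, size=n)   # analysis length (days)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, reg.predict(X_te))     # lower is better
```

Comparing the test-set error against the spread of analysis lengths is what determines whether predictions are "acceptable"; a model whose error rivals that spread, as in the thesis, adds little over predicting the mean.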
120

Identificación de clientes con patrones de consumo eléctrico fraudulento

Pereira Bizama, Nicole January 2014
Thesis submitted for the degree of Industrial Civil Engineer / The electricity distribution industry in Chile suffers losses every year; in 2012 alone, the company under study recorded losses of more than 6 billion Chilean pesos, whether from theft or from failures in metering equipment, so utilities have a strong interest in finding solutions to mitigate this problem. The aim of this work is to build data mining models that identify consumers with a high propensity for electricity theft. For this purpose, the customers' available historical information from January 2012 to March 2014 was used, including monthly consumption, previous inspections and supply cuts, among other sources. The information was split into two databases according to whether or not a customer had an inspection record during the study period. This split is motivated by the fact that an inspected customer has already passed an inspection filter and, unlike an uninspected customer, it is known with certainty whether they committed fraud. With the data on inspected customers, three classification models were built: logistic regression, decision tree and random forest. In addition, because the data are imbalanced, with 2.2% fraud cases, a weighted logistic regression model was built in parallel; it obtained results similar to the unweighted model, leading to the conclusion that class imbalance does not affect the problem. Using a gain curve as the evaluation metric, the random forest model obtained the best results, capturing 39% of the fraud in the first decile of customers versus 35% for the regression model. Regarding execution time, the random forest model took more than a day to build, while the regression and decision tree models took between 2 and 3 minutes.
Owing to the simplicity of interpreting its results and its short execution time, the (unweighted) logistic regression model was chosen to generate each customer's fraud probability. Applied to the data on uninspected customers, it achieves an expected fraud rate of 8.6%, a figure that exceeds the 2.2% captured in practice and that would translate into an average monthly recovery of more than CLP 7 million if the suggested number of inspections were carried out. Complementarily, with the data on uninspected customers, a clustering model was built to group customers with similar characteristics and identify anomalous cases, those furthest from their group. To establish a point of comparison, the regression model was applied to the list of anomalous cases, obtaining an expected fraud rate of 3.1%. Finally, as a future direction, the incorporation of other information sources is expected to contribute greatly to energy fraud detection, such as more detailed demographic information about customers and a more precise economic analysis allowing better estimates of the benefits to be obtained.
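The gain-curve comparison used above can be illustrated with a short sketch. This is not the thesis code: the customer scores and fraud labels are hypothetical, and the metric is reduced to the single number the abstract cites, the fraction of fraud captured in the first decile.

```python
# Illustrative sketch of the first-decile capture rate from a gain curve:
# sort customers by predicted fraud probability and measure what fraction
# of all fraud falls in the top 10%. Scores and labels are hypothetical.

def first_decile_capture(scores, is_fraud):
    """Fraction of all fraud cases captured in the top 10% of scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    cutoff = max(1, len(scores) // 10)
    captured = sum(is_fraud[i] for i in order[:cutoff])
    return captured / sum(is_fraud)

# 20 hypothetical customers; fraudsters tend to receive higher scores.
scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.05, 0.7, 0.15, 0.25, 0.4,
          0.35, 0.12, 0.22, 0.18, 0.28, 0.33, 0.11, 0.09, 0.45, 0.5]
is_fraud = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
capture = first_decile_capture(scores, is_fraud)
```

This metric matches how the models would be used operationally: with a fixed inspection budget, only the highest-scoring decile of customers gets visited, so capture in that decile is what drives recovered revenue.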
