  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A statistical investigation of the risk factors for tuberculosis

van Woerden, Irene January 2013 (has links)
Tuberculosis (TB) is called a disease of poverty and is the main cause of death from infectious diseases among adults. In 1993 the World Health Organisation (WHO) declared TB a global emergency; however, there were still approximately 1.4 million deaths due to TB in 2011. This thesis contains a detailed study of the existing literature on the global risk factors for TB. The risk factors identified in the literature review that were also available in the NFHS-3 survey were then analysed to determine how well respondents at high risk of TB could be identified. We examined the stigma and misconceptions people have regarding TB and include detailed reports from the existing literature on how a person's wealth, health, education, nutrition, and HIV status affect how likely the person is to have TB. The differences in the risk factor distributions for the TB and non-TB populations were examined, and classification trees, nearest neighbours, and logistic regression models were trialled to determine whether respondents at high risk of TB could be identified. Finally, gender-specific, statistically likely directed acyclic graphs were created to visualise the most likely associations between the variables.
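As a rough illustration of the kind of modelling the abstract describes, the sketch below fits a logistic regression by plain gradient descent on synthetic survey-style risk factors and flags high-risk respondents. The features, coefficients, sample size, and 0.5 threshold are all invented for the example; nothing here is taken from the thesis or the NFHS-3 data.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic stand-in for survey risk factors (e.g. wealth, education, HIV status)
n, d = 500, 4
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -1.0, 0.8, 0.0])  # invented "true" effects
p = 1 / (1 + np.exp(-(X @ true_w)))
y = (rng.uniform(size=n) < p).astype(float)  # simulated TB status

# Logistic regression fitted by gradient descent on the mean log-loss
w = np.zeros(d)
for _ in range(2000):
    pred = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (pred - y)) / n

# Flag respondents whose predicted risk exceeds an (arbitrary) 0.5 cut-off
high_risk = 1 / (1 + np.exp(-(X @ w))) > 0.5
```

In practice the thesis compares this against classification trees and nearest neighbours on the same features, so the fitting step would be swapped out while the thresholding logic stays the same.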
2

Automating the Characterization and Detection of Software Performance Antipatterns Using a Data-Driven Approach

Chalawadi, Ram Kishan January 2021 (has links)
Background: With the increase in automated performance testing strategies, many efforts have been made to detect Software Performance Antipatterns (SPAs). These performance antipatterns have become a major threat to software platforms at the enterprise level, and detecting these anomalies is essential for any company dealing with performance-sensitive software, as these processes should be performed quite often. Due to the complexity of the process, manual identification of performance issues has become challenging and time-consuming. Objectives: This thesis aims to address these issues by developing a tool that automatically characterizes and detects Software Performance Antipatterns. The goal is to automate the parameterization process of the existing approach that helps characterize SPAs, and to improve the interpretation of SPA detection results. These two processes are integrated into a tool designed to be deployed in the CI/CD pipeline. The developed tool is named Chanterelle. Methods: A case study and a survey were used in this research. The case study was conducted at Ericsson. A process similar to the existing approach was automated using Python. A literature review was conducted to identify an appropriate approach to improve the interpretation of SPA detection. A static user validation was conducted with the help of a survey consisting of questions on Chanterelle's feasibility and usability. The responses were provided by Ericsson staff (developers and testers in the field of software performance) after the tool was presented. Results: The results indicate that the automated parameterization and detection process proposed in this thesis has a considerable execution time compared to the existing approaches and helps developers interpret the detection results easily. Moreover, it does not require domain experts to run the tests.
The results of the static user validation show that Chanterelle is feasible and usable as a tool for developers. Conclusions: The validation of the tool suggests that Chanterelle helps developers interpret performance-related bugs easily. It performs the automated parameterization and detection process in a considerable time compared with the existing approaches.
3

Classification of weather conditions based on supervised learning

Safia, Mohamad, Abbas, Rodi January 2023 (has links)
Forecasting the weather remains a challenging task because of the atmosphere's complexity and unpredictable nature. A few of the factors that determine weather conditions, such as rain, clouds, clear skies, and sunshine, include temperature, pressure, humidity, wind speed, and wind direction. Currently, sophisticated physical models are used to forecast weather, but they have several limitations, particularly in terms of computational time. In the past few years, supervised machine learning algorithms have shown great promise for the precise forecasting of meteorological events. Using historical weather data, these strategies train a model to predict future weather. This study employs supervised machine learning techniques, including k-nearest neighbours (KNN), support vector machines (SVM), random forests (RF), and artificial neural networks (ANN), for better weather forecast accuracy. To conduct this study, we employed historical weather data from the Weatherstack API. The data spans several years and contains information on several meteorological variables, including temperature, pressure, humidity, wind speed, and wind direction. The data is preprocessed, which includes normalizing it and dividing it into separate training and testing sets. Finally, the effectiveness of the different models is examined to determine which is best for producing accurate weather forecasts. The results of this study provide information on the application of supervised machine learning methods for weather forecasting and support the creation of better weather prediction models.
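The normalize / split / classify pipeline the abstract describes can be sketched with a hand-rolled k-nearest-neighbours classifier. The data here is synthetic noise standing in for the Weatherstack features, and the binary "rain vs clear" label is invented; the thesis itself compares KNN against SVM, RF, and ANN models on real historical data.

```python
import numpy as np

def normalize(X):
    # Min-max scale each feature column to [0, 1]
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

def knn_predict(X_train, y_train, X_test, k=3):
    # Euclidean-distance k-nearest-neighbours majority vote
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(d)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

rng = np.random.default_rng(0)
# Synthetic stand-in for temperature / pressure / humidity features
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 2] > 0).astype(int)  # toy "rain vs clear" label
X = normalize(X)
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]
acc = (knn_predict(X_train, y_train, X_test, k=5) == y_test).mean()
```

Min-max normalization matters here because kNN is distance-based: without it, a feature measured in hectopascals would dominate one measured in degrees.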
4

Identifying the beginning of a kayak race using velocity signal data

Kvedaraite, Indre January 2023 (has links)
A kayak is a small watercraft that moves over the water. The kayak is propelled by a person sitting inside the hull and paddling with a double-bladed paddle. While kayaking can be casual, it is also a competitive sport in races and even the Olympic Games. It is therefore important to be able to analyse athletes' performance during a race. To study races better, some kayaking teams and organizations have attached sensors to their kayaks. These sensors record various data, which is later used to generate performance reports. However, to generate such reports, the coach must manually pinpoint the beginning of the race, because the sensors collect data before the actual race begins, which may include practice runs, warm-up sessions, or simply standing and waiting. Identifying the race start and the race sequence in the data is tedious and time-consuming work that could be automated. This project proposes an approach to identify kayak races from velocity signal data with the help of a machine learning algorithm. The proposed approach combines several techniques: signal preprocessing, a machine learning algorithm, and a programmatic approach. Three machine learning algorithms were evaluated for detecting the race sequence: Support Vector Machine (SVM), k-Nearest Neighbour (kNN), and Random Forest (RF). SVM outperformed the other algorithms with an accuracy of 95%. A programmatic approach was proposed to identify the start time of the race; its average error is 0.24 seconds. The proposed approach was integrated into a web-based application with a user interface that lets coaches automatically detect the beginning of a kayak race and the race signal sequence.
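A minimal version of the "programmatic" start-detection step might scan the velocity signal for the first sustained sprint. The threshold, hold duration, sampling rate, and toy session below are all assumptions for the sketch; the thesis's actual approach combines this kind of rule with an SVM-detected race sequence.

```python
import numpy as np

def find_race_start(velocity, fs, v_thresh=3.0, hold_s=5.0):
    # Return the first time (s) where velocity stays above v_thresh
    # for at least hold_s seconds, or None if no such run exists.
    hold = int(hold_s * fs)
    run = 0
    for i, above in enumerate(velocity > v_thresh):
        run = run + 1 if above else 0
        if run >= hold:
            return (i - hold + 1) / fs
    return None

fs = 10  # Hz, assumed sensor sampling rate
t = np.arange(0, 120, 1 / fs)
# Toy session: drifting near zero, then a sprint beginning at t = 60 s
velocity = np.where(t < 60, 0.5 + 0.2 * np.sin(t), 5.0)
start = find_race_start(velocity, fs)
```

Requiring the velocity to *stay* above the threshold filters out short bursts from warm-up strokes that a single-sample threshold would misclassify as a race start.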
5

Household’s energy consumption and production forecasting: A multi-step ahead forecast strategies comparison.

Martín-Roldán Villanueva, Gonzalo January 2017 (has links)
In a changing global energy market, the decarbonization of the economy and demand growth are pushing the search for new models away from the existing centralized, non-renewable-based grid. To do so, households have to take on a 'prosumer' role; to help them take optimal actions, a multi-step ahead forecast of their expected energy production and consumption is needed. In multi-step ahead forecasting there are different strategies to perform the forecast: the single-output strategies Recursive, Direct, and DirRec, and the multi-output strategies MIMO and DIRMO. This thesis compares the performance of the different strategies in a 'prosumer' household, using Artificial Neural Networks, Random Forest, and K-Nearest Neighbours Regression to forecast both solar energy production and grid input. The results of this thesis indicate that the proposed methodology performs better than state-of-the-art models on a more detailed household energy consumption dataset. They also indicate that the choice of strategy and model is problem-dependent, and that a strategy selection step should be added to the forecasting methodology. Additionally, the performance of the Recursive strategy is always far from the best, while the DIRMO strategy performs similarly. This makes the latter a suitable option for exploratory analysis.
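The difference between the two single-output families can be sketched with a plain autoregressive model fitted by least squares: the Recursive strategy trains one one-step model and feeds it its own predictions, while the Direct strategy trains a separate model per horizon step. The toy consumption series, lag order, and horizon below are invented; the thesis uses ANN, RF, and kNN regressors instead of this linear stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy hourly consumption series: a daily cycle plus noise
t = np.arange(400)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)

p, H = 24, 6  # lag order and forecast horizon

# Recursive strategy: one one-step model, fed its own predictions
X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
y = series[p:]
w, *_ = np.linalg.lstsq(X, y, rcond=None)
hist = list(series[-p:])
recursive = []
for _ in range(H):
    nxt = float(np.dot(hist[-p:], w))
    recursive.append(nxt)
    hist.append(nxt)  # the prediction becomes an input for the next step

# Direct strategy: a separate model trained for each horizon step h
direct = []
for h in range(1, H + 1):
    Xh = np.column_stack(
        [series[i:len(series) - p - h + 1 + i] for i in range(p)]
    )
    yh = series[p + h - 1:]
    wh, *_ = np.linalg.lstsq(Xh, yh, rcond=None)
    direct.append(float(np.dot(series[-p:], wh)))
```

The trade-off the thesis measures is visible even here: Recursive reuses one model but compounds its own errors over the horizon, while Direct avoids error feedback at the cost of fitting H models.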
6

Contribuição ao estudo dos níveis de energia em sistemas contendo íons Ln3+

Oliveira, Yuri Álisson Rodrigues de 28 July 2016 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / A new approach to describing the crystal field interaction in compounds containing trivalent lanthanide ions is presented. The electrostatic balance of the optically active site, the effective charge of the central ion, and the sign of the crystal field parameters (CCP) are considered determinant factors in the crystal field interaction. The method of the first equivalent neighbours (MENN) was reformulated, and improvements in predicting the CCP and the 7FJ energy levels of Eu3+ were achieved. Moreover, it was possible to predict the lanthanide-nearest neighbour (Ln-NN) charge factors, the maximum overlap of the wave functions of the interacting ions, and the structure of the 7FJ energy levels of Eu3+. Physically acceptable limits were established for the charge factors and for the maximum overlap of the wave functions of the interacting ions. The compounds studied have high symmetry and exhibit a first neighbourhood consisting of oxygen, fluorine, or chlorine ions. The secular-determinant solutions of the energy matrices served as an excellent theoretical framework for the development of the method, which is applied on the basis of the simple overlap model (SOM). This allowed the crystal field interaction to be described by a nonparametric method of simple application. In addition, the overlap factor of the wave functions of the interacting ions, the total effective bonding charge of Eu3+, and the relationships and trends of the crystal field interaction with the chemical species of the NN and the type of Ln composing the main host matrix were predicted. Finally, it was possible to recast the MENN as a more theoretical method by using theoretical data. The results confirm the efficiency and accuracy of the MENN in describing the crystal field interaction in systems containing trivalent lanthanide ions.
7

Analýza experimentálních EKG záznamů / Analysis of experimental ECG

Maršánová, Lucie January 2015 (has links)
This diploma thesis deals with the analysis of experimental electrograms (EG) recorded from isolated rabbit hearts. The theoretical part covers the basic principles of electrocardiography, pathological events in ECGs, automatic ECG classification, and experimental cardiological research. The practical part deals with the manual classification of individual pathological events; these results will be included in the database of EG records currently under development at the Department of Biomedical Engineering at BUT. The manual scoring of the data was discussed with experts. The presence of pathological events within particular experimental periods was then described, and the influence of ischemia on the heart's electrical activity was reviewed. In the last part, morphological parameters calculated from EG beats were statistically analysed with Kruskal-Wallis and Tukey-Kramer tests as well as principal component analysis (PCA), and used as classification features to automatically classify four types of beats. Classification was realized with four approaches: discriminant function analysis, k-nearest neighbours, support vector machines, and a naive Bayes classifier.
8

Automatická klasifikace spánkových fází z polysomnografických dat / Automatic sleep scoring using polysomnographic data

Vávrová, Eva January 2016 (has links)
The thesis is focused on the analysis of polysomnographic signals based on the extraction of chosen parameters in the time, frequency, and time-frequency domains. The parameters are acquired from 30-second segments of EEG, EMG, and EOG signals recorded during different sleep stages. The parameters used for automatic classification of sleep stages are selected according to statistical analysis. The classification is realized by artificial neural networks, a k-NN classifier, and linear discriminant analysis. A program with a graphical user interface was created using Matlab.
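One common frequency-domain parameter for sleep staging is relative spectral band power of an EEG epoch, which can be sketched with a plain FFT periodogram. The sampling rate, band edges, and synthetic alpha-dominated epoch below are assumptions for illustration; the thesis computes its features in Matlab from real polysomnographic recordings.

```python
import numpy as np

def band_power(segment, fs, lo, hi):
    # Relative spectral power in [lo, hi) Hz from the FFT periodogram
    freqs = np.fft.rfftfreq(len(segment), 1 / fs)
    psd = np.abs(np.fft.rfft(segment)) ** 2
    band = psd[(freqs >= lo) & (freqs < hi)].sum()
    return band / psd[1:].sum()  # exclude the DC component

fs = 100  # Hz, assumed EEG sampling rate
t = np.arange(0, 30, 1 / fs)  # one 30-second epoch
rng = np.random.default_rng(2)
# Toy epoch dominated by a 10 Hz alpha rhythm plus noise (wake-like)
eeg = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.normal(size=t.size)
alpha = band_power(eeg, fs, 8, 13)   # alpha band
delta = band_power(eeg, fs, 0.5, 4)  # delta band
```

A feature vector of such band powers per epoch (delta, theta, alpha, beta), together with EMG and EOG parameters, is the kind of input the classifiers in the thesis consume.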
9

A Comparative Study of Machine Learning Algorithms

Le Fort, Eric January 2018 (has links)
The selection of the machine learning algorithm used to solve a problem is an important choice. This paper outlines research measuring three performance metrics for eight different algorithms on a prediction task involving undergraduate admissions data. The algorithms tested are k-nearest neighbours, decision trees, random forests, gradient tree boosting, logistic regression, naive Bayes, support vector machines, and artificial neural networks. These algorithms were compared in terms of accuracy, training time, and execution time. / Thesis / Master of Applied Science (MASc)
10

Estimation robuste de courbes de consommmation électrique moyennes par sondage pour de petits domaines en présence de valeurs manquantes / Robust estimation of mean electricity consumption curves by sampling for small areas in presence of missing values

De Moliner, Anne 05 December 2017 (has links)
In this thesis, we address the problem of robust estimation of mean or total electricity consumption curves by sampling in a finite population, for the entire population and for small areas.
We are also interested in estimating mean curves by sampling in the presence of partially missing trajectories. Indeed, many studies carried out in the French electricity company EDF, for marketing or power grid management purposes, are based on the analysis of mean or total electricity consumption curves at a fine time scale, for different groups of clients sharing some common characteristics. Because of privacy issues and financial costs, it is not possible to measure the electricity consumption curve of each customer, so these mean curves are estimated using samples. In this thesis, we extend the work of Lardin (2012) on mean curve estimation by sampling, focusing on specific aspects of this problem such as robustness to influential units, small area estimation, and estimation in the presence of partially or totally unobserved curves. In order to build robust estimators of mean curves, we adapt the unified approach to robust estimation in finite populations proposed by Beaumont et al. (2013) to the context of functional data. To that purpose we propose three approaches: application of the usual method for real variables on discretised curves, projection on functional spherical principal components or on a wavelet basis, and functional truncation of conditional biases based on the notion of depth. These methods are tested and compared on real datasets, and mean squared error estimators are also proposed. Secondly, we address the problem of small area estimation for functional means or totals. We introduce three methods: a unit-level linear mixed model applied to the scores of functional principal component analysis or to wavelet coefficients, functional regression, and aggregation of individual curve predictions by functional regression trees or functional random forests.
Robust versions of these estimators are then proposed, following the approach to robust estimation based on conditional biases presented before. Finally, we suggest four estimators of mean curves by sampling in the presence of partially or totally unobserved trajectories. The first is a reweighting estimator where the weights are determined using temporal nonparametric kernel smoothing adapted to the context of finite populations and missing data, and the others rely on imputation of missing data. Missing parts of the curves are determined either by using the smoothing estimator presented before, by nearest-neighbour imputation adapted to functional data, or by a variant of linear interpolation that takes into account the mean trajectory of the entire sample. Variance approximations are proposed for each method, and all the estimators are compared on real datasets for various missing data scenarios.
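Of the imputation methods listed, nearest-neighbour imputation for a partially observed curve can be sketched in a few lines: find the complete donor curve closest to the target on its observed portion, then copy the donor's values into the gap. The synthetic load curves, gap location, and plain Euclidean distance below are illustrative assumptions, not the thesis's actual estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy daily load curves: 50 complete curves, 24 hourly points each,
# with randomly shifted daily cycles plus noise
t = np.arange(24)
shifts = rng.integers(0, 24, (50, 1))
curves = np.sin(2 * np.pi * (t + shifts) / 24) + 0.1 * rng.normal(size=(50, 24))

# One curve with an unobserved afternoon block (hours 12-17)
target = curves[0]
missing = np.zeros(24, dtype=bool)
missing[12:18] = True
observed = target.copy()
observed[missing] = np.nan

# Nearest-neighbour imputation: pick the donor curve closest on the
# observed portion and copy its values into the gap
donors = curves[1:]
d = np.linalg.norm(donors[:, ~missing] - observed[~missing], axis=1)
donor = donors[np.argmin(d)]
imputed = observed.copy()
imputed[missing] = donor[missing]
```

The functional variant in the thesis refines this idea with distances and variance approximations suited to curve data, but the donor-and-copy structure is the same.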
