Global ETD Search

311	Avaliação da segurança de sistemas de potência para múltiplas contingências usando árvore de decisão multicaminhos OLIVEIRA, Werbeston Douglas de 15 September 2017 (has links) CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico / Eletronorte - Centrais Elétricas do Norte do Brasil S/A / A busca de formas eficazes para promover a operação segura de sistemas de potência e aumentar a compreensão dos operadores tem encorajado a pesquisa contínua de novas técnicas e métodos que possam ajudar nessa tarefa. Nesta tese, propõe-se uma abordagem para avaliar a segurança da operação de sistemas de potência para múltiplas contingências usando a técnica árvore de decisão multi-caminhos ou (MDT, do inglês “Multiway Decision Tree”). A MDT difere de outras técnicas de árvore de decisão por estabelecer, na etapa de treinamento, um valor de atributo categórico por ramo. Essa abordagem propõe o uso de topologias (contingências) como atributos categóricos. 
Desta forma, a MDT melhora a interpretabilidade em relação ao estado operacional do sistema de potência, pois o operador pode ver claramente as variáveis críticas para cada topologia, de modo que as regras da MDT possam ser usadas no auxílio à tomada de decisão. Essa abordagem proposta foi utilizada para avaliação da segurança em dois sistemas testes, o sistema IEEE 39 barras e o sistema da parte do Norte do Sistema Interligado Nacional (SIN), sendo que este último foi testado com dados reais de um dia de operação. A técnica proposta baseada em MDT demonstrou bom desempenho, utilizando um conjunto de regras simples e claras. Também foi realizada uma comparação dos resultados obtidos com outras técnicas baseadas em árvores de decisão e verificou-se que o MDT resultou em um procedimento mais simples para a classificação de segurança dos sistemas com boa precisão. / The search for effective ways to promote the secure operation of power systems and to improve operators' understanding of it has encouraged continuous research into new techniques and methods that can help in this task. This thesis proposes an approach to assess power system operation security for multiple contingencies using a multiway decision tree (MDT). The MDT differs from other decision tree techniques by establishing, in the training step, one categorical attribute value per branch. The approach uses topologies (contingencies) as categorical attributes. In this way, it improves interpretability regarding the power system operational state, since the operator can clearly see the critical variables for each topology, so that the MDT rules can be used to aid decision-making. The approach was used for security assessment of two test systems, the IEEE 39-bus system and the Northern part of the Brazilian Interconnected Power System (BIPS); the latter was tested with real data from one day of operation. 
The proposed MDT-based technique demonstrated good performance with a set of simple and clear rules. The results were also compared with those of other DT-based techniques, and the MDT yielded a simpler procedure for power system security classification with good accuracy. Mineração de dados (Computação) Árvores de decisão Estabilidade transitória Divisão multi-caminhos MDT - Multiway Decision Tree Segurança do sistema de potência Múltiplas contingências SISTEMAS ELÉTRICOS DE POTÊNCIA SISTEMAS DE ENERGIA ELÉTRICA
312	REAL-TIME PREDICTION OF SHIMS DIMENSIONS IN POWER TRANSFER UNITS USING MACHINE LEARNING Jansson, Daniel, Blomstrand, Rasmus January 2019 (has links) No description available. MATLAB Machine Learning Smart Manufacturing Classification Regression Power Transfer Units Industry 4.0 Assembly Line Support Vector Machines Artificial Neural Network Partial Least Squares Regression Decision Tree Bagging Bagged Decision Trees Support Vector Regression Machines SVR SVM ANN PTU DT Bagged DTs Robotics Robotteknik och automation
313	以資料採礦方法探討國內數位落差之現象 / Effect of Digital Divide in Taiwan: Data Mining Applications 林建宇, Lin,chien yu Unknown Date (has links) 全球化時代與資訊化社會的來臨，電腦與網際網路成為生活中不可或缺的要素，儘管至2008年為止，我國有將近七成的民眾透過網路科技享受到更多的便利性，但社會上仍存在著數位落差（digital divide）的問題，數位落差除了使得資訊窮人（information-poor）不易取得資訊，亦將對其經濟、人權等各方面造成影響。故研究目的在利用資料採礦的應用，配合SPSS Clementine 12.0的軟體，探討數位落差的現象，並嘗試找出形成數位落差的影響原因。　　本研究主要投入人口統計變數以及生活型態變數，並藉由C5.0決策樹、C&RT分類樹，以及CHAID分類樹建立模型，透過這三個分類迴歸樹的模型，發現到「年齡」、「教育程度」、「地理區域」、「個人資產狀況」、「經濟主要來源：子女」、「個人每月可支配所得」以及「收入來源：薪資」共七項變數同時對民眾是否成為數位落差中的資訊富人（information-rich）有著較重要的影響性，因此，研究最後依據此七項進行政策建議，以提供相關單位之參考。 / In this globalized, information-based society, computers and the Internet are essential elements of daily life. As of 2008, almost 70% of the population in Taiwan enjoyed greater convenience through network technologies. However, the problem of the “digital divide” remains: the information-poor cannot obtain information easily, and this affects their economic situation, human rights, and other aspects of life. The purpose of this research is therefore to examine the digital divide and identify its causes using data mining techniques with the SPSS Clementine 12.0 software. 　　The research uses demographic and lifestyle variables as inputs. Models built with the C5.0 decision tree, C&RT tree, and CHAID tree show that seven variables, “age”, “level of education”, “geographic region”, “personal asset status”, “main source of economic support: children”, “monthly personal disposable income” and “source of income: salary”, all have an important influence on whether a person belongs to the information-rich side of the digital divide. Policy recommendations are therefore made on the basis of these seven variables for the reference of the relevant agencies. 數位落差 資訊富人 資訊窮人 資料採礦 分類迴歸樹 C5.0決策樹 C&RT分類樹 CHAID分類樹 Digital Divide Information-rich Information-poor Data Mining Decision Tree C5.0 C&RT CHAID
314	Success in the protean career : a predictive study of professional artists and tertiary arts graduates Bridgstock, Ruth Sarah January 2007 (has links) In the shift to a globalised creative economy where innovation and creativity are increasingly prized, many studies have documented direct and indirect social and economic benefits of the arts. In addition, arts workers have been argued to possess capabilities which are of great benefit both within and outside the arts, including (in addition to creativity) problem solving abilities, emotional intelligence, and team working skills (ARC Centre of Excellence for Creative Industries and Innovation, 2007). However, the labour force characteristics of professional artists in Australia and elsewhere belie their importance. The average earnings of workers in the arts sector are consistently less than other workers with similar educational backgrounds, and their rates of unemployment and underemployment are much higher (Australian Bureau of Statistics, 2005; Caves, 2000; Throsby & Hollister, 2003). Graduating students in the arts appear to experience similar employment challenges and exhibit similar patterns of work to artists in general. Many eventually obtain work unrelated to the arts or go back to university to complete further tertiary study in fields unrelated to arts (Graduate Careers Council of Australia, 2005a). Recent developments in career development theory have involved discussion of the rise of boundaryless careers amongst knowledge workers. Boundaryless careers are characterised by non-linear career progression occurring outside the bounds of a single organisation or field (Arthur & Rousseau, 1996a, 1996b). The protean career is an extreme form of the boundaryless career, where the careerist also possesses strong internal career motivations and criteria for success (Baruch, 2004; Hall, 2004; Hall & Mirvis, 1996). 
It involves a psychological contract with one's self rather than an organisation or organisations. The boundaryless and protean career literature suggests competencies and dispositions for career self-management and career success, but to date there has been minimal empirical work investigating the predictive value of these competencies and dispositions to career success in the boundaryless or protean career. This program of research employed competencies and dispositions from boundaryless and protean career theory to predict career success in professional artists and tertiary arts graduates. These competencies and dispositions were placed into context using individual and contextual career development influences suggested by the Systems Theory Framework of career development (McMahon & Patton, 1995; Patton & McMahon, 1999, 2006a). Four substantive studies were conducted, using online surveys with professional artists and tertiary arts students / graduates, which were preceded by a pilot study for measure development. A largely quantitative approach to the program of research was preferred, in the interests of generalisability of findings. However, at the time of data collection, there were no quantitative measures available which addressed the constructs of interest. Brief scales of Career Management Competence based on the Australian Blueprint for Career Development (Haines, Scott, & Lincoln, 2003), Protean Career Success Orientation based on the underlying dispositions for career success suggested by protean career theory, and Career Development Influences based on the Systems Theory Framework of career development (McMahon & Patton, 1995; Patton & McMahon, 1999, 2006a) were constructed and validated via a process of pilot testing and exploratory factor analyses. 
This process was followed by confirmatory factor analyses with data collected from two samples: 310 professional artists, and 218 graduating arts students who participated at time 1 (i.e., at the point of undergraduate course completion in October, 2005). Confirmatory factor analyses via Structural Equation Modelling conducted in Study 1 revealed that the scales would benefit from some respecification, and so modifications were made to the measures to enhance their validity and reliability. The three scales modified and validated in Study 1 were then used in Studies 3 and 4 as potential predictors of career success for the two groups of artists under investigation, along with relevant sociodemographic variables. The aim of Study 2 was to explore the construct of career success in the two groups of artists studied. Each participant responded to an open-ended question asking them to define career success. The responses for professional artists were content analysed using emergent coding with two coders. The codebook was later applied to the arts students' definitions. The majority of the themes could be grouped into four main categories: internal definitions; financial recognition definitions; contribution definitions; and non-financial recognition definitions. Only one third of the definition themes in the professional artists' and arts graduates' definitions of career success were categorised as relating to financial recognition. Responses within the financial recognition category also indicated that many of the artists aspired only to a regular subsistence level of arts income (although a small number of the arts graduates did aspire to fame and fortune). The second section of the study investigated the statistical relationships between the five different measures of career success for each career success definitional category and overall. 
The professional artists' and arts graduates' surveys contained several measures of career success, including total earnings over the previous 12 months, arts earnings over the previous 12 months, 1-6 self-rated total employability, 1-6 self-rated arts employability, and 1-6 self-rated self-defined career success. All of the measures were found to be statistically related to one another, but a very strong statistical relationship was identified between each employability measure and its corresponding earnings measure for both of the samples. Consequently, it was decided to include only the earnings measures (earnings from arts, and earnings overall) and the self-defined career success rating measure in the later studies. Study 3 used the career development constructs validated in Study 1, sociodemographic variables, and the career success measures explored in Study 2 via Classification and Regression Tree (CART - Breiman, Friedman, Olshen, & Stone, 1984) style decision trees with v-fold crossvalidation pruning using the 1 SE rule. CART decision trees are a nonparametric analysis technique which can be used as an alternative to OLS or hierarchical regression in the case of data which violates parametric statistical assumptions. The three optimal decision trees for total earnings, arts earnings and self defined career success ratings explained a large proportion of the variance in their respective target variables (R² between 0.49 and 0.68). The Career building subscale of the Career Management Competence scale, pertaining to the ability to manage the external aspects of a career, was the most consistent predictor of all three career success measures (and was the strongest predictor for two of the three trees), indicating the importance of the artists' abilities to secure work and build the external aspects of a career. 
Other important predictors included the Self management subscale of the Career Management Competence scale, Protean Career Success Orientation, length of time working in the arts, and the positive role of interpersonal influences, skills and abilities, and interests and beliefs from the Career Development Influences scale. Slightly different patterns of predictors were found for the three different career success measures. Study 4 also involved the career development constructs validated in Study 1, sociodemographic variables, and the career success measures explored in Study 2 via CART style decision trees. This study used a prospective repeated measures design where the data for the attribute variables were gathered at the point of undergraduate course completion, and the target variables were measured one year later. Data from a total of 122 arts students were used, as 122 of the 218 students who responded to the survey at time 1 (October 2005) also responded at time 2 (October 2006). The resulting optimal decision trees had R² values of between 0.33 and 0.46. The values were lower than those for the professional artists' decision trees, and the trees themselves were smaller, but the R² values nonetheless indicated that the arts students' trees possessed satisfactory explanatory power. The arts graduates' Career building scores at time 1 were strongly predictive of all three career success measures at time 2, a similar finding to the professional artists' trees. A further similarity between the trees for the two samples was the strong statistical relationship between Career building, Self management, and Protean Career Success Orientation. However, the most important variable in the total earnings tree was arts discipline category. Technical / design arts graduates consistently earned more overall than arts graduates from other disciplines. 
Other key predictors in the arts graduates' trees were work experience in arts prior to course completion, positive interpersonal influences, and the positive influence of skills and abilities and interests and beliefs on career development. The research program findings represent significant contributions to existing knowledge about artists' career development and success, and also the transition from higher education to the world of work, with specific reference to arts and creative industries programs. It also has implications for theory relating to career success and protean / boundaryless careers. Career success protean career boundaryless career career development career transition school to work transition university to work transition work experience higher education career self-management career management career education artists arts graduates creative industries creative workforce graduate attributes employability generic skills transferable skills scale development confirmatory factor analysis structural equation modelling content analysis decision tree regression tree CART
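The 1 SE pruning rule mentioned in the abstract above can be illustrated with a short sketch. It assumes the usual CART formulation (pick the smallest tree whose cross-validated error is within one standard error of the best tree); the function name, the candidate trees and the fold errors below are hypothetical illustration data, not values from the thesis.

```python
from math import sqrt
from statistics import mean, stdev


def one_se_choice(candidates):
    """Pick a tree size with the 1 SE rule.

    candidates: list of (n_leaves, fold_errors), where fold_errors holds the
    cross-validated error of that pruned tree on each fold.
    Returns the n_leaves of the smallest tree whose mean error is no worse
    than the best mean error plus one standard error of that best tree.
    """
    stats = []
    for n_leaves, errs in candidates:
        m = mean(errs)
        se = stdev(errs) / sqrt(len(errs))  # standard error of the mean
        stats.append((n_leaves, m, se))

    best = min(stats, key=lambda s: s[1])   # tree with lowest mean CV error
    threshold = best[1] + best[2]           # best error + 1 SE
    eligible = [s for s in stats if s[1] <= threshold]
    return min(eligible, key=lambda s: s[0])[0]


# Hypothetical cross-validation results for four pruned subtrees.
candidates = [
    (2, [0.40, 0.42, 0.38, 0.41, 0.39]),
    (5, [0.285, 0.315, 0.255, 0.295, 0.275]),
    (9, [0.28, 0.31, 0.25, 0.29, 0.27]),
    (20, [0.31, 0.35, 0.27, 0.33, 0.29]),
]
chosen = one_se_choice(candidates)
print(chosen)
```

In this toy data the 9-leaf tree has the lowest mean error, but the 5-leaf tree falls within one standard error of it, so the rule prefers the simpler tree.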
315	[en] MACHINE LEARNING METHODS APPLIED TO PREDICTIVE MODELS OF CHURN FOR LIFE INSURANCE / [pt] MÉTODOS DE MACHINE LEARNING APLICADOS À MODELAGEM PREDITIVA DE CANCELAMENTOS DE CLIENTES PARA SEGUROS DE VIDA THAIS TUYANE DE AZEVEDO 26 September 2018 (has links) [pt] O objetivo deste estudo foi explorar o problema de churn em seguros de vida, no sentido de prever se o cliente irá cancelar o produto nos próximos 6 meses. Atualmente, métodos de machine learning vêm se popularizando para este tipo de análise, tornando-se uma alternativa ao tradicional método de modelagem da probabilidade de cancelamento através da regressão logística. Em geral, um dos desafios encontrados neste tipo de modelagem é que a proporção de clientes que cancelam o serviço é relativamente pequena. Para isso, este estudo recorreu a técnicas de balanceamento para tratar a base naturalmente desbalanceada – técnicas de undersampling, oversampling e diferentes combinações destas duas foram utilizadas e comparadas entre si. As bases foram utilizadas para treinar modelos de Bagging, Random Forest e Boosting, e seus resultados foram comparados entre si e também aos resultados obtidos através do modelo de Regressão Logística. Observamos que a técnica SMOTE-modificado para balanceamento da base, aplicada ao modelo de Bagging, foi a combinação que apresentou melhores resultados dentre as combinações exploradas. / [en] The purpose of this study is to explore the churn problem in life insurance, that is, to predict whether a client will cancel the product within the next 6 months. Machine learning methods are becoming popular for this type of analysis, offering an alternative to the traditional approach of modeling the probability of cancellation through logistic regression. In general, one of the challenges in this type of modelling is that the proportion of clients who cancel the service is relatively small. 
To address this, the study resorted to balancing techniques to treat the naturally unbalanced dataset: under-sampling and over-sampling techniques, and different combinations of the two, were used and compared with each other. The resulting datasets were used to train Bagging, Random Forest and Boosting models, and their results were compared with each other and with those obtained from the logistic regression model. We observed that the modified SMOTE technique for balancing the dataset, applied to the Bagging model, was the combination that gave the best results among those explored. [pt] APRENDIZADO DE MAQUINA [en] MACHINE LEARNING [pt] ARVORE DE DECISAO [en] DECISION TREE [pt] SEGURO DE VIDA [en] LIFE INSURANCE [pt] BOOSTING [en] BOOSTING [pt] PROPENSAO A CANCELAMENTO [en] CANCELLATION PROPENSITY [pt] BAGGING [en] BAGGING [pt] RANDOM FOREST [en] RANDOM FOREST [pt] DADO DESBALANCEADO [en] UNBALANCED DATA [pt] UNDER SAMPLING [en] UNDER SAMPLING [pt] OVER SAMPLING [en] OVER SAMPLING [pt] SMOTE [en] SMOTE
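The balancing step described in the abstract above can be sketched with plain random under-sampling and over-sampling. This is only a minimal illustration under stated assumptions: it is not the modified SMOTE variant the study reports as best, and the function names and the toy churn data are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows per class label, keeping first-seen class order."""
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """Drop random majority-class rows until every class has the minority size."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        for row in rng.sample(members, n_min):  # sample without replacement
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class has the majority size."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    n_max = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = rng.choices(members, k=n_max - len(members))  # with replacement
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical toy data: 2 churners (label 1) among 10 clients.
rows = [[i] for i in range(10)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

_, under_labels = random_undersample(rows, labels)
_, over_labels = random_oversample(rows, labels)
print(Counter(under_labels), Counter(over_labels))
```

Under-sampling discards information from the majority class, while over-sampling repeats minority rows exactly; SMOTE-style methods instead synthesize new minority points, which this sketch does not attempt.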
316	[en] TS-TARX: TREE STRUCTURED - THRESHOLD AUTOREGRESSION WITH EXTERNAL VARIABLES / [pt] TS-TARX: UM MODELO DE REGRESSÃO COM LIMIARES BASEADO EM ÁRVORE DE DECISÃO CHRISTIAN NUNES ARANHA 28 January 2002 (has links) [pt] Este trabalho propõe um novo modelo linear por partes para a extração de regras de conhecimento de banco de dados. O modelo é uma heurística baseada em análise de árvore de regressão, como introduzido por Friedman (1979) e discutido em detalhe por Breiman (1984). A motivação desta pesquisa é trazer uma nova abordagem combinando técnicas estatísticas de modelagem e um algoritmo de busca por quebras eficiente. A decisão de quebra usada no algoritmo de busca leva em consideração informações do ajuste de equações lineares e foi implementado tendo por inspiração o trabalho de Tsay (1989). Neste, ele sugere um procedimento para construção um modelo para a análise de séries temporais chamado TAR (threshold autoregressive model), introduzido por Tong (1978) e discutido em detalhes por Tong e Lim (1980) e Tong (1983). O modelo TAR é um modelo linear por partes cuja idéia central é alterar os parâmetros do modelo linear autoregressivo de acordo com o valor de uma variável observada, chamada de variável limiar. No trabalho de Tsay, a Identificação do número e localização do potencial limiar era baseada na analise de gráficos. A idéia foi então criar um novo algoritmo todo automatizado. Este processo é um algoritmo que preserva o método de regressão por mínimos quadrados recursivo (MQR) usado no trabalho de Tsay. Esta talvez seja uma das grandes vantagens da metodologia introduzida neste trabalho, visto que Cooper (1998) em seu trabalho de análise de múltiplos regimes afirma não ser possível testar cada quebra. Da combinação da árvore de decisão com a técnica de regressão (MQR), o modelo se tornou o TS-TARX (Tree Structured - Threshold AutoRegression with eXternal variables). 
O procedimento consiste numa busca em árvore binária calculando a estatística F para a seleção das variáveis e o critério de informação BIC para a seleção dos modelos. Ao final, o algoritmo gera como resposta uma árvore de decisão (por meio de regras) e as equações de regressão estimadas para cada regime da partição. A principal característica deste tipo de resposta é sua fácil interpretação. O trabalho conclui com algumas aplicações em bases de dados padrões encontradas na literatura e outras que auxiliarão o entendimento do processo implementado. / [en] This research work proposes a new piecewise linear model to extract knowledge rules from databases. The model is a heuristic based on regression tree analysis, as introduced by Friedman (1979) and discussed in detail by Breiman (1984). The motivation of this research is to offer a new approach combining statistical modeling techniques with an efficient split search algorithm. The split decision used in the search algorithm relies on information from the fitted linear equations and was inspired by the work of Tsay (1989), who suggests a model-building procedure for a nonlinear time series model called TAR (threshold autoregressive model), first proposed by Tong (1978) and discussed in detail by Tong and Lim (1980) and Tong (1983). The TAR model is a piecewise linear model whose main idea is to change the coefficients of a linear autoregressive process according to the value of an observed variable, called the threshold variable. Tsay's identification of the number and location of the potential thresholds was based on supplementary graphical devices. The idea here is to make the whole model-building process automatic. The resulting algorithm preserves the recursive least squares (RLS) regression method used in Tsay's work. This regression method allows all possible data splits to be tested. 
This is perhaps the main advantage of the methodology introduced in this work, given that Cooper (1998) stated that it is not possible to test each break. Combining decision tree methodology with a regression technique (RLS) yields the TS-TARX model (Tree Structured - Threshold AutoRegression with eXternal variables). The procedure searches a binary tree, computing the F statistic for variable selection and the BIC information criterion for model selection. In the end, the algorithm produces a decision tree and a regression equation fitted to each regime of the partition defined by the tree. Its major advantage is easy interpretation. This research work concludes with some applications to benchmark databases from the literature and others that help in understanding the implemented process. [pt] MINERACAO DE DADOS [en] DATA MINING [pt] REGRESSAO [en] REGRESSION [pt] REGRESSAO NAO LINEAR [en] NONLINEAR REGRESSION [pt] LINEAR POR PARTES [en] PIECEWISE MODEL [pt] ARVORE DE DECISAO [en] DECISION TREE [pt] ARVORE DE REGRESSAO [en] REGRESSION TREE [pt] MINIMOS QUADRADOS RECURSIVO [en] RECURSIVE LEAST SQUARES [pt] FACIL INTERPRETACAO [en] EASY INTERPRETATION
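The threshold idea behind the TAR and TS-TARX models described above can be sketched as an exhaustive search for one split of a threshold variable, with a separate linear fit on each side. This is a simplified illustration under stated assumptions: it uses ordinary least squares and the summed squared error as the split criterion, not the recursive least squares, F statistic or BIC of the thesis, and all names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, sum of squared residuals).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # guard against constant x
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def best_threshold(z, x, y, min_points=3):
    """Try every observed value of the threshold variable z as a split point.

    Observations with z <= c form the lower regime, the rest the upper regime.
    Returns (c, total_sse) for the split with the smallest total squared error.
    """
    best = None
    for c in sorted(set(z)):
        low = [(xi, yi) for zi, xi, yi in zip(z, x, y) if zi <= c]
        high = [(xi, yi) for zi, xi, yi in zip(z, x, y) if zi > c]
        if len(low) < min_points or len(high) < min_points:
            continue  # skip splits that leave a regime too small to fit
        total = fit_line(*zip(*low))[2] + fit_line(*zip(*high))[2]
        if best is None or total < best[1]:
            best = (c, total)
    return best


# Hypothetical noise-free data with a regime change at x = 1.0.
# In an autoregressive setting x would be the lagged series value y[t-1].
x = [i * 0.5 for i in range(-10, 11)]
y = [2 + 0.8 * xi if xi <= 1.0 else -1 - 0.5 * xi for xi in x]

threshold, sse = best_threshold(z=x, x=x, y=y)
print(threshold, sse)
```

Applying the same search recursively inside each regime would give a binary tree of regimes, which is the tree-structured part of the approach; the sketch stops at a single split.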
317	Classification automatique de textes pour les revues de littérature mixtes en santé Langlois, Alexis 12 1900 (has links) Les revues de littérature sont couramment employées en sciences de la santé pour justifier et interpréter les résultats d’un ensemble d’études. Elles permettent également aux chercheurs, praticiens et décideurs de demeurer à jour sur les connaissances. Les revues dites systématiques mixtes produisent un bilan des meilleures études portant sur un même sujet tout en considérant l’ensemble des méthodes de recherche quantitatives et qualitatives. Leur production est ralentie par la prolifération des publications dans les bases de données bibliographiques et la présence accentuée de travaux non scientifiques comme les éditoriaux et les textes d’opinion. Notamment, l’étape d’identification des études pertinentes pour l’élaboration de telles revues s’avère laborieuse et requiert un temps considérable. Traditionnellement, le triage s’effectue en utilisant un ensemble de règles établies manuellement. Dans cette étude, nous explorons la possibilité d’utiliser la classification automatique pour exécuter cette tâche. La famille d’algorithmes ayant été considérée dans le comparatif de ce travail regroupe les arbres de décision, la classification naïve bayésienne, la méthode des k plus proches voisins, les machines à vecteurs de support ainsi que les approches par votes. Différentes méthodes de combinaison de caractéristiques exploitant les termes numériques, les symboles ainsi que les synonymes ont été comparés. La pertinence des concepts issus d’un méta-thésaurus a également été mesurée. En exploitant les résumés et les titres d’approximativement 10 000 références, les forêts d’arbres de décision admettent le plus haut taux de succès (88.76%), suivies par les machines à vecteurs de support (86.94%). L’efficacité de ces approches devance la performance des filtres booléens conçus pour les bases de données bibliographiques. 
Toutefois, une sélection judicieuse des entrées de la collection d’entraînement est cruciale pour pallier l’instabilité du modèle final et la disparité des méthodologies quantitatives et qualitatives des études scientifiques existantes. / The interest of health researchers and policy-makers in literature reviews has continued to increase over the years. Mixed studies reviews are highly valued since they combine results from the best available studies on various topics while considering quantitative, qualitative and mixed research methods. These reviews can be used for several purposes such as justifying, designing and interpreting results of primary studies. Due to the proliferation of published papers and the growing number of nonempirical works such as editorials and opinion letters, screening records for mixed studies reviews is time consuming. Traditionally, reviewers are required to manually identify potentially relevant studies. To facilitate this process, different automated text classification methods were compared to determine the most effective and robust approach for systematic mixed studies reviews. The group of algorithms considered in this study combined decision trees, naive Bayes classifiers, k-nearest neighbours, support vector machines and voting approaches. Statistical techniques were applied to assess the relevancy of multiple features according to a predefined dataset. The benefits of feature combination for numerical terms, synonyms and mathematical symbols were also measured. Furthermore, concepts extracted from a metathesaurus were used as additional features in order to improve the training process. Using the titles and abstracts of approximately 10,000 entries, decision trees perform the best with an accuracy of 88.76%, followed by support vector machines (86.94%). The final model based on decision trees relies on linear interpolation and a group of concepts extracted from a metathesaurus. 
This approach outperforms the mixed filters commonly used with bibliographic databases like MEDLINE. However, references chosen for training must be selected judiciously in order to address the model instability and the disparity of quantitative and qualitative study designs. Classification automatique Revue de littérature Étude mixte Méthode de recherche Santé Arbre de décision Machine à vecteurs de support Automated text classification Systematic review Mixed study Research method Health care Decision tree Support vector machine
318	Decision making strategy for antenatal echographic screening of foetal abnormalities using statistical learning / Méthodologie d'aide à la décision pour le dépistage anténatal échographique d'anomalies fœtales par apprentissage statistique Besson, Rémi 01 October 2019 (has links) Dans cette thèse, nous proposons une méthode pour construire un outil d'aide à la décision pour le diagnostic de maladie rare. Nous cherchons à minimiser le nombre de tests médicaux nécessaires pour atteindre un état où l'incertitude concernant la maladie du patient est inférieure à un seuil prédéterminé. Ce faisant, nous tenons compte de la nécessité dans de nombreuses applications médicales, d'éviter autant que possible, tout diagnostic erroné. Pour résoudre cette tâche d'optimisation, nous étudions plusieurs algorithmes d'apprentissage par renforcement et les rendons opérationnels pour notre problème de très grande dimension. Pour cela nous décomposons le problème initial sous la forme de plusieurs sous-problèmes et montrons qu'il est possible de tirer partie des intersections entre ces sous-tâches pour accélérer l'apprentissage. Les stratégies apprises se révèlent bien plus performantes que des stratégies gloutonnes classiques. Nous présentons également une façon de combiner les connaissances d'experts, exprimées sous forme de probabilités conditionnelles, avec des données cliniques. Il s'agit d'un aspect crucial car la rareté des données pour les maladies rares empêche toute approche basée uniquement sur des données cliniques. Nous montrons, tant théoriquement qu'empiriquement, que l'estimateur que nous proposons est toujours plus performant que le meilleur des deux modèles (expert ou données) à une constante près. Enfin nous montrons qu'il est possible d'intégrer efficacement des raisonnements tenant compte du niveau de granularité des symptômes renseignés tout en restant dans le cadre probabiliste développé tout au long de ce travail. 
/ In this thesis, we propose a method to build a decision support tool for the diagnosis of rare diseases. We aim to minimize the number of medical tests necessary to reach a state where the uncertainty regarding the patient's disease is below a predetermined threshold. In doing so, we take into account the need, in many medical applications, to avoid misdiagnosis as far as possible. To solve this optimization task, we investigate several reinforcement learning algorithms and make them operable for our very high-dimensional problem. To do this, we break down the initial problem into several sub-problems and show that it is possible to take advantage of the intersections between these sub-tasks to accelerate the learning phase. The strategies learned are much more effective than classic greedy strategies. We also present a way to combine expert knowledge, expressed as conditional probabilities, with clinical data. This is crucial because the scarcity of data in the field of rare diseases prevents any approach based solely on clinical data. We show, both empirically and theoretically, that our proposed estimator always performs better than the best of the two models (expert or data), up to a constant. Finally, we show that it is possible to effectively integrate reasoning that takes into account the level of granularity of the reported symptoms while remaining within the probabilistic framework developed throughout this work. Optimisation d’arbre de décision Aide au diagnostic médical Mélange experts/données Sequential decision making Decision tree optimization Medical diagnostic decision support Planning in high-dimensional spaces Mixture experts/data Probabilistic reasoning in ontologies 570.151 95
319	Time series monitoring and prediction of data deviations in a manufacturing industry Lantz, Robin January 2020 (has links) An automated manufacturing industry relies on many interacting moving parts and sensors. These sensors generate complex multidimensional data in the production environment, which is difficult to interpret and in which patterns are hard to find. This project provides tools for gaining a deeper understanding of the production data of Swedsafe, a company in the automated manufacturing business, and demonstrates the potential of that multidimensional data. The project mainly consists of predicting deviations from predefined threshold values in Swedsafe's production data. Machine learning is well suited to finding relationships in complex datasets, so supervised classification is used to predict deviations from the threshold values. An investigation is conducted to identify the classifier that performs best on Swedsafe's production data. The sliding window technique is used to handle the time series data. Apart from predicting deviations, the project also includes an implementation of live graphs that give a quick overview of the production data. Steady production with stable process values is important, so the ability to monitor and predict events in the production environment can benefit other manufacturing companies as well, not only Swedsafe. The best-performing classifier tested in this project was the Random Forest classifier. The Multilayer Perceptron did not perform well on Swedsafe's data, but further investigation of recurrent neural networks using LSTM neurons is recommended. During the project, a web-based application displaying the sensor data in live graphs was also developed.
Machine learning Supervised learning Time series classification Manufacturing industry Production data Data deviations Support Vector Machine K-Nearest Neighbours Linear Regression Decision Tree Random Forest Neural Network Recurrent Neural Network Computer Science Maskininlärning Tidsserier Tillverkningsindustri Klassificerare Avvikelser Computer Sciences Datavetenskap (datalogi) Mathematical Analysis Matematisk analys
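The abstract above does not say how the sliding window was configured, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes a single sensor series, a hypothetical window length, and a hypothetical upper threshold; the function name sliding_windows and the sample readings are illustrative. Each window of consecutive readings becomes one feature row, and the label records whether the next reading exceeds the threshold.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    values: Sequence[float], window: int, threshold: float
) -> Tuple[List[List[float]], List[int]]:
    """Turn a 1-D series into (features, labels) for supervised classification.

    Each feature row holds `window` consecutive readings. The label is 1 when
    the reading that immediately follows the window exceeds `threshold`
    (a deviation), otherwise 0.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    features: List[List[float]] = []
    labels: List[int] = []
    # Stop early enough that every window has a following reading to label.
    for start in range(len(values) - window):
        features.append(list(values[start:start + window]))
        labels.append(1 if values[start + window] > threshold else 0)
    return features, labels


# Hypothetical sensor readings and threshold, for illustration only.
readings = [1.0, 1.2, 1.1, 1.9, 2.4, 1.3]
X, y = sliding_windows(readings, window=3, threshold=1.5)
print(X)
print(y)
```

The resulting rows and labels could then be passed to any classifier, such as the Random Forest mentioned in the abstract. A real setup would likely use several sensors per window and a prediction horizon longer than one step.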
320	Optimalizace strojového učení pro predikci KPI / Machine Learning Optimization of KPI Prediction Haris, Daniel January 2018 (has links) This thesis aims to optimize the machine learning algorithms that an organization uses to predict KPI metrics. The organization uses machine learning to predict whether projects will meet the planned deadlines of the last phase of the development process. The work focuses on analyzing the prediction models and sets the goal of selecting new candidate models for the prediction system. We implemented a system that automatically selects the best feature variables for learning. Trained models were evaluated with several performance metrics, and the best candidates were chosen for prediction. The candidate models achieved higher accuracy, which means that the prediction system provides more reliable responses. We also suggest further improvements that could increase the accuracy of the forecast.
