291 |
Digitalt beslutsträd för val av Tobii Dynavox-produkter : Optimering av produktval för förskrivningsprocessen / Digital Decision Tree for Selecting Tobii Dynavox Products : Optimization of Product Selection for the Prescribing Process
Yazdi, Anna, Nasrin, Yaguobi January 2024
Denna studie fokuserar på utvecklingen av en interaktiv chatbot för att underlätta valet av kommunikationsenheter för personer med alternativa och kompletterande kommunikationsbehov (AKK). Genom att integrera MIRO för att skapa beslutsträd och Flowchart AI för att konstruera chatbotar möjliggörs en strukturerad process för visualisering och implementering av innehåll på en webbplattform. Beslutsträdet, konstruerat i MIRO, identifierar och strukturerar olika komponenter som är nödvändiga för att skapa en skräddarsydd enhet för varje individ inom AKK-kategorin. Flowchart AI används för att utveckla en interaktiv chatbot som vägleder valet av enhet, mjukvara och tillbehör baserat på användarens specifika behov och preferenser. Studien inkluderar även prototyper av e-postmeddelanden för att förse användarna med en sammanfattning av rekommendationerna. Slutresultatet av detta arbete innebär framgångsrik utveckling av en interaktiv chatbot som guidar användare genom valet av lämpliga Tobii Dynavox-produkter, skräddarsytt efter deras individuella behov. Genom att erbjuda användarna en mer effektiv och riktad rekommendationsprocess sparar arbetet både tid och resurser för såväl användare som professionella, som logopeder. Projektet bidrar till förbättrad tillgång till information och vägledning inom detta område, vilket främjar ökad tillgänglighet och effektivitet inom alternativa och kompletterande kommunikationslösningar. Vidare har projektet främjat kunskapsutvecklingen inom området genom att utveckla ett interaktivt program för medicinteknisk utrustning. Ur ett bredare samhällsperspektiv har arbetet också bidragit till ökad tillgänglighet och effektivitet inom alternativ och kompletterande kommunikation, vilket kan förbättra livskvaliteten för personer som behöver dessa lösningar. Arbetet har minskat risken för felaktiga val och ökat användarnas förmåga att kommunicera effektivt och självständigt. 
/ This study focuses on the development of an interactive chatbot to facilitate the selection of communication devices for individuals with augmentative and alternative communication (AAC) needs. Integrating MIRO for creating decision trees with Flowchart AI for constructing the chatbot enables a structured process for visualizing and implementing content on a web platform. The decision tree, constructed in MIRO, identifies and organizes the components necessary to create a customized device for each individual within the AAC category. Flowchart AI is used to develop an interactive chatbot that guides users through the selection of device, software, and accessories based on their specific needs and preferences. The study also includes prototypes of email notifications that provide users with a summary of the recommendations. The work culminates in the successful development of an interactive chatbot that guides users through the selection of suitable Tobii Dynavox products tailored to their individual needs. By offering users a more efficient and targeted recommendation process, the work saves time and resources for both users and professionals, such as speech therapists. The project contributes to improved access to information and guidance in this area, promoting increased accessibility and efficiency in augmentative and alternative communication solutions. Furthermore, the project has promoted knowledge development in the field by creating an interactive program for medical equipment. From a broader societal perspective, the work has also contributed to increased accessibility and efficiency in augmentative and alternative communication, potentially enhancing the quality of life for individuals in need of these solutions. The work has reduced the risk of erroneous choices and enhanced users' ability to communicate effectively and independently.
|
292 |
Distributed conditional computation
Léonard, Nicholas 08 1900
L'objectif de cette thèse est de présenter différentes applications du programme de recherche de calcul conditionnel distribué.
On espère que ces applications, ainsi que la théorie présentée ici, mènera à une solution générale du problème
d'intelligence artificielle, en particulier en ce qui a trait à la nécessité d'efficience.
La vision du calcul conditionnel distribué consiste à accélérer l'évaluation et l'entraînement de modèles profonds,
ce qui est très différent de l'objectif usuel d'améliorer sa capacité de généralisation et d'optimisation.
Le travail présenté ici a des liens étroits avec les modèles de type mélange d'experts.
Dans le chapitre 2, nous présentons un nouvel algorithme d'apprentissage profond qui
utilise une forme simple d'apprentissage par renforcement sur un modèle d'arbre de décisions à base
de réseau de neurones. Nous démontrons la nécessité d'une contrainte d'équilibre pour maintenir la
distribution d'exemples aux experts uniforme et empêcher les monopoles. Pour rendre le calcul efficient,
l'entrainement et l'évaluation sont contraints à être éparse en utilisant un routeur échantillonnant
des experts d'une distribution multinomiale étant donné un exemple.
Dans le chapitre 3, nous présentons un nouveau modèle profond constitué d'une représentation
éparse divisée en segments d'experts. Un modèle de langue à base de réseau de neurones est construit à partir
des transformations éparses entre ces segments. L'opération éparse par bloc est implémentée pour utilisation
sur des cartes graphiques. Sa vitesse est comparée à deux opérations denses du même calibre pour démontrer
le gain réel de calcul qui peut être obtenu. Un modèle profond utilisant des opérations éparses contrôlées
par un routeur distinct des experts est entraîné sur un ensemble de données d'un milliard de mots.
Un nouvel algorithme de partitionnement de données est appliqué sur un ensemble de mots pour
hiérarchiser la couche de sortie d'un modèle de langage, la rendant ainsi beaucoup plus efficiente.
Le travail présenté dans cette thèse est au centre de la vision de calcul conditionnel distribué
émis par Yoshua Bengio. Elle tente d'appliquer la recherche dans le domaine des mélanges d'experts
aux modèles profonds pour améliorer leur vitesse ainsi que leur capacité d'optimisation.
Nous croyons que la théorie et les expériences de cette thèse sont une étape importante sur
la voie du calcul conditionnel distribué car elle cadre bien le problème, surtout en ce qui
concerne la compétitivité des systèmes d'experts. / The objective of this thesis is to present different applications of the distributed conditional computation research program.
It is hoped that these applications and the theory presented here will lead to a general solution of the problem of
artificial intelligence, especially with regard to the need for efficiency.
The vision of distributed conditional computation is to accelerate the evaluation and training of deep models,
which is very different from the usual objective of improving their generalization and optimization capacity.
The work presented here has close ties with mixture-of-experts models.
In Chapter 2, we present a new deep learning algorithm that
uses a form of reinforcement learning on a novel neural network decision tree model.
We demonstrate the need for a balancing constraint to keep the
distribution of examples to experts uniform and to prevent monopolies. To make the calculation efficient,
the training and evaluation are constrained to be sparse by using a gater that
samples experts from a multinomial distribution given an example.
In Chapter 3, we present a new deep model consisting of a
sparse representation divided into segments of experts.
A neural network language model is constructed from blocks of sparse transformations between these expert segments.
The block-sparse operation is implemented for use on graphics cards.
Its speed is compared with that of two comparable dense operations to demonstrate
and measure the actual efficiency gain that can be obtained. A deep model using
these block-sparse operations controlled by a distinct gater is trained on a dataset of one billion words.
A new algorithm for data partitioning (clustering) is applied to a set of words to
organize the output layer of a language model into a conditional hierarchy, thereby making it much more efficient.
The work presented in this thesis is central to the vision of distributed conditional computation
put forward by Yoshua Bengio. It attempts to apply research in the area of
mixtures of experts to deep models to improve their speed and their optimization capacity.
We believe that the theory and experiments of this thesis are an important step
on the path to distributed conditional computation because they frame the problem well,
especially concerning the competition inherent to systems of experts.
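The sparse routing step mentioned in the abstract above (a gater sampling one expert per example from a multinomial distribution) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the softmax parameterization of the gater, and the toy scores are all assumptions made for illustration, and the balancing constraint is only observed here (by counting the load per expert), not enforced.

```python
import math
import random

def softmax(scores):
    """Turn raw gater scores into multinomial probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_expert(gate_scores, rng):
    """Draw one expert index from the multinomial defined by the scores."""
    probs = softmax(gate_scores)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def route(batch_scores, seed=0):
    """Assign each example in the batch to a single sampled expert."""
    rng = random.Random(seed)
    return [sample_expert(scores, rng) for scores in batch_scores]

def load_per_expert(assignments, n_experts):
    """Count examples per expert; a very uneven count signals a monopoly."""
    counts = [0] * n_experts
    for a in assignments:
        counts[a] += 1
    return counts

# Hypothetical gater scores for three examples and three experts.
scores = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
assignments = route(scores, seed=1)
print(assignments, load_per_expert(assignments, 3))
```

Because only the sampled expert is evaluated for each example, the cost per example does not grow with the number of experts, which is the efficiency argument the abstract makes.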
|
293 |
Avaliação da segurança de sistemas de potência para múltiplas contingências usando árvore de decisão multicaminhos
OLIVEIRA, Werbeston Douglas de 15 September 2017
Previous issue date: 2017-09-15 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico / Eletronorte - Centrais Elétricas do Norte do Brasil S/A / A busca de formas eficazes para promover a operação segura de sistemas de potência e aumentar a compreensão dos operadores tem encorajado a pesquisa contínua de novas técnicas e métodos que possam ajudar nessa tarefa. Nesta tese, propõe-se uma abordagem para avaliar a segurança da operação de sistemas de potência para múltiplas contingências usando a técnica árvore de decisão multi-caminhos (MDT, do inglês “Multiway Decision Tree”). A MDT difere de outras técnicas de árvore de decisão por estabelecer, na etapa de treinamento, um valor de atributo categórico por ramo. Essa abordagem propõe o uso de topologias (contingências) como atributos categóricos. Desta forma, a MDT melhora a interpretabilidade em relação ao estado operacional do sistema de potência, pois o operador pode ver claramente as variáveis críticas para cada topologia, de modo que as regras da MDT possam ser usadas no auxílio à tomada de decisão. Essa abordagem proposta foi utilizada para avaliação da segurança em dois sistemas testes, o sistema IEEE 39 barras e o sistema da parte do Norte do Sistema Interligado Nacional (SIN), sendo que este último foi testado com dados reais de um dia de operação. A técnica proposta baseada em MDT demonstrou bom desempenho, utilizando um conjunto de regras simples e claras. Também foi realizada uma comparação dos resultados obtidos com outras técnicas baseadas em árvores de decisão e verificou-se que o MDT resultou em um procedimento mais simples para a classificação de segurança dos sistemas com boa precisão. / The search for effective ways to promote the secure operation of power systems and to improve operators' understanding of them has encouraged continuous research into new techniques and methods that can help in this task.
In this thesis, an approach is proposed to assess power system operation security for multiple contingencies using a multiway decision tree (MDT). The MDT differs from other decision tree techniques by establishing, in the training step, one categorical attribute value per branch. This approach proposes the use of topologies (contingencies) as categorical attributes. In this way, it improves interpretability regarding the power system operational state, as the operator can clearly see the critical variables for each topology, so that the MDT rules can be used to aid decision-making. The proposed approach was used for security assessment of two test systems, the IEEE 39-bus system and the Northern part of the Brazilian Interconnected Power System (BIPS); the latter was tested with real data from one day of operation. The proposed MDT-based technique demonstrated good performance using a set of simple and clear rules. The results were also compared with those of other decision tree (DT) based techniques, and the MDT yielded a simpler procedure for power system security classification with good accuracy.
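The idea of one branch per categorical value can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's method: the topology names, the single numeric variable (imagined here as a bus voltage in per unit), and the one-threshold rule fitted inside each branch are all hypothetical.

```python
from collections import defaultdict

def fit_stump(points):
    """points: list of (x, secure) pairs.
    Returns the threshold t with the fewest errors for 'secure if x >= t'."""
    xs = sorted({x for x, _ in points})
    # Candidates: everything secure, midpoints between values, nothing secure.
    candidates = [xs[0]] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]

    def errors(t):
        return sum((x >= t) != secure for x, secure in points)

    return min(candidates, key=errors)

def fit_multiway(samples):
    """samples: list of (topology, x, secure).
    Multiway split: one branch, with its own rule, per topology value."""
    branches = defaultdict(list)
    for topology, x, secure in samples:
        branches[topology].append((x, secure))
    return {topology: fit_stump(points) for topology, points in branches.items()}

def predict(tree, topology, x):
    """Follow the branch for this topology and apply its threshold rule."""
    return x >= tree[topology]

# Hypothetical operating points: (topology, voltage in p.u., secure?)
samples = [
    ("base", 0.90, False), ("base", 0.96, True), ("base", 0.98, True),
    ("line_out", 0.94, False), ("line_out", 0.96, False), ("line_out", 1.00, True),
]
tree = fit_multiway(samples)
print(predict(tree, "base", 0.96), predict(tree, "line_out", 0.96))
```

In this toy data the same measurement is classified differently depending on the topology, which is the kind of per-topology rule an operator could read directly from the tree.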
|
294 |
REAL-TIME PREDICTION OF SHIMS DIMENSIONS IN POWER TRANSFER UNITS USING MACHINE LEARNING
Jansson, Daniel, Blomstrand, Rasmus January 2019
No description available.
|
295 |
以資料採礦方法探討國內數位落差之現象 / Effect of Digital Divide in Taiwan: Data Mining Applications
林建宇, Lin,chien yu Unknown Date
全球化時代與資訊化社會的來臨,電腦與網際網路成為生活中不可或缺的要素,儘管至2008年為止,我國有將近七成的民眾透過網路科技享受到更多的便利性,但社會上仍存在著數位落差(digital divide)的問題,數位落差除了使得資訊窮人(information-poor)不易取得資訊,亦將對其經濟、人權等各方面造成影響。故研究目的在利用資料採礦的應用,配合SPSS Clementine 12.0的軟體,探討數位落差的現象,並嘗試找出形成數位落差的影響原因。
本研究主要投入人口統計變數以及生活型態變數,並藉由C5.0決策樹、C&RT分類樹,以及CHAID分類樹建立模型,透過這三個分類迴歸樹的模型,發現到「年齡」、「教育程度」、「地理區域」、「個人資產狀況」、「經濟主要來源:子女」、「個人每月可支配所得」以及「收入來源:薪資」共七項變數同時對民眾是否成為數位落差中的資訊富人(information-rich)有著較重要的影響性,因此,研究最後依據此七項進行政策建議,以提供相關單位之參考。 / In the age of globalization and the information society, computers and the Internet have become essential elements of daily life. As of 2008, almost 70% of the population in Taiwan enjoyed greater convenience through network technologies. However, the "digital divide" remains: the information-poor cannot obtain information easily, and this affects them economically and in terms of human rights. The purpose of this research is therefore to examine the digital divide and to identify the factors behind it, using data mining techniques with the SPSS Clementine 12.0 software.
The research uses demographic and lifestyle variables as inputs and builds models with the C5.0 decision tree, C&RT tree, and CHAID tree methods. These three models show that seven variables, "age", "level of education", "location", "personal asset status", "main source of income: children", "monthly personal disposable income" and "source of income: salary", have a relatively important influence on whether a person is among the information-rich in the digital divide. Policy recommendations based on these seven variables are therefore provided for the reference of the relevant agencies.
|
296 |
Success in the protean career : a predictive study of professional artists and tertiary arts graduates
Bridgstock, Ruth Sarah January 2007
In the shift to a globalised creative economy where innovation and creativity are increasingly prized, many studies have documented direct and indirect social and economic benefits of the arts. In addition, arts workers have been argued to possess capabilities which are of great benefit both within and outside the arts, including (in addition to creativity) problem solving abilities, emotional intelligence, and team working skills (ARC Centre of Excellence for Creative Industries and Innovation, 2007). However, the labour force characteristics of professional artists in Australia and elsewhere belie their importance. The average earnings of workers in the arts sector are consistently less than other workers with similar educational backgrounds, and their rates of unemployment and underemployment are much higher (Australian Bureau of Statistics, 2005; Caves, 2000; Throsby & Hollister, 2003). Graduating students in the arts appear to experience similar employment challenges and exhibit similar patterns of work to artists in general. Many eventually obtain work unrelated to the arts or go back to university to complete further tertiary study in fields unrelated to arts (Graduate Careers Council of Australia, 2005a). Recent developments in career development theory have involved discussion of the rise of boundaryless careers amongst knowledge workers. Boundaryless careers are characterised by non-linear career progression occurring outside the bounds of a single organisation or field (Arthur & Rousseau, 1996a, 1996b). The protean career is an extreme form of the boundaryless career, where the careerist also possesses strong internal career motivations and criteria for success (Baruch, 2004; Hall, 2004; Hall & Mirvis, 1996). It involves a psychological contract with one's self rather than an organisation or organisations. 
The boundaryless and protean career literature suggests competencies and dispositions for career self-management and career success, but to date there has been minimal empirical work investigating the predictive value of these competencies and dispositions to career success in the boundaryless or protean career. This program of research employed competencies and dispositions from boundaryless and protean career theory to predict career success in professional artists and tertiary arts graduates. These competencies and dispositions were placed into context using individual and contextual career development influences suggested by the Systems Theory Framework of career development (McMahon & Patton, 1995; Patton & McMahon, 1999, 2006a). Four substantive studies were conducted, using online surveys with professional artists and tertiary arts students / graduates, which were preceded by a pilot study for measure development. A largely quantitative approach to the program of research was preferred, in the interests of generalisability of findings. However, at the time of data collection, there were no quantitative measures available which addressed the constructs of interest. Brief scales of Career Management Competence based on the Australian Blueprint for Career Development (Haines, Scott, & Lincoln, 2003), Protean Career Success Orientation based on the underlying dispositions for career success suggested by protean career theory, and Career Development Influences based on the Systems Theory Framework of career development (McMahon & Patton, 1995; Patton & McMahon, 1999, 2006a) were constructed and validated via a process of pilot testing and exploratory factor analyses. This process was followed by confirmatory factor analyses with data collected from two samples: 310 professional artists, and 218 graduating arts students who participated at time 1 (i.e., at the point of undergraduate course completion in October, 2005). 
Confirmatory factor analyses via Structural Equation Modelling conducted in Study 1 revealed that the scales would benefit from some respecification, and so modifications were made to the measures to enhance their validity and reliability. The three scales modified and validated in Study 1 were then used in Studies 3 and 4 as potential predictors of career success for the two groups of artists under investigation, along with relevant sociodemographic variables. The aim of Study 2 was to explore the construct of career success in the two groups of artists studied. Each participant responded to an open-ended question asking them to define career success. The responses for professional artists were content analysed using emergent coding with two coders. The codebook was later applied to the arts students' definitions. The majority of the themes could be grouped into four main categories: internal definitions; financial recognition definitions; contribution definitions; and non-financial recognition definitions. Only one third of the definition themes in the professional artists' and arts graduates' definitions of career success were categorised as relating to financial recognition. Responses within the financial recognition category also indicated that many of the artists aspired only to a regular subsistence level of arts income (although a small number of the arts graduates did aspire to fame and fortune). The second section of the study investigated the statistical relationships between the five different measures of career success for each career success definitional category and overall. The professional artists' and arts graduates' surveys contained several measures of career success, including total earnings over the previous 12 months, arts earnings over the previous 12 months, 1-6 self-rated total employability, 1-6 self-rated arts employability, and 1-6 self-rated self-defined career success. 
All of the measures were found to be statistically related to one another, but a very strong statistical relationship was identified between each employability measure and its corresponding earnings measure for both of the samples. Consequently, it was decided to include only the earnings measures (earnings from arts, and earnings overall) and the self-defined career success rating measure in the later studies. Study 3 used the career development constructs validated in Study 1, sociodemographic variables, and the career success measures explored in Study 2 via Classification and Regression Tree (CART - Breiman, Friedman, Olshen, & Stone, 1984) style decision trees with v-fold cross-validation pruning using the 1 SE rule. CART decision trees are a nonparametric analysis technique which can be used as an alternative to OLS or hierarchical regression in the case of data which violates parametric statistical assumptions. The three optimal decision trees for total earnings, arts earnings and self-defined career success ratings explained a large proportion of the variance in their respective target variables (R² between 0.49 and 0.68). The Career building subscale of the Career Management Competence scale, pertaining to the ability to manage the external aspects of a career, was the most consistent predictor of all three career success measures (and was the strongest predictor for two of the three trees), indicating the importance of the artists' abilities to secure work and build the external aspects of a career. Other important predictors included the Self management subscale of the Career Management Competence scale, Protean Career Success Orientation, length of time working in the arts, and the positive role of interpersonal influences, skills and abilities, and interests and beliefs from the Career Development Influences scale. Slightly different patterns of predictors were found for the three different career success measures. 
Study 4 also involved the career development constructs validated in Study 1, sociodemographic variables, and the career success measures explored in Study 2 via CART style decision trees. This study used a prospective repeated measures design where the data for the attribute variables were gathered at the point of undergraduate course completion, and the target variables were measured one year later. Data from a total of 122 arts students were used, as 122 of the 218 students who responded to the survey at time 1 (October 2005) also responded at time 2 (October 2006). The resulting optimal decision trees had R² values of between 0.33 and 0.46. The values were lower than those for the professional artists' decision trees, and the trees themselves were smaller, but the R² values nonetheless indicated that the arts students' trees possessed satisfactory explanatory power. The arts graduates' Career building scores at time 1 were strongly predictive of all three career success measures at time 2, a similar finding to the professional artists' trees. A further similarity between the trees for the two samples was the strong statistical relationship between Career building, Self management, and Protean Career Success Orientation. However, the most important variable in the total earnings tree was arts discipline category. Technical / design arts graduates consistently earned more overall than arts graduates from other disciplines. Other key predictors in the arts graduates' trees were work experience in arts prior to course completion, positive interpersonal influences, and the positive influence of skills and abilities and interests and beliefs on career development. The research program findings represent significant contributions to existing knowledge about artists' career development and success, and also the transition from higher education to the world of work, with specific reference to arts and creative industries programs. 
They also have implications for theory relating to career success and protean / boundaryless careers.
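The "1 SE rule" used for pruning in Studies 3 and 4 can be stated compactly: among the candidate pruned trees, pick the smallest one whose cross-validated error is within one standard error of the lowest error. The sketch below assumes the candidate trees have already been evaluated; the tuple layout and the numbers are invented for illustration and are not results from the thesis.

```python
def one_se_rule(candidates):
    """candidates: list of (n_leaves, cv_error, std_error) tuples,
    one per pruned subtree. Returns the selected tuple."""
    # Tree with the lowest cross-validated error.
    best = min(candidates, key=lambda c: c[1])
    # Any tree at or below this limit is treated as equally good.
    limit = best[1] + best[2]
    eligible = [c for c in candidates if c[1] <= limit]
    # Prefer the simplest eligible tree.
    return min(eligible, key=lambda c: c[0])

# Hypothetical cross-validation results for five pruned subtrees.
candidates = [
    (2, 0.60, 0.04),
    (4, 0.45, 0.03),
    (7, 0.42, 0.03),
    (12, 0.41, 0.03),
    (20, 0.43, 0.03),
]
print(one_se_rule(candidates))
```

With these made-up numbers the 12-leaf tree has the lowest error, but the 7-leaf tree lies within one standard error of it and is selected, trading a negligible loss in accuracy for a simpler tree.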
|
297 |
[en] MACHINE LEARNING METHODS APPLIED TO PREDICTIVE MODELS OF CHURN FOR LIFE INSURANCE / [pt] MÉTODOS DE MACHINE LEARNING APLICADOS À MODELAGEM PREDITIVA DE CANCELAMENTOS DE CLIENTES PARA SEGUROS DE VIDA
THAIS TUYANE DE AZEVEDO 26 September 2018
[pt] O objetivo deste estudo foi explorar o problema de churn em seguros de vida, no sentido de prever se o cliente irá cancelar o produto nos próximos 6 meses. Atualmente, métodos de machine learning vêm se popularizando para este tipo de análise, tornando-se uma alternativa ao tradicional método de modelagem da probabilidade de cancelamento através da regressão logística. Em geral, um dos desafios encontrados neste tipo de modelagem é que a proporção de clientes que cancelam o serviço é relativamente pequena. Para isso, este estudo recorreu a técnicas de balanceamento para tratar a base naturalmente desbalanceada – técnicas de undersampling, oversampling e diferentes combinações destas duas foram utilizadas e comparadas entre si. As bases foram utilizadas para treinar modelos de Bagging, Random Forest e Boosting, e seus resultados foram comparados entre si e também aos resultados obtidos através do modelo de Regressão Logística. Observamos que a técnica SMOTE-modificado para balanceamento da base, aplicada ao modelo de Bagging, foi a combinação que apresentou melhores resultados dentre as combinações exploradas. / [en] The purpose of this study is to explore the churn problem in life insurance, that is, to predict whether a client will cancel the product within the next 6 months. Machine learning methods are becoming popular for this type of analysis, offering an alternative to the traditional approach of modeling the probability of cancellation through logistic regression. In general, one of the challenges in this type of modeling is that the proportion of clients who cancel the service is relatively small. To address this, the study resorted to balancing techniques to treat the naturally unbalanced dataset: undersampling and oversampling techniques, and different combinations of the two, were used and compared with one another. 
The resulting datasets were used to train Bagging, Random Forest, and Boosting models, and their results were compared with one another and with those obtained through the Logistic Regression model. We observed that the modified SMOTE technique for balancing the dataset, applied to the Bagging model, was the combination that presented the best results among those explored.
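The two basic balancing strategies named in the abstract can be sketched with plain random resampling. This is a simplified illustration, not the study's procedure: it does not implement SMOTE or the modified SMOTE variant, and the function name, the 0/1 label convention, and the toy data are assumptions. It also assumes both classes are present in the input.

```python
import random

def rebalance(samples, mode, seed=0):
    """samples: list of (features, label) pairs with binary labels 0/1.
    mode 'under' drops majority rows; mode 'over' repeats minority rows."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for sample in samples:
        by_class[sample[1]].append(sample)
    minority, majority = sorted(by_class.values(), key=len)
    if mode == "under":
        # Keep a random subset of the majority class, same size as the minority.
        majority = rng.sample(majority, len(minority))
    elif mode == "over":
        # Repeat randomly chosen minority rows until both classes match in size.
        extra = rng.choices(minority, k=len(majority) - len(minority))
        minority = minority + extra
    else:
        raise ValueError("mode must be 'under' or 'over'")
    balanced = minority + majority
    rng.shuffle(balanced)
    return balanced

# Hypothetical portfolio: 95 clients who stayed (0) and 5 who cancelled (1).
samples = [((i,), 0) for i in range(95)] + [((i,), 1) for i in range(5)]
print(len(rebalance(samples, "under")), len(rebalance(samples, "over")))
```

Resampling of this kind would normally be applied to the training split only, so that evaluation still reflects the real cancellation rate.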
|
298 |
[en] TS-TARX: TREE STRUCTURED - THRESHOLD AUTOREGRESSION WITH EXTERNAL VARIABLES / [pt] TS-TARX: UM MODELO DE REGRESSÃO COM LIMIARES BASEADO EM ÁRVORE DE DECISÃO
CHRISTIAN NUNES ARANHA 28 January 2002
[pt] Este trabalho propõe um novo modelo linear por partes
para a extração de regras de conhecimento de banco de
dados. O modelo é uma heurística baseada em análise de
árvore de regressão, como introduzido por Friedman (1979)
e discutido em detalhe por Breiman (1984). A motivação
desta pesquisa é trazer uma nova abordagem combinando
técnicas estatísticas de modelagem e um algoritmo de
busca por quebras eficiente. A decisão de quebra usada no
algoritmo de busca leva em consideração informações do
ajuste de equações lineares e foi implementado tendo por
inspiração o trabalho de Tsay
(1989). Neste, ele sugere um procedimento para construção
um modelo para a análise de séries temporais chamado TAR
(threshold autoregressive model), introduzido por
Tong (1978) e discutido em detalhes por Tong e Lim (1980)
e Tong (1983). O modelo TAR é um modelo linear por partes
cuja idéia central é alterar os parâmetros do modelo
linear autoregressivo de acordo com o valor de uma
variável observada, chamada de variável limiar. No
trabalho de Tsay, a Identificação do número e
localização do potencial limiar era baseada na analise de
gráficos. A idéia foi então criar um novo algoritmo todo
automatizado. Este processo é um algoritmo que preserva
o método de regressão por mínimos quadrados recursivo
(MQR) usado no trabalho de Tsay. Esta talvez seja uma das
grandes vantagens da metodologia introduzida neste
trabalho, visto que Cooper (1998) em seu trabalho de
análise de múltiplos regimes afirma não ser possível
testar cada quebra. Da combinação da árvore de decisão
com a técnica de regressão (MQR), o modelo se tornou o
TS-TARX (Tree Structured - Threshold AutoRegression with
eXternal variables). O procedimento consiste numa busca
em árvore binária calculando a estatística F para a
seleção das variáveis e o critério de informação BIC para
a seleção dos modelos. Ao final, o algoritmo gera como
resposta uma árvore de decisão (por meio de regras) e as
equações de regressão estimadas para cada regime da
partição. A principal característica deste tipo de
resposta é sua fácil interpretação. O trabalho conclui
com algumas aplicações em bases de dados padrões
encontradas na literatura e outras que auxiliarão o
entendimento do processo implementado. / [en] This research work proposes a new piecewise linear model to
extract knowledge rules from databases. The model is an
heuristic based on analysis of regression trees, introduced
by Friedman (1979) and discussed in detail by Breiman
(1984). The motivation of this research is to come up with
a new approach combining both statistical modeling
techniques and an efficient split search algorithm.
The split decision used in the split search algorithm
counts on information from adjusted linear equation and was
implemented inspired by the work of Tsay (1989). In his
work, he suggests a model-building procedure for a
nonlinear time series model called by TAR (threshold
autoregressive model), first proposed by Tong (1978) and
discussed in detail by Tong and Lim (1980) and Tong (1983).
The TAR model is a piecewise linear model which main idea
is to set the coefficients of a linear autoregressive
process in accordance with a value of observed variable,
called by threshold variable. Tsay`s identification of the
number and location of the potential thresholds was based
on supplementary graphic devices. The idea is to get the
whole process automatic on a new model-building process.
This procedure is an algorithm that preserves the method of
regression by recursive least squares (RLS) used in Tsay's
work. This regression method allows every possible
data split to be tested. That is perhaps the main
advantage of the methodology introduced in this work,
given that Cooper, S. (1998) noted the impossibility
of testing each break. Thus, by combining decision tree
methodology with a regression technique (RLS), the model
became the TS-TARX (Tree Structured - Threshold
AutoRegression with eXternal variables). It searches a
binary tree, calculating F statistics for variable selection
and the BIC information criterion for model selection. In
the end, the algorithm produces a decision tree
and a regression equation fitted to each regime of the
partition defined by the decision tree. Its major advantage
is easy interpretation. This research work concludes with
some applications to benchmark databases from the literature
and others that help in understanding the algorithm.
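
The threshold idea in the abstract above can be illustrated with a minimal sketch. It is not the TS-TARX algorithm itself: it assumes a single regressor, fits each regime by ordinary batch least squares rather than recursive least squares, and ranks splits by BIC only (no F statistics). The function names `fit_line`, `bic` and `best_threshold` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def bic(sse, n, k):
    """Gaussian BIC up to an additive constant; the floor avoids log(0)."""
    return n * math.log(max(sse, 1e-12) / n) + k * math.log(n)

def best_threshold(xs, ys, zs, min_size=3):
    """Try every admissible split on the threshold variable z.

    Returns (threshold, bic_value) for the best two-regime model, or
    (None, bic_value) if the single-regime model has the lowest BIC.
    """
    n = len(xs)
    _, _, sse0 = fit_line(xs, ys)
    best = (None, bic(sse0, n, 2))  # one line: intercept + slope
    for t in sorted(set(zs))[:-1]:
        low = [(x, y) for x, y, z in zip(xs, ys, zs) if z <= t]
        high = [(x, y) for x, y, z in zip(xs, ys, zs) if z > t]
        if len(low) < min_size or len(high) < min_size:
            continue
        sse = fit_line(*zip(*low))[2] + fit_line(*zip(*high))[2]
        score = bic(sse, n, 5)  # two lines (4 coefficients) + 1 threshold
        if score < best[1]:
            best = (t, score)
    return best
```

Each regime of the returned partition gets its own fitted line, which mirrors the "one regression equation per regime" output described in the abstract; a tree-structured version would apply the same search recursively inside each regime.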
|
299 |
Classification automatique de textes pour les revues de littérature mixtes en santé
Langlois, Alexis 12 1900 (has links)
Les revues de littérature sont couramment employées en sciences de la santé pour justifier et interpréter les résultats d’un ensemble d’études. Elles permettent également aux chercheurs, praticiens et décideurs de demeurer à jour sur les connaissances. Les revues dites systématiques mixtes produisent un bilan des meilleures études portant sur un même sujet tout en considérant l’ensemble des méthodes de recherche quantitatives et qualitatives. Leur production est ralentie par la prolifération des publications dans les bases de données bibliographiques et la présence accentuée de travaux non scientifiques comme les éditoriaux et les textes d’opinion. Notamment, l’étape d’identification des études pertinentes pour l’élaboration de telles revues s’avère laborieuse et requiert un temps considérable. Traditionnellement, le triage s’effectue en utilisant un ensemble de règles établies manuellement. Dans cette étude, nous explorons la possibilité d’utiliser la classification automatique pour exécuter cette tâche.
La famille d’algorithmes ayant été considérée dans le comparatif de ce travail regroupe les arbres de décision, la classification naïve bayésienne, la méthode des k plus proches voisins, les machines à vecteurs de support ainsi que les approches par votes. Différentes méthodes de combinaison de caractéristiques exploitant les termes numériques, les symboles ainsi que les synonymes ont été comparés. La pertinence des concepts issus d’un méta-thésaurus a également été mesurée.
En exploitant les résumés et les titres d’approximativement 10 000 références, les forêts d’arbres de décision admettent le plus haut taux de succès (88.76%), suivies par les machines à vecteurs de support (86.94%). L’efficacité de ces approches devance la performance des filtres booléens conçus pour les bases de données bibliographiques. Toutefois, une sélection judicieuse des entrées de la collection d’entraînement est cruciale pour pallier l’instabilité du modèle final et la disparité des méthodologies quantitatives et qualitatives des études scientifiques existantes. / The interest of health researchers and policy-makers in literature reviews has continued to increase over the years. Mixed studies reviews are highly valued since they combine results from the best available studies on various topics while considering quantitative, qualitative and mixed research methods. These reviews can be used for several purposes such as justifying, designing and interpreting the results of primary studies. Due to the proliferation of published papers and the growing number of nonempirical works such as editorials and opinion letters, screening records for mixed studies reviews is time-consuming. Traditionally, reviewers are required to manually identify potentially relevant studies. To facilitate this process, different automated text classification methods were compared in order to determine the most effective and robust approach for systematic mixed studies reviews.
The group of algorithms considered in this study combined decision trees, naive Bayes classifiers, k-nearest neighbours, support vector machines and voting approaches. Statistical techniques were applied to assess the relevancy of multiple features according to a predefined dataset. The benefits of feature combination for numerical terms, synonyms and mathematical symbols were also measured. Furthermore, concepts extracted from a metathesaurus were used as additional features in order to improve the training process.
Using the titles and abstracts of approximately 10,000 entries, decision trees perform best, with an accuracy of 88.76%, followed by support vector machines (86.94%). The final model based on decision trees relies on linear interpolation and a group of concepts extracted from a metathesaurus. This approach outperforms the mixed filters commonly used with bibliographic databases like MEDLINE. However, references chosen for training must be selected judiciously to address model instability and the disparity of quantitative and qualitative study designs.
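
The "voting approaches" mentioned in the abstract above can be sketched as a simple majority vote over the labels produced by several classifiers. This is only an illustration under assumptions: the thesis does not specify its voting rule here, and the function names and tie-breaking choice below are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine label predictions from several classifiers by majority vote.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties are broken in favour of the label predicted by the earliest model
    (an arbitrary choice made for this sketch).
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # first label, in model order, that reaches the top count
        combined.append(next(v for v in votes if counts[v] == top))
    return combined

def accuracy(predicted, actual):
    """Fraction of records whose predicted label matches the reference."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

With three models that each mislabel a different record, the combined vote can recover the correct label for all of them, which is the usual motivation for comparing voting schemes against individual classifiers.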
|
300 |
Decision making strategy for antenatal echographic screening of foetal abnormalities using statistical learning / Méthodologie d'aide à la décision pour le dépistage anténatal échographique d'anomalies fœtales par apprentissage statistique
Besson, Rémi 01 October 2019 (has links)
Dans cette thèse, nous proposons une méthode pour construire un outil d'aide à la décision pour le diagnostic de maladie rare. Nous cherchons à minimiser le nombre de tests médicaux nécessaires pour atteindre un état où l'incertitude concernant la maladie du patient est inférieure à un seuil prédéterminé. Ce faisant, nous tenons compte de la nécessité dans de nombreuses applications médicales, d'éviter autant que possible, tout diagnostic erroné. Pour résoudre cette tâche d'optimisation, nous étudions plusieurs algorithmes d'apprentissage par renforcement et les rendons opérationnels pour notre problème de très grande dimension. Pour cela nous décomposons le problème initial sous la forme de plusieurs sous-problèmes et montrons qu'il est possible de tirer partie des intersections entre ces sous-tâches pour accélérer l'apprentissage. Les stratégies apprises se révèlent bien plus performantes que des stratégies gloutonnes classiques. Nous présentons également une façon de combiner les connaissances d'experts, exprimées sous forme de probabilités conditionnelles, avec des données cliniques. Il s'agit d'un aspect crucial car la rareté des données pour les maladies rares empêche toute approche basée uniquement sur des données cliniques. Nous montrons, tant théoriquement qu'empiriquement, que l'estimateur que nous proposons est toujours plus performant que le meilleur des deux modèles (expert ou données) à une constante près. Enfin nous montrons qu'il est possible d'intégrer efficacement des raisonnements tenant compte du niveau de granularité des symptômes renseignés tout en restant dans le cadre probabiliste développé tout au long de ce travail. / In this thesis, we propose a method to build a decision support tool for the diagnosis of rare diseases. We aim to minimize the number of medical tests necessary to achieve a state where the uncertainty regarding the patient's disease is less than a predetermined threshold. 
In doing so, we take into account the need, in many medical applications, to avoid misdiagnosis as far as possible. To solve this optimization task, we investigate several reinforcement learning algorithms and make them operable for our very high-dimensional problem. To do this, we break down the initial problem into several sub-problems and show that it is possible to take advantage of the intersections between these sub-tasks to accelerate the learning phase. The strategies learned are much more effective than classic greedy strategies. We also present a way to combine expert knowledge, expressed as conditional probabilities, with clinical data. This is crucial because the scarcity of data in the field of rare diseases prevents any approach based solely on clinical data. We show, both empirically and theoretically, that our proposed estimator always performs better than the best of the two models (expert or data), up to a constant. Finally, we show that it is possible to effectively integrate reasoning that takes into account the level of granularity of the symptoms reported while remaining within the probabilistic framework developed throughout this work.
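
For context, the kind of classic greedy strategy that the abstract uses as a baseline can be sketched as follows: at each step, ask about the symptom that minimizes the expected entropy of the disease distribution. This is not the reinforcement learning method of the thesis. The sketch assumes binary symptoms that are conditionally independent given the disease, and every name in it (`entropy`, `posterior`, `expected_entropy`, `greedy_next_test`) is hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a distribution given as {disease: prob}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def posterior(prior, likelihood, symptom, present):
    """Bayes update of the disease distribution after observing one symptom.

    likelihood[d][s] is the assumed P(symptom s present | disease d).
    The observation must have non-zero probability under the prior.
    """
    unnorm = {}
    for d, p in prior.items():
        l = likelihood[d][symptom]
        unnorm[d] = p * (l if present else 1.0 - l)
    z = sum(unnorm.values())
    return {d: v / z for d, v in unnorm.items()}

def expected_entropy(prior, likelihood, symptom):
    """Average posterior entropy over the two possible test outcomes."""
    p_yes = sum(prior[d] * likelihood[d][symptom] for d in prior)
    total = 0.0
    for present, p_obs in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_obs > 0:
            total += p_obs * entropy(posterior(prior, likelihood, symptom, present))
    return total

def greedy_next_test(prior, likelihood, remaining):
    """Pick the symptom whose answer is expected to reduce uncertainty most."""
    return min(remaining, key=lambda s: expected_entropy(prior, likelihood, s))
```

A diagnostic loop would repeat `greedy_next_test` and `posterior` until `entropy` falls below the predetermined threshold. Because the greedy rule looks only one test ahead, it can need more tests overall than a strategy that plans over whole sequences.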
|