• About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
161

Uma abordagem para a indução de árvores de decisão voltada para dados de expressão gênica / An Approach for the Induction of Decision Trees Focused on Gene Expression Data

Pedro Santoro Perez 18 April 2012 (has links)
Gene expression studies have been of great importance, enabling the development of new therapies, diagnostic exams, and drugs, and the understanding of a variety of biological processes. Nevertheless, these studies face several obstacles: a huge number of genes, of which usually only a few are relevant to the problem at hand; noise in the data; among others. This research project consists of the study of decision tree induction algorithms; the definition of a methodology capable of handling gene expression data using decision trees; and the implementation of that methodology as algorithms that can extract knowledge from this kind of data. Decision tree induction searches for relevant characteristics in the data that allow it to model a concept precisely, but it is also concerned with the comprehensibility of the generated model, helping specialists discover new knowledge, something very important in the medical and biological fields. On the other hand, such inducers are relatively unstable: small changes in the training data may produce very different models. This is one of the problems addressed in this Master's project. The main problem addressed, however, is the behavior of these inducers on high-dimensional data, specifically gene expression data: irrelevant attributes harm the learning process, and many models with similar performance may be generated. A variety of techniques were explored to address these problems, but this study focused on two of them: windowing, which was the most explored technique and for which this project proposed several modifications to improve its performance; and lookahead, which builds each node of a tree taking into consideration subsequent steps of the induction process. As for windowing, the study explored aspects related to the pruning of the trees generated during intermediate steps of the algorithm; the use of the estimated error instead of the training error; the use of the error weighted according to the size of the current window; and the use of classification confidence as the window update criterion. As for lookahead, a one-step version was implemented, i.e., to make the decision in the current iteration, the inducer takes into consideration the information gain ratio of the next iteration. The results show that the proposed algorithms outperform the classical ones, especially on measures of complexity and comprehensibility of the induced models.
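The information gain ratio mentioned in the abstract is the split criterion popularized by C4.5. The following is a minimal sketch of that textbook criterion for a categorical attribute, not the thesis's implementation; the function names `entropy` and `gain_ratio` are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of hashable items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`.

    gain ratio = information gain / split information, where split
    information is the entropy of the attribute's own value distribution.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    # An attribute with a single value cannot split the data.
    return info_gain / split_info if split_info > 0 else 0.0
```

A one-step lookahead inducer, as described in the abstract, would evaluate a criterion of this kind not only for the candidate split itself but also for the best splits available in the resulting child nodes before committing to a choice.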
162

Novel Learning-Based Task Schedulers for Domain-Specific SoCs

January 2020 (has links)
This Master's thesis covers the design, on-chip integration, and evaluation of a set of imitation learning (IL)-based scheduling policies: deep neural network (DNN) and decision tree (DT). We first developed IL-based scheduling policies for heterogeneous systems-on-chips (SoCs). Then, we tested these policies using a system-level domain-specific system-on-chip simulation framework [11]. Finally, we transformed them into efficient code using a cloud engine [1] and implemented them on a user-space emulation framework [61] on a Unix-based SoC. IL is an area of machine learning (ML) and a useful method for training artificial intelligence (AI) models by imitating the decisions of an expert, or Oracle, that knows the optimal solution. The primary focus of this thesis is to adapt an ML model to work on-chip and optimize resource allocation for a set of domain-specific wireless and radar systems applications. Evaluation results with four streaming applications from the wireless communications and radar domains show that the proposed IL-based scheduler approximates an offline Oracle expert with more than 97% accuracy and 1.20× faster execution time. The models have been implemented as an add-on, making them easy to port to other SoCs. / Dissertation/Thesis / Masters Thesis Computer Engineering 2020
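The core idea of imitation learning described above, recording an Oracle's decisions and fitting a policy that reproduces them, can be sketched as follows. Everything here is a hypothetical illustration: the earliest-finish-time `oracle`, the state layout (ready times plus execution times per processing element), and the 1-nearest-neighbour policy are assumptions standing in for the thesis's actual Oracle and its DT/DNN policies.

```python
import random


def oracle(ready, exec_time):
    """Hypothetical expert: pick the processing element (PE) that finishes first."""
    finish = [r + e for r, e in zip(ready, exec_time)]
    return finish.index(min(finish))


def collect_demonstrations(n_samples, n_pes, rng):
    """Record (state, expert action) pairs from randomly generated states."""
    demos = []
    for _ in range(n_samples):
        ready = [rng.uniform(0, 10) for _ in range(n_pes)]
        exec_time = [rng.uniform(1, 5) for _ in range(n_pes)]
        demos.append((ready + exec_time, oracle(ready, exec_time)))
    return demos


def make_policy(demos):
    """Build a 1-nearest-neighbour policy that imitates the recorded actions."""
    def policy(ready, exec_time):
        x = ready + exec_time
        # Return the action of the closest demonstration (squared Euclidean distance).
        _, action = min(
            demos, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x))
        )
        return action
    return policy


rng = random.Random(0)
demos = collect_demonstrations(500, 3, rng)
policy = make_policy(demos)

# Agreement with the Oracle on fresh states, analogous to the accuracy
# figure reported in the abstract (the value depends on the random seed).
test_states = collect_demonstrations(200, 3, rng)
agreement = sum(policy(x[:3], x[3:]) == a for x, a in test_states) / len(test_states)
```

At run time only the learned policy is consulted, which is why a compact model such as a decision tree is attractive for on-chip deployment.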
163

Dolování dat / Data Mining

Stehno, David January 2013 (has links)
The aim of the thesis was to study and describe the CRISP-DM data mining methodology. Using a collected database of calls to a call center, a prediction was carried out following the CRISP-DM methodology. In the modeling phase, four different methods were tested: k-NN, a neural network, linear regression, and a support vector machine. The importance of the input attributes for further prediction was evaluated based on different selections. The results and findings may provide data for more accurate future forecasts, not only of the number of calls but also of other indicators relevant to the call center.
164

Natural Language Explanation Model for Decision Trees

Silva, Jesús, Hernández Palma, Hugo, Niebles Núñez, William, Ruiz-Lazaro, Alex, Varela, Noel 07 January 2020 (has links)
This study describes a model of explanations in natural language for classification decision trees. The explanations include global aspects of the classifier and local aspects of the classification of a particular instance. The proposal is implemented in the ExpliClas open source Web service [1], which in its current version operates on trees built with Weka and data sets with numerical attributes. The feasibility of the proposal is illustrated with two example cases, where the detailed explanation of the respective classification trees is shown.
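A local explanation of the kind described, one that justifies the classification of a particular instance, can be produced by verbalizing the root-to-leaf path the instance follows. The sketch below is a hypothetical illustration on a hand-written tree with numerical attributes; the dictionary layout and the `explain` function are assumptions and do not reflect the ExpliClas service or Weka's tree format.

```python
def explain(tree, instance):
    """Describe in plain English why `tree` assigns its label to `instance`.

    Internal nodes are dicts with "attr", "threshold", "left" (<=) and
    "right" (>) keys; leaves are dicts with a "label" key.
    """
    clauses = []
    node = tree
    while "label" not in node:
        attr, threshold = node["attr"], node["threshold"]
        value = instance[attr]
        if value <= threshold:
            clauses.append(f"{attr} ({value}) is at most {threshold}")
            node = node["left"]
        else:
            clauses.append(f"{attr} ({value}) is greater than {threshold}")
            node = node["right"]
    if not clauses:
        return f"The instance is classified as '{node['label']}' (the tree is a single leaf)."
    reasons = " and ".join(clauses)
    return f"The instance is classified as '{node['label']}' because {reasons}."


# Toy tree for illustration only.
toy_tree = {
    "attr": "petal_length",
    "threshold": 2.5,
    "left": {"label": "setosa"},
    "right": {
        "attr": "petal_width",
        "threshold": 1.7,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}

print(explain(toy_tree, {"petal_length": 1.4, "petal_width": 0.2}))
```

Global aspects of the classifier, such as which attributes appear in the tree and how many leaves it has, would need a separate traversal of the whole tree rather than a single path.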
165

Minska risk för vindskador i granbestånd – hur fungerar ett verktyg för riskanalys i praktiken / Reducing the risk of wind damage in spruce forest stands – evaluating a practical tool

Wimarson, Anders January 2021 (has links)
Strong winds cause major damage to Swedish forestry and to society. It is therefore important to be able to identify the stands that have a high probability of suffering such damage. Doing so requires a simple tool with which stands can be assessed using the equipment and knowledge available on Swedish forest farms. This study evaluates and tests a tool developed by Olofsson & Blennow (2005). The results show that the tool works and that it is user-friendly. Of 90 assessments examined, 23% resulted in a high probability of storm damage on the studied farm in northern Halland. The study also shows the importance of using up-to-date data and working with high accuracy when producing stand data. The most important parameters for assessing the probability were stand edge height and the height-to-diameter (HD) ratio.
166

A Comparison of Machine Learning Techniques to Predict University Rates

Park, Samuel M. 06 September 2019 (has links)
No description available.
167

Digital Education Resource Mining for Decision Support

AL Fanah, Muna M.S. January 2021 (has links)
Education has become a competitive and challenging domain, both nationally and internationally, in terms of quality, visibility, and experience of academic delivery, affecting institutions, applicants, and regulatory bodies. Data are becoming more available for general and public use and play an increasingly significant role in decision support for education. For example, world university rankings (WUR) such as Quacquarelli Symonds (QS), Central World University Rankings (CWUR), Times Higher Education (Times) and national university rankings (e.g. the Guardian newspaper Best UK Universities and the Complete University Guide league tables) have published their data for many years and are increasingly used in such decision-making processes by institutions and the general public.
168

Development and validation of clinical prediction models to diagnose acute respiratory infections in children and adults from Canadian Hutterite communities.

Vuichard Gysin, Danielle January 2016 (has links)
Acute respiratory infections (ARI) caused by influenza and other respiratory viruses affect millions of people annually. Although ARI is usually self-limiting, a more complicated or severe course may occur in previously healthy people but is more likely in individuals with underlying illnesses. The most common viral agent is rhinovirus, whereas influenza is less frequent but is well known to cause winter epidemics. In primary care, rapid diagnosis of influenza virus infections is essential in order to provide treatment. Clinical presentations vary among the different pathogens but may overlap and may also depend on host factors. Predictive models have been developed for influenza, but study results may be biased because only individuals presenting with fever were included. Most of these models have not been adequately validated, and their predictive power is therefore likely overestimated. The main objective of this thesis was to compare different mathematical models for the derivation of clinical prediction rules in individuals presenting with symptoms of ARI to better distinguish between influenza, influenza A subtypes and entero-/rhinovirus-related illness in children and adults, and to evaluate model performance by using data-splitting for internal validation. Data from a completed prospective cluster-randomized trial of the indirect effect of influenza vaccination in children of Hutterite communities served as the basis of my thesis. There were a total of 3288 first episodes per season of ARI in 2202 individuals and 321 (9.8%) influenza positive events over three influenza seasons (2008-2011). The data set was divided into children under 18 years and adults. Both data sets were randomly split by subjects into a derivation (2/3 of the dataset) and a validation population (1/3 of the dataset). All predictive models were developed in the derivation sets. 
Demographic factors and the classical symptoms of ARI were evaluated with logistic regression and Cox proportional hazards models using forward stepwise selection, applying robust estimators to account for non-independent data, and by means of recursive partitioning. The beta coefficients of the independent predictors were used to develop different point scores. These scores were then tested in the validation groups, and performance between validation and derivation sets was compared using receiver operating characteristic (ROC) curves. We determined sensitivities and specificities, positive and negative predictive values, and likelihood ratios at different cut-points which could reflect test and treatment thresholds. Fever, chills, and cough were the most important predictors in children, whereas chills and cough but not fever were most predictive of influenza virus infection in adults. Performance of the individual models was moderate, with areas under the receiver operating characteristic curves between 0.75 and 0.80 for the main outcome, influenza A or B virus infection. There was no statistically significant difference in performance between the derivation and validation sets for the main outcome. The results showed that various mathematical models have similar discriminative ability to distinguish influenza from other respiratory viruses. The scores could assist clinicians in their decision-making. However, performance of the models was slightly overestimated due to potential clustering of data, and the results would first need to be validated in a different population before application in clinical practice. / Thesis / Master of Science (MSc) / Every year, millions of people are attacked by "the flu" or the common cold. Certain signs and symptoms apparently are more discriminative between the common cold and the flu. 
However, the decision between starting a simple symptom-oriented treatment, treating empirically for influenza, or ordering a rapid diagnostic test that has only moderate sensitivity and specificity can be challenging. This thesis, therefore, aims to help physicians in their decision-making process by developing simple scores and decision trees for the diagnosis of influenza versus non-influenza respiratory infections. Data from a completed trial of the indirect effect of influenza vaccination in children of Hutterite communities served as the basis of my thesis. There were a total of 3288 first seasonal episodes of ARI in 2202 individuals and 321 (9.8%) influenza positive events over three influenza seasons (2008-2011). The data set was divided into children under 18 years and adults. Both data sets were split into a derivation and a validation set (=holdout group). Different mathematical models were applied to the derivation set, and demographic factors as well as the classical symptoms of ARI were evaluated. The scores generated from the most important factors that remained in the model were then tested in the validation group, and performance between validation and derivation sets was compared. Accuracy was determined at different cut-points which could reflect test and treatment thresholds. Fever, chills, and cough were the most important predictors in children, whereas chills and cough but not fever were most predictive of influenza virus infection in adults. Performance of the individual models was moderate for the main outcome, influenza A or B virus infection. There was no statistically significant difference in performance between the derivation and validation sets for the main outcome. The results showed that various mathematical models have similar discriminative ability to distinguish influenza from other respiratory viruses. The scores could assist clinicians in their decision-making. 
However, the results would first need to be validated in a different population before application in clinical practice.
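The accuracy measures named in the abstract (sensitivity, specificity, predictive values, and likelihood ratios at a chosen cut-point of a point score) follow directly from the 2×2 table of predicted versus true status. The sketch below uses the standard definitions; the function name `diagnostic_accuracy` and the toy data are illustrative assumptions, not the thesis's scores or results.

```python
def diagnostic_accuracy(scores, has_disease, cut_point):
    """Standard accuracy measures when a score >= cut_point counts as test-positive."""
    tp = sum(s >= cut_point and d for s, d in zip(scores, has_disease))
    fp = sum(s >= cut_point and not d for s, d in zip(scores, has_disease))
    fn = sum(s < cut_point and d for s, d in zip(scores, has_disease))
    tn = sum(s < cut_point and not d for s, d in zip(scores, has_disease))

    def ratio(numerator, denominator):
        # Undefined measures (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    false_positive_rate = ratio(fp, fp + tn)
    false_negative_rate = ratio(fn, fn + tp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "lr_positive": ratio(sensitivity, false_positive_rate),
        "lr_negative": ratio(false_negative_rate, specificity),
    }


# Toy data for illustration only: point scores and true influenza status.
toy_scores = [1, 2, 3, 4, 2, 3]
toy_status = [False, False, True, True, True, False]
result = diagnostic_accuracy(toy_scores, toy_status, cut_point=3)
```

Repeating the calculation over a range of cut-points yields the points of the ROC curve, and comparing the derivation and validation sets at the same cut-points shows how much performance drops on held-out subjects.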
169

Modeling Nonignorable Missingness with Response Times Using Tree-based Framework in Cognitive Diagnostic Models

Yang, Yi January 2023 (has links)
As testing moves from paper-and-pencil to computer-based assessment, response accuracy (RA) and response time (RT) together offer the potential to improve performance evaluation and ability estimation of test takers. Most joint models that use RAs and RTs simultaneously assume an IRT model for the RA measurement at the lower level, among which the hierarchical speed-accuracy (SA) model proposed by van der Linden (2007) is the most prevalent in the literature. Zhan et al. (2017) extended the SA model to cognitive diagnostic modeling (CDM) by proposing the hierarchical joint response and times DINA (JRT-DINA) model, but little is known about its generalizability in the presence of missing data. Large-scale assessments are used in educational effectiveness studies to quantify educational achievement, and in them the amount of item nonresponse is not negligible (Pohl et al., 2012; Pohl et al., 2019; Rose et al., 2017; Rose et al., 2010) due to lack of proficiency, lack of motivation and/or lack of time. Treating unplanned missingness as ignorable leads to biased sample-based estimates of item and person parameters (R. J. A. Little & Rubin, 2020; Rubin, 1976). Therefore, in the past few decades, intensive efforts have focused on nonignorable missingness (Glas & Pimentel, 2008; Holman & Glas, 2005; Pohl et al., 2019; Rose et al., 2017; Rose et al., 2010; Ulitzsch et al., 2020a, 2020b). However, the great majority of these methods were limited in item nonresponse types and/or model complexity until J. Lu and Wang (2020) incorporated the mixture cure-rate model (Lee & Ying, 2015) and the tree-based IRT framework (Debeer et al., 2017), which inherited a built-in behavior process for item nonresponses and thus introduced no additional latent propensity parameters to the joint model. 
Nevertheless, these approaches were discussed within the IRT framework, and the traditional measurement models could not provide cognitive diagnostic information about attribute mastery. This dissertation first postulates the CDMTree model, an extension of the tree-based RT process model in CDM, and then explores its efficacy through a real data analysis using PISA 2012 computer-based assessment of mathematics data. The follow-up simulation study compares the proposed model to the JRT-DINA model under multiple conditions to deal with various types of nonignorable missingness, i.e. both omitted items (OIs) and not-reached items (NRIs) due to time limits. A fully Bayesian approach is used for the estimation of the model with the Markov chain Monte Carlo (MCMC) method.
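For orientation, the DINA measurement model that underlies JRT-DINA has a simple item response function: a test taker who has mastered every attribute an item requires answers correctly unless they slip, and anyone else answers correctly only by guessing. The sketch below shows this standard DINA function only; it is not the CDMTree or JRT-DINA model, and the function name and argument layout are illustrative assumptions.

```python
def dina_correct_probability(alpha, q_row, slip, guess):
    """Probability of a correct response under the standard DINA model.

    alpha: 0/1 attribute mastery pattern of the test taker
    q_row: 0/1 row of the Q-matrix listing the attributes the item requires
    slip:  probability that a fully prepared test taker answers incorrectly
    guess: probability that an unprepared test taker answers correctly
    """
    # eta = 1 only if every required attribute has been mastered.
    eta = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if eta else guess
```

For an item requiring the first two of three attributes, a test taker with pattern `[1, 1, 0]` responds correctly with probability `1 - slip`, while one with pattern `[1, 0, 0]` responds correctly with probability `guess`.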
170

OPTIMIZING DECISION TREE ENSEMBLES FOR GENE-GENE INTERACTION DETECTION

Assareh, Amin 27 November 2012 (has links)
No description available.
