Global ETD Search

31	Uncertainty and indistinguishability. Application to modelling with words. Hernández Jiménez, Enric 12 January 2007 (has links) The concept of equality is fundamental in any theory, since it is an essential notion for distinguishing between the elements under study and for enabling the definition of classification mechanisms. When all properties are perfectly precise (absence of uncertainty), one obtains classical equality, where two objects are considered equal if and only if they share the same set of properties. But what happens when uncertainty appears, as in the case where objects satisfy a given property only to a certain degree? Then, since some objects will be more similar to each other than others, the need arises for a gradual notion of equality. These considerations reinforce the idea that certain contexts require a more flexible definition that overcomes the rigidity of the classical notion of equality. T-indistinguishability operators seem good candidates for the new type of equality we are seeking. On the other hand, the Dempster-Shafer Theory of Evidence, as a framework for handling evidence, implicitly defines a notion of indistinguishability between the elements of the domain of discourse based on their relative compatibility with the evidence considered. Chapter two analyses different methods for defining the T-indistinguishability operator associated with a given body of evidence. In chapter three, after presenting an exhaustive state of the art on measures of uncertainty, we focus on computing entropy when an indistinguishability relation has been defined over the elements of the domain. Entropy should then be measured not in terms of the occurrence of different events, but according to the variability perceived by an observer equipped with the indistinguishability relation considered.
This interpretation suggests the "observer paradigm", which leads us to introduce the concept of observational entropy. Uncertainty is a phenomenon present in the real world, so developing techniques to handle it is a necessity. 'Computing with words' aims to achieve this through a formalism based on linguistic labels, in contrast to traditional numerical methods. The use of these labels improves the understandability of the knowledge representation language, while requiring an adaptation of traditional inductive techniques. Chapter four introduces a new type of decision tree that takes the indistinguishabilities between domain elements into account when computing node impurity. We call these observational decision trees, since they incorporate observational entropy into the heuristic function for attribute selection. In addition, we present an algorithm that induces linguistic rules by appropriately handling the uncertainty present in the linguistic labels or in the data themselves. The definition of the algorithm is accompanied by a formal comparison with other standard algorithms. / The concept of equality is a fundamental notion in any theory, since it is essential to the ability to discern the objects it concerns, an ability that is in turn a requirement for any classification mechanism that might be defined. When all the properties involved are entirely precise, we obtain classical equality, where two individuals are considered equal if and only if they share the same set of properties. What happens, however, when imprecision arises, as in the case of properties that are fulfilled only up to a degree?
Then, because certain individuals will be more similar than others, the need for a gradual notion of equality arises. These considerations show that contexts pervaded with uncertainty require a more flexible concept of equality that goes beyond the rigidity of the classical one. T-indistinguishability operators seem to be good candidates for this more flexible and general version of equality. On the other hand, the Dempster-Shafer Theory of Evidence, as a framework for representing and managing general evidence, implicitly conveys a notion of indistinguishability between the elements of the domain of discourse based on their relative compatibility with the evidence at hand. In chapter two we provide definitions for the T-indistinguishability operator associated with a given body of evidence. In chapter three, after a comprehensive summary of the state of the art on measures of uncertainty, we tackle the problem of computing entropy when an indistinguishability relation has been defined over the elements of the domain. Entropy should then be measured not according to the occurrence of different events, but according to the variability perceived by an observer whose discernment abilities are given by the indistinguishability relation considered.
This idea naturally leads to the concept of observational entropy. Real data are often pervaded with uncertainty, so devising techniques that induce knowledge in the presence of uncertainty is advisable. The computing-with-words paradigm follows this line, providing a computation formalism based on linguistic labels in contrast to traditional numerical methods. Linguistic labels improve the understandability of the representation language, although they also require adapting classical inductive learning procedures to cope with them. In chapter four, a novel approach to building decision trees is introduced for the case where uncertainty arises from a more realistic setting in which the decision maker's discernment abilities are taken into account when computing node impurity measures. This paradigm results in what we call 'observational decision trees', since the main idea builds on observational entropy to incorporate indistinguishability concerns. In addition, we present an algorithm that induces linguistic rules from data by properly managing the uncertainty present either in the set of describing labels or in the data itself. A formal comparison with standard algorithms is also provided. machine learning decision trees indistinguishability operators uncertainty measures 004
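The abstract relies on T-indistinguishability operators without restating their definition. In the standard definition, a fuzzy relation E is a T-indistinguishability operator for a t-norm T when it is reflexive (E(x, x) = 1), symmetric, and T-transitive (T(E(x, y), E(y, z)) ≤ E(x, z)). The minimal sketch below checks these three axioms on a finite domain. The label names and similarity values are hypothetical, not taken from the thesis, and the code does not compute observational entropy itself.

```python
from itertools import product
from typing import Callable, List

TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum t-norm: T(a, b) = min(a, b)."""
    return min(a, b)


def t_prod(a: float, b: float) -> float:
    """Product t-norm: T(a, b) = a * b."""
    return a * b


def is_t_indistinguishability(E: List[List[float]], T: TNorm, tol: float = 1e-9) -> bool:
    """Check reflexivity, symmetry and T-transitivity of a fuzzy relation on a finite domain."""
    n = len(E)
    for i in range(n):
        if abs(E[i][i] - 1.0) > tol:          # reflexivity: E(x, x) = 1
            return False
        for j in range(n):
            if abs(E[i][j] - E[j][i]) > tol:  # symmetry: E(x, y) = E(y, x)
                return False
    for i, j, k in product(range(n), repeat=3):
        if T(E[i][j], E[j][k]) > E[i][k] + tol:  # T-transitivity
            return False
    return True


# Hypothetical similarities between the linguistic labels "cold", "cool", "warm".
E = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]
print(is_t_indistinguishability(E, t_prod))  # True
print(is_t_indistinguishability(E, t_min))   # False: min(0.8, 0.5) > 0.4
```

The same relation can qualify under one t-norm and fail under another, which is why the t-norm is part of the definition.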
32	Learning Instruction Scheduling Heuristics from Optimal Data Russell, Tyrel January 2006 (has links) The development of modern pipelined processors and processors with multiple functional units has increased the available instruction-level parallelism. To fully utilize these resources, compiler writers spend large amounts of time developing complex scheduling heuristics for each new architecture. To reduce the time spent on this process, automated machine learning techniques have been proposed to generate scheduling heuristics. We present two case studies using these techniques to generate instruction scheduling heuristics for basic blocks and super blocks. A basic block is a straight-line sequence of code with a single flow of control, and a super block is a collection of basic blocks with a single entry point but multiple exit points. We improve previous techniques for automatically generating basic block scheduling heuristics by increasing the quality of the training data and the number of features considered, including several novel features that have useful effects on instruction scheduling. Our case study of super block scheduling heuristics is a novel contribution, as previous approaches were applied only to basic blocks. We show through experimentation that we can produce efficient heuristics that perform better than current heuristic methods for basic block and super block scheduling. We reduce the number of non-optimally scheduled blocks by up to 55% for basic blocks and 38% for super blocks. We also produce better schedules 7.8 times more often than the next best heuristic for basic blocks and 4.4 times more often for super blocks. Computer Science Instruction Scheduling Heuristics Machine Learning Decision Trees
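To make "scheduling heuristic for a basic block" concrete, here is a minimal sketch of list scheduling on a single-issue machine. The `priority` function is the slot a learned heuristic would fill; the critical-path height used here is a common hand-written baseline, not the heuristic learned in the thesis. The instruction names and latencies are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Dependence DAG of one basic block: instruction -> [(successor, latency)].
DAG = Dict[str, List[Tuple[str, int]]]


def critical_path(dag: DAG) -> Dict[str, int]:
    """Latency-weighted longest path from each instruction to the end of the block."""
    memo: Dict[str, int] = {}

    def height(node: str) -> int:
        if node not in memo:
            memo[node] = max((lat + height(s) for s, lat in dag[node]), default=0)
        return memo[node]

    for node in dag:
        height(node)
    return memo


def list_schedule(dag: DAG, priority: Callable[[str], float]) -> Dict[str, int]:
    """Single-issue list scheduling: each cycle, issue the ready instruction with top priority."""
    preds: Dict[str, List[Tuple[str, int]]] = {n: [] for n in dag}
    for node, succs in dag.items():
        for s, lat in succs:
            preds[s].append((node, lat))
    issued: Dict[str, int] = {}
    cycle = 0
    while len(issued) < len(dag):
        # Ready: not yet issued, and every predecessor's result is available this cycle.
        ready = [n for n in dag if n not in issued
                 and all(p in issued and issued[p] + lat <= cycle for p, lat in preds[n])]
        if ready:
            best = max(ready, key=priority)  # ties go to the earliest instruction in the block
            issued[best] = cycle
        cycle += 1  # an empty ready list means a stall cycle
    return issued


# Hypothetical block: two loads feed an add; the add and a multiply feed a store.
dag: DAG = {
    "load_a": [("add", 3)],
    "load_b": [("add", 3)],
    "add": [("store", 1)],
    "mul": [("store", 2)],
    "store": [],
}
heights = critical_path(dag)
print(list_schedule(dag, priority=lambda n: heights[n]))
# {'load_a': 0, 'load_b': 1, 'mul': 2, 'add': 4, 'store': 5}
```

Swapping `priority` for a classifier's score, while keeping the scheduler loop unchanged, is one way a learned heuristic could be plugged in.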
34	A Class-rooted FP-tree Approach to Data Classification Chen, Chien-hung 29 June 2005 (has links) Classification, an important problem in data mining, is a useful technique for prediction. The goal of classification is to construct a classifier from a given training database and to predict the unknown class of new data. Classification has been widely applied in many areas, such as medical diagnosis and weather prediction. The decision tree is the most popular classifier model, since it generates understandable rules and performs classification with little computation. However, a major drawback of the decision tree model is that it examines only a single attribute at a time. In the real world, attributes in some databases depend on each other, so the accuracy of a decision tree may be improved by discovering the correlations between attributes. The CAM method applies association-rule mining, in the style of the Apriori method, to discover attribute dependence. However, traditional methods for mining association rules are inefficient in classification applications and can suffer from five problems: (1) combinatorial explosion, (2) invalid candidates, (3) an unsuitable minimal support, (4) ignored meaningful class values, and (5) itemsets without class data. FP-growth avoids the first two problems but still suffers from the remaining three. Moreover, it introduces one more problem: nodes that are unnecessary for classification, which make the FP-tree non-compact and large. Furthermore, the CAM method is expensive because it scans the database many times, and its attribute combination problem causes some misclassification. Therefore, in this thesis, we present an efficient and accurate decision tree building method that resolves the above six problems and reduces the database-scanning overhead of the CAM method.
We build a structure named the class-rooted FP-tree, which is similar to the FP-tree except that its root is always a class item. Instead of the static minimal support used in the FP-growth method, we determine the minimal support dynamically, which avoids some misjudgements of the large itemsets used for classification. In the decision tree building phase, we provide a pruning strategy that reduces the number of database scans. We also solve the attribute combination problem of the CAM method and improve its accuracy. Our simulations show that the proposed class-rooted FP-tree mining method uses less storage than other association-rule mining methods. They also show that our method needs fewer database scans and achieves higher classification accuracy than the CAM method. Therefore, the mining strategy of our proposed method is applicable to any decision tree building method and provides high accuracy in the real world. data mining correlated attributes association rules classification decision trees
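The class-rooted FP-tree is described only as "similar to the FP-tree, except the root of the tree is always a class item". A minimal sketch of that idea, under stated assumptions, builds one prefix tree per class value, inserting each row's (attribute, value) items in descending frequency order as in FP-tree construction. It uses a fixed `min_support` for simplicity, whereas the thesis determines minimal support dynamically; `Node`, `build_class_rooted_trees`, and the toy rows are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # an (attribute, value) pair


@dataclass
class Node:
    item: Item
    count: int = 0
    children: Dict[Item, "Node"] = field(default_factory=dict)


def build_class_rooted_trees(rows: List[Dict[str, str]], class_attr: str,
                             min_support: int) -> Dict[str, Node]:
    """One FP-tree-style prefix tree per class value, each rooted at its class item."""
    # Item frequencies over all rows, excluding the class attribute.
    freq = Counter(item for r in rows for item in r.items() if item[0] != class_attr)
    roots: Dict[str, Node] = {}
    for r in rows:
        label = r[class_attr]
        root = roots.setdefault(label, Node((class_attr, label)))
        root.count += 1
        # As in an FP-tree: keep frequent items, sorted by descending frequency.
        items = sorted((it for it in r.items() if it[0] != class_attr and freq[it] >= min_support),
                       key=lambda it: (-freq[it], it))
        node = root
        for it in items:
            node = node.children.setdefault(it, Node(it))
            node.count += 1
    return roots


rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]
trees = build_class_rooted_trees(rows, class_attr="play", min_support=2)
yes = trees["yes"]
print(yes.count, yes.children[("outlook", "sunny")].count)  # 3 2
```

Because every path starts at a class item, each node count is already conditioned on the class, which is the information a classifier needs.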
35	Data Mining with Decision Trees in the Gene Logic Database : A Breast Cancer Study Rahpeymai, Neda January 2002 (has links) Data mining approaches have been increasingly used in recent years to find patterns and regularities in large databases. In this study, the C4.5 decision tree approach was used to mine the Gene Logic database, which contains biological data. The decision tree approach was used to identify the genes and risk factors most relevant to breast cancer, in order to separate healthy patients from breast cancer patients in the data sets used. Four different tests were performed for this purpose. Cross validation was performed for each of the four tests to evaluate how well the decision tree approach classifies ‘new’ samples. In the first test, the expression of 108 breast-related genes, shown in appendix A, for 75 patients was used as input to the C4.5 algorithm. This test resulted in a decision tree containing only four genes, considered the most relevant for correctly classifying patients. Cross validation indicates an average accuracy of 89% in classifying ‘new’ samples. In the second test, risk factor data were used as input; cross validation shows an average accuracy of 87% in classifying ‘new’ samples. In the third test, gene expression data and risk factor data were combined into one input; cross validation again indicates an average accuracy of 87%. In the final test, the C4.5 algorithm was used to indicate possible signalling pathways involving the four genes identified by the decision tree built only on gene expression data. In some cases, the C4.5 algorithm found trees suggesting pathways that are supported by the breast cancer literature.
Since not all pathways involving the four putative breast cancer genes are known yet, the other suggested pathways should be analysed further to increase their credibility. In summary, this study demonstrates the application of decision tree approaches to identifying genes and risk factors relevant to the classification of breast cancer patients. Data mining Decision trees C4.5 Breast cancer Bioinformatics Bioinformatik
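C4.5 chooses splits by gain ratio: the information gain of splitting on an attribute, divided by the entropy of the partition the split induces. The minimal sketch below computes that criterion for one categorical attribute. It omits C4.5's handling of continuous attributes, missing values and pruning, and the expression levels and labels are invented, not Gene Logic data.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on a categorical attribute, divided by its split information."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition induced by the attribute
    return gain / split_info if split_info > 0 else 0.0


labels = ["tumour", "tumour", "healthy", "healthy"]
print(gain_ratio(["high", "high", "low", "low"], labels))  # 1.0: perfect split
print(gain_ratio(["a", "b", "a", "b"], labels))            # 0.0: uninformative split
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain tends to favour.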
36	Localised splitting criteria for classification and regression trees / Bremner, Alexandra P. January 2004 (has links) Thesis (Ph.D.)--Murdoch University, 2004. / Thesis submitted to the Division of Science and Engineering. Bibliography: leaves 172-182.
37	Simplicial complexes of graphs / Jonsson, Jakob. January 2008 (has links) (PDF) Univ., Diss.--Stockholm, 2005. / Includes bibliographical references (p. [361] - 369) and index.
38	Computer-enhanced knowledge discovery in environmental science : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Environmental Science at the University of Canterbury / Fukuda, K. January 1900 (has links) Thesis (Ph. D.)--University of Canterbury, 2008. / Typescript (photocopy). Includes bibliographical references. Also available via the World Wide Web.
39	Development of predictive mapping techniques for soil survey and salinity mapping / Elnaggar, Abdelhamid A. January 1900 (has links) Thesis (Ph. D.)--Oregon State University, 2008. / Printout. Includes bibliographical references. Also available on the World Wide Web.
40	Qualidade técnica e reparo periapical em retratamentos endodônticos : estudo observacional Signor, Bruna January 2017 (has links) Introduction: Endodontic retreatments present greater technical complexity and worse prognoses than initial endodontic treatment. In this context, a more detailed investigation of the factors affecting the feasibility of obtaining satisfactory technical quality and periapical repair is needed. Data mining techniques are little explored in dentistry, although they have the potential to contribute to knowledge discovery. In the present study, patterns and risk factors related to the technical quality and periapical repair of endodontic retreatments were investigated. Decision trees were generated, and this technique was complemented by conventional statistical analysis. Methodology: This observational study included 321 individuals with an indication for endodontic retreatment, treated by students of a specialization course in Endodontics. Demographic data and data on medical history, diagnosis, treatment and postoperative follow-up were collected and transferred to an electronic database. After data preparation and preprocessing, 32 independent variables and 2 dependent variables were selected, the latter being the outcomes technical quality of the retreatment and periapical repair. Descriptive statistics were computed to determine the frequency of missing data, the distribution of categorical variables, and the mean and standard deviation of numeric variables. Decision trees were generated to determine patterns related to the outcomes, using the data mining software Weka (Waikato Environment for Knowledge Analysis, University of Waikato, New Zealand).
Conventional statistical analyses were carried out with SPSS software (SPSS Inc., Chicago, IL, USA) to determine factors that could affect these outcomes. Results: After endodontic retreatment, satisfactory technical quality and periapical repair were obtained in 65.20% and 80.50% of cases, respectively. The technical quality of endodontic retreatment was affected by several risk factors, including severe root curvature (p < 0.001) and alterations in root canal morphology (p = 0.002). The decision trees suggested patterns associating the simultaneous occurrence of straight roots and apical root resorption with technically unsatisfactory results. Periapical lesion diameter (p = 0.018), tooth group (p = 0.015) and the presence of apical resorption (p = 0.024) were significantly associated with the failure of endodontic retreatments. The data mining analysis suggested that extensive periapical lesions and unsatisfactory filling quality in the initial endodontic treatment involve interaction mechanisms between intracanal infection and the host response that have not been fully elucidated, so further studies are needed. Conclusion: Satisfactory technical quality is affected by several risk factors, among them severe root curvature and alterations in root canal morphology. The location of procedural accidents influences the achievement of technical quality. Factors such as periapical lesion diameter, tooth group and apical root resorption were significantly associated with the failure of endodontic retreatments. / Introduction: Non-surgical root canal retreatment presents higher technical complexity and a poorer prognosis than primary endodontic treatment.
Within this context, a more detailed investigation of the factors affecting the feasibility of achieving technical quality and periapical healing in teeth presenting secondary root canal infection is needed. Data mining approaches are still little explored in dentistry, despite their potential to contribute to knowledge discovery. In the present study, decision trees were complemented by conventional statistical analysis to investigate patterns and risk factors related to technical quality and healing outcomes in non-surgical root canal retreatment. Methods: This observational study included 321 consecutive patients presenting for non-surgical root canal retreatment. Patients were treated by postgraduate students following standard protocols. Demographic, medical, diagnostic, treatment and follow-up variables were transferred to an electronic chart database (ECD). After data preprocessing and preparation, a total of 32 independent variables and 2 dependent variables were defined. Basic statistics were tabulated, giving the frequency of missing values, the distribution of categorical attributes, and the mean and standard deviation of numeric attributes. Decision trees were generated to predict patterns related to technical quality (satisfactory/unsatisfactory) and periapical healing (healed/failure), using the J48 classification algorithm in the Weka data mining software (Waikato Environment for Knowledge Analysis, University of Waikato, New Zealand). Statistical tests were performed using SPSS software (SPSS Inc., Chicago, IL, USA). Univariate and multivariate analytic methods were used to determine factors affecting the technical quality and periapical healing of endodontic retreatment. Results: After endodontic retreatment, the technical outcome was satisfactory in 65.20% of the cases, and periapical healing was observed in 80.50%.
The technical quality of endodontic retreatment was affected by several risk factors, including severity of root curvature (p < 0.001) and altered root canal morphology (p = 0.002). The decision trees suggested that patterns combining straight roots and apical root resorption may prevent satisfactory technical outcomes. Periapical lesion area (p = 0.018), tooth type (p = 0.015) and apical resorption (p = 0.024) were significantly associated with endodontic retreatment failure. The data mining analysis suggested that large periapical lesions, as well as poor root filling quality in the initial endodontic treatment, involve mechanisms of interaction between intracanal infection and host response that are not fully understood and should be further investigated. Conclusions: The technical quality of endodontic retreatment is affected by several risk factors, including severity of root curvature and altered root canal morphology. The occurrence of procedural accidents is especially relevant in the apical third of the roots, affecting technical quality. Periapical lesion area, tooth type and apical resorption were significantly associated with endodontic retreatment failure. Endodontia Retratamento endodôntico Retreatment Endodontics Decision trees Data mining
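The descriptive step in the methods (frequency of missing values, distribution of categorical attributes, mean and standard deviation of numeric attributes) can be sketched in plain Python. The study itself used Weka and SPSS; this is not that workflow. The variable names, the use of `None` for missing values, and the records are all hypothetical.

```python
import statistics
from collections import Counter
from typing import Dict, List, Optional, Union

Value = Optional[Union[str, float]]  # None marks a missing value


def describe(records: List[Dict[str, Value]]) -> Dict[str, dict]:
    """Missing-value count per variable, plus counts (categorical) or mean/SD (numeric)."""
    summary: Dict[str, dict] = {}
    for var in records[0]:  # assumes every record has the same keys
        column = [r.get(var) for r in records]
        present = [v for v in column if v is not None]
        info: dict = {"missing": len(column) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            info["mean"] = statistics.mean(present)
            info["sd"] = statistics.stdev(present) if len(present) > 1 else 0.0  # sample SD
        else:
            info["counts"] = dict(Counter(present))
        summary[var] = info
    return summary


# Hypothetical records; variable names and values are invented for illustration.
records = [
    {"tooth_type": "molar", "lesion_mm": 4.0},
    {"tooth_type": "incisor", "lesion_mm": None},
    {"tooth_type": "molar", "lesion_mm": 6.0},
    {"tooth_type": None, "lesion_mm": 5.0},
]
print(describe(records))
```

Missing values are counted before computing any statistic, so the mean and SD reflect only the observed values.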
