11 |
Rule-Based Approaches for Large Biological Datasets Analysis: A Suite of Tools and Methods
Kruczyk, Marcin, January 2013 (has links)
This thesis concerns new and improved computational methods for analyzing complex biological data produced by advanced biotechnologies. Such data is not only very large but is also characterized by very high numbers of features. Addressing these needs, we developed a set of methods and tools suitable for analyzing large datasets, including next-generation sequencing data, and built transparent models that can be interpreted by researchers who are not necessarily experts in computing. We focused on brain-related diseases. The first aim of the thesis was to employ a meta-server approach to finding peaks in ChIP-seq data. Combining existing peak finders, we created an algorithm that produces consensus results better than those of any single peak finder. The second aim was to use supervised machine learning to identify features that are significant in the predictive diagnosis of Alzheimer's disease in patients with mild cognitive impairment. This experience led to the development of an improved feature selection method for rough sets, a machine learning method. The third aim was to deepen the understanding of the role that the STAT3 transcription factor plays in gliomas. Interestingly, we found that STAT3, in addition to being an activator, also acts as a repressor in certain rat and human glioma models. This was established by analyzing STAT3 binding sites in combination with epigenetic marks; STAT3-mediated regulation was determined using expression data from untreated cells and from cells after JAK2/STAT3 inhibition. The four papers constituting the thesis are preceded by an exposition of the biological, biotechnological, and computational background that provides the foundation for them. The overall results of this thesis bear witness to the mutually beneficial relationship between Bioinformatics, modern Life Sciences, and Computer Science.
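As a rough illustration of the meta-server idea described above, consensus peaks can be formed from regions supported by several individual peak finders. The sketch below is a minimal, hypothetical implementation assuming simple (start, end) intervals on a single chromosome and a support threshold of two finders; it is not the algorithm developed in the thesis.

```python
# Minimal sketch (assumption): consensus peak calling by keeping regions covered
# by at least min_support of the individual peak finders. Peaks are (start, end)
# intervals on a single chromosome; this is not the thesis algorithm.

def consensus_peaks(peak_sets, min_support=2):
    events = []
    for peaks in peak_sets:
        for start, end in peaks:
            events.append((start, 1))   # a peak opens here
            events.append((end, -1))    # a peak closes here
    events.sort()
    consensus, depth, region_start = [], 0, None
    for pos, delta in events:
        depth += delta
        if depth >= min_support and region_start is None:
            region_start = pos          # enough finders agree: open a consensus region
        elif depth < min_support and region_start is not None:
            consensus.append((region_start, pos))
            region_start = None         # agreement dropped: close the region
    return consensus

finder_a = [(100, 200), (500, 650)]
finder_b = [(120, 220), (900, 950)]
finder_c = [(150, 210), (510, 600)]
print(consensus_peaks([finder_a, finder_b, finder_c]))  # [(120, 210), (510, 600)]
```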
|
12 |
Resolving Quasi-Synonym Relationships in Automatic Thesaurus Construction using Fuzzy Rough Sets and an Inverse Term Frequency Similarity Function
Davault, Julius Mack, III, 01 January 2009 (has links)
One of the problems associated with automatic thesaurus construction is determining the semantic relationship between word pairs. Quasi-synonyms provide a type of equivalence relationship: words are similar only for the purposes of information retrieval. Determining such relationships automatically is difficult. The term vector space model together with an inverse term frequency similarity function provides a way to automatically determine the similarity between words in a thesaurus. A thesaurus constructed this way can also improve precision and recall in information retrieval when it is built in conjunction with fuzzy rough set algorithms and used with tight upper approximation query expansion. This dissertation presents a method that combines fuzzy rough sets with a word weighting and inverse term frequency similarity function as a technique for automatic thesaurus construction.
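To make the term-vector idea concrete, the sketch below scores a word pair by the cosine similarity of their co-occurrence vectors, with each co-occurring term down-weighted by an inverse frequency factor. The toy corpus, the exact weighting formula, and the function names are illustrative assumptions rather than the similarity function defined in the dissertation, and the fuzzy rough set machinery is omitted.

```python
# Minimal sketch (assumptions: corpus, weighting formula). Words are compared by
# the cosine of their co-occurrence vectors, each co-occurring term weighted by
# an inverse term frequency factor 1 / log(1 + corpus frequency).

import math
from collections import Counter

def itf_vector(word, documents):
    corpus_freq = Counter(t for doc in documents for t in doc)
    vec = Counter()
    for doc in documents:
        if word in doc:
            for t in doc:
                if t != word:
                    vec[t] += 1.0 / math.log(1.0 + corpus_freq[t])
    return vec

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [["car", "engine", "road"],
        ["automobile", "engine", "road"],
        ["cat", "fur", "whiskers"]]
# "car" and "automobile" never co-occur, yet share weighted contexts -> quasi-synonyms
print(cosine(itf_vector("car", docs), itf_vector("automobile", docs)))  # close to 1.0
```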
|
13 |
Seleção de atributos relevantes para aprendizado de máquina utilizando a abordagem de Rough Sets / Machine learning feature subset selection using Rough Sets approach
Pila, Adriano Donizete, 25 May 2001 (links)
In Supervised Machine Learning (ML), an induction algorithm is typically presented with a set of training examples, where each example is described by a vector of feature values and a class label. The task of the induction algorithm is to induce a classifier that will be useful in classifying new cases. In general, inductive-learning algorithms rely on the provided data to build their classifiers; inadequate representation of the examples through the description language, as well as inconsistencies in the training examples, can make the learning task hard. One of the main problems in ML is Feature Subset Selection (FSS): the learning algorithm must select some subset of features upon which to focus its attention while ignoring the rest. There are three main reasons that justify doing FSS. The first is that most computationally feasible ML algorithms do not work well in the presence of many features. The second is that FSS may improve comprehensibility when fewer features are used to induce symbolic concepts. The third is the high cost of collecting data in some domains. Basically, there are three approaches in ML for FSS: embedded, filter, and wrapper. Rough Sets theory (RS) is a mathematical approach developed in the early 1980s whose central construct, the reduct, is the one used in this work. According to this approach, reducts are minimal subsets of features that preserve the same concept description as the entire set of features. In this work we focus on the filter approach to FSS, using as filters the reducts obtained through RS. We describe a series of FSS experiments on nine natural datasets using RS reducts as well as other filters, after which the selected features are submitted to two symbolic ML algorithms. For each dataset, various measures are taken to compare inducer performance, such as the number of selected features, accuracy, and the number of induced rules. We also present a case study on a real-world dataset from the medical domain. The aim of this case study is twofold: to compare the performance of the induction algorithms and to evaluate the extracted knowledge with the aid of a domain specialist. Although the induced knowledge was not surprising, it allowed us to confirm some hypotheses previously made by the specialist using other methods. This shows that Machine Learning can also be viewed as a contribution to other scientific fields.
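As an illustration of how reducts can serve as a filter for feature subset selection, the sketch below computes the dependency degree (the size of the positive region) of a feature subset and searches for a minimal subset that preserves it. The toy decision table and the exhaustive search are illustrative assumptions; reduct computation in the thesis relies on dedicated rough set algorithms.

```python
# Minimal sketch (assumptions: toy decision table, exhaustive search). A reduct is
# approximated as the smallest feature subset whose dependency degree (size of the
# positive region) matches that of the full feature set.

from itertools import combinations

def partition(rows, features):
    """Indiscernibility classes of the rows with respect to the given features."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[f] for f in features), []).append(i)
    return blocks.values()

def dependency(rows, labels, features):
    """Fraction of rows whose indiscernibility class has a single decision value."""
    if not features:
        return 0.0
    pos = sum(len(b) for b in partition(rows, features)
              if len({labels[i] for i in b}) == 1)
    return pos / len(rows)

def reduct(rows, labels, all_features):
    full = dependency(rows, labels, all_features)
    for k in range(1, len(all_features) + 1):
        for subset in combinations(all_features, k):
            if dependency(rows, labels, list(subset)) == full:
                return set(subset)
    return set(all_features)

rows = [{"fever": "yes", "cough": "yes", "rash": "no"},
        {"fever": "yes", "cough": "no",  "rash": "no"},
        {"fever": "no",  "cough": "yes", "rash": "yes"},
        {"fever": "no",  "cough": "no",  "rash": "yes"}]
labels = ["flu", "flu", "measles", "measles"]
print(reduct(rows, labels, ["fever", "cough", "rash"]))  # a single feature suffices here
```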
|
14 |
A Learning Approach To Obtain Efficient Testing Strategies In Medical Diagnosis
Fakih, Saif, 15 March 2004 (links)
Determining the most efficient use of diagnostic tests is one of the complex issues facing medical practitioners. It is generally accepted that excessive use of tests is common practice in medical diagnosis; many tests are performed even though the incremental knowledge gained does not affect the course of diagnosis. With the soaring cost of healthcare in the US, there is a critical need to cut the cost of diagnostic tests while achieving a higher level of diagnostic accuracy. Various decision-making tools assisting physicians in diagnosis management have been presented in the literature. One such method, the analytical hierarchy process, utilizes a multilevel structure of decision criteria for sequential pairwise comparison of available test choices. Many decision-analytic methods are based on Bayes' theorem and decision trees; these methods use threshold treatment probabilities and performance characteristics of the tests, such as true-positive and false-positive rates, to choose among the available alternatives. Sequential testing approaches tend to elongate the diagnosis process, whereas parallel testing generally involves a larger number of tests.
This research is focused on developing a machine learning based methodology for finding an efficient testing strategy for medical diagnosis. Based on patient parameters (both observed and tested), the method recommends the test(s) that optimize a measure of performance for the diagnosis process. The performance measure combines the cost of testing, the risk and discomfort associated with the tests, and the time taken to reach a diagnosis, and it also accounts for the diagnostic ability of the tests.
The methodology is developed by combining tools from the fields of data mining (rough set theory in particular), utility theory, Markov decision processes (MDP), and reinforcement learning (RL). Rough set theory is used to extract diagnostic information in the form of rules from medical databases. Utility theory is used to combine three non-homogeneous measures (cost of testing, risk and discomfort, and diagnostic ability) into one cost-based measure of performance. The MDP framework, together with an RL algorithm, facilitates obtaining efficient testing strategies. The methodology is implemented on a sample problem of diagnosing a solitary pulmonary nodule (SPN), and the results are compared with those of four other approaches. It is shown that the RL-based methodology holds significant promise for improving the performance of the diagnostic process.
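The sketch below illustrates the MDP/RL part of such a methodology with tabular Q-learning over a toy test-selection problem, where states are the sets of tests already performed and the reward combines test cost with a crude proxy for diagnostic ability. The tests, costs, and reward structure are made-up assumptions, not the SPN formulation or the rough-set-derived rules used in the thesis.

```python
# Minimal sketch (assumptions: tests, costs, reward model). Tabular Q-learning over
# a toy test-selection MDP: states are the sets of tests already done, actions are
# the remaining tests or "diagnose", and diagnosing with more evidence pays more.

import random

TESTS = ["ct_scan", "biopsy"]
ACTIONS = TESTS + ["diagnose"]
COST = {"ct_scan": -1.0, "biopsy": -5.0}

def step(state, action):
    if action == "diagnose":
        return state, 10.0 * len(state) / len(TESTS), True   # crude diagnostic-ability proxy
    return tuple(sorted(set(state) | {action})), COST[action], False

def q_learning(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    q = {}
    for _ in range(episodes):
        state = ()
        for _ in range(20):                                   # cap episode length
            qs = q.setdefault(state, {a: 0.0 for a in ACTIONS})
            a = random.choice(ACTIONS) if random.random() < eps else max(qs, key=qs.get)
            nxt, r, done = step(state, a)
            nxt_qs = q.setdefault(nxt, {b: 0.0 for b in ACTIONS})
            target = r + (0.0 if done else gamma * max(nxt_qs.values()))
            qs[a] += alpha * (target - qs[a])
            state = nxt
            if done:
                break
    return q

q = q_learning()
print(max(q[()], key=q[()].get))   # typically "ct_scan" under these illustrative costs
```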
|
15 |
Jämförande studie av LEM2 och Dynamiska Redukter / Comparison of LEM2 and a Dynamic Reduct Classification Algorithm
Leifler, Ola, January 2002 (links)
This thesis presents the results of the implementation and evaluation of two machine learning algorithms [Baz98, GB97] based on notions from Rough Set theory [Paw82]. Both algorithms were implemented and tested using the Weka [WF00] software framework. The main purpose was to investigate whether the experimental results obtained in [Baz98] could be reproduced by implementing both algorithms in a framework that provides the common functionality needed by both. As a result of this thesis, a Rough Set framework accompanying the Weka system was designed and implemented, along with three discretization methods and three classification methods.

The results of the evaluation did not match those obtained by the original authors. On two standard benchmarking datasets also used in [Baz98] (Breast Cancer and Lymphography), significant results indicating that one algorithm performed better than the other could not be established using Student's t-test and a confidence limit of 95%. However, on two other datasets (Balance Scale and Zoo), differences could be established with more than 95% significance: the Dynamic Reduct approach scored better on the Balance Scale dataset, while the LEM2 approach scored better on the Zoo dataset.
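For reference, a comparison of this kind can be run as a paired Student's t-test over per-fold accuracies, as in the minimal sketch below; the accuracy figures are invented for illustration, and SciPy is assumed to be available.

```python
# Minimal sketch (assumptions: invented per-fold accuracies, SciPy available).
# Paired Student's t-test at the 95% level over cross-validation folds.

from scipy import stats

lem2_acc   = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79]
dynred_acc = [0.84, 0.82, 0.85, 0.83, 0.86, 0.81, 0.85, 0.84, 0.83, 0.82]

t_stat, p_value = stats.ttest_rel(dynred_acc, lem2_acc)
if p_value < 0.05:
    print(f"significant difference at 95% (t = {t_stat:.2f}, p = {p_value:.4f})")
else:
    print("no significant difference at the 95% level")
```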
|
16 |
Topics in Soft Computing
Keukelaar, J. H. D., January 2002 (links)
No description available.
|
17 |
Learning with ALiCE II
Lockery, Daniel Alexander, 14 September 2007 (links)
The problem considered in this thesis is the development of an autonomous prototype robot capable of gathering sensory information from its environment, allowing it to provide feedback on the condition of specific targets to aid in the maintenance of hydro equipment. The context for the solution to this problem is the power grid environment operated by the local hydro utility. The intent is to monitor power line structures by travelling along the skywire located at the top of the towers, providing a view of everything beneath it including, for example, insulators, conductors, and towers. The contribution of this thesis is a novel robot design with the potential to prevent hazardous situations, together with the use of reinforcement learning algorithms modified with rough coverage feedback to establish behaviours. / October 2007
|
18 |
2D to 3D conversion with direct geometrical search and approximation spaces
Borkowski, Maciej, 14 September 2007 (links)
This dissertation describes the design and implementation of a system for extracting 3D information from pairs of 2D images. The system's input consists of two images taken by an ordinary digital camera; its output is a full 3D model extracted from those images. No assumptions are made about the positions of the cameras when the images are taken, but the scene must not undergo any modifications between the two shots.
The process of extracting 3D information from 2D images consists of three basic steps. First, point matching is performed; the main contribution of this step is an approach to matching image segments in the context of an approximation space. The second step addresses the problem of estimating the external camera parameters. The proposed solution uses 3D geometry rather than the fundamental matrix widely used in 2D to 3D conversion: in the proposed approach (DirectGS), the distances between reprojected rays for all image points are minimised. The contributions of this step are a definition of an optimal search space for solving the 2D to 3D conversion problem and an efficient algorithm that minimises the reprojection error. The third step considers the problem of dense matching; its contribution is an approach to dense matching of 3D object structures that exploits the presence of points on lines in 3D space.
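The geometric idea behind minimising distances between reprojected rays can be illustrated with the classic closest-point-between-two-rays computation below: a matched point is reconstructed as the midpoint of the shortest segment between the two viewing rays, and the residual gap measures how well the rays agree. The camera centres and directions are made-up values, and the actual DirectGS optimisation over camera parameters is considerably more involved.

```python
# Minimal sketch (assumptions: made-up cameras, two rays only). Reconstruct a 3D
# point as the midpoint of the shortest segment between two viewing rays; the
# residual gap is the ray distance being minimised.

import numpy as np

def closest_point_between_rays(c1, d1, c2, d2):
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    # Minimise |(c1 + s*d1) - (c2 + t*d2)|^2 over the ray parameters s and t.
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w = c1 - c2
    denom = a * c - b * b                       # zero only for parallel rays
    s = (b * (d2 @ w) - c * (d1 @ w)) / denom
    t = (a * (d2 @ w) - b * (d1 @ w)) / denom
    p1, p2 = c1 + s * d1, c2 + t * d2
    return (p1 + p2) / 2.0, float(np.linalg.norm(p1 - p2))

c1, d1 = np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.0, 1.0])
c2, d2 = np.array([1.0, 0.0, 0.0]), np.array([-0.1, 0.0, 1.0])
point, gap = closest_point_between_rays(c1, d1, c2, d2)
print(point, gap)   # about [0.5, 0.0, 5.0] with gap 0 for perfectly matched rays
```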
The theory and experiments developed for this dissertation demonstrate the usefulness of the proposed system for digitizing 3D information. The main advantages of the proposed approach are its low cost, its simplicity of use for an untrained user, and the high precision of the reconstructed objects. / October 2007
|
19 |
Reinforcement learning in biologically-inspired collective robotics: a rough set approach
Henry, Christopher, 19 September 2006 (links)
This thesis presents a rough set approach to reinforcement learning, made possible by considering behaviour patterns of learning agents in the context of approximation spaces. Rough set theory, introduced by Zdzisław Pawlak in the early 1980s, provides a ground for deriving pattern-based rewards within approximation spaces. Learning can be considered episodic, and the framework provided by an approximation space makes it possible to derive pattern-based reference rewards at the end of each episode. Reference rewards provide a standard for reinforcement comparison as well as for the actor-critic method of reinforcement learning. In addition, approximation spaces provide a basis for deriving episodic weights that underpin a new form of off-policy Monte Carlo learning control method. A number of conventional and pattern-based reinforcement learning methods are investigated in this thesis, which also introduces two learning environments used to compare the algorithms. The first is a Monocular Vision System used to track a moving target. The second is an artificial ecosystem testbed that makes it possible to study swarm behaviour by collections of biologically-inspired bots. The simulated ecosystem has an ethological basis inspired by the work of Niko Tinbergen, who in the 1960s introduced methods of observing and explaining the behaviour of biological organisms; these methods carry over into the study of interacting robotic devices that cooperate to survive and to carry out highly specialized tasks. Agent behaviour during each episode is recorded in a decision table called an ethogram, which records features such as states, proximate causes, responses (actions), action preferences, rewards and decisions (actions chosen and actions rejected). At all times an agent follows a policy that maps perceived states of the environment to actions. The goal of the learning algorithms is to find an optimal policy in a non-stationary environment. The results of learning experiments with seven forms of reinforcement learning are given. The contribution of this thesis is a comprehensive introduction to a pattern-based evaluation of behaviour during reinforcement learning using approximation spaces. / May 2006
|