741 |
Discovering Frequent Episodes: Fast Algorithms, Connections With HMMs And Generalizations
Laxman, Srivatsan 03 1900 (has links)
Temporal data mining is concerned with the exploration of large sequential (or temporally ordered) data sets to discover nontrivial information that was previously unknown to the data owner. Sequential data sets arise naturally in application domains ranging from bioinformatics to manufacturing processes. Pattern discovery refers to a broad class of data mining techniques whose objective is to unearth hidden patterns or unexpected trends in the data. In general, pattern discovery is about finding all patterns of 'interest' in the data, and one popular measure of interestingness for a pattern is its frequency in the data. The problem of frequent pattern discovery is to find all patterns in the data whose frequency exceeds some user-defined threshold. Discovery of temporal patterns that occur frequently in sequential data has received much attention in recent times. Different approaches consider different classes of temporal patterns and propose different algorithms for their efficient discovery from the data. This thesis is concerned with a specific class of temporal patterns called episodes and their discovery in large sequential data sets.
In the framework of frequent episode discovery, data (referred to as an event sequence or an event stream) is available as a single long sequence of events. The ith event in the sequence is an ordered pair, (Ei, ti), where Ei takes values from a finite alphabet (of event types) and ti is the time of occurrence of the event. The events in the sequence are ordered according to these times of occurrence. An episode (the temporal pattern considered in this framework) is a (typically) short partially ordered sequence of event types. Formally, an episode is a triple, (V, <, g), where V is a collection of nodes, < is a partial order on V and g is a map that assigns an event type to each node of the episode. When < is total, the episode is referred to as a serial episode, and when < is trivial (or empty), the episode is referred to as a parallel episode. An episode is said to occur in an event sequence if there are events in the sequence with event types the same as those constituting the episode, and with times of occurrence respecting the partial order in the episode. The frequency of an episode is some measure of how often it occurs in the event sequence. Given a frequency definition for episodes, the task is to discover all episodes whose frequencies exceed some threshold. This is done using a level-wise procedure. In each level, a candidate generation step combines frequent episodes from the previous level to build candidates of the next larger size, and then a frequency counting step makes one pass over the event stream to determine the frequencies of all the candidates and thus identify the frequent episodes.
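As a toy illustration of these definitions (not from the thesis; the event sequence and function names are hypothetical), a serial episode can be represented as a totally ordered tuple of event types, an event stream as (event type, time) pairs, and an occurrence check becomes an in-order subsequence test:

```python
# Hypothetical sketch: does a serial episode (a totally ordered tuple of
# event types) occur in an event sequence of (event_type, time) pairs
# sorted by time?  Consuming a single iterator enforces the time order.

def serial_episode_occurs(episode, events):
    """Return True if the event types in `episode` appear in order."""
    it = iter(events)
    for wanted in episode:
        # advance through the stream until the next needed type is found
        if not any(etype == wanted for etype, _t in it):
            return False
    return True

events = [("A", 1), ("C", 2), ("B", 3), ("A", 4), ("C", 5)]
print(serial_episode_occurs(("A", "B", "C"), events))  # True: A@1, B@3, C@5
print(serial_episode_occurs(("C", "B", "A"), events))  # True: C@2, B@3, A@4
print(serial_episode_occurs(("B", "B"), events))       # False: only one B
```

A parallel episode would instead ignore the order and only require that each event type be present often enough.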
Frequency counting is the main computationally intensive step in frequent episode discovery. The choice of frequency definition for episodes has a direct bearing on the efficiency of the counting procedure. In the original framework of frequent episode discovery, episode frequency is defined as the number of fixed-width sliding windows over the data in which the episode occurs at least once. Under this frequency definition, frequency counting of a set of |C| candidate serial episodes of size N has space complexity O(N|C|) and time complexity O(ΔT N|C|) (where ΔT is the difference between the times of occurrence of the last and the first event in the data stream). The other main frequency definition available in the literature defines episode frequency as the number of minimal occurrences of the episode (where a minimal occurrence is a window on the time axis containing an occurrence of the episode such that no proper sub-window of it contains another occurrence of the episode). The algorithm for obtaining frequencies for a set of |C| episodes needs O(n|C|) time (where n denotes the number of events in the data stream). While this is time-wise better than the windows-based algorithm, the space needed to locate minimal occurrences of an episode can be very high (and is in fact of the order of the length, n, of the event stream).
This thesis proposes a new definition for episode frequency, based on the notion of what are called non-overlapped occurrences of episodes in the event stream. Two occurrences are said to be non-overlapped if no event corresponding to one occurrence appears in between events corresponding to the other. The frequency of an episode is defined as the maximum possible number of non-overlapped occurrences of the episode in the data. The thesis also presents algorithms for efficient frequent episode discovery under this frequency definition. The space and time complexities for frequency counting of serial episodes are O(|C|) and O(n|C|) respectively (where n denotes the total number of events in the given event sequence and |C| denotes the number of candidate episodes). These are arguably the best possible space and time complexities achievable for the frequency counting step. Also, the fact that the time needed by the non-overlapped occurrences-based algorithm is linear in the number of events, n, in the event sequence (rather than the difference, ΔT, between the occurrence times of the first and last events in the data stream, as is the case with the windows-based algorithm) can result in a considerable time advantage when the number of time ticks far exceeds the number of events in the event stream. The thesis also presents efficient algorithms for frequent episode discovery under expiry time constraints (according to which an occurrence of an episode can be counted towards its frequency only if the total time span of the occurrence is less than a user-defined threshold). It is shown through simulation experiments that, in terms of actual run-times, frequent episode discovery under the non-overlapped occurrences-based frequency (using the algorithms developed here) is much faster than existing methods.
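For a single serial episode, the maximum number of non-overlapped occurrences can be counted with one left-to-right pass of an automaton that resets whenever it completes an occurrence, since each counted occurrence must finish before the next one starts. The sketch below is a hypothetical minimal illustration of that idea; the thesis's algorithms track a whole set of candidate episodes simultaneously, which is where the O(n|C|) bound comes from:

```python
# Hypothetical sketch: counting non-overlapped occurrences of ONE serial
# episode in a single pass.  The automaton waits for the next needed event
# type and resets on completion, so per-episode state is O(1) and per-
# episode time is O(n).

def count_non_overlapped(episode, events):
    state = 0          # index of the next event type the automaton waits for
    count = 0
    for etype, _t in events:
        if etype == episode[state]:
            state += 1
        if state == len(episode):   # one full occurrence recognised
            count += 1
            state = 0               # reset: next occurrence starts afresh
    return count

events = [("A", 1), ("B", 2), ("A", 3), ("A", 4), ("B", 5), ("B", 6)]
print(count_non_overlapped(("A", "B"), events))  # 2
print(count_non_overlapped(("A",), events))      # 3
```

An expiry-time constraint would add one check at the completion step: count the occurrence only if its span (last event time minus first event time) is below the user-defined threshold.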
A second frequency measure proposed in this thesis is based on what are termed non-interleaved occurrences of episodes in the data. This definition counts certain kinds of overlapping occurrences of the episode. The time needed is linear in the number of events, n, in the data sequence, the size, N, of episodes and the number of candidates, |C|. Simulation experiments show that run-time performance under this frequency definition is slightly inferior to that under the non-overlapped occurrences-based frequency, but is still better than the run-times under the windows-based frequency. This thesis also establishes the following interesting property connecting the non-overlapped, the non-interleaved and the minimal occurrences-based frequencies of an episode in the data: the number of minimal occurrences of an episode is bounded below by the maximum number of non-overlapped occurrences of the episode, and is bounded above by the maximum number of non-interleaved occurrences of the episode in the data. Hence, non-interleaved occurrences-based frequency is an efficient alternative to that based on minimal occurrences.
In addition to being superior in terms of both time and space complexity to all other existing algorithms for frequent episode discovery, the non-overlapped occurrences-based frequency has another very important property. It facilitates a formal connection between discovering frequent serial episodes in data streams and learning or estimating a model for the data generation process in terms of certain kinds of Hidden Markov Models (HMMs). In order to establish this connection, a special class of HMMs, called Episode Generating HMMs (EGHs), is defined. The symbol set for the HMM is chosen to be the alphabet of event types, so that the output of EGHs can be regarded as event streams in the frequent episode discovery framework.
Given a serial episode, α, that occurs in the event stream, a method is proposed to uniquely associate it with an EGH, Λα. Consider two N-node serial episodes, α and β, whose (non-overlapped occurrences-based) frequencies in the given event stream, o, are fα and fβ respectively. Let Λα and Λβ be the EGHs associated with α and β. The main result connecting episodes and EGHs states that the joint probability of o and the most likely state sequence for Λα is greater than the corresponding probability for Λβ if and only if fα is greater than fβ. This theoretical connection has some interesting consequences. First, since the most frequent serial episode is associated with the EGH having the highest data likelihood, frequent episode discovery can now be interpreted as a generative model learning exercise. More importantly, it is now possible to derive a formal test of significance for serial episodes in the data that prescribes, for a given size of the test, the minimum frequency an episode needs in order to be declared statistically significant. Note that this significance test for serial episodes does not require any separate model estimation (or training). The only quantity required to assess the significance of an episode is its non-overlapped occurrences-based frequency (and this is obtained through the usual counting procedure). The significance test also helps to automatically fix the frequency threshold for the frequent episode discovery process, so that it can lead to what may be termed parameterless data mining.
In the framework considered so far, the input to the frequent episode discovery process is a sequence of instantaneous events. However, in many applications events tend to persist for different periods of time, and the durations may carry important information from a data mining perspective. This thesis extends the framework of frequent episodes to incorporate such duration information directly into the definition of episodes, so that the patterns discovered will now carry this duration information as well. Each event in this generalized framework is a triple, (Ei, ti, τi), where Ei, as earlier, is the event type (from some finite alphabet) corresponding to the ith event, and ti and τi denote the start and end times of this event. The new temporal pattern, called the generalized episode, is a quadruple, (V, <, g, d), where V, < and g, as earlier, respectively denote a collection of nodes, a partial order over this collection and a map assigning event types to nodes. The new feature in the generalized episode is d, a map from V to 2^I, where I denotes a user-defined collection of time interval possibilities for event durations. An occurrence of a generalized episode in the event sequence consists of events with both 'correct' event types and 'correct' time durations, appearing in the event sequence in 'correct' time order. All frequency definitions for episodes over instantaneous event streams are applicable to generalized episodes as well, and the algorithms for frequent episode discovery extend easily to this case. The extra design choice that the user has in this generalized framework is the set, I, of time interval possibilities. This can be used to orient and focus the frequent episode discovery process towards temporal correlations involving only time durations that are of interest. The utility and effectiveness of the generalized framework are demonstrated through extensive simulations.
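As a hypothetical sketch of the extra duration check that the map d introduces (the interval set below is invented for illustration), a node of a generalized episode accepts an event only if the event's duration falls inside one of the node's allowed intervals:

```python
# Hypothetical sketch: a generalized event is (event_type, start, end), and
# a node accepts it only if its duration lies in one of the user-chosen
# duration intervals (half-open [lo, hi) here) assigned to that node by d.

def duration_matches(event, allowed_intervals):
    etype, start, end = event
    duration = end - start
    return any(lo <= duration < hi for lo, hi in allowed_intervals)

# a node whose d-value admits durations in [0, 5) or [10, 20)
allowed = [(0, 5), (10, 20)]
print(duration_matches(("A", 2, 4), allowed))    # True  (duration 2)
print(duration_matches(("A", 0, 7), allowed))    # False (duration 7)
print(duration_matches(("A", 3, 15), allowed))   # True  (duration 12)
```

Occurrence checking for a generalized episode then combines this test with the usual event-type and time-order conditions.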
The new algorithms for frequent episode discovery presented in this thesis are used to develop an application for temporal data mining of data from car engine manufacturing plants. Engine manufacturing is a heavily automated, complex, distributed controlled process with large amounts of fault data logged each day. The goal of temporal data mining here is to unearth strong time-ordered correlations in the data that can facilitate quick diagnosis of the root causes of persistent problems and predict major breakdowns well in advance. This thesis presents an application of the algorithms developed here to such analysis of the fault data. The data consist of time-stamped faults logged in car engine manufacturing plants of General Motors. Each fault is logged using an extensive list of codes (which constitutes the alphabet of event types for frequent episode discovery). Frequent episodes in fault logs represent temporal correlations among faults, and these can be used for fault diagnosis in the plant. This thesis describes how the outputs from the frequent episode discovery framework can be used to help plant engineers interpret the large volumes of faults logged in an efficient and convenient manner. Such a system, based on the algorithms developed in this thesis, is currently being used in one of the engine manufacturing plants of General Motors. Some examples of results that the plant engineers regarded as useful are also presented.
|
742 |
Criblage virtuel et expérimental de chimiothèques pour le développement d’inhibiteurs des cytokines TNF-alpha et IL-6. / Virtual and experimental screening of chemical libraries for the development of inhibitors of cytokines TNF-alpha and IL-6
Perrier, Julie 17 December 2014 (has links)
Anti-cytokine biologics (monoclonal antibodies, soluble receptors) targeting TNF-alpha and IL-6 in chronic inflammatory diseases have been a major success for the pharmaceutical industry. However, they exhibit several drawbacks: resistance, difficult administration and high costs. Our team works on the discovery of small-molecule inhibitors of cytokines such as TNF-alpha and IL-6, in order to widen the range of available therapeutic drugs; orally active drugs would represent a highly beneficial alternative for patients. During my PhD, I performed an experimental screening (using cellular and biochemical binding assays) of the best compounds identified through virtual screening of a large diversity chemical library, as well as of pyridazine derivatives from a medicinal chemical library. I was thus able to identify several small molecules directly inhibiting TNF-alpha and IL-6. Moreover, my work helped refine the laboratory's screening procedures. Overall, this work opens new avenues for anti-cytokine drug discovery.
|
743 |
Efficient service discovery in wide area networks
Brown, Alan January 2008 (has links)
In an increasingly networked world, with an abundance of services available to consumers, the consumer electronics market is enjoying a boom. The average consumer in the developed world may own several networked devices such as games consoles, mobile phones, PDAs, laptops and desktops, wireless picture frames and printers, to name but a few. With this growing number of networked devices comes a growing demand for services, defined here as functions requested by a client and provided by a networked node. For example, a client may wish to download and share music or pictures, find and use printer services, or look up information (e.g. train times, cinema bookings). It is notable that a significant proportion of networked devices are now mobile. Mobile devices introduce new dynamics to the service discovery problem, such as lower battery and processing power and more expensive bandwidth. Device owners expect to access services not only in their immediate proximity, but further afield (e.g. in their homes and offices). Solving these problems is the focus of this research. This thesis offers two alternative approaches to service discovery in Wide Area Networks (WANs). Firstly, a unique combination of the Session Initiation Protocol (SIP) and the OSGi middleware technology is presented to provide both mobility and service discovery capability in WANs. Through experimentation, this technique is shown to be successful where the number of operating domains is small, but it does not scale well. To address the issue of scalability, this thesis proposes the use of Peer-to-Peer (P2P) service overlays as a medium for service discovery in WANs. To confirm that P2P overlays can in fact support service discovery, a technique utilising the Distributed Hash Table (DHT) functionality of distributed systems is used to store and retrieve service advertisements. Through simulation, this is shown to be both a scalable and a flexible service discovery technique.
However, the problems associated with P2P networks with respect to efficiency are well documented. In a novel approach to reduce messaging costs in P2P networks, multi-destination multicast is used. Two well known P2P overlays are extended using the Explicit Multi-Unicast (XCAST) protocol. The resulting analysis of this extension provides a strong argument for multiple P2P maintenance algorithms co-existing in a single P2P overlay to provide adaptable performance. A novel multi-tier P2P overlay system is presented, which is tailored for service rich mobile devices and which provides an efficient platform for service discovery.
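As a toy illustration of the DHT-as-directory idea (not the thesis implementation; the class, service names and addresses are hypothetical, and a plain dictionary stands in for a routed overlay such as Chord or Pastry), a service advertisement can be stored under the hash of its service type and retrieved by any client that knows the type:

```python
# Hypothetical sketch of DHT-based service discovery: advertisements are
# keyed by a hash of the service type, so publish and lookup both resolve
# to the same key.  In a real overlay, put/get would be routed to the node
# responsible for that key; here a dict plays the role of the whole overlay.

import hashlib

class ToyServiceDHT:
    def __init__(self):
        self._table = {}            # key -> list of advertisements

    @staticmethod
    def _key(service_type):
        return hashlib.sha1(service_type.encode()).hexdigest()

    def advertise(self, service_type, endpoint):
        self._table.setdefault(self._key(service_type), []).append(endpoint)

    def discover(self, service_type):
        return self._table.get(self._key(service_type), [])

dht = ToyServiceDHT()
dht.advertise("printer", "192.168.0.7:631")
dht.advertise("printer", "192.168.0.9:631")
print(dht.discover("printer"))   # both printer endpoints
print(dht.discover("scanner"))   # [] -- nothing advertised
```

The messaging-cost problem discussed above arises because, in a real overlay, each such put/get involves multi-hop routing and ongoing maintenance traffic, which is what the XCAST-based extension targets.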
|
744 |
Discovery and Analysis of Aligned Pattern Clusters from Protein Family Sequences
Lee, En-Shiun Annie 28 April 2015 (links)
Protein sequences are essential for encoding molecular structures and functions. Consequently, biologists invest substantial resources and time discovering functional patterns in proteins. Using high-throughput technologies, biologists are generating an increasing amount of data, and thus the major challenge in biosequencing today is the ability to conduct data analysis in an efficient and productive manner. Conserved amino acids in proteins reveal important functional domains within protein families. Conversely, less conserved amino acid variations within these protein sequence patterns reveal areas of evolutionary and functional divergence.
Exploring protein families using existing methods such as multiple sequence alignment is computationally expensive, so pattern search is used instead. However, at present, combinatorial methods of pattern search generate a large set of solutions, and probabilistic methods require richer representations. They require biological ground truth about the input sequences, such as gene name or taxonomic species, as class labels based on traditional classification practice, in order to train a model for predicting unknown sequences. However, these algorithms are inherently biased by mislabelling and may not be able to reveal class characteristics in a detailed and succinct manner.
A novel pattern representation called an Aligned Pattern Cluster (AP Cluster), as developed in this dissertation, is compact yet rich. It captures the conservations and variations of amino acids, covers more sequences with lower entropy, and greatly reduces the number of patterns. AP Clusters contain statistically significant patterns with variations; their importance has been confirmed by the following biological evidence: 1) most of the discovered AP Clusters correspond to binding segments, while their aligned columns correspond to binding sites, as verified by Pfam, PROSITE and three-dimensional structures; 2) by compacting strongly correlated functional information together, AP Clusters are able to reveal class characteristics for taxonomical classes, gene classes and other functional classes, or incorrect class labelling; 3) AP Clusters that co-occur on the same homologous protein sequences are spatially close in the protein's three-dimensional structure.
These results demonstrate the power and usefulness of AP Clusters. They bring similar, statistically significant patterns with variations together and align them to reveal protein regional functionality, class characteristics, and binding and interaction sites for the study of protein-protein and protein-drug interactions, for differentiation of cancer tumour types and targeted gene therapy, as well as for drug target discovery.
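One way to picture the conservation/variation structure that an aligned pattern captures is the per-column entropy over aligned pattern instances: conserved sites score near zero, variable sites score higher. The sketch below is a hypothetical illustration of that single idea, not the dissertation's algorithm, and the aligned segment is invented:

```python
# Hypothetical sketch: Shannon entropy of each column of a set of aligned
# pattern instances.  Low entropy = conserved site, high entropy = variation.

import math
from collections import Counter

def column_entropies(aligned):
    out = []
    for col in zip(*aligned):          # iterate over aligned columns
        counts = Counter(col)
        n = len(col)
        out.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return out

aligned = ["GKSTL", "GKATL", "GKCTL"]   # invented aligned segment
print([round(h, 3) for h in column_entropies(aligned)])
# [0.0, 0.0, 1.585, 0.0, 0.0] -- only the middle column varies
```

Four columns are fully conserved; the middle column takes three distinct residues, giving entropy log2(3) ≈ 1.585 bits.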
|
745 |
Nouvelles approches pour la détection de relations utiles dans les processus : application aux parcours de santé / New approaches for the discovery of high-utility relations in processes: application to healthcare
Dalmas, Benjamin 06 April 2018 (links)
Ever since the post-World War II baby boom, France, like other countries, has had to cope with an aging population and with pathologies that have become increasingly chronic. These new health problems imply more frequent, more complex and more multidisciplinary medical care. However, multiple obstacles, related to the evolution of society or to the internal organization of the healthcare system, hinder the development of new, more adapted health procedures to meet these needs. In a context where health expenses have to be reduced, better management of health processes is necessary. Our work aims at proposing a set of methods to gain an understanding of the elderly health path in the Auvergne region. To this end, a description of the elderly health path is necessary in order to provide this currently missing overview. Moreover, this will enable the identification of the stakeholders, their interactions and the constraints to which they are subject in the different health procedures. The work presented in this thesis focuses on two problems. The first consists in developing techniques to quickly and efficiently model health paths; with these models, we can analyze how the different segments of a medical care pathway are ordered. The second consists in developing techniques to extract relevant information from the data according to a predefined business point of view, specific to the user who analyzes the model. This knowledge enables the detection of frequent or abnormal parts of a health path. To address these problems, the methods we propose in this thesis draw on process mining and data mining, disciplines that aim at exploiting the data available in today's information systems in order to discover useful knowledge. In a first part, we propose a methodology that relies on the expressive power of process models to extract relevant information. In a second part, we propose techniques to build local process models that represent interesting fragments of behavior. The experiments we performed show the efficiency of the proposed methods. Moreover, case studies in several application domains demonstrate the genericity of the developed techniques.
|
747 |
Descoberta de regras de conhecimento utilizando computação evolutiva multiobjetivo / Discovering knowledge rules with multiobjective evolutionary computing
Rafael Giusti 22 June 2010 (links)
Machine Learning algorithms are notable examples of Artificial Intelligence algorithms capable of automating the extraction of implicit knowledge from datasets. In particular, Symbolic Learning algorithms are those which yield an intelligible knowledge model, i.e., one which a user may easily read. The use of Symbolic Learning is particularly common in the context of classification, which involves extracting knowledge such that the associated model describes correlations between a set of attributes named the premises and one specific attribute named the class. Classification algorithms usually aim at creating knowledge models which maximize the measures of coverage and precision, leading to classifiers that tend to be generic and precise. Although this constitutes a good approach to creating models that automate decision making, it may not yield equally good results when the user wishes to extract a knowledge model that could help them better understand the domain. With that in mind, the main goal of this Master's thesis is the study of multi-objective evolutionary computing methods to create individual knowledge rules that maximize sets of arbitrary user-defined criteria. This is achieved by employing the class library and knowledge rule construction environment ECLE, developed during previous research work. A second goal of this Master's thesis is the comparison of the researched evolutionary computing methods against the ranking composition methods previously available in ECLE. It is shown that the multi-objective evolutionary computing methods produce better results than the ranking composition-based methods, both in terms of solution dominance and proximity of the solution set to the Pareto-optimal front, and in terms of Pareto-front diversity. Both criteria are important for evaluating multi-objective optimization algorithms, since the purpose of multi-objective optimization is to provide not just one but a broad range of efficient solutions, from which the user may pick one or more that present the best trade-offs among the objectives.
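The dominance criterion mentioned above can be made concrete with a small sketch. The fragment below (hypothetical, not the ECLE implementation; the rule scores are invented) implements the standard Pareto-dominance test and extracts the non-dominated front from a set of (coverage, precision) scores, both taken as higher-is-better:

```python
# Hypothetical sketch: rule `a` dominates rule `b` if it is at least as good
# on every objective and strictly better on at least one.  The non-dominated
# rules form the Pareto front offered to the user.

def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# invented (coverage, precision) scores for four candidate rules
rules = [(0.9, 0.5), (0.6, 0.8), (0.5, 0.7), (0.7, 0.6)]
print(pareto_front(rules))
# [(0.9, 0.5), (0.6, 0.8), (0.7, 0.6)] -- (0.5, 0.7) is dominated by (0.6, 0.8)
```

The three surviving rules illustrate the trade-off range a multi-objective method is meant to expose: no one of them beats another on both objectives at once.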
|
748 |
Playful sciencing and the early childhood classroom
Kirby, Barbara Mary 01 January 2005 (links)
The purpose of this project is to examine the power of play, guided discovery, and hands-on experiences in the early childhood classroom, specifically as it relates to early childhood science experience. This paper will also propose a science curriculum encompassing a hands-on, guided discovery, play-based approach.
|
749 |
Zpracování signálu UHF RFID čtečky / Signal Processing for UHF RFID Reader
Novotný, Jan January 2015 (links)
This master's thesis is focused on signal processing for the UHF RFID reader EXIN-1. The first part describes the concept of the EXIN-1 front end, its basic testing, and the communication interfaces available for reader control and for receiving and transmitting baseband signals. The second part gives a brief description of the EPCglobal Class-1 Generation-2 UHF RFID protocol, especially the modulations and codings used. In the last part, a connection between the front end and an ARM Cortex-M4 microcontroller discovery board is designed. The microcontroller generates all required signals, receives incoming signals, and processes them to obtain the identification numbers of the RFID cards (tags) within the reader's reading range. A decoding algorithm is designed in MATLAB and implemented on the selected microcontroller. The identification data obtained are displayed on an LCD display and also sent to a PC over a serial link.
|
750 |
Dolování periodických vzorů / Periodic Patterns Mining
Stríž, Rostislav January 2012 (links)
Data collection and analysis are commonly used techniques in many sectors of today's business and science. The process called Knowledge Discovery in Databases is a powerful tool for finding new and interesting information that can be used in future development. This thesis deals with the basic principles of data mining and temporal data mining, as well as with the specifics of a concrete implementation of chosen algorithms for mining periodic patterns in time series. These algorithms have been developed as managed plug-ins for Microsoft Analysis Services, a service that provides data mining features for Microsoft SQL Server. Finally, we discuss the results of experiments focused on the time complexity of the implemented algorithms.
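As a hypothetical illustration of the kind of periodicity such algorithms look for (this is not the plug-in code; the series and threshold are invented), the fragment below measures how often a symbol recurs with a fixed period in a discretised time series:

```python
# Hypothetical sketch: a symbol is p-periodic at offset i if it reappears
# at (almost) every position i, i+p, i+2p, ... of a discretised series.
# `periodic_confidence` returns the fraction of periods where it does.

def periodic_confidence(series, symbol, period, offset):
    hits = [series[i] == symbol for i in range(offset, len(series), period)]
    return sum(hits) / len(hits)

series = list("abcabcabdabc")
print(periodic_confidence(series, "a", 3, 0))  # 1.0  -- 'a' every 3 steps
print(periodic_confidence(series, "c", 3, 2))  # 0.75 -- one 'd' breaks it
```

A mining algorithm would search over symbols, periods and offsets, reporting the (symbol, period, offset) triples whose confidence exceeds a user-defined threshold.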
|