Global ETD Search

1	An Efficient Bit-Pattern-Based Algorithm for Mining Sequential Patterns in Protein Databases. Jeng, Yin-han. 26 June 2009.
Proteins are the structural components of living cells and tissues, and thus an important building block of all living organisms. Patterns in protein sequences are subsequences that appear frequently. They often denote important functional regions and can be used to characterize a protein family or to discover the function of a protein. They also provide valuable information about the evolution of species. Patterns may contain gaps of arbitrary size. For this no-gap-limit sequential pattern problem in a protein database, sequential pattern mining algorithms are a natural starting point. However, in a protein database the order in which segments appear matters, and a segment may appear many times in one protein sequence, so traditional sequential pattern mining algorithms cannot be applied directly. Many algorithms have been proposed to mine sequential patterns in protein databases, for example the SP-index algorithm. They enumerate patterns of limited size (segments) in the solution space and find all patterns. The SP-index algorithm builds on traditional sequential pattern mining algorithms and handles multiple appearances of a segment in a protein sequence. Although it takes these characteristics of biological data into account, it still contains a time-consuming step: constructing the SP-tree to find the frequent patterns, which requires tracing many nodes. In this thesis, we therefore propose a Bit-Pattern-based (BP) algorithm to address this weakness of the SP-index algorithm. First, we transform the protein sequences into bit sequences. Second, we construct the frequent segments using the AND operator; because these are bit operations, the frequent segments are obtained efficiently. We then prune unnecessary frequent segments, so that fewer segments have to be tested in the following step. Third, we use the OR operator to obtain the longest pattern. In this step, we test whether two segments can be linked together to form a longer segment, and a single test gives the result. Because we track the positions at which each segment appears, we can apply the OR operator and then inspect the resulting bit sequence, which avoids many testing steps. Our performance study on biological data shows that the BP algorithm is more efficient than the SP-index algorithm. Our simulation results show that it reduces processing time by up to 50% compared with the SP-index algorithm, since the SP-index algorithm has to trace many nodes to construct the longest pattern.
Keywords: Sequential Patterns; Protein Databases; Bit-Pattern-Based
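The bit-sequence idea described in entry 1 can be illustrated with a short sketch. This is not the thesis's BP algorithm. It only shows, under assumptions of our own, how per-residue position bitmaps combined with AND locate a contiguous segment and count how many sequences contain it. The function names, the bit layout (bit i stands for position i) and the toy sequences are all hypothetical, and the OR-based linking step is not shown.

```python
from collections import defaultdict


def residue_bitmaps(seq):
    """Map each residue to an int whose bit i is set when seq[i] is that residue."""
    maps = defaultdict(int)
    for i, ch in enumerate(seq):
        maps[ch] |= 1 << i
    return maps


def segment_positions(seq_maps, segment):
    """Return a bitmask of the start positions where `segment` occurs contiguously."""
    mask = seq_maps.get(segment[0], 0)
    for offset, ch in enumerate(segment[1:], start=1):
        # Shift right so that bit i reflects the residue at position i + offset,
        # then AND to keep only the starts that still match.
        mask &= seq_maps.get(ch, 0) >> offset
    return mask


def support(database_maps, segment):
    """Count the sequences that contain `segment` at least once."""
    return sum(1 for maps in database_maps if segment_positions(maps, segment))


# Toy data (hypothetical, not taken from the thesis).
database = ["MKVLAK", "AKMKV", "VVLA"]
database_maps = [residue_bitmaps(s) for s in database]
mk_support = support(database_maps, "MK")
```

Because each comparison is a single integer AND, repeated appearances of a segment inside one sequence are all found at once: every set bit in the result is one start position.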
2	Les motifs séquentiels pour les données issues des puces ADN / Mining sequential patterns for DNA microarrays. Salle, Paola. 13 July 2010.
The emergence of biotechnologies such as DNA chips (microarrays) has made it possible to acquire huge amounts of data about a cell at a given moment and under given conditions. They have become indispensable for understanding diseases that originate in a genomic abnormality disrupting the natural balance between cell growth, division and death. With this biotechnology, the aim is to identify the genes involved in the disease under study. However, each chip gives information on more than 19,000 genes, which makes the results difficult to exploit and analyse. Data mining has long been studied as a way to reveal non-trivial correlations in large databases. Initially proposed to answer decision makers' questions about the behavior of supermarket customers, these methods have since been used and adapted in application fields ranging from marketing to health. In this study, we propose new data mining methods to help biologists derive new knowledge from data obtained by DNA microarray analysis. Specifically, we propose to identify genes that are frequently ordered by their expression, and we study the contribution of this kind of information as new study material for biologists.
Keywords: Sequential patterns; Data mining; DNA microarrays; Biomedical
3	Vehicular Movement Patterns: A Sequential Patterns Data Mining Approach Towards Vehicular Route Prediction. Merah, Amar Farouk. 09 May 2012.
Behavioral pattern prediction in the context of Vehicular Ad hoc Networks (VANETs) has been receiving increasing attention because it enables on-demand, intelligent traffic analysis and response to real-time traffic issues. Sequential patterns are a type of behavioral pattern that describes the occurrence of events in a time-ordered fashion. In the context of VANETs, these events are defined as an ordered list of road segments traversed by vehicles during their trips from a starting point to their final intended destination, forming a vehicular path. Because vehicular paths are predictable in nature, they can be exploited to extract the paths that are considered frequent. From the frequent paths extracted through data mining, the probability that a vehicular path will take a certain direction is obtained. To achieve this, however, samples of vehicular paths first need to be collected over periods of time so that they can be mined. In this thesis, a new set of formal definitions depicting vehicular paths as sequential patterns is described. In addition, five novel communication schemes have been designed and implemented in a simulated environment to collect vehicular paths; these schemes fall into two categories: Road Side Unit-Triggered (RSU-Triggered) and Vehicle-Triggered. After collection, frequent paths are extracted through data mining, and the probability of these frequent paths is measured. To evaluate the efficiency and effectiveness of the proposed schemes, extensive experimental analysis has been carried out. The results show that two of the Vehicle-Triggered schemes, VTB-FP and VTRD-FP, improve vehicular path collection in terms of communication cost and latency over the others. In terms of reliability, the Vehicle-Triggered schemes achieved a higher success rate than the RSU-Triggered scheme. Finally, frequent vehicular movement patterns have been effectively extracted from the collected vehicular paths according to a user-defined threshold, and the confidence of the generated movement rules has been measured. The analysis made clear that the user-defined threshold needs to be set carefully so that important vehicular movement patterns are not discarded.
Keywords: VANETs; route prediction; mobility prediction; sequential patterns; data mining
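The mining step outlined in entry 3 (frequent paths above a user-defined threshold, then the confidence of movement rules) can be sketched as follows. This is a minimal illustration rather than the thesis's method. It assumes that a path is a list of road-segment identifiers, that only contiguous sub-paths are counted, and that a rule's confidence is the support of the extended path divided by the support of its prefix. All names and the sample trips are hypothetical.

```python
from collections import Counter


def contiguous_subpaths(path, max_len):
    """Yield each distinct contiguous sub-path of `path`, up to `max_len` segments."""
    seen = set()
    for start in range(len(path)):
        for end in range(start + 1, min(start + max_len, len(path)) + 1):
            sub = tuple(path[start:end])
            if sub not in seen:
                seen.add(sub)
                yield sub


def frequent_paths(paths, min_support, max_len=4):
    """Count in how many trips each sub-path occurs; keep those meeting the threshold."""
    counts = Counter()
    for path in paths:
        counts.update(contiguous_subpaths(path, max_len))
    return {sub: n for sub, n in counts.items() if n >= min_support}


def rule_confidence(frequent, prefix, next_segment):
    """Confidence of the rule 'after `prefix`, the vehicle takes `next_segment`'."""
    prefix = tuple(prefix)
    whole = prefix + (next_segment,)
    if prefix not in frequent or whole not in frequent:
        return 0.0
    return frequent[whole] / frequent[prefix]


# Hypothetical collected trips, each an ordered list of road segments.
trips = [
    ["r1", "r2", "r3"],
    ["r1", "r2", "r4"],
    ["r1", "r2", "r3"],
    ["r5", "r2", "r3"],
]
frequent = frequent_paths(trips, min_support=2)
confidence = rule_confidence(frequent, ["r1", "r2"], "r3")
```

With the threshold at 2, the turn onto `r4` is dropped because it was observed only once, which is the kind of loss the abstract warns about when the threshold is set too high.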
4	Data Mining in Acquiring Association Knowledge Between Diseases and Medicine Treatments. Chen, Shih-Yuan. 02 August 2000.
None
Keywords: Association Rules; Medical Treatments; Data Mining; Sequential Patterns
5	Classification and Sequential Pattern Mining From Uncertain Datasets. Hooshsadat, Metanat. Unknown date.
No description available.
Keywords: Associative Classification; Sequential Patterns; Uncertain Datasets; Expected Support
8	From sequential patterns to concurrent branch patterns: a new post sequential patterns mining approach. Lu, Jing. January 2006.
Sequential patterns mining is an important pattern discovery technique used to identify frequently observed sequential occurrences of items across ordered transactions over time. It has been studied intensively and a great diversity of algorithms exists. However, conventional sequential patterns mining has a major problem: the patterns derived are often large and not easy to understand or use. In addition, more complex relations among events are often hidden behind sequences. A novel model for sequential patterns, called the Sequential Patterns Graph (SPG), is proposed. The construction algorithm of SPG is presented with experimental results to substantiate the concept. The thesis then defines new structural patterns, such as concurrent branch patterns, exclusive patterns and iterative patterns, which are generally hidden behind sequential patterns. Finally, an integrative framework named Post Sequential Patterns Mining (PSPM), which builds on sequential patterns mining, is proposed for the discovery and visualisation of structural patterns. This thesis is intended to show that discrete sequential patterns derived from traditional sequential patterns mining can be modelled graphically using SPG. Experiments and theoretical studies lead to the conclusion that SPG is not only a minimal representation of sequential patterns mining, but also represents the interrelation among patterns and further establishes the foundation for mining structural knowledge (i.e. concurrent branch patterns, exclusive patterns and iterative patterns). Experiments conducted on both synthetic and real datasets show that Concurrent Branch Patterns (CBP) mining is an effective and efficient mining algorithm for concurrent branch patterns.
005.74
9	Dolování víceúrovňových sekvenčních vzorů / Mining Multi-Level Sequential Patterns. Šebek, Michal. January 2017.
Sequential pattern mining is an important area of knowledge discovery in databases. More and more industrial and business applications store data that is sequential in nature, where the order of the individual transactions is given. This can be exploited, for example, in the analysis of customers' successive purchases. This thesis deals with the use of a hierarchical arrangement of items in sequential pattern mining. Two basic areas are addressed: mining multi-level sequential patterns with and without crossing of hierarchy levels. The mining tasks for both areas are formalized in the thesis, and the hGSP and MLSP algorithms are then designed to solve them. Experiments verified that the MLSP algorithm in particular achieves excellent performance and stability. The significance of the newly obtained patterns is demonstrated by mining real production data.
10	An Efficient Bitmap-Based Approach to Mining Sequential Patterns for Large Databases. Wu, Chien-Hui. 29 July 2004.
The task of data mining is to find useful information in very large sets of data. One important research area of data mining is mining sequential patterns. For a transaction database, a sequential pattern is a relation between the items bought by customers over a period of time. If we can find these relations by mining sequential patterns, we can devise better selling strategies to attract more customers. However, since the transaction database contains a lot of data and is scanned again and again during the mining process, improving running efficiency is an important topic. The GSP algorithm proposed by Srikant and Agrawal uses a complex data structure to store and generate candidates. The generated candidates satisfy the property that "the subsets of a frequent itemset are also frequent". This property leads to fewer candidates; however, the algorithm still spends too much time counting candidates. The SPAM algorithm proposed by Ayres et al. uses bitwise operations to reduce the time for counting candidates. However, it generates too many candidates that will never become frequent itemsets, which decreases efficiency. In this thesis, we propose a new bitmap-based algorithm. By modifying the way candidates are generated in the GSP algorithm and applying the bitwise operations of the SPAM algorithm, the proposed algorithm can mine sequential patterns efficiently. That is, we use a candidate generation method similar to that of GSP to reduce the number of candidates, and a counting method similar to that of SPAM to reduce the time spent counting them. In the proposed algorithm, we classify the itemsets into two cases: simultaneous occurrence (denoted AB) and sequential occurrence (denoted A->B). In the case of simultaneous occurrence, the number of candidates is C(n,k) under exhaustive enumeration. To avoid generating too many candidates, we use the property that "the subsets of a frequent itemset are also frequent" to reduce the number of candidates from C(n,k) to C(y,k), k <= y < n. In the case of sequential occurrence, candidates are generated by a special join operation that combines, for example, A->B and B->C into A->B->C. We also have to consider two other cases: (1) combining A->B and A->C into A->BC; (2) combining A->C and B->C into AB->C. Candidates are counted as in the SPAM algorithm (i.e., with bitwise operations). Our simulation results, based on the same bit representation of the transaction database, show that the proposed algorithm outperforms the SPAM algorithm in processing time, since it generates fewer candidates.
Keywords: Bitmap Based Mining; Knowledge Discovery; Sequential Patterns; Pattern Analysis; Data Mining
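The sequential-occurrence join mentioned in entry 10 (A->B and B->C into A->B->C), together with pruning by the "subsets of a frequent pattern are frequent" property, can be sketched as below. This is a simplified, GSP-style illustration and not the thesis's algorithm. It assumes that every element of a sequence is a single item, so the simultaneous cases (A->BC, AB->C) and the bitmap-based counting are left out. The function names and sample data are hypothetical.

```python
def join_candidates(frequent_k):
    """Join two frequent k-sequences when one minus its first item equals the other minus its last."""
    freq = set(frequent_k)
    candidates = set()
    for s1 in freq:
        for s2 in freq:
            if s1[1:] == s2[:-1]:
                # e.g. ("A", "B") and ("B", "C") give ("A", "B", "C")
                candidates.add(s1 + (s2[-1],))
    return candidates


def prune(candidates, frequent_k):
    """Keep a candidate only if every sequence obtained by dropping one item is frequent."""
    freq = set(frequent_k)
    kept = set()
    for cand in candidates:
        subsequences = (cand[:i] + cand[i + 1:] for i in range(len(cand)))
        if all(sub in freq for sub in subsequences):
            kept.add(cand)
    return kept


# Hypothetical frequent 2-sequences; ("A", "B") stands for A->B.
frequent_2 = {("A", "B"), ("B", "C"), ("A", "C"), ("B", "D")}
candidates_3 = join_candidates(frequent_2)
pruned_3 = prune(candidates_3, frequent_2)
```

Here A->B->D is generated by the join but pruned before any counting, because A->D is not frequent. Only the surviving candidates would then need their support counted against the database.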
