About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Fast direct algorithms for elliptic equations via hierarchical matrix compression

Schmitz, Phillip Gordon 14 December 2010 (has links)
We present a fast direct algorithm for the solution of linear systems arising from elliptic equations. We extend the work of Xia et al. (2009) on combining the multifrontal method with hierarchical matrices. We offer a more geometric interpretation of that approach, extend it in two dimensions to the unstructured mesh case, and detail an adaptive decomposition procedure for selectively refined meshes. Linear time complexity is shown for a quasi-uniform grid and demonstrated via numerical results for the adaptive algorithm. We also provide an extension to three dimensions with proven linear complexity, and describe a more practical variant with slightly worse scaling.
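For a concrete starting point, here is a minimal baseline sketch in Python (assuming NumPy and SciPy are available): it assembles the 5-point Laplacian for a 2D Poisson problem and solves it with an off-the-shelf sparse direct factorization (SuperLU). The thesis's contribution, compressing the dense fill-in blocks that such factorizations produce using hierarchical low-rank structure, is not reproduced here.

```python
# A baseline sketch, not the thesis algorithm: assemble the 5-point
# finite-difference Laplacian on an n-by-n interior grid and solve with
# SciPy's sparse direct LU (SuperLU). The thesis speeds up this kind of
# direct solve by compressing the dense fill-in blocks with hierarchical
# low-rank (H-matrix) structure.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def poisson_2d(n):
    """5-point Laplacian on an n x n interior grid, via Kronecker sums."""
    I = sp.identity(n, format="csr")
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    return sp.kron(I, T) + sp.kron(T, I)

n = 64
A = poisson_2d(n).tocsc()           # splu expects CSC format
b = np.ones(n * n)

lu = spla.splu(A)                   # sparse direct factorization
x = lu.solve(b)
print("residual norm:", np.linalg.norm(A @ x - b))
```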
2

Factorization of Quasiseparable Matrices

Johnson, Paul D. 21 November 2008 (has links)
This paper investigates some of the ideas and algorithms developed for exploiting the structure of quasiseparable matrices. The case of purely scalar generators is considered initially. The process by which a quasiseparable matrix is represented as the product of matrices composed of its generators is explained. This is shown explicitly in the scalar case, but extends to block generators. The complete factoring approach is then considered; it consists of two stages: inner-outer factorization followed by inner-coprime factorization. Finally, the stability of the algorithm is investigated. The algorithm is used to factor various quasiseparable matrices R, created first using minimal generators and subsequently using non-minimal generators. The result is that the stability of the algorithm is compromised when non-minimal generators are present.
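As a small illustration of the structure this paper exploits (not the paper's own code), the following Python sketch builds a dense matrix from scalar quasiseparable generators. The generator names p, q, a, d, g, h, b follow the standard Eidelman-Gohberg convention and are an assumption here, not taken from the abstract.

```python
# Build a dense matrix from scalar quasiseparable generators:
#   A[i,j] = p_i * a_{i-1} * ... * a_{j+1} * q_j   for i > j  (lower part)
#   A[i,i] = d_i                                              (diagonal)
#   A[i,j] = g_i * b_{i+1} * ... * b_{j-1} * h_j   for i < j  (upper part)
import numpy as np

def quasiseparable(p, q, a, d, g, h, b):
    n = len(d)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = d[i]
        for j in range(i):                 # strictly lower part
            A[i, j] = p[i] * np.prod(a[j + 1:i]) * q[j]
        for j in range(i + 1, n):          # strictly upper part
            A[i, j] = g[i] * np.prod(b[i + 1:j]) * h[j]
    return A

rng = np.random.default_rng(0)
n = 5
A = quasiseparable(*(rng.standard_normal(n) for _ in range(7)))
print(A.round(2))
```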
3

Field-Programmable Custom Computing Machines for DFT/FFT and DCT/DST Algorithms

Potluri, Uma Sadhvi January 2013 (has links)
No description available.
4

Algorithm and hardware based architectural design targeting the intra-frame prediction of the HEVC video coding standard

Palomino, Daniel Munari Vilchez January 2013 (has links)
This work presents an intra-frame prediction hardware architecture targeting the emerging HEVC video coding standard. The HEVC standard is being developed with the main goal of increasing compression efficiency by 50% compared with H.264/AVC, the current state-of-the-art video coding standard. To achieve this goal, several new coding tools were developed for inclusion in HEVC. Although these tools have increased the compression efficiency of the emerging standard, they have also increased the computational complexity of the encoding process. Considering intra-frame prediction alone, several new directional prediction modes were introduced, and more block sizes are supported than in H.264/AVC. In this context, this work proposes two approaches to improve HEVC intra-frame prediction performance. First, heuristic-based fast intra mode decision algorithms were developed, showing that it is possible to reduce the computational complexity of intra-frame prediction with small losses in compression efficiency (bit rate and visual quality). In the worst case, the loss was 6.9% in bit rate and 0.12 dB in PSNR, for a 35% reduction in encoding time.
Then, building on one of the developed fast mode decision algorithms, an intra-frame prediction hardware architecture was designed. Besides the complexity reduction provided by the fast algorithm, hardware design techniques such as increased parallelism and pipelining were used to improve the architecture's performance. Synthesis results for IBM 0.65 µm technology show that the architecture reaches a maximum operating frequency of 500 MHz, a throughput sufficient to perform intra-frame prediction at more than 30 frames per second for high-resolution videos such as Full HD (1920x1080 pixels).
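The abstract does not spell out the heuristics, so the following Python sketch is only a hedged illustration of the general two-stage fast mode decision pattern it describes: a cheap coarse sweep over a subset of HEVC's 35 intra modes (0 = planar, 1 = DC, 2-34 angular), followed by refinement around the coarse winner. The predictor function predict_block is a hypothetical stand-in, not part of the thesis.

```python
# A hedged sketch of a two-stage fast intra mode decision (not the thesis's
# exact heuristic). Stage 1 evaluates a cheap distortion cost (SAD here) on
# a coarse subset of the 35 HEVC intra modes; stage 2 refines only around
# the best coarse candidate, so most modes are never evaluated.
# predict_block(ref, mode) is an assumed stand-in for a real HEVC intra
# predictor; it must return a block the same shape as `orig`.
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def fast_mode_decision(orig, ref, predict_block, step=4):
    # Stage 1: coarse sweep over planar (0), DC (1) and every `step`-th
    # angular mode (2..34).
    coarse = [0, 1] + list(range(2, 35, step))
    costs = {m: sad(orig, predict_block(ref, m)) for m in coarse}
    best = min(costs, key=costs.get)

    # Stage 2: refine only the angular neighbours of the coarse winner.
    for m in (best - 2, best - 1, best + 1, best + 2):
        if 2 <= m <= 34 and m not in costs:
            costs[m] = sad(orig, predict_block(ref, m))
    return min(costs, key=costs.get)
```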
5

Fast algorithms for material specific process chain design and analysis in metal forming - final report DFG Priority Programme SPP 1204 / Algorithmen zur schnellen, werkstoffgerechten Prozesskettengestaltung und -analyse in der Umformtechnik

Kawalla, Rudolf 23 August 2016 (has links)
The book summarises the results of the DFG-funded coordinated priority programme "Fast Algorithms for Material Specific Process Chain Design and Analysis in Metal Forming". The first part comprises articles that provide a general introduction to and overview of process modelling in metal forming; the second part collates the reports from all projects included in the priority programme.
6

Fast Algorithms for Matrix Computation and Applications

Pang, Qiyuan 10 December 2023 (has links)
Matrix decompositions play a pivotal role in matrix computation and applications. While general dense matrix-vector multiplications and linear equation solvers are prohibitively expensive, matrix decompositions offer fast alternatives for matrices meeting specific properties. This dissertation delves into my contributions to two fast matrix multiplication algorithms and one fast linear equation solver tailored to certain matrices and applications, all based on efficient matrix decompositions. Fast dimensionality reduction methods for spectral clustering, based on efficient eigen-decompositions, are also explored.

The first matrix decomposition introduced is the "kernel-independent" interpolative decomposition butterfly factorization (IDBF), acting as a data-sparse approximation for matrices adhering to a complementary low-rank property. Constructible in $O(N\log N)$ operations for an $N \times N$ matrix via hierarchical interpolative decompositions (IDs), the IDBF results in a product of $O(\log N)$ sparse matrices, each with $O(N)$ non-zero entries. This factorization enables matrix-vector multiplication in $O(N \log N)$ operations, making it a versatile framework applicable to scenarios such as special function transforms, Fourier integral operators, and high-frequency wave computation.

The second matrix decomposition accelerates matrix-vector multiplication for computing multi-dimensional Jacobi polynomial transforms. Leveraging the observation that solutions to Jacobi's differential equation can be represented through non-oscillatory phase and amplitude functions, the corresponding matrix is expressed as the Hadamard product of a numerically low-rank matrix and a multi-dimensional discrete Fourier transform (DFT) matrix. This approach uses $r^d$ fast Fourier transforms (FFTs), where $r = O(\log n / \log \log n)$ and $d$ is the dimension, resulting in an almost optimal algorithm for computing the multi-dimensional Jacobi polynomial transform.

An efficient numerical method based on a matrix decomposition, the hierarchical interpolative factorization, is developed for solving modified Poisson-Boltzmann (MPB) equations. Addressing the computational bottleneck of evaluating the Green's function in the MPB solver, the proposed method achieves linear scaling by combining selected inversion with hierarchical interpolative factorization, significantly reducing the cost of solving MPB equations.

Finally, eigen-decomposition methods, including the block Chebyshev-Davidson method and orthogonalization-free methods (OFMs), are proposed for dimensionality reduction in spectral clustering. By leveraging well-known spectrum bounds of a Laplacian matrix, the Chebyshev-Davidson methods allow dimensionality reduction without estimating spectrum bounds. Rather than the vanilla Chebyshev-Davidson method, it is better to use the block Chebyshev-Davidson method with an inner-outer restart technique, which reduces total CPU time, and a progressive polynomial filter, which takes advantage of suitable initial vectors when available, for example in the streaming graph scenario. Theoretically, the orthogonalization-free method constructs a space unitarily isomorphic to the eigenspace, or a space weighting the eigenspace, by solving optimization problems with gradient descent with momentum acceleration, based on conjugate gradients and a line search for optimal step sizes. Numerical results indicate that the eigenspace and the weighted eigenspace are equivalent in clustering performance. Scalable parallel versions of the block Chebyshev-Davidson method and OFM are developed to enhance efficiency in parallel computing.
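As a worked illustration of the Hadamard-product structure described in the Jacobi-transform paragraph above (a sketch under stated assumptions, not the dissertation's code), the following NumPy snippet applies M = (U V^T) ∘ F to a vector with one FFT per rank-1 term, so r FFTs in total in the 1-D case. The factors U and V here are random stand-ins for the low-rank phase/amplitude factors.

```python
# If M = (U @ V.T) * F, with U, V of shape (n, r) and F the DFT matrix,
# then (M @ x)_i = sum_k U[i,k] * FFT(V[:,k] * x)_i, i.e. r FFTs replace a
# dense O(n^2) matrix-vector product.
import numpy as np

def hadamard_lowrank_dft_apply(U, V, x):
    """Apply ((U V^T) ∘ F) x using one FFT per rank-1 term."""
    y = np.zeros(len(x), dtype=complex)
    for k in range(U.shape[1]):
        y += U[:, k] * np.fft.fft(V[:, k] * x)
    return y

rng = np.random.default_rng(0)
n, r = 1024, 8
U, V = rng.standard_normal((n, r)), rng.standard_normal((n, r))
x = rng.standard_normal(n)

fast = hadamard_lowrank_dft_apply(U, V, x)
F = np.fft.fft(np.eye(n))             # dense DFT matrix, for checking only
dense = ((U @ V.T) * F) @ x
print(np.allclose(fast, dense))       # True
```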
7

Discovering Frequent Episodes: Fast Algorithms, Connections with HMMs and Generalizations

Laxman, Srivatsan 03 1900 (has links)
Temporal data mining is concerned with the exploration of large sequential (or temporally ordered) data sets to discover some nontrivial information that was previously unknown to the data owner. Sequential data sets come up naturally in a wide range of application domains, ranging from bioinformatics to manufacturing processes. Pattern discovery refers to a broad class of data mining techniques in which the objective is to unearth hidden patterns or unexpected trends in the data. In general, pattern discovery is about finding all patterns of 'interest' in the data, and one popular measure of interestingness for a pattern is its frequency in the data. The problem of frequent pattern discovery is to find all patterns in the data whose frequency exceeds some user-defined threshold. Discovery of temporal patterns that occur frequently in sequential data has received a lot of attention in recent times. Different approaches consider different classes of temporal patterns and propose different algorithms for their efficient discovery from the data. This thesis is concerned with a specific class of temporal patterns called episodes and their discovery in large sequential data sets. In the framework of frequent episode discovery, data (referred to as an event sequence or an event stream) is available as a single long sequence of events. The ith event in the sequence is an ordered pair, (E_i, t_i), where E_i takes values from a finite alphabet (of event types), and t_i is the time of occurrence of the event. The events in the sequence are ordered according to these times of occurrence. An episode (which is the temporal pattern considered in this framework) is a (typically) short partially ordered sequence of event types. Formally, an episode is a triple, (V, <, g), where V is a collection of nodes, < is a partial order on V and g is a map that assigns an event type to each node of the episode. When < is total, the episode is referred to as a serial episode, and when < is trivial (or empty), the episode is referred to as a parallel episode. An episode is said to occur in an event sequence if there are events in the sequence, with event types same as those constituting the episode, and with times of occurrence respecting the partial order in the episode. The frequency of an episode is some measure of how often it occurs in the event sequence. Given a frequency definition for episodes, the task is to discover all episodes whose frequencies exceed some threshold. This is done using a level-wise procedure. In each level, a candidate generation step is used to combine frequent episodes from the previous level to build candidates of the next larger size, and then a frequency counting step makes one pass over the event stream to determine frequencies of all the candidates and thus identify the frequent episodes. Frequency counting is the main computationally intensive step in frequent episode discovery. Choice of frequency definition for episodes has a direct bearing on the efficiency of the counting procedure. In the original framework of frequent episode discovery, episode frequency is defined as the number of fixed-width sliding windows over the data in which the episode occurs at least once. Under this frequency definition, frequency counting of a set of |C| candidate serial episodes of size N has space complexity O(N|C|) and time complexity O(ΔT N|C|) (where ΔT is the difference between the times of occurrence of the last and the first event in the data stream).
The other main frequency definition available in the literature defines episode frequency as the number of minimal occurrences of the episode (where a minimal occurrence is a window on the time axis containing an occurrence of the episode, such that no proper sub-window of it contains another occurrence of the episode). The algorithm for obtaining frequencies for a set of |C| episodes needs O(n|C|) time (where n denotes the number of events in the data stream). While this is time-wise better than the windows-based algorithm, the space needed to locate minimal occurrences of an episode can be very high (and is in fact of the order of the length, n, of the event stream). This thesis proposes a new definition for episode frequency, based on the notion of what is called non-overlapped occurrences of episodes in the event stream. Two occurrences are said to be non-overlapped if no event corresponding to one occurrence appears in between events corresponding to the other. Frequency of an episode is defined as the maximum possible number of non-overlapped occurrences of the episode in the data. The thesis also presents algorithms for efficient frequent episode discovery under this frequency definition. The space and time complexities for frequency counting of serial episodes are O(|C|) and O(n|C|) respectively (where n denotes the total number of events in the given event sequence and |C| denotes the number of candidate episodes). These are arguably the best possible space and time complexities for the frequency counting step. Also, the fact that the time needed by the non-overlapped occurrences-based algorithm is linear in the number of events, n, in the event sequence (rather than the difference, ΔT, between occurrence times of the first and last events in the data stream, as is the case with the windows-based algorithm) can result in a considerable time advantage when the number of time ticks far exceeds the number of events in the event stream. The thesis also presents efficient algorithms for frequent episode discovery under expiry time constraints (according to which an occurrence of an episode can be counted for its frequency only if the total time span of the occurrence is less than a user-defined threshold). It is shown through simulation experiments that, in terms of actual run-times, frequent episode discovery under the non-overlapped occurrences-based frequency (using the algorithms developed here) is much faster than existing methods. There is also a second frequency measure proposed in this thesis, based on what is termed non-interleaved occurrences of episodes in the data. This definition counts certain kinds of overlapping occurrences of the episode. The time needed is linear in the number of events, n, in the data sequence, the size, N, of episodes and the number of candidates, |C|. Simulation experiments show that run-time performance under this frequency definition is slightly inferior to the non-overlapped occurrences-based frequency, but is still better than the run-times under the windows-based frequency.
This thesis also establishes the following interesting property that connects the non-overlapped, the non-interleaved and the minimal occurrences-based frequencies of an episode in the data: the number of minimal occurrences of an episode is bounded below by the maximum number of non-overlapped occurrences of the episode, and is bounded above by the maximum number of non-interleaved occurrences of the episode in the data. Hence, non-interleaved occurrences-based frequency is an efficient alternative to that based on minimal occurrences. In addition to being superior in terms of both time and space complexities compared to all other existing algorithms for frequent episode discovery, the non-overlapped occurrences-based frequency has another very important property. It facilitates a formal connection between discovering frequent serial episodes in data streams and learning or estimating a model for the data generation process in terms of certain kinds of Hidden Markov Models (HMMs). In order to establish this connection, a special class of HMMs, called Episode Generating HMMs (EGHs) are defined. The symbol set for the HMM is chosen to be the alphabet of event types, so that, the output of EGHs can be regarded as event streams in the frequent episode discovery framework. Given a serial episode, α, that occurs in the event stream, a method is proposed to uniquely associate it with an EGH, Λα. Consider two N-node serial episodes, α and β, whose (non-overlapped occurrences-based) frequencies in the given event stream, o, are fα and fβ respectively. Let Λα and Λβ be the EGHs associated with α and β. The main result connecting episodes and EGHs states that, the joint probability of o and the most likely state sequence for Λα is more than the corresponding probability for Λβ, if and only if, fα is greater than fβ. This theoretical connection has some interesting consequences. First of all, since the most frequent serial episode is associated with the EGH having the highest data likelihood, frequent episode discovery can now be interpreted as a generative model learning exercise. More importantly, it is now possible to derive a formal test of significance for serial episodes in the data, that prescribes, for a given size of the test, a minimum frequency for the episode needed in order to declare it as statistically significant. Note that this significance test for serial episodes does not require any separate model estimation (or training). The only quantity required to assess significance of an episode is its non-overlapped occurrences-based frequency (and this is obtained through the usual counting procedure). The significance test also helps to automatically fix the frequency threshold for the frequent episode discovery process, so that it can lead to what may be termed parameterless data mining. In the framework considered so far, the input to frequent episode discovery process is a sequence of instantaneous events. However, in many applications events tend to persist for different periods of time and the durations may carry important information from a data mining perspective. This thesis extends the framework of frequent episodes to incorporate such duration information directly into the definition of episodes, so that, the patterns discovered will now carry this duration information as well. 
Each event in this generalized framework looks like a triple, (E_i, t_i, τ_i), where E_i, as earlier, is the event type (from some finite alphabet) corresponding to the ith event, and t_i and τ_i denote the start and end times of this event. The new temporal pattern, called the generalized episode, is a quadruple, (V, <, g, d), where V, < and g, as earlier, respectively denote a collection of nodes, a partial order over this collection and a map assigning event types to nodes. The new feature in the generalized episode is d, which is a map from V to 2^I, where I denotes a collection of time interval possibilities for event durations, which is defined by the user. An occurrence of a generalized episode in the event sequence consists of events with both 'correct' event types and 'correct' time durations, appearing in the event sequence in 'correct' time order. All frequency definitions for episodes over instantaneous event streams are applicable for generalized episodes as well. The algorithms for frequent episode discovery also easily extend to the case of generalized episodes. The extra design choice that the user has in this generalized framework is the set, I, of time interval possibilities. This can be used to orient and focus the frequent episode discovery process to come up with temporal correlations involving only time durations that are of interest. Through extensive simulations the utility and effectiveness of the generalized framework are demonstrated. The new algorithms for frequent episode discovery presented in this thesis are used to develop an application for temporal data mining of some data from car engine manufacturing plants. Engine manufacturing is a heavily automated and complex distributed controlled process with large amounts of faults data logged each day. The goal of temporal data mining here is to unearth some strong time-ordered correlations in the data which can facilitate quick diagnosis of root causes for persistent problems and predict major breakdowns well in advance. This thesis presents an application of the algorithms developed here for such analysis of the faults data. The data consists of time-stamped faults logged in car engine manufacturing plants of General Motors. Each fault is logged using an extensive list of codes (which constitutes the alphabet of event types for frequent episode discovery). Frequent episodes in fault logs represent temporal correlations among faults, and these can be used for fault diagnosis in the plant. This thesis describes how the outputs from the frequent episode discovery framework can be used to help plant engineers interpret the large volumes of faults logged, in an efficient and convenient manner. Such a system, based on the algorithms developed in this thesis, is currently being used in one of the engine manufacturing plants of General Motors. Some examples of the results obtained that were regarded as useful by the plant engineers are also presented.
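To make the central frequency definition concrete, here is a hedged Python sketch (not the thesis's automata-based counter) that counts non-overlapped occurrences of a single serial episode by a greedy left-to-right scan. For a serial episode, completing an occurrence as early as possible and then restarting attains the maximum non-overlapped count, which is the frequency the thesis proposes.

```python
# Count non-overlapped occurrences of a serial episode: track the next
# event type the episode expects; each time the last type is matched, one
# occurrence is complete and the scan restarts, so no event is reused and
# no event of one counted occurrence falls between events of another.
def count_non_overlapped(event_stream, episode):
    """event_stream: iterable of (event_type, time) pairs, time-ordered.
    episode: sequence of event types, e.g. ('A', 'B', 'C')."""
    count, pos = 0, 0
    for event_type, _ in event_stream:
        if event_type == episode[pos]:
            pos += 1
            if pos == len(episode):   # occurrence completed
                count += 1
                pos = 0               # restart the scan
    return count

stream = [('A', 1), ('B', 2), ('C', 3), ('A', 4), ('B', 5), ('C', 6)]
print(count_non_overlapped(stream, ('A', 'B', 'C')))  # -> 2
```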
