Global ETD Search

41	Agrupamento de séries temporais em fluxos contínuos de dados / Time series clustering for data streams Cássio Martini Martins Pereira 29 October 2013 (has links) Recentemente, a área de mineração de fluxos contínuos de dados ganhou importância, a qual visa extrair informação útil a partir de conjuntos massivos e contínuos de dados que evoluem com o tempo. Uma das técnicas que mais se destaca nessa área e a de agrupamento de dados, a qual busca estruturar grandes volumes de dados em hierarquias ou partições, tais que objetos mais similares estejam em um mesmo grupo. Diversos algoritmos foram propostos nesse contexto, porém a maioria concentrou-se no agrupamento de fluxos compostos por pontos em um espaço multidimensional. Poucos trabalhos voltaram-se para o agrupamento de séries temporais, as quais se caracterizam por serem coleções de observações coletadas sequencialmente no tempo. Técnicas atuais para agrupamento de séries temporais em fluxos contínuos apresentam uma limitação na escolha da medida de similaridade, a qual na maioria dos casos e baseada em uma simples correlação, como a de Pearson. Este trabalho mostra que até para modelos clássicos de séries temporais, como os de Box e Jenkins, a correlação de Pearson não é capaz de detectar similaridade, apesar das séries serem provenientes de um mesmo modelo matemático e com mesma parametrização. Essa limitação nas técnicas atuais motivou este trabalho a considerar os modelos geradores de séries temporais, ou seja, as equações que regem sua geração, por meio de diversas medidas descritivas, tais como a Autoinformação Mútua, o Expoente de Hurst e várias outras. A hipótese considerada e que, por meio do uso de medidas descritivas, pode-se obter uma melhor caracterização do modelo gerador de séries temporais e, consequentemente, um agrupamento de maior qualidade. Nesse sentido, foi realizada uma avaliação de diversas medidas descritivas, as quais foram usadas como entrada para um novo algoritmo de agrupamento baseado em árvores, denominado TS-Stream. Experimentos com bases sintéticas compostas por diversos modelos de séries temporais foram realizados, mostrando a superioridade de TS-Stream sobre ODAC, a técnica mais popular para esta tarefa encontrada na literatura. Experimentos com séries reais provenientes de preços de ações da NYSE e NASDAQ mostraram que o uso de TS-Stream na escolha de ações, por meio da criação de uma carteira de investimentos diversificada, pode aumentar os retornos das aplicações em várias ordens de grandeza, se comparado a estratégias baseadas somente no indicador econômico Moving Average Convergence Divergence / Recently, the data streams mining area has gained importance, which aims to extract useful information from massive and continuous data sources that evolve over time. One of the most popular techniques in this area is clustering, which aims to structure large volumes of data into hierarchies or partitions, such that similar objects are placed in the same group. Several algorithms were proposed in this context, however most of them focused on the clustering of streams composed of multidimensional points. Few studies have focused on clustering streaming time series, which are characterized by being collections of observations sampled sequentially along time. Current techniques for clustering streaming time series have a limitation in the choice of the similarity measure, as most are based on a simple correlation, such as Pearson. This thesis shows that even for classic time series models, such as those from Box and Jenkins, the Pearson correlation is not capable of detecting similarity, despite dealing with series originating from the same mathematical model and the same parametrization. This limitation in current techniques motivated this work to consider time series generating models, i.e., generating equations, through the use of several descriptive measures, such as Auto Mutual Information, the Hurst Exponent and several others. The hypothesis is that through the use of several descriptive measures, a better characterization of time series generating models can be achieved, which in turn will lead to better clustering quality. In that context, several descriptive measures were evaluated and then used as input to a new tree-based clustering algorithm, entitled TS-Stream. Experiments were conducted with synthetic data sets composed of various time series models, confirming the superiority of TS-Stream when compared to ODAC, the most successful technique in the literature for this task. Experiments with real-world time series from stock market data of the NYSE and NASDAQ showed that the use of TS-Stream in the selection of stocks, by the creation of a diversified portfolio, can increase the returns of the investment in several orders of magnitude when compared to trading strategies solely based on the Moving Average Convergence Divergence financial indicator Agrupamento Aprendizado de máquina Fluxos contínuos de dados Séries temporais Clustering Data streams Machine learning Time series
42	Classificação de fluxo de dados não estacionários com aplicação em sensores identificadores de insetos / Classification of non-stationary data stream with application in sensors for insect identification. Vinicius Mourão Alves de Souza 23 May 2016 (has links) Diversas aplicações são responsáveis por gerar dados ao longo do tempo de maneira contínua, ordenada e ininterrupta em um ambiente dinâmico, denominados fluxo de dados. Entre possíveis tarefas que podem ser realizadas com estes dados, classificação é uma das mais proeminentes. Devido à natureza não estacionária do ambiente responsável por gerar os dados, as características que descrevem os conceitos das classes do problema de classificação podem se alterar ao longo do tempo. Por isso, classificadores de fluxo de dados requerem constantes atualizações em seus modelos para que a taxa de acerto se mantenha estável ao longo do tempo. Na etapa de atualização a maior parte das abordagens considera que, após a predição de cada exemplo, o seu rótulo correto é imediatamente disponibilizado sem qualquer atraso de tempo (latência nula). Devido aos altos custos do processo de rotulação, os rótulos corretos nem sempre podem ser obtidos para a maior parte dos dados ou são obtidos após um considerável atraso de tempo. No caso mais desafiador, encontram-se as aplicações em que após a etapa de classificação dos exemplos, os seus respectivos rótulos corretos nunca sã disponibilizados para o algoritmo, caso chamado de latência extrema. Neste cenário, não é possível o uso de abordagens tradicionais, sendo necessário o desenvolvimento de novos métodos que sejam capazes de manter um modelo de classificação atualizado mesmo na ausência de dados rotulados. Nesta tese, além de discutir o problema de latência na tarefa de classificação de fluxo de dados não estacionários, negligenciado por boa parte da literatura, também sã propostos dois algoritmos denominados SCARGC e MClassification para o cenário de latência extrema. Ambas as propostas se baseiam no uso de técnicas de agrupamento para a adaptação à mudanças de maneira não supervisionada. Os algoritmos propostos são intuitivos, simples e apresentam resultados superiores ou equivalentes a outros algoritmos da literatura em avaliações com dados sintéticos e reais, tanto em termos de acurácia de classificação como em tempo computacional. Aléem de buscar o avanço no estado-da-arte na área de aprendizado em fluxo de dados, este trabalho também apresenta contribuições para uma importante aplicação tecnológica com impacto social e na saúde pública. Especificamente, explorou-se um sensor óptico para a identificação automática de espécies de insetos a partir da análise de informações provenientes do batimento de asas dos insetos. Para a descrição dos dados, foi verificado que os coeficientes Mel-cepstrais apresentaram os melhores resultados entre as diferentes técnicas de processamento digital de sinais avaliadas. Este sensor é um exemplo concreto de aplicação responsável por gerar um fluxo de dados em que é necessário realizar classificações em tempo real. Durante a etapa de classificação, este sensor exige a adaptação a possíveis variações em condições ambientais, responsáveis por alterar o comportamento dos insetos ao longo do tempo. Para lidar com este problema, é proposto um Sistema com Múltiplos Classificadores que realiza a seleção dinâmica do classificador mais adequado de acordo com características de cada exemplo de teste. Em avaliações com mudanças pouco significativas nas condições ambientais, foi possível obter uma acurácia de classificação próxima de 90%, no cenário com múltiplas classes e, cerca de 95% para a identificação da espécie Aedes aegypti, considerando o treinamento com uma única classe. No cenário com mudanças significativas nos dados, foi possível obter 91% de acurácia em um problema com 5 classes e 96% para a classificação de insetos vetores de importantes doenças como dengue e zika vírus. / Many applications are able to generate data continuously over t ime in an ordered and uninterrupted way in a dynamic environment , called data streams. Among possible tasks that can be performed with these data, classification is one of the most prominent . Due to non-stationarity of the environment that generates the data, the features that describe the concepts of the classes can change over time. Thus, the classifiers that deal with data streams require constants updates in their classification models to maintain a stable accuracy over time. In the update phase, most of the approaches assume that after the classification of each example from the stream, their actual class label is available without any t ime delay (zero latency). Given the high label costs, it is more reasonable to consider that this delay could vary for the most portion of the data. In the more challenging case, there are applications with extreme latency, where in after the classification of the examples, heir actual class labels are never available to the algorithm. In this scenario, it is not possible to use traditional approaches. Thus, there is the need of new methods that are able to maintain a classification model updated in the absence of labeled data. In this thesis, besides to discuss the problem of latency to obtain actual labels in data stream classification problems, neglected by most of the works, we also propose two new algorithms to deal with extreme latency, called SCARGC and MClassification. Both algorithms are based on the use of clustering approaches to adapt to changes in an unsupervised way. The proposed algorithms are intuitive, simpleand showed superior or equivalent results in terms of accuracy and computation time compared to other approaches from literature in an evaluation on synthetic and real data. In addition to the advance in the state-of-the-art in the stream learning area, this thesis also presents contributions to an important technological application with social and public health impacts. Specifically, it was studied an optical sensor to automatically identify insect species by the means of the analysis of information coming from wing beat of insects. To describe the data, we conclude that the Mel-cepst ral coefficients guide to the best results among different evaluated digital signal processing techniques. This sensor is a concrete example of an applicat ion that generates a data st ream for which it is necessary to perform real-time classification. During the classification phase, this sensor must adapt their classification model to possible variat ions in environmental conditions, responsible for changing the behavior of insects. To address this problem, we propose a System with Multiple Classifiers that dynamically selects the most adequate classifier according to characteristics of each test example. In evaluations with minor changes in the environmental conditions, we achieved a classification accuracy close to 90% in a scenario with multiple classes and 95% when identifying Aedes aegypti species considering the training phase with only the positive class. In the scenario with considerable changes in the environmental conditions, we achieved 91% of accuracy considering 5 species and 96% to classify vector mosquitoes of important diseases as dengue and zika virus. Classificação Fluxo de dados Latência Sensor óptico Automatic insect identification Classification Data streams Latency Optical sensor
43	Meta-aprendizado aplicado a fluxos contínuos de dados / Metalearning for algorithm selection in data strams Andre Luís Debiaso Rossi 19 December 2013 (has links) Algoritmos de aprendizado de máquina são amplamente empregados na indução de modelos para descoberta de conhecimento em conjuntos de dados. Como grande parte desses algoritmos assume que os dados são gerados por uma função de distribuição estacionária, um modelo é induzido uma única vez e usado indefinidamente para a predição do rótulo de novos dados. Entretanto, atualmente, diversas aplicações, como gerenciamento de transportes e monitoramento por redes de sensores, geram fluxos contínuos de dados que podem mudar ao longo do tempo. Consequentemente, a eficácia do algoritmo escolhido para esses problemas pode se deteriorar ou outros algoritmos podem se tornar mais apropriados para as características dos novos dados. Nesta tese é proposto um método baseado em meta-aprendizado para gerenciar o processo de aprendizado em ambientes dinâmicos de fluxos contínuos de dados com o objetivo de melhorar o desempenho preditivo do sistema de aprendizado. Esse método, denominado MetaStream, seleciona regularmente o algoritmo mais promissor para os dados que estão chegando, de acordo com as características desses dados e de experiências passadas. O método proposto emprega técnicas de aprendizado de máquina para gerar o meta-conhecimento, que relaciona as características extraídas dos dados em diferentes instantes do tempo ao desempenho preditivo dos algoritmos. Entre as medidas usadas para extrair informação relevante dos dados, estão aquelas comumente empregadas em meta-aprendizado convencional com diferentes conjuntos de dados, que são adaptadas para as especificidades do cenário de fluxos, e de áreas correlatas, que consideram, por exemplo, a ordem de chegada dos dados. O MetaStream é avaliado para três conjuntos de dados reais e seis algoritmos de aprendizado diferentes. Os resultados mostram a aplicabilidade do MetaStream e sua capacidade de melhorar o desempenho preditivo geral do sistema de aprendizado em relação a um método de referência para a maioria dos problemas investigados. Deve ser observado que uma combinação de modelos mostrou-se superior ao MetaStream para dois conjuntos de dados. Assim, foram analisados os principais fatores que podem ter influenciado nos resultados observados e são indicadas possíveis melhorias do método proposto / Machine learning algorithms are widely employed to induce models for knowledge discovery in databases. Since most of these algorithms suppose that the underlying distribution of the data is stationary, a model is induced only once e it is applied to predict the label of new data indefinitely. However, currently, many real applications, such as transportation management systems and monitoring of sensor networks, generate data streams that can change over time. Consequently, the effectiveness of the algorithm chosen for these problems may deteriorate or other algorithms may become more suitable for the new data characteristics. This thesis proposes a metalearning based method for the management of the learning process in dynamic environments of data streams aiming to improve the general predictive performance of the learning system. This method, named MetaStream, regularly selects the most promising algorithm for arriving data according to its characteristics and past experiences. The proposed method employs machine learning techniques to generate metaknowledge, which relates the characteristics extracted from data in different time points to the predictive performance of the algorithms. Among the measures applied to extract relevant information are those commonly used in conventional metalearning for different data sets, which are adapted for the data stream particularities, and from other related areas that consider the order of the data stream. We evaluate MetaStream for three real data stream problems and six different learning algorithms. The results show the applicability of the MetaStream and its capability to improve the general predictive performance of the learning system compared to a baseline method for the majority of the cases investigated. It must be observed that an ensemble of models is usually superior to MetaStream. Thus, we analyzed the main factors that may have influenced the results and indicate possible improvements for the proposed method Fluxos contínuos de dados Meta-aprendizado Seleção de algoritmos Algorithm selection Data streams Metalearning
44	Detecção de novidade com aplicação a fluxos contínuos de dados / Novelty detection with application to data streams Eduardo Jaques Spinosa 20 February 2008 (has links) Neste trabalho a detecção de novidade é tratada como o problema de identificação de conceitos emergentes em dados que podem ser apresentados em um fluxo contínuo. Considerando a relação intrínseca entre tempo e novidade e os desafios impostos por fluxos de dados, uma nova abordagem é proposta. OLINDDA (OnLIne Novelty and Drift Detection Algorithm) vai além da classficação com uma classe e concentra-se no aprendizado contínuo não-supervisionado de novos conceitos. Tendo aprendido uma descrição inicial de um conceito normal, prossegue à análise de novos dados, tratando-os como um fluxo contínuo em que novos conceitos podem aparecer a qualquer momento. Com o uso de técnicas de agrupamento, OLINDDA pode empregar diversos critérios de validação para avaliar grupos em termos de sua coesão e representatividade. Grupos considerados válidos produzem conceitos que podem sofrer fusão, e cujo conhecimento é continuamente incorporado. A técnica é avaliada experimentalmente com dados artificiais e reais. O módulo de classificação com uma classe é comparado a outras técnicas de detecção de novidade, e a abordagem como um todo é analisada sob vários aspectos por meio da evolução temporal de diversas métricas. Os resultados reforçam a importância da detecção contínua de novos conceitos, assim como as dificuldades e desafios do aprendizado não-supervisionado de novos conceitos em fluxos de dados / In this work novelty detection is treated as the problem of identifying emerging concepts in data that may be presented in a continuous ow. Considering the intrinsic relationship between time and novelty and the challenges imposed by data streams, a novel approach is proposed. OLINDDA, an OnLIne Novelty and Drift Detection Algorithm, goes beyond one-class classification and focuses on the unsupervised continuous learning of novel concepts. Having learned an initial description of a normal concept, it proceeds to the analysis of new data, treating them as a continuous ow where novel concepts may appear at any time. By the use of clustering techniques, OLINDDA may employ several validation criteria to evaluate clusters in terms of their cohesiveness and representativeness. Clusters considered valid produce concepts that may be merged, and whose knowledge is continuously incorporated. The technique is experimentally evaluated with artificial and real data. The one-class classification module is compared to other novelty detection techniques, and the whole approach is analyzed from various aspects through the temporal evolution of several metrics. Results reinforce the importance of continuous detection of novel concepts, as well as the dificulties and challenges of the unsupervised learning of novel concepts in data streams Agrupamento Aprendizado não-supervisionado Detecção de novidade Fluxos contínuos de dados Clustering Data streams Novelty detection Unsupervised learning
45	Adaptação de viés indutivo de algoritmos de agrupamento de fluxos de dados / Adapting the inductive bias of data-stream clustering algorithms Marcelo Keese Albertini 11 April 2012 (has links) Diversas áreas de pesquisa são dedicadas à compreensão de fenômenos que exigem a coleta ininterrupta de sequências de amostras, denominadas fluxos de dados. Esses fenômenos frequentemente apresentam comportamento variável e são estudados por meio de indução não supervisionada baseada em agrupamento de dados. Atualmente, o processo de agrupamento tem exibido sérias limitações em sua aplicação a fluxos de dados, devido às exigências impostas pelas variações comportamentais e pelo modo de coleta de dados. Embora tem-se desenvolvido algoritmos eficientes para agrupar fluxos de dados, há a necessidade de estudos sobre a influência de variações comportamentais nos parâmetros de algoritmos (e.g., taxas de aprendizado e limiares de proximidade), as quais interferem diretamente na compreensão de fenômenos. Essa lacuna motivou esta tese, cujo objetivo foi a proposta de uma abordagem para a adaptação do viés indutivo de algoritmos de agrupamento de fluxos de dados de acordo com variações comportamentais dos fenômenos em estudo. Para cumprir esse objetivo projetou-se: i) uma abordagem baseada em uma nova arquitetura de rede neural artificial que permite avaliação de comportamento de fenômenos por meio da estimação de cadeias de Markov e entropia de Shannon; ii) uma abordagem para adaptar parâmetros de algoritmos de agrupamento tradicional de acordo com variações comportamentais em blocos sequenciais de dados; e iii) uma abordagem para adaptar parâmetros de agrupamento de acordo com a contínua avaliação da estabilidade de dados. Adicionalmente, apresenta-se nesta tese uma taxonomia de técnicas de detecção de variação comportamental de fenômenos e uma formalização para o problema de agrupamento de fluxos de dados / Several research fields have described phenomena that produce endless sequences of samples, referred to as data streams. These phenomena usually present behavior variation and are studied by means of unsupervised induction based on data clustering. In order to cope with the characteristics of data streams, researchers have designed clustering algorithms with low time and space complexity requirements. However, predefined and static parameters (thresholds, number of clusters and learning rates) found in current algorithms still limit the application of clustering to data streams. This limitation motivated this thesis, which proposes a continuous approach to evaluate behavior variations and adapt algorithm inductive bias by changing its parameters. The main contribution of this thesis is the proposal of three approaches to adapt induction bias: i) an approach based on the design of an adaptive artificial self-organizing neural network architecture that enables behavior evaluation by means of Markov chain and Shannon entropy estimations; ii) an approach to adapt traditional data clustering algorithms according to behavior variations in sequences of data chunks; and iii) an approach based on the proposed neural network architecture to continuously adapt parameters by means of the evaluation of data stability. Additionally, in order to analyze the essential characteristics of data streams, this thesis presents a formalization for the problem of data stream clustering and a taxonomy on approaches to detect behavior variations Agrupamento de dados Aprendizado de máquina Fluxosa de dados Data clustering Data streams Machine learning
46	Learning in the Presence of Skew and Missing Labels Through Online Ensembles and Meta-reinforcement Learning Vafaie, Parsa 07 September 2021 (has links) Data streams are large sequences of data, possibly endless and temporarily ordered, that are common-place in Internet of Things (IoT) applications such as intrusion detection in computer networking, fraud detection in financial institutions, real-time tumor tracking in radiotherapy and social media analysis. Algorithms learning from such streams need to be able to construct near real-time models that continuously adapt to potential changes in patterns, in order to retain high performance throughout the stream. It follows that there are numerous challenges involved in supervised learning (or so-called classification) in such environments. One of the challenges in learning from streams is multi-class imbalance, in which the rates of instances in the different class labels differ substantially. Notably, classification algorithms may become biased towards the classes with more frequent instances, sacrificing the performance of the less frequent or so-called minority classes. Further, minority instances often arrive infrequently and in bursts, making accurate model construction problematic. For example, network intrusion detection systems must be able to distinguish between normal traffic and multiple minority classes corresponding to a variety of different types of attacks. Further, having labels for all instances are often infeasible, since we might have missing or late-arriving labels. For instance, when learning from a stream regarding the task of detecting network intrusions, the true label for all instances might not be available, or it might take time until the label is made available, especially for new types of attacks. In this thesis, we contribute to the advancements of online learning from evolving streams by focusing on the above-mentioned areas of multi-class imbalance and missing labels. First, we introduce a multi-class online ensemble algorithm designed to maintain a balanced performance over all classes. Specifically, our approach samples instances with replacement while dynamically increasing the weights of under-represented classes, in order to produce models that benefit all classes. Our experimental results show that our online ensemble method performs well against multi-class imbalanced data in various datasets. We further continue our study by introducing an approach to dealing with missing labels that utilize both labelled and unlabelled data to increase a model’s performance. That is, our method utilizes labelled data for pseudo-labelling unlabelled instances, allowing the model to perform better in environments where labels are scarce. More specifically, our approach features a meta-reinforcement learning agent, trained on multiple-source streams, that can effectively select the prediction of a K nearest neighbours (K-NN) classifier as the label for unlabelled instances. Extensive experiments on benchmark datasets demonstrate the value and effectiveness of our approach and confirm that our method outperforms state-of-the-art. Machine learning Data streams Imbalanced learning Semi-supervised learning Meta-learning
47	Software Defined Radio Feature Modelling and Implementation of RDS2 Encoder Furqan, Rao Muhammad Waseem 25 January 2022 (has links) Radio broadcasting is the primary part of the infotainment system in automobile and nowadays it is not only uses for the entertainment but facilitate the users by broadcasting digital information along with audio data. Radio Data System (RDS) is the radio broadcasting communication standard that broadcasts the digitally encoded data on the conventional band of the analogue FM radio. It transfers different type of data that uses for various applications along with audio channel of the FM radio station and overcome the glitches of the car radio to big scope. It is using commercially in the infotainment system of the automobile with basic features of the RDS since fifteen years. However, it is facing issues with lower data rates as compared to other digital radio broadcasting technologies that are also using in many countries. To confront this issue, RDS forum proposed the standardization of RDS with RDS2 in 2018, So RDS2 has been standardized and published that improves the data capacity by adding more channels. Hence, developers can use these channels for broadcasting the additional information. The ultimate goal of thesis is the development of RDS2 encoder, which increases the data capacity up to four times as compared to currently RDS encoder by upgrading the data channels with the modeling of additional RDS2 features and services. This thesis proposes an approach for the implementation and validation of encoder with complete simulation of FM-RDS2 transmitter under all standardized feature modeling in the Software Defined Radio (SDR) environment. For this purpose, GNU Radio Companion (GRC) is used which offers the signal processing blocks to construct SDR modules. Using GRC, three main steps will be completed by using different libraries and tools. Implementation of RDS2 encoder and data-rate enforcer blocks with features modeling are written in C++ language. GUI designing of each block is executed in XML script language and simulation of complete FM-RDS2 transmitter using all-new custom and predefined signal processing blocks. Our determinative observation shows this approach remarkable improvements in terms of a prototype of new standard’s encoder, interpolating property of framework and decreasing the cost. info:eu-repo/classification/ddc/000 ddc:000 RDS; Datenstrom
48	High-Performance Processing of Continuous Uncertain Data Tran, Thanh Thi Lac 01 May 2013 (has links) Uncertain data has arisen in a growing number of applications such as sensor networks, RFID systems, weather radar networks, and digital sky surveys. The fact that the raw data in these applications is often incomplete, imprecise and even misleading has two implications: (i) the raw data is not suitable for direct querying, (ii) feeding the uncertain data into existing systems produces results of unknown quality. This thesis presents a system for uncertain data processing that has two key functionalities, (i) capturing and transforming raw noisy data to rich queriable tuples that carry attributes needed for query processing with quantified uncertainty, and (ii) performing query processing on such tuples, which captures changes of uncertainty as data goes through various query operators. The proposed system considers data naturally captured by continuous distributions, which is prevalent in sensing and scientific applications. The first part of the thesis addresses data capture and transformation by proposing a probabilistic modeling and inference approach. Since this task is application-specific and requires domain knowledge, this approach is demonstrated for RFID data from mobile readers. More specifically, the proposed solution involves an inference and cleaning substrate to transform raw RFID data streams to object location tuple streams where locations are inferred from raw noisy data and their uncertain values are captured by probability distributions. The second, also the main part, of this thesis examines query processing for uncertain data modeled by continuous random variables. The proposed system includes new data models and algorithms for relational processing, with a focus on aggregation and conditioning operations. For operations of high complexity, optimizations including approximations with guaranteed error bounds are considered. Then complex queries involving a mix of operations are addressed by query planning, which given a query, finds an efficient plan that meets user-defined accuracy requirements. Besides relational processing, this thesis also provides the support for user-defined functions (UDFs) on uncertain data, which aims to compute the output distribution given uncertain input and a black-box UDF. The proposed solution employs a learning-based approach using Gaussian processes to compute approximate output with error bounds, and a suite of optimizations for high performance in online settings such as data stream processing and interactive data analysis. The techniques proposed in this thesis are thoroughly evaluated using both synthetic data with controlled properties and various real-world datasets from the domains of severe weather monitoring, object tracking using RFID readers, and computational astrophysics. The experimental results show that these techniques can yield high accuracy, meet stream speeds, and outperform existing techniques such as Monte Carlo sampling for many important workloads . databases data management data models data streams processing algorithms uncertain data Computer Sciences
49	Sensor Data Streams Correlation Platform for Asthma Management Sridharan, Vaikunth 08 June 2018 (has links) No description available. Computer Science Health asthma asthma management sensor data streams Internet of Things IoT
50	New techniques for efficiently discovering frequent patterns Jin, Ruoming 01 August 2005 (has links) No description available. Computer Science Frequent Pattern Mining Graph Mining Multiple Query Optimization Mining Data Streams Knowledgeable Cache

Search results