Global ETD Search

131	Detecção de novidade em fluxos contínuos de dados multiclasse / Novelty detection in multiclass data streams Elaine Ribeiro de Faria Paiva 08 May 2014 Data stream mining is an emerging research area that aims to extract knowledge from large amounts of continuously generated data. Novelty detection is a classification task that assesses whether an example or a set of examples differs significantly from previously seen examples. This is an important task for data streams, mainly because new concepts may appear, disappear or evolve over time. Most work in the literature presents novelty detection as a binary classification task. A few authors treat it as a multiclass task, but even they use binary evaluation measures. In many real problems, novelty detection in data streams should be treated as a multiclass task, in which the known concept is composed of one or more classes and different new classes may appear over time. This thesis proposes a new algorithm, MINAS, for novelty detection in data streams. MINAS treats novelty detection as a multiclass task. In the training phase, MINAS builds a decision model from a set of labeled examples. In the application phase, new examples are either classified using the current decision model or marked as unknown. Groups of unknown examples can form valid novelty patterns, which are then added to the decision model. The decision model is updated along the stream to reflect changes in the known classes and to allow the addition of novelty patterns. This thesis also proposes a new methodology for evaluating novelty detection algorithms in data streams. The methodology associates the unlabeled novelty patterns with the true classes of the problem, which makes it possible to evaluate a confusion matrix that is incremental and rectangular. It also evaluates unknown examples separately and uses multiclass evaluation measures. In addition, the thesis presents a set of experiments comparing MINAS with the main novelty detection algorithms from the literature on artificial and real data sets. Finally, MINAS was applied to a real problem, human activity recognition using accelerometer data. The experimental results show the potential of the proposed algorithm and methodology. Data streams Novelty detection
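The training and application phases described for MINAS can be illustrated with a minimal sketch. It is not the thesis's implementation: the class name `NoveltySketch`, the single centroid-and-radius summary per class, the `min_examples` threshold and the cohesion check are all assumptions chosen to keep the example short.

```python
import math


def centroid(points):
    """Mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def dist(a, b):
    """Euclidean distance between two tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NoveltySketch:
    """Hypothetical, simplified multiclass novelty detector."""

    def __init__(self, labeled, min_examples=3):
        # Training phase: summarize each known class as (centroid, radius).
        self.model = {}
        for label, points in labeled.items():
            c = centroid(points)
            self.model[label] = (c, max(dist(c, p) for p in points))
        self.unknown = []            # buffer of examples marked as unknown
        self.min_examples = min_examples
        self.novelty_count = 0

    def classify(self, x):
        # Application phase: use the nearest summary, or mark as unknown.
        label, (c, r) = min(self.model.items(),
                            key=lambda kv: dist(kv[1][0], x))
        if dist(c, x) <= r:
            return label
        self.unknown.append(x)
        self._try_novelty()
        return "unknown"

    def _try_novelty(self):
        # A group of unknowns becomes a novelty pattern once it is large
        # enough and its radius is smaller than the distance to the
        # nearest known centroid (an assumed cohesion criterion).
        if len(self.unknown) < self.min_examples:
            return
        c = centroid(self.unknown)
        r = max(dist(c, p) for p in self.unknown)
        nearest = min(dist(c, kc) for kc, _ in self.model.values())
        if r < nearest:
            self.novelty_count += 1
            self.model[f"novelty-{self.novelty_count}"] = (c, r)
            self.unknown.clear()
```

In this sketch, three far-away examples are first returned as "unknown"; once the third arrives they are promoted to a pattern named `novelty-1`, and later examples near them receive that label.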
132	Análise da função de uma várzea na ciclagem de nitrogênio / Analysis of a floodplain's function in nitrogen cycling Corina Verónica Sidagis Galli 05 August 2003 To identify the influence of a floodplain area of the Feijão stream (São Carlos-SP) on nitrogen cycling and on surface and subsurface water quality, the physical and chemical characteristics of the water were analyzed and the nitrification and denitrification rates of the floodplain sediments were determined. The highest concentration of nitrogen compounds was observed in the floodplain's subsurface interface water, the most active region in terms of water and material fluxes. Nitrification rates ranged from 0.068 to 0.145 μmol N-NO₃⁻ g⁻¹ day⁻¹, and the autotrophic metabolic route predominated, in which bacteria use ammonium as a substrate. Denitrification rates averaged 0.0081 nmol N₂O g⁻¹ day⁻¹. A model estimated that 70% of the water flowing in the Feijão stream comes from the water table flowing under dry land and the remainder from the floodplain areas of the basin. A considerable reduction in the concentration of nitrogen compounds, mainly ammonium, was observed from the more distant riparian zones to the river channel, passing through the floodplain. The floodplain's role as a filtering and purification system for the subsurface water that feeds the river was evidenced by the physical and chemical characteristics of the river water in relation to land use in the catchment. Floodplain Nitrogen Nutrient cycling Rivers Streams
133	Biodiversidade de diatomáceas (bacillariophyta) em córregos conservados do cerrado / Biodiversity diatoms (bacillariophyta) in preserved cerrado streams França, Alline Alves 24 March 2016 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / (Pinnularia Ehrenberg (Bacillariophyta) of pristine streams of Central Brazil). This study aimed to inventory the species of the genus Pinnularia present in pristine streams located in the cerrado biome (Midwest Brazil) between 2012 and 2013. The periphyton was collected in five cerrado streams, on different substrates and in different seasons. Twenty-three species were identified, 17 of which are first records for the Midwest Region: P. angustivalva, P. butantanum, P. castraregina, P. divergens var. biconstricta, P. divergens var. mesoleptiformis, P. divergens var. protracta, P. gibba var. subundulata, P. meridiana var. meridiana, P. microstrauron var. rostrata, P. paulensis, P. persudetica var. persudetica, P. subgibba var. angustarea, P. subgibba var. capitada, P. superpaulensis, P. viridiformis var. minor and P. undula var. undula. The taxa with the highest frequency of occurrence in the studied streams were P. subanglica, P. angustivalva, P. brauniana and P. butantanum. Diatom Periphyton Streams CIENCIAS BIOLOGICAS::BOTANICA
134	Agrupamento de fluxos de dados utilizando dimensão fractal / Clustering data streams using fractal dimension Christian Cesar Bones 15 March 2018 Clustering multidimensional data streams is an expensive task, because such data have characteristics that must be taken into account: they are potentially infinite, which in many applications makes more than one read of the data infeasible; data points can have many dimensions, and the correlation among dimensions can affect the result of the analysis; and they can evolve over time. Computational methods suited to these characteristics are therefore needed, especially in applications where performing the task manually is impractical because of the volume of data, for example in the analysis and prediction of climate behavior. In this context, the goal of this research was to propose efficient and effective computational techniques for extracting knowledge from data streams, focusing on clustering similar data streams. Two methods were developed for clustering evolving, multidimensional and potentially infinite data streams, both based on the concept of fractal dimension, which had not previously been used in this context in the literature: eFCDS, an acronym for evolving Fractal Clustering of Data Streams, and eFCC, an acronym for evolving Fractal Clusters Construction. eFCDS uses the fractal dimension to measure the correlation, linear or not, among the dimensions of a multidimensional data stream over a period of time. This measure, calculated for each data stream, is the criterion for clustering streams that behave similarly over time, and the clusters evolve as new data are read. This strategy distinguishes eFCDS from state-of-the-art methods, which generally use summarization techniques and linear correlation and apply only to unidimensional data streams. eFCDS also identifies data streams that behave anomalously in the analyzed period and treats them as outliers. eFCC also clusters multidimensional data streams, but according to two main criteria: behavior over time, given by the correlation measure among the dimensions of each stream, and the distribution of the data in each cluster, analyzed through the cluster's own fractal dimension. eFCC follows the evolution of the data, relocating data streams from one cluster to another when necessary and identifying those that become outliers. Both methods build the clusters incrementally over time. eFCDS and eFCC were evaluated against competing methods from the literature that also cluster data streams rather than only data points. In a detailed experimental evaluation with synthetic and real data, both methods were more efficient in building the clusters and better identified data streams with similar behavior over a period of time and whose dimensions correlate in a similar way, as reported in the results chapter (Chapter 6). In addition, eFCC clustered the data streams that maintained a similar data distribution over a period of time. An immediate application of the methods is extracting patterns of interest from data streams produced by climate sensors, to support research in agrometeorology. Clustering Data streams Knowledge extraction Sensors
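The idea of using a fractal dimension as a correlation measure can be made concrete with a small sketch. The abstract does not say which estimator eFCDS or eFCC uses, so the box-counting estimator below, the function name `box_counting_dimension` and the `levels` parameter are assumptions for illustration only. The sketch counts occupied grid cells at several cell sizes and fits the slope of log(count) against log(1/size); points whose attributes are fully correlated fill a line and yield a dimension near 1, while uncorrelated attributes fill the plane and yield a dimension near 2.

```python
import math


def box_counting_dimension(points, levels=5):
    """Estimate the box-counting dimension of points in the unit hypercube.

    Assumes every coordinate lies in [0, 1). Cell sizes are 1/2, 1/4, ...
    """
    xs, ys = [], []
    for k in range(1, levels + 1):
        r = 0.5 ** k
        # Set of grid cells (at size r) that contain at least one point.
        cells = {tuple(int(v // r) for v in p) for p in points}
        xs.append(math.log(1.0 / r))
        ys.append(math.log(len(cells)))
    # Least-squares slope of log N(r) versus log(1/r).
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Two attributes that are perfectly correlated (all points on a diagonal).
line = [(i / 1024, i / 1024) for i in range(1024)]
# Two attributes with no correlation (points cover the whole square).
grid = [(i / 32, j / 32) for i in range(32) for j in range(32)]
```

Computing this value per stream over a time window gives one number per stream, which could then serve as a clustering criterion in the spirit of the abstract.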
135	Impacts of Retrogressive Thaw Slumps on the Geochemistry of Permafrost Catchments, Stony Creek Watershed, NWT Malone, Laura January 2013 Retrogressive thaw slumps are one of the most dramatic thermokarst landforms in periglacial regions. This thesis investigates the impacts of two of the largest hillslope thaw slumps on the geochemistry of periglacial streams on the Peel Plateau, Northwest Territories. It aims to describe the inorganic geochemistry of runoff across active mega-slumps, impacted and pristine tundra streams, as well as that of the ice-rich permafrost exposed in the slump headwalls. Slump runoff is characterized by elevated suspended sediments (911 g/L), high conductivity (2700 µS/cm), and high SO₄²⁻ (up to 2078 ppm). The runoff originates as a solute-rich meltwater near the slump headwall, and leaches and re-dissolves soluble salts (e.g., gypsum) as it flows along the mudflow. Conductivity increases until the runoff mixes with pristine tundra streams, diluting the slump runoff signal. SO₄²⁻/Cl⁻ is used as a tracer to isolate the slump runoff signal in impacted waters, and suggests that the contribution of slump runoff to the Peel River has been increasing since the 1960s. retrogressive thaw slump periglacial streams geochemistry Richardson Mountains
136	A Comparative Study of Ensemble Active Learning Alabdulrahman, Rabaa January 2014 Data Stream mining is an important emerging topic in the data mining and machine learning domain. In a Data Stream setting, the data arrive continuously and often at a fast pace. Examples include credit card transaction records, surveillance video streams, network event logs, and telecommunication records. Such types of data bring new challenges to the data mining research community. Specifically, a number of researchers have developed techniques to build accurate classification models against such Data Streams. Ensemble Learning, where a number of so-called base classifiers are combined to build a model, has shown some promise. However, a number of challenges remain. Often, the class labels of the arriving data are incorrect or missing. Furthermore, Data Stream algorithms may benefit from an online learning paradigm, where a small amount of newly arriving data is used to learn incrementally. To this end, the use of Active Learning, where the user is in the loop, has been proposed as a way to extend Ensemble Learning. Here, the hypothesis is that Active Learning would improve performance in terms of accuracy, ensemble size, and the time it takes to build the model. This thesis tests the validity of this hypothesis. Namely, we explore whether augmenting Ensemble Learning with an Active Learning component benefits the Data Stream Learning process. Our analysis indicates that this hypothesis does not necessarily hold for the datasets under consideration. That is, the accuracies of Active Ensemble Learning are not statistically significantly higher than those of normal Ensemble Learning. Rather, Active Learning may even cause an increase in error rate. Further, Active Ensemble Learning actually results in an increase in the time taken to build the model. However, our results indicate that Active Ensemble Learning builds accurate models with much smaller ensemble sizes than the traditional Ensemble Learning algorithms. Further, the models we build are constructed against small and incrementally growing training sets, which may be very beneficial in a real-time Data Stream setting. Data Streams Ensemble Learning Active Learning Active Ensemble Learning
137	Doplnění interaktivního režimu vývojového prostředí BlueJ o podporu práce s datovody / Supplementing functionality of Integrated Development Environment BlueJ with possibility of working with streams in an interactive mode. Pešat, David January 2015 The main objective of this thesis is to extend the existing functionality of the Integrated Development Environment (IDE) BlueJ with the possibility of working with streams in an interactive mode. This new functionality helps to facilitate and improve the teaching of programming within the First Architecture methodology. The first part of the thesis deals with the BlueJ IDE and discusses problematic software constructions that do not have sufficient support in interactive mode, with the main focus on streams. The next part suggests possible extensions that should be integrated into the existing functionality. The following part analyzes the proposed changes, and the final part discusses the implementation itself and describes the author's process of realization.
138	Reach-scale predictions of the fate and transport of contaminants of emerging concern at Fourmile Creek in Ankeny, Iowa Cullin, Joseph Albert 01 May 2014 Contaminants of emerging concern (CECs) are an unregulated suite of constituents frequently detected in environmental waters, which possess the potential to cause a host of reproductive and developmental problems in humans and wildlife. Degradation pathways of several CECs are well-characterized in idealized laboratory settings, but CEC fate and transport in complex field settings is poorly understood. In the present study I use a multi-tracer solute injection to study and quantify physical transport and photodegradation in a wastewater effluent-impacted stream in Ankeny, Iowa. Conservative tracers are used to quantify physical transport processes in the stream. Use of reactive fluorescent tracers allows for isolation of the relative contribution of photodegradation within the system. Field data were used to calibrate a one-dimensional transport model, and forward modeling was then used to predict the transport of sulfamethoxazole, an antibiotic in the effluent which is susceptible to photolysis. Results show that accurate predictions of reactive CECs at the scale of stream reaches can be made using the fate and transport model based on field tracer studies. Results of this study demonstrate a framework that can be used to couple field tracer and laboratory CEC studies to accurately predict the transport and fate of CECs in streams. emerging contaminants fate and transport photolysis streams wastewater Geology
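As a rough illustration of forward modeling a photodegradable compound along a reach, the sketch below uses the simplest possible case: steady plug flow with first-order decay, so that concentration falls as C(x) = C0 · exp(−k · x / u). This is an assumption made for brevity, not the calibrated one-dimensional model from the thesis (dispersion and storage are ignored), and the function name, parameter names and any numbers passed to it are hypothetical.

```python
import math


def downstream_concentration(c0, decay_rate_per_s, velocity_m_per_s, distance_m):
    """Concentration after travelling distance_m under first-order decay.

    Assumes steady plug flow: travel time is distance / velocity, and the
    compound decays at a constant first-order rate during that time.
    """
    travel_time_s = distance_m / velocity_m_per_s
    return c0 * math.exp(-decay_rate_per_s * travel_time_s)
```

With a decay rate of zero the function returns the upstream concentration unchanged, which corresponds to a conservative tracer; comparing that case against a reactive one isolates the loss attributable to decay.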
139	Query Processing Over Incomplete Data Streams Ren, Weilong 19 November 2021 No description available. Computer Science P-iDS Query Processing Incomplete Data Streams
140	Efficient Estimation of Dynamic Density Functions with Applications in Streaming Data Qahtan, Abdulhakim Ali Ali 11 May 2016 Recent advances in computing technology allow for collecting vast amounts of data that arrive continuously in the form of streams. Mining data streams is challenged by the speed and volume of the arriving data. Furthermore, the underlying distribution of the data changes over time in unpredictable ways. To reduce the computational cost, data streams are often studied through condensed representations, e.g., the Probability Density Function (PDF). This thesis aims at developing an online density estimator that builds a model called KDE-Track for characterizing the dynamic density of data streams. KDE-Track estimates the PDF of the stream at a set of resampling points and uses interpolation to estimate the density at any given point. To reduce the interpolation error and computational complexity, we introduce adaptive resampling, where more resampling points are used in regions of the PDF with high curvature and fewer in regions with low curvature. The PDF values at the resampling points are updated online to provide an up-to-date model of the data stream. Compared with other existing online density estimators, KDE-Track is often more accurate (as reflected by smaller error values) and more computationally efficient (as reflected by shorter running time). The PDF estimated by KDE-Track is available at any time and can be applied to visualizing the dynamic density of data streams, outlier detection and change detection in data streams. In this thesis, the first application is to visualize the taxi traffic volume in New York City. Utilizing KDE-Track allows for visualizing and monitoring the traffic flow in real time without extra overhead and provides insight into the pick-up demand that service providers can use to improve service availability. The second application is to detect outliers in data streams from sensor networks based on the estimated PDF. The method detects outliers accurately and outperforms baseline methods designed for detecting and cleaning outliers in sensor data. The third application is to detect changes in data streams. We propose a framework based on Principal Component Analysis (PCA) that reduces the problem of detecting changes in multidimensional data to the problem of detecting changes in the data projected on the principal components. We provide a theoretical analysis, supported by experimental results, showing that with PCA different types of changes in data streams are reflected in the data projected on one or more principal components. Our framework detects changes accurately at low computational cost and scales well to high-dimensional data. data streams density estimation dynamic density change detection outlier detection
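The combination of density values kept at resampling points, online updates and interpolation can be sketched as follows. This is a simplified illustration, not KDE-Track itself: the class name `GridDensity`, the fixed uniform grid (the thesis uses adaptive resampling), the Gaussian kernel, the fixed bandwidth and the running-mean update are all assumptions.

```python
import math
from bisect import bisect_right


class GridDensity:
    """Hypothetical online 1-D density estimate on a fixed grid."""

    def __init__(self, lo, hi, n_points=101, bandwidth=0.5):
        step = (hi - lo) / (n_points - 1)
        self.grid = [lo + i * step for i in range(n_points)]
        self.values = [0.0] * n_points   # density estimate at each grid point
        self.h = bandwidth
        self.count = 0

    def update(self, x):
        """Fold one new sample into the estimate (running mean of kernels)."""
        self.count += 1
        norm = self.h * math.sqrt(2.0 * math.pi)
        for i, g in enumerate(self.grid):
            u = (g - x) / self.h
            k = math.exp(-0.5 * u * u) / norm
            self.values[i] += (k - self.values[i]) / self.count

    def density(self, x):
        """Linear interpolation between the two neighbouring grid points."""
        if x <= self.grid[0]:
            return self.values[0]
        if x >= self.grid[-1]:
            return self.values[-1]
        i = bisect_right(self.grid, x) - 1
        x0, x1 = self.grid[i], self.grid[i + 1]
        t = (x - x0) / (x1 - x0)
        return (1.0 - t) * self.values[i] + t * self.values[i + 1]
```

Each update costs one kernel evaluation per grid point and each query costs a binary search plus one interpolation, which is why reducing the number of resampling points in flat regions, as the abstract describes, lowers the cost without much loss of accuracy.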

Search results