61

Wavelet-Based Methodology in Data Mining for Complicated Functional Data

Jeong, Myong-Kee 04 April 2004 (has links)
To handle potentially large and complicated nonstationary functional data, we present a wavelet-based data mining methodology for process monitoring and fault classification. Since traditional wavelet shrinkage methods for data de-noising are ineffective for the more demanding goal of data reduction, this thesis presents data reduction methods based on the discrete wavelet transform. Our new methods minimize objective functions that balance the tradeoff between data reduction and modeling accuracy. Several evaluation studies, with four popular testing curves from the literature and with two real-life data sets, demonstrate the superiority of the proposed methods over the engineering data compression and statistical data de-noising methods currently used to achieve data reduction. Further experimentation, applying a classification-tree-based data mining procedure to the reduced-size data to identify process fault classes, also demonstrates the strength of the proposed methods: compared with analysis of the original large-size data, they yield lower misclassification rates with much better computational efficiency. This thesis also extends the scalogram to handle noisy, and possibly massive, data that show time-shifted patterns. The proposed thresholded scalogram is built on the fast wavelet transform, which can effectively and efficiently capture non-stationary changes in data patterns. Finally, we present a statistical process control (SPC) procedure that adaptively determines which wavelet coefficients to monitor, based on shift information estimated from process data. By adaptively monitoring the process, we improve the performance of control charts for functional data. Using a simulation study, we compare the performance of the recommended approaches.
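As a rough illustration of the data-reduction idea described in this abstract (a minimal sketch assuming the PyWavelets package and a synthetic test curve; the thesis's own objective functions that balance reduction against modeling accuracy are not reproduced here), one can decompose a curve with the discrete wavelet transform, keep only the largest-magnitude coefficients, and reconstruct:

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def dwt_reduce(signal, wavelet="db4", keep_ratio=0.05):
    """Keep only the largest-magnitude DWT coefficients (illustrative sketch)."""
    coeffs = pywt.wavedec(signal, wavelet)              # multilevel DWT
    flat = np.concatenate(coeffs)
    k = max(1, int(keep_ratio * flat.size))             # number of coefficients to keep
    thresh = np.sort(np.abs(flat))[-k]                  # magnitude of the k-th largest coefficient
    reduced = [pywt.threshold(c, thresh, mode="hard") for c in coeffs]
    return pywt.waverec(reduced, wavelet)[: signal.size], k

# Placeholder curve standing in for one of the classic testing curves
t = np.linspace(0, 1, 1024)
curve = np.sin(8 * np.pi * t) + 0.3 * np.random.randn(t.size)
approx, kept = dwt_reduce(curve)
rmse = np.sqrt(np.mean((curve - approx) ** 2))
print(f"kept about {kept} of {curve.size} coefficients, RMSE = {rmse:.3f}")
```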
62

The incorporation of World War II experiences in the life stories of alumni from the Vrije University in Amsterdam: an exploration at the crossroads between narrative, identity, and culture

Visser, Roemer Maarten Sander 15 May 2009 (has links)
For this study, twelve life stories of alumni from the Vrije Universiteit in Amsterdam, who were enrolled during the Nazi Occupation between 1940 and 1945, were collected and analyzed. Besides exploring the extent to which the interviews were co-constructed jointly by the interviewer and interviewees, this study addresses three questions. First, it acknowledges methodological concerns associated with an overabundance of narrative data, and suggests a new method for arriving at a core narrative based on the distribution of time. This core narrative can then be analyzed further. Second, it is suggested that early memories serve as identity claims; because of their congruency with the remainder of the story, they appear to foreshadow what is to come. As a result, it is argued that childhood memories merit special attention in the analysis of narratives. Third, and finally, the constraints on narratives imposed by cultural conventions, or master narratives, are explored. Narrators use a variety of strategies in order to satisfy sometimes competing demands on their narratives. It is argued that culture makes its influence felt in ways that are not always obvious, particularly if the interviewee and interviewer share the same culture.
63

Boolean factor analysis: a review of a novel method of matrix decomposition and neural network Boolean factor analysis

Upadrasta, Bharat. January 2009 (has links)
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Systems Science and Industrial Engineering, 2009. / Includes bibliographical references.
64

Analýza úmrtnostních tabulek pomocí vybraných vícerozměrných statistických metod / Life tables analysis using selected multivariate statistical methods

Bršlíková, Jana January 2015 (has links)
Mortality is historically one of the most important demographic indicators and reflects the maturity of each country. The objective of this diploma thesis is to compare mortality rates in the analyzed countries around the world, over time and among each other, using principal component analysis, which allows the data to be assessed in a different way. The big advantage of this method is minimal loss of information and a readily understandable interpretation of mortality in each country. The thesis offers several interesting graphical outputs which, for example, confirm the higher mortality rates in Eastern European countries compared to Western European countries, and show that the Czech Republic is the post-communist country where mortality fell the most between 1990 and 2010. The data come from the Human Mortality Database and were processed in the statistical tool SPSS.
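A minimal sketch of the kind of analysis described (assuming a countries-by-age-specific-mortality-rate matrix and scikit-learn; the thesis itself works with Human Mortality Database life tables in SPSS, so the data below are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical input: one row per country, one column per age-specific death rate
countries = ["CZE", "POL", "DEU", "FRA", "RUS"]
rng = np.random.default_rng(0)
mortality = rng.lognormal(mean=-6, sigma=0.5, size=(len(countries), 24))  # placeholder data

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(mortality))
for name, (pc1, pc2) in zip(countries, scores):
    # countries that plot close together have similar mortality profiles
    print(f"{name}: PC1={pc1:+.2f}, PC2={pc2:+.2f}")
```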
65

Interface gráfica para redução de espectros ópticos / Graphical interface for the reduction of optical spectra

Silva, Alberlan Lopes 16 March 2016 (has links)
This work presents the development of a graphical interface for reducing spectroscopic observations of peculiar galaxies from the Catalogue of Southern Peculiar Galaxies and Associations (Arp and Madore 1987). The data were observed with the Cassegrain spectrograph installed at the 1.6-m telescope of the Observatório do Pico dos Dias, Brazil / Laboratório Nacional de Astrofísica – Ministério da Ciência, Tecnologia e Inovação (OPD/LNA-MCTI). The interface is called "PyOPD-Cass". In this project, tasks from the IRAF package are sequenced together in order to produce data calibrated in flux and wavelength. The resulting methodology allows the user to quickly and easily supply all the parameters needed for the spectral reduction process. Built as a pipeline, the tool has the advantage of minimizing the time spent on spectral reduction. As an application of the computational process, we show the spectral reduction of a selected set of 73 peculiar galaxies with emission lines observed during the long-term project carried out at the OPD/LNA-MCTI ("Spectroscopic Study of Peculiar Galaxies and Associations", OP2012A-009). The scientific results for these active galaxies will be discussed in a separate study, already under development. All development was done in Python, using PyRAF and PyQt as the basis for the final GUI architecture.
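A minimal sketch of the pipeline idea (plain Python; in the actual tool each step wraps an IRAF task called through PyRAF and parameters come from a PyQt interface, so the step names and parameters below are placeholders, not PyOPD-Cass's API):

```python
# Illustrative reduction pipeline skeleton: in the real tool each step would wrap an
# IRAF task invoked through PyRAF (bias subtraction, flat-fielding, aperture extraction,
# wavelength and flux calibration); here the steps are stand-ins so the sketch runs alone.
def make_step(task_name):
    def step(frame, **params):
        print(f"running {task_name} with {params or 'defaults'}")
        return frame  # a real step would return the processed frame
    return step

PIPELINE = [
    ("bias_subtraction", make_step("bias subtraction")),
    ("flat_fielding", make_step("flat-field correction")),
    ("extraction", make_step("aperture extraction")),
    ("wavelength_calibration", make_step("wavelength calibration")),
    ("flux_calibration", make_step("flux calibration")),
]

def reduce_spectrum(raw_frame, user_params):
    """Run every step in order, feeding user-supplied parameters to each one."""
    data = raw_frame
    for key, step in PIPELINE:
        data = step(data, **user_params.get(key, {}))
    return data

reduce_spectrum("galaxy_001.fits", {"extraction": {"aperture_width": 7}})
```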
66

Towards reducing bandwidth consumption in publish/subscribe systems

Ye, Yifan January 2020 (has links)
Efficient data collection is one of the key research areas for 5G and beyond, since it can reduce the network burden of transferring massive data for various data analytics and machine learning applications. In particular, 5G offers strong support for massive deployment of IoT devices, and the number of IoT devices is exploding. There are two main, complementary ways of achieving efficient data collection: one is integrating data processing into the collection process, e.g. via data filtering and aggregation; the other is reducing the amount of data that needs to be transferred, e.g. via data compression or approximation. In this thesis, efficient data collection is studied from these two perspectives. In particular, we introduce enhanced syntax and functionality to the Message Queuing Telemetry Transport (MQTT) protocol, such as data filtering and data aggregation. Furthermore, we enhance the flexibility of MQTT by supporting customized, user-defined functions executed in the MQTT broker, so that data processing in the broker is not constrained to predefined processing functions. Lastly, dual prediction is studied for reducing data transmissions by maintaining the same learning model on both the sender and the receiver side. In particular, we study and prototype least mean squares (LMS) as the dual prediction algorithm. Our implementations are based on MQTT, and the benefits are shown and evaluated via experiments using real IoT data.
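A minimal sketch of the dual-prediction scheme described above (pure NumPy; the filter length, step size, tolerance and the synthetic sensor signal are illustrative): sender and receiver keep identical LMS predictors, the sender transmits a sample only when its own prediction is off by more than the tolerance, and both sides update with the same value so the two models never diverge.

```python
import numpy as np

class LMSPredictor:
    """Least-mean-squares predictor; identical copies run on sender and receiver."""
    def __init__(self, taps=4, mu=0.05):
        self.w = np.zeros(taps)   # adaptive filter weights
        self.h = np.zeros(taps)   # sliding window of the most recent samples
        self.mu = mu

    def predict(self):
        return float(self.w @ self.h)

    def update(self, value):
        # Both sides must call this with the same value to stay synchronized.
        err = value - self.predict()
        self.w += self.mu * err * self.h
        self.h = np.roll(self.h, 1)
        self.h[0] = value

def simulate(samples, eps=0.1):
    sender, receiver = LMSPredictor(), LMSPredictor()
    sent = 0
    for x in samples:
        if abs(sender.predict() - x) > eps:   # prediction too far off: transmit the real value
            value, sent = x, sent + 1
        else:                                 # receiver reuses its own (identical) prediction
            value = receiver.predict()
        sender.update(value)
        receiver.update(value)
    return sent

readings = np.sin(np.linspace(0, 20, 500)) + 0.01 * np.random.randn(500)
print(f"transmitted {simulate(readings)} of {readings.size} samples")
```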
67

Um benchmark para avaliação de técnicas de busca no contexto de análise de mutantes SQL / A benchmark for the evaluation of search techniques in the context of SQL mutation analysis

Queiroz, Leonardo Teixeira 02 August 2013 (has links)
One of the concerns when testing database applications (ADB) is to keep operational and computational costs low. In the context of ADB, one way to meet this goal is to ensure that the test databases (TDB) are small but effective in revealing defects in SQL statements. Such databases can be built from scratch or obtained by reducing production databases (PDB). In the case of reduction, combinatorial aspects are involved that require a specific technique. In this context, and in response to a gap identified in the literature, this work builds and provides a benchmark that enables the performance evaluation, using SQL Mutation Analysis, of any search technique intended to perform database reductions. To exercise the search techniques, the benchmark was built with two scenarios, each composed of a PDB and a set of SQL statements. In addition, as a reference for search techniques, it also contains performance data for randomly reduced databases. As a secondary objective, from the experiments conducted while building the benchmark, the results were analyzed to answer important questions about which factors are involved in the complexity of SQL statements in the context of Mutation Analysis. A key finding in this regard concerned the restrictiveness of SQL commands, which is the factor that most influences the complexity of the statements.
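A minimal sketch of how a (reduced) test database can be scored with SQL Mutation Analysis (sqlite3 is used for illustration; the schema, the statement and the hand-written mutants are assumptions, not the benchmark's actual scenarios): a mutant is "killed" when it returns a different result set than the original statement on the test database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount REAL, status TEXT);
    INSERT INTO orders VALUES (1, 50.0, 'open'), (2, 200.0, 'closed'), (3, 200.0, 'open');
""")

original = "SELECT id FROM orders WHERE amount > 100 AND status = 'open'"
mutants = [
    "SELECT id FROM orders WHERE amount >= 100 AND status = 'open'",   # relational-operator mutant
    "SELECT id FROM orders WHERE amount > 100 OR  status = 'open'",    # logical-operator mutant
    "SELECT id FROM orders WHERE amount > 100 AND status = 'closed'",  # constant mutant
]

def rows(sql):
    return sorted(conn.execute(sql).fetchall())

killed = sum(rows(m) != rows(original) for m in mutants)
print(f"mutation score: {killed}/{len(mutants)}")  # higher score = test database reveals more faults
```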
68

Mécanismes de traitement des données dans les réseaux de capteurs sans fils dans les cas d'accès intermittent à la station de base / Data Management in Wireless Sensor Networks with Intermittent Sink Access

Dini, Cosmin 21 December 2010 (has links)
Wireless sensor networks have evolved as an alternative to wired networks, fit for quick deployments in areas with limited access. New protocols have been devised to deal with the inherent scarcity of resources that characterizes such networks, and energy-efficient network protocols are used for communication between nodes. Data collected by wireless nodes is transmitted at an energy cost and must therefore be carefully managed. The remote deployment of wireless networks also opens the possibility of malicious attacks on the data and on the infrastructure itself; security measures have been devised, but they too come at an energy cost. One issue that has received little attention is the situation where the data sink becomes unreachable. The nodes still collect data as instructed and accumulate it; under prolonged unavailability of the sink node, the storage space on the sensor nodes is used up and collecting new data is no longer feasible. Our proposal for prioritized data reduction alleviates this problem. The collected data is divided into data units, which are assigned an importance level derived from the business case. We propose data reduction primitive operations that reduce the space needed while losing only a limited amount of data resolution. A multi-node deployment also opens the possibility of data load sharing between the nodes as well as redundancy. 
Algorithms were proposed to evaluate the potential gain of these approaches in relation to the amount of energy spent on data transfer. The proposed approach copes well with fixed-size data storage by trimming low-interest data in a manner that keeps the data usable.
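A minimal sketch of the prioritized-reduction idea (the unit structure, importance levels and the single "thin out" primitive are illustrative stand-ins for the primitives proposed in the thesis): when storage runs out, the least important data units are degraded first, so high-importance data keeps its full resolution longest.

```python
def thin_out(samples, factor=2):
    """Reduction primitive: keep every `factor`-th sample, losing resolution but not coverage."""
    return samples[::factor]

def reclaim_space(units, capacity):
    """Degrade the least important data units until the total size fits within `capacity`."""
    units.sort(key=lambda u: u["importance"])          # least important first
    for unit in units:
        while sum(len(u["samples"]) for u in units) > capacity and len(unit["samples"]) > 1:
            unit["samples"] = thin_out(unit["samples"])
    return units

units = [
    {"name": "housekeeping", "importance": 1, "samples": list(range(400))},
    {"name": "vibration",    "importance": 3, "samples": list(range(400))},
    {"name": "alarm events", "importance": 9, "samples": list(range(100))},
]
for u in reclaim_space(units, capacity=500):
    print(u["name"], len(u["samples"]))
```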
69

Large Data Clustering And Classification Schemes For Data Mining

Babu, T Ravindra 12 1900 (has links)
Data mining deals with extracting valid, novel, easily understood, potentially useful and general abstractions from large data. Data is large when the number of patterns, the number of features per pattern, or both are large; largeness is characterized by a size beyond the capacity of a computer's main memory. Data mining is an interdisciplinary field involving database systems, statistics, machine learning, visualization and computational aspects, and the focus of data mining algorithms is scalability and efficiency. Clustering and classification of large data are important activities in data mining. Clustering algorithms are predominantly iterative, requiring multiple scans of the dataset, which is very expensive when the data is stored on disk. In the current work we propose different schemes that have both theoretical validity and practical utility in dealing with such large data. The schemes broadly encompass data compaction, classification, prototype selection, use of domain knowledge and hybrid intelligent systems. The proposed approaches can be broadly classified as: (a) compressing the data in a non-lossy manner, then clustering and classifying the patterns in their compressed form directly through a novel algorithm; (b) compressing the data in a lossy fashion such that a very high degree of compression and abstraction is obtained in terms of 'distinct subsequences', then classifying the data in this compressed form to improve prediction accuracy; (c) with the help of incremental clustering, a lossy compression scheme and a rough set approach, obtaining simultaneous prototype and feature selection; (d) demonstrating that prototype selection and data-dependent techniques can reduce the number of comparisons in a multiclass classification scenario using SVMs; and (e) making use of domain knowledge of the problem and the data under consideration to show that very high classification accuracy is obtained with fewer iterations of AdaBoost. The schemes have pragmatic utility. The prototype selection algorithm is incremental, requires a single dataset scan and has linear time and space requirements. We provide results obtained with a large, high-dimensional handwritten (hw) digit dataset. The compression algorithm is based on simple concepts; we demonstrate that classification of the compressed data improves the computation time required by a factor of 5, with the prediction accuracy on both the compressed and the original data being exactly the same, 92.47%. With the proposed lossy compression scheme and pruning methods, we demonstrate that even with a reduction of distinct subsequences by a factor of 6 (690 to 106), the prediction accuracy improves. Specifically, with the original data containing 690 distinct subsequences the classification accuracy is 92.47%, and with an appropriate choice of pruning parameters the number of distinct subsequences reduces to 106 with a corresponding classification accuracy of 92.92%. The best classification accuracy, 93.3%, is obtained with 452 distinct subsequences. With the scheme of simultaneous feature and prototype selection, we improved classification accuracy beyond that obtained with kNNC, viz. 93.58%, while significantly reducing the number of features and prototypes, achieving a compaction of 45.1%. 
In the hybrid schemes based on SVM, prototypes and a domain-knowledge-based tree (KB-Tree), we demonstrated a reduction in SVM training time of 50% and in testing time of about 30% compared to the complete data, with the classification accuracy improving to 94.75%. With AdaBoost the classification accuracy is 94.48%, which is better than those obtained with NNC and kNNC on the entire data; the training time is reduced because prototypes are used instead of the complete data. Another important aspect of the work is the design of a KB-Tree (with a maximum depth of 4) that classifies 10-category data in just 4 comparisons. In addition to the hw data, we applied the schemes to network intrusion detection data (the 10% dataset of KDDCUP99) and demonstrated that the proposed schemes incur less overall cost than the reported values.
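A minimal sketch of single-scan prototype selection followed by nearest-neighbour classification (the leader-style distance threshold and the toy two-class data are assumptions; the thesis's schemes add compression, feature selection and domain knowledge on top of this basic idea):

```python
import numpy as np

def select_prototypes(X, y, radius=1.0):
    """Single scan: a pattern becomes a new prototype unless an existing
    prototype of the same class is already within `radius` (leader-style)."""
    protos_x, protos_y = [], []
    for xi, yi in zip(X, y):
        same = [p for p, c in zip(protos_x, protos_y) if c == yi]
        if not same or min(np.linalg.norm(xi - p) for p in same) > radius:
            protos_x.append(xi)
            protos_y.append(yi)
    return np.array(protos_x), np.array(protos_y)

def nn_classify(x, protos_x, protos_y):
    return protos_y[np.argmin(np.linalg.norm(protos_x - x, axis=1))]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
px, py = select_prototypes(X, y, radius=1.5)
acc = np.mean([nn_classify(xi, px, py) == yi for xi, yi in zip(X, y)])
print(f"{len(px)} prototypes instead of {len(X)} patterns, training-set accuracy {acc:.2%}")
```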
70

Réduction à la volée du volume des traces d'exécution pour l'analyse d'applications multimédia de systèmes embarqués / Online execution trace reduction for multimedia software analysis of embedded systems

Emteu Tchagou, Serge Vladimir 15 December 2015 (has links)
The consumer electronics market is dominated by embedded systems due to their ever-increasing processing power and the large number of functionalities they offer. To provide such features, the architectures of embedded systems have increased in complexity: they rely on several heterogeneous processing units and allow concurrent task execution. This complexity degrades the programmability of embedded system architectures and makes application execution difficult to understand on such systems. The most common approach for analyzing application execution on embedded systems consists in capturing execution traces (sequences of events, such as system call invocations or context switches, generated during application execution). This approach is used in application testing, debugging and profiling. In some use cases, however, the generated execution traces can be very large, up to several hundred gigabytes; examples are endurance tests, which trace the execution of an application on an embedded system over long periods, from several hours to several days. Current tools and methods for analyzing execution traces are not designed to handle such amounts of data. We propose an approach for monitoring an application execution by analyzing traces on the fly in order to reduce the volume of recorded trace. Our approach is based on features of multimedia applications, which contribute the most to the success of popular devices such as set-top boxes or smartphones. It consists in automatically identifying the suspicious periods of an application execution in order to record only the parts of the trace that correspond to these periods. The proposed approach consists of two steps: a learning step, which discovers the regular behaviours of an application from its execution trace, and an anomaly detection step, which identifies behaviours deviating from the regular ones. Extensive experiments, performed on synthetic and real-life datasets, show that our approach reduces the trace size by an order of magnitude while maintaining good performance in detecting suspicious behaviours.
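A minimal sketch of the learn-then-filter idea (event bigram frequencies as the behaviour model, with a fixed window and threshold; the thesis works on real trace events such as system calls and context switches, so everything below is illustrative): learn which event bigrams are common during a reference run, then record only the windows whose events deviate from that model.

```python
from collections import Counter

def learn_model(training_events, n=2):
    """Learn how often each n-gram of events occurs during regular behaviour."""
    grams = Counter(tuple(training_events[i:i + n]) for i in range(len(training_events) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def suspicious_windows(events, model, window=8, threshold=0.25, n=2):
    """Keep only windows containing too many n-grams the model has never (or rarely) seen."""
    kept = []
    for start in range(0, len(events) - window + 1, window):
        w = events[start:start + window]
        grams = [tuple(w[i:i + n]) for i in range(len(w) - n + 1)]
        unknown = sum(model.get(g, 0.0) < 1e-4 for g in grams) / len(grams)
        if unknown > threshold:
            kept.append((start, w))          # only these windows are written to the trace file
    return kept

regular = ["read", "decode", "render", "sleep"] * 50
model = learn_model(regular)
live = regular[:100] + ["read", "decode", "fault", "retry", "fault", "retry", "render", "sleep"] + regular[:60]
for start, w in suspicious_windows(live, model):
    print(f"recording window at offset {start}: {w}")
```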
