Global ETD Search

11	Clustering Techniques for Mining and Analysis of Evolving Data Devagiri, Vishnu Manasa January 2021 (has links) The amount of data generated is on the rise due to increased demand in fields like IoT and smart monitoring applications. Data generated by such systems have many distinct characteristics, such as continuous generation, evolution over time, a multi-source nature, and heterogeneity. In addition, the real-world data generated in these fields is largely unlabelled. Clustering is an unsupervised learning technique used to group, analyze and interpret unlabelled data. Conventional clustering algorithms are not suitable for data with these characteristics because of memory and computational constraints, their inability to handle concept drift, and the distributed location of the data. Therefore, novel clustering approaches capable of analyzing and interpreting evolving and/or multi-source streaming data are needed. The thesis focuses on building evolutionary clustering algorithms for data that evolves over time. We initially proposed an evolutionary clustering approach, entitled Split-Merge Clustering (Paper I), capable of continuously updating the generated clustering solution in the presence of new data. As the work progressed, new challenges were studied and addressed. Namely, the Split-Merge Clustering algorithm was enhanced in Paper II with new capabilities to deal with the challenges of multi-view data applications. Multi-view or multi-source data presents the studied phenomenon/system from different perspectives (views), and can reveal interesting knowledge that is not visible when only one view is considered and analyzed. This motivated us to continue in this direction by designing two other novel multi-view data stream clustering algorithms. The algorithm proposed in Paper III improves the performance and interpretability of the algorithm proposed in Paper II. 
Paper IV introduces a minimum spanning tree based multi-view clustering algorithm capable of transferring knowledge between consecutive data chunks, and it is also enriched with a post-clustering pattern-labeling procedure. The proposed and studied evolutionary clustering algorithms are evaluated on various data sets. The obtained results have demonstrated the robustness of the algorithms for modeling, analyzing, and mining evolving data streams. They are able to adequately adapt single and multi-view clustering models by continuously integrating newly arriving data. Clustering analysis Concept drift Evolutionary clustering Machine learning Streaming data Computer Sciences Datavetenskap (datalogi)
12	Real-time road traffic events detection and geo-parsing Kumar, Saurabh 08 August 2018 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / In the 21st century, there is an increasing number of vehicles on the road as well as a limited road infrastructure. These aspects culminate in daily challenges for the average commuter due to congestion and slow-moving traffic. In the United States alone, congestion costs the average driver $1200 every year in fuel and time. Some positive steps, including (a) introducing a push notification system and (b) deploying more law enforcement troops, have been taken for better traffic management. However, these methods have limitations and require extensive planning. Another method to deal with traffic problems is to track the congested areas in a city using social media. Law enforcement resources can then be re-routed to these areas on a real-time basis. Given the ever-increasing number of smartphone devices, social media can be used as a source of information to track traffic-related incidents. Social media sites allow users to share their opinions and information. Platforms like Twitter, Facebook, and Instagram are very popular among users. These platforms enable users to share whatever they want in the form of text and images. Facebook users generate millions of posts in a minute. On these platforms, abundant data, including news, trends, events, opinions, and product reviews, are generated on a daily basis. Worldwide, organizations are using social media for marketing purposes. This data can also be used to analyze traffic-related events such as congestion, construction work, and slow-moving traffic. Thus the motivation behind this research is to use social media posts to extract information relevant to traffic, with effective and proactive traffic administration as the primary focus. 
I propose an intuitive two-step process that uses Twitter users' posts to retrieve traffic-related information on a real-time basis. It uses a text classifier to keep only the posts that contain traffic information. This is followed by a Part-Of-Speech (POS) tagger to find the geolocation information. A prototype of the proposed system is implemented using a distributed microservices architecture. Machine Learning Deep Learning Data Mining Distributed Computing
13	Hierarchical Anomaly Detection for Time Series Data Sperl, Ryan E. 07 June 2020 (has links) No description available. Computer Science Information Science time series data anomaly detection moving-average SARIMA streaming data
14	Entropy-driven Clustering of Streaming Data Nagesh Rao, Disha 23 August 2022 (has links) No description available. Computer Science Entropy Streaming Data Clustering merging clusters covariance matrix Gaussian
15	Visualizing Error in Real-Time Video Streaming Data for a Monitoring System Aditya Wardana, I Wayan Kurniawan January 2019 (has links) The aim of this master thesis is to investigate the affordances and limitations of using information visualization methods to visualize errors in real-time video streaming data. The study was carried out at the company Red Bee Media in several steps, including user research, prototyping, and user evaluation. The user research produced design requirements and basic tasks for the prototype. The prototype had to follow the design requirements and use information visualization techniques to visualize the error data. Next, the prototype was evaluated by 5 expert users, all Red Bee Media employees with 1.5 to 3 years of experience of working with the existing Red Bee Media system. The results show that the prototype obtained a higher SUS score than the Red Bee Media monitoring system. Based on a comparison questionnaire, the prototype also had a better visualization for each basic task than the Red Bee Media monitoring system. The comments from the user evaluation have been categorized under 4 different labels. Those labels list several usability aspects that need to be focused on when developing a video monitoring system. / The aim of this master's thesis is to investigate the possibilities and limitations of using information visualization methods to visualize errors in real-time video streaming data. The study was carried out at the company Red Bee Media in several steps, including user studies, prototype development, and user evaluation. The user study yielded design requirements and basic tasks for the prototype. The prototype had to follow the design requirements and use information visualization techniques to visualize the error data. The prototype was then evaluated by 5 expert users, employees of Red Bee Media with 1.5 to 3 years of experience of working with the existing Red Bee Media system. 
The results show that the prototype obtained a higher SUS score than Red Bee Media's current monitoring system. In a comparison questionnaire, the prototype also provided a better visualization for each basic task than Red Bee Media's monitoring system. Comments from the user evaluation were sorted into 4 different categories. These indicate several areas that must be focused on when a monitoring system is developed. Information visualization data visualization video streaming data monitoring system. Computer and Information Sciences Data- och informationsvetenskap
16	Efficient and parallel evaluation of XQuery Li, Xiaogang 22 February 2006 (has links) No description available. Computer Science XQuery XML Streaming Data Data Intensive Computing Restructuring Compiler
17	Predicting Indoor Carbon Dioxide Concentration using Online Machine Learning : Adaptive ventilation control for exhibition halls Carlsson, Filip, Egerhag, Edvin January 2022 (has links) A problem that exhibition halls have is the balance between having good indoor air quality and minimizing energy waste due to the naturally slow decrease of CO2 concentration, which causes Heating, Ventilation and Air-Conditioning systems to keep ventilating empty halls when occupants have left the vicinity. Several studies have been made on the topic of CO2 prediction and occupancy prediction based on CO2 for smaller spaces such as offices and schools. However, few studies have been made for bigger venues where a larger group of people gather. An online machine learning model using the River library was developed to tackle this problem by predicting the CO2 ahead of time. Five datasets were used for training and predicting, three with real data and two with simulated data. The results from this model were compared with three already developed traditional models in order to evaluate the performance of an online machine learning model compared to traditional models. The online machine learning model was successful in predicting CO2 one hour ahead of time considerably faster than the traditional models, achieving an R2 score of up to 0.95. CO2 Prediction CO2 Concentration Exhibition Halls Online Machine Learning Machine Learning Streaming Data Computer Systems Datorsystem
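The online-learning idea in this abstract can be illustrated with a minimal test-then-train loop. This is only a sketch under stated assumptions, not the thesis's model: it uses a hand-written one-feature linear regressor updated by stochastic gradient descent rather than the River library, and the synthetic decaying CO2 series, the `OnlineLinearModel` class, the scaling constant, and the learning rate are all hypothetical choices made for illustration.

```python
def co2_series(steps, start=1200.0, baseline=400.0, decay=0.95):
    """Synthetic CO2 readings (ppm) decaying toward a baseline in an emptied hall."""
    value = start
    for _ in range(steps):
        yield value
        value = baseline + decay * (value - baseline)


class OnlineLinearModel:
    """One-feature linear regressor updated one sample at a time (plain SGD)."""

    def __init__(self, learning_rate=0.5):
        self.bias = 0.0
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict_one(self, x):
        return self.bias + self.weight * x

    def learn_one(self, x, y):
        # Gradient step on the squared error of a single observation.
        error = y - self.predict_one(x)
        self.bias += self.learning_rate * error
        self.weight += self.learning_rate * error * x


SCALE = 1000.0  # hypothetical scaling so that inputs stay close to 1
readings = list(co2_series(400))
model = OnlineLinearModel()
abs_errors = []

for current, following in zip(readings, readings[1:]):
    x, y = current / SCALE, following / SCALE
    prediction = model.predict_one(x)          # test on the unseen next reading
    abs_errors.append(abs(prediction - y) * SCALE)
    model.learn_one(x, y)                      # then train on the revealed value

early = sum(abs_errors[:50]) / 50   # mean absolute error (ppm), first 50 steps
late = sum(abs_errors[-50:]) / 50   # mean absolute error (ppm), last 50 steps
print(f"early MAE: {early:.2f} ppm, late MAE: {late:.6f} ppm")
```

Each reading is used for prediction before it is used for training, so the model never needs the full dataset in memory, which is the property that distinguishes online models from the traditional batch models mentioned in the abstract.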
18	Erbium : Reconciling languages, runtimes, compilation and optimizations for streaming applications / Erbium : réconcilier les langages, les supports d'exécution, la compilation, et les optimisations pour calculs sur des flux de données Miranda, Cupertino 11 February 2013 (has links) Struck by the diminishing returns of sequential performance and by thermal limitations, the microprocessor industry has turned resolutely toward chip multiprocessors. This shift has brought old and difficult problems back into the spotlight of software development. Compilers are one of the key pieces of the puzzle for continuing to translate Moore's law into actual performance gains, gains that are out of reach without exploiting thread-level parallelism. Yet research on parallel systems has concentrated on language and architecture aspects, and the potential remains enormous in terms of compiling parallel programs and of optimizing and adapting parallel programs to exploit the hardware efficiently. This thesis takes up these challenges by presenting Erbium, a low-level language based on data-stream processing and implementing multi-producer multi-consumer communications; a very efficient parallel runtime for x86 architectures, with variants for other types of architectures; a scheme for integrating the language into a compiler, illustrated as an intermediate representation in GCC; and a study of the language primitives and their dependencies, allowing compilers to optimize Erbium programs through transformations specific to parallel programs as well as through generalized forms of classical optimizations, such as partial redundancy elimination and dead code elimination. 
/ As transistor size and power limitations struck the computer industry, hardware parallelism arose as the solution, bringing old, forgotten problems back into the equation in order to overcome the limitations of current parallel technologies. Compilers regain focus as the most relevant puzzle piece in the quest for the performance improvements predicted by Moore's law, which are no longer possible without parallelism. Parallel research has mainly focused on either language or architectural aspects, without giving the needed attention to compiler problems; this explains the weak compiler support for many parallel languages and architectures, which prevents performance from being exploited fully. This thesis addresses these problems by presenting: Erbium, a low-level streaming data-flow language supporting multiple producer and consumer task communication; a very efficient runtime implementation for x86 architectures that also addresses other types of architectures; a compiler integration of the language as an intermediate representation in GCC; and a study of the dependencies of the language primitives, allowing compilers to further optimise Erbium code not only through specific parallel optimisations but also through traditional compiler optimisations, such as partial redundancy elimination and dead code elimination. Calcul sur des flux de données Représentation intermédiaire Compilation Optimisations Au moment de l'exécution Streaming data-flow Intermediate representation Compilation Optimisations Runtime
19	Approximate Clustering Algorithms for High Dimensional Streaming and Distributed Data Carraher, Lee A. 22 May 2018 (has links) No description available. Computer Engineering data clustering distributed data mining streaming data algorithms locality sensitive hashing count-min cut tree random projection
20	Approximation of OLAP queries on data warehouses Cao, Phuong Thao 20 June 2013 (has links) (PDF) We study approximate answers to OLAP queries on data warehouses. We consider the relative answers to OLAP queries on a schema as distributions compared with the L1 distance, and we approximate the answers without storing the entire data warehouse. We first introduce three specific methods: uniform sampling, measure-based sampling, and the statistical model. We also introduce an edit distance between data warehouses, with edit operations adapted for data warehouses. Then, in OLAP data exchange, we study how to sample each source and combine the samples to approximate any OLAP query. We next consider a streaming context, where a data warehouse is built from streams of different sources. We show a lower bound on the size of the memory necessary to approximate queries. In this case, we approximate OLAP queries with a finite memory. We also describe a method to discover statistical dependencies, a new notion we introduce. We search for them using decision trees. We apply the method to two data warehouses. The first one simulates the data of sensors, which provide weather parameters over time and location from different sources. The second one is a collection of RSS feeds from web sites on the Internet. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre OLAP Approximate query answering OLAP data exchange Streaming data Edit distance Sampling algorithm Statistical dependencies Statistical model
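The uniform-sampling idea in this abstract can be sketched as follows: compute the relative answer to a group-by query (each group's share of the total measure) on the full data and on a uniform sample, then compare the two distributions with the L1 distance. This is a minimal illustration under assumptions, not the thesis's algorithm or its error bounds: the toy warehouse, the `city` and `sales` columns, the sample size, and all function names are hypothetical.

```python
import random
from collections import Counter


def relative_answer(rows, dimension, measure):
    """Share of the total measure contributed by each value of `dimension`."""
    totals = Counter()
    for row in rows:
        totals[row[dimension]] += row[measure]
    grand_total = sum(totals.values())
    return {key: value / grand_total for key, value in totals.items()}


def l1_distance(p, q):
    """L1 distance between two distributions given as dictionaries."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def uniform_sample(rows, size, seed=0):
    """Draw `size` rows uniformly at random, without replacement."""
    return random.Random(seed).sample(rows, size)


# Hypothetical toy warehouse: one fact table with a dimension and a measure.
rng = random.Random(42)
cities = ["Paris", "Lyon", "Nice"]
warehouse = [
    {"city": rng.choice(cities), "sales": rng.randint(1, 10)}
    for _ in range(20000)
]

exact = relative_answer(warehouse, "city", "sales")
approx = relative_answer(uniform_sample(warehouse, 2000), "city", "sales")
error = l1_distance(exact, approx)
print(f"L1 distance between exact and sampled answers: {error:.4f}")
```

Only the 2000 sampled rows are needed to answer the query approximately, which is the point of approximating without storing the entire warehouse; how large the sample must be for a given error is what the thesis analyzes and is not derived here.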
