Global ETD Search

1	Semantic Integration across Heterogeneous Databases : Finding Data Correspondences using Agglomerative Hierarchical Clustering and Artificial Neural Networks / Semantisk integrering mellan heterogena databaser : Hitta datakopplingar med hjälp av hierarkisk klustring och artiﬁciella neuronnät Hobro, Mark January 2018 (has links) The process of data integration is an important part of the database field when it comes to database migrations and the merging of data. The research in the area has grown with the addition of machine learning approaches in the last 20 years. Due to the complexity of the research field, no go-to solutions have appeared. Instead, a wide variety of ways of enhancing database migrations have emerged. This thesis examines how well a learning-based solution performs for the semantic integration problem in database migrations. Two algorithms are implemented. One that is based on information retrieval theory, with the goal of yielding a matching result that can be used as a benchmark for measuring the performance of the machine learning algorithm. The machine learning approach is based on grouping data with agglomerative hierarchical clustering and then training a neural network to recognize patterns in the data. This allows making predictions about potential data correspondences across two databases. The results show that agglomerative hierarchical clustering performs well in the task of grouping the data into classes. The classes can in turn be used for training a neural network. The matching algorithm gives a high recall of matching tables, but improvements are needed to both receive a high recall and precision. The conclusion is that the proposed learning-based approach, using agglomerative hierarchical clustering and a neural network, works as a solid base to semi-automate the data integration problem seen in this thesis. But the solution needs to be enhanced with scenario specific algorithms and rules, to reach desired performance. / Dataintegrering är en viktig del inom området databaser när det kommer till databasmigreringar och sammanslagning av data. Forskning inom området har ökat i takt med att maskininlärning blivit ett attraktivt tillvägagångssätt under de senaste 20 åren. På grund av komplexiteten av forskningsområdet, har inga optimala lösningar hittats. Istället har flera olika tekniker framställts, som tillsammans kan förbättra databasmigreringar. Denna avhandling undersöker hur bra en lösning baserad på maskininlärning presterar för dataintegreringsproblemet vid databasmigreringar. Två algoritmer har implementerats. En är baserad på informationssökningsteori, som främst används för att ha en prestandamässig utgångspunkt för algoritmen som är baserad på maskininlärning. Den algoritmen består av ett första steg, där data grupperas med hjälp av hierarkisk klustring. Sedan tränas ett artificiellt neuronnät att hitta mönster i dessa grupperingar, för att kunna göra förutsägelser huruvida olika datainstanser har ett samband mellan två databaser. Resultatet visar att agglomerativ hierarkisk klustring presterar väl i uppgiften att klassificera den data som använts. Resultatet av matchningsalgoritmen visar på att en stor mängd av de matchande tabellerna kan hittas. Men förbättringar behöver göras för att både ge hög en hög återkallelse av matchningar och hög precision för de matchningar som hittas. Slutsatsen är att ett inlärningsbaserat tillvägagångssätt, i detta fall att använda agglomerativ hierarkisk klustring och sedan träna ett artificiellt neuronnät, fungerar bra som en basis för att till viss del automatisera ett dataintegreringsproblem likt det som presenterats i denna avhandling. För att få bättre resultat, krävs att lösningen förbättras med mer situationsspecifika algoritmer och regler. Semantic integration data integration artificial neural networks agglomerative hierarchical clustering heterogeneous databases relational data Computer Sciences Datavetenskap (datalogi)
2	Clustering the Web : Comparing Clustering Methods in Swedish / Webbklustring : En jämförelse av klustringsmetoder på svenska Hinz, Joel January 2013 (has links) Clustering -- automatically sorting -- web search results has been the focus of much attention but is by no means a solved problem, and there is little previous work in Swedish. This thesis studies the performance of three clustering algorithms -- k-means, agglomerative hierarchical clustering, and bisecting k-means -- on a total of 32 corpora, as well as whether clustering web search previews, called snippets, instead of full texts can achieve reasonably decent results. Four internal evaluation metrics are used to assess the data. Results indicate that k-means performs worse than the other two algorithms, and that snippets may be good enough to use in an actual product, although there is ample opportunity for further research on both issues; however, results are inconclusive regarding bisecting k-means vis-à-vis agglomerative hierarchical clustering. Stop word and stemmer usage results are not significant, and appear to not affect the clustering by any considerable magnitude. clustering web search results snippets k-means agglomerative hierarchical clustering bisecting k-means swedish Human Computer Interaction
3	Stream Clustering And Visualization Of Geotagged Text Data For Crisis Management Crossman, Nathaniel C. 08 June 2020 (has links) No description available. Computer Science social media services microblogging services stream text clustering micro-clustering macro-clustering k-means agglomerative hierarchical clustering visualization crisis management
4	[pt] MINERANDO O PROCESSO DE UM COQUEAMENTO RETARDADO ATRAVÉS DE AGRUPAMENTO DE ESTADOS / [en] MINING THE PROCESS OF A DELAYED COKER USING CLUSTERED STATES RAFAEL AUGUSTO GASETA FRANCA 25 November 2021 (has links) [pt] Procedimentos e processos são essenciais para garantir a qualidade de qualquer operação. Porém, o processo realizado na prática nem sempre está de acordo com o processo idealizado. Além disso, uma análise mais refinada de gargalos e inconsistências só é possível a partir do registro de eventos do processo (log). Mineração de processos (process mining) é uma área que reúne um conjunto de métodos para reconstruir, monitorar e aprimorar um processo a partir de seu registro de eventos. Mas, ao aplicar as soluções já existentes no log de uma unidade de coqueamento retardado, os resultados foram insatisfatórios. O núcleo do problema está na forma como o log está estruturado, carecendo de uma identificação de casos, essencial para a mineração do processo. Para contornar esse problema, aplicamos agrupamento hierárquico aglomerativo no log, separando as válvulas em grupos que exercem uma função na operação. Desenvolvemos uma ferramenta (PLANTSTATE) para avaliar a qualidade desses grupos no contexto da planta e ajustar conforme a necessidade do domínio. Identificando os momentos de ativação desses grupos no log chegamos a uma estrutura de sequência e paralelismo entre os grupos. Finalmente, propomos um modelo capaz de representar as relações entre os grupos, resultando em um processo que representa a operações em uma unidade de coqueamento retardado. / [en] Procedures and processes are essential to guarantee the quality of any operation. However, processes carried out in the real world are not always in accordance with the imagined process. Furthermore, a more refined analysis of obstacles and inconsistencies is only possible from the process events record (log). Process mining is an area that brings together a set of methods to rebuild, monitor and improve processes from their log. Nevertheless, when applying existing solutions to the log of a delayed coker unit, the results were unsatisfactory. The core of the problem is how the log is structured, lacking a case identification, essential for process mining. To deal with this issue, we apply agglomerative hierarchical clustering in the log, separating the valves into groups that perform a task in an operation. We developed a tool (PLANTSTATE) to assess the quality of these groups in the context of the plant and to adjust in accord to the needs of the domain. By identifying the moments of activation of these groups in the log we arrive at a structure of sequence and parallelism between the groups. Finally, we propose a model capable of representing the relationships between groups, resulting in a process that represents the operations in a delayed coker unit. [pt] APRENDIZADO DE MAQUINA [pt] COQUEAMENTO RETARDADO [pt] MINERACAO DE PROCESSOS [pt] CIENCIA DE DADOS [en] MACHINE LEARNING [en] DELAYED COKE [en] PROCESS MINING [en] DATA SCIENCE

1

Page generated in 0.1552 seconds