  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Automatic Parallelization of Simulation Code from Equation Based Simulation Languages

Aronsson, Peter January 2002 (has links)
Modern state-of-the-art equation-based, object-oriented modeling languages such as Modelica make it easy to model large and complex physical systems. When such complex models are to be simulated, simulation tools typically perform a number of optimizations on the underlying set of equations, with the goal of improving simulation performance by decreasing the size and complexity of the equation system. The tools then generate efficient code to obtain fast execution of the simulations. However, as the complexity of modeled systems grows, so does the number of equations and variables. Therefore, parallel computing can be exploited to simulate these large, complex systems efficiently.

This thesis presents the work of building an automatic parallelization tool that produces an efficient parallel version of the simulation code by building a data dependency graph (task graph) from the simulation code and applying efficient scheduling and clustering algorithms to the task graph. Various scheduling and clustering algorithms, adapted to the requirements of this type of simulation code, have been implemented and evaluated. The scheduling and clustering algorithms presented and evaluated can also be used for functional dataflow languages in general, since they operate on a task graph with dataflow edges between nodes.

Results are given in the form of speedup measurements and task graph statistics produced by the tool. The conclusion drawn is that some of the algorithms investigated and adapted in this work give reasonable measured speedups for some specific Modelica models; for example, a model of a thermofluid pipe gave a speedup of about 2.5 on 8 processors in a PC cluster. However, future work lies in finding a good algorithm that works well in general. / Report code: LiU-Tek-Lic-2002:06.
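The core of scheduling a task graph onto processors, as described in this abstract, is greedy list scheduling over a topological order. The sketch below is purely illustrative (not the thesis's tool): it ignores inter-processor communication costs, and all task names, dependencies and costs are invented.

```python
from collections import deque

def list_schedule(tasks, deps, cost, n_proc):
    """Greedy list scheduling of a task graph (DAG) onto n_proc processors.

    tasks: iterable of task ids; deps: dict task -> set of predecessors;
    cost: dict task -> execution time. Communication costs are ignored.
    Returns the makespan (finish time of the last task)."""
    indeg = {t: len(deps.get(t, ())) for t in tasks}
    succ = {t: [] for t in tasks}
    for t, ps in deps.items():
        for p in ps:
            succ[p].append(t)
    finish = {}                   # task -> finish time
    proc_free = [0.0] * n_proc    # next free time per processor
    ready = deque(t for t in tasks if indeg[t] == 0)
    while ready:
        t = ready.popleft()
        # earliest start: all predecessors done
        est = max((finish[p] for p in deps.get(t, ())), default=0.0)
        # pick the processor that lets the task start soonest
        i = min(range(n_proc), key=lambda k: max(proc_free[k], est))
        start = max(proc_free[i], est)
        finish[t] = start + cost[t]
        proc_free[i] = finish[t]
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return max(finish.values())
```

On a diamond-shaped graph of four unit-cost tasks, two processors finish in 3 time units against 4 on one processor, the kind of speedup-versus-processor-count measurement the thesis reports at much larger scale.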
22

Recognition of Anomalous Motion Patterns in Urban Surveillance

Andersson, Maria, Gustafsson, Fredrik, St-Laurent, Louis, Prevost, Donald January 2013 (has links)
We investigate unsupervised K-means clustering and the semi-supervised hidden Markov model (HMM) for automatically detecting anomalous motion patterns in groups of people (crowds). Anomalous motion patterns are typically people merging into a dense group, followed by disturbances or threatening situations within the group. The application of K-means clustering and the HMM is illustrated with datasets from four surveillance scenarios. The results indicate that by investigating the group of people systematically with different K values, and analyzing cluster density, cluster quality and changes in cluster shape, we can automatically detect anomalous motion patterns. The results correspond well with the events in the datasets. They also indicate that very accurate detections of the people in the dense group are not necessary: the clustering and HMM results remain much the same even with increased uncertainty in the detections. / Funding agencies: Vinnova (Swedish Governmental Agency for Innovation Systems), VINNMER program.
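The K-means step the abstract relies on is standard Lloyd's iteration over 2-D positions. The following is a minimal didactic sketch, not the authors' implementation; the naive first-k initialisation and the toy coordinates are assumptions for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm on 2-D points, naive first-k initialisation."""
    centers = list(points[:k])
    for _ in range(iters):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # update step: move each center to its cluster's mean
        for i, c in enumerate(clusters):
            if c:  # keep the old center if a cluster emptied out
                centers[i] = (sum(x for x, _ in c) / len(c),
                              sum(y for _, y in c) / len(c))
    return centers, clusters
```

Running this with several K values and inspecting each cluster's density and shape over time is, in essence, the systematic investigation the abstract describes.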
23

A Fuzzy Software Prototype For Spatial Phenomena: Case Study Precipitation Distribution

Yanar, Tahsin Alp 01 October 2010 (has links) (PDF)
As the complexity of a spatial phenomenon increases, traditional modeling becomes impractical. Alternatively, data-driven modeling, based on the analysis of data characterizing the phenomenon, can be used. This thesis addresses the generation of understandable and reliable spatial models from observational data, proposing an interpretability-oriented, data-driven fuzzy modeling approach. The methodology is based on constructing fuzzy models from data, tuning them, and then simplifying them. Mamdani-type fuzzy models with triangular membership functions are considered. Fuzzy models are constructed using fuzzy clustering algorithms, and the simulated annealing metaheuristic is adapted for the tuning step. To obtain compact and interpretable fuzzy models, a simplification methodology is proposed that reduces the number of fuzzy sets for each variable and simplifies the rule base. Prototype software was developed, and mean annual precipitation data of Turkey was examined as a case study to assess the approach in terms of both precision and interpretability. For the first step of the approach, in which fuzzy models are constructed from data, the "Fuzzy Clustering and Data Analysis Toolbox", developed for use with MATLAB, is used. For the remaining steps, the optimization of the obtained fuzzy models with the adapted simulated annealing algorithm and the generation of compact, interpretable fuzzy models with the simplification algorithm, the developed prototype software is used. If accuracy is the primary objective, the proposed approach can produce more accurate solutions on training data than the geographically weighted regression method: the minimum training error of the proposed approach is 74.82 mm, versus 106.78 mm for geographically weighted regression; the minimum error on test data is 202.93 mm.
An understandable fuzzy model for annual precipitation is generated with only 12 membership functions and 8 fuzzy rules. Furthermore, more interpretable fuzzy models are obtained when Gath-Geva fuzzy clustering algorithms are used during model construction.
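The building blocks of the thesis's models, triangular membership functions and Mamdani-style rule firing, can be sketched in a few lines. This is a simplified illustration (defuzzification by weighted average of rule output peaks, a common shortcut, rather than full centroid defuzzification); all membership parameters and rule outputs are invented.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def mamdani(x, rules):
    """Fire each rule (antecedent = one triangular set over x) and
    defuzzify by the weighted average of the rule output peaks."""
    num = den = 0.0
    for mf_params, peak in rules:
        w = tri(x, *mf_params)   # rule firing strength
        num += w * peak
        den += w
    return num / den if den else None
```

A rule base with 8 such rules over 12 membership functions, the size the abstract reports, remains small enough for a human to read, which is the interpretability goal of the work.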
24

Διαχωριστική ανάλυση, ταξινόμηση και ομαδοποίηση δεδομένων με εφαρμογές στο SPSS / Discriminant analysis, classification and clustering of data with applications in SPSS

Λούκινα, Βίκυ 12 April 2013 (has links)
The first part of this thesis studies the multivariate statistical techniques of Discriminant Analysis and Classification, whose goals are, respectively, to separate distinct groups of objects and to assign new objects to a predefined set of groups by means of a rule. The construction and evaluation of classification rules relies on the normality of the data, while the formation of Fisher's linear functions for separating the data assumes equal covariance matrices. An example applying both techniques in the SPSS statistical package follows. The second part examines the exploratory technique of data clustering, which aims to organize the objects into clusters so that similarity between observations within each group is maximized and dissimilarity between clusters is maximized; in contrast to Discriminant Analysis and Classification, the groups are initially unknown. The most popular way to measure similarity is distance, though applying clustering algorithms is more efficient for grouping the data. Finally, the clustering algorithms are divided into two categories and compared with respect to their effectiveness using the SPSS statistical package.
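Under the equal-covariance assumption mentioned in the abstract, the classification rule reduces, in the simplest spherical case, to assigning each new object to the class with the nearest centroid. The sketch below illustrates that reduced rule only (it is not Fisher's full linear discriminant, and the training data are invented).

```python
import math

def centroid_classify(train, x):
    """Nearest-centroid classification: the linear rule that discriminant
    analysis reduces to under equal, spherical covariance matrices.

    train: dict label -> list of points; x: point to classify."""
    cents = {
        label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for label, pts in train.items()
    }
    return min(cents, key=lambda label: math.dist(cents[label], x))
```

SPSS's discriminant procedure additionally estimates the pooled covariance matrix and prior probabilities; this sketch keeps only the geometric core.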
25

Systems biological approach to Parkinson's disease

Heil, Katharina Friedlinde January 2018 (has links)
Parkinson’s Disease (PD) is the second most common neurodegenerative disease in the Western world. It shows a high degree of genetic and phenotypic complexity, with many implicated factors and various disease manifestations but few clear causal links. Ongoing research has identified a growing number of molecular alterations linked to the disease. Dopaminergic neurons in the substantia nigra, specifically their synapses, are the key affected region in PD. Therefore, this work focuses on understanding the disease's effects on the synapse, aiming to identify potential genetic triggers and synaptic PD-associated mechanisms. Currently, one of the main challenges in this area is data quality and accessibility. In order to study PD, publicly available data were systematically retrieved and analysed, and 418 PD-associated genes could be identified based on mutations and curated annotations. I curated an up-to-date and complete synaptic proteome map containing a total of 6,706 proteins. Region-specific datasets describing the presynapse, postsynapse and synaptosome were also delimited. These datasets were analysed, investigating similarities and differences, including reproducibility and functional interpretations. Protein-Protein Interaction Network (PPIN) analysis was chosen to gain deeper knowledge of the specific effects of PD on the synapse. I therefore generated a customised, filtered, human-specific Protein-Protein Interaction (PPI) dataset, containing 211,824 direct interactions, from four public databases. Proteomics data and PPI information allowed the construction of PPINs. These were analysed, and a set of low-level statistics explaining the network's topology from a mathematical point of view, including modularity, clustering coefficient and node degree, was obtained. Apart from low-level network statistics, the high-level topology of the PPINs was studied. To identify functional network subgroups, different clustering algorithms were investigated.
In the context of biological networks, the underlying hypothesis is that proteins in a structural community are more likely to share common functions. I therefore attempted to identify PD-enriched communities of synaptic proteins. Once identified, they were compared with each other. Three community clusters were found to contain largely overlapping gene sets, including 24 PD-associated genes. Apart from the known disease-associated genes in these communities, a total of 322 genes was identified. Each of the three clusters is enriched for specific biological processes and cellular components, including neurotransmitter secretion, positive regulation of synapse assembly, pre- and post-synaptic membrane, scaffolding proteins, neuromuscular junction development and complement activation (classical pathway), among others. The presented approach combined a curated set of PD-associated genes, filtered PPI information and synaptic proteomes. Various small- and large-scale analytical approaches, including PPIN topology analysis, clustering algorithms and enrichment studies, identified synaptic proteins and subregions highly affected by PD. Specific disease-associated functions confirmed known research insights and allowed me to propose a new list of previously unknown potential disease-associated genes. Owing to its open design, this approach can be used to answer similar research questions about other complex diseases.
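One of the low-level network statistics named in this abstract, the local clustering coefficient, is easy to state concretely: the fraction of a node's neighbour pairs that are themselves connected. A minimal sketch on a dict-of-sets adjacency structure (the graphs and node names are invented; the thesis's PPINs are far larger):

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Local clustering coefficient of node v in an undirected graph.

    adj: dict node -> set of neighbours."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # count edges among v's neighbours
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))
```

A node inside a tight protein community scores near 1, a hub whose neighbours never interact scores 0, which is why this statistic helps separate community structure from mere connectivity.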
26

Paralelização do algoritmo DIANA com OpenMP e MPI / Parallelization of the DIANA algorithm with OpenMP and MPI

Ribeiro, Hethini do Nascimento 31 August 2018 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Earlier in this decade there were about 5 billion phones in use generating data, and this global production increased by approximately 40% per year at the beginning of the last decade. These large datasets that can be captured, communicated, aggregated, stored and analyzed, also called Big Data, are posing inevitable challenges in many areas, in particular in the field of Machine Learning. Machine Learning algorithms are able to extract useful information from these large data repositories, which makes their study increasingly important. The programs that perform this task can be called classification and clustering algorithms, and they are computationally expensive. To cite some examples of this cost, the Quality Threshold Clustering algorithm has, in the worst case, complexity O(n^5), while the hierarchical algorithms AGNES and DIANA have O(n^2) and O(2^n), respectively. There is thus a great challenge in processing large amounts of data in a realistic period of time, encouraging the development of parallel algorithms that scale with the volume of data. The objective of this work is to present the parallelization of the divisive hierarchical algorithm DIANA. The algorithm was implemented with MPI and OpenMP, running up to three times faster than the single-processor version, showing that although distributed-memory environments require synchronization and message exchange, this type of optimization is advantageous for this algorithm at a certain degree of parallelism.
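The sequential kernel being parallelized here, one DIANA divisive split, works by seeding a "splinter" group with the most dissimilar point and then migrating points that sit closer, on average, to the splinter than to the remainder. A sequential Python sketch of that single split (the thesis's parallel version uses MPI/OpenMP; the point set below is invented):

```python
import math

def avg_dist(p, group):
    """Average distance from p to the other members of group."""
    others = [q for q in group if q != p]
    if not others:
        return 0.0
    return sum(math.dist(p, q) for q in others) / len(others)

def diana_split(points):
    """One DIANA-style divisive split of a point set into two clusters."""
    rest = list(points)
    # seed the splinter group with the most dissimilar point overall
    seed = max(rest, key=lambda p: avg_dist(p, rest))
    splinter = [seed]
    rest.remove(seed)
    moved = True
    while moved:
        moved = False
        for p in list(rest):
            # move p if it is closer on average to the splinter group
            if len(rest) > 1 and avg_dist(p, splinter) < avg_dist(p, rest):
                rest.remove(p)
                splinter.append(p)
                moved = True
    return splinter, rest
```

The O(n^2) distance evaluations inside `avg_dist` are exactly where data-parallel decomposition with OpenMP threads or MPI ranks pays off.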
27

Agrupamento híbrido de dados utilizando algoritmos genéticos / Hybrid clustering techniques with genetic algorithms

Murilo Coelho Naldi 16 October 2006 (has links)
Clustering techniques have obtained good results in several data analysis problems, such as the analysis of gene expression data. However, the same clustering technique applied to the same dataset can cluster the data in different ways, depending on the initial clusterings or the values chosen for its free parameters. Obtaining a good clustering can thus be seen as an optimization process: one that seeks good initial clusterings and the best set of values for the free parameters. Being global search methods, Genetic Algorithms can be used in this optimization process.
The goal of this research project is to investigate the use of clustering techniques together with Genetic Algorithms to improve the quality of the clusters found by clustering algorithms, mainly k-means. This investigation was carried out using gene expression data analysis, a Bioinformatics problem, as the application. This dissertation presents a bibliographic review of the topics covered in the project, a description of the methodology followed, its development, and an analysis of the results obtained.
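One way to combine a genetic algorithm with k-means, in the spirit of this dissertation though not its actual method, is to evolve the choice of initial centroids, scoring candidates by the within-cluster sum of squared errors (SSE). A toy sketch with invented operators and parameters:

```python
import math, random

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

def ga_seed_kmeans(points, k, pop=20, gens=30, seed=0):
    """Toy genetic search over k-subsets of the data used as initial
    centroids; fitness is the clustering SSE (lower is better)."""
    rng = random.Random(seed)
    popn = [rng.sample(points, k) for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda ind: sse(points, ind))
        survivors = popn[: pop // 2]           # elitist selection
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            child = rng.sample(list(set(a + b)), k)   # crossover: mix seeds
            if rng.random() < 0.2:                    # mutation: swap a seed
                child[rng.randrange(k)] = rng.choice(points)
            children.append(child)
        popn = survivors + children
    return min(popn, key=lambda ind: sse(points, ind))
```

The global search over initial conditions is what lets the hybrid escape the poor local optima that plain k-means can fall into from a bad start.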
28

Some contributions to the clustering of financial time series and applications to credit default swaps / Quelques contributions aux méthodes de partitionnement automatique des séries temporelles financières, et applications aux couvertures de défaillance

Marti, Gautier 10 November 2017 (has links)
In this thesis we first review the scattered literature on clustering financial time series. We then try to give as much colour as possible on the credit default swap market, a market little known to the general public except for its role in the contagion of bank failures during the global financial crisis of 2007-2008, while introducing the datasets used in the empirical studies.
Unlike most of the existing literature, which offers descriptive studies explained through economic narratives, our aim is to build models and large information systems on top of these homogeneous groups, with the clusters serving as basic building blocks: these foundations must be stable. That is why most of the work undertaken and described in this thesis intends to ground the use of such clusterings more firmly, by discussing their consistency and their stability under perturbations. New distances between financial time series, which better account for their stochastic nature and can be plugged into existing clustering methodologies, are proposed, and their impact on the clusters is studied empirically. Results of the empirical studies can be explored at www.datagrapple.com.
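A standard distance that gets plugged into clustering pipelines for financial return series, and a natural baseline for the alternative distances this thesis proposes, is the correlation distance d = sqrt(2(1 - rho)). A minimal sketch (illustrative only, not one of the thesis's new distances):

```python
import math

def corr_distance(x, y):
    """Correlation distance d = sqrt(2 * (1 - rho)) between two series,
    where rho is the Pearson correlation of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    rho = cov / (sx * sy)
    return math.sqrt(2.0 * (1.0 - rho))
```

Perfectly correlated series sit at distance 0 and anti-correlated ones at distance 2; a critique taken up in this line of work is that correlation alone ignores the marginal distributions of the series.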
30

1500 Students and Only a Single Cluster? A Multimethod Clustering Analysis of Assessment Data from a Large, Structured Engineering Course

Taylor Williams (13956285) 17 October 2022 (has links)
<p>Clustering, a prevalent class of machine learning (ML) algorithms used in data mining and pattern finding, has increasingly helped engineering education researchers and educators see and understand assessment patterns at scale. However, a challenge remains in making ML-enabled educational inferences that are useful and reliable for research or instruction, especially when those inferences influence pedagogical decisions or student outcomes. ML offers an opportunity to better personalize learners’ experiences using those inferences, even within large engineering classrooms; however, neglecting to verify the trustworthiness of ML-derived inferences can have wide-ranging negative impacts on the lives of learners. </p> <p><br></p> <p>This study investigated what student clusters exist within the standard operational data of a large first-year engineering course (>1500 students). This course focuses on computational thinking skills for engineering design. The clustering dataset included approximately 500,000 assessment data points using a consistent five-scale, criterion-based grading framework. Two clustering techniques, N-TARP profiling and K-means clustering, examined the criterion-based assessment data and identified candidate sets of student clusters. N-TARP profiling is an expansion of the N-TARP binary clustering method; N-TARP is well suited to this course’s assessment data because of the large and potentially high-dimensional nature of the dataset. K-means clustering is one of the oldest and most widely used clustering methods in educational research, making it a good candidate for comparison. After finding clusters, their interpretability and trustworthiness were determined. The following research questions provided the structure for this study: RQ1 – What student clusters do N-TARP profiling and K-means clustering identify when applied to structured assessment data from a large engineering course? 
RQ2 – What are the characteristics of an average student in each cluster, and how well does that average student represent the students of the cluster? and RQ3 – What are the strengths and limitations of using the N-TARP and K-means clustering techniques with large, highly structured engineering course assessment data?</p> <p><br></p> <p>Although both K-means clustering and N-TARP profiling identified potential student clusters, neither method’s clusters were verifiable or replicable. Such dubious results suggest that a better interpretation is that all student performance data from this course exist in a single homogeneous cluster. This study further demonstrated the utility and precision of N-TARP’s warning (via its W value) that the clustering results within this educational dataset were not trustworthy. Providing such a warning is rare among the thousands of available clustering methods; most (including K-means) will return clusters regardless. When a clustering algorithm identifies false clusters that lack meaningful separation or differences, incorrect or harmful educational inferences can result. </p>
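A simple way to quantify the replicability concern raised in this abstract is to measure agreement between two clusterings of the same students, for example from two random restarts: if agreement hovers near chance, the "clusters" are likely spurious. The Rand index below is one classic such measure (a generic sketch, not the study's W statistic):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index: fraction of point pairs on which two clusterings agree
    (both place the pair together, or both place it apart). Values near
    1 suggest replicable clusters; values near chance suggest spurious
    structure."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Note that the index is invariant to relabeling: two clusterings that split the students identically but swap the cluster names still score 1.0.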
