101

Localização de danos em estruturas isotrópicas com a utilização de aprendizado de máquina / Damage localization in isotropic structures using machine learning

Oliveira, Daniela Cabral de January 2017 (has links)
Orientador: Vicente Lopes Júnior / Abstract: This work introduces a new Structural Health Monitoring (SHM) methodology that uses unsupervised machine learning algorithms for damage detection and localization. The approach was tested on an isotropic material (an aluminum plate). The experimental data were provided by Rosa (2016); the available database is comprehensive and includes measurements under several conditions. Piezoelectric transducers, acting simultaneously as sensors and actuators, were bonded to an aluminum plate of dimensions 500 x 500 x 2 mm. For data handling, the signals were analyzed by defining the first packet of each signal, considering only a time interval equal to the duration of the excitation force; in this case there is no interference from signals reflected at the edges of the structure. The signals are acquired first in the undamaged (baseline) condition and then under several damage conditions. To evaluate how much the damage affects each path, the following metrics were implemented: maximum peak, root-mean-square deviation (RMSD), correlation between signals, and the H2 and H∞ norms between the baseline and damaged signals. After computing the metrics for the various damage scenarios, the unsupervised K-Means algorithm was implemented in MATLAB and also tested in the Weka toolbox. K-Means requires the number of clusters to be set in advance, and this can... (Full abstract: follow the electronic access link below) / Master's
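A minimal Python sketch of the metric-plus-clustering pipeline this abstract describes. Everything here is illustrative: synthetic signals stand in for the experimental data, scikit-learn's KMeans stands in for the MATLAB/Weka implementations, and the discrete "H2/H∞-like" norms are simplifications of the thesis's metrics, not its exact definitions.

```python
import numpy as np
from sklearn.cluster import KMeans

def damage_metrics(baseline, damaged):
    """Damage indices for one actuator-sensor path (baseline vs. damaged signal)."""
    peak = np.abs(damaged).max() / np.abs(baseline).max()      # maximum-peak ratio
    rmsd = np.sqrt(np.sum((damaged - baseline) ** 2) / np.sum(baseline ** 2))
    corr = np.corrcoef(baseline, damaged)[0, 1]                # signal correlation
    h2 = np.linalg.norm(damaged - baseline)                    # discrete H2-like norm
    hinf = np.abs(damaged - baseline).max()                    # discrete H-infinity-like norm
    return [peak, rmsd, corr, h2, hinf]

# Synthetic "first packet" signals: 12 hypothetical paths, the first 4 crossing the damage.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1e-4, 200)                # window equal to the excitation duration
features = []
for k in range(12):
    base = np.sin(2 * np.pi * 5e4 * t)
    dmg = base * (1.0 - 0.3 * (k < 4)) + 0.01 * rng.standard_normal(t.size)
    features.append(damage_metrics(base, dmg))

# Unsupervised K-Means over the metric vectors: paths crossing the damage cluster together.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(features))
print(labels)
```

The number of clusters must still be chosen in advance, which is exactly the limitation the abstract flags.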
102

Automatizace generování stopslov / Automation of stopword generation

Krupník, Jiří January 2014 (has links)
This diploma thesis focuses on the automatic generation of stopwords as one method of pre-processing textual documents. It analyses the influence of stopword removal on the results of data mining tasks (classification and clustering). First, text mining techniques and frequently used algorithms are described. Methods for creating domain-specific stopword lists are then described in detail. Finally, the implemented methods and the results of tests on large collections of text files are presented and discussed.
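Document-frequency thresholding is one common way to generate a domain-specific stopword list automatically; the thesis's exact method is not specified in this abstract, so the sketch below is an assumed, minimal variant: any term appearing in at least 80% of documents is declared a stopword.

```python
from collections import Counter

def generate_stopwords(documents, df_threshold=0.8):
    """Mark as stopwords the terms whose document frequency reaches the threshold."""
    df = Counter()
    for doc in documents:
        df.update(set(doc.lower().split()))   # count each term once per document
    return {t for t, c in df.items() if c / len(documents) >= df_threshold}

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird flew over the house",
    "a fish swam in the pond",
]
print(generate_stopwords(docs))   # {'the'}
```

On a real corpus the threshold would be tuned by measuring its effect on downstream classification or clustering quality, as the thesis does.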
103

Typologie zákaznic společnosti Jdu běhat s.r.o / Typology of clients of company Jdu běhat s.r.o

Španiková, Lucia January 2018 (has links)
Title: Typology of clients of the company Jdu běhat s.r.o. Objectives: The main aim of this thesis was to design a typology of clients of the company Jdu běhat s.r.o. Methods: The data were collected by means of quantitative marketing research; the sample comprised 408 respondents. In the synthetic part of the thesis the typology was devised using the k-means method. Results: The results of the questionnaire survey are presented through graphs in the analytical section. The synthetic section of the thesis is devoted to describing the character of each of the three suggested segments. Keywords: running, motivation, women, typology, segmentation, k-means
104

Classificação One-Class para predição de adaptação de espécies em ambientes desconhecidos / One-class classification for predicting species adaptation to unknown environments

Salmazzo, Natália January 2016 (has links)
Orientadora: Profa. Dra. Debora Maria Rossi de Medeiros / Master's dissertation - Universidade Federal do ABC, Programa de Pós-Graduação em Ciência da Computação, 2016. / The increasing exploitation of the environment and biodiversity makes it necessary to preserve natural resources to avoid scarcity and reduce environmental impacts. Using geographical species distribution data combined with environmental and ecological characteristics, geographical species distribution models can be generated. These models can be applied to various problems related to the maintenance of biodiversity and species conservation, such as aiding the definition of public policies and scenarios for sustainable use of the environment, studying the potential for growth and proliferation of invasive species, and assessing the impacts of climate change on biodiversity. This work proposes a method for generating species distribution models by applying machine learning concepts adapted to one-class problems. The generated models enable the identification of areas with characteristics similar to the natural habitat of a species and therefore contribute to its preservation. To evaluate its effectiveness, the proposed method was applied to a real database and some benchmark databases, and compared with a one-class version of the Support Vector Machines (SVM) algorithm. SVM is one of the algorithms most applied to species distribution modelling and is available in some of the tools most used by researchers in the field, such as openModeller and BiodiversityR; it therefore provides a solid baseline for evaluation. The results showed that the proposed method is viable and competitive. In many cases, such as when the data are linearly separable, the new method obtained better results than SVM. Although its evaluation still needs to be extended to other situations, such as databases that include species absence data and databases with a larger number of examples, the results are promising and indicate that further research in this area could have a relevant impact on species distribution modelling.
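The one-class SVM baseline mentioned in this abstract can be sketched as follows for presence-only species data. The environmental features, their values, and the use of scikit-learn's `OneClassSVM` are all illustrative assumptions; this is not the thesis's implementation nor the exact openModeller/BiodiversityR configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
# Hypothetical environmental features (temperature in C, rainfall in mm)
# at sites where the species was observed -- presence-only data.
presence = rng.normal(loc=[25.0, 1200.0], scale=[2.0, 100.0], size=(200, 2))

# Fit on presences only; nu bounds the fraction of presences treated as outliers.
model = make_pipeline(StandardScaler(),
                      OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)).fit(presence)

suitable = np.array([[25.0, 1200.0]])   # near the species' observed optimum
unsuitable = np.array([[5.0, 200.0]])   # far outside the observed conditions
print(model.predict(suitable), model.predict(unsuitable))   # [1] [-1]
```

A prediction of +1 marks an environment as similar to the species' habitat, -1 as dissimilar; applied over a grid of locations this yields a distribution map.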
105

Alternativní způsob měření rozvoje zemí. / Alternative approach to measuring development progress of countries.

Efimenko, Valeria January 2018 (has links)
This thesis studies the relationship between GDP and the Social Progress Index (SPI), the components of the social progress model, and their dimensions. Using a dataset of 49 countries, Bayesian Model Averaging (BMA), and clustering analysis, we found that there is no straightforward relationship between GDP and SPI. By testing 15 different models for each of the 3 dimensions of SPI (Basic Human Needs, Foundations of Wellbeing, and Opportunity), we found that the best variant is to include all components of each dimension. Using the BMA approach, we found that the best model of SPI out of 12 components includes only the intercept and the tolerance and inclusion variables. The remaining components show quite low inclusion probabilities; however, none of them showed zero posterior probability. JEL Classification: A13, C11, E01, I30. Keywords: Kuznets, progress, SPI, GDP, BMA. Author's e-mail: valeria.e.efimenko@gmail.com. Supervisor's e-mail: daniel.vach@gmail.com
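The posterior inclusion probabilities the abstract refers to can be illustrated with a deliberately simplified BMA: enumerate candidate OLS models, weight each by its BIC, and sum the weights of the models containing each predictor. Real BMA tooling (e.g. the R BMS package often used in such theses) uses proper priors and MCMC over model space; the data here are synthetic and the three "components" hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 49                                    # one observation per country, as in the thesis
X = rng.standard_normal((n, 3))           # hypothetical standardized SPI components
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(n)   # only component 0 truly matters

def bic_ols(cols):
    """BIC of an OLS fit (with intercept) on the selected predictor columns."""
    A = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

# Enumerate all 8 predictor subsets and weight each model by exp(-BIC/2).
models = [s for r in range(4) for s in itertools.combinations(range(3), r)]
bics = np.array([bic_ols(list(m)) for m in models])
w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()

# Posterior inclusion probability (PIP) of each component.
pip = [w[[j in m for m in models]].sum() for j in range(3)]
print(np.round(pip, 3))
```

The relevant component ends up with a PIP near 1 while the irrelevant ones get low but nonzero PIPs, mirroring the abstract's finding that no component showed exactly zero posterior probability.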
106

Agrupamento em análise estatística de formas / Clustering in statistical shape analysis

ARAÚJO, Luiz Henrique Gama Dore de 27 February 2008 (has links)
In this work, the k-means algorithm proposed by Hartigan and Wong is adapted to observations of a random element in a general metric space. Simulation results show that the performance of the algorithm, when the metric space is the shape space of planar configurations, is independent of the choice among the usual shape metrics, namely the full Procrustes, partial Procrustes, and Procrustes distances. Moreover, this modified version of the algorithm, applied to the shape space with any of the three metrics, exhibits the same performance as the original algorithm applied to the partial Procrustes tangent coordinates. The study was motivated by the problem of identifying the half-beak fish species Hemiramphus balao and Hemiramphus brasiliensis. The parameters currently used to identify these species are subject to certain operational difficulties, which often result in erroneous classification of specimens. The algorithm was used to cluster a sample of shape configurations of these fish, and two groups with statistically distinct shapes were identified. The groups exhibit a pronounced difference in the position of the head relative to the body: in group 1 the head is slightly inclined upwards, while in group 2 it is slightly inclined downwards. Observing these characteristics in photos of specimens in which the two species had been correctly identified established that group 1 corresponds to Hemiramphus balao and group 2 to Hemiramphus brasiliensis.
Therefore, the position of the head relative to the body (information based entirely on the specimen's shape) is a rather robust parameter for identifying the species.
107

A Machine Learning Approach for Studying Linked Residential Burglaries

Márquez, Ángela Marqués January 2014 (has links)
Context. Multiple studies demonstrate that most residential burglaries are committed by a few offenders. Statistics collected by the Swedish National Council for Crime Prevention show that the number of residential burglaries varies from year to year but normally increases. Moreover, around half of all reported burglaries occur in big cities, and only some occur in sparsely populated areas. Law enforcement agencies therefore need to study possibly linked residential burglaries in their investigations. Linking crime reports is a difficult task, and currently there is no systematic way to do it. Objectives. This study presents an analysis of the different features of residential burglaries recorded by law enforcement in Sweden. The objective is to study the possibility of linking crimes based on these features: residential features, modus operandi, victim features, goods stolen, and the difference in days and the distance between crimes. Methods. To reach the objectives, a quasi-experiment and repeated measures are used. The distances between crimes are obtained from routes computed with Google Maps. Different clustering methods are investigated to obtain the best cluster solution for linking residential burglaries, and different algorithms are compared to identify which offers the best performance in linking crimes. Results. Clustering quality is measured using different methods: the rule of thumb, the elbow method, and the silhouette. To evaluate these measurements, ANOVA, Tukey's test, and Fisher's test are used. The silhouette presents the greatest quality level compared to the other methods; the other clustering algorithms present similar average silhouette widths and therefore similar clustering quality. The results also show that distance, days, and residential features are the most important features for linking crimes. Conclusions.
The clustering suggestion indicates that it is possible to reduce the number of burglary cases to investigate by finding linked residential burglaries. Once the clustering is done, the results still have to be examined by law enforcement.
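Of the quality measures this abstract compares, the silhouette-based choice of the number of clusters can be sketched as follows. The crime-feature vectors here are synthetic stand-ins (three planted groups), not the Swedish dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic "crime feature" vectors with three planted, well-separated groups.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [0, 10]],
                  cluster_std=1.0, random_state=7)

# Score each candidate number of clusters by the average silhouette width.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=7).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)   # the planted number of groups, 3, scores highest
```

Each resulting cluster would then correspond to a set of burglary reports worth examining together for links, with the final judgment left to investigators, as the conclusion notes.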
108

A Novel 3-D Segmentation Algorithm for Anatomic Liver and Tumor Volume Calculations for Liver Cancer Treatment Planning

Goryawala, Mohammed 23 March 2012 (has links)
Three-Dimensional (3-D) imaging is vital in computer-assisted surgical planning, including minimally invasive surgery, targeted drug delivery, and tumor resection. Selective Internal Radiation Therapy (SIRT) is a liver-directed radiation therapy for the treatment of liver cancer. Accurate calculation of anatomical liver and tumor volumes is essential for determining the tumor-to-normal-liver ratio and for calculating the dose of Y-90 microspheres that will concentrate the radiation in the tumor region rather than in nearby healthy tissue. Present manual techniques for segmentation of the liver from Computed Tomography (CT) tend to be tedious and greatly dependent on the skill of the technician/doctor performing the task. This dissertation presents the development and implementation of a fully integrated algorithm for 3-D liver and tumor segmentation from tri-phase CT that yields highly accurate estimates of the respective volumes of the liver and tumor(s). The algorithm as designed requires minimal human intervention without compromising the accuracy of the segmentation results. Embedded within this algorithm is an effective method for extracting the blood vessels that feed the tumor(s) in order to plan the appropriate treatment effectively. Liver segmentation achieved an accuracy in excess of 95% in estimating liver volumes in 20 datasets in comparison to the manual gold-standard volumes. In a similar comparison, tumor segmentation exhibited an accuracy of 86% in estimating tumor volume(s). Qualitative results demonstrated the effectiveness of the blood vessel segmentation algorithm in extracting and rendering the vasculature of the liver. The parallel computing process, using a single workstation, showed a 78% gain. Statistical analysis carried out to determine whether the manual initialization has any impact on accuracy showed the results to be independent of user initialization. 
The dissertation thus provides a complete 3-D solution for liver cancer treatment planning, with the ability to extract, visualize, and quantify the statistics needed for treatment. Since SIRT requires highly accurate calculation of the liver and tumor volumes, this new method provides an effective and computationally efficient process for meeting such challenging clinical requirements.
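The volume accuracies quoted in this abstract come down to a simple computation once a segmentation mask exists: count the voxels and multiply by the voxel volume. The sketch below uses toy masks and assumed voxel dimensions; it is the accuracy bookkeeping, not the segmentation algorithm itself.

```python
import numpy as np

def volume_ml(mask, voxel_dims_mm=(0.8, 0.8, 2.5)):
    """Volume of a binary segmentation mask in millilitres (1 mL = 1000 mm^3)."""
    return mask.sum() * float(np.prod(voxel_dims_mm)) / 1000.0

# Toy masks: an "automatic" segmentation against a manual gold standard.
auto = np.zeros((50, 50, 20), dtype=bool)
gold = np.zeros_like(auto)
auto[10:40, 10:40, 5:15] = True   # 30 * 30 * 10 = 9000 voxels
gold[10:40, 10:40, 5:16] = True   # 30 * 30 * 11 = 9900 voxels

# Volume-estimation accuracy relative to the gold standard, as in the dissertation's comparison.
accuracy = 100.0 * (1.0 - abs(volume_ml(auto) - volume_ml(gold)) / volume_ml(gold))
print(round(accuracy, 1))   # 90.9
```

The same liver and tumor volumes then feed the tumor-to-normal-liver ratio used in SIRT dose planning.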
109

Color Range Determination and Alpha Matting for Color Images

Luo, Zhenyi January 2011 (has links)
This thesis proposes a new chroma keying method that can automatically detect background, foreground, and unknown regions. For background color detection, we use K-means clustering in color space to compute a limited number of background color clusters, and we use spatial information to clean the background regions and minimize the unknown regions. Our method needs only minimal input from the user. For the unknown regions, we compute the alpha matte based on Wang's robust matting algorithm, which is considered one of the best algorithms in the literature, if not the best; Wang's algorithm is based on a modified random walk. We propose a better color selection method, which improves the matting results in our experiments. The thesis also provides a detailed implementation of robust matting. The experimental results demonstrate that the proposed method can handle images with a single background color, images with a gridded background, and images with difficult regions such as complex hair strands and semi-transparent clothing.
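The background-detection step can be sketched as K-means over pixel colors, taking the dominant cluster as background. The toy image, cluster count, and majority rule are assumptions for illustration; the thesis additionally cleans the result with spatial information before matting the unknown regions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy RGB image: a chroma-key green background with a gray "foreground" block.
img = np.zeros((60, 80, 3), dtype=np.float64)
img[...] = [30, 200, 40]                 # green-screen background
img[20:40, 30:50] = [128, 120, 125]      # foreground patch

# Cluster all pixels in color space.
pixels = img.reshape(-1, 3)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# Assume the dominant cluster (most pixels) is the background.
bg = np.argmax(np.bincount(km.labels_))
bg_mask = (km.labels_ == bg).reshape(60, 80)
print(bg_mask[0, 0], bg_mask[30, 40])    # True (background), False (foreground)
```

Pixels near the boundary of `bg_mask` would form the unknown region handed to the robust matting step.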
110

Apports bioinformatiques et statistiques à l'identification d'inhibiteurs du récepteur MET / Bioinformatics and statistical contributions to the identification of inhibitors for the MET receptor

Apostol, Costin 21 December 2010 (has links)
The effect of polysaccharides on the HGF-MET interaction was studied using an experimental design with several protein microarrays under different experimental conditions. The purpose of the analysis is the selection of the best polysaccharide inhibitors of the HGF-MET interaction; from a statistical point of view this is a classification problem. Statistical and computational processing of the obtained microarrays required the implementation of the PASE platform with statistical analysis plug-ins for this type of data. The main statistical feature of these data is repetition: the experiment was repeated on 5 microarrays, and each polysaccharide is replicated 3 times within each microarray. We are therefore no longer in the classical case of globally independent data, but have independence only at the inter-subject and intra-subject levels. We propose mixed models for data normalization and represent subjects by their empirical cumulative distribution functions. The use of the Kolmogorov-Smirnov statistic appears natural in this context, and we study its behavior in k-means-type and hierarchical classification algorithms. The choice of the number of clusters, as well as of the number of repetitions needed for a robust classification, is treated in detail. The effectiveness of this methodology is measured on simulations and applied to the HGF-MET data. The results helped the biologists and chemists of the Institute of Biology of Lille choose the best polysaccharides in the assays they conducted, and some of the results also confirmed these researchers' intuition. The R scripts implementing this methodology are integrated into the PASE platform. Applying functional data analysis to this type of data is part of the immediate future work.
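Using the Kolmogorov-Smirnov statistic as a distance between subjects' empirical distributions, and then clustering on that distance matrix, can be sketched as follows. The data are synthetic (two planted distributional groups) and SciPy stands in for the thesis's R scripts on the PASE platform.

```python
import numpy as np
from scipy.stats import ks_2samp
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
# Replicated measurements per "subject": two hypothetical distributional groups.
subjects = [rng.normal(0, 1, 100) for _ in range(4)] + \
           [rng.normal(3, 1, 100) for _ in range(4)]

# Pairwise KS statistic = sup distance between empirical CDFs.
n = len(subjects)
d = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d[i, j] = d[j, i] = ks_2samp(subjects[i], subjects[j]).statistic

# Hierarchical clustering (average linkage) on the KS distance matrix, cut at 2 clusters.
labels = fcluster(linkage(squareform(d), method="average"), t=2, criterion="maxclust")
print(labels)
```

The cut level (here 2 clusters) plays the role of the number-of-classes choice the abstract says is treated in detail.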
