Global ETD Search

341	Tipologia dos sistemas de produção de leite no município de Alegrete, RS, com base nos índices produtivos / Typology of dairy production systems in the municipality of Alegrete, RS, based on productive indices Silva, Caroline Alvares January 2017 With the advance of science, a great deal of technical information and technology for dairy production has become available; not all of it, however, applies to every production system. Recommending the technologies and information best suited to a particular region or farm is one of the main challenges for those working in the dairy sector. In this context, studies that measure productive characteristics provide a systemic view of livestock production systems. Such a view helps guide professionals in the agrarian and social sciences and informs local production decisions as well as public and private policies for the agroindustrial system. This work aimed to typify the dairy production systems of the municipality of Alegrete, in the state of Rio Grande do Sul, by characterizing the productive profiles of the farms engaged in dairy activity.
The study was conducted on 43 farms distributed across 22 localities of the municipality. Daily milk volume was used as the criterion of representativeness, and the production systems were ranked by productivity. Data were collected during farm visits using a semi-structured questionnaire covering registration data, characteristics of the owner and the farm, dairy production and herd, nutritional management, milking management, reproductive management, sanitary control and, finally, milk marketing strategies. The data were tabulated and analyzed in IBM SPSS Statistics 20.0 using multivariate statistics: principal component analysis (PCA) and hierarchical cluster analysis were applied to divide the 43 production units into homogeneous groups. PCA summarized the studied variables in two principal components (1 and 2), which together explained 71.531% of the total variance. Hierarchical cluster analysis reduced the 43 farms to six groups (G1, G2, G3, G4, G5 and G6). The quadrants defined by the axes of principal components 1 and 2 allowed the groups of systems to be interpreted in terms of characteristics related to milk production. The productive aspects that characterize the dairy production systems in the municipality were related to herd structure, pasture area, daily production, culling criteria and milking management. This suggests that technical assistance and rural extension actions for dairy production systems in Alegrete should be targeted at the bottlenecks of each system. CNPQ::CIENCIAS AGRARIAS Dairy production Cluster analysis Animal science Ciência animal Atividade leiteira Análise de cluster
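The grouping step described above can be illustrated with a small agglomerative clustering sketch. The abstract does not state which linkage criterion SPSS was configured with, so Ward's criterion is an assumption here, and for brevity the sketch clusters the standardized variables directly rather than principal-component scores. The function names and the farm data in the usage note are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so variables in different units carry equal weight."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard constant columns
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def centroid(points):
    return [mean(col) for col in zip(*points)]

def ward_cost(a, b):
    """Increase in within-group sum of squares if groups a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_clusters(rows, k):
    """Repeatedly merge the cheapest pair of groups until k remain.
    Returns a list of groups, each a sorted list of row indices."""
    data = standardize(rows)
    groups = [[i] for i in range(len(data))]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                cost = ward_cost([data[p] for p in groups[i]],
                                 [data[p] for p in groups[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        groups[i] = sorted(groups[i] + groups[j])
        del groups[j]
    return groups
```

With six hypothetical farms described as [cows, pasture ha, litres/day], `ward_clusters([[10, 20, 80], [12, 22, 90], [40, 60, 600], [42, 65, 650], [25, 30, 250], [24, 33, 240]], 3)` returns `[[0, 1], [2, 3], [4, 5]]`. Standardizing first matters because litres/day would otherwise dominate the distances.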
342	Definições de caso e classificação da gravidade do dengue e suas implicações no aprimoramento da vigilância e de intervenções em Saúde Pública / Case Definitions and Classification of the Severity of dengue and its Implications in Improving Surveillance and Public Health Interventions Quijano, Fredi Alexander Diaz 28 September 2011 Objectives: To develop definitions of suspected and probable dengue cases and a classification of dengue severity, in order to improve their validity indicators, give greater consistency to surveillance data and support clinical decisions. Methodology: This is an observational, analytical study with prospective data collection, conducted in the metropolitan area of Bucaramanga (Colombia). It included patients recruited between 2003 and 2008 with acute febrile syndrome of unknown origin (AFS-UO), defined as fever of recent onset (less than one week) whose origin could not be determined clinically. The variables of interest were demographic, clinical (symptoms, signs and clinical course) and laboratory (leukocyte and platelet counts and hematocrit). The association between dengue (dependent variable) and the independent variables was estimated with crude odds ratios and with odds ratios adjusted by unconditional logistic regression. Cluster analysis was used within the subgroup of dengue patients to identify groups of patients with similar severity parameters. Results: 1,698 patients with AFS-UO were enrolled and followed, of whom 545, aged 4 to 85 years, were confirmed as dengue cases.
Cluster analysis of the dengue cases first yielded three groups, classified into three severity levels (mild, moderate and severe), which were related to the incidence of hospitalization (0.8%, 11.7% and 30.5%, respectively) and to other variables such as duration of illness and changes in some biomarkers. Comparing the dengue cases with AFS-UO of other etiologies then produced a multivariate model including leukocyte and platelet counts, the symptoms rash, cough and rhinorrhea, and the signs itching, conjunctival injection and tenderness on abdominal palpation. This model was translated into a diagnostic score with an area under the ROC curve of 83.3% for predicting dengue (95% CI: 81% to 85.5%). The score was used to propose definitions of suspected and probable dengue cases. Conclusion: The case definitions and severity classification proposed in this study are based on an analysis of clinical data from patients in an endemic area. They are therefore expected to contribute to better monitoring of dengue trends and to the identification of risk groups and risk factors to support public health interventions. Their application could also improve the prognosis of severe forms by contributing to the timely identification of complications. Análise Cluster Case Definition Cluster Analysis Definição de Caso Dengue Dengue Gravidade Severity
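The reported area under the ROC curve can be read as the probability that a randomly chosen dengue patient receives a higher diagnostic score than a randomly chosen patient with another febrile illness, with ties counting one half. A minimal sketch of that computation follows. The abstract does not give the points assigned to each sign or laboratory value, so the scores in the usage note are hypothetical and `roc_auc` is an illustrative helper, not the thesis's code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise definition.

    scores: diagnostic score per patient (higher = more likely dengue)
    labels: 1 for a confirmed case, 0 for another etiology
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case correctly ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([7, 5, 6, 2, 3, 5], [1, 1, 1, 0, 0, 0])` returns 8.5/9, about 0.944. The pairwise form makes clear that the AUC depends only on how the score ranks patients, not on its scale.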
343	Optimal Bayesian estimators for latent variable cluster models Rastelli, Riccardo, Friel, Nial 11 1900 In cluster analysis, interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be obtained effectively for a wide range of clustering models, including finite mixtures, infinite mixtures, hidden Markov models and block models for networks. However, because of the categorical nature of the clustering variables and the lack of scalable algorithms, summary tools that can interpret such samples are not available. We adopt a Bayesian decision-theoretic approach to define an optimality criterion for clusterings and propose a fast, context-independent greedy algorithm to find the best allocations. One important feature of our approach is that the optimal number of groups is selected automatically, thereby solving the clustering and model-choice problems at the same time. We consider several loss functions to compare partitions and show that our approach can accommodate a wide range of cases. Finally, we illustrate our approach on both artificial and real datasets for three different clustering models: Gaussian mixtures, stochastic block models and latent block models for networks.
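The decision-theoretic idea above can be sketched in a few lines: summarize the posterior draws as a matrix of co-clustering probabilities, then greedily move single items between groups (or into a new group) while the posterior expected loss decreases, so that the number of groups emerges from the search. This is a simplified illustration, not the authors' algorithm. It assumes Binder's loss with equal costs as one example of a loss for comparing partitions (the abstract does not name the losses used) and starts from a single group. All function names are hypothetical.

```python
from itertools import combinations

def similarity_matrix(samples):
    """psm[i][j] = share of posterior draws in which items i and j share a group."""
    n, m = len(samples[0]), len(samples)
    psm = [[0.0] * n for _ in range(n)]
    for z in samples:
        for i in range(n):
            for j in range(n):
                if z[i] == z[j]:
                    psm[i][j] += 1.0 / m
    return psm

def binder_loss(z, psm):
    """Posterior expected Binder loss of allocation z (equal costs)."""
    return sum(abs((z[i] == z[j]) - psm[i][j])
               for i, j in combinations(range(len(z)), 2))

def relabel(z):
    """Map labels to 0, 1, 2, ... in order of first appearance."""
    seen = {}
    return [seen.setdefault(g, len(seen)) for g in z]

def greedy_partition(psm, max_sweeps=100):
    """Move one item at a time to the existing or new group that most lowers
    the expected loss; stop when a full sweep changes nothing."""
    n = len(psm)
    z = [0] * n                                  # start with a single group
    best = binder_loss(z, psm)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            for g in set(z) | {max(z) + 1}:      # existing labels plus a fresh one
                if g == z[i]:
                    continue
                cand = z[:i] + [g] + z[i + 1:]
                loss = binder_loss(cand, psm)
                if loss < best - 1e-12:
                    z, best, changed = cand, loss, True
        if not changed:
            break
    return relabel(z), best
```

For three hypothetical draws `[[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]`, `greedy_partition(similarity_matrix(draws))` returns the allocation `[0, 0, 1, 1]` with an expected loss of about 1.0. Two groups are chosen without the number of groups being specified in advance.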
344	Detecting students who are conducting inquiry Without Thinking Fastidiously (WTF) in the Context of Microworld Learning Environments Wixon, Michael 09 April 2013 In recent years, there has been increased interest and research on identifying the various ways that students can deviate from expected or desired patterns while using educational software. This includes research on gaming the system, player transformation, haphazard inquiry, and failure to use key features of the learning system. Detecting these behaviors has helped researchers understand them better, allowing software designers to develop interventions that remediate them and/or reduce their negative impact on student learning. This work addresses two types of student disengagement: carelessness and a behavior we term WTF (“Without Thinking Fastidiously”) behavior. Carelessness is defined as not demonstrating a skill despite knowing it; we measured carelessness using a machine-learned model. In WTF behavior, the student is interacting with the software, but their actions appear to have no relationship to the intended learning task. We discuss the detector development process, validate the detectors against human labels of the behavior, and discuss implications for understanding how and why students conduct inquiry without thinking fastidiously while learning in science inquiry microworlds. We then explore the relationship between learner characteristics and these two disengaged behaviors, carelessness and WTF, with the goal of understanding which learner characteristics correlate with each. We examine three alternative methods for predicting carelessness and WTF behavior from learner characteristics: simple correlations, k-means clustering, and decision tree rule learners.
Automated Detectors Cluster Analysis Decision Tree Rule Learners Machine Learning Disengaged Behavior Learner Characteristics Science Inquiry
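The k-means option mentioned above can be sketched with Lloyd's algorithm: assign each student to the nearest center, recompute each center as the mean of its members, and repeat until assignments stop changing. The learner-characteristic features in the usage note are hypothetical, and seeding with the first k rows is a simplification chosen for reproducibility, not the thesis's setup.

```python
from math import dist
from statistics import mean

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; the first k points serve as initial centers (a simplification).
    Returns (labels, centers)."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # assignment step: nearest center by Euclidean distance
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break                      # converged: assignments unchanged
        labels = new_labels
        # update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                # keep the old center if a cluster empties
                centers[c] = [mean(col) for col in zip(*members)]
    return labels, centers
```

With hypothetical [pretest score, self-reported interest] rows `[[0.9, 4.5], [0.85, 4.0], [0.2, 1.5], [0.25, 2.0], [0.88, 4.2], [0.3, 1.0]]` and `k=2`, the labels come out as `[0, 0, 1, 1, 0, 1]`. The clusters would then be compared on carelessness or WTF rates.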
345	Avaliação crítica do revestimento de cromo duro em cilindros de laminação a frio. / Critical evaluation of hard chromium coating over cold mill work rolls. Antonio Fabiano de Oliveira 22 November 2018 This work analyzed the tribological behavior of hard chromium surface coatings applied to work rolls and their influence on the cold rolling of steel, copper and aluminum alloys. The study began by analyzing the wear mechanisms that occur in work rolls. Because of the dimensions of the work rolls, replicas were taken from several rolls at different companies, before and after the roll campaign. Sampling covered three roll finishes (textured, shot-blasted and ground), as well as rolls with and without chromium coating. The Shot Blast Texturing (SBT) structure does not meet all the requirements related to the finished product. A hard chromium layer applied to these SBT work rolls improves the surface quality of the sheet. This is of interest to steelmakers that have no alternative to the newer surface structures obtainable by Electro Discharge Texturing (EDT), Laser Texturing (LT) and Electron Beam Texturing (EBT). The function of the surfaces generated by these different methods influences the tribological properties during rolling and subsequent sheet-forming processes. On the other hand, these methods can increase the final cost of the product or require large investments. Cromagem Laminação Revestimentos Rugosidade 3D Texturização 3D finishing Cluster analysis Deposit chrome Rolling Tribology
346	Problemas de otimização na engenharia de produção e transportes / Optimization problems in production and transportation engineering Gerchman, Marcos January 2016 This study aims to solve complex problems in different areas of Production and Transportation Engineering using optimization techniques. The areas considered are health systems, transportation and sensory analysis, involving timetable scheduling and cluster analysis problems. Specifically, it aims to: (i) in the hospital sector, allocate surgical specialties to a hospital's timetable so as to minimize the variance of postoperative time; (ii) in sensory analysis, develop an index, based on cluster analysis concepts, that identifies panelists who require training; (iii) in the airport sector, identify airports with low demand predictability and relate them to their physical characteristics using cluster analysis. In all the problems addressed, solutions involving optimization methods proved adequate, with satisfactory results. Engenharia de produção Engenharia de transportes Programação matemática Optimization Cluster analysis Timetable scheduling
347	An investigation of cluster analysis techniques as a means of structuring specifications in the design of complex systems Holden, Timothy Aloysius January 1978 Thesis (Ocean E.)--Massachusetts Institute of Technology, Dept. of Ocean Engineering; and, (M.S.)--Massachusetts Institute of Technology Sloan School of Management, 1978. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Bibliography: leaves 153-156. / by Timothy A. Holden. / Ocean E. / M.S. Sloan School of Management Ocean Engineering Operating systems (Computers) Cluster analysis Engineering design Data processing
348	Interaction-Based Learning for High-Dimensional Data with Continuous Predictors Huang, Chien-Hsun January 2014 High-dimensional data, such as gene expression data from microarray experiments, may contain a substantial amount of useful information. However, this information, the relevant variables and their joint interactions are usually diluted by noise from a large number of non-informative variables. Variable selection therefore plays a pivotal role in learning for high-dimensional problems. Popular traditional feature selection methods, such as Pearson's correlation between response and predictors, stepwise linear regression and the LASSO, are linear methods. They are effective at identifying linear marginal effects but limited in detecting non-linear or higher-order interaction effects. Epistasis (gene-gene interaction) may play an important role in gene expression, where the functional forms involved are unknown and difficult to identify. In this thesis, we first propose a novel nonparametric measure for screening and feature selection based on information from nearest neighborhoods. The method is inspired by Lo and Zheng's earlier work (2002) on detecting interactions for discrete predictors. We apply a backward elimination algorithm based on this measure, which identifies many influential clusters of variables. These groups of variables can capture both marginal and interactive effects. Second, each identified cluster has the potential to perform predictions and classifications more accurately, and we study procedures for combining these individual classifiers into a final predictor. Simulations and real-data analyses show that the proposed measure can identify important variable sets and patterns, including higher-order interaction sets.
The proposed procedure outperforms existing methods on three different microarray datasets. Moreover, the nonparametric measure is quite flexible and can easily be extended and applied to other areas of high-dimensional data and studies. Epistasis (Genetics) Instrumental variables (Statistics) Nonparametric statistics Cluster analysis Machine learning--Statistical methods Statistics
349	Anatomia da madeira em Sapotaceae / Wood anatomy of the Sapotaceae Melfi, Adriana Donizetti Carvalho Costa 02 March 2007 This work presents an anatomical survey of the wood of 107 species in 11 genera of the family Sapotaceae (order Ericales) from the Americas: Manilkara Adanson, Sideroxylon Linnaeus, Micropholis (Grisebach) Pierre, Chromolucuma Ducke, Sarcaulus Radlkofer, Elaeoluma Baillon, Pouteria Aublet, Chrysophyllum Linnaeus, Ecclinusa Martius, Pradosia Liais and Diploon Cronquist. In the most recent attempt at classification, Pennington (1990, 1991) recognized five tribes based mainly on flower and seed characters. According to Pennington himself, four of these tribes would represent natural, probably monophyletic, groups. However, Swenson & Anderberg (2005), using molecular analysis combined with morphological characters, concluded that the two largest genera of the family, Chrysophyllum and Pouteria, are polyphyletic. The family Sapotaceae therefore needs revision, and numerous authors (Record, 1939; Kukachka, 1978a) have pointed out the need for more anatomical information on the xylem to complement and extend taxonomic and phylogenetic studies.
The aim of this work is thus to verify whether wood anatomy corroborates the classification proposed by Pennington in 1990, and to obtain information identifying anatomical characters of diagnostic and statistical value, contributing to future studies that group the Brazilian species with the African and Asian species of the family. The anatomical descriptions follow the terminology adopted by the committee of the International Association of Wood Anatomists (IAWA Committee 1989). The statistical analysis indicated the formation of eight groups sharing similarities in the type of axial parenchyma, vessel diameter, intervessel pit diameter, type and location of vessel-ray pits, and mineral inclusions such as prismatic crystals, styloids and crystal sand, as well as silica bodies. These characters group related genera and species with statistical significance. It is concluded that wood anatomy has diagnostic value for the different genera and, in many cases, does not corroborate the classification proposed by Pennington (1990, 1991) in the last taxonomic revision of the family. Anatomia da madeira Cluster analysis Espécies neotropicais Grupamento de espécies Neotropical species Sapotaceae Sapotaceae Wood anatomy
350	An investigation into fuzzy clustering quality and speed: fuzzy C-means with effective seeding Stetco, Adrian January 2017 Cluster analysis, the automatic procedure by which large data sets can be split into groups of similar objects (clusters), has innumerable applications across a wide range of problem domains. Improvements in clustering quality (as measured by internal validation indexes) and speed (number of iterations until the cost function converges), the main focus of this work, have many desirable consequences. For example, they can enable faster and more precise detection of illness onset from symptoms, or give investors rapid detection and visualization of patterns in financial time series. Partitional clustering, one of the most popular approaches to cluster analysis, falls into two main categories: hard (the clusters discovered are disjoint) and soft, also known as fuzzy (the clusters are non-disjoint, or overlapping). In this work we consider how the speed and solution quality of the soft partitional clustering algorithm Fuzzy C-means (FCM) can be improved through more careful initialization informed by the data content. The resulting FCM++ approach samples the starting cluster centers during the initialization phase so that they are dispersed through the data space. Because the cluster centers are well spread in the input space, the method achieves both faster convergence and higher-quality solutions. Moreover, the user can specify a parameter indicating how far apart the cluster centers should be picked in the data space at the very start of the clustering procedure. We show FCM++'s superior behaviour in both convergence time and quality compared with existing methods, on a wide range of artificially generated and real data sets.
We consider a case study in which we propose a methodology based on FCM++ for pattern discovery in synthetic and real-world time series data. We discuss a method that uses both Pearson correlation and Multi-Dimensional Scaling to reduce data dimensionality, remove noise and make the dataset easier to interpret and analyse. We show that FCM++ has a positive impact on quality (the Xie-Beni index was lower for FCM++ in nine out of ten cases) and speed (6.3 iterations on average compared with 22.6) when clustering these lower-dimensional, noise-reduced representations of the time series. This methodology provides a clearer picture of the cluster analysis results and helps detect similarly behaving time series, which could come from any domain. Further, we investigate the use of Spherical Fuzzy C-Means (SFCM) with the FCM++ seeding mechanism on news text data retrieved from a popular British newspaper. The methodology allows us to visualize and group hundreds of news articles by the topics they discuss. SFCM++ yields a faster process (12.2 iterations on average compared with the 16.8 needed by standard SFCM) and a higher-quality solution (the Xie-Beni index was lower for SFCM++ in seven out of every ten runs). 004
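The abstract describes FCM++ only at a high level: spread the initial centers through the data space, with a user parameter controlling how far apart they are. The sketch below illustrates that idea under stated assumptions and is not the thesis's implementation. It uses k-means++-style sampling, where each new center is drawn with probability proportional to D(x)**p, and treats the exponent `p` as the spread parameter (a hypothetical choice). Seeding is followed by the standard FCM membership and center updates with fuzzifier `m`.

```python
import random
from math import dist

def seed_centers(points, c, p=2.0, rng=None):
    """Draw each new center with probability proportional to D(x)**p, where D is the
    distance to the nearest chosen center; larger p (assumed spread parameter) favours far points."""
    rng = rng or random.Random(0)
    centers = [list(rng.choice(points))]
    while len(centers) < c:
        weights = [min(dist(x, ctr) for ctr in centers) ** p for x in points]
        if sum(weights) == 0:          # every point already coincides with a center
            centers.append(list(rng.choice(points)))
            continue
        centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers

def fcm(points, c, m=2.0, p=2.0, tol=1e-6, max_iter=300, seed=0):
    """Fuzzy C-means after the seeding above.
    Returns (centers, memberships from the last iteration, iterations used)."""
    centers = seed_centers(points, c, p, random.Random(seed))
    expo = 2.0 / (m - 1.0)
    for it in range(1, max_iter + 1):
        # membership update: u[j][i] = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1))
        u = []
        for x in points:
            d = [dist(x, ctr) for ctr in centers]
            if 0.0 in d:               # point sits exactly on a center: crisp membership
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((di / dk) ** expo for dk in d) for di in d]
            u.append(row)
        # center update: mean of the points weighted by membership ** m
        new_centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(len(points))]
            tot = sum(w)
            new_centers.append([sum(wj * x[k] for wj, x in zip(w, points)) / tot
                                for k in range(len(points[0]))])
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u, it
```

On two well-separated hypothetical blobs, such as `[[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]` with `c=2` and a large spread exponent such as `p=8.0`, the second seed lands in the far blob with near certainty. Each point then ends with a membership above 0.9 in its own blob's cluster. The iteration count returned is the quantity the thesis compares between FCM and FCM++.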

Search results