181 |
Styles in business process modeling: an exploration and a model / Pinggera, Jakob, Soffer, Pnina, Fahland, Dirk, Weidlich, Matthias, Zugal, Stefan, Weber, Barbara, Reijers, Hajo A., Mendling, Jan 07 1900
Business process models are an important means to design, analyze, implement, and control business processes. As with every type of conceptual model, a business process model has to meet certain syntactic, semantic, and pragmatic quality requirements to be of value. For many years, such quality aspects were investigated by centering on the properties of the model artifact itself. Only recently has the process of model creation been considered as a factor that influences the resulting model's quality. Our work contributes to this stream of research and presents an explorative analysis of the process of process modeling (PPM). We report on two large-scale modeling sessions involving 115 students. In these sessions, the act of model creation, i.e., the PPM, was automatically recorded. We conducted a cluster analysis on this data and identified three distinct styles of modeling. Further, we investigated how both task- and modeler-specific factors influence particular aspects of those modeling styles. Based on these findings, we propose a model that captures our insights. It lays the foundations for future research that may unveil how high-quality process models can be established through better modeling support and modeling instruction. (authors' abstract)
|
182 |
A search for solar dark matter with the IceCube neutrino detector : Advances in data treatment and analysis technique / Zoll, Marcel Christian Robert January 2016
There is compelling observational evidence for the existence of dark matter in the Universe, including our own Galaxy, which could possibly consist of weakly interacting massive particles (WIMPs) not contained in the standard model (SM) of particle physics. WIMPs may get gravitationally trapped inside heavy celestial bodies of ordinary matter. The Sun is a nearby candidate for such a capture process, which is driven by the scattering of WIMPs on its nuclei. Forming an over-density at the Sun's core, the WIMPs would self-annihilate, yielding energetic neutrinos, which leave the Sun and can be detected in experiments on Earth. The cubic-kilometer sized IceCube neutrino observatory, constructed in the clear glacial ice at the Amundsen-Scott South Pole Station in Antarctica, offers an excellent opportunity to search for this striking signal. This thesis is dedicated to the search for these solar dark matter signatures in muon neutrinos from the direction of the Sun. Newly developed techniques based on hit clustering and hit-based vetoes allow more accurate reconstruction and identification of events in the detector and thereby a stronger rejection of background. These techniques are also applicable to other IceCube analyses and event filters. In addition, new approaches to the analysis without seasonal cuts lead to improvements in sensitivity, especially in the low-energy regime (<=100 GeV), the target of the more densely instrumented DeepCore sub-array. This first analysis of 369 days of data recorded with the completed detector array of 86 strings revealed no significant excess above the expected background of atmospheric neutrinos. This allows us to set strong limits on the annihilation rate of WIMPs in the Sun for the models probed in this analysis. The IceCube limits for the spin-independent WIMP-proton scattering cross-section are the most stringent ones for WIMP masses above 100 GeV. / IceCube
|
183 |
Identifying Profiles of Resilience among a High-Risk Adolescent Population / Wright, Anna W 01 January 2016
The purpose of the present study was to determine whether distinct patterns of adolescent adjustment existed when four domains of functioning were considered. The study included a sample of 299 high-risk urban adolescents, predominantly African American, ages 9-16 and their maternal caregivers. Cluster analysis was used to identify patterns of adjustment. Logistic regression analyses were used to explore whether variations in levels of five theoretically and empirically supported protective factors predicted cluster membership. A four-cluster model was determined to best fit the data. Higher rates of goal directedness and anger regulation coping predicted membership within the highest functioning cluster over a cluster demonstrating high externalizing problem behaviors, and neighborhood cohesion predicted highest functioning cluster membership over a cluster demonstrating high internalizing symptoms. Findings suggest that within a high-risk population of adolescents, significant variability in functioning will exist. The presence or absence of specific protective factors predicts developmental outcomes.
|
184 |
Zhluková analýza dynamických dát / Clustering of dynamic data / Marko, Michal January 2011
Title: Cluster analysis of dynamic data Author: Bc. Michal Marko Department: Department of Software and Computer Science Education Supervisor: RNDr. František Mráz, CSc. Supervisor's e-mail address: Frantisek.Mraz@mff.cuni.cz Abstract: The main goal of this thesis is to choose some cluster analysis methods, or where necessary to propose our own modifications to them, in order to observe the progress of dynamic data and its clusters. The chosen methods are applied to real data. Dynamic data denotes a series of information that is created periodically over time and describes the same characteristics of a given set of data objects. When applied to such data, classic clustering algorithms suffer from a lack of coherence between the results for the individual data sets in the series, which can be illustrated by applying them to our artificial data. We discuss the idea behind the proposed modifications and compare the progress of the methods based on them. In order to be able to use our modified methods on real data, we examine their applicability to multidimensional artificial data. Due to the complications caused by multidimensional space, we develop our own validation criterion. Once the methods are approved for use in such space, we apply our modified methods on the real data, followed by the visualization and...
|
185 |
An improved unsupervised modeling methodology for detecting fraud in vendor payment transactions / Rouillard, Gregory W. 06 1900
Approved for public release; distribution is unlimited. / (DFAS) vendor payment transactions through Unsupervised Modeling (cluster analysis). Clementine Data Mining software is used to construct unsupervised models of vendor payment data using the K-Means, Two Step, and Kohonen algorithms. Cluster validation techniques are applied to select the most useful model of each type, which are then combined to select candidate records for physical examination by a DFAS auditor. Our unsupervised modeling technique utilizes all the available valid transaction data, much of which is not admitted under the current supervised modeling procedure. Our procedure standardizes and provides rigor to the existing unsupervised modeling methodology at DFAS. Additionally, we demonstrate a new clustering approach called Tree Clustering, which uses Classification and Regression Trees to cluster data with automatic variable selection and scaling. A Recommended SOP for Unsupervised Modeling, detailed explanation of all Clementine procedures, and implementation of the Tree Clustering algorithm are included as appendices. / Major, United States Marine Corps
|
186 |
Seleção de grupos a partir de hierarquias: uma modelagem baseada em grafos / Clusters selection from hierarchies: a graph-based model / Anjos, Francisco de Assis Rodrigues dos 28 June 2018
A análise de agrupamento de dados é uma tarefa fundamental em mineração de dados e aprendizagem de máquina. Ela tem por objetivo encontrar um conjunto finito de categorias que evidencie as relações entre os objetos (registros, instâncias, observações, exemplos) de um conjunto de dados de interesse. Os algoritmos de agrupamento podem ser divididos em particionais e hierárquicos. Uma das vantagens dos algoritmos hierárquicos é conseguir representar agrupamentos em diferentes níveis de granularidade e ainda serem capazes de produzir partições planas como aquelas produzidas pelos algoritmos particionais, mas para isso é necessário que seja realizado um corte (por exemplo horizontal) sobre o dendrograma ou hierarquia dos grupos. A escolha de como realizar esse corte é um problema clássico que vem sendo investigado há décadas. Mais recentemente, este problema tem ganho especial importância no contexto de algoritmos hierárquicos baseados em densidade, pois somente estratégias mais sofisticadas de corte, em particular cortes não-horizontais denominados cortes locais (ao invés de globais) conseguem selecionar grupos de densidades diferentes para compor a solução final. Entre as principais vantagens dos algoritmos baseados em densidade está sua robustez à interferência de dados anômalos, que são detectados e deixados de fora da partição final, rotulados como ruído, além da capacidade de detectar clusters de formas arbitrárias. O objetivo deste trabalho foi adaptar uma variante da medida da Modularidade, utilizada amplamente na área de detecção de comunidades em redes complexas, para que esta possa ser aplicada ao problema de corte local de hierarquias de agrupamento. Os resultados obtidos mostraram que essa adaptação da modularidade pode ser uma alternativa competitiva para a medida de estabilidade utilizada originalmente pelo algoritmo estado-da-arte em agrupamento de dados baseado em densidade, HDBSCAN*. 
/ Cluster Analysis is a fundamental task in Data Mining and Machine Learning. It aims to find a finite set of categories that evidences the relationships between the objects (records, instances, observations, examples) of a data set of interest. Clustering algorithms can be divided into partitional and hierarchical. One of the advantages of hierarchical algorithms is that they can represent clusters at different levels of granularity while still being able to produce flat partitions like those produced by partitional algorithms. To achieve this, it is necessary to perform a cut (for example, a horizontal one) through the dendrogram or cluster tree. How to perform this cut is a classic problem that has been investigated for decades. More recently, this problem has gained special importance in the context of density-based hierarchical algorithms, since only more sophisticated cutting strategies, in particular nonhorizontal cuts called local cuts (as opposed to global ones), are able to select clusters with different densities to compose the final solution. Among the main advantages of density-based algorithms are their robustness to anomalous data, which are detected, left out of the final partition, and labeled as noise, and their capability to detect clusters of arbitrary shape. The objective of this work was to adapt a variant of the Q Modularity measure, widely used in the realm of community detection in complex networks, so that it can be applied to the problem of local cuts through cluster hierarchies. The results show that the proposed measure can be a competitive alternative to the stability measure originally used by the state-of-the-art density-based clustering algorithm HDBSCAN*.
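The horizontal (global) cut mentioned in this abstract can be illustrated with a minimal sketch. It assumes SciPy's hierarchical clustering routines and made-up two-dimensional data; the single-linkage method and the distance threshold of 1.0 are illustrative choices, not taken from the thesis, and the sketch does not implement the local, modularity-based cuts the thesis proposes.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data (an assumption for illustration): two tight, well-separated blobs.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 2)),
])

# Build the cluster hierarchy (dendrogram) with single linkage.
tree = linkage(points, method="single")

# Horizontal cut: every merge below distance 1.0 is kept, every merge above is undone.
# The same threshold applies to the whole tree, which is why the cut is "global".
labels = fcluster(tree, t=1.0, criterion="distance")
n_clusters = len(set(labels))
```

A single threshold like this cannot pick out clusters of different densities at the same time, which is the limitation that motivates local cuts.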
|
187 |
Image retrieval using visual attention / Unknown Date
The retrieval of digital images is hindered by the semantic gap. The semantic gap is the disparity between a user's high-level interpretation of an image and the information that can be extracted from an image's physical properties. Content-based image retrieval systems are particularly vulnerable to the semantic gap due to their reliance on low-level visual features for describing image content. The semantic gap can be narrowed by including high-level, user-generated information. High-level descriptions of images are more capable of capturing the semantic meaning of image content, but it is not always practical to collect this information. Thus, both content-based and human-generated information are considered in this work. A content-based method of retrieving images using a computational model of visual attention was proposed, implemented, and evaluated. This work is based on a study of contemporary research in the field of vision science, particularly computational models of bottom-up visual attention. The use of computational models of visual attention to detect salient-by-design regions of interest in images is investigated. The method is then refined to detect objects of interest in broad image databases that are not necessarily salient by design. An interface for image retrieval, organization, and annotation that is compatible with the attention-based retrieval method has also been implemented. It incorporates the ability to simultaneously execute querying by image content, keyword, and collaborative filtering. The user is central to the design and evaluation of the system. A game was developed to evaluate the entire system, which includes the user, the user interface, and retrieval methods. / by Liam M. Mayron. / Thesis (Ph.D.)--Florida Atlantic University, 2008. / Includes bibliography. / Electronic reproduction. Boca Raton, FL : 2008 Mode of access: World Wide Web.
|
188 |
Uso do teste de Scott-Knott e da análise de agrupamentos, na obtenção de grupos de locais para experimentos com cana-de-açúcar / Scott-Knott test and cluster analysis use in the obtainment of placement groups for sugar cane experiments / Silva, Cristiane Mariana Rodrigues da 15 February 2008
O Centro de Tecnologia Canavieira (CTC), situado na cidade de Piracicaba, é uma associação civil de direito privado, criada em agosto de 2004, com o objetivo de realizar pesquisa e desenvolvimento em novas tecnologias para aplicação nas atividades agrícolas, logísticas e industriais dos setores canavieiro e sucroalcooleiro e desenvolver novas variedades de cana-de-açúcar. Há 30 anos, são feitos experimentos, principalmente no estado de São Paulo onde se concentra a maior parte dessas unidades produtoras associadas. No ano de 2004 foram instalados ensaios em 11 destas Unidades Experimentais dentro do estado de São Paulo, e há a necessidade de se saber se é possível a redução deste número, visando aos aspectos econômicos. Se se detectarem grupos de Unidades com dados muito similares, pode-se reduzir o número destas, reduzindo-se, conseqüentemente, o custo dessas pesquisas, e é através do teste estatístico de Scott-Knott e da Análise de Agrupamento, que essa similaridade será comprovada. Este trabalho tem por objetivo, aplicar as técnicas da Análise de Agrupamento (\"Cluster Analisys\") e o teste de Scott-Knott na identificação da existência de grupos de Unidades Industriais, visando à diminuição do número de experimentos do Centro de Tecnologia Canavieira (CTC) e, por conseguinte, visando ao menor custo operacional. Os métodos de comparação múltipla baseados em análise de agrupamento univariada, têm por objetivo separar as médias de tratamentos que, para esse estudo foram médias de locais, em grupos homogêneos, pela minimização da variação dentro, e maximização entre grupos e um desses procedimentos é o teste de Scott-Knott. 
A análise de agrupamento permite classificar indivíduos ou objetos em subgrupos excludentes, em que se pretende, de uma forma geral, maximizar a homogeneidade de objetos ou indivíduos dentro de grupos e maximizar a heterogeneidade entre os grupos, sendo que a representação desses grupos é feita num gráfico com uma estrutura de árvore denominado dendrograma. O teste de Scott- Knott, é um teste para Análise Univariada, portanto, mais indicado quando se tem apenas uma variável em estudo, sendo que a variável usada foi TPH5C, por se tratar de uma variável calculada a partir das variáveis POL, TCH e FIB. A Análise de Agrupamento, através do Método de Ligação das Médias, mostrou-se mais confiável, pois possuía-se, nesse estudo, três variáveis para análise, que foram: TCH (tonelada de cana por hectare), POL (porcentagem de açúcar), e FIB (porcentagem de fibra). Comparando-se o teste de Scott-Knott com a Análise de Agrupamentos, confirmam-se os agrupamentos entre os locais L020 e L076 e os locais L045 e L006. Conclui-se, portanto, que podem ser eliminadas dos experimentos duas unidades experimentais, optando por L020 (Ribeirão Preto) ou L076 (Assis), e L045 (Ribeirão Preto) ou L006 (Região de Jaú), ficando essa escolha, a critério do pesquisador, podendo assim, reduzir seu custo operacional. / The Centre of Sugar Cane Technology (CTC), placed at the city of Piracicaba, is a private right civilian association, created in August of 2004, aiming to research and develop new technologies with application in agricultural and logistic activities, as well as industrial activities related to sugar and alcohol sectors, such as the development of new sugar cane varieties. Experiments have been made for 30 years, mainly at the state of São Paulo, where most of the associated unities of production are located. 
In 2004, experiments were installed in 11 of those Experimental Units within the state of São Paulo, and there is a need to know whether this number can be reduced for economic reasons. If groups of Units with very similar data are detected, some of these Units can be eliminated, consequently reducing research costs; this similarity can be verified through the Scott-Knott statistical test and Cluster Analysis. This work aims to apply Cluster Analysis techniques and the Scott-Knott test to identify groups of Industrial Units, with a view to reducing the number of CTC experiments and, consequently, the operational cost. Multiple comparison methods based on univariate cluster analysis aim to split the treatment means (in this study, the location means) into homogeneous groups by minimizing the variation within groups and maximizing the variation between groups; one of these methods is the Scott-Knott test. Cluster analysis classifies individuals or objects into mutually exclusive groups; again, the idea is to maximize the homogeneity of objects or individuals within groups and the heterogeneity between groups. These groups are represented in a tree-structured graph called a dendrogram. The Scott-Knott test is a univariate analysis test and is therefore more appropriate for studies with only one variable of interest; the variable used was TPH5C, which is calculated from the variables POL, TCH, and FIB. Cluster Analysis with the average linkage (Linkage of Means) method proved more reliable, because this study had three variables for analysis: TCH (tons of sugar cane per hectare), POL (percentage of sugar), and FIB (percentage of fiber). Comparing the Scott-Knott test with Cluster Analysis confirms two pairs of clustered locations: L020 and L076, and L045 and L006. 
Therefore, it is concluded that two of the experimental units may be removed from the experiments: one of L020 (Ribeirão Preto) or L076 (Assis), and one of L045 (Ribeirão Preto) or L006 (Jaú region). The choice is left to the researcher and can reduce the operational cost. Keywords: Cluster Analysis; Sugar Cane
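As a rough illustration of multivariate grouping with the average linkage method on three variables, the sketch below uses SciPy. The site names and values are hypothetical and are not data from the study; standardizing the variables and asking for two groups are assumptions made only so the example is self-contained.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical site means (NOT data from the study); columns are TCH, POL, FIB.
site_names = ["site_a", "site_b", "site_c", "site_d"]
site_means = np.array([
    [100.0, 15.0, 12.0],
    [101.0, 15.1, 12.1],
    [80.0, 13.0, 11.0],
    [81.0, 13.1, 11.1],
])

# Standardize each variable so that TCH (large values) does not dominate the distance.
standardized = (site_means - site_means.mean(axis=0)) / site_means.std(axis=0)

# Average linkage: the distance between two groups is the mean pairwise distance.
tree = linkage(standardized, method="average", metric="euclidean")

# Cut the tree into at most two groups and map each site to its group label.
groups = fcluster(tree, t=2, criterion="maxclust")
site_groups = dict(zip(site_names, groups))
```

Sites that land in the same group are candidates for consolidation, which is the kind of decision the abstract describes; how many groups to keep remains a judgment call for the researcher.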
|
189 |
Bayesian decision theoretical framework for clustering. / CUHK electronic theses & dissertations collection / January 2011
From the Bayesian decision theoretical view, we propose several extensions of current popular graph-based methods. Several data-dependent graph construction approaches are proposed by adopting more flexible density estimators. The advantage of these approaches is that the parameters for constructing the graph can be estimated from the data. The constructed graph explores the intrinsic distribution of the data. As a result, the algorithm is more robust and obtains good performance consistently across different data sets. Using the flexible density models can result in directed graphs, which cannot be handled by traditional graph partitioning algorithms. To tackle this problem, we propose general algorithms for graph partitioning, which can deal with both undirected and directed graphs in a unified way. / In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. / We prove that the spectral clustering (to be specific, the normalized cut) algorithm can be derived from this framework. In particular, it can be shown that the normalized cut is a nonparametric clustering method which adopts a kernel density estimator as its density model and tries to minimize the expected classification error or Bayes risk. / Chen, Mo. / Adviser: Xiaoou Tang. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 96-104). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. 
/ Abstract also in Chinese.
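For readers unfamiliar with the normalized cut named in this abstract, the following is a minimal sketch of the standard spectral relaxation for a two-way split, using a Gaussian kernel to build the graph. It is the textbook formulation, not the thesis's derivation or its proposed extensions; the function name, the bandwidth value, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def normalized_cut_bipartition(points, bandwidth=1.0):
    """Two-way normalized cut via the spectral relaxation (hypothetical helper)."""
    # Gaussian-kernel affinities between all pairs of points.
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    affinity = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    degree = affinity.sum(axis=1)
    inv_sqrt_degree = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(points)) - (
        inv_sqrt_degree[:, None] * affinity * inv_sqrt_degree[None, :]
    )

    # eigh returns eigenvalues in ascending order; take the second-smallest eigenvector
    # and map it back to the generalized eigenvector of (D - W) y = lambda D y.
    _, eigenvectors = np.linalg.eigh(laplacian)
    relaxed_indicator = inv_sqrt_degree * eigenvectors[:, 1]

    # Threshold the relaxed indicator at zero to obtain the two groups.
    return (relaxed_indicator > 0).astype(int)

# Synthetic data (an assumption): two separated blobs of 20 points each.
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(20, 2)),
])
labels = normalized_cut_bipartition(points, bandwidth=1.0)
```

The kernel affinities play the role of a kernel density model over the data, which is the link to density estimation that the abstract draws; the fixed bandwidth here is exactly the kind of parameter the thesis proposes to estimate from the data instead.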
|
190 |
Applications of clustering analysis to signal processing problems. / January 1999
Wing-Keung Sim. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 109-114). / Abstracts in English and Chinese.
Abstract / 摘要 (Chinese abstract) / Acknowledgements / Contents / List of Figures / List of Tables
Introductions / Chapter 1.1 --- Motivation & Aims / Chapter 1.2 --- Contributions / Chapter 1.3 --- Structure of Thesis
Electrophysiological Spike Discrimination / Chapter 2.1 --- Introduction / Chapter 2.2 --- Cellular Physiology / Chapter 2.2.1 --- Action Potential / Chapter 2.2.2 --- Recording of Spikes Activities / Chapter 2.2.3 --- Demultiplexing of Multi-Neuron Recordings / Chapter 2.3 --- Application of Clustering for Mixed Spikes Train Separation / Chapter 2.3.1 --- Design Principles for Spike Discrimination Procedures / Chapter 2.3.2 --- Clustering Analysis / Chapter 2.3.3 --- Comparison of Clustering Techniques / Chapter 2.4 --- Literature Review / Chapter 2.4.1 --- Template Spike Matching / Chapter 2.4.2 --- Reduced Feature Matching / Chapter 2.4.3 --- Artificial Neural Networks / Chapter 2.4.4 --- Hardware Implementation / Chapter 2.5 --- Summary
Correlation of Perceived Headphone Sound Quality with Physical Parameters / Chapter 3.1 --- Introduction / Chapter 3.2 --- Sound Quality Evaluation / Chapter 3.3 --- Headphone Characterization / Chapter 3.3.1 --- Frequency Response / Chapter 3.3.2 --- Harmonic Distortion / Chapter 3.3.3 --- Voice-Coil Driver Parameters / Chapter 3.4 --- Statistical Correlation Measurement / Chapter 3.4.1 --- Correlation Coefficient / Chapter 3.4.2 --- t Test for Correlation Coefficients / Chapter 3.5 --- Summary
Algorithms / Chapter 4.1 --- Introduction / Chapter 4.2 --- Principal Component Analysis / Chapter 4.2.1 --- Dimensionality Reduction / Chapter 4.2.2 --- PCA Transformation / Chapter 4.2.3 --- PCA Implementation / Chapter 4.3 --- Traditional Clustering Methods / Chapter 4.3.1 --- Online Template Matching (TM) / Chapter 4.3.2 --- Online Template Matching Implementation / Chapter 4.3.3 --- K-Means Clustering / Chapter 4.3.4 --- K-Means Clustering Implementation / Chapter 4.4 --- Unsupervised Neural Learning / Chapter 4.4.1 --- Neural Network Basics / Chapter 4.4.2 --- Artificial Neural Network Model / Chapter 4.4.3 --- Simple Competitive Learning (SCL) / Chapter 4.4.4 --- SCL Implementation / Chapter 4.4.5 --- Adaptive Resonance Theory Network (ART) / Chapter 4.4.6 --- ART2 Implementation / Chapter 4.6 --- Summary
Experimental Design / Chapter 5.1 --- Introduction / Chapter 5.2 --- Electrophysiological Spike Discrimination / Chapter 5.2.1 --- Experimental Design / Chapter 5.2.2 --- Extracellular Recordings / Chapter 5.2.3 --- PCA Feature Extraction / Chapter 5.2.4 --- Clustering Analysis / Chapter 5.3 --- Correlation of Headphone Sound Quality with physical Parameters / Chapter 5.3.1 --- Experimental Design / Chapter 5.3.2 --- Frequency Response Clustering / Chapter 5.3.3 --- Additional Parameters Measurement / Chapter 5.3.4 --- Listening Tests / Chapter 5.3.5 --- Confirmation Test / Chapter 5.4 --- Summary
Results / Chapter 6.1 --- Introduction / Chapter 6.2 --- Electrophysiological Spike Discrimination: A Comparison of Methods / Chapter 6.2.1 --- Clustering Labeled Spike Data / Chapter 6.2.2 --- Clustering of Unlabeled Data / Chapter 6.2.3 --- Remarks / Chapter 6.3 --- Headphone Sound Quality Control / Chapter 6.3.1 --- Headphones Frequency Response Clustering / Chapter 6.3.2 --- Listening Tests / Chapter 6.3.3 --- Correlation with Measured Parameters / Chapter 6.3.4 --- Confirmation Listening Test / Chapter 6.4 --- Summary
Conclusions / Chapter 7.1 --- Future Work / Chapter 7.1.1 --- Clustering Analysis / Chapter 7.1.2 --- Potential Applications of Clustering Analysis / Chapter 7.2 --- Closing Remarks
Appendix / Chapter A.1 --- Tables of Experimental Results: (Spike Discrimination) / Chapter A.2 --- Tables of Experimental Results: (Headphones Measurement)
Bibliography / Publications
|