1 |
An Ontology-Based Personalized Document Clustering Approach. Huang, Tse-hsiu, 05 August 2004 (has links)
With the proliferation of electronic commerce and knowledge economy environments, both persons and organizations increasingly generate and consume large amounts of online information, typically available as textual documents. To manage this rapidly growing number of textual documents, people often use categories or folders to organize them. These document grouping behaviors are intentional acts that reflect the persons' (or organizations') preferences with regard to semantic coherency, or relevant groupings between subjects. For this thesis, we design and implement an ontology-based personalized document clustering (OnPEC) technique by incorporating both an individual user's partial clustering and an ontology into the document clustering process. Our use of a target user's partial clustering supports the personalization of document categorization, whereas our use of the ontology turns document clustering from a feature-based into a concept-based approach. In addition, we combine two hierarchical agglomerative clustering (HAC) approaches (i.e., pre-cluster-based and atomic-based) in our proposed OnPEC technique. Using the clustering effectiveness achieved by a traditional content-based document clustering technique and previously proposed feature-based document clustering (PEC) techniques as performance benchmarks, we find that the use of partial clusters improves document clustering effectiveness, as measured by cluster precision and cluster recall. Moreover, for both OnPEC and PEC techniques, the clustering effectiveness of pre-cluster-based HAC methods greatly outperforms that of atomic-based HAC methods.
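To make the clustering machinery concrete, the following minimal sketch (not the thesis implementation; documents, reference grouping, and parameters are hypothetical) shows hierarchical agglomerative clustering over TF-IDF document vectors, together with a pairwise form of cluster precision and recall of the kind used as effectiveness measures above.

# Hedged sketch: HAC over TF-IDF vectors plus pairwise cluster precision/recall.
# All documents and the reference grouping are invented for illustration.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

docs = ["stock market price analysis", "market price of equities",
        "gene expression in cells", "clustering gene expression data"]
reference = [0, 0, 1, 1]                      # the user's intended grouping (hypothetical)

vectors = TfidfVectorizer().fit_transform(docs).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(vectors)

# Pairs of documents placed together by the algorithm vs. by the reference grouping.
pred_pairs = {p for p in combinations(range(len(docs)), 2) if labels[p[0]] == labels[p[1]]}
true_pairs = {p for p in combinations(range(len(docs)), 2) if reference[p[0]] == reference[p[1]]}
precision = len(pred_pairs & true_pairs) / max(len(pred_pairs), 1)
recall = len(pred_pairs & true_pairs) / max(len(true_pairs), 1)
print(labels, precision, recall)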
|
2 |
Preference-Anchored Document Clustering Technique for Supporting Effective Knowledge and Document Management. Wang, Shin, 03 August 2005 (has links)
Effective knowledge management of the proliferating volume of documents within a knowledge repository is vital to knowledge sharing, reuse, and assimilation. To facilitate access to documents in a knowledge repository, the use of a knowledge map to organize these documents represents a prevailing approach. Document clustering techniques typically are employed to produce knowledge maps. However, existing document clustering techniques are not tailored to individuals' preferences and therefore cannot facilitate the generation of knowledge maps from various preferential perspectives. In response, we propose the Preference-Anchored Document Clustering (PAC) technique, which takes a user's categorization preference (represented as a list of anchoring terms) into consideration to generate a knowledge map (or a set of document clusters) from this specific preferential perspective. Our empirical evaluation results show that the proposed technique outperforms the traditional content-based document clustering technique in the high cluster precision area. Furthermore, benchmarked against Oracle Categorizer, the proposed technique also achieves better clustering effectiveness in the high cluster precision area. Overall, our evaluation results demonstrate the feasibility and potential superiority of the proposed PAC technique.
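The abstract does not spell out how anchoring terms enter the clustering, so the sketch below is only one plausible illustration of the idea: it up-weights TF-IDF features that match the user's anchoring terms before clustering. The terms, boost factor, and documents are invented; the actual PAC procedure may differ.

# Hypothetical illustration of preference anchoring via feature up-weighting.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

docs = ["database indexing methods", "query processing in a database engine",
        "neural network training", "deep learning optimizers"]
anchoring_terms = ["database", "learning"]    # the user's categorization preference (made up)
boost = 3.0                                   # assumed emphasis on preferred terms

vec = TfidfVectorizer()
X = vec.fit_transform(docs).toarray()
for term in anchoring_terms:
    if term in vec.vocabulary_:
        X[:, vec.vocabulary_[term]] *= boost  # emphasize the anchoring-term dimensions

print(AgglomerativeClustering(n_clusters=2).fit_predict(X))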
|
3 |
Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus. Lin, Hao-hsiang, 30 August 2006 (has links)
According to the context theory of classification, individuals' document-clustering behaviors not only involve the attributes (including contents) of documents but also depend on who is doing the task and in what context. Thus, effective document-clustering techniques need to take users' categorization preferences into account and thereby generate document clusters from different preferential perspectives. The Preference-Anchored Document Clustering (PAC) technique was proposed to support preference-based document clustering. Specifically, PAC takes a user's categorization preference into consideration and subsequently generates a set of document clusters from this specific preferential perspective. In this study, we investigate two research questions concerning the PAC technique. The first is whether incorporating broader-term expansion (i.e., the proposed PAC2 technique in this study) improves the effectiveness of preference-based document clustering, whereas the second is whether using a statistical thesaurus constructed from a larger document corpus improves the effectiveness of preference-based document clustering. Compared with the effectiveness achieved by PAC, our empirical results show that the proposed PAC2 technique neither improves nor deteriorates the effectiveness of preference-based document clustering when the complete set of anchoring terms is used. However, when only a partial set of anchoring terms is provided, PAC2 does not improve and can even deteriorate the effectiveness of preference-based document clustering. As to the second research question, our empirical results suggest that a statistical thesaurus constructed from a larger document corpus (i.e., the ACM corpus consisting of 14,729 documents) does not improve the effectiveness of PAC and PAC2 for preference-based document clustering.
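As a rough illustration of what broader-term expansion means here, the sketch below expands a user's anchoring terms with broader terms drawn from a toy thesaurus; the mapping and terms are invented, and the actual PAC2 procedure may differ.

# Hypothetical broader-term expansion of anchoring terms using a toy thesaurus.
broader_terms = {
    "clustering": ["machine learning", "data mining"],
    "b-tree": ["index structure", "data structure"],
}

def expand(anchoring_terms, thesaurus):
    """Return the anchoring terms plus any broader terms the thesaurus lists for them."""
    expanded = list(anchoring_terms)
    for term in anchoring_terms:
        expanded.extend(thesaurus.get(term, []))
    return expanded

print(expand(["clustering", "b-tree"], broader_terms))
# ['clustering', 'b-tree', 'machine learning', 'data mining', 'index structure', 'data structure']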
|
4 |
Supporting Data Warehouse Design with Data Mining Approach. Tsai, Tzu-Chao, 06 August 2001 (has links)
The traditional relational database model lacks the capability to cope with very large volumes of data within acceptable time. To address these requirements, data warehouses and online analytical processing (OLAP) have emerged. Data warehouses improve the productivity of corporate decision makers through consolidation, conversion, transformation, and integration of operational data, and support online analytical processing. Data warehouse design is a complex and knowledge-intensive process. It needs to consider not only the structure of the underlying operational databases (source-driven) but also the information requirements of decision makers (user-driven). Past research focused predominantly on supporting the source-driven data warehouse design process but paid less attention to supporting the user-driven one. Thus, the goal of this research is to propose a user-driven data warehouse design support system based on the knowledge discovery approach. Specifically, a Data Warehouse Design Support System was proposed, with generalization hierarchies and generalized star schemas used as the data warehouse design knowledge. Techniques for learning this design knowledge and reasoning upon it were developed. An empirical evaluation study was conducted to validate the effectiveness of the proposed techniques in supporting the data warehouse design process. The results of the empirical evaluation showed that these techniques were useful in supporting data warehouse design, especially in reducing missing designs and increasing potentially useful designs.
|
5 |
Personalized Document Clustering: Technique Development and Empirical Evaluation. Wu, Chia-Chen, 14 August 2003 (has links)
With the proliferation of electronic commerce and knowledge economy environments, both organizations and individuals generate and consume a large amount of online information, typically available as textual documents. To manage the ever-increasing volume of documents, organizations and individuals typically organize their documents into categories to facilitate document management and subsequent information access and browsing. However, document grouping behaviors are intentional acts, reflecting individuals' (or organizations') preferential perspectives on semantic coherency or relevant groupings between subjects. Thus, an effective document clustering technique needs to accommodate such preferential perspectives on document grouping and support personalized document clustering. In this thesis, we designed and implemented a personalized document clustering approach by incorporating an individual's partial clustering into the document clustering process. Combining two document representation methods (i.e., feature refinement and feature weighting) with two clustering processes (i.e., pre-cluster-based and atomic-based), four personalized document clustering techniques are proposed. Using the clustering effectiveness achieved by a traditional content-based document clustering technique as the performance benchmark, our evaluation results suggest that the use of partial clusters improves document clustering effectiveness. Moreover, the pre-cluster-based technique outperforms the atomic-based one, and the feature weighting method for document representation achieves higher clustering effectiveness than the feature refinement method does.
|
6 |
Exploring transcription patterns and regulatory motifs in Arabidopsis thaliana. Bahirwani, Vishal, January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Doina Caragea / Recent work has shown that bidirectional genes (genes located on opposite strands of DNA, whose transcription start sites are not more than 1000 basepairs apart) are often co-expressed and have similar biological functions. Identification of such genes can be useful in the process of constructing gene regulatory networks. Furthermore, analysis of the intergenic regions corresponding to bidirectional genes can help to identify regulatory elements, such as transcription factor binding sites. Approximately 2500 bidirectional gene pairs have been identified in Arabidopsis thaliana and the corresponding intergenic regions have been shown to be rich in regulatory elements that are essential for the initiation of transcription. Identifying such elements is especially important, as simply searching for known transcription factor binding sites in the promoter of a gene can result in many hits that are not always important for transcription initiation. Encouraged by the findings about the presence of essential regulatory elements in the intergenic regions corresponding to bidirectional genes, in this thesis, we explore a motif-based machine learning approach to identify intergenic regulatory elements. More precisely, we consider the problem of predicting the transcription pattern for pairs of consecutive genes in Arabidopsis thaliana using motifs from AthaMap and PLACE. We use machine learning algorithms to learn models that can predict the direction of transcription for pairs of consecutive genes. To identify the most predictive motifs and, therefore, the most significant regulatory elements, we perform feature selection based on mutual information and feature abstraction based on family or sequence similarity. Preliminary results demonstrate the feasibility of our approach.
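A minimal sketch of this kind of pipeline, using synthetic motif-occurrence features rather than real AthaMap or PLACE motifs, might look as follows: mutual-information feature selection followed by a simple classifier of transcription direction. Everything below is illustrative, not the thesis code.

# Synthetic example: select the most informative motif features by mutual information,
# then fit a classifier that predicts a transcription pattern for gene pairs.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50))        # 200 gene pairs x 50 motif presence/absence features
y = (X[:, 3] | X[:, 7]).astype(int)           # toy rule: the pattern depends on two motifs

selector = SelectKBest(mutual_info_classif, k=5).fit(X, y)
informative = selector.get_support(indices=True)   # indices of the most predictive motifs
clf = LogisticRegression().fit(selector.transform(X), y)
print(informative, clf.score(selector.transform(X), y))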
|
7 |
Agrupamento de faces em vídeos digitais (Face clustering in digital videos). MOURA, Eduardo Santiago, 06 June 2018 (has links)
Human faces are among the most important entities frequently encountered in videos. Given the currently high volume of digital video production and consumption (both personal videos and those from the communication and entertainment industries), automatic extraction of relevant information from such videos has become an active research topic. Many efforts in this area have focused on the use of face clustering and recognition to aid the process of annotating faces in videos. However, current face clustering algorithms are not robust to the variations in appearance that the same face may undergo under typical acquisition conditions. Hence, this thesis proposes a novel approach to the problem of face clustering in digital videos that achieves superior performance (in terms of clustering quality and computational cost) compared with the state of the art on reference video databases from the literature. Building on a systematic literature review and experimental evaluations, the proposed approach comprises the following modules: preprocessing, face detection, tracking, feature extraction, clustering, temporal similarity analysis, and spatial reclustering. The proposed face clustering approach achieved its planned objectives, obtaining better results (according to different metrics) than the methods evaluated on the YouTube Celebrities (KIM et al., 2008) and SAIVT-Bnews (GHAEMMAGHAMI, DEAN and SRIDHARAN, 2013) video datasets.
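As a simplified illustration of the clustering stage only (the embeddings, threshold, and parameters below are hypothetical stand-ins and do not reproduce the thesis's detection, tracking, temporal-analysis, or reclustering modules):

# Group pre-computed face descriptors with hierarchical agglomerative clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

embeddings = np.random.default_rng(1).normal(size=(20, 128))   # 20 detected faces, 128-d features (synthetic)
distances = pdist(embeddings, metric="cosine")                  # pairwise face dissimilarities
tree = linkage(distances, method="average")                     # build the HAC dendrogram
labels = fcluster(tree, t=0.9, criterion="distance")            # cut at an assumed threshold
print(labels)   # one cluster id per face; each cluster ideally corresponds to one identity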
|
8 |
Understanding Cortical Neuron Dynamics through Simulation-Based Applications of Machine Learning. January 2020 (has links)
It is increasingly common to see machine learning techniques applied in conjunction with computational modeling for data-driven research in neuroscience. Such applications include using machine learning for model development, particularly for optimization of parameters based on electrophysiological constraints. Alternatively, machine learning can be used to validate and enhance techniques for experimental data analysis or to analyze model simulation data in large-scale modeling studies, which is the approach I apply here. I use simulations of biophysically realistic cortical neuron models to supplement a common feature-based technique for analysis of electrophysiological signals. I leverage these simulated electrophysiological signals to perform feature selection that provides an improved method for neuron-type classification. Additionally, I validate an unsupervised approach that extends this improved feature selection to discover signatures associated with neuron morphologies - performing in vivo histology in effect. The result is a simulation-based discovery of the underlying synaptic conditions responsible for patterns of extracellular signatures that can be applied to understand both simulation and experimental data. I also use unsupervised learning techniques to identify common channel mechanisms underlying electrophysiological behaviors of cortical neuron models. This work relies on an open-source database containing a large number of computational models for cortical neurons. I perform a quantitative data-driven analysis of these previously published ion channel and neuron models that uses information shared across models as opposed to information limited to individual models. The result is simulation-based discovery of model sub-types at two spatial scales, which map functional relationships between activation/inactivation properties of channel family model sub-types and electrophysiological properties of cortical neuron model sub-types. Further, the combination of unsupervised learning techniques and parameter visualizations serves to integrate characterizations of model electrophysiological behavior across scales. / Dissertation/Thesis / Doctoral Dissertation Applied Mathematics 2020
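A schematic sketch of this style of unsupervised sub-type discovery, on a synthetic feature matrix standing in for electrophysiological features or channel-model parameters, could look like the following; it is illustrative only and not drawn from the dissertation's pipeline.

# Standardize model features and group the models into sub-types.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(2)
# rows = neuron or channel models, columns = e.g. activation/inactivation or spike-shape features
features = np.vstack([rng.normal(0, 1, (30, 6)), rng.normal(4, 1, (30, 6))])

scaled = StandardScaler().fit_transform(features)       # put features on a common scale
subtypes = AgglomerativeClustering(n_clusters=2).fit_predict(scaled)
print(np.bincount(subtypes))                            # sizes of the discovered sub-types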
|
9 |
Aplicação de método Monte Carlo para cálculos de dose em folículos tiroideanos (Application of the Monte Carlo method to dose calculations in thyroid follicles). SILVA, Frank Sinatra Gomes da, 25 February 2008 (has links)
The Monte Carlo method is an important tool for simulating the interaction of radioactive particles with biological media. Its principal advantage over deterministic methods is the ability to handle complex geometries accurately. Several computational codes use the Monte Carlo method to simulate particle transport, and they can simulate energy deposition in models of organs and tissues as well as in models of cells of the human body. Thus, calculating the dose absorbed by thyroid follicles (composed of colloid and follicular cells) is of fundamental importance for dosimetry, because these cells are radiosensitive to ionizing radiation exposure, in particular exposure to radioisotopes of iodine, since a great amount of radioiodine may be released into the environment in the event of a nuclear accident. The goal of this work was therefore to use the particle transport code MCNP4C to calculate absorbed doses in models of thyroid follicles, for Auger electrons, internal conversion electrons, and beta particles, from iodine-131 and the short-lived iodine isotopes (132, 133, 134, and 135), for follicle diameters varying from 30 to 500 μm. The results obtained from the MCNP4C simulations show that, on average, 25% of the total dose absorbed by the colloid comes from iodine-131 and 75% from the short-lived iodines. For the follicular cells, these percentages were 13% for iodine-131 and 87% for the short-lived iodines, highlighting the importance of simulating low-energy particles, such as Auger and internal conversion electrons, when assessing the absorbed dose at the cellular level. Agglomerative hierarchical clustering techniques were used to compare the doses obtained with the MCNP4C, EPOTRAN, and EGS4 codes and with deterministic methods.
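For illustration only, the sketch below applies agglomerative hierarchical clustering to compare dose profiles from several codes, in the spirit of the comparison described above; the dose values are synthetic placeholders, not results from MCNP4C, EPOTRAN, or EGS4.

# Toy comparison: cluster per-code dose profiles so that codes producing similar doses group together.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

codes = ["MCNP4C", "EPOTRAN", "EGS4", "deterministic"]
doses = np.array([                 # one hypothetical dose-vs-follicle-diameter profile per code
    [1.00, 0.71, 0.43, 0.22],
    [0.98, 0.70, 0.44, 0.21],
    [1.05, 0.75, 0.47, 0.25],
    [0.90, 0.60, 0.35, 0.15],
])

tree = linkage(doses, method="average")              # agglomerative hierarchical clustering
groups = fcluster(tree, t=2, criterion="maxclust")   # split the codes into two groups
print(dict(zip(codes, groups)))                      # codes in the same group gave similar doses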
|
10 |
Efficient Hierarchical Clustering Techniques For Pattern Classification. Vijaya, P A, 07 1900 (has links) (PDF)
No description available.
|