Global ETD Search

1	An Ontology-Based Personalized Document Clustering Approach Huang, Tse-hsiu 05 August 2004 With the proliferation of electronic commerce and knowledge economy environments, both persons and organizations have increasingly generated and consumed large amounts of online information, typically available as textual documents. To manage this rapid growth in the number of textual documents, people often use categories or folders to organize their documents. These document grouping behaviors are intentional acts that reflect the persons' (or organizations') preferences with regard to semantic coherency, or relevant groupings between subjects. For this thesis, we design and implement an ontology-based personalized document clustering (OnPEC) technique by incorporating both an individual user's partial clustering and an ontology into the document clustering process. Our use of a target user's partial clustering supports the personalization of document categorization, whereas our use of the ontology turns document clustering from a feature-based to a concept-based approach. In addition, we combine two hierarchical agglomerative clustering (HAC) approaches (i.e., pre-cluster-based and atomic-based) in our proposed OnPEC technique. Using the clustering effectiveness achieved by a traditional content-based document clustering technique and previously proposed feature-based document clustering (PEC) techniques as performance benchmarks, we find that use of partial clusters improves document clustering effectiveness, as measured by cluster precision and cluster recall. Moreover, for both OnPEC and PEC techniques, the clustering effectiveness of pre-cluster-based HAC methods greatly outperforms that of atomic-based HAC methods. Keywords: Document clustering; Hierarchical agglomerative clustering; Ontology learning; Ontology; Personalized document clustering; Ontology-based document clustering
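To make the pre-cluster-based versus atomic-based distinction concrete, here is a minimal sketch rather than the thesis's implementation. It assumes that "atomic-based" means every document starts as its own cluster, and that "pre-cluster-based" means the user's partial clusters are used as the starting clusters. The average-link cosine similarity, the `hac` function, and the four toy documents are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def average_link(c1, c2, vectors):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def hac(vectors, k, seed_clusters=()):
    """Agglomerative clustering down to k clusters.

    With no seed_clusters every document starts alone (atomic-based).
    With seed_clusters the user's partial grouping is the starting
    point (an assumed reading of pre-cluster-based).
    """
    seeded = {i for c in seed_clusters for i in c}
    clusters = [list(c) for c in seed_clusters]
    clusters += [[i] for i in range(len(vectors)) if i not in seeded]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = average_link(clusters[a], clusters[b], vectors)
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical toy corpus: documents are indexed 0..3.
docs = [
    "stock market price",
    "stock market trade",
    "python code bug",
    "python code stock",
]
vectors = [Counter(d.split()) for d in docs]

atomic = hac(vectors, k=2)                          # content only
seeded = hac(vectors, k=2, seed_clusters=[[0, 3]])  # user grouped docs 0 and 3
print(atomic, seeded)
```

In this toy run the content-only clustering separates the two topics, while seeding with the user's grouping of documents 0 and 3 pulls document 1 into that group instead. This is the kind of preference-driven difference the abstract describes.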
2	Preference-Anchored Document Clustering Technique for Supporting Effective Knowledge and Document Management Wang, Shin 03 August 2005 Effective management of the proliferating volume of documents within a knowledge repository is vital to knowledge sharing, reuse, and assimilation. To facilitate access to documents in a knowledge repository, a prevailing approach is to organize them with a knowledge map. Document clustering techniques are typically employed to produce knowledge maps. However, existing document clustering techniques are not tailored to individuals' preferences and are therefore unable to generate knowledge maps from different preferential perspectives. In response, we propose the Preference-Anchored Document Clustering (PAC) technique, which takes a user's categorization preference (represented as a list of anchoring terms) into consideration to generate a knowledge map (or a set of document clusters) from this specific preferential perspective. Our empirical evaluation results show that our proposed technique outperforms the traditional content-based document clustering technique in the high cluster precision area. Furthermore, benchmarked against Oracle Categorizer, our proposed technique also achieves better clustering effectiveness in the high cluster precision area. Overall, our evaluation results demonstrate the feasibility and potential superiority of the proposed PAC technique. Keywords: Document clustering; Knowledge map; Preference-based document clustering; Text mining
3	Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus Lin, Hao-hsiang 30 August 2006 According to the context theory of classification, the document-clustering behaviors of individuals not only involve the attributes (including contents) of documents but also depend on who is doing the task and in what context. Thus, effective document-clustering techniques need to take users' categorization preferences into account so that they can generate document clusters from different preferential perspectives. The Preference-Anchored Document Clustering (PAC) technique was proposed to support preference-based document clustering. Specifically, PAC takes a user's categorization preference into consideration and subsequently generates a set of document clusters from this specific preferential perspective. In this study, we investigate two research questions concerning the PAC technique. The first is "whether the incorporation of broader-term expansion (i.e., the PAC2 technique proposed in this study) will improve the effectiveness of preference-based document clustering," whereas the second is "whether the use of a statistical-based thesaurus constructed from a larger document corpus will improve the effectiveness of preference-based document clustering." Compared with the effectiveness achieved by PAC, our empirical results show that the proposed PAC2 technique neither improves nor degrades the effectiveness of preference-based document clustering when the complete set of anchoring terms is used. However, when only a partial set of anchoring terms is provided, PAC2 does not improve, and can even degrade, the effectiveness of preference-based document clustering. As to the second research question, our empirical results suggest that the use of a statistical-based thesaurus constructed from a larger document corpus (i.e., the ACM corpus consisting of 14,729 documents) does not improve the effectiveness of PAC and PAC2 for preference-based document clustering. Keywords: Personalized document clustering; Document clustering; Preference-based document clustering; Text mining
4	Supporting Data Warehouse Design with Data Mining Approach Tsai, Tzu-Chao 06 August 2001 The traditional relational database model is not capable of coping with very large amounts of data in finite time. To address this requirement, data warehouses and online analytical processing (OLAP) have emerged. Data warehouses improve the productivity of corporate decision makers through consolidation, conversion, transformation, and integration of operational data, and support OLAP. Data warehouse design is a complex and knowledge-intensive process. It needs to consider not only the structure of the underlying operational databases (source-driven), but also the information requirements of decision makers (user-driven). Past research focused predominantly on supporting the source-driven data warehouse design process and paid less attention to the user-driven design process. Thus, the goal of this research is to propose a user-driven data warehouse design support system based on the knowledge discovery approach. Specifically, a Data Warehouse Design Support System was proposed, with the generalization hierarchy and generalized star schemas used as the data warehouse design knowledge. Techniques for learning this design knowledge and reasoning upon it were developed. An empirical evaluation study was conducted to validate the effectiveness of the proposed techniques in supporting the data warehouse design process. The results of the empirical evaluation showed that the technique was useful in supporting data warehouse design, especially in reducing missing designs and enhancing potentially useful designs. Keywords: Data Mining; Data Warehouse; Knowledge Discovery; Data Warehouse Design; Star Schema
5	Personalized Document Clustering: Technique Development and Empirical Evaluation Wu, Chia-Chen 14 August 2003 With the proliferation of electronic commerce and knowledge economy environments, both organizations and individuals generate and consume a large amount of online information, typically available as textual documents. To manage the ever-increasing volume of documents, organizations and individuals typically organize their documents into categories to facilitate document management and subsequent information access and browsing. However, document grouping behaviors are intentional acts, reflecting individuals' (or organizations') preferential perspectives on semantic coherency or relevant groupings between subjects. Thus, an effective document clustering technique needs to address this preferential perspective on document grouping and support personalized document clustering. In this thesis, we designed and implemented a personalized document clustering approach by incorporating an individual's partial clustering into the document clustering process. Combining two document representation methods (i.e., feature refinement and feature weighting) with two clustering processes (i.e., pre-cluster-based and atomic-based), four personalized document clustering techniques are proposed. Using the clustering effectiveness achieved by a traditional content-based document clustering technique as the performance benchmark, our evaluation results suggest that the use of partial clusters improves document clustering effectiveness. Moreover, the pre-cluster-based technique outperforms the atomic-based one, and the feature weighting method for document representation achieves higher clustering effectiveness than the feature refinement method does. Keywords: Supervised Document Clustering; Personalized Document Clustering; Document Clustering; Hierarchical Agglomerative Clustering
6	Exploring transcription patterns and regulatory motifs in Arabidopsis thaliana Bahirwani, Vishal January 1900 Master of Science / Department of Computing and Information Sciences / Doina Caragea / Recent work has shown that bidirectional genes (genes located on opposite strands of DNA, whose transcription start sites are not more than 1000 base pairs apart) are often co-expressed and have similar biological functions. Identification of such genes can be useful in the process of constructing gene regulatory networks. Furthermore, analysis of the intergenic regions corresponding to bidirectional genes can help to identify regulatory elements, such as transcription factor binding sites. Approximately 2500 bidirectional gene pairs have been identified in Arabidopsis thaliana, and the corresponding intergenic regions have been shown to be rich in regulatory elements that are essential for the initiation of transcription. Identifying such elements is especially important, as simply searching for known transcription factor binding sites in the promoter of a gene can result in many hits that are not always important for transcription initiation. Encouraged by the findings about the presence of essential regulatory elements in the intergenic regions corresponding to bidirectional genes, in this thesis we explore a motif-based machine learning approach to identify intergenic regulatory elements. More precisely, we consider the problem of predicting the transcription pattern for pairs of consecutive genes in Arabidopsis thaliana using motifs from AthaMap and PLACE. We use machine learning algorithms to learn models that can predict the direction of transcription for pairs of consecutive genes. To identify the most predictive motifs and, therefore, the most significant regulatory elements, we perform feature selection based on mutual information and feature abstraction based on family or sequence similarity. Preliminary results demonstrate the feasibility of our approach. Keywords: Gene regulatory networks; Machine learning; Arabidopsis thaliana; Motif; Hierarchical agglomerative clustering; Bioinformatics; Bidirectional genes; Computer Science (0984)
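The feature-selection step in the abstract above can be illustrated with a minimal sketch. It assumes binary motif-presence features and a binary transcription-pattern label; the motif names, the toy data, and the function `mutual_information` are hypothetical and are not drawn from AthaMap, PLACE, or the thesis. The score is the standard mutual information I(X;Y) = sum over (x, y) of p(x,y) log2(p(x,y) / (p(x) p(y))).

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi

# Hypothetical labels: 1 = gene pair transcribed in one pattern, 0 = the other.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical motif-presence features, one value per gene pair.
features = {
    "motif_A": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the label
    "motif_B": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    "motif_C": [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
}

scores = {name: mutual_information(col, labels) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Keeping only the top-ranked motifs would then give the reduced feature set passed to a classifier. How many motifs to keep is a tuning choice that the abstract does not specify.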
7	Agrupamento de faces em vídeos digitais (Face clustering in digital videos). MOURA, Eduardo Santiago. 06 June 2018 Previous issue date: 2016 / Human faces are among the most important entities frequently encountered in videos. Given the substantial volume of digital video production and consumption today (both personal videos and those from the communication and entertainment industries), automatic extraction of relevant information from such videos has become an active research topic. Part of the effort in this area has focused on the use of face recognition and clustering to aid the process of automatically annotating faces in videos. However, current face clustering algorithms are still not robust to the variations in appearance that the same face undergoes in typical acquisition situations. In this context, the problem addressed in this thesis is face clustering in digital videos, and it proposes a novel approach that achieves superior performance (in terms of clustering quality and computational cost) in comparison to the state of the art, using reference video databases from the literature. The approach was arrived at through a systematic literature review and experimental evaluations, and it consists of the following modules: preprocessing, face detection, tracking, feature extraction, clustering, temporal similarity analysis, and spatial reclustering. The proposed face clustering approach achieved the planned objectives, obtaining better results (according to different metrics) than the methods evaluated on the YouTube Celebrities video dataset (KIM et al., 2008) and the SAIVT-Bnews video dataset (GHAEMMAGHAMI, DEAN and SRIDHARAN, 2013). Keywords: Sciences; Computer Science; Video Face Clustering; Hierarchical Agglomerative Clustering; Clustering Evaluation
8	Efficient Hierarchical Clustering Techniques For Pattern Classification Vijaya, P A 07 1900 No description available. Keywords: Hierarchical Clustering; Pattern Classification; Clustering Techniques (Computer Science); Clustering Algorithms; Cluster Analysis; Incremental Clustering; Hierarchical Clustering Algorithm; Protein Sequence Classification; Computer Science
9	Shluková analýza signálu EKG / ECG Cluster Analysis Pospíšil, David January 2013 This diploma thesis deals with the application of cluster analysis methods to the ECG signal in order to sort QRS complexes into normal and abnormal according to their morphology. It uses agglomerative hierarchical clustering and the non-hierarchical k-means method, for which an application was developed in the MathWorks MATLAB environment. The first part covers the theory of the ECG signal and cluster analysis. The second part covers the design and implementation of the software and evaluates the results of applying it to ECG signals for the automatic division of QRS complexes into clusters.
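As a rough illustration of the non-hierarchical method named in the abstract above, the following sketch runs a plain k-means loop over per-beat feature vectors. It is not the thesis's MATLAB application. The two features (QRS width in ms, R-wave amplitude in mV), the sample values, and the choice of starting centroids are all hypothetical; a real pipeline would first detect the QRS complexes and would probably scale the features.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd-style k-means with caller-supplied starting centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared Euclidean).
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical per-beat features: (QRS width in ms, R amplitude in mV).
beats = [(90, 1.0), (95, 1.1), (88, 0.9), (140, 1.8), (150, 2.0)]

# Deterministic start: first and last beat as the two initial centroids.
labels, centers = kmeans(beats, [beats[0], beats[-1]])
print(labels)
```

On this toy input the three narrow beats and the two wide beats end up in separate clusters. Deciding which cluster counts as "abnormal" is a separate step that the sketch does not attempt.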
