Global ETD Search

1	Semantic Relationship Annotation for Knowledge Documents in Knowledge Sharing Environments Pai, Yi-chung 29 July 2004 A typical online knowledge-sharing environment generates a vast number of formal knowledge elements or interactions that are generally available as textual documents. Thus, effective management of the ever-increasing volume of online knowledge documents is essential to organizational knowledge sharing. Reply-semantic relationships between knowledge documents may exist either explicitly or implicitly. Such relationships, once discovered or identified, would facilitate subsequent knowledge access by providing a novel and more semantic retrieval mechanism. In this study, we propose a preliminary taxonomy of reply-semantic relationships for documents organized in reply-replied structures and develop a SEmantic Enrichment between Knowledge documents (SEEK) technique for automatically annotating reply-semantic relationships between reply-pair documents. Building on content-based text categorization and genre classification techniques, we propose and evaluate different feature-set models that combine keyword features, POS statistics features, and/or given/new information (GI/NI) features. Our empirical evaluation results show that the proposed SEEK technique can achieve satisfactory classification accuracy. Furthermore, the use of keyword and GI/NI features resulted in the best classification accuracy for the Answer/Comment classification task. On the other hand, keyword features alone best differentiate Explanation and Instruction relationships. Genre Classification Text Categorization Reply-semantic Relationship Knowledge Sharing
2	Automated genre classification in literature Jordan, Emily January 1900 Master of Science / Department of Computing and Information Sciences / William Hsu / This thesis examines automated genre classification in literature. The approach described uses text-based comparison of book summaries to examine whether word similarity is a feasible method for identifying genre types. Genres help users form impressions of what form a text will take. Knowing the genre of a literary work provides librarians, information scientists, and other users of a text collection with a summative guide to its form, its possible content, and what its members are about without having to peruse individual topic titles. This makes automatically generating genre labels a potentially useful tool for sorting unmarked text collections or searching the web. This thesis provides a brief overview of the problems faced by researchers wishing to automate genre classification as well as the current work in the field. My own methodology is also discussed. I implemented two basic methods for labeling genre. The results collected using them are covered, as well as future work and improvements to the project that I wish to implement. Genre Automated genre classification Labeling Automated Classification Computer Science (0984) Library Science (0399)
3	Processamento e análise de vídeos utilizando Floresta de Caminhos Ótimos / Processing and video analysis through Optimum-Path Forest Martins, Guilherme Brandão [UNESP] 20 May 2016 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Com os avanços relacionados às tecnologias de redes computacionais e armazenamento de dados observa-se que, atualmente, uma grande quantidade de conteúdo digital está sendo disponibilizada via internet, em especial por meio de redes sociais. A fim de explorar esse contexto, abordagens relacionadas ao processamento e aprendizado de padrões em vídeos têm recebido crescente atenção nos últimos anos. Sistemas de recomendação de filmes, amplamente empregados em lojas virtuais, são uma das principais aplicações no que se refere aos avanços de pesquisa na área de processamento de vídeos. Com o objetivo de acelerar o processo de recomendação e redução de armazenamento, técnicas para classificação e sumarização de vídeos por meio de aprendizado de máquina têm sido utilizadas com o intuito de explorar conteúdo informativo e também redundante. Por meio de técnicas de agrupamento e descrição de dados, é possível identificar quadros-chave de um conjunto de amostras a fim de que, posteriormente, estes sejam usados para sumarização do vídeo. 
Além disso, por meio de bases de vídeos rotuladas, podemos classificar amostras de modo a organizá-las por gêneros de vídeo. O presente trabalho objetiva utilizar o classificador Floresta de Caminhos Ótimos para sumarização automática e classificação de vídeos por gênero, bem como o estudo de sua viabilidade nestes contextos. Os resultados obtidos mostram que o referido classificador obteve desempenhos bastante promissores e próximos a algumas das técnicas de sumarização automática e classificação de vídeos que, atualmente, representam o estado-da-arte no atual contexto. / Advances in computer networking and data storage technologies have made a considerable amount of digital content available on the internet, mainly through social networks. To exploit this context, video processing and pattern recognition approaches have received considerable attention in recent years. Movie recommendation systems, widely employed in online stores, are one of the main applications of research advances in the video processing field. To speed up content recommendation and reduce storage, video classification and summarization techniques based on machine learning have been applied to exploit both informative and redundant content. By means of clustering and data description techniques, it is possible to identify keyframes in a given sample collection so that they can later be used for video summarization. Furthermore, labeled video collections make it possible to classify samples and arrange them by video genre. The main goal of this work is to employ the Optimum-Path Forest classifier for both automatic video summarization and video genre classification, and to study its viability in these contexts. 
The results have shown this classifier can achieve promising performances, being very close in terms of summary quality and consistent recognition rates to some state-of-the-art video summarization and classification approaches. Sumarização de vídeos Classificação de vídeos Floresta de Caminhos Ótimos Video summarization Video genre classification Optimum-Path Forest
4	The Impact of Semantic and Stylistic Features in Genre Classification for News Pei, Ziming January 2022 In this thesis, we investigate the usefulness of a group of features in genre classification problems for news. We choose a diverse feature set, covering features related to the content and style of the texts. The features are divided into two groups: semantic and stylistic. More specifically, the semantic features include genre-exclusive words, emotional words and synonyms. The stylistic features include character-level and document-level features. We use three traditional machine learning classification models and one neural network model to evaluate the effects of our features: Support Vector Machine, Complement Naive Bayes, k-Nearest Neighbor, and Convolutional Neural Networks. The results are evaluated by F1 score, precision and recall (both micro- and macro-averaged). We compare the performance of different models to find the optimal feature set for this news genre classification task and, at the same time, to identify the most suitable classifier. We show that genre-exclusive words and synonyms are beneficial to the classification task, in that they are the most informative features in the training process. Emotional words have a negative effect on the results. Our best result, 0.97 in macro-averaged F1 score, precision and recall, is obtained on the feature set that combines the preprocessed dataset with its synonym sets generated based on contexts, classified by the Complement Naive Bayes model. We discuss the results achieved from the experiments and the best-performing models, answer the research questions, and provide suggestions for future studies. news genre classification supervised machine learning
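The difference between the micro- and macro-averaged scores mentioned in the abstract above can be illustrated with a minimal sketch. The function name `f1_scores` and the toy genre labels are assumptions made for illustration; they are not taken from the thesis.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label, multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-averaging pools the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-averaging computes F1 per class and takes the unweighted mean,
    # so rare classes weigh as much as frequent ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy labels, chosen only to show the two averages diverging.
y_true = ["news", "news", "news", "opinion", "sports"]
y_pred = ["news", "news", "opinion", "opinion", "news"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case the single "sports" document is never predicted correctly, which pulls the macro average well below the micro average; that sensitivity to small classes is why reporting both can be informative.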
5	Genre classification using syntactic features Brigadoi, Ivan January 2021 This thesis addresses text classification in relation to genre identification using different feature sets, with a focus on syntactic-based features. We built our models using traditional machine learning algorithms, i.e. Naive Bayes, K-nearest neighbour, Support Vector Machine and Random Forest, in order to predict the literary genre of books. We trained our models using bag-of-words (BOW), bigrams, syntactic-based bigrams and emotional features as feature sets, as well as combinations of features. On the test set, the best features, i.e. BOW combined with bigrams based on syntactic relations between words, improved the F1-score by 2% over the baseline using BOW features, which indicates a positive impact of syntactic information on the task of text classification. genre classification text classification machine learning
6	Klasifikace žánrů pomocí strojového učení / Genres classification by means of machine learning Bílek, Jan January 2018 In this thesis, we compare the bag of words approach with doc2vec document embeddings on the task of classification of book genres. We create 3 datasets with different text lengths by extracting short snippets from books in the Project Gutenberg repository. Each dataset comprises more than 200000 documents and 14 different genres. For 3200-character documents, we achieve an F1-score of 0.862 when stacking models trained on both bag of words and doc2vec representations. We also explore the relationships between documents, genres and words using similarity metrics on their vector representations and report typical words for each genre. As part of the thesis, we also present an online webapp for book genre classification.
7	Web genre classification using feature selection and semi-supervised learning Chetry, Roshan January 1900 Master of Science / Department of Computing and Information Sciences / Doina Caragea / As web pages continuously change and their number grows exponentially, the need for genre classification of web pages also increases. One simple reason is the need to group web pages into genre categories in order to reduce the complexity of various web tasks (e.g., search). Experts unanimously agree on the huge potential of genre classification of web pages. However, while everybody agrees that genre classification of web pages is necessary, researchers face problems in finding enough labeled data to perform supervised classification of web pages into various genres. The high cost of skilled manual labor, the rapidly changing nature of the web and the never-ending growth in the number of web pages are the main reasons for the limited amount of labeled data. In contrast, unlabeled data can be acquired relatively inexpensively. This suggests using semi-supervised learning approaches for genre classification instead of supervised approaches. Semi-supervised learning makes use of both labeled and unlabeled data for training, typically a small amount of labeled data and a large amount of unlabeled data. Semi-supervised learning has been extensively used in text classification problems. Given the link structure of the web, web-page classification can use link features in addition to the content features that are used for general text classification. Hence, the feature set corresponding to web pages can be easily divided into two views, namely content-based and link-based feature views. Intuitively, the two feature views are conditionally independent given the genre category and have the ability to predict the class on their own. 
The scarcity of labeled data, the availability of large amounts of unlabeled data, and a richer set of features than in conventional text classification tasks (specifically, complementary and sufficient views of features) have encouraged us to use co-training as a tool to perform semi-supervised learning. During co-training, labeled examples represented using the two views are used to learn distinct classifiers, which keep improving at each iteration by sharing their most confident predictions on the unlabeled data. In this work, we use co-training to classify web pages of the .eu domain, consisting of 1232 labeled hosts and 20000 unlabeled hosts (provided by the European Archive Foundation [Benczur et al., 2010]), into six different genres. We compare our results with the results produced by standard supervised methods. We find that co-training can be an effective and cheap alternative to costly supervised learning. This is mainly due to the two independent and complementary feature sets of the web: content-based features and link-based features. Web genre classification Co-training Semi-supervised learning Feature selection Roshan Chetry Computer Science (0984) Information Technology (0489) Web Studies (0646)
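The co-training loop described in the abstract above (two feature views, each classifier sharing its most confident predictions on unlabeled data) can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis implementation: the nearest-centroid classifier, the helper names `fit_centroids`, `predict` and `co_train`, the two-dimensional "content" and "link" vectors, and the genre labels are all hypothetical stand-ins.

```python
def fit_centroids(vectors, labels):
    """Toy nearest-centroid classifier: one mean vector per label."""
    groups = {}
    for vec, label in zip(vectors, labels):
        groups.setdefault(label, []).append(vec)
    return {label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
            for label, vecs in groups.items()}


def predict(model, vec):
    """Return (label, confidence); confidence is the squared-distance margin
    between the closest and the second-closest centroid."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(centroid, vec)), label)
                   for label, centroid in model.items())
    return dists[0][1], dists[1][0] - dists[0][0]


def co_train(labeled, unlabeled, rounds=3, per_round=2):
    """labeled: (content_vec, link_vec, label) triples; unlabeled: (content_vec, link_vec) pairs."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):  # 0 = content view, 1 = link view
            if not pool:
                break
            model = fit_centroids([ex[view] for ex in labeled],
                                  [ex[2] for ex in labeled])
            # Rank the unlabeled pool by this view's confidence, highest first.
            ranked = sorted(pool, key=lambda ex: -predict(model, ex[view])[1])
            # The most confident predictions join the shared labeled set,
            # so the other view's classifier can learn from them next.
            for ex in ranked[:per_round]:
                labeled.append((ex[0], ex[1], predict(model, ex[view])[0]))
            pool = ranked[per_round:]
    content_model = fit_centroids([ex[0] for ex in labeled], [ex[2] for ex in labeled])
    link_model = fit_centroids([ex[1] for ex in labeled], [ex[2] for ex in labeled])
    return content_model, link_model, labeled


# Hypothetical toy data: two seed examples and four unlabeled hosts.
seed = [((1.0, 0.0), (0.9, 0.1), "shop"), ((0.0, 1.0), (0.1, 0.9), "blog")]
unlabeled = [((0.9, 0.2), (0.8, 0.3)), ((0.2, 0.8), (0.3, 0.7)),
             ((0.8, 0.1), (0.6, 0.4)), ((0.1, 0.9), (0.4, 0.6))]
content_model, link_model, final_labeled = co_train(seed, unlabeled)
print(predict(content_model, (0.7, 0.3))[0], predict(link_model, (0.2, 0.8))[0])
```

The sketch requires at least two classes in the seed set, since the confidence margin needs a second-closest centroid. A real system would replace the centroid classifier with stronger per-view learners and would typically cap how many examples are added per class in each round.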
8	Classificação automática de gênero musical baseada em entropia e fractais / Automatic music genre classification based on entropy and fractals Goulart, Antonio José Homsi 16 February 2012 A classificação automática de gênero musical tem como finalidade o conforto de ouvintes de músicas auxiliando no gerenciamento das coleções de músicas digitais. Existem sistemas que se baseiam em cabeçalhos de metadados (tais como nome de artista, gênero cadastrado, etc.) e também os que extraem parâmetros dos arquivos de música para a realização da tarefa. Enquanto a maioria dos trabalhos do segundo tipo utilizam-se do conteúdo rítmico e tímbrico, este utiliza-se apenas de conceitos da teoria da informação e da geometria de fractais. Entropia, lacunaridade e dimensão do fractal são os parâmetros que treinam os classificadores. Os testes foram realizados com duas coleções criadas para este trabalho e os resultados foram proeminentes. / The goal of automatic music genre classification is to give music listeners ease and comfort when managing digital music databases. Some systems are based on metadata tags (such as artist name, registered genre, etc.), while others extract characteristics from the music files to complete the task. While the majority of works of the second type analyse rhythmic, timbral and pitch content, this one explores only information-theoretic and fractal geometry concepts. Entropy, fractal dimension and lacunarity are the parameters adopted to train the classifiers. Tests were carried out on two databases assembled by the author. The results were notable. Automatic music genre classification Entropia baseada em wavelet GMM Lacunaridade Lacunarity SVM Wavelet based entropy
9	Identificação de padrões de sinais acústicos com base em classificação paraconsistente / Identification of acoustic signal patterns based on paraconsistent classification Paulo, Katia Cristina Silva 20 September 2016 Com o uso de um conceito ainda não explorado para fins de classificação de dados, baseado em Lógica Paraconsistente Anotada (LPA), este trabalho visa à construção de um sistema inteligente para classificação de gêneros musicais (Music Genre Classification - MGC). Este tema, de caráter emergente na literatura, tem recebido atenção crescente da comunidade científica, tendo em vista a sua grande aplicabilidade, destacando-se o potencial de comercialização de dados multimídia pela Internet, assim como a automatização de inúmeras tarefas de data mining que envolvem sinais musicais. Utilizando uma base de dados composta por amostras de músicas representativas de cada gênero musical, tais como jazz, bolero, bossa nova, forró, salsa e sertanejo, assim como de um classificador discriminativo paraconsistente, uma abordagem supervisionada é proposta para solucionar o problema. O primeiro módulo do sistema realiza a extração de características dos diversos segmentos das músicas com base na análise tempo-frequência associada com as bandas críticas do ouvido humano. Por outro lado, o segundo módulo utiliza o classificador proposto, que deve permitir a manipulação de sinais com características contraditórias de uma maneira mais semelhante àquela realizada pelo cérebro humano. Os resultados, quando comparados com as abordagens pré-existentes para MGC, demonstram a viabilidade do uso da LPA para tal fim. Além disso, caracteriza-se neste trabalho, uma contribuição original ao estado-da-arte no tema, que consiste justamente no uso da LPA para MGC, procedimento para o qual inexiste descrição na literatura até este momento. 
/ By using a concept that is based on Annotated Paraconsistent Logic (LPA) and has not yet been applied to data classification, this work aims at constructing an intelligent system for Music Genre Classification (MGC). This emerging topic has received increasing attention from the scientific community due to its broad applicability, notably the potential for commercializing multimedia content on the Internet and the automation of numerous data mining tasks involving music signals. Using a database of song samples representing different music genres, such as jazz, bolero, bossa nova, forró, salsa and sertanejo, together with a discriminative paraconsistent classifier, a supervised approach is proposed to solve the problem. The system is divided into two modules. The first extracts features from segments of the music files, based on time-frequency analysis associated with the critical bands of the human ear. The second implements the proposed classifier, which handles signals with contradictory characteristics in a way that more closely resembles the human brain. The results, when compared with existing approaches to MGC, demonstrate the viability of LPA for this purpose. Additionally, the original contribution of this work to the state of the art is precisely the use of LPA for MGC, a procedure not previously described in the literature. Artificial intelligence Automatic music genre classification Inteligência artificial Lógica Paraconsistente Paraconsistent logic
10	Outomatiese genreklassifikasie vir hulpbronskaars tale / Dirk Snyman Snyman, Dirk Petrus January 2012 When working in the terrain of text processing, metadata about a particular text plays an important role. Metadata is often generated using automatic text classification systems, which classify a text into one or more predefined classes or categories based on its contents. One of the dimensions by which a text can be classified is its genre. In this study the development of an automatic genre classification system in a resource scarce environment is postulated. This study aims to: i) investigate the techniques and approaches that are generally used for automatic genre classification systems, and identify the best approach for Afrikaans (a resource scarce language), ii) transfer this approach to other indigenous South African resource scarce languages, and iii) investigate the effectiveness of technology recycling for closely related languages in a resource scarce environment. To achieve the first goal, five machine learning approaches that are generally used for text classification were identified from the literature, together with five common approaches to feature extraction. Two different approaches to the identification of genre classes are presented. The machine learning, feature extraction and genre class identification approaches were used in a series of experiments to identify the best approach for genre classification for a resource scarce language. The best combination is identified as the multinomial naïve Bayes algorithm, using a bag of words approach as features to classify texts into three abstract classes. This results in an f-score (performance measure) of 0.929, and it was subsequently shown that this approach can be successfully applied to other indigenous South African languages. 
To investigate the viability of technology recycling for genre classification systems for closely related languages, Dutch test data was classified using an Afrikaans genre classification system and it is shown that this approach works well. A pre-processing step was implemented by using a machine translation system to increase the compatibility between Afrikaans and Dutch by translating the Dutch texts before classification. This results in an f-score of 0.577, indicating that technology recycling between closely related languages has merit. This approach can be used to promote and fast track the development of genre classification systems in a resource scarce environment. / MA (Linguistics and Literary Theory), North-West University, Potchefstroom Campus, 2013 Genre classification Resource scarce languages Machine learning Technology recycling Human language technology Natural language processing Genreklassifikasie Hulpbronskaars tale Masjienleer Tegnologieherwinning Mensetaaltegnologie Natuurliketaalprosessering
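The best-performing combination reported in the abstract above, multinomial naïve Bayes over bag-of-words features, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names `train_mnb` and `predict_mnb`, the add-one (Laplace) smoothing, the whitespace tokenizer, and the English toy documents and labels are hypothetical and do not come from the study.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Train multinomial naive Bayes on whitespace-tokenized bag-of-words counts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    model = {}
    for label, n_docs in class_docs.items():
        # Additive smoothing (alpha) keeps unseen class/word pairs from zeroing the score.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_lik = {w: math.log((word_counts[label][w] + alpha) / total) for w in vocab}
        model[label] = (log_prior, log_lik)
    return model


def predict_mnb(model, doc):
    """Return the label with the highest log-posterior; out-of-vocabulary tokens are skipped."""
    tokens = doc.lower().split()

    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)

    return max(model, key=score)


# Hypothetical toy corpus with two genre classes.
docs = [
    "the minister announced the new budget today",
    "parliament voted on the budget law",
    "once upon a time a princess lived in a castle",
    "the dragon guarded the castle and the princess",
]
labels = ["news", "news", "fiction", "fiction"]
model = train_mnb(docs, labels)
print(predict_mnb(model, "the princess and the dragon"))
print(predict_mnb(model, "parliament announced the budget"))
```

Because the features are plain word counts, a model like this depends heavily on shared vocabulary, which is consistent with the abstract's idea of translating Dutch texts into Afrikaans before applying the Afrikaans classifier.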
