Global ETD Search

11	Automatic Document Classification Applied to Swedish News Blein, Florent January 2005 The first part of this paper briefly presents the ELIN[1] system, an electronic newspaper project. ELIN is a framework that stores news articles and displays them to the end user. The articles are formatted in XML[2]. The project partner Corren[3] provided ELIN with XML articles, but in a different format. My first task was to develop software that converts the news from one XML format (Corren) to another (ELIN). The second and main part addresses the problem of automatic document classification and tries to find a solution for a specific issue. The goal is to automatically classify news articles from a Swedish newspaper company (Corren) into the IPTC[4] news categories. This work was carried out by implementing several classification algorithms, testing them, and comparing their accuracy with existing software. The training and test documents were 3 weeks of the Corren newspaper that had to be classified into 2 categories. The last tests were run with only one algorithm (Naïve Bayes) over a larger amount of data (7, then 10 weeks) and more categories (12) to simulate a more realistic environment. The results show that the Naïve Bayes algorithm, although the oldest, was the most accurate in this particular case. An issue raised by the results is that feature selection improves speed but can, in rare cases, reduce accuracy by removing too many features. ELIN automatic text classification Naïve Bayes network Rocchio
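As a rough illustration of the kind of classifier the abstract above names, the following is a minimal sketch of multinomial Naïve Bayes with add-one smoothing. It is not the thesis's implementation: the function names `train_nb` and `predict_nb`, the toy documents, and the two labels are all assumptions made for this example, and the labels are not IPTC categories.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration only.
train = [
    ("match goal team win".split(), "sport"),
    ("team coach season".split(), "sport"),
    ("election vote parliament".split(), "politics"),
    ("minister vote budget".split(), "politics"),
]
model = train_nb(train)
print(predict_nb(model, "team goal".split()))
```

Feature selection, as discussed in the abstract, would correspond here to shrinking `vocab` before prediction; removing too many words leaves each class with little evidence to score.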
12	Using sentence-level classification to predict sentiment at the document-level Hutton, Amanda Rachel 21 August 2012 This report explores various aspects of sentiment mining. The two research goals for the report were: (1) to determine useful methods for increasing recall of negative sentences and (2) to determine the best method for applying sentence-level classification to the document level. The methods in this report were applied to the Movie Reviews corpus at both the document and sentence level. The basic approach was to first identify polar and neutral sentences within the text and then classify the polar sentences as either positive or negative. The Maximum Entropy classifier was used as the baseline system on which further methods were explored. Part-of-speech tagging was evaluated to determine whether its inclusion increased recall of negative sentences. It was also used to aid in the handling of negations at the sentence level. Smoothing was investigated, and various metrics to describe the sentiment composition were explored to address goal (2). Negative recall was shown to increase with the adjustment of the classification threshold and was also seen to increase through the methods used to address goal (2). Overall, classifying at the sentence level using bigrams and a cutoff value of one was observed to result in the highest evaluation scores. Sentiment mining Sentence-level classification Text classification Recall
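One simple way to lift sentence-level labels to a document-level label, in the spirit of the abstract above, is to look at the share of positive sentences among the polar ones and compare it to a threshold. This is only a sketch under stated assumptions: the function name `document_sentiment`, the label strings, and the share-based rule are hypothetical and are not the metrics used in the report.

```python
def document_sentiment(sentence_labels, threshold=0.5):
    """Aggregate sentence labels ("pos", "neg", "neutral") into one document label.

    Neutral sentences are dropped first; the document is "pos" when the share
    of positive sentences among the polar ones reaches the threshold.
    """
    polar = [s for s in sentence_labels if s in ("pos", "neg")]
    if not polar:
        return "neutral"  # no polar sentence to decide on
    pos_share = polar.count("pos") / len(polar)
    return "pos" if pos_share >= threshold else "neg"

print(document_sentiment(["pos", "neutral", "neg", "neg"]))
print(document_sentiment(["pos", "neg"], threshold=0.6))
```

Raising `threshold` makes the rule label more documents as negative, which is one way a threshold adjustment can trade positive recall for negative recall.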
13	Time to Open the Black Box : Explaining the Predictions of Text Classification Löfström, Helena January 2018 The purpose of this thesis has been to evaluate whether a new instance-based explanation method, called Automatic Instance Text Classification Explanator (AITCE), could provide researchers with insights about the predictions of automatic text classification and decision support about documents requiring human classification. This would make it possible for researchers who normally use manual classification to save time and money in their research while maintaining quality. In the study, AITCE was implemented and applied to the predictions of a black box classifier. The evaluation was performed at two levels: at instance level, where a group of 3 senior researchers who use human classification in their research evaluated the results from AITCE from an expert view; and at model level, where a group of 24 non-experts evaluated the characteristics of the classes. The evaluations indicate that AITCE produces insights about which words most strongly affect the prediction. The research also suggests that the quality of an automatic text classification may increase through an interaction between the user and the classifier in situations with unsure predictions. Text Classification Explanation Methods Machine Learning Information Studies Biblioteks- och informationsvetenskap
14	AI Approaches for Classification and Attribute Extraction in Text Magnusson, Ludvig, Rovala, Johan January 2017 As the amount of data online grows, the urge to use this data for different applications grows as well. Machine learning can be used with the intent to reconstruct and validate the data you are interested in. Although the problem is very domain specific, this report will attempt to shed some light on what we call strategies for classification, which in broad terms means a set of steps in a process where the end goal is to have classified some part of the original data. As a result, we hope to introduce clarity into the classification process in detail as well as from a broader perspective. The report will investigate two classification objectives, one of which is dependent on many variables found in the input data and one that is more literal and only dependent on one or two variables. Specifically, the data we will classify are sales-objects. Each sales-object has a text describing the object and a related image. We will attempt to place these sales-objects into the correct product category. We will also try to derive the year of creation and its dimensions, such as height and width. Different approaches are presented in the aforementioned strategies in order to classify such attributes. The results showed that for broader attributes such as a product category, supervised learning is indeed an appropriate approach, while the same cannot be said for narrower attributes, which instead had to rely on entity recognition. Experiments on image analytics in conjunction with supervised learning proved image analytics to be a good addition when requiring a higher precision score. text classification feature extraction machine learning scikit Software Engineering Programvaruteknik
15	State-of-Mind Classification From Unstructured Texts Using Statistical Features and Lexical Network Features Bayram, Ulya 01 October 2019 No description available. Computer Science Machine Learning Text Classification Suicide Party affiliation
16	Information Retrieval for Call Center Quality Assurance McMurtry, William F. 02 October 2020 No description available. Computer Science Automatic Speech Recognition Information Retrieval Text Classification
17	Information and Representation Tradeoffs in Document Classification Jin, Timothy 23 May 2022 No description available. Computer Science natural language processing text classification document classification
18	Document Classification using Characteristic Signatures Mondal, Abhro Jyoti January 2017 No description available. Computer Science Text classification Template matching Signatures Information retrieval
19	Short Text Classification in Twitter to Improve Information Filtering Sriram, Bharath 03 September 2010 No description available. Computer Science Text Classification Twitter Short Text Information filtering
20	Deep Active Learning for Short-Text Classification / Aktiv inlärning i djupa nätverk för klassificering av korta texter Zhao, Wenquan January 2017 In this paper, we propose a novel active learning algorithm for short-text (Chinese) classification applied to a deep learning architecture. This topic thus belongs to a cross-research area between active learning and deep learning. One of the bottlenecks of deep learning for classification is that it relies on a large number of labeled samples, which are expensive and time-consuming to obtain. Active learning aims to overcome this disadvantage by asking the most useful queries in the form of unlabeled samples to be labeled. In other words, active learning intends to achieve precise classification accuracy using as few labeled samples as possible. Such ideas have been investigated in conventional machine learning algorithms, such as support vector machines (SVM) for image classification, and in deep neural networks, including convolutional neural networks (CNN) and deep belief networks (DBN) for image classification. Yet research on combining active learning with recurrent neural networks (RNNs) for short-text classification is rare. We demonstrate results for short-text classification on datasets from Zhuiyi Inc. Importantly, to achieve better classification accuracy with less computational overhead, the proposed algorithm shows large reductions in the number of labeled training samples compared to random sampling. Moreover, the proposed algorithm is slightly better than the conventional sampling method, uncertainty sampling. The proposed active learning algorithm dramatically decreases the number of labeled samples without significantly influencing the test classification accuracy of the original RNN classifier, trained on the whole data set. In some cases, the proposed algorithm even achieves better classification accuracy than the original RNN classifier.
/ I detta arbete studerar vi en ny aktiv inlärningsalgoritm som appliceras på en djup inlärningsarkitektur för klassificering av korta (kinesiska) texter. Ämnesområdet hör därmed till ett ämnesöverskridande område mellan aktiv inlärning och inlärning i djupa nätverk. En av flaskhalsarna i djupa nätverk när de används för klassificering är att de beror av tillgången på många klassificerade datapunkter. Dessa är dyra och tidskrävande att skapa. Aktiv inlärning syftar till att överkomma denna typ av nackdel genom att generera frågor rörande de mest informativa oklassade datapunkterna och få dessa klassificerade. Aktiv inlärning syftar med andra ord till att uppnå bästa klassificeringsprestanda med användandet av så få klassificerade datapunkter som möjligt. Denna idé har studerats inom konventionell maskininlärning, som t.ex. supportvektormaskinen (SVM) för bildklassificering samt inom djupa neuronnätverk inkluderande bl.a. convolutional networks (CNN) och djupa belief networks (DBN) för bildklassificering. Emellertid är kombinationen av aktiv inlärning och rekurrenta nätverk (RNNs) för klassificering av korta texter sällsynt. Vi demonstrerar här resultat för klassificering av korta texter ur en databas från Zhuiyi Inc. Att notera är att för att uppnå bättre klassificeringsnoggrannhet med lägre beräkningsarbete (overhead) så uppvisar den föreslagna algoritmen stora minskningar i det antal klassificerade träningspunkter som behövs jämfört med användandet av slumpvisa datapunkter. Vidare, den föreslagna algoritmen är något bättre än den konventionella urvalsmetoden, osäkerhetsurval (uncertainty sampling). Den föreslagna aktiva inlärningsalgoritmen minskar dramatiskt den mängd klassificerade datapunkter utan att signifikant påverka klassificeringsnoggrannheten hos den ursprungliga RNN-klassificeraren när den tränats på hela datamängden. För några fall uppnår den föreslagna algoritmen t.o.m. bättre klassificeringsnoggrannhet än denna ursprungliga RNN-klassificerare.
Active Learning Deep Learning Text Classification Computer Sciences Datavetenskap (datalogi)
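The abstract of entry 20 compares its proposed algorithm against uncertainty sampling, the conventional active learning baseline. A minimal sketch of that baseline in its least-confidence form is given here; it is not the thesis's proposed algorithm, and the function name `least_confident` and the example probabilities are assumptions made for illustration.

```python
def least_confident(probabilities, k):
    """Pick the k unlabeled samples whose top predicted class is least certain.

    probabilities: one list of class probabilities per unlabeled sample,
    as produced by any classifier. Returns sample indices, most uncertain first.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]

# Hypothetical classifier outputs for four unlabeled short texts.
pool = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
print(least_confident(pool, 2))
```

In an active learning loop, the selected samples would be sent for labeling, added to the training set, and the classifier retrained before the next selection round.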
