Global ETD Search

1	Technical Term Extraction Using Measures of Neology / Facktermsdetektering medelst neologiska kriteria Norman, Christopher January 2016 (has links) This study aims to show that frequency of occurrence over time for technical terms differs from general language terms in the sense that technical terms are strongly biased to be recent occurrences, and that this difference can be exploited for the automatic identification and extraction of technical terms from text. To this end, we propose two features extracted from temporally labelled datasets designed to capture surface level n-gram neology. The analysis shows that these features, calculated over consecutive bigrams, are highly indicative of technical terms, which suggests that technical terms are strongly biased to be surface level neologisms. Finally, we implement a technical term extractor using the proposed features and compare its performance against a number of baselines. / Detta arbete ämnar visa att den tidsberoende frekvensen för facktermer skiljer sig från motsvarande frekvens för termer i vardagligt språk, i det avseendet att facktermer med hög sannolikhet är lingvistiska nybildningar, samt att denna iaktagelse kan nyttjas i syfte att automatiskt identifiera och extrahera facktermer i löptext. I detta syfte introducerar vi två särdrag extraherade från kronologiskt annoterade datamängder avsedda att fånga nybildningar av förekommande n-gram. Analysen visar att dessa särdrag, beräknade över konsekutiva bigram, är starkt indikativa för facktermer, vilket antyder att facktermer har en starkt tendens att vara nybildningar. Slutligtvis implementerar vi en facktermsextraktor baserad på dessa särdrag och jämför dess prestanda med ett antal referenssärdrag. Read more technical term extraction keyphrase extraction terminology facktermsdetektering nyckelordsextraktion terminologi Computer Sciences Datavetenskap (datalogi)
2	Keyword Extraction from Swedish Court Documents / Extraktion av nyckelord från svenska rättsdokument Grosz, Sandra January 2020 (has links) This thesis addresses the problem of extracting keywords which represent the rulings and and grounds for the rulings in Swedish court documents. The problem of identifying the candidate keywords was divided into two steps; first preprocessing the documents and second extracting keywords using a keyword extraction algorithm on the preprocessed documents. The preprocessing methods used in conjunction with the keywords extraction algorithms were that of using stop words and a stemmer. Then, three different approaches for extracting keywords were used; one statistic approach, one machine learning approach and lastly one graph-based approach. The three different approaches used to extract keywords were then evaluated to measure the quality of the keywords and the rejection rate of keywords which were not of a high enough quality. Out of the three approaches implemented and evaluated the results indicated that the graph-based approach showed the most promise. However, the results also showed that neither of the three approaches had a high enough accuracy to be used without human supervision. / Detta examensarbete behandlar problemet om att extrahera nyckelord som representerar domslut och domskäl ur svenska rättsdokument. Problemet med att identifiera möjliga nyckelord delades upp i två steg; det första steget är att använda förbehandlingsmetoder och det andra steget att extrahera nyckelord genom att använda en algoritm för nyckelordsextraktion. Förbehandlingsmetoderna som användes tillsammans med nyckelordsextraktionsalgoritmerna var stoppord samt avstammare. Sedan användes tre olika metoder för att extrahera nyckelord; en statistisk, en maskininlärningsbaserad och slutligen en grafbaserad. De tre metoderna för att extrahera nyckelord blev sedan evaluerade för att kunna mäta kvaliteten på nyckelorden samt i vilken grad nyckelord som inte var av tillräckligt hög kvalitet förkastades. Av de tre implementerade och evaluerade tillvägagångssätten visade den grafbaserade metoden mest lovande resultat. Däremot visade resultaten även att ingen av de tre metoderna hade en tillräckligt hög riktighet för att kunna användas utan mänsklig övervakning. Read more Keywords extraction Information Retrieval Natural Language Processing nyckelordsextraktion informationssökning naturligt språkbehandling. Computer and Information Sciences Data- och informationsvetenskap
3	Automatisk extraktion av nyckelord ur ett kundforum / Automatic keyword extraction from a customer forum Ekman, Sara January 2018 (has links) Konversationerna i ett kundforum rör sig över olika ämnen och språket är inkonsekvent. Texterna uppfyller inte de krav som brukar ställas på material inför automatisk nyckelordsextraktion. Uppsatsens undersöker hur nyckelord automatiskt kan extraheras ur ett kundforum trots dessa svårigheter. Fokus i undersökningen ligger på tre aspekter av nyckelordsextraktion. Den första faktorn rör hur den etablerade nyckelordsextraktionsmetoden TFIDF presterar jämfört med fyra metoder som skapas med hänsyn till materialets ovanliga struktur. Nästa faktor som testas är om olika sätt att räkna ordfrekvens påverkar resultatet. Den tredje faktorn är hur metoderna presterar om de endast använder inläggen, rubrikerna eller båda texttyperna i sina extraktioner. Icke-parametriska test användes för utvärdering av extraktionerna. Ett antal Friedmans test visar att metoderna i några fall skiljer sig åt gällande förmåga att identifiera relevanta nyckelord. I post-hoc-test mellan de högst presterande metoderna ses en av de nya metoderna i ett fall prestera signifikant bättre än de andra nya metoderna men inte bättre än TFIDF. Ingen skillnad hittades mellan användning av olika texttyper eller sätt att räkna ordfrekvens. För framtida forskning rekommenderas reliabilitetstest av manuellt annoterade nyckelord. Ett större stickprov bör användas än det i aktuell studie och olika förslag ges för att förbättra rättning av extraherade nyckelord. / Conversations in a customer forum span across different topics and the language is inconsistent. The text type do not meet the demands for automatic keyword extraction. This essay examines how keywords can be automatically extracted despite these difficulties. Focus in the study are three areas of keyword extraction. The first factor regards how the established keyword extraction method TFIDF performs compared to four methods created with the unusual material in mind. The next factor deals with different ways to calculate word frequency. The third factor regards if the methods use only posts, only titles, or both in their extractions. Non-parametric tests were conducted to evaluate the extractions. A number of Friedman's tests shows the methods in some cases differ in their ability to identify relevant keywords. In post-hoc tests performed between the highest performing methods, one of the new methods perform significantly better than the other new methods but not better than TFIDF. No difference was found between the use of different text types or ways to calculate word frequency. For future research reliability test of manually annotated keywords is recommended. A larger sample size should be used than in the current study and further suggestions are given to improve the results of keyword extractions. Read more Automatic keyword extraction Information extraction Noisy text TFIDF User generated text Användargenererad text Automatisk nyckelordsextraktion Brusig text Informationsextraktion TFIDF General Language Studies and Linguistics

1

Page generated in 0.0889 seconds