891

Trump sentiment: Dopad novinek v médiích na finanční trh Spojených států / The Trump Sentiment: The Effect of News on the US Stock Market

Pinteková, Aneta January 2019 (has links)
This thesis examines how the American economy is affected by the market sentiment that arises from news about the actions and decisions of the American President Donald Trump. The news articles are obtained from Reuters for the period between 1 May and 30 November 2018, from which a sentiment variable is created using natural language processing methods. Firstly, the impact of Trump sentiment on the returns on the S&P 500 Index is examined. The results show a positive and statistically significant impact of the previous day's sentiment on today's S&P 500 Index return. A statistically significant effect of the sentiment from a week earlier is also found; however, this effect is negative. This result indicates an initial overreaction to new information, followed by a subsequent market correction to the mean. Such a result is consistent with findings in the field of behavioural finance, which incorporates the idea that investor psychology is involved in investment decision making. Secondly, the impact of the news sentiment on the performance of individual sectors of the American economy, as measured by the returns on S&P 500 sector indices, is analysed. A statistically significant effect of sentiment on sector index return is found in the case of Consumer...
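The lag structure described above can be sketched, under toy assumptions, as a simple regression of today's index return on yesterday's sentiment score. The data and the closed-form OLS helper below are purely illustrative, not the thesis's actual series or model.

```python
# Hypothetical sketch of a lag-1 sentiment regression; toy numbers only.

def ols_slope(x, y):
    """Closed-form slope of a simple linear regression y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

# Invented daily sentiment scores and index returns (percent).
sentiment = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2]
returns   = [0.1,  0.3, -0.1, 0.5, 0.1,  0.4]

# Regress today's return on yesterday's sentiment (lag 1).
lagged_sentiment = sentiment[:-1]
next_day_returns = returns[1:]
beta = ols_slope(lagged_sentiment, next_day_returns)
```

A positive `beta` here would correspond to the reported positive next-day effect; testing the week-ago effect would simply use a lag of five trading days instead of one.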
892

Extracting Particular Information from Swedish Public Procurement Using Machine Learning

Waade, Eystein January 2020 (has links)
The Swedish procurement process has a yearly value of 706 billion SEK across approximately 18,000 procurements. Each process comes with many documents, written in different formats, that must be understood in order to submit a tender. With the development of new technology and the age of machine learning, it is of great interest to investigate how this knowledge can enhance the way we procure. The goal of this project was to investigate whether public procurements written in Swedish in PDF format can be parsed and segmented into a structured format. This process was divided into three parts: pre-processing, annotation, and training/evaluation. The pre-processing was accomplished using an open-source PDF parser called pdfalto that produces structured XML files with layout and lexical information. The annotation process consisted of generalizing a procurement into high-level segments that are applicable to different document structures, as well as finding relevant features. This was accomplished by identifying frequent document formats so that many documents could be annotated using deterministic rules. Finally, a linear-chain Conditional Random Field was trained and tested to segment the documents. The models showed high performance when tested on documents of the same format as they were trained on. However, the data from five different document formats were not sufficient or general enough for the model to make reliable predictions on a sixth format it had not seen before. The best result was a total accuracy of 90.6%, where two of the labels had an F1-score above 95% and the other two labels had F1-scores of 51.8% and 63.3%.
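The deterministic annotation rules mentioned above might look roughly like the following sketch. The labels (`HEADING`, `REQUIREMENT`, etc.) and the patterns are assumptions for illustration, not the thesis's actual tag set or rules.

```python
# Illustrative rule-based labeling of parsed document lines into segments.
import re

def label_line(line):
    """Assign a segment label to one text line using deterministic rules."""
    if re.match(r"^\d+(\.\d+)*\s+\S", line):           # e.g. "2.1 Krav ..."
        return "HEADING"
    if re.search(r"\bska\b|\bskall\b", line.lower()):  # Swedish "shall"
        return "REQUIREMENT"
    if not line.strip():
        return "BLANK"
    return "BODY"

lines = [
    "1 Allmän information",
    "Upphandlingen omfattar ramavtal.",
    "Leverantören ska vara registrerad för F-skatt.",
    "",
]
labels = [label_line(l) for l in lines]
```

Lines labeled this way could then serve as (noisy) training data for the linear-chain CRF, which learns to generalize beyond any single document format.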
893

Evaluation of text classification techniques for log file classification / Utvärdering av textklassificeringstekniker för klassificering avloggfiler

Olin, Per January 2020 (has links)
System log files are filled with logged events, status codes, and other messages. By analyzing the log files, the system's current state can be determined and any faults during execution identified. Log file analysis has been studied for some time, and recent studies have shown state-of-the-art performance using machine learning techniques. In this thesis, document classification solutions were tested on log files in order to distinguish regular system runs from abnormal system runs. To solve this task, supervised and unsupervised learning methods were combined: Doc2Vec was used to extract document features, and Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based architectures were applied to the classification task. Using these machine learning models and preprocessing techniques, the tested models yielded an F1-score and accuracy above 95% when classifying log files.
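The reported metrics can be computed as in this minimal sketch of accuracy and F1-score for a binary normal-vs-abnormal classifier; the labels below are invented.

```python
# Stdlib-only accuracy and F1-score for binary classification; toy labels.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(p == positive and t != positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# 1 = abnormal run, 0 = normal run
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```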
894

Vícejazyčná syntéza řeči / Multilingual speech synthesis

Nekvinda, Tomáš January 2020 (has links)
This work explores multilingual speech synthesis. We compare three models based on Tacotron that utilize various levels of parameter sharing. Two of them follow recent multilingual text-to-speech systems. The first makes use of a fully shared encoder and an adversarial classifier that removes speaker-dependent information from the encoder. The other uses language-specific encoders. We introduce a new approach that combines the best of both previous methods. It enables effective parameter sharing using a meta-learning technique, preserves the encoder's flexibility, and actively removes speaker-specific information in the encoder. We compare the three models on two tasks. The first aims at joint multilingual training on ten languages and reveals their knowledge-sharing abilities. The second concerns code-switching. We show that our model effectively shares information across languages and, according to a subjective evaluation test, produces more natural and accurate code-switching speech.
895

Bipolarizace společnosti? Analýza debaty o imigraci v online médiích / Bipolarization of the society? Analysing online media debate on immigration

Hrdina, Jakub January 2020 (has links)
This thesis explores the UK's media environment and aims to determine whether the media showed bipolar trends in reporting about immigration during the infamous EU immigration crisis. It utilizes a natural language processing AI to assess a dataset from five major publishers in the UK - The Sun, The Daily Mail, The Guardian, The Independent and The Daily Telegraph - across the years 2015, 2016 and 2017. The focus of the analysis is the dynamics of the media space in general as well as the specifics of reporting by individual publishers. The applied method is a novel approach to the quantitative assessment of qualitative aspects of texts, and the thesis serves as an example of successful utilization of such an approach. In comparison to previous research conducted on similar topics, the main benefits of the AI include the ability to assess huge datasets, guaranteed consistency, and considerable innovative potential. Being able to analyze a dataset of 1813 articles quantitatively, the method is as important as the study itself.
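One crude way to operationalize "bipolar trends", in the spirit of the analysis above, is to compare mean article-level sentiment per publisher. The sketch below uses the publisher names from the thesis but entirely invented sentiment scores.

```python
# Toy per-publisher sentiment aggregation; scores are invented.
from collections import defaultdict

articles = [
    ("The Sun", -0.6), ("The Sun", -0.4),
    ("The Guardian", 0.5), ("The Guardian", 0.3),
    ("The Daily Mail", -0.5), ("The Independent", 0.4),
]

totals = defaultdict(list)
for publisher, score in articles:
    totals[publisher].append(score)

mean_sentiment = {p: sum(s) / len(s) for p, s in totals.items()}

# A large spread between the most negative and most positive publisher mean
# would be one crude indicator of bipolarization.
spread = max(mean_sentiment.values()) - min(mean_sentiment.values())
```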
896

Exploring State-of-the-Art Natural Language Processing Models with Regards to Matching Job Adverts and Resumes

Rückert, Lise, Sjögren, Henry January 2022 (has links)
The ability to automate the process of comparing and matching resumes with job adverts is a growing research field. This can be done using Natural Language Processing (NLP), the machine learning area that enables a model to learn human language. This thesis explores and evaluates the application of the state-of-the-art NLP model SBERT to the task of comparing and calculating a measure of similarity between text extracted from resumes and adverts. It also investigates what type of data generates the best-performing model on this task. The results show that SBERT can quickly be trained on unlabeled data from the HR domain using a Triplet network, and achieves high performance and good results when tested on various tasks. The models are shown to be bilingual, able to handle unseen vocabulary, and able to understand the concept and descriptive context of entire sentences rather than single words. Thus, the conclusion is that the models have a sound understanding of semantic similarity and relatedness. However, in some cases the models are also shown to become binary in their calculations of similarity between inputs. Moreover, it is hard to tune a model that is exhaustively comprehensive of such a diverse domain as HR. A model fine-tuned on clean and generic data extracted from adverts shows the overall best performance in terms of loss and consistency.
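The Triplet network objective mentioned above can be sketched as a cosine-based triplet margin loss. The margin value and the toy 3-d "embeddings" below are assumptions, not SBERT's actual vectors or the thesis's configuration.

```python
# Cosine triplet margin loss on toy embedding vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Push the anchor closer to the positive than to the negative by `margin`."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)

anchor   = [1.0, 0.0, 0.0]   # e.g. an embedded resume
positive = [0.9, 0.1, 0.0]   # a matching advert
negative = [0.0, 1.0, 0.0]   # an unrelated advert
loss = triplet_loss(anchor, positive, negative)
```

Training then minimizes this loss over many (resume, matching advert, unrelated advert) triplets, which is what allows unlabeled HR-domain text to shape the embedding space.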
897

Algoritm för automatiserad generering av metadata / Algorithm for Automated Generation of Metadata

Karlsson, Fredrik, Berg, Fredrik January 2015 (has links)
Sveriges Radio sparar sin data i stora arkiv vilket gör det svårt att hitta specifik information. På grund av arkivens storlek blir uppgiften att hitta specifik information om händelser ett stort problem. För att lösa problemet krävs en mer konsekvent användning av metadata, därför har en undersökning om metadata och nyckelordsgenerering gjorts. Arbetet gick ut på att utveckla en algoritm som automatiskt kan generera nyckelord från transkriberade radioprogram. Det ingick också i arbetet att göra en undersökning av tidigare arbeten för att se vilka system och algoritmer som kan användas för att generera nyckelord. Dessutom utvecklades en applikation som genererar färdiga nyckelord som förslag till en användare. Denna applikation jämfördes och utvärderades med redan existerande program. Metoderna som använts bygger på både lingvistiska och statistiska algoritmer. En analys av resultaten gjordes och visade att den utvecklade applikationen genererade många precisa nyckelord, men även till antalet stora mängder nyckelord. Jämförelsen med ett redan existerande program visade att täckningen var bättre för den utvecklade applikationen, samtidigt som precisionen var bättre för det redan existerande programmet. / Sveriges Radio stores its data in large archives, which makes it hard to retrieve specific information. The sheer size of the archives makes retrieving information about a specific event difficult. To solve this problem, a more consistent use of metadata is needed, which motivated an investigation into metadata and keyword generation. The appointed task was to automatically generate keywords from transcribed radio shows. This included an investigation, based on previous work, of which systems and algorithms can be used to generate keywords. An application was also developed which suggests keywords for a given text to a user.
This application was tested and compared to already existing software, as well as different methods and techniques based on both linguistic and statistical algorithms. The resulting analysis showed that the developed application generated many accurate keywords, but also a large number of keywords in general. The comparison also showed that the developed algorithm achieved better recall than the already existing software, which in turn produced better precision in its keywords.
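The precision/recall comparison described above can be reproduced in miniature as follows, with invented keyword sets; the sketch happens to mirror the reported pattern of higher recall but lower precision for the developed application.

```python
# Keyword-set precision and recall against a gold reference; toy sets.

def precision_recall(generated, gold):
    generated, gold = set(generated), set(gold)
    hits = len(generated & gold)
    precision = hits / len(generated) if generated else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

gold = {"sveriges radio", "metadata", "arkiv", "nyckelord"}
ours = {"sveriges radio", "metadata", "arkiv", "nyckelord", "program", "data"}
p, r = precision_recall(ours, gold)
```

Generating many keywords drives recall up (all gold keywords are covered) while the extra, unmatched keywords pull precision down, which is exactly the trade-off the abstract reports.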
898

Finding Implicit Citations in Scientific Publications : Improvements to Citation Context Detection Methods

Murray, Jonathan January 2015 (has links)
This thesis deals with the task of identifying implicit citations between scientific publications. Apart from being useful knowledge on their own, the citations may be used as input to other problems such as determining an author's sentiment towards a reference, or summarizing a paper based on what others have written about it. We extend two recently proposed methods, a Machine Learning classifier and an iterative Belief Propagation algorithm. Both are implemented and evaluated on a common pre-annotated dataset. Several changes to the algorithms are then presented, incorporating new sentence features and different semantic text similarity measures, as well as combining the methods into a single classifier. Our main finding is that the introduction of new sentence features yields significantly improved F-scores for both approaches. / Detta examensarbete behandlar frågan om att hitta implicita citeringar mellan vetenskapliga publikationer. Förutom att vara intressanta på egen hand kan dessa citeringar användas inom andra problem, såsom att bedöma en författares inställning till en referens eller att sammanfatta en rapport utifrån hur den har blivit citerad av andra. Vi utgår från två nyligen föreslagna metoder, en maskininlärningsbaserad klassificerare och en iterativ algoritm baserad på en grafmodell. Dessa implementeras och utvärderas på en gemensam förannoterad datamängd. Ett antal förändringar till algoritmerna presenteras i form av nya särdrag hos meningarna (eng. sentence features), olika semantiska textlikhetsmått och ett sätt att kombinera de två metoderna. Arbetets huvudsakliga resultat är att de nya meningssärdragen leder till anmärkningsvärt förbättrade F-värden för de båda metoderna.
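A word-overlap (Jaccard) measure is one simple instance of the kind of sentence-similarity feature discussed above; the thesis's actual similarity measures are more sophisticated, so this is only an illustrative sketch with invented sentences.

```python
# Jaccard word-overlap between a candidate sentence and a cited paper's text.

def jaccard(sent_a, sent_b):
    a, b = set(sent_a.lower().split()), set(sent_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

candidate = "their parser improves dependency accuracy"
reference = "the parser improves accuracy on dependency treebanks"
score = jaccard(candidate, reference)
```

A classifier can use such a score as one feature among many: a sentence that overlaps heavily with a cited paper's abstract is more likely to be an implicit citation of it.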
899

Language Learning Using Models of Intentionality in Repeated Games with Cheap Talk

Skaggs, Jonathan Berry 31 May 2022 (has links)
Language is critical to establishing long-term cooperative relationships among intelligent agents (including people), particularly when the agents' preferences are in conflict. In such scenarios, an agent uses speech to coordinate and negotiate behavior with its partner(s). While recent work has shown that neural language modeling can produce effective speech agents, such algorithms typically accept only previous text as input. However, in relationships among intelligent agents, not all relevant context is expressed in conversation. Thus, in this paper, we propose and analyze an algorithm, called Llumi, that incorporates other forms of context to learn to speak in long-term relationships modeled as repeated games with cheap talk. Llumi combines models of intentionality with neural language modeling techniques to learn speech from data that is relevant to the agent's current context. A user study illustrates that, while imperfect, Llumi does learn context-aware speech in repeated games with cheap talk when partnered with people, including games on which it was not trained. We believe these results are useful in determining how autonomous agents can learn to use speech to facilitate successful human-agent teaming.
900

Coreference Resolution for Swedish / Koreferenslösning för svenska

Vällfors, Lisa January 2022 (has links)
This report explores possible avenues for developing coreference resolution methods for Swedish. Coreference resolution is an important topic within natural language processing, as it is used as a preprocessing step in various information extraction tasks. The topic has been studied extensively for English, but much less so for smaller languages such as Swedish. In this report we adapt two coreference resolution algorithms, originally developed for English, for use on Swedish texts. One algorithm is entirely rule-based, while the other uses machine learning. We have also annotated a Swedish dataset to be used for training and evaluation. Both algorithms showed promising results, and as neither clearly outperformed the other, we conclude that both would be good candidates for further development. For the rule-based algorithm, more advanced rules, especially ones that could incorporate some semantic knowledge, were identified as the most important avenue of improvement. For the machine learning algorithm, more training data would likely be the most beneficial. For both algorithms, improved detection of mention spans would also help, as this was identified as one of the most error-prone components. / I denna rapport undersöks möjliga metoder för koreferenslösning för svenska. Koreferenslösning är en viktig uppgift inom språkteknologi, eftersom det utgör ett första steg i många typer av informationsextraktion. Uppgiften har studerats utförligt för flera större språk, framförallt engelska, men är ännu relativt outforskad för svenska och andra mindre språk. I denna rapport har vi anpassat två algoritmer som ursprungligen utvecklades för engelska för användning på svensk text. Den ena algoritmen bygger på maskininlärning och den andra är helt regelbaserad. Vi har också annoterat delar av Talbankens korpus med koreferensrelationer, för att användas för träning och utvärdering av koreferenslösningsalgoritmer.
Båda algoritmerna visade lovande resultat, och ingen var tydligt bättre än den andra. Bägge vore därför lämpliga alternativ för vidareutveckling. För ML-algoritmen vore mer träningsdata den viktigaste punkten för förbättring, medan den regelbaserade algoritmen skulle kunna förbättras med mer komplexa regler, för att inkorporera exempelvis semantisk information i besluten. Ett annat viktigt utvecklingsområde är identifieringen av de fraser som utvärderas för möjlig koreferens, eftersom detta steg introducerade många fel i bägge algoritmerna.
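In the spirit of the rule-based algorithm above, a deliberately tiny coreference rule - cluster mentions whose strings match exactly - can be sketched as follows. The mentions and indices are invented, and real rule-based systems add many further rules (head matching, pronoun resolution, agreement checks) on top of this baseline.

```python
# Baseline exact-string-match coreference clustering; toy mention list.

def exact_match_clusters(mentions):
    """Group mention strings (case-insensitive) into coreference clusters."""
    clusters = {}
    for idx, mention in enumerate(mentions):
        clusters.setdefault(mention.lower(), []).append(idx)
    # Only groups with at least two mentions form a coreference cluster.
    return [ids for ids in clusters.values() if len(ids) > 1]

mentions = ["Lisa Vällfors", "rapporten", "Lisa Vällfors", "algoritmen", "Rapporten"]
clusters = exact_match_clusters(mentions)
```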
