1

BERT and FFT measurement systems for high-speed communications and magnetometry

Zhu, Qiwei January 2011 (has links)
This master's thesis presents the development and implementation of two digital systems based on a Field-Programmable Gate Array (FPGA): a Bit Error Rate Testing (BERT) system for an Optical Communication (OCOM) application, and a Digital Signal Processing (DSP) system for a Spin-Dependent Tunneling Magnetometer (SDTM). Both applications are intended for space and are currently under development at the Ångström Space Technology Center (ÅSTC). The DSP system samples analog signals and applies a Fast Fourier Transform (FFT) to provide frequency spectrum analysis. The report covers detailed system designs and state machine designs, and accounts for system verifications and measurements. As the live OCOM system and SDTM were unavailable at the time of testing, a series of emulated test cases was set up to evaluate the digital systems developed. The BERT system was evaluated by checking the bit error rate of a stranded-wire connection and a coaxial cable. Analog square and sine wave signals were used to evaluate the performance and accuracy of the FFT in the DSP system. Both systems were functionally verified using the Altera SignalTap II Logic Analyzer. Analysis of the measurement results for the test cases indicates that the BERT works well at clock frequencies of 50 and 125 MHz, and that the coaxial cable is more suitable for data transmission, as it gives a lower bit error rate than the stranded wire. The DSP system was verified to work well at a clock frequency of 62.5 MHz; it can sample any waveform at a sampling frequency of 62.5 MHz and continuously acquire digital signals up to 14 bits wide. The sample-point lengths for the FFT are 64, 512 and 1024, and the data transfer rate between the FPGA and the computer reaches 115200 baud. In conclusion, the developed BERT and DSP systems can be used to support the OCOM and SDTM hardware, respectively.
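To make the spectrum-analysis step concrete, here is a minimal Python sketch (not from the thesis) of what the DSP system's FFT stage computes: a test tone sampled at the reported 62.5 MHz rate, quantized to the 14-bit word width, and analyzed with a 1024-point FFT. The 5 MHz tone frequency is an illustrative choice.

```python
import numpy as np

FS = 62.5e6    # sampling frequency reported in the abstract (62.5 MHz)
N = 1024       # one of the FFT lengths reported (64, 512, 1024)
F_TONE = 5e6   # illustrative test-tone frequency, not from the thesis

t = np.arange(N) / FS
# Round to 14-bit signed levels to mimic the ADC word width.
signal = np.round(np.sin(2 * np.pi * F_TONE * t) * (2**13 - 1)) / (2**13 - 1)

spectrum = np.abs(np.fft.rfft(signal)) / N       # single-sided magnitude
freqs = np.fft.rfftfreq(N, d=1 / FS)             # bin center frequencies
peak = freqs[np.argmax(spectrum[1:]) + 1]        # skip the DC bin
print(f"Detected tone near {peak / 1e6:.2f} MHz")
```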
2

Towards a Malware Language for Use with BERT Transformer—An Approach Using API Call Sequences

Owens, Joshua 23 August 2022 (has links)
No description available.
3

Extractive Text Summarization of Norwegian News Articles Using BERT

Biniam, Thomas Indrias, Morén, Adam January 2021 (has links)
Extractive text summarization has over the years been an important research area in Natural Language Processing. Numerous methods have been proposed for extracting information from text documents. Recent works have shown great success on English summarization tasks by fine-tuning the language model BERT using large summarization datasets. However, less research has been done for low-resource languages. This work contributes by investigating how BERT can be used for Norwegian text summarization. Two models are developed by applying a modified BERT architecture, called BERTSum, to pre-trained Norwegian and Multilingual BERT. The resulting models are able to predict key sentences from articles to generate bullet-point summaries. These models are evaluated with the automatic metric ROUGE, and in this evaluation the Multilingual BERT model outperforms the Norwegian model. The multilingual model is further evaluated in a human evaluation by journalists, revealing that the generated summaries are not entirely satisfactory in some aspects. With some improvements, the model could become a valuable tool for journalists to edit and rewrite generated summaries, saving time and workload. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
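As a rough illustration of the BERTSum idea, the sketch below scores each sentence with a BERT encoder plus a linear head and keeps the top k. It is a simplification, not the authors' code: BERTSum encodes all sentences jointly with per-sentence [CLS] tokens, and the scoring head here is untrained.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
score_head = torch.nn.Linear(encoder.config.hidden_size, 1)  # fine-tuned in practice

def extract_summary(sentences, k=3):
    """Return the k sentences with the highest predicted salience."""
    scores = []
    with torch.no_grad():
        for sent in sentences:
            inputs = tokenizer(sent, return_tensors="pt", truncation=True)
            cls = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] vector
            scores.append(score_head(cls).item())
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # preserve article order
```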
4

A Study of Transformer Models for Emotion Classification in Informal Text

Esperanca, Alvaro Soares de Boa 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Textual emotion classification is a task in affective AI that branches from sentiment analysis and focuses on identifying emotions expressed in a given text excerpt. It has a wide variety of applications that improve human-computer interactions, particularly in empowering computers to understand subjective human language better. Significant research has been done on this task, but very little of it leverages one of the most emotion-bearing symbols of modern communication: emojis. In this thesis, we propose ReferEmo, a transformer-based model for emotion classification that processes emojis as textual input tokens and leverages DeepMoji to generate affective feature vectors used as reference when aggregating different modalities of text encoding. To evaluate ReferEmo, we experimented on the SemEval 2018 and GoEmotions datasets, two benchmark datasets for emotion classification, and achieved competitive performance compared to state-of-the-art models tested on these datasets. Notably, our model performs better on the underrepresented classes of each dataset.
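A hedged sketch of one ingredient described above: treating emojis as first-class input tokens for a multi-label emotion classifier. The base model, the label count (the 11-label SemEval 2018 E-c scheme), and the emoji subset are assumptions; the DeepMoji reference-vector aggregation of ReferEmo is not reproduced here.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=11,  # SemEval 2018 E-c emotion labels (assumed setup)
    problem_type="multi_label_classification",
)

emojis = ["😂", "❤️", "😡", "😭"]   # illustrative subset
tokenizer.add_tokens(emojis)        # keep each emoji as a single token
model.resize_token_embeddings(len(tokenizer))

ids = tokenizer("so happy for you 😂❤️", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(ids["input_ids"][0].tolist()))
```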
5

Leverage Fusion of Sentiment Features and Bert-based Approach to Improve Hate Speech Detection

Cheng, Kai Hsiang 23 June 2022 (has links)
Social media has become an important place for people to conveniently share and exchange ideas and opinions. However, not all content on social media has a positive impact. Hate speech is one kind of harmful content, in which people use abusive language to attack or promote hate towards a specific group or an individual. With online hate speech on the rise these days, people have explored ways to recognize hate speech automatically, and among the approaches studied, BERT-based methods are promising and dominated SemEval-2019 Task 6, a hate speech detection competition. In this work, a method fusing sentiment features with a BERT-based approach is proposed. The classic BERT architecture for hate speech detection is modified to fuse in additional sentiment features, provided by an extractor pre-trained on Sentiment140. The proposed model is compared with the top-3 models in SemEval-2019 Task 6 Subtask A and achieves an 83.1% F1 score, better than the models in the competition. Also, to see whether additional sentiment features benefit the detection of hate speech, the features are fused with three kinds of deep learning architectures respectively. The results show that the models with sentiment features perform better than those without sentiment features. / Master of Science / Social media has become an important place for people to conveniently share and exchange ideas and opinions. However, not all content on social media has a positive impact. Hate speech is one kind of harmful content, in which people use abusive language to attack or promote hate towards a specific group or an individual. With online hate speech on the rise these days, people have explored ways to recognize hate speech automatically, and among the approaches studied, BERT is one promising approach. BERT is a kind of deep learning model for natural language processing (NLP) that originated from the Transformer architecture developed by Google in 2017. BERT has been applied to many NLP tasks and has achieved astonishing results, for example in text classification, semantic similarity between pairs of sentences, question answering over a given paragraph, and text summarization. In this study, BERT is adopted to learn the meaning of given text and distinguish hate speech from large volumes of tweets automatically. To let BERT better capture hate speech, the approach in this work modifies BERT to take an additional source of sentiment-related features when learning the pattern of hate speech, given that the emotion will be negative when people put out abusive speech. For evaluation, our model is compared against those in SemEval-2019 Task 6, a well-known hate speech detection competition, and the results show that the proposed model achieves an 83.1% F1 score, better than the models in the competition. Also, to see whether additional sentiment features benefit the detection of hate speech, the features are fused with three different kinds of deep learning architectures respectively, and the results show that the models with sentiment features perform better than those without.
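The fusion idea lends itself to a compact sketch: concatenate BERT's [CLS] representation with an externally extracted sentiment vector before classification. This is a minimal reading of the architecture described above, with the sentiment feature size and base model as stand-ins, not the thesis code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class SentimentFusionBert(nn.Module):
    def __init__(self, sentiment_dim=64, num_labels=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        # Classify on [CLS] concatenated with sentiment features, e.g. from
        # an extractor pre-trained on Sentiment140 as the abstract describes.
        self.classifier = nn.Linear(hidden + sentiment_dim, num_labels)

    def forward(self, input_ids, attention_mask, sentiment_feats):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                 # [CLS] vector
        fused = torch.cat([cls, sentiment_feats], dim=-1)
        return self.classifier(fused)
```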
6

What the BERT? : Fine-tuning KB-BERT for Question Classification / Vad i BERT? : Finjustering av KB-BERT för frågeklassificering

Cervall, Jonatan January 2021 (has links)
This work explores the capabilities of KB-BERT on the downstream task of question classification. The TREC dataset for question classification with the Li and Roth taxonomy was translated to Swedish by manually correcting the output of Google's Neural Machine Translation, and 500 new data points were added. The fine-tuned model was compared with a similarly trained model based on Multilingual BERT, a human evaluation, and a simple rule-based baseline. Of the four methods in this work, the Swedish BERT model (SwEAT-BERT) performed best, achieving 91.2% accuracy on TREC-50 and 96.2% accuracy on TREC-6. The human evaluation performed worse than both BERT models, but doubt is cast on how fair this comparison is. SwEAT-BERT's results are competitive even when compared to similar models based on English BERT. This furthers the notion that the only roadblock to training language models for smaller languages is the amount of readily available training data. / This work explores how well the Swedish BERT model, KB-BERT, performs at question classification. BERT is a transformer model that creates contextual, bidirectional word embeddings. The English question classification dataset, TREC, was translated to Swedish and extended with 500 new data points. Two BERT models were fine-tuned on this new TREC dataset, one based on KB-BERT and one based on Multilingual BERT, a multilingual BERT variant trained on data from 104 languages (Swedish among them). A rule-based model was built as a lower bound on the problem, and a human classification study was carried out for comparison. The BERT model based on KB-BERT (SwEAT-BERT) achieved 96.2% accuracy on TREC with 6 categories and 91.2% accuracy on TREC with 50 categories. The human classification achieved worse results than both BERT models, but it is doubtful how fair this comparison is. SwEAT-BERT performed best of the methods tested in this study, and competitively compared with English BERT models fine-tuned on the English TREC dataset. This result strengthens the view that availability of training data is the only thing standing in the way of stronger language models for smaller languages.
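A minimal sketch of the fine-tuning setup, assuming the KB-BERT checkpoint published on the Hugging Face hub and the six coarse Li and Roth classes. The classification head below is untrained, so meaningful predictions require fine-tuning on the translated TREC data first; the Swedish question is illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "KB/bert-base-swedish-cased"                     # assumed KB-BERT id
LABELS = ["ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"]   # Li & Roth coarse classes

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(LABELS))

# After fine-tuning on the translated TREC data, classify a Swedish question.
inputs = tokenizer("Hur långt är det till månen?", return_tensors="pt")
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1).item()
print(LABELS[pred])
```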
7

Re-ranking search results with KB-BERT / Omrankning av sökresultat med KB-BERT

Viðar Kristjánsson, Bjarki January 2022 (has links)
This master's thesis aims to determine whether a Swedish BERT model can improve a BM25 search by re-ranking the top search results. We compared a standard BM25 search algorithm with a more complex algorithm composed of a BM25 search followed by re-ranking of the top 10 results by a BERT model. The BERT model used is KB-BERT, a publicly available neural network model built by the National Library of Sweden. We fine-tuned this model to solve the specific task of evaluating the relevancy of search results. A new Swedish search evaluation dataset was automatically generated from Wikipedia text to compare the algorithms. The search evaluation dataset is a standalone product and can be beneficial for evaluating other search algorithms on Swedish text in the future. The comparison of the two algorithms resulted in a slightly better ranking for the BERT re-ranking algorithm. These results align with similar studies using an English BERT and an English search evaluation dataset. / This master's thesis aims to determine whether a Swedish BERT model can improve a BM25 search by re-ranking the top search results. We compared a standard BM25 search algorithm with a more complex algorithm consisting of a BM25 search followed by re-ranking of the 10 best results with a BERT model. The BERT model used is KB-BERT, a publicly available neural network model built by the National Library of Sweden (Kungliga biblioteket). We fine-tuned this model to solve the specific task of evaluating the relevance of search results. A new Swedish dataset for evaluating search results was automatically generated from Wikipedia text to compare the algorithms. The dataset is a standalone product and may be useful for evaluating other search algorithms on Swedish text in the future. The comparison of the two algorithms resulted in a slightly better ranking for the BERT re-ranking algorithm. These results are consistent with similar studies using an English BERT and an English dataset for evaluating search results.
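The two-stage pipeline is easy to sketch: BM25 retrieves candidates, then a BERT cross-encoder re-scores the top 10. The sketch below uses the rank_bm25 package and an untrained relevance head on the assumed KB-BERT checkpoint as stand-ins; the documents, query, and whitespace tokenization are illustrative, and in the thesis the relevance head is fine-tuned.

```python
import torch
from rank_bm25 import BM25Okapi
from transformers import AutoTokenizer, AutoModelForSequenceClassification

docs = [
    "Stockholm är Sveriges huvudstad.",
    "BM25 är en klassisk rankningsfunktion för sökning.",
    "KB-BERT är byggd av Kungliga biblioteket.",
]
query = "Vem har byggt KB-BERT?"

# Stage 1: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])
scores = bm25.get_scores(query.lower().split())
top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:10]

# Stage 2: re-score (query, document) pairs with a relevance classifier.
tok = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "KB/bert-base-swedish-cased", num_labels=2)   # head untrained here
with torch.no_grad():
    pairs = tok([query] * len(top), [docs[i] for i in top],
                padding=True, truncation=True, return_tensors="pt")
    relevance = model(**pairs).logits[:, 1]       # score for "relevant"
order = relevance.argsort(descending=True).tolist()
print([docs[top[i]] for i in order])
```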
8

Period Drama : Punctuation restoration in Swedish through fine-tuned KB-BERT / Dags att sätta punkt : Återställning av skiljetecken genom finjusterad KB-BERT

Sinderwing, John January 2021 (has links)
Presented here is a method for automatic punctuation restoration in Swedish using a BERT model. The method is based on KB-BERT, a publicly available neural network language model pre-trained on a Swedish corpus by the National Library of Sweden. This model has then been fine-tuned for this specific task using a corpus of government texts. With lower-case, unpunctuated Swedish text as input, the model is supposed to return a grammatically correct, punctuated copy of the text as output. A successful solution to this problem brings benefits for an array of NLP domains, such as speech-to-text and automated text generation. Only the punctuation marks period, comma and question mark were considered for the project, due to a lack of data for rarer marks such as the semicolon. Additionally, some marks are somewhat interchangeable with more common ones, such as exclamation points with periods. Thus, all exclamation points in the dataset were replaced with periods. The fine-tuned Swedish BERT model, dubbed prestoBERT, achieved an overall F1-score of 78.9. The proposed model scored similarly to international counterparts, with Hungarian and Chinese models obtaining F1-scores of 82.2 and 75.6 respectively. As a further comparison, a human evaluation case study was carried out. The human test group achieved an overall F1-score of 81.7, but scored substantially worse than prestoBERT on both period and comma. Inspecting output sentences from the model and the humans shows satisfactory results, despite the difference in F1-score. The disconnect seems to stem from an unnecessary focus on replicating exactly the punctuation used in the test set, rather than producing any one of the many correct interpretations. If the loss function could be rewritten to reward all grammatically correct outputs, rather than only the one original example, performance could improve significantly for both prestoBERT and the human group. / A method is presented here for automatic restoration of punctuation in Swedish using a neural network in the form of a BERT model. The method builds on KB-BERT, a publicly available language model trained on a Swedish corpus by the National Library of Sweden. This model has then been fine-tuned for this specific task using a corpus of public texts from county councils and the like. Given Swedish text without capital letters or punctuation as input, the model should return a copy of the text with correct punctuation marks placed in the right positions. A successful model brings benefits for a range of domains within natural language processing, such as speech-to-text transcription and automated text generation. Only the punctuation marks period, comma and question mark are considered in the project, owing to a lack of data for rarer marks such as the semicolon. Moreover, some punctuation marks are somewhat interchangeable with the most common three, such as exclamation points with periods. Accordingly, all exclamation points in the dataset were replaced with periods. The fine-tuned Swedish BERT model, called prestoBERT, obtained an overall F1-score of 78.9. The corresponding international models for Hungarian and Chinese obtained overall F1-scores of 82.2 and 75.6 respectively, suggesting that prestoBERT is on a level with state-of-the-art counterparts. As a further comparison, a case study with human evaluation was carried out. The test group achieved an overall F1-score of 81.7, but performed substantially worse than prestoBERT on both period and comma.
Inspection of output from the model and the humans shows satisfactory results from both, despite the difference in F1-score. The difference seems to stem from an unnecessary focus on replicating exactly the same punctuation as used in the input, rather than producing any of the many correct interpretations that often exist. If the loss function could be rewritten to reward all grammatically correct output, rather than only the original example, performance could improve significantly for both prestoBERT and the human group.
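Framed as token classification, the task gives each word a label for the punctuation mark that should follow it. The sketch below assumes that framing and the public KB-BERT checkpoint; the head is untrained, so the printed labels become meaningful only after fine-tuning, and the input sentence is made up.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["NONE", "PERIOD", "COMMA", "QUESTION"]   # the marks kept in the thesis
tok = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "KB/bert-base-swedish-cased", num_labels=len(LABELS))

text = "hej vad heter du jag heter john"           # lower-case, unpunctuated input
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1)[0]    # one label per subword token
# Restoration would insert the predicted mark after each word and
# upper-case the word following a PERIOD or QUESTION label.
print([LABELS[i] for i in pred.tolist()])
```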
9

Klasifikace textu s omezeným množstvím dat / Low-resource Text Classification

Szabó, Adam January 2021 (has links)
The aim of the thesis is to evaluate Czech text classification tasks in low-resource settings. We introduce three datasets, two of which were publicly available and one of which was created partly by us. This dataset is based on contracts provided by the web platform Hlídač Státu. Most of its data is annotated automatically and only a small part manually. Its distinctive feature is that it contains long contracts in the Czech language. We achieve outstanding results with the proposed model on the publicly available datasets, which confirms the sufficient performance of our model. In addition, we performed experimental measurements with noisy data and with various amounts of training data on these publicly available datasets. On the contracts dataset, we focused on selecting the right part of each contract and studied which part gives the best result. We have found that for a dataset containing some systematic errors due to automatic annotation, it is more advantageous to use a shorter but more relevant part of the contract for classification than to take a longer text from the contract and rely on BERT to learn correctly.
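One way to realize the span-selection idea is a simple keyword-density heuristic: classify from the most relevant window of the contract instead of the document head. This sketch is an assumption-laden illustration, not the thesis method; the keywords, window size, and file name are made up.

```python
def select_span(words, keywords, window=256):
    """Return the `window`-word slice with the most keyword hits."""
    best_start, best_hits = 0, -1
    for start in range(0, max(1, len(words) - window), window // 2):
        chunk = words[start:start + window]
        hits = sum(w.lower() in keywords for w in chunk)
        if hits > best_hits:
            best_start, best_hits = start, hits
    return " ".join(words[best_start:best_start + window])

# Hypothetical usage: pick the slice of a contract mentioning key terms,
# then feed that slice (rather than the head) to the BERT classifier.
contract = open("contract.txt", encoding="utf-8").read().split()
snippet = select_span(contract, {"cena", "smlouva", "předmět"})
```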
10

Zpracování češtiny s využitím kontextualizované reprezentace / Czech NLP with Contextualized Embeddings

Vysušilová, Petra January 2021 (has links)
With the increasing amount of digital data in the form of unstructured text, the importance of natural language processing (NLP) increases. The most successful technologies of recent years are deep neural networks. This work applies state-of-the-art methods, namely transfer learning with Bidirectional Encoder Representations from Transformers (BERT), to three Czech NLP tasks: part-of-speech tagging, lemmatization and sentiment analysis. We applied a BERT model with a simple classification head to three Czech sentiment datasets: mall, facebook, and csfd, and achieved state-of-the-art results. We also explored several possible architectures for tagging and lemmatization and obtained new state-of-the-art results in both tagging and lemmatization with a fine-tuning approach on data from the Prague Dependency Treebank. Specifically, we achieved accuracy of 98.57% for tagging, 99.00% for lemmatization, and 98.19% joint accuracy for both tasks. The best models for all tasks are publicly available.
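Tagging with BERT reduces to token classification, one label per subword. The sketch below assumes Multilingual BERT and a small universal POS inventory; the thesis instead fine-tunes on the Prague Dependency Treebank and adds an analogous head for lemmatization, and the head here is untrained.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

UPOS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "ADP", "DET", "PUNCT", "X"]
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(UPOS))

inputs = tok("Praha je hlavní město České republiky .", return_tensors="pt")
with torch.no_grad():
    tags = model(**inputs).logits.argmax(-1)[0]    # one tag per subword token
print([UPOS[i] for i in tags.tolist()])            # meaningful after fine-tuning
```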
