711

Categorization of Customer Reviews Using Natural Language Processing / Kategorisering av kundrecensioner med naturlig språkbehandling

Liliemark, Adam, Enghed, Viktor January 2021
Databases of user-generated data can quickly become unmanageable. Klarna faced this issue with a database of around 700,000 customer reviews. Ideally, the database would be cleaned of uninteresting reviews and the remaining reviews categorized. Since it was not known in advance what categories might emerge, the idea was to use an unsupervised clustering algorithm to find them. This thesis describes the work carried out to solve this problem and proposes a solution for Klarna that involves artificial neural networks rather than unsupervised clustering. Our implementation categorizes reviews as either interesting or uninteresting, and we propose a workflow that would extend this from two categories to many. The method revolved around experimentation with clustering algorithms and neural networks. Previous research shows that texts can be clustered; however, the datasets used differ vastly from the Klarna dataset, which consists of short reviews, a large share of them uninteresting. Unsupervised clustering yielded unsatisfactory results, as no discernible categories could be found. In some cases, the technique produced clusters of uninteresting reviews. These clusters were used as training data for an artificial neural network, together with manually labeled interesting reviews. The results from this network were satisfactory: it can say with an accuracy of around 86% whether a review is interesting or not. This was achieved using the aforementioned clusters and five feedback loops, in which the model's wrongly predicted reviews from an evaluation dataset were fed back to it as training data. We argue that the main reason unsupervised clustering failed is that the reviews are too short. In comparison, other researchers have successfully clustered text data with an average length in the hundreds of words; such texts pack many more features than the short reviews in the Klarna dataset. We show that an artificial neural network can detect these features despite the short length, through its intrinsic design. Further research into feature extraction from short text strings could provide the means to cluster this kind of data: if features can be extracted, clustering can be done on the features rather than the actual words. Our network shows that the arbitrary features interesting and uninteresting can be extracted, so we are hopeful that future researchers will find ways of extracting more features from short text strings. In theory, this would mean that text of any length can be clustered unsupervised.
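
The thesis does not publish its implementation, but the classify-and-feedback idea can be sketched. Below is a minimal illustration assuming scikit-learn and invented toy reviews: a small neural network separates interesting from uninteresting reviews, and wrongly predicted evaluation reviews are fed back as training data over five loops.

```python
# Minimal sketch of the interesting/uninteresting classifier with feedback
# loops; reviews and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

train_texts = ["great app", "love it", "refund took three weeks",
               "app crashes when paying", "ok", "nice"]
train_labels = [0, 0, 1, 1, 0, 0]          # 1 = interesting, 0 = uninteresting
eval_texts = ["payment failed twice", "good"]
eval_labels = [1, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)

for _ in range(5):                          # five feedback loops, as in the thesis
    X_train = vectorizer.fit_transform(train_texts)
    clf.fit(X_train, train_labels)
    preds = clf.predict(vectorizer.transform(eval_texts))
    wrong = [i for i, (p, y) in enumerate(zip(preds, eval_labels)) if p != y]
    if not wrong:
        break
    # feed wrongly predicted evaluation reviews back as training data
    train_texts += [eval_texts[i] for i in wrong]
    train_labels += [eval_labels[i] for i in wrong]
```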
712

Mining Biomedical Literature to Extract Pharmacokinetic Drug-Drug Interactions

Karnik, Shreyas 03 February 2014
Indiana University-Purdue University Indianapolis (IUPUI) / Polypharmacy is a common clinical practice, and there is a high chance that multiple administered drugs will interfere with each other; this phenomenon is called a drug-drug interaction (DDI). A DDI occurs when co-administered drugs change each other's pharmacokinetic (PK) or pharmacodynamic (PD) response. DDIs can affect the overall effectiveness of a drug or pose a risk of serious side effects to patients, which makes successful drug development and clinical patient care very challenging. The biomedical literature is a rich source of in-vitro and in-vivo DDI reports, and there is a growing need for automated methods to extract DDI-related information from unstructured text. In this work we present an ontology (the PK ontology), which defines guidelines for annotating PK DDI studies. Using the ontology, we have put together a corpus of PK DDI studies, which serves as an excellent resource for training machine-learning-based DDI extraction algorithms. Finally, we demonstrate the use of the PK ontology and corpus for extracting PK DDIs from the biomedical literature using machine learning algorithms.
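
The PK ontology and corpus are not reproduced here, but a rough baseline for the extraction task can be sketched. The regular expressions below are invented for illustration and are not drawn from the ontology; they merely flag sentences that report a change in a pharmacokinetic parameter.

```python
# Hypothetical keyword baseline for spotting sentences that report PK
# drug-drug interactions; patterns are illustrative only.
import re

PK_PARAMS = r"(AUC|Cmax|half-life|clearance|Tmax)"
CHANGE = r"(increase[sd]?|decrease[sd]?|reduce[sd]?|prolong(ed)?)"
pattern = re.compile(rf"{CHANGE}.*{PK_PARAMS}|{PK_PARAMS}.*{CHANGE}", re.I)

sentences = [
    "Ketoconazole increased the AUC of midazolam 16-fold.",
    "No clinically relevant interaction was observed.",
]
for s in sentences:
    print(pattern.search(s) is not None, "-", s)
```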
713

Advanced natural language processing and temporal mining for clinical discovery

Mehrabi, Saeed 17 August 2015
Indiana University-Purdue University Indianapolis (IUPUI) / There has been a vast and growing amount of healthcare data, especially with the rapid adoption of electronic health records (EHRs) as a result of the HITECH Act of 2009. It is estimated that around 80% of clinical information resides in the unstructured narrative of an EHR. Recently, natural language processing (NLP) techniques have offered opportunities to extract the information needed for various clinical applications from unstructured clinical texts. A popular method for enabling secondary uses of EHRs is information or concept extraction, a subtask of NLP that seeks to locate and classify elements within text based on context. Extracting clinical concepts without considering context has many complications, including inaccurate diagnosis of patients and contamination of study cohorts. Identifying negation status, and whether a clinical concept belongs to the patient or to family members, are two of the challenges faced in context detection. A negation algorithm called Dependency Parser Negation (DEEPEN) was developed in this research by taking into account the dependency relationship between negation words and concepts within a sentence, using the Stanford Dependency Parser. The results demonstrate that DEEPEN can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. Additionally, an NLP system consisting of section segmentation and relation discovery was developed to identify patients' family history. To assess the generalizability of the negation and family history algorithms, data from a different clinical institution was used in both evaluations.
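
DEEPEN itself builds on the Stanford Dependency Parser and its own rule set; as a rough sketch of the dependency-based idea only, the snippet below uses spaCy as a stand-in parser and checks whether a negation dependency attaches to the concept or to its syntactic head. The model name and the single rule are assumptions, not the thesis's algorithm.

```python
# Rough dependency-based negation check, loosely inspired by DEEPEN.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def is_negated(sentence: str, concept: str) -> bool:
    doc = nlp(sentence)
    for tok in doc:
        if tok.text.lower() == concept.lower():
            # negation attached to the concept itself or to its head
            neighbors = list(tok.children) + list(tok.head.children)
            return any(t.dep_ == "neg" for t in neighbors)
    return False

print(is_negated("The patient does not have fever.", "fever"))  # True
print(is_negated("The patient has a fever.", "fever"))          # False
```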
714

A Smart Patent Monitoring Assistant : Using Natural Language Processing / Ett smart verktyg för patentövervakning baserat på natural language processing

Fsha Nguse, Selemawit January 2022
Patent monitoring is about tracking upcoming inventions in a particular field, predicting future trends, and following specific intellectual property rights of interest. It is the process of finding relevant patents on a particular topic based on a specific query. With patent monitoring, one can stay updated on new technology in the market and find potential licensing opportunities for one's inventions. The outputs of patent monitoring are essential for companies, academics, and inventors looking to use the latest patents to drive further innovation. Nevertheless, there is no widely accepted best approach to patent monitoring. Most patent monitoring systems are based on complex search-and-find queries, often leading to insignificant hit rates and heavy human intervention. As the number of patents published each year increases massively, and with patents being critical to accelerating innovation, the current approach has two main drawbacks. Firstly, human-driven patent monitoring is a time-consuming and expensive process. In addition, there is a risk of overlooking interesting documents due to inadequate search tools and processes, which could cost companies fortunes while hindering further innovation and creativity. This thesis presents a smart patent monitoring assistant tool based on natural language processing. The use of several natural language processing methods is investigated to find, classify, and rank relevant documents. The tool was trained on a dataset that contains the title, abstract, and claims of patent documents. Given such a dataset, the aim of this thesis is to create a tool that can classify patents into two classes, relevant and not relevant, and rank documents based on relevancy. The evaluation gave satisfying results in terms of retrieving the expected patents. In addition, there was a significant improvement in memory usage and in the time it took to train the model and obtain results.
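
The thesis's model and dataset are not public; as a minimal sketch of the classify-then-rank idea, the snippet below trains a linear classifier on invented toy patent texts and ranks unseen documents by predicted relevance probability.

```python
# Hypothetical classify-and-rank sketch on toy patent snippets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["method for wireless charging of vehicles",
        "beverage container lid",
        "inductive power transfer coil design",
        "shoe lace fastener"]
labels = [1, 0, 1, 0]            # 1 = relevant to the monitored field

vec = TfidfVectorizer()
model = LogisticRegression().fit(vec.fit_transform(docs), labels)

new_docs = ["resonant inductive charging pad", "folding umbrella mechanism"]
scores = model.predict_proba(vec.transform(new_docs))[:, 1]
for score, doc in sorted(zip(scores, new_docs), reverse=True):
    print(f"{score:.2f}  {doc}")    # ranked by relevance probability
```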
715

Generative Adversarial Networks and Natural Language Processing for Macroeconomic Forecasting / Generativt motstridande nätverk och datorlingvistik för makroekonomisk prognos

Evholt, David, Larsson, Oscar January 2020
Macroeconomic forecasting is a classic problem, today most often modeled using time series analysis. Few attempts have been made using machine learning methods, and even fewer incorporating unconventional data such as that from social media. In this thesis, a Generative Adversarial Network (GAN) is used to predict U.S. unemployment, beating the ARIMA benchmark on all horizons. Furthermore, attempts are made at using Twitter data and the Natural Language Processing (NLP) model DistilBERT. While these attempts do not beat the benchmark, they show promising predictive power. The models are also tested at predicting the U.S. stock index S&P 500. For these models, the Twitter data does improve the accuracy, showing the potential of social media data when predicting a more erratic index with less seasonality that is more responsive to current trends in public discourse. The results also show that Twitter data can be used to predict trends in both unemployment and the S&P 500 index. This sets the stage for further research into NLP-GAN models for macroeconomic predictions using social media data.
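
The thesis's GAN architecture, conditioning, and Twitter features are not reproduced here; the snippet below is a toy-scale PyTorch sketch of the general idea, where a generator proposes the next value of a series given a window of history plus noise, and a discriminator judges (history, next value) pairs. The sine-wave data stands in for an economic series.

```python
# Toy GAN sketch for one-step-ahead series forecasting; all sizes invented.
import torch
import torch.nn as nn

window, noise_dim = 12, 4
G = nn.Sequential(nn.Linear(window + noise_dim, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(window + 1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

series = torch.sin(torch.arange(200, dtype=torch.float32) / 6)  # stand-in data
X = torch.stack([series[i:i + window] for i in range(200 - window - 1)])
y = series[window:-1].unsqueeze(1)

for step in range(200):
    z = torch.randn(len(X), noise_dim)
    fake = G(torch.cat([X, z], dim=1))
    # discriminator: real (history, next value) pairs vs generated ones
    d_loss = bce(D(torch.cat([X, y], 1)), torch.ones(len(X), 1)) + \
             bce(D(torch.cat([X, fake.detach()], 1)), torch.zeros(len(X), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator: try to fool the discriminator
    g_loss = bce(D(torch.cat([X, fake], 1)), torch.ones(len(X), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```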
716

Multi-Class Emotion Classification for Interactive Presentations : A case study on how emotional sentiment analysis can help end users better convey intended emotion

Andersson, Charlotte January 2022
Mentimeter is one of the fastest-growing startups in Sweden: an audience engagement platform that allows users to create interactive presentations and engage an audience. As online information spreads ever faster, methods for analyzing, understanding, and categorizing information are developing and improving rapidly. Natural Language Processing (NLP) is the ability to break down input, for instance text or audio, and process it using technologies such as computational linguistics, statistical learning, machine learning, and deep learning models. This thesis investigates whether a tool that applies multi-class emotion classification of text could benefit end users when they create presentations with Mentimeter. A case study was conducted in which a pre-trained BERT base model, fine-tuned on the GoEmotions dataset, was applied as a tool to Mentimeter's presentation software and then evaluated by end users. The results show that the tool was accurate but, overall, not helpful for end users. For future research, improvements such as including emotions/tones more related to presentations would make the tool more applicable to presentations and, according to end users, more helpful.
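
The case study's fine-tuned model is not public; as an illustration of the same idea, the sketch below runs a GoEmotions checkpoint through the Hugging Face pipeline API. The model name is an assumption about a publicly available community checkpoint, not the model evaluated in the thesis.

```python
# Multi-class emotion tagging of presentation text via a GoEmotions model;
# the checkpoint name below is an assumed public Hub model.
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="SamLowe/roberta-base-go_emotions", top_k=3)

slide = "Thank you all for an amazing year, I am so proud of this team!"
for pred in classifier([slide])[0]:          # top 3 emotions with scores
    print(f"{pred['label']:>12}  {pred['score']:.2f}")
```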
717

Detecting PowerShell Obfuscation Techniques using Natural Language Processing / Detektering av obfuskeringstekniker för PowerShell med hälp av Natural Language Processing

Klasmark, Jacob January 2022
PowerShell obfuscation is often used to avoid detection by antivirus programs. There are several different techniques to change a PowerShell script while preserving its behavior. Detecting these obfuscated files is a valuable addition to detecting malicious files, and identifying the specific technique used can also help an analyst tasked with investigating the detected files. To detect these different techniques we use Natural Language Processing, with the idea that each technique will look somewhat like a unique language that can be detected. We tried several different models and iterations of data processing, ended up using a Random Forest classifier, and achieved a detection accuracy of 98%.
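
The thesis's dataset and feature pipeline are not public; the snippet below is a minimal sketch of the stated idea with invented toy scripts, treating character n-grams as the "language" signal and letting a Random Forest separate obfuscation techniques.

```python
# Hypothetical sketch: character n-gram features + Random Forest to
# distinguish obfuscation techniques. Scripts and labels are invented.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

scripts = [
    "Get-Process | Where-Object { $_.CPU -gt 100 }",    # plain
    "IEX ([char]73+[char]69+[char]88)",                 # char-encoding trick
    "$a='Ge'+'t-Pro'+'cess'; & $a",                     # string concatenation
    "Write-Output 'hello world'",                       # plain
]
labels = ["none", "char", "concat", "none"]

vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(vec.fit_transform(scripts), labels)

print(clf.predict(vec.transform(["$b='IE'+'X'; & $b 'payload'"])))
```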
718

Prerequisites for Extracting Entity Relations from Swedish Texts

Lenas, Erik January 2020
Natural language processing (NLP) is a vibrant area of research with many practical applications today, such as sentiment analysis, text labeling, question answering, machine translation, and automatic text summarization. At the moment, research is mainly focused on the English language, although many other languages are trying to catch up. This work focuses on an area within NLP called information extraction, and more specifically on relation extraction, that is, extracting relations between entities in a text. It aims to use machine learning techniques to build a Swedish language processing pipeline with part-of-speech tagging, dependency parsing, named entity recognition, and coreference resolution, to use as a base for later relation extraction from archival texts. The obvious difficulty lies in the scarcity of annotated Swedish datasets; for example, no sufficiently large Swedish dataset for coreference resolution exists today. An important part of this work, therefore, is to create a Swedish coreference solver using distantly supervised machine learning: creating a Swedish dataset by applying an English coreference solver to an unannotated bilingual corpus, using a word aligner to translate this machine-annotated English dataset into a Swedish dataset, and then training a Swedish model on it. Using AllenNLP's end-to-end coreference resolution model, both for creating the Swedish dataset and for training the Swedish model, this work achieves an F1-score of 0.5. For named entity recognition, this work uses the Swedish BERT models released by the Royal Library of Sweden in February 2020 and achieves an overall F1-score of 0.95. To put all of these NLP models within a single language processing pipeline, spaCy is used as a unifying framework.
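
The distant-supervision step can be sketched without the full pipeline: the snippet below projects a machine-annotated English coreference cluster onto Swedish tokens through a word alignment. The tokens, alignment, and spans are invented for illustration; the thesis applied AllenNLP's solver and a word aligner at corpus scale.

```python
# Projecting English coreference spans onto Swedish via a word alignment.
en_tokens = ["Anna", "lost", "her", "keys"]
sv_tokens = ["Anna", "tappade", "sina", "nycklar"]
alignment = {0: 0, 1: 1, 2: 2, 3: 3}           # en index -> sv index

# an English cluster as produced by, e.g., an end-to-end coreference solver
en_clusters = [[(0, 0), (2, 2)]]                # "Anna" ... "her"

sv_clusters = [
    [(alignment[s], alignment[e]) for (s, e) in cluster
     if s in alignment and e in alignment]      # drop unaligned spans
    for cluster in en_clusters
]
print(sv_clusters)   # [[(0, 0), (2, 2)]] -> "Anna" ... "sina"
```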
719

NATURAL LANGUAGE PROCESSING-BASED AUTOMATED INFORMATION EXTRACTION FROM BUILDING CODES TO SUPPORT AUTOMATED COMPLIANCE CHECKING

Xiaorui Xue (13171173) 29 July 2022
The traditional manual code compliance checking process is time-consuming, costly, and error-prone (Zhang & El-Gohary, 2015). Automated code compliance checking systems have therefore emerged as an alternative. However, computer software cannot directly process regulatory information in unstructured building code texts: to support automated code compliance checking, building codes need to be transformed into a computer-processable, structured format. In particular, the problem that most automated code compliance checking systems can only check a limited number of building code requirements stands out.

The transformation of building code requirements into a computer-processable, structured format is a natural language processing (NLP) task that requires highly accurate part-of-speech (POS) tagging results on building codes, beyond the state of the art. To address this need, this dissertation research provides a method to improve the performance of POS taggers using error-driven transformational rules that revise machine-tagged POS results. The rules fix errors in two steps: first, they locate errors in the POS tagging by their context; second, they replace the erroneous POS tag with the correct tag stored in the rule. A dataset of POS-tagged building codes, the Part-of-Speech Tagged Building Codes (PTBC) dataset (Xue & Zhang, 2019), was published in the Purdue University Research Repository (PURR). Testing on the dataset showed that the method corrected 71.00% of errors in POS tagging results for building codes, increasing POS tagging accuracy on building codes from 89.13% to 96.85%.

This dissertation research also provides a new POS tagger tailored to building codes, combining neural network models with error-driven transformational rules. The neural network model contains a pre-trained model and one or more trainable neural layers, and was trained and fine-tuned on the PTBC dataset. A high-performance POS tagger for building codes was discovered, using one bidirectional Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) trainable layer, a BERT-Cased-Base pre-trained model, and 50 epochs of training. This model achieved 91.89% precision without error-driven transformational rules and 95.11% precision with them, outperforming the otherwise most advanced POS tagger's 89.82% precision on building codes in the state of the art.

Other automated information extraction methods were also developed. Some automated code compliance checking systems represent building codes in logic clauses and use pattern-matching-based rules to convert building codes from natural language text to logic clauses (Zhang & El-Gohary, 2017). A ruleset expansion method was developed that can expand the range of checkable building codes of such systems by expanding their pattern-matching-based ruleset. The method guarantees (1) the ruleset's backward compatibility with the building codes it was already able to process, and (2) forward compatibility with building codes it may need to process in the future. It was validated on Chapters 5 and 10 of the International Building Code 2015 (IBC 2015), with Chapter 10 as the training dataset and Chapter 5 as the testing dataset. A gold standard of logic clauses was published in the Logic Clause Representation of Building Codes (LCRBC) dataset (Xue & Zhang, 2021), and the expanded pattern-matching-based rules were published in the dissertation (Appendix A). Compared to the baseline ruleset, the expanded ruleset increased the precision, recall, and F1-score of logic clause generation at the predicate level by 10.44%, 25.72%, and 18.02%, to 95.17%, 96.60%, and 95.88%, respectively.

Most existing automated code compliance checking research has focused on regulatory information stored as text in building codes. A comprehensive checking process, however, should also handle regulatory information stored elsewhere, such as in tables. This dissertation therefore provides a semi-automated information extraction and transformation method for tabular information in building codes. The method semi-automatically detects the layout of a table and stores the extracted information in a database, which automated code compliance checking systems can then query for the regulatory information in the corresponding table. The algorithm's initial implementation accurately processed 91.67% of the tables in the testing dataset, composed of tables in Chapter 10 of IBC 2015; after iterative upgrades, the updated method correctly processed all tables in the testing dataset.
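
The dissertation's rules are learned from the PTBC dataset; the snippet below is a minimal sketch of the two-step mechanism only, with one invented rule: match an erroneous tag by its context, then substitute the stored correction.

```python
# Minimal sketch of error-driven transformational rules; the rule below is
# invented for illustration and is not from the dissertation's learned set.
tagged = [("the", "DT"), ("exit", "JJ"), ("door", "NN")]

rules = [
    # context (previous word) + erroneous tag -> corrected tag
    {"prev_word": "the", "word": "exit", "from": "JJ", "to": "NN"},
]

def apply_rules(tokens):
    out = list(tokens)
    for i, (word, tag) in enumerate(out):
        for r in rules:
            in_context = i > 0 and out[i - 1][0] == r["prev_word"]
            if in_context and word == r["word"] and tag == r["from"]:
                out[i] = (word, r["to"])     # step 2: replace the wrong tag
    return out

print(apply_rules(tagged))  # [('the', 'DT'), ('exit', 'NN'), ('door', 'NN')]
```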
720

Καρδία in the New Testament and Other Ancient Greek Literature : Using a Corpus Approach to Investigate the Semantics of καρδία Against the Backdrop of New Testament Lexicography

Möller, Gustaf January 2024
The semantics of New Testament words is a complex subject, as these words often have backgrounds in both extrabiblical Greek literature and the Septuagint, and by extension are also the object of Hebraic influence. Καρδία, often translated "heart", is no exception. In some Greek literature the organ is referred to literally, but in the New Testament καρδία is used exclusively figuratively. Another layer of complexity is added when the nature of this figurative usage is considered, as it includes aspects of cognition, volition, morality, and more. In this thesis, I studied how καρδία is used in the New Testament in comparison to the Septuagint, investigating the existing notion of a "biblical usage" of the word. This usage was then compared to its usage in periods ranging from 800 to 270 BCE, further exploring the existence of a distinct biblical usage from a diachronic perspective. For this study, I adopted an interdisciplinary approach inspired by computational and corpus linguistics, dedicating a substantial part of the thesis to evaluating the approach within the field of New Testament lexicography. The usage in the New Testament and the Septuagint was found to be similar, and I propose some areas where this similarity is most evident. This biblical usage of καρδία was not found to share much similarity with its usage in extrabiblical literature, with a biblical "moral" and "theological" usage standing out as the main points of contrast. For the purposes of New Testament lexicography, the approach was found beneficial for collecting evidence, although some issues will need further investigation.
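
As a sketch of the corpus-linguistic side of the method, the snippet below counts collocates of a target word form within a token window; the miniature "corpus" lines are invented placeholders, not the thesis's editions of the New Testament or Septuagint, and real work would also need to handle the inflected forms of καρδία.

```python
# Toy collocation count for a target word form in a tiny placeholder corpus.
from collections import Counter

def collocates(corpus, target, window=2):
    """Count tokens occurring within `window` positions of `target`."""
    counts = Counter()
    for line in corpus:
        tokens = line.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                nearby = tokens[max(0, i - window):i + window + 1]
                counts.update(t for t in nearby if t != target)
    return counts

nt = ["αγαπησεις κυριον εξ ολης της καρδιας σου",
      "εκ καθαρας καρδιας αγαπησατε"]          # placeholder lines
print(collocates(nt, "καρδιας").most_common(3))
```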
