  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Comments as reviews: Predicting answer acceptance by measuring sentiment on stack exchange

William Chase Ledbetter IV (12261440) 16 June 2023 (has links)
Online communication has increased the need to rapidly interpret complex emotions due to the volatility of the data involved; machine learning tasks that process text, such as sentiment analysis, can help address this challenge by automatically classifying text as positive, negative, or neutral. However, while much research has focused on detecting offensive or toxic language online, there is also a need to explore and understand the ways in which people express positive emotions and support for one another in online communities. This is where sentiment dictionaries and other computational methods can be useful, by analyzing the language used to express support and identifying common patterns or themes.

This research was conducted by compiling data from question-and-answer threads about machine learning on the site Stack Exchange. A classification model was then constructed using binary logistic regression. The objective was to discover whether marked solutions can be accurately predicted by treating the comments as reviews. Measuring collaboration signals may help capture the nuances of language around support and assistance, which could have implications for how people understand and respond to expressions of help online. By exploring this topic further, researchers can gain a more complete understanding of the ways in which people communicate and connect online.
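The thesis does not publish its code; as a minimal sketch of the general idea, the snippet below scores comments with a tiny sentiment lexicon and fits a binary logistic regression by plain gradient descent to predict answer acceptance. The lexicon words, the single-feature design, and the toy data are all illustrative assumptions, not the thesis's actual dictionary or model.

```python
import math

# Tiny illustrative sentiment lexicon (an assumption, not the thesis's dictionary).
POSITIVE = {"thanks", "great", "works", "perfect", "helpful"}
NEGATIVE = {"wrong", "error", "fails", "broken", "unclear"}

def sentiment_score(comment):
    """Net count of positive minus negative lexicon hits in a comment."""
    words = comment.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit y ~ sigmoid(w*x + b) by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

def predict_accepted(comment, w, b):
    """Probability that the answer under this comment was marked accepted."""
    x = sentiment_score(comment)
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Toy training data: (comment under an answer, was the answer accepted?)
train = [
    ("thanks this works perfect", 1),
    ("great helpful answer", 1),
    ("this fails with an error", 0),
    ("wrong and unclear", 0),
]
w, b = fit_logistic([sentiment_score(c) for c, _ in train],
                    [y for _, y in train])
```

A real pipeline would use richer text features and a held-out test set; this only illustrates the "comments as reviews" framing.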
262

Skadligt innehåll på nätet - Toxiskt språk på TikTok / Harmful content online - Toxic language on TikTok

Wester, Linn, Stenvall, Elin January 2024 (has links)
Toxic language on the internet, often referred to in everyday terms as online hate, includes insults, threats and offensive language, and is particularly noticeable on social media. It can be detected with the help of machine learning, including Natural Language Processing (NLP) techniques that automatically recognize the typical characteristics of toxic language. Previous Swedish research has investigated the presence of toxic language on social media using machine learning, but there is still a lack of research on the increasingly popular platform TikTok. This study investigates the prevalence and characteristics of toxic comments on TikTok using both a machine learning technique and manual methods, and is meant to provide a better understanding of what young people encounter in the comments on TikTok. The study applies a mixed method in a document survey of 69,895 comments. The machine learning model Hatescan was used to automatically classify the likelihood of toxic language appearing in each comment. Based on this probability, a subset of the comments was sampled and manually analysed, leading to both quantitative and qualitative findings. The results showed that the prevalence of toxic language was relatively small: 0.24% of the 69,895 comments were considered toxic based on combined automatic and manual assessment. The most common type of toxic language in the study was obscene language, the majority of which contained swear words.
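The study's workflow — score every comment with a classifier, then sample the high-probability ones for manual review and report prevalence — can be sketched as a simple filtering step. The threshold and the toy corpus below are assumptions for illustration; they are not the study's actual cutoff or data.

```python
def flag_toxic(scored_comments, threshold=0.8):
    """Return comments whose toxicity probability meets the threshold.

    `scored_comments` is a list of (comment, probability) pairs, e.g. as
    produced by a classifier such as Hatescan. The threshold value is an
    illustrative assumption, not the one used in the study.
    """
    return [(c, p) for c, p in scored_comments if p >= threshold]

def prevalence(scored_comments, threshold=0.8):
    """Share of comments flagged as toxic, as a fraction of the corpus."""
    if not scored_comments:
        return 0.0
    return len(flag_toxic(scored_comments, threshold)) / len(scored_comments)

# Toy corpus of (comment, model probability) pairs.
corpus = [
    ("nice video", 0.02),
    ("ok I guess", 0.10),
    ("<obscene insult>", 0.95),
    ("love this", 0.01),
]
```

On this toy corpus one of four comments is flagged, a prevalence of 25%; the study found 0.24% across 69,895 real comments.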
263

Enhancing Electronic Health Records Systems and Diagnostic Decision Support Systems with Large Language Models

Furqan Ali Khan (19203916) 26 July 2024 (has links)
Within Electronic Health Record (EHR) systems, physicians face extensive documentation, leading to alarming mental burnout. The disproportionate focus on data entry over direct patient care underscores a critical concern. Integration of Natural Language Processing (NLP)-powered EHR systems offers relief by reducing the time and effort spent on record maintenance.

Our research introduces the Automated Electronic Health Record System, which not only transcribes dialogues but also employs advanced clinical text classification. With an accuracy exceeding 98.97%, it saves over 90% of the time required for manual entry, as validated on the MIMIC III and MIMIC IV datasets.

In addition to our system's advancements, we explore integration of a Diagnostic Decision Support System (DDSS) leveraging Large Language Models (LLMs) and transformers, aiming to refine healthcare documentation and improve clinical decision-making. We examine the advantages, such as enhanced accuracy and contextual understanding, as well as the challenges, including computational demands and biases, of using various LLMs.
264

Neural-Symbolic Modeling for Natural Language Discourse

Maria Leonor Pacheco Gonzales (12480663) 13 May 2022 (has links)
Language "in the wild" is complex and ambiguous and relies on a shared understanding of the world for its interpretation. Most current natural language processing methods represent language by learning word co-occurrence patterns from massive amounts of linguistic data. This representation can be very powerful, but it is insufficient to capture the meaning behind written and spoken communication.

In this dissertation, I will motivate neural-symbolic representations for dealing with these challenges. On the one hand, symbols have inherent explanatory power, and they can help us express contextual knowledge and enforce consistency across different decisions. On the other hand, neural networks allow us to learn expressive distributed representations and make sense of large amounts of linguistic data. I will introduce a holistic framework that covers all stages of the neural-symbolic pipeline: modeling, learning, and inference, and its application to diverse discourse scenarios, such as analyzing online discussions, mining argumentative structures, and understanding public discourse at scale. I will show the advantages of neural-symbolic representations with respect to end-to-end neural approaches and traditional statistical relational learning methods.

In addition, I will demonstrate the advantages of neural-symbolic representations for learning in low-supervision settings, as well as their capability to decompose and explain high-level decisions. Lastly, I will explore interactive protocols to help human experts make sense of large repositories of textual data, and leverage neural-symbolic representations as the interface for injecting expert human knowledge into the process of partitioning, classifying and organizing large language resources.
265

Modelovanje i pretraživanje nad nestruktuiranim podacima i dokumentima u e-Upravi Republike Srbije / Modeling and searching over unstructured data and documents in e-Government of the Republic of Serbia

Nikolić Vojkan 27 September 2016 (has links)
Nowadays, Question Answering Systems (QAS) have been used by e-government services in various fields in an attempt to understand text and help citizens get answers to their questions promptly and at any time. Automatic mapping of relevant documents stands out as an important application for an automatic query-document classification strategy. This doctoral thesis aims to contribute to the identification of unstructured documents and represents an important step towards clarifying the role of explicit concepts within Information Retrieval in general.

The most common representation scheme in text categorization is the BoW approach, especially when a large knowledge base is available. This thesis introduces a new approach to concept-based text representation and applies text categorization to create predefined classes for condensed text documents. It also presents a classification-based algorithm modeled for queries that match a topic. What complicates matters is that this concept relies on similarities in previously defined document classes and on terms that appear with high frequency in queries. The results of experiments in the field of the Criminal Code of the Republic of Serbia, which also serve as the case study here, show that concept-based text representation yields satisfactory results even when no vocabulary exists for the given field.
266

Automatic summarization of mouse gene information for microarray analysis by functional gene clustering and ranking of sentences in MEDLINE abstracts : a dissertation

Yang, Jianji 06 1900 (has links) (PDF)
Ph.D. / Medical Informatics and Clinical Epidemiology / Tools to automatically summarize gene information from the literature have the potential to help genomics researchers better interpret gene expression data and investigate biological pathways. Even though several useful human-curated databases of information about genes already exist, these have significant limitations. First, their construction requires intensive human labor. Second, curation of genes lags behind the rapid publication rate of new research and discoveries. Finally, most of the curated knowledge is limited to information on single genes. As such, most original and up-to-date knowledge on genes can only be found in the immense amount of unstructured, free-text biomedical literature. Genomics researchers frequently encounter the task of finding information on sets of differentially expressed genes from the results of common high-throughput technologies like microarray experiments. However, finding information on a set of genes by manually searching and scanning the literature is a time-consuming and daunting task for scientists. For example, PubMed, the first choice of literature research for biologists, usually returns hundreds of references for a search on a single gene, in reverse chronological order. Therefore, a tool to summarize the available textual information on genes could be valuable for scientists. In this study, we adapted automatic summarization technologies to the biomedical domain to build a query-based, task-specific automatic summarizer of information on mouse genes studied in microarray experiments: the mouse Gene Information Clustering and Summarization System (GICSS).
GICSS first clusters a set of differentially expressed genes by Medical Subject Heading (MeSH), Gene Ontology (GO), and free-text features into functionally similar groups; next, it presents summaries for each gene as ranked sentences extracted from MEDLINE abstracts, with the ranking emphasizing the relation between genes, similarity to the function cluster the gene belongs to, and recency. GICSS is available as a web application with links to the PubMed (www.pubmed.gov) website for each extracted sentence. It integrates two related steps of the microarray data analysis process: functional gene clustering and gene information gathering. The information from the clustering step was used to construct the context for summarization. The evaluation of the system was conducted with scientists who were analyzing their real microarray datasets. The evaluation results showed that GICSS can provide meaningful clusters for real users in the genomic research area. In addition, the results indicated that presenting sentences from the abstract can provide more important information to the user than just showing the title in the default PubMed format. Both domain-specific and non-domain-specific terminologies contributed to the selection of informative sentences. Summarization may serve as a useful tool to help scientists access information at the time of microarray data analysis. Further research includes setting up automatic updates of MEDLINE records; extending and fine-tuning the feature parameters for sentence scoring using the available evaluation data; and expanding GICSS to incorporate textual information from other species. Finally, dissemination and integration of GICSS into the current workflow of the microarray analysis process will help make GICSS a truly useful tool for its targeted users, biomedical genomics researchers.
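The ranking step described above — scoring sentences by gene mention, similarity to the function cluster, and recency — can be sketched as a weighted sum. The weights, the scoring features, and the toy abstracts below are illustrative assumptions, not GICSS's actual parameters.

```python
def score_sentence(sentence, year, gene, cluster_terms,
                   w_gene=2.0, w_cluster=1.0, w_recency=0.1, base_year=2000):
    """Weighted score from gene mention, overlap with cluster terms, recency.

    All weights are illustrative assumptions, not GICSS's real parameters.
    """
    words = set(sentence.lower().split())
    gene_hit = 1.0 if gene.lower() in words else 0.0
    overlap = len(words & {t.lower() for t in cluster_terms})
    recency = max(0, year - base_year)
    return w_gene * gene_hit + w_cluster * overlap + w_recency * recency

def rank_sentences(sentences, gene, cluster_terms):
    """Return (sentence, year) pairs ordered best-first."""
    return sorted(sentences,
                  key=lambda s: score_sentence(s[0], s[1], gene, cluster_terms),
                  reverse=True)

# Hypothetical sentences extracted from MEDLINE abstracts, with publication year.
abstracts = [
    ("brca1 regulates dna repair in mammary cells", 2004),
    ("unrelated finding in liver tissue", 2006),
    ("brca1 mutation impairs repair pathways", 2001),
]
ranked = rank_sentences(abstracts, "brca1", {"dna", "repair"})
```

The recent, on-topic, gene-mentioning sentence ranks first; the off-topic one ranks last despite being newest, mirroring how cluster context dominates recency in the weighting.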
267

Dokumentenbasierte Steuerung von Geschäftsprozessen / Document-based control of business processes

Reichelt, Dominik 10 October 2014 (has links) (PDF)
Business processes in the administrative and service sectors are frequently triggered by incoming documents. It is essential that these reach the right employee in the company or organization. However, the external sender is often unaware of the internal organizational structures, so a central office is addressed instead. That office must then forward the document, based on its content, to the responsible colleagues, which can involve considerable personnel effort. This research develops a system intended to perform this task automatically. To this end, various classification methods are tested and assessed with regard to their reliability. Furthermore, improvements over common machine-learning methods are pursued.
268

The mat sat on the cat : investigating structure in the evaluation of order in machine translation

McCaffery, Martin January 2017 (has links)
We present a multifaceted investigation into the relevance of word order in machine translation. We introduce two tools, DTED and DERP, each using dependency structure to detect differences between the structures of machine-produced translations and human-produced references. DTED applies the principle of Tree Edit Distance to calculate edit operations required to convert one structure into another. Four variants of DTED have been produced, differing in the importance they place on words which match between the two sentences. DERP represents a more detailed procedure, making use of the dependency relations between words when evaluating the disparities between paths connecting matching nodes. In order to empirically evaluate DTED and DERP, and as a standalone contribution, we have produced WOJ-DB, a database of human judgments. Containing scores relating to translation adequacy and more specifically to word order quality, this is intended to support investigations into a wide range of translation phenomena. We report an internal evaluation of the information in WOJ-DB, then use it to evaluate variants of DTED and DERP, both to determine their relative merit and their strength relative to third-party baselines. We present our conclusions about the importance of structure to the tools and their relevance to word order specifically, then propose further related avenues of research suggested or enabled by our work.
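DTED proper applies Tree Edit Distance over dependency trees; as a much simpler stand-in, the sketch below computes Levenshtein edit distance over flat sequences of (word, dependency relation) pairs from a hypothesis and a reference parse. The pairs and relation labels are hypothetical, and this flattened version deliberately ignores the tree structure that DTED actually exploits.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of hashable items."""
    # prev[j] holds the distance between a[:i-1] and b[:j] as rows are filled.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            cur.append(min(prev[j] + 1,          # delete x
                           cur[j - 1] + 1,       # insert y
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]

# Hypothetical (word, dependency relation) pairs for machine output vs. reference.
hyp = [("the", "det"), ("mat", "nsubj"), ("sat", "root"), ("cat", "obl")]
ref = [("the", "det"), ("cat", "nsubj"), ("sat", "root"), ("mat", "obl")]
```

Here "mat" and "cat" have swapped grammatical roles, as in the thesis title, giving a distance of 2 substitutions; a true tree edit distance such as Zhang-Shasha would instead align subtrees.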
269

An Evaluation of NLP Toolkits for Information Quality Assessment

Karlin, Ievgen January 2012 (has links)
Documentation is often the first source that can help users solve problems or explain the conditions of use of a product. It should therefore be clear and understandable. But what does "understandable" mean? How can we detect whether a text is unclear? This thesis answers those questions. The main idea of the work is to measure the clarity of textual information using natural language processing. There are three main steps towards this goal: defining criteria for poor text clarity, evaluating different natural language toolkits and selecting a suitable one, and implementing a prototype system that, given a text, measures its clarity. The thesis project is planned to be included in VizzAnalyzer (a quality analysis tool that processes information on the structure level), and its main task is to perform a clarity analysis of textual information extracted by VizzAnalyzer from different XML files.
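The thesis defines its own clarity criteria; as one common proxy for "is this text understandable?", a Flesch-style reading-ease score can be computed from average sentence length and estimated syllable counts. The vowel-run syllable heuristic below is a rough assumption, not a linguistically exact count.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: number of vowel runs (a crude heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch reading ease: higher scores mean easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "The cat sat. The dog ran."
dense = ("Notwithstanding multifaceted organizational considerations, "
         "comprehensively disambiguating terminological inconsistencies "
         "necessitates extraordinarily meticulous interdisciplinary deliberation.")
```

Short sentences of short words score far higher than a single dense sentence of polysyllabic words, which is the kind of signal a documentation-clarity checker could flag.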
