  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

An NLP-based framework for early identification of design reliability issues from heterogeneous automotive lifecycle data

Uglanov, Alexey, Campean, Felician, Abdullatiff, Amr R.A., Neagu, Ciprian Daniel, Doikin, Alexandr, Delaux, David, Bonnaud, P. 04 August 2024 (has links)
Natural Language Processing is increasingly used in different areas of design and product development, with objectives ranging from enhancing productivity to embedding resilience into systems. In this paper, we introduce a framework that draws on NLP algorithms and expert knowledge from the automotive engineering domain to extract actionable insight for system reliability improvement from data generated during the operational phase of the system. Specifically, we look at the systematic exploration and exploitation of heterogeneous automotive data sources, including both closed-source (such as warranty records) and open-source (e.g., social networks, chatrooms, recall systems) data, to extract and classify information about faults, with predictive capability for early detection of issues. We present a preliminary NLP-based framework for enhancing system knowledge representation to increase the effectiveness and robustness of information extraction from data, and discuss the temporal alignment of data sources and insight to improve prediction ability. We demonstrate the effectiveness of the proposed framework using real-world automotive data in a recall study for a vehicle lighting system and a particular manufacturer: four recall campaigns were identified, leading to corrective actions by the warranty experts.
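The fault extraction and classification step described in this abstract can be illustrated with a minimal keyword-matching sketch; the categories and keywords below are invented for illustration, and the paper's actual NLP pipeline is not reproduced here:

```python
# Toy fault classifier: maps free-text complaint snippets to fault
# categories via keyword matching. Categories and keywords are
# illustrative only, not taken from the paper.
FAULT_KEYWORDS = {
    "lighting": ["headlamp", "headlight", "bulb", "flicker"],
    "battery": ["battery", "no start", "drain"],
}

def classify_fault(text: str) -> str:
    text = text.lower()
    for category, words in FAULT_KEYWORDS.items():
        if any(w in text for w in words):
            return category
    return "unknown"

print(classify_fault("Headlight started to flicker after rain"))  # lighting
```

A real pipeline would replace the keyword lists with learned classifiers over warranty and social-media text, as the paper describes.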
262

Event Centric Approaches in Natural Language Processing / 自然言語処理におけるイベント中心のアプローチ

Huang, Yin Jou 26 July 2021 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / Degree No. 甲第23438号 / 情博第768号 / 新制||情||131 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Examination committee) Prof. Sadao Kurohashi, Prof. Tatsuya Kawahara, Prof. Takayuki Ito / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
263

JOKE RECOMMENDER SYSTEM USING HUMOR THEORY

Soumya Agrawal (9183053) 29 July 2020 (has links)
Every individual has a different sense of humor, and it varies greatly from one person to another, so learning any individual's humor preferences is a challenge. Humor is much more than just a source of entertainment; it is an essential tool that aids communication. Understanding humor preferences can lead to improved social interactions and bridge existing social or economic gaps.

In this study, we propose a methodology that aims to develop a recommendation system for jokes by analyzing their text. Various researchers have proposed different theories of humor depending on their area of focus. This exploratory study focuses mainly on Attardo and Raskin's (1991) General Theory of Verbal Humor and implements the knowledge resources it defines to annotate the jokes. These annotations capture the characteristics of the jokes and play an important role in determining how alike the jokes are. We use Lin's similarity metric (Lin, 1998) to computationally capture this similarity. The jokes are clustered hierarchically based on their similarity values, which are then used for the recommendation. We also compare our joke recommendations to those produced by the Eigentaste algorithm (Goldberg, Roeder, Gupta, & Perkins, 2001), an existing joke recommendation system that does not consider the content of the joke in its recommendations.
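Lin's (1998) information-theoretic similarity, cited in the abstract above, scores two concepts by the information content (IC) of their most informative common ancestor relative to their own IC. A self-contained sketch over a toy taxonomy (the taxonomy and corpus counts are invented; the study's actual knowledge resources come from the General Theory of Verbal Humor):

```python
import math

# Lin (1998) similarity over a toy IS-A taxonomy. Counts are cumulative
# corpus frequencies per concept; all values here are invented.
counts = {"entity": 100, "animal": 60, "dog": 20, "cat": 15}
parents = {"dog": "animal", "cat": "animal", "animal": "entity"}
total = counts["entity"]

def ic(concept):
    # Information content: -log p(concept)
    return -math.log(counts[concept] / total)

def ancestors(c):
    result = {c}
    while c in parents:
        c = parents[c]
        result.add(c)
    return result

def lin_similarity(a, b):
    # 2 * IC(most informative common ancestor) / (IC(a) + IC(b))
    common = ancestors(a) & ancestors(b)
    lcs_ic = max(ic(c) for c in common)
    denom = ic(a) + ic(b)
    return 2 * lcs_ic / denom if denom else 1.0

print(round(lin_similarity("dog", "cat"), 3))
```

Identical concepts score 1.0, and concepts whose only shared ancestor is the root (IC = 0) score 0.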
264

Giant Pigeon and Small Person: Prompting Visually Grounded Models about the Size of Objects

Yi Zhang (12438003) 22 April 2022 (has links)
Empowering machines to understand our physical world should go beyond models with only natural language and models with only vision. Vision-and-language is a growing field of study that attempts to bridge the gap between the natural language processing and computer vision communities by enabling models to learn visually grounded language. However, as an increasing number of pre-trained visual-linguistic models focus on the alignment between visual regions and natural language, it is difficult to claim that these models capture certain properties of objects in their latent space, such as size. Inspired by recent trends in prompt learning, this study designs a prompt learning framework for two visual-linguistic models, ViLBERT and ViLT, and uses different manually crafted prompt templates to evaluate how consistently these models compare the sizes of objects. The results show that ViLT is more consistent in prediction accuracy for the given task with six pairs of objects under four prompt designs. However, the overall prediction accuracy is lower than expected on this object-size comparison task; even the better model in this study, ViLT, beats the proposed random-chance baseline in only 16 of 24 cases. As this is a preliminary study exploring the potential of pre-trained visual-linguistic models for object-size comparison, there are many directions for future work, such as investigating more models, choosing more object pairs, and trying different methods for feature engineering and prompt engineering.
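The prompt-probing setup described above can be sketched as template filling; the templates below are hypothetical stand-ins, since the thesis's four actual prompt designs are not reproduced here:

```python
# Hypothetical cloze-style templates for probing a masked visual-language
# model about relative object size (illustrative only).
TEMPLATES = [
    "Is a {a} larger than a {b}? [MASK]",
    "Compared to a {b}, a {a} is [MASK].",
]

def build_prompts(pair):
    """Fill every template with one object pair."""
    a, b = pair
    return [t.format(a=a, b=b) for t in TEMPLATES]

for prompt in build_prompts(("pigeon", "person")):
    print(prompt)
```

Each filled prompt would then be scored by the model's prediction for the masked token, and consistency measured across templates for the same object pair.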
265

Comments as reviews: Predicting answer acceptance by measuring sentiment on stack exchange

William Chase Ledbetter IV (12261440) 16 June 2023 (has links)
Online communication has increased the need to rapidly interpret complex emotions due to the volatility of the data involved; machine learning tasks that process text, such as sentiment analysis, can help address this challenge by automatically classifying text as positive, negative, or neutral. While much research has focused on detecting offensive or toxic language online, there is also a need to explore and understand the ways in which people express positive emotions and support for one another in online communities. This is where sentiment dictionaries and other computational methods can be useful: by analyzing the language used to express support and identifying common patterns or themes.

This research was conducted by compiling data from question-and-answer threads about machine learning on Stack Exchange. A classification model was then constructed using binary logistic regression. The objective was to discover whether marked solutions can be predicted accurately by treating the comments as reviews. Measuring collaboration signals may help capture the nuances of language around support and assistance, which could have implications for how people understand and respond to expressions of help online. By exploring this topic further, researchers can gain a more complete understanding of the ways in which people communicate and connect online.
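The binary logistic regression step can be sketched in a few lines of plain Python; the features and training data below are invented for illustration (the thesis's actual sentiment features derived from comment text are not reproduced):

```python
import math

# Toy binary logistic regression relating simple sentiment features of
# comments to answer acceptance. The data are invented: each example is
# ((positive_word_count, negative_word_count), accepted_flag).
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

data = [((3, 0), 1), ((2, 1), 1), ((0, 3), 0), ((1, 2), 0)]

w = [0.0, 0.0]
b = 0.0
for _ in range(2000):  # plain stochastic gradient descent
    for (x1, x2), y in data:
        p = sigmoid(w[0] * x1 + w[1] * x2 + b)
        g = p - y  # gradient of log-loss w.r.t. the logit
        w[0] -= 0.1 * g * x1
        w[1] -= 0.1 * g * x2
        b -= 0.1 * g

def predict(x1, x2):
    """True if the model predicts the answer will be accepted."""
    return sigmoid(w[0] * x1 + w[1] * x2 + b) >= 0.5
```

In practice one would use a library implementation and richer features from a sentiment dictionary, but the decision rule is the same thresholded sigmoid.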
266

Skadligt innehåll på nätet - Toxiskt språk på TikTok / Harmful online content: toxic language on TikTok

Wester, Linn, Stenvall, Elin January 2024 (has links)
Toxic language on the internet, often referred to in everyday terms as cyberbullying, includes insults, threats, and offensive language, and it is particularly noticeable on social media. Toxic language can be detected automatically with machine learning, including Natural Language Processing (NLP) techniques that recognize its typical characteristics. Previous Swedish research has investigated the presence of toxic language on social media using machine learning, but there is still a lack of research on the increasingly popular platform TikTok. This study investigates the prevalence and characteristics of toxic comments on TikTok using both a machine learning technique and manual methods, with the aim of providing a better understanding of what young people encounter in the comments on TikTok. The study applies a mixed method in a document survey of 69,895 comments. The machine learning model Hatescan was used to automatically classify the likelihood of toxic language appearing in each comment; based on this probability, a sample of comments was analysed manually, leading to both quantitative and qualitative findings. The results showed that the prevalence of toxic language was relatively small: 0.24% of the 69,895 comments were considered toxic according to the combined automatic and manual assessment. The most common type of toxic language was obscene language, the majority of which contained swear words.
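The study's two-stage pipeline, an automatic toxicity score per comment followed by manual review of the high-scoring sample, can be sketched as follows; the scores are hard-coded stand-in values, not real Hatescan output, and the toxic count of 168 is simply a figure consistent with the reported 0.24% of 69,895 comments:

```python
# Stage 1: threshold automatic toxicity scores to select comments for
# manual review. Stage 2 (manual annotation) yields the final count.
def flag_for_review(scores, threshold=0.8):
    return [i for i, s in enumerate(scores) if s >= threshold]

def prevalence_percent(n_toxic, n_total):
    return 100 * n_toxic / n_total

scores = [0.05, 0.92, 0.40, 0.81]  # stand-in per-comment probabilities
print(flag_for_review(scores))      # indices of comments to review

# A toxic count of 168 would match the reported ~0.24% of 69,895:
print(round(prevalence_percent(168, 69895), 2))
```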
267

ENHANCING ELECTRONIC HEALTH RECORDS SYSTEMS AND DIAGNOSTIC DECISION SUPPORT SYSTEMS WITH LARGE LANGUAGE MODELS

Furqan Ali Khan (19203916) 26 July 2024 (has links)
Within Electronic Health Record (EHR) systems, physicians face extensive documentation work that leads to alarming levels of burnout. The disproportionate focus on data entry over direct patient care underscores a critical concern. Integrating Natural Language Processing (NLP) into EHR systems offers relief by reducing the time and effort spent on record maintenance.

Our research introduces the Automated Electronic Health Record System, which not only transcribes dialogues but also performs advanced clinical text classification. With an accuracy exceeding 98.97%, it saves over 90% of the time required for manual entry, as validated on the MIMIC-III and MIMIC-IV datasets.

In addition to our system's advancements, we explore integrating a Diagnostic Decision Support System (DDSS) that leverages Large Language Models (LLMs) and transformers, aiming to refine healthcare documentation and improve clinical decision-making. We discuss the advantages of various LLMs, such as enhanced accuracy and contextual understanding, as well as their challenges, including computational demands and biases.
268

Neural-Symbolic Modeling for Natural Language Discourse

Maria Leonor Pacheco Gonzales (12480663) 13 May 2022 (has links)
Language “in the wild” is complex and ambiguous and relies on a shared understanding of the world for its interpretation. Most current natural language processing methods represent language by learning word co-occurrence patterns from massive amounts of linguistic data. This representation can be very powerful, but it is insufficient to capture the meaning behind written and spoken communication.

In this dissertation, I will motivate neural-symbolic representations for dealing with these challenges. On the one hand, symbols have inherent explanatory power, and they can help us express contextual knowledge and enforce consistency across different decisions. On the other hand, neural networks allow us to learn expressive distributed representations and make sense of large amounts of linguistic data. I will introduce a holistic framework that covers all stages of the neural-symbolic pipeline: modeling, learning, and inference, together with its application to diverse discourse scenarios, such as analyzing online discussions, mining argumentative structures, and understanding public discourse at scale. I will show the advantages of neural-symbolic representations over end-to-end neural approaches and traditional statistical relational learning methods.

In addition, I will demonstrate the advantages of neural-symbolic representations for learning in low-supervision settings, as well as their ability to decompose and explain high-level decisions. Lastly, I will explore interactive protocols to help human experts make sense of large repositories of textual data, and leverage neural-symbolic representations as the interface for injecting expert human knowledge into the process of partitioning, classifying, and organizing large language resources.
269

Modelovanje i pretraživanje nad nestruktuiranim podacima i dokumentima u e-Upravi Republike Srbije / Modeling and searching over unstructured data and documents in e-Government of the Republic of Serbia

Nikolić, Vojkan 27 September 2016 (has links)
Nowadays, e-government services in various fields use Question Answering Systems (QAS) in an attempt to understand text and help citizens get answers to their queries quickly and at any time. Automatic mapping of relevant documents stands out as an important application of an automatic query-to-document classification strategy. This doctoral thesis aims to contribute to the identification of unstructured documents and represents an important step towards clarifying the role of explicit concepts in Information Retrieval in general.
The most common representation scheme in text categorization is the bag-of-words (BoW) approach, especially when a large knowledge base is available. The thesis introduces a new approach to concept-based text representation and applies text categorization to create defined classes for condensed text documents. It also presents a classification-based algorithm modeled for queries that match a topic. A complicating factor is that this concept, which represents terms with a high frequency of occurrence in queries, is based on similarities in previously defined classes of documents. Experimental results in the domain of the Criminal Code of the Republic of Serbia show that the concept-based text representation achieves satisfactory results even when no vocabulary exists for the given domain.
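The bag-of-words representation discussed in the abstract above can be sketched in a few lines; the vocabulary and document are toy examples, not drawn from the thesis's Criminal Code corpus:

```python
from collections import Counter

# Minimal bag-of-words (BoW) representation: each document becomes a
# vector of term counts over a fixed vocabulary. Word order is discarded.
def bow_vector(text, vocabulary):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

vocab = ["court", "penalty", "tax"]
doc = "The court imposed a penalty and the court upheld it"
print(bow_vector(doc, vocab))  # [2, 1, 0]
```

Concept-based representations, as proposed in the thesis, replace or augment these raw term counts with features tied to predefined document classes.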
270

Automatic summarization of mouse gene information for microarray analysis by functional gene clustering and ranking of sentences in MEDLINE abstracts : a dissertation

Yang, Jianji 06 1900 (has links) (PDF)
Ph.D. / Medical Informatics and Clinical Epidemiology / Tools to automatically summarize gene information from the literature have the potential to help genomics researchers better interpret gene expression data and investigate biological pathways. Even though several useful human-curated databases of information about genes already exist, these have significant limitations. First, their construction requires intensive human labor. Second, curation of genes lags behind the rapid publication rate of new research and discoveries. Finally, most of the curated knowledge is limited to information on single genes. As such, most original and up-to-date knowledge on genes can only be found in the immense amount of unstructured, free-text biomedical literature. Genomic researchers frequently encounter the task of finding information on sets of differentially expressed genes from the results of common high-throughput technologies like microarray experiments. However, finding information on a set of genes by manually searching and scanning the literature is a time-consuming and daunting task for scientists. For example, PubMed, the first choice of literature research for biologists, usually returns hundreds of references in reverse chronological order for a search on a single gene. Therefore, a tool to summarize the available textual information on genes could be valuable for scientists. In this study, we adapted automatic summarization technologies to the biomedical domain to build a query-based, task-specific automatic summarizer of information on mouse genes studied in microarray experiments: the mouse Gene Information Clustering and Summarization System (GICSS).
GICSS first clusters a set of differentially expressed genes by Medical Subject Heading (MeSH), Gene Ontology (GO), and free-text features into functionally similar groups; next, it presents summaries for each gene as ranked sentences extracted from MEDLINE abstracts, with the ranking emphasizing the relation between genes, similarity to the function cluster the gene belongs to, and recency. GICSS is available as a web application with links to the PubMed (www.pubmed.gov) website for each extracted sentence. It integrates two related steps of the microarray data analysis process: functional gene clustering and gene information gathering. The information from the clustering step was used to construct the context for summarization. The evaluation of the system was conducted with scientists who were analyzing their real microarray datasets. The evaluation results showed that GICSS can provide meaningful clusters for real users in the genomic research area. In addition, the results indicated that presenting sentences from the abstract can provide more important information to the user than just showing the title in the default PubMed format. Both domain-specific and non-domain-specific terminologies contributed to the selection of informative sentences. Summarization may serve as a useful tool to help scientists access information at the time of microarray data analysis. Further research includes setting up automatic updates of MEDLINE records; extending and fine-tuning the feature parameters for sentence scoring using the available evaluation data; and expanding GICSS to incorporate textual information from other species. Finally, dissemination and integration of GICSS into the current workflow of the microarray analysis process will help make GICSS a truly useful tool for the targeted users, biomedical genomics researchers.
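The sentence-ranking idea described above, scoring candidate sentences by their similarity to the gene's functional cluster and by recency, can be illustrated with a toy sketch; the weights, cluster terms, and sentences are invented for illustration, not taken from GICSS:

```python
# Toy sentence ranker: combine term overlap with a gene's functional
# cluster and a crude recency score, then sort candidates by score.
def score_sentence(sentence, cluster_terms, year, newest=2024, w_recency=0.3):
    words = {w.strip(".,;") for w in sentence.lower().split()}
    overlap = len(words & cluster_terms) / max(len(cluster_terms), 1)
    recency = max(0.0, 1 - (newest - year) / 50)  # maps years to [0, 1]
    return (1 - w_recency) * overlap + w_recency * recency

cluster = {"apoptosis", "signaling", "pathway"}
candidates = [
    ("Gene X regulates apoptosis via the MAPK signaling pathway.", 2020),
    ("Gene X was first cloned in mouse.", 1995),
]
ranked = sorted(candidates,
                key=lambda c: score_sentence(c[0], cluster, c[1]),
                reverse=True)
print(ranked[0][0])  # the on-topic, recent sentence ranks first
```

The real system scores sentences with MeSH, GO, and free-text features rather than raw word overlap, but the combine-and-rank structure is the same.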
