11 |
Propagation of online consumer-perceived negativity: Quantifying the effect of supply chain underperformance on passenger car sales. Singh, A.; Jenamani, M.; Thakker, J.J.; Rana, Nripendra P. 10 April 2021 (has links)
The paper presents a text analytics framework that analyses online reviews to explore how consumer-perceived negativity relating to the supply chain propagates over time and how it affects car sales. In particular, the framework integrates aspect-level sentiment analysis using SentiWordNet, time-series decomposition, and the bias-corrected least squares dummy variable (LSDVc) estimator, a panel data technique. The framework serves the business community by providing a list of consumers' contemporary interests in the form of frequently discussed product attributes; quantifying the consumer-perceived performance of supply chain (SC) partners and comparing competitors; and offering a model for assessing firms' sales performance. The proposed framework is demonstrated on the automobile supply chain using a review dataset obtained from a well-known car portal in India. Our findings suggest that consumer-voiced negativity is highest for dealers and lowest for manufacturing- and assembly-related features. Firm age, GDP, and review volume significantly influence car sales, whereas the sentiments corresponding to SC partners do not. The proposed research framework can help manufacturers inspect their SC partners, identify the critical sales influencers cited by consumers, and predict sales more accurately, which in turn supports better production planning, supply chain management, marketing, and consumer relationships.
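As a rough illustration of the aspect-level sentiment step, the sketch below scores the negativity of review sentences with NLTK's SentiWordNet interface; the aspect lexicon, the example review and the first-sense heuristic are our own simplifications, not the paper's actual pipeline.

```python
# Minimal sketch: aspect-level negativity scoring with SentiWordNet.
# Requires the NLTK data packages punkt, averaged_perceptron_tagger,
# wordnet and sentiwordnet (fetch via nltk.download).
from nltk import pos_tag, word_tokenize
from nltk.corpus import sentiwordnet as swn

# Hypothetical keyword -> supply-chain-partner mapping, for illustration only.
ASPECTS = {"dealer": "dealer network", "engine": "manufacturing", "delivery": "logistics"}

def negativity(sentence: str) -> float:
    """Average SentiWordNet negativity over the scorable tokens in a sentence."""
    scores = []
    for token, tag in pos_tag(word_tokenize(sentence.lower())):
        wn_pos = {"J": "a", "N": "n", "R": "r", "V": "v"}.get(tag[0])
        if wn_pos is None:
            continue
        synsets = list(swn.senti_synsets(token, wn_pos))
        if synsets:
            scores.append(synsets[0].neg_score())  # first-sense heuristic
    return sum(scores) / len(scores) if scores else 0.0

review = "The dealer delayed delivery again, but the engine feels refined."
for clause in review.split(","):
    hits = [a for a in ASPECTS if a in clause.lower()]
    if hits:
        print(ASPECTS[hits[0]], round(negativity(clause), 3))
```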
|
12 |
Functional linguistic based motivations for a conversational software agent. Panesar, Kulvinder 07 October 2020 (has links)
This chapter discusses a linguistically orientated model of a conversational software agent (CSA) (Panesar 2017), a framework sensitive to natural language processing (NLP) concepts and to the levels of adequacy of a functional linguistic theory (LT). We discuss the relationship between NLP and knowledge representation (KR), and connect this with the goals of a linguistic theory (Van Valin and LaPolla 1997), in particular Role and Reference Grammar (RRG) (Van Valin Jr 2005). We consider the advantages of RRG and assess its fitness and computational adequacy. We present a design for a computational model of the linking algorithm that utilises a speech act construction as a grammatical object (Nolan 2014a, Nolan 2014b) and the sub-model of belief, desire and intentions (BDI) (Rao and Georgeff 1995). This model has been successfully implemented in software using the resource description framework (RDF), and we highlight some implementation issues that arose at the interface between language and knowledge representation (Panesar 2017).
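For readers unfamiliar with RDF-backed belief stores, the following minimal sketch shows how a belief derived from a speech act could be serialised as RDF triples with the rdflib library; the namespace and predicates are invented for illustration and are not drawn from Panesar (2017).

```python
# Minimal sketch: an agent belief stored as RDF triples with rdflib.
from rdflib import RDF, Graph, Literal, Namespace

AGENT = Namespace("http://example.org/csa#")  # hypothetical namespace

beliefs = Graph()
beliefs.bind("agent", AGENT)

# A belief produced by the linking algorithm: the speaker made a request.
beliefs.add((AGENT.utterance1, RDF.type, AGENT.SpeechAct))
beliefs.add((AGENT.utterance1, AGENT.actType, Literal("request")))
beliefs.add((AGENT.utterance1, AGENT.content, Literal("book a table for two")))

print(beliefs.serialize(format="turtle"))
```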
|
13 |
Analysis of Security Findings and Reduction of False Positives through Large Language Models. Wagner, Jonas 18 October 2024 (has links)
This thesis investigates the integration of State-of-the-Art (SOTA) Large Language Models (LLMs) into the process of reassessing security findings generated by Static Application Security Testing (SAST) tools. The primary objective is to determine whether LLMs are able to detect false positives (FPs) while maintaining a high true positive (TP) rate, thereby enhancing the efficiency and effectiveness of security assessments.

Four consecutive experiments were conducted, each addressing specific research questions. The initial experiment, using a dataset of security findings extracted from the OWASP Benchmark, identified the optimal combination of context items provided by the SAST tool SpotBugs which, when used with GPT-3.5 Turbo, reduced FPs while minimizing the loss of TPs. The second experiment, conducted on the same dataset, demonstrated that advanced prompting techniques, particularly few-shot Chain-of-Thought (CoT) prompting combined with Self-Consistency (SC), further improved the reassessment process. The third experiment compared both proprietary and open-source LLMs on an OWASP Benchmark dataset about one-fourth the size of the previously used dataset. GPT-4o achieved the highest performance, detecting 80 out of 128 FPs without missing any TPs, resulting in a perfect TPR of 100% and a decrease in FPR of 41.27 percentage points. Meanwhile, Llama 3.1 70B detected 112 out of the 128 FPs but missed 10 TPs, resulting in a TPR of 94.94% and a reduction in FPR of 56.62 percentage points. To validate these findings in a real-world context, the approach was applied to a dataset generated from the open-source project Mnestix using multiple SAST tools. GPT-4o again emerged as the top performer, detecting 26 out of 68 FPs while missing only one TP, which lowered the TPR by 2.22 percentage points but reduced the FPR by 37.57 percentage points.
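To make the prompting setup concrete, here is a minimal sketch of few-shot CoT reassessment with self-consistency voting; `call_llm` is a stand-in stub rather than a real client, and the prompt wording and finding fields are illustrative, not the thesis's actual artefacts.

```python
# Minimal sketch: reassess a SAST finding by sampling an LLM several times
# and taking a majority vote over its verdicts (self-consistency).
from collections import Counter

FEW_SHOT_COT = """You review static-analysis findings.
Think step by step, then answer FALSE_POSITIVE or TRUE_POSITIVE on the last line.
Example finding: SQL string built from a constant -> FALSE_POSITIVE
"""

def call_llm(prompt: str, temperature: float) -> str:
    # Stand-in for a real chat-completion client; replace with your own.
    return "The taint never reaches a sink.\nFALSE_POSITIVE"

def reassess(finding: dict, samples: int = 5) -> str:
    """Majority verdict over several sampled chains of thought."""
    prompt = (FEW_SHOT_COT
              + f"Rule: {finding['rule']}\nCode:\n{finding['snippet']}\nVerdict?")
    votes = []
    for _ in range(samples):
        reply = call_llm(prompt, temperature=0.7)  # sampling gives diverse chains
        votes.append(reply.strip().splitlines()[-1])  # keep only the verdict line
    return Counter(votes).most_common(1)[0][0]

finding = {"rule": "SQL_INJECTION", "snippet": 'query = "SELECT 1"'}
print(reassess(finding))  # -> FALSE_POSITIVE
```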
|
14 |
Automatic Reconstruction of Itineraries from Descriptive Texts / Reconstruction automatique d'itinéraires à partir de textes descriptifs. Moncla, Ludovic 03 December 2015 (has links)
This PhD thesis is part of the research project PERDIDO, which aims at extracting and reconstructing itineraries from textual documents. The work was conducted in collaboration between the LIUPPA laboratory of the University of Pau (France), the IAAA team of the University of Zaragoza (Spain) and the COGIT laboratory of IGN (France). The objective of this PhD is to propose a method for establishing a processing chain to support the geoparsing and geocoding of text documents describing events strongly linked with space. We propose an approach for the automatic geocoding of itineraries described in natural language. Our proposal is divided into two main tasks. The first task aims at identifying and extracting the information describing the itinerary in texts, such as spatial named entities and expressions of displacement or perception. The second task deals with the reconstruction of the itinerary. Our proposal combines local information extracted using natural language processing with physical features extracted from external geographical sources such as gazetteers or datasets providing digital elevation models. The geoparsing part is a natural language processing approach which combines part-of-speech tagging and syntactico-semantic patterns (cascades of transducers) for the annotation of spatial named entities and expressions of displacement or perception. The main contribution of the first task is toponym disambiguation, which remains an open problem in named entity recognition and an important issue in Geographical Information Retrieval (GIR). We propose an unsupervised geocoding algorithm that takes advantage of clustering techniques to disambiguate the toponyms found in gazetteers while, at the same time, estimating the spatial footprint of fine-grained toponyms not found in gazetteers. We propose a generic graph-based model for the automatic reconstruction of itineraries from texts, where each vertex represents a location and each edge represents a path between locations. Our model is original in that, in addition to taking into account the classic elements (paths and waypoints), it can represent the other elements involved in the description of an itinerary, such as features seen or mentioned as landmarks. To build this graph-based representation of the itinerary automatically, our approach computes a minimum spanning tree on a weighted graph. Each edge of the initial graph is weighted using a multi-criteria analysis approach combining qualitative and quantitative criteria. Criteria values are determined from information extracted from the text and from external geographical sources. For instance, we combine information given in the text, such as spatial relations describing orientation (e.g., going south), with the geographical coordinates of locations found in gazetteers. Finally, according to the definition of an itinerary and the information used in natural language to describe itineraries, we propose a markup language for encoding spatial and motion information based on the Text Encoding Initiative (TEI) guidelines, which define a standard for the representation of texts in digital form. The different steps of our approach were implemented and evaluated on a multilingual corpus of hiking descriptions (French, Spanish and Italian).
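As a toy illustration of the reconstruction step, the sketch below computes a minimum spanning tree over a small weighted location graph with networkx; the places, coordinates and single-criterion weighting are invented stand-ins for the thesis's multi-criteria analysis.

```python
# Minimal sketch: itinerary reconstruction as a minimum spanning tree
# over a weighted graph of candidate locations.
import math
import networkx as nx

# Hypothetical toponyms with (lat, lon) coordinates, for illustration only.
places = {"Pau": (43.30, -0.37), "Lourdes": (43.09, -0.05), "Gavarnie": (42.73, 0.01)}

G = nx.Graph()
for a in places:
    for b in places:
        if a < b:
            distance = math.dist(places[a], places[b])
            # Stand-in for the multi-criteria weight: plain distance, discounted
            # when the text asserts a displacement between the two places.
            mentioned_together = {a, b} == {"Pau", "Lourdes"}
            G.add_edge(a, b, weight=distance * (0.5 if mentioned_together else 1.0))

itinerary = nx.minimum_spanning_tree(G)
print(sorted(itinerary.edges(data="weight")))
```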
|
15 |
Design and Implementation of an Algorithm for Machine Learning of the Inflection of a German-Language Corpus / Design und Implementierung eines Algorithmus zum maschinellen Lernen der Flexion eines Korpus deutscher Sprache. Moritz, Julian 20 October 2017 (has links)
This thesis describes the design and implementation of an algorithm for inflection. A concrete implementation is developed using German as an example. To this end, the inflection of German is first analysed in detail before a method is developed that is language-independent and can therefore, in principle, be transferred to other languages. The practical feasibility of the method is demonstrated by means of examples. However, the high complexity of the task means that, in practice, the quality of the inflected word forms suffers, particularly because the system also inflects base forms that are unknown to it.
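One classic way to learn inflection from example pairs, sketched below under the assumption of simple suffix rewriting, is to strip the longest common prefix of lemma and form and store the remainder as a rule; the training pairs are illustrative and the thesis's actual algorithm is certainly richer.

```python
# Minimal sketch: learning suffix-rewrite rules from (lemma, plural) pairs
# and applying the most frequent applicable rule to unseen lemmas.
from collections import Counter
from os.path import commonprefix

pairs = [("Haus", "Häuser"), ("Maus", "Mäuse"), ("Kind", "Kinder")]

rules = Counter()
for lemma, form in pairs:
    stem = commonprefix([lemma, form])
    rules[(lemma[len(stem):], form[len(stem):])] += 1  # e.g. ("aus", "äuser")

def inflect(lemma: str) -> str:
    """Apply the most frequent suffix rule whose left side matches the lemma."""
    for (old, new), _ in rules.most_common():
        if lemma.endswith(old):
            return lemma[: len(lemma) - len(old)] + new
    return lemma

# Produces "Läuser" rather than the correct "Läuse": a plausible-but-wrong
# form that illustrates the quality caveat for unknown base forms.
print(inflect("Laus"))
```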
|
16 |
An AI-based System for Assisting Planners in a Supply Chain with Email Communication. Dantu, Sai Shreya Spurthi; Yadlapalli, Akhilesh January 2023 (has links)
Background: Communication plays a crucial role in supply chain management (SCM), as it facilitates the flow of information, materials, and goods across the stages of the supply chain. In the context of supply planning, each planner manages thousands of supply chain entities and spends a lot of time reading and responding to high volumes of emails related to part orders, delays, and backorders, which can lead to information overload and hinder workflow and decision-making. Streamlining communication and enhancing email management are therefore essential for optimizing supply chain efficiency.

Objectives: This study aims to create an automated system that can summarize email conversations between planners, suppliers, and other stakeholders. The goal is to increase communication efficiency by using Natural Language Processing (NLP) algorithms to extract important information from lengthy conversations. Additionally, the study explores the effectiveness of using conditional random fields (CRF) to filter out irrelevant content during preprocessing.

Methods: We chose four advanced pre-trained abstractive dialogue summarization models, BART, PEGASUS, T5, and CODS, and two evaluation metrics, ROUGE and BERTScore, to compare their performance in summarizing our email conversations. We used CRF to preprocess raw data from around 400 planner-supplier email conversations, extracting important sentences in a dialogue format and labelling them with specific dialogue act tags. We then manually summarized the 400 conversations and fine-tuned the four chosen models. Finally, we evaluated the models using the ROUGE and BERTScore metrics to determine their similarity to the human references.

Results: The results show that the performance of the summarization models improved significantly after fine-tuning on domain-specific data. The BART model achieved the highest scores, with a ROUGE-1 of 0.65, a ROUGE-L of 0.56, and a BERTScore of 0.95. Additionally, CRF-based preprocessing proved crucial for extracting essential information and minimizing unnecessary detail in the summarization process.

Conclusions: This study shows that advanced NLP techniques can make supply chain communication workflows more efficient. The BART-based email summarization tool that we created showed great potential in providing important insights and helping planners deal with information overload.
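The sketch below shows the general shape of such a pipeline, summarising a toy email thread with a publicly available BART checkpoint and scoring it with the rouge-score package; the thread, reference and checkpoint are stand-ins, since the thesis fine-tuned its own models on planner-supplier data.

```python
# Minimal sketch: abstractive summarization of an email thread with BART,
# evaluated against a human reference with ROUGE.
from rouge_score import rouge_scorer
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

thread = ("Planner: Order 4711 for 500 brackets is late. New ETA? "
          "Supplier: Casting issue; we can ship 300 on Friday, rest next week. "
          "Planner: Ship the partial batch; flag the remainder as backorder.")

summary = summarizer(thread, max_length=40, min_length=10)[0]["summary_text"]

reference = "Supplier ships 300 brackets Friday; remaining 200 are backordered."
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print(summary)
print(scorer.score(reference, summary))
```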
|
17 |
Natural language processing (NLP) in Artificial Intelligence (AI): a functional linguistic perspective. Panesar, Kulvinder 07 October 2020 (has links)
This chapter encapsulates the multi-disciplinary nature of NLP in AI and reports on a linguistically orientated conversational software agent (CSA) (Panesar 2017) framework sensitive to natural language processing (NLP) and to language in the agent environment. We present a novel computational approach that uses the functional linguistic theory of Role and Reference Grammar (RRG) as the linguistic engine. Viewing language as action, utterances change the state of the world, and hence the mental states of speakers and hearers change as a result of these utterances. The plan-based method of discourse management (DM) using the BDI model architecture is deployed to support a greater complexity of conversation. This CSA investigates the integration, intersection and interface of the language, knowledge, speech act constructions (SAC) as a grammatical object, and the sub-model of BDI and DM for NLP. We present an investigation into the intersection and interface between our linguistic and knowledge (belief base) models for both dialogue management and planning. The architecture has three phase models: (1) a linguistic model based on RRG; (2) an Agent Cognitive Model (ACM) with (a) a knowledge representation model employing conceptual graphs (CGs) serialised to the Resource Description Framework (RDF) and (b) a planning model underpinned by BDI concepts, intentionality and rational interaction; and (3) a dialogue model employing common ground. Use of RRG as a linguistic engine for the CSA was successful. We identify the complexity of the semantic gap between internal representations and detail a conceptual bridging solution.
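As a minimal sketch of the BDI deliberation the chapter builds on, the following toy cycle adopts a plan when a desire's precondition holds in the belief set; the states, plans and trigger predicate are invented for illustration and are far simpler than the ACM described here.

```python
# Minimal sketch: one BDI-style deliberation step for dialogue management.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    beliefs: set = field(default_factory=set)
    desires: set = field(default_factory=set)
    intentions: list = field(default_factory=list)

# Hypothetical plan library: desire -> ordered plan steps.
PLANS = {
    "answer_user": ["consult_belief_base", "generate_rrg_utterance"],
}

def deliberate(state: AgentState) -> None:
    """Adopt a plan for each desire whose precondition holds in the beliefs."""
    for desire in state.desires:
        if desire in PLANS and "user_spoke" in state.beliefs:
            state.intentions.extend(PLANS[desire])

agent = AgentState(beliefs={"user_spoke"}, desires={"answer_user"})
deliberate(agent)
print(agent.intentions)  # ['consult_belief_base', 'generate_rrg_utterance']
```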
|
18 |
Automatic Text Ontological Representation and Classification via Fundamental to Specific Conceptual Elements (TOR-FUSE). Razavi, Amir Hossein 16 July 2012 (has links)
In this dissertation, we introduce a novel text representation method used mainly for text classification. The presented representation method is initially based on a variety of closeness relationships between pairs of words in text passages within the entire corpus. This representation then serves as the basis for our multi-level lightweight ontological representation method (TOR-FUSE), in which documents are represented according to their contexts and the goal of the learning task. The method is unlike traditional representation methods, in which all documents are represented solely by their constituent words, in isolation from the goal they are represented for. We believe choosing the correct granularity of representation features is an important aspect of text classification. Interpreting data in a more general dimensional space, with fewer dimensions, can convey more discriminative knowledge and decrease the level of learning perplexity. The multi-level model allows data interpretation in a more conceptual space, rather than one containing only the scattered words occurring in texts. It aims to extract the knowledge tailored for the classification task by automatically creating a lightweight ontological hierarchy of representations. In the last step, we train a tailored ensemble learner over a stack of representations at different conceptual granularities. The final result is a mapping and a weighting of the targeted concept of the original learning task over a stack of representations and the granular conceptual elements of its different levels (a hierarchical mapping instead of a linear mapping over a vector). Finally, the entire algorithm is applied to a variety of general text classification tasks, and its performance is evaluated in comparison with well-known algorithms.
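The sketch below approximates the final ensemble step with scikit-learn, stacking a word-level TF-IDF classifier with a coarser LSA-topic classifier over a toy corpus; it illustrates learning over two granularities only and does not reproduce TOR-FUSE's ontological levels.

```python
# Minimal sketch: stacking classifiers trained on representations at two
# granularities (word-level TF-IDF and coarser LSA topics).
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["engine noise on cold start", "refund not processed", "brake pads wear",
        "billing error on invoice", "oil leak near gasket", "payment declined twice"]
labels = [0, 1, 0, 1, 0, 1]  # 0 = mechanical, 1 = financial (toy labels)

fine = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(docs, labels)
coarse = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2),
                       LogisticRegression()).fit(docs, labels)

def meta_features(texts):
    """Stack the base learners' class probabilities as meta-features."""
    return np.column_stack([fine.predict_proba(texts)[:, 1],
                            coarse.predict_proba(texts)[:, 1]])

# NB: real stacking would fit the meta-learner on out-of-fold predictions
# to avoid leakage; in-sample fitting keeps this sketch short.
meta = LogisticRegression().fit(meta_features(docs), labels)
print(meta.predict(meta_features(["leaking oil after service"])))
```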
|
20 |
Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus. Lyall-Wilson, Jennifer Rae January 2013 (has links)
This dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can provide a reasonable representation of all the concepts within the document collection, as well as the relationships these concepts have to one another. Because the representation is generated from the association thesaurus, a mapping exists between the representation of the concepts and the terms used to describe them. The research applies to search engines designed for an individual website whose content focuses on a specific conceptual domain. Both the document collection and the subject content must therefore be well-bounded, which makes it possible to use techniques not currently feasible for general-purpose search engines covering the entire web.
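A minimal sketch of the underlying idea: build a term co-occurrence graph over a toy collection as a stand-in for the association thesaurus, then expand a query with the most strongly associated terms; the corpus and scoring are illustrative only.

```python
# Minimal sketch: query expansion via a collection-specific co-occurrence
# "thesaurus" built from a toy document collection.
from collections import Counter
from itertools import combinations

corpus = [
    "solar panel installation cost",
    "panel inverter efficiency",
    "solar inverter warranty",
    "battery storage for solar panel",
]

# Count how often each term pair co-occurs within a document.
cooc = Counter()
for doc in corpus:
    for a, b in combinations(sorted(set(doc.split())), 2):
        cooc[(a, b)] += 1

def expand(query: str, top_k: int = 3) -> list[str]:
    """Add the terms most strongly associated with any query term."""
    terms = set(query.split())
    scored = Counter()
    for (a, b), w in cooc.items():
        if a in terms and b not in terms:
            scored[b] += w
        elif b in terms and a not in terms:
            scored[a] += w
    return list(terms) + [t for t, _ in scored.most_common(top_k)]

print(expand("solar panel"))
```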
|