Spelling suggestions: "subject:"anguage codels"" "subject:"anguage 2models""
1 |
Computational models for declarative languages and their formal specifications in CSPLee, M. K. O. January 1986 (has links)
No description available.
|
2 |
InjectBench: An Indirect Prompt Injection Benchmarking FrameworkKong, Nicholas Ka-Shing 20 August 2024 (has links)
The integration of large language models (LLMs) with third party applications has allowed for LLMs to retrieve information from up-to-date or specialized resources. Although this integration offers numerous advantages, it also introduces the risk of indirect prompt injection attacks. In such scenarios, an attacker embeds malicious instructions within the retrieved third party data, which when processed by the LLM, can generate harmful and untruthful outputs for an unsuspecting user. Although previous works have explored how these attacks manifest, there is no benchmarking framework to evaluate indirect prompt injection attacks and defenses at scale, limiting progress in this area. To address this gap, we introduce InjectBench, a framework that empowers the community to create and evaluate custom indirect prompt injection attack samples. Our study demonstrate that InjectBench has the capabilities to produce high quality attack samples that align with specific attack goals, and that our LLM evaluation method aligns with human judgement. Using InjectBench, we investigate the effects of different components of an attack sample on four LLM backends, and subsequently use this newly created dataset to do preliminary testing on defenses against indirect prompt injections. Experiment results suggest that while more capable models are susceptible to attacks, they are better equipped at utilizing defense strategies. To summarize, our work helps the research community to systematically evaluate features of attack samples and defenses by introducing a dataset creation and evaluation framework. / Master of Science / Large language models (LLMs), such as ChatGPT, are now able to retrieve up-to-date information from online resources like Google Flights or Wikipedia. This ultimately allows the LLM to utilize current information to generate truthful, helpful and accurate responses. Despite the numerous advantages, it also exposes a user to a new vector of attacks known as indirect prompt injections. In this attack, an attacker will write a instruction onto an online resource that an LLM will process when retrieved from the online resource. The primary aim of the attacker is to instruct the LLM to say something it is not supposed to, and thus may manifest as a blatant lie or misinformation given to the user. Prior works have studied and showcased the harmfulness of this attack, however not many works have tried to understand which LLMs are more vulnerable to indirect prompt injection attacks and how we may defend from them. We believe that this is mainly due to the non-availability of a benchmarking dataset which allows us to test LLMs and new defenses. To address this gap, we introduce InjectBench, a methodology that allows the automated creation of these benchmarking datasets, and the evaluation of LLMs and defenses. We show that InjectBench can produce a high quality dataset that we can customize to specific attack goals, and that our evaluation process is accurate and agrees with human judgement. Using the benchmarking dataset created from InjectBench, we evaluate four LLMs and investigate defenses for indirect prompt injection attacks.
|
3 |
A Case Study of Compact Core French Models: A Pedagogic PerspectiveMarshall, Pamela 10 January 2012 (has links)
The overriding objective of core French (CF) teaching in Canada since the National Core French Study (NCFS) is that of communicative competence (R. Leblanc, 1990). Results from the traditional form of CF, though, suggest that students are not developing desired levels of communicative competence in the drip-feed (short daily periods) model (Lapkin, Harley, & Taylor, 1993). The present study aims to investigate the role of compacted second language program formats in promoting higher levels of language proficiency and achievement among elementary core French students; in particular, the study investigates the pedagogic approach, based on the principle that longer class periods should facilitate a more communicative/ experiential teaching approach.
Students in three Grade 7 classes served as participants. Two of the classes served as the compacted experimental classes, and the other as a comparison class. Pre-tests, immediate post-tests and delayed post-tests recorded differences in student achievement. A multi-dimensional, project-based curriculum approach was implemented in all three classes, and was recorded by teacher observations in her daybook and daily journal. Student attitudes toward their CF program format and their self-assessed language proficiency were measured during recorded focus group sessions and on student questionnaires. Parental and teacher perceptions of student attitudes were measured using a short survey.
Results indicate that students in both the compact and comparison classes performed similarly, with few significant differences in measured language growth or retention over time. Parents of all classes indicated satisfaction with the teaching and learning activities, and with the program format in which their child was enrolled. Excerpts from the teacher daybook and reflective journal demonstrated that communicative activities fostering student interaction in the target language were more frequently and readily implemented in the longer compact CF periods. Students generally stated a preference for the program format in which they were enrolled, although only students in the compact classes outlined pedagogic reasons in support for their preference. Additionally, most students self-assessed a higher level of language competence than in previous years, which students in the compact (experimental) classes attributed to the longer class periods, stating that they promoted task completion, group work, in-depth projects and communicative activities.
|
4 |
A Case Study of Compact Core French Models: A Pedagogic PerspectiveMarshall, Pamela 10 January 2012 (has links)
The overriding objective of core French (CF) teaching in Canada since the National Core French Study (NCFS) is that of communicative competence (R. Leblanc, 1990). Results from the traditional form of CF, though, suggest that students are not developing desired levels of communicative competence in the drip-feed (short daily periods) model (Lapkin, Harley, & Taylor, 1993). The present study aims to investigate the role of compacted second language program formats in promoting higher levels of language proficiency and achievement among elementary core French students; in particular, the study investigates the pedagogic approach, based on the principle that longer class periods should facilitate a more communicative/ experiential teaching approach.
Students in three Grade 7 classes served as participants. Two of the classes served as the compacted experimental classes, and the other as a comparison class. Pre-tests, immediate post-tests and delayed post-tests recorded differences in student achievement. A multi-dimensional, project-based curriculum approach was implemented in all three classes, and was recorded by teacher observations in her daybook and daily journal. Student attitudes toward their CF program format and their self-assessed language proficiency were measured during recorded focus group sessions and on student questionnaires. Parental and teacher perceptions of student attitudes were measured using a short survey.
Results indicate that students in both the compact and comparison classes performed similarly, with few significant differences in measured language growth or retention over time. Parents of all classes indicated satisfaction with the teaching and learning activities, and with the program format in which their child was enrolled. Excerpts from the teacher daybook and reflective journal demonstrated that communicative activities fostering student interaction in the target language were more frequently and readily implemented in the longer compact CF periods. Students generally stated a preference for the program format in which they were enrolled, although only students in the compact classes outlined pedagogic reasons in support for their preference. Additionally, most students self-assessed a higher level of language competence than in previous years, which students in the compact (experimental) classes attributed to the longer class periods, stating that they promoted task completion, group work, in-depth projects and communicative activities.
|
5 |
Supervised language models for temporal resolution of text in absence of explicit temporal cuesKumar, Abhimanu 18 March 2014 (has links)
This thesis explores the temporal analysis of text using the implicit temporal cues
present in document. We consider the case when all explicit temporal expressions such as
specific dates or years are removed from the text and a bag of words based approach is used
for timestamp prediction for the text. A set of gold standard text documents with times-
tamps are used as the training set. We also predict time spans for Wikipedia biographies
based on their text. We have training texts from 3800 BC to present day. We partition this
timeline into equal sized chronons and build a probability histogram for a test document
over this chronon sequence. The document is assigned to the chronon with the highest
probability.
We use 2 approaches: 1) a generative language model with Bayesian priors, and 2) a
KL divergence based model. To counter the sparsity in the documents and chronons we use
3 different smoothing techniques across models. We use 3 diverse datasets to test our mod-
els: 1) Wikipedia Biographies, 2) Guttenberg Short Stories, and 3) Wikipedia Years dataset.
Our models are trained on a subset of Wikipedia biographies. We concentrate on
two prediction tasks: 1) time-stamp prediction for a generic text or mid-span prediction for
a Wikipedia biography , and 2) life-span prediction for a Wikipedia biography. We achieve
an f-score of 81.1% for life-span prediction task and a mean error of around 36 years for
mid-span prediction for biographies from present day to 3800 BC. The best model gives a
mean error of 18 years for publication date prediction for short stories that are uniformly
distributed in the range 1700 AD to 2010 AD. Our models exploit the temporal distribu-
tion of text for associating time. Our error analysis reveals interesting properties about the
models and datasets used.
We try to combine explicit temporal cues extracted from the document with its
implicit cues and obtain combined prediction model. We show that a combination of the
date-based predictions and language model divergence predictions is highly effective for this
task: our best model obtains an f-score of 81.1% and the median error between actual and
predicted life span midpoints is 6 years. This would be one of the emphasis for our future
work.
The above analyses demonstrates that there are strong temporal cues within texts
that can be exploited statistically for temporal predictions. We also create good benchmark
datasets along the way for the research community to further explore this problem. / text
|
6 |
A Language-Model-Based Approach for Detecting Incompleteness in Natural-Language RequirementsLuitel, Dipeeka 24 May 2023 (has links)
[Context and motivation]: Incompleteness in natural-language requirements is a challenging problem. [Question/Problem]: A common technique for detecting incompleteness in requirements is checking the requirements against external sources. With the emergence of language models such as BERT, an interesting question is whether language models are useful external sources for finding potential incompleteness in requirements. [Principal ideas/results]: We mask words in requirements and have BERT's masked language model (MLM) generate contextualized predictions for filling the masked slots. We simulate incompleteness by withholding content from requirements and measure BERT's ability to predict terminology that is present in the withheld content but absent in the content disclosed to BERT. [Contributions]: BERT can be configured to generate multiple predictions per mask. Our first contribution is to determine how many predictions per mask is an optimal trade-off between effectively discovering omissions in requirements and the level of noise in the predictions. Our second contribution is devising a machine learning-based filter that post-processes predictions made by BERT to further reduce noise. We empirically evaluate our solution over 40 requirements specifications drawn from the PURE dataset [30]. Our results indicate that: (1) predictions made by BERT are highly effective at pinpointing terminology that is missing from requirements, and (2) our filter can substantially reduce noise from the predictions, thus making BERT a more compelling aid for improving completeness in requirements.
|
7 |
Identifying High Acute Care Users Among Bipolar and Schizophrenia PatientsShuo Li (17499660) 03 January 2024 (has links)
<p dir="ltr">The electronic health record (EHR) documents the patient’s medical history, with information such as demographics, diagnostic history, procedures, laboratory tests, and observations made by healthcare providers. This source of information can help support preventive health care and management. The present thesis explores the potential for EHR-driven models to predict acute care utilization (ACU) which is defined as visits to an emergency department (ED) or inpatient hospitalization (IH). ACU care is often associated with significant costs compared to outpatient visits. Identifying patients at risk can improve the quality of care for patients and can reduce the need for these services making healthcare organizations more cost-effective. This is important for vulnerable patients including those suffering from schizophrenia and bipolar disorders. This study compares the ability of the MedBERT architecture, the MedBERT+ architecture and standard machine learning models to identify at risk patients. MedBERT is a deep learning language model which was trained on diagnosis codes to predict the patient’s at risk for certain disease conditions. MedBERT+, the architecture introduced in this study is also trained on diagnosis codes. However, it adds socio-demographic embeddings and targets a different outcome, namely ACU. MedBERT+ outperformed the original architecture, MedBERT, as well as XGB achieving an AUC of 0.71 for both bipolar and schizophrenia patients when predicting ED visits and an AUC of 0.72 for bipolar patients when predicting IH visits. For schizophrenia patients, the IH predictive model had an AUC of 0.66 requiring further improvements. One potential direction for future improvement is the encoding of the demographic variables. Preliminary results indicate that an appropriate encoding of the age of the patient increased the AUC of Bipolar ED models to up to 0.78.</p>
|
8 |
Leveraging Transformer Models and Elasticsearch to Help Prevent and Manage Diabetes through EFT CuesShah, Aditya Ashishkumar 16 June 2023 (has links)
Diabetes in humans is a long-term (chronic) illness that affects how our body converts food into energy. Approximately one in ten individuals residing in the United States is affected with diabetes and more than 90% of those have type 2 diabetes (T2D). Human bodies fail to produce insulin in type 1 diabetes, causing you to take insulin for survival. However, with type 2 diabetes, the body can't use insulin well. A proven way to manage diabetes is through a positive mindset and a healthy lifestyle. Several studies have been conducted at Virginia Tech and the University of Buffalo on discovering different helpful characteristics in a person's day-to-day life, which relate to important events. They consider Episodic Fu- ture Thinking (EFT), where participants identify several events/actions that might occur at multiple future time frames (1 month to 10 years) in text-based descriptions (cues). This re- search aims to detect content characteristics from these EFT cues. However, class imbalance often presents a challenging issue when dealing with such domain-specific data. To mitigate this issue, this research employs Elasticsearch to address data imbalance and enhance the machine learning (ML) pipeline for improved accuracy of predictions. By leveraging Elas- ticsearch and transformer models, this study constructs classifiers and regression models, which can be utilized to identify various content characteristics from the cues. To the best of our knowledge, this work represents the first such attempt to employ natural language processing (NLP) techniques to analyze EFT cues and establish a correlation between those characteristics and their impacts on decision-making and health outcomes. / Master of Science / Diabetes is a serious and long-term illness that impacts how the body converts food into energy. It affects around one in ten individuals residing in the United States, and over 90% of these individuals have type 2 diabetes (T2D). While a positive attitude and healthy lifestyle can help with management of diabetes, it is unclear exactly which mental attitudes most affect health outcomes. To gain a better understanding of this relationship, researchers from Virginia Tech and the University of Buffalo conducted multiple studies on Episodic Future Thinking (EFT), where participants identify several events or actions that could take place in the future. This research uses natural language processing (NLP) to analyze the descriptions of these events (cues) and identify different characteristics that relate to a person's day-to-day life. With the help of Elasticsearch and transformer models, this work handles the data imbalance and improves the model predictions for different categories within cues. Overall, this research has the potential to provide valuable insights that can impact their diabetes risk, potentially leading to better management and prevention strategies and treatments.
|
9 |
Transforming SDOH Screening: Towards a General Framework for Transformer-based Prediction of Social Determinants of HealthKing III, Kenneth Hale 09 September 2024 (has links)
Social Determinants of Health (SDOH) play a crucial role in healthcare outcomes, yet identifying them from unstructured patient data remains a challenge. This research explores the potential of Large Language Models (LLMs) for automated SDOH identification from patient notes. We propose a general framework for SDOH screening that is simple and straightforward. We leverage existing SDOH datasets, adapting and combining them to create a more comprehensive benchmark for this task, addressing the research gap of limited datasets. Using the benchmark and proposed framework, we conclude by conducting several preliminary experiments exploring and comparing promising LLM system implementations. Our findings highlight the potential of LLMs for automated SDOH screening while emphasizing the need for more robust datasets and evaluation frameworks. / Master of Science / Social Determinants of Health (SDOH) have been shown to significantly impact health outcomes and are seen as a major contributor to global health inequities. However, their use within the healthcare industry is still significantly under emphasized, largely due to the difficulty of manually identifying SDOH factors. While previous works have explored automated approaches for SDOH identification, they lack standardization, data transparency and robustness, and are largely outdated compared to the latest Artificial Intelligence (AI) approaches. Therefore, in this work we propose a holistic framework for automated SDOH identification. We also present a higher quality SDOH benchmark, merging existing publicly available datasets, standardizing them, and cleaning them for errors. With this benchmark, we then conducted experiments to gain greater insights into the best performance across different state-of-the-art AI approaches. Through this work, we contribute a better way to think about automated SDOH screening systems, the first publicly accessible multi-clinic and multi-annotator benchmark, as well as greater insights into the latest AI approaches for state-of-the-art results.
|
10 |
A Multimodal Framework for Automated Content Moderation of Children's VideosAhmed, Syed Hammad 01 January 2024 (has links) (PDF)
Online video platforms receive hundreds of hours of uploads every minute, making manual moderation of inappropriate content impossible. The most vulnerable consumers of malicious video content are children from ages 1-5 whose attention is easily captured by bursts of color and sound. Prominent video hosting platforms like YouTube have taken measures to mitigate malicious content, but these videos often go undetected by current automated content moderation tools that are focused on removing explicit or copyrighted content. Scammers attempting to monetize their content may craft malicious children's videos that are superficially similar to educational videos, but include scary and disgusting characters, violent motions, loud music, and disturbing noises. A robust classification of malicious videos requires audio representations in addition to video features. However, recent content moderation approaches rarely employ multimodal architectures that explicitly consider non-speech audio cues. Additionally, there is a dearth of comprehensive datasets for content moderation tasks which include these audio-visual feature annotations. This dissertation addresses these challenges and makes several contributions to the problem of content moderation for children’s videos. The first contribution is identifying a set of malicious features that are harmful to preschool children but remain unaddressed and publishing a labeled dataset (Malicious or Benign) of cartoon video clips that include these features. We provide a user-friendly web-based video annotation tool which can easily be customized and used for video classification tasks with any number of ground truth classes. The second contribution is adapting state-of-the-art Vision-Language models to apply content moderation techniques on the MOB benchmark. We perform prompt engineering and an in-depth analysis of how context-specific language prompts affect the content moderation performance of different CLIP (Contrastive Language-Image Pre-training) variants. This dissertation introduces new benchmark natural language prompt templates for cartoon videos that can be used with Vision-Language models. Finally, we introduce a multimodal framework that includes the audio modality for more robust content moderation of children's cartoon videos and extend our dataset to include audio labels. We present ablations to demonstrate the enhanced performance of adding audio. The audio modality and prompt learning are incorporated while keeping the backbone modules of each modality frozen. Experiments were conducted on a multimodal version of the MOB (Malicious or Benign) dataset in both supervised and few-shot settings.
|
Page generated in 0.0516 seconds