21

Context-Aware Malware Detection Using Topic Modeling

Stegner, Wayne 28 September 2021 (has links)
No description available.
22

Still No Crystal Ball: Toward an Application for Broad Evaluation of Student Understanding

Armstrong, Piper 11 August 2022 (has links)
Evaluation of student understanding of learning material is critical to effective teaching. Computer-aided evaluation tools such as Computer Adaptive Testing (CAT) exist; however, they require expert knowledge to implement and update. We propose a novel task: to create an evaluation tool that can predict student performance (knowledge) from general performance on test questions, without expert curation of the questions or expert understanding of the evaluation tool. We implement two methods for creating such a tool, find both methods lacking, and urge further investigation.
23

Closed-loop Greenhouse Agriculture Systems

Ragany, Michelle January 2024 (has links)
The growing global population and climate change threaten the availability of many critical resources, and have been directly impacting the food and agriculture sector. Therefore, new cultivation technologies must be rapidly developed and implemented to secure the world's future food needs. Closed-loop greenhouse agriculture systems provide an opportunity to decrease resource reliance and increase crop yield. Greenhouses provide versatility in what can be grown and in the resources required to function. Greenhouses can become highly efficient and resilient through the application of a closed-loop systems approach that prioritizes repurposing, reusing, and recirculating resources. Here, we employ a text mining approach to survey the available research (meta-research) and publications within the area of closed-loop systems in greenhouses. This meta-research provides a clearer definition of the term “closed-loop system” within the context of greenhouses, as the term was previously vaguely defined. Using this meta-research approach, we identify six major existing research topic areas in closed-loop agriculture systems: models and controls; food waste; nutrient systems; growing media; heating; and energy. Furthermore, we identify four areas that urgently require further work: better connections between academic research and industry applications; clearer criteria surrounding growing media selection; critical operational requirements of a closed-loop system; and the functionality and synergy between the many modules that comprise a closed-loop greenhouse system. / Thesis / Master of Applied Science (MASc)
24

Topic modeling: a novel approach to drug repositioning using metadata

Bogard, Britney A. January 2014 (has links)
No description available.
25

Novel Algorithms for Understanding Online Reviews

Shi, Tian 14 September 2021 (has links)
This dissertation focuses on the review understanding problem, which has gained attention from both industry and academia, and has found applications in many downstream tasks, such as recommendation, information retrieval and review summarization. In this dissertation, we aim to develop machine learning and natural language processing tools to understand and learn structured knowledge from unstructured reviews, which can be investigated in three research directions, including understanding review corpora, understanding review documents, and understanding review segments. For the corpus-level review understanding, we have focused on discovering knowledge from corpora that consist of short texts. Since they have limited contextual information, automatically learning topics from them remains a challenging problem. We propose a semantics-assisted non-negative matrix factorization model to deal with this problem. It effectively incorporates the word-context semantic correlations into the model, where the semantic relationships between the words and their contexts are learned from the skip-gram view of a corpus. We conduct extensive sets of experiments on several short text corpora to demonstrate the proposed model can discover meaningful and coherent topics. For document-level review understanding, we have focused on building interpretable and reliable models for the document-level multi-aspect sentiment analysis (DMSA) task, which can help us to not only recover missing aspect-level ratings and analyze sentiment of customers, but also detect aspect and opinion terms from reviews. We conduct three studies in this research direction. In the first study, we collect a new DMSA dataset in the healthcare domain and systematically investigate reviews in this dataset, including a comprehensive statistical analysis and topic modeling to discover aspects. We also propose a multi-task learning framework with self-attention networks to predict sentiment and ratings for given aspects. In the second study, we propose corpus-level and concept-based explanation methods to interpret attention-based deep learning models for text classification, including sentiment classification. The proposed corpus-level explanation approach aims to capture causal relationships between keywords and model predictions via learning importance of keywords for predicted labels across a training corpus based on attention weights. We also propose a concept-based explanation method that can automatically learn higher level concepts and their importance to model predictions. We apply these methods to the classification task and show that they are powerful in extracting semantically meaningful keywords and concepts, and explaining model predictions. In the third study, we propose an interpretable and uncertainty aware multi-task learning framework for DMSA, which can achieve competitive performance while also being able to interpret the predictions made. Based on the corpus-level explanation method, we propose an attention-driven keywords ranking method, which can automatically discover aspect terms and aspect-level opinion terms from a review corpus using the attention weights. In addition, we propose a lecture-audience strategy to estimate model uncertainty in the context of multi-task learning. For the segment-level review understanding, we have focused on the unsupervised aspect detection task, which aims to automatically extract interpretable aspects and identify aspect-specific segments from online reviews. 
The existing deep learning-based topic models suffer from several problems, such as extracting noisy aspects and poorly mapping the aspects discovered by models to the aspects of interest. To deal with these problems, we propose a self-supervised contrastive learning framework in order to learn better representations for aspects and review segments. We also introduce a high-resolution selective mapping method to efficiently assign aspects discovered by the model to the aspects of interest. In addition, we propose using a knowledge distillation technique to further improve the aspect detection performance. / Doctor of Philosophy / Nowadays, online reviews are playing an important role in our daily lives. They are also critical to the success of many e-commerce and local businesses because they can help people build trust in brands and businesses, provide insights into products and services, and improve consumers' confidence. As a large number of reviews accumulate every day, a central research problem is to build an artificial intelligence system that can understand and interact with these reviews, and further use them to offer customers better support and services. In order to tackle challenges in these applications, we first have to get an in-depth understanding of online reviews. In this dissertation, we focus on the review understanding problem and develop machine learning and natural language processing tools to understand reviews and learn structured knowledge from unstructured reviews. We have addressed the review understanding problem in three directions: understanding a collection of reviews, understanding a single review, and understanding an individual review segment. In the first direction, we proposed a short-text topic modeling method to extract topics from review corpora that consist of primary complaints of consumers. In the second direction, we focused on building sentiment analysis models to predict the opinions of consumers from their reviews. Our deep learning models can provide good prediction accuracy as well as a human-understandable explanation for the prediction. In the third direction, we developed an aspect detection method to automatically extract, from reviews, sentences that mention features consumers are interested in, which can help customers efficiently navigate through reviews and help businesses identify the advantages and disadvantages of their products.
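As a rough, non-authoritative illustration of the corpus-level setting described above, the sketch below runs a plain NMF topic model over a handful of invented short reviews. The dissertation's semantics-assisted model additionally factorizes word-context correlations learned from a skip-gram view of the corpus; that term is omitted here, and all document texts are placeholders.

```python
# Minimal sketch of NMF-based topic discovery on a short-text corpus.
# The semantics-assisted model in the dissertation adds a word-context
# correlation factorization (skip-gram view); this baseline omits it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "battery drains too fast after the update",
    "screen cracked after one drop",
    "update broke the battery life",
    "great camera but poor battery",
]  # hypothetical short reviews

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)      # term weights per document
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)                # document-topic weights
H = nmf.components_                     # topic-term weights

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(H):
    top = topic.argsort()[::-1][:5]
    print(f"topic {k}:", [terms[i] for i in top])
```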
26

Analyzing and Navigating Electronic Theses and Dissertations

Ahuja, Aman 21 July 2023 (has links)
Electronic Theses and Dissertations (ETDs) contain valuable scholarly information that can be of immense value to the scholarly community. Millions of ETDs are now publicly available online, often through one of many digital libraries. However, since a majority of these digital libraries are institutional repositories whose objective is content archiving, they often lack the end-user services needed to make this valuable data useful for the scholarly community. To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing, document recommendation, as well as services to make navigation of long PDF documents easier. In recent years, with advances in the field of machine learning for text data, several techniques have been proposed to support such end-user services. However, limited research has been conducted on integrating such techniques with digital libraries. This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs, and at supporting end-user services for digital libraries, such as document browsing and long document navigation. First, we review several machine learning models that can be used to support such services. Next, to support a comprehensive evaluation of different models, as well as to train models that are tailored to the ETD data, we introduce several new datasets from the ETD domain. To minimize the resources required to develop the high-quality training datasets needed for supervised training, a novel AI-aided annotation method is also discussed. Finally, we propose techniques and frameworks to support various digital library services such as search, browsing, and recommendation. The key contributions of this research are as follows:
- A system to help with parsing long scholarly documents such as ETDs by means of object-detection methods trained to extract digital objects from long documents. The parsed documents can be used for further downstream tasks such as long document navigation, figure and/or table search, etc.
- Datasets to support supervised training of object detection models on scholarly documents of multiple types, such as born-digital and scanned. In addition to manually annotated datasets, a framework (along with the resulting dataset) for AI-aided annotation is also proposed.
- A web-based system for information extraction from long PDF theses and dissertations into a structured format such as XML, aimed at making scholarly literature more accessible to users with disabilities.
- A topic-modeling-based framework to support exploration tasks such as searching and/or browsing documents (and document portions, e.g., chapters) by topic, document recommendation, topic recommendation, and describing temporal topic trends.
/ Doctor of Philosophy / Electronic Theses and Dissertations (ETDs) contain valuable scholarly information that can be of immense value to the research community. Millions of ETDs are now publicly available online, often through one of many online digital libraries. However, since a majority of these digital libraries are institutional repositories whose objective is content archiving, they often lack the end-user services needed to make this valuable data useful for the scholarly community.
To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing, document recommendation, as well as services to make navigation of long PDF documents easier and more accessible. Several advances in the field of machine learning for text data in recent years have led to the development of techniques that can serve as the backbone of such end-user services. However, limited research has been conducted on integrating such techniques with digital libraries. This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs by parsing the information contained in the long PDF documents that make up ETDs into a more compute-friendly format. This would enable researchers and developers to build end-user services for digital libraries. We also propose a framework to support document browsing and long document navigation, which are some of the important end-user services required in digital libraries.
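A minimal sketch of one service such a framework could back, browse-by-topic over parsed ETD chapters, is given below. The chapter texts, topic count, and indexing scheme are illustrative assumptions rather than the dissertation's implementation.

```python
# Hedged sketch: index ETD chapters by their dominant LDA topic so a
# digital library can offer a "browse by topic" view.  Chapter texts are
# placeholders standing in for text extracted from parsed PDFs.
from gensim import corpora, models

chapters = {
    "etd1-ch2": "neural network training convergence gradient descent",
    "etd2-ch3": "greenhouse crop yield irrigation nutrient cycle",
    "etd3-ch1": "gradient descent optimization learning rate schedules",
}  # hypothetical chapter texts

tokenized = {cid: text.split() for cid, text in chapters.items()}
dictionary = corpora.Dictionary(tokenized.values())
corpus = {cid: dictionary.doc2bow(toks) for cid, toks in tokenized.items()}

lda = models.LdaModel(list(corpus.values()), num_topics=2,
                      id2word=dictionary, passes=10, random_state=0)

# Group chapters under their most probable topic.
index = {}
for cid, bow in corpus.items():
    topic_id = max(lda.get_document_topics(bow), key=lambda t: t[1])[0]
    index.setdefault(topic_id, []).append(cid)
print(index)
```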
27

TimeLink: Visualizing Diachronic Word Embeddings and Topics

Williams, Lemara Faith 11 June 2024 (has links)
The task of analyzing a collection of documents generated over time is daunting. A natural way to ease the task is by summarizing documents into the topics that exist within these documents. The temporal aspect of topics can frame relevance based on when topics are introduced and when topics stop being mentioned. It creates trends and patterns that can be traced by individual key terms taken from the corpus. If trends are being established, there must be a way to visualize them through the key terms. Creating a visual system to support this analysis can help users quickly gain insights from the data, significantly easing the burden of the original analysis technique. However, creating a visual system for terms is not easy. Work has been done to develop word embeddings, allowing researchers to treat words as numeric vectors. This makes it possible to create simple charts based on word embeddings, like scatter plots. However, these methods lose effectiveness when many time slices are shown and points begin to overlap. A visualization method that addresses these problems while also visualizing diachronic word embeddings in an interesting way with added semantic meaning is hard to find. TimeLink addresses these problems. TimeLink is proposed as a dashboard system to help users gain insights from the movement of diachronic word embeddings. It comprises a Sankey diagram showing the path of a selected key term to a cluster in a time period. This local cluster is also mapped to a global topic based on an original corpus of documents from which the key terms are drawn. On the dashboard, different tools are given to users to aid in a focused analysis, such as filtering key terms and emphasizing specific clusters. TimeLink provides insightful visualizations focused on temporal word embeddings while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time. / Master of Science / The task of analyzing documents collected over time is daunting. Grouping documents into topics can help frame relevance based on when topics are introduced and when they fade from discussion. The creation of topics also makes it possible to visualize trends and patterns. Creating a visual system to support this analysis can help users quickly gain insights from the data, significantly easing the burden of the original analysis technique of browsing individual documents. A visualization system for this analysis typically focuses on the terms that affect established topics. Some visualization methods, like scatter plots, support this but lose effectiveness as more data is introduced. TimeLink is proposed as a dashboard system to aid users in drawing insights from the development of terms over time. In addition to addressing problems in other visualizations, it visualizes the movement of terms intuitively and adds semantic meaning. TimeLink provides insightful visualizations focused on the movement of terms while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time.
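The sketch below illustrates, under heavy simplification, the kind of data transformation that could feed a TimeLink-style Sankey view: clustering word vectors within each time slice and recording each term's cluster path across slices. The embeddings are random placeholders rather than real diachronic vectors, and the term list, slice labels, and cluster count are assumptions.

```python
# Cluster word vectors per time slice, then record which cluster a tracked
# term falls into in each slice (the path that a Sankey view would draw).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
terms = ["virus", "vaccine", "lockdown", "economy", "inflation"]
time_slices = ["2019", "2020", "2021"]

# vectors[t][w] -> 50-d embedding of term w in slice t (placeholder data)
vectors = {t: {w: rng.normal(size=50) for w in terms} for t in time_slices}

labels_per_slice = {}
for t in time_slices:
    X = np.stack([vectors[t][w] for w in terms])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    labels_per_slice[t] = dict(zip(terms, labels))

# path[w] is the sequence of local cluster ids across slices for term w.
path = {w: [int(labels_per_slice[t][w]) for t in time_slices] for w in terms}
print(path)
```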
28

Examining the Educational Depth of Medical Case Reports and Radiology with Text Mining

Collinsworth, Amy L. 12 1900 (has links)
The purpose of this dissertation was to use text mining and topic modeling to explore unobserved themes in medical case reports that involve medical imaging. Case reports have a valuable place in medical research because they provide educational benefits, offer evidence, and encourage discussion. Their form has evolved over the years, but they remain a staple for delivering important information, educational context, and illuminating visuals to medical communities around the world. Manually examining the many case reports published across multiple medical subjects is challenging; therefore, text mining and topic modeling methods were used to analyze a large set of abstracts from medical case reports involving radiology. The analysis covered 68,845 abstracts published between 1975 and 2022. The findings indicate that text mining and topic modeling offer a unique and reproducible approach to examining a large quantity of abstracts for theme analysis.
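As a hedged illustration of the kind of pipeline described above, the sketch below fits an LDA model to a few invented radiology-style abstracts; the actual study analyzed 68,845 published abstracts with its own preprocessing and model choices, which are not reproduced here.

```python
# LDA over case-report abstracts to surface latent themes (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "ct imaging revealed a pulmonary nodule in an asymptomatic patient",
    "mri findings in a rare case of spinal cord compression",
    "ultrasound guided biopsy confirmed the hepatic lesion",
    "chest radiograph showed bilateral infiltrates consistent with pneumonia",
]  # invented placeholder abstracts

counts = CountVectorizer(stop_words="english").fit(abstracts)
X = counts.transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = weights.argsort()[::-1][:5]
    print(f"theme {k}:", [terms[i] for i in top])
```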
29

FROM MEMES TO MOVEMENTS: THE INTERPLAY BETWEEN THE GAMESTOP PHENOMENON AND PERIPHERAL SUBREDDITS IN DIGITAL FINANCIAL MOBILIZATION

Han, Jing, 0000-0003-3251-6549 12 1900 (has links)
Reddit studies have examined community formation and collective identity on individual subreddits, analyzed the migration of users among subreddits, or provided activity metrics across the platform. Fewer studies have examined Reddit affordances on sub-community and topical levels. In my dissertation, I introduce the concept of ‘peripheral subreddits’, which are offshoots of more prominent subreddits, by studying peripheral subreddits of the GameStop movement, which initially occurred on r/WallStreetBets: r/Superstonk, r/GME, r/GMEJungle, and r/DDintoGME. I argue that analyzing the communicative activity occurring on peripheral subreddits may help communication scholars understand the growth of emerging movements that are anonymously social and ‘digital-first’. Specifically, I examine peripheral subreddits by focusing on three characteristics: topic inheritance (the provision of content themes from a more popular root subreddit), topic similarity (the shared interests among peripheral subreddits), and topic connectivity (the explicit or implicit associations among peripheral subreddits in the form of shared dialogue, activities, beliefs, or sentiments). I use computational methods such as topic modeling and sentiment analysis to analyze user activity and posts in these peripheral subreddits. Further, following the literature on digitally mediated stock market communities, I examine whether these peripheral subreddits engage in communicative processes such as aestheticization, virtualization, and de-realization, and reflexivities such as performativity, transactionality, and gamification. / Media & Communication
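As one rough, assumed proxy for the topic-similarity characteristic described above, the sketch below compares pooled subreddit text with TF-IDF cosine similarity. The dissertation itself uses full topic modeling and sentiment analysis, and the post texts here are invented.

```python
# Coarse topic-similarity proxy: represent each subreddit by the pooled
# text of its posts and compare the TF-IDF vectors pairwise.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

subreddit_posts = {
    "r/Superstonk": "diamond hands hodl gme shares drs computershare",
    "r/GME": "gme short interest squeeze shares hodl",
    "r/DDintoGME": "due diligence short interest ftd data analysis",
}  # invented placeholder text

names = list(subreddit_posts)
tfidf = TfidfVectorizer().fit_transform(subreddit_posts.values())
sim = cosine_similarity(tfidf)

for i, a in enumerate(names):
    for j, b in enumerate(names):
        if i < j:
            print(f"{a} vs {b}: {sim[i, j]:.2f}")
```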
30

Inference Methods for Token-Level Topic Assignments with Fixed Topics

Cowley, Stephen 23 December 2023 (has links) (PDF)
Topic modeling, an unsupervised technique used to gain a high-level understanding of a large collection of documents, often involves two major goals: the discovery of topics used in the corpus (topic discovery) and the assignment of topics to individual words (token-level topic assignment). While Latent Dirichlet Allocation (LDA) normally performs both of these steps simultaneously, some situations require only the token-level topic assignments, using fixed topics. We evaluate three topic assignment strategies using fixed topics -- Gibbs sampling, iterated conditional modes, and mean field variational inference -- to determine which should be used when only token-level topic assignment is needed. Among these methods, we find that iterated conditional modes performs best with respect to significance, consistency, and runtime, and that variational inference performs best on downstream classification accuracy.
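A minimal sketch of iterated conditional modes for token-level topic assignment with fixed topics is shown below; the topic-word matrix, prior, and document are toy placeholders rather than the thesis's actual setup. Each token takes the topic that maximizes the product of its document-topic count (excluding that token, plus the prior) and the fixed topic-word probability, sweeping until assignments stop changing.

```python
# ICM for token-level topic assignment under fixed topics (toy example).
import numpy as np

phi = np.array([  # fixed topic-word distributions, K=2 topics, V=4 words
    [0.70, 0.10, 0.10, 0.10],   # topic 0 favors word 0
    [0.10, 0.10, 0.40, 0.40],   # topic 1 favors words 2 and 3
])
alpha = 0.1                      # symmetric Dirichlet prior on doc topics
doc = np.array([0, 0, 2, 3, 0])  # word ids of tokens in one document

K = phi.shape[0]
z = phi[:, doc].argmax(axis=0).astype(int)   # initialize by word likelihood
counts = np.bincount(z, minlength=K)         # doc-topic counts

changed = True
while changed:
    changed = False
    for i, w in enumerate(doc):
        counts[z[i]] -= 1                    # exclude this token
        scores = (counts + alpha) * phi[:, w]
        best = int(np.argmax(scores))
        if best != z[i]:
            z[i] = best
            changed = True
        counts[z[i]] += 1                    # add it back under its topic

print(z)  # -> [0 0 1 1 0] for this toy example
```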
