Global ETD Search

21	Context-Aware Malware Detection Using Topic Modeling Stegner, Wayne 28 September 2021 (has links) No description available. Computer Engineering Cybersecurity Context-Aware Malware Detection Topic Modeling AI
22	Still No Crystal Ball: Toward an Application for Broad Evaluation of Student Understanding Armstrong, Piper 11 August 2022 (has links) Evaluation of student understanding of learning material is critical to effective teaching. Current computer-aided evaluation tools exist, such as Computer Adaptive Testing (CAT); however, they require expert knowledge to implement and update. We propose a novel task, to create an evaluation tool that can predict student performance (knowledge) based on general performance on test questions without expert curation of the questions or expert understanding of the evaluation tool. We implement two methods for creating such a tool, find both methods lacking, and urge further investigation. NLP educational application evaluation topic modeling Physical Sciences and Mathematics
23	Closed-loop Greenhouse Agriculture Systems Ragany, Michelle January 2024 (has links) The growing global population and climate change threaten the availability of many critical resources and are directly affecting the food and agriculture sector. Therefore, new cultivation technologies must be rapidly developed and implemented to secure the world's future food needs. Closed-loop greenhouse agriculture systems provide an opportunity to decrease resource reliance and increase crop yield. Greenhouses provide versatility in what can be grown and the resources required to function. Greenhouses can become highly efficient and resilient through the application of a closed-loop systems approach that prioritizes repurposing, reusing, and recirculating resources. Here, we employ a text mining approach to survey the available research and publications (meta-research) within the area of closed-loop systems in greenhouses. This meta-research provides a clearer definition of the term “closed-loop system” within the context of greenhouses, as the term was previously vaguely defined. Using this meta-research approach, we identify six major existing research topic areas in closed-loop agriculture systems: models and controls; food waste; nutrient systems; growing media; heating; and energy. Furthermore, we identify four areas that urgently require further work: the establishment of better connections between academic research and industry applications; clearer criteria surrounding growing media selection; critical operational requirements of a closed-loop system; and the functionality and synergy between the many modules that comprise a closed-loop greenhouse system. / Thesis / Master of Applied Science (MASc) closed-loop agriculture greenhouse food systems topic modeling meta-research
24	Topic modeling: a novel approach to drug repositioning using metadata Bogard, Britney A. January 2014 (has links) No description available. Computer Science drug repositioning topic modeling drug-disease similarity
25	Examining the Educational Depth of Medical Case Reports and Radiology with Text Mining Collinsworth, Amy L. 12 1900 (has links) The purpose of this dissertation was to use text mining and topic modeling to explore unobserved themes in medical case reports that involve medical imaging. Case reports have a valuable place in medical research because they provide educational benefits, offer evidence, and encourage discussion. Their form has evolved over the years, but they have remained a staple for providing important information, educational context, and illuminating visuals to medical communities around the world. Examining medical case reports published over many years on multiple medical subjects can be challenging; therefore, text mining and topic modeling methods were used to analyze a large set of abstracts from medical case reports involving radiology. A total of 68,845 abstracts, published between 1975 and 2022, were used for the data analysis. The findings indicate that text mining and topic modeling can offer a unique and reproducible approach to examining a large quantity of abstracts for theme analysis. Case reports Medical imaging Radiology Text mining Topic modeling
26	Novel Algorithms for Understanding Online Reviews Shi, Tian 14 September 2021 (has links) This dissertation focuses on the review understanding problem, which has gained attention from both industry and academia, and has found applications in many downstream tasks, such as recommendation, information retrieval and review summarization. In this dissertation, we aim to develop machine learning and natural language processing tools to understand and learn structured knowledge from unstructured reviews, which can be investigated in three research directions, including understanding review corpora, understanding review documents, and understanding review segments. For the corpus-level review understanding, we have focused on discovering knowledge from corpora that consist of short texts. Since they have limited contextual information, automatically learning topics from them remains a challenging problem. We propose a semantics-assisted non-negative matrix factorization model to deal with this problem. It effectively incorporates the word-context semantic correlations into the model, where the semantic relationships between the words and their contexts are learned from the skip-gram view of a corpus. We conduct extensive sets of experiments on several short text corpora to demonstrate the proposed model can discover meaningful and coherent topics. For document-level review understanding, we have focused on building interpretable and reliable models for the document-level multi-aspect sentiment analysis (DMSA) task, which can help us to not only recover missing aspect-level ratings and analyze sentiment of customers, but also detect aspect and opinion terms from reviews. We conduct three studies in this research direction. In the first study, we collect a new DMSA dataset in the healthcare domain and systematically investigate reviews in this dataset, including a comprehensive statistical analysis and topic modeling to discover aspects. 
We also propose a multi-task learning framework with self-attention networks to predict sentiment and ratings for given aspects. In the second study, we propose corpus-level and concept-based explanation methods to interpret attention-based deep learning models for text classification, including sentiment classification. The proposed corpus-level explanation approach aims to capture causal relationships between keywords and model predictions via learning importance of keywords for predicted labels across a training corpus based on attention weights. We also propose a concept-based explanation method that can automatically learn higher level concepts and their importance to model predictions. We apply these methods to the classification task and show that they are powerful in extracting semantically meaningful keywords and concepts, and explaining model predictions. In the third study, we propose an interpretable and uncertainty aware multi-task learning framework for DMSA, which can achieve competitive performance while also being able to interpret the predictions made. Based on the corpus-level explanation method, we propose an attention-driven keywords ranking method, which can automatically discover aspect terms and aspect-level opinion terms from a review corpus using the attention weights. In addition, we propose a lecture-audience strategy to estimate model uncertainty in the context of multi-task learning. For the segment-level review understanding, we have focused on the unsupervised aspect detection task, which aims to automatically extract interpretable aspects and identify aspect-specific segments from online reviews. The existing deep learning-based topic models suffer from several problems such as extracting noisy aspects and poorly mapping aspects discovered by models to the aspects of interest. 
To deal with these problems, we propose a self-supervised contrastive learning framework in order to learn better representations for aspects and review segments. We also introduce a high-resolution selective mapping method to efficiently assign aspects discovered by the model to the aspects of interest. In addition, we propose using a knowledge distillation technique to further improve the aspect detection performance. / Doctor of Philosophy / Nowadays, online reviews are playing an important role in our daily lives. They are also critical to the success of many e-commerce and local businesses because they can help people build trust in brands and businesses, provide insights into products and services, and improve consumers' confidence. As a large number of reviews accumulate every day, a central research problem is to build an artificial intelligence system that can understand and interact with these reviews, and further use them to offer customers better support and services. In order to tackle challenges in these applications, we first have to get an in-depth understanding of online reviews. In this dissertation, we focus on the review understanding problem and develop machine learning and natural language processing tools to understand reviews and learn structured knowledge from unstructured reviews. We have addressed the review understanding problem in three directions, including understanding a collection of reviews, understanding a single review, and understanding a piece of a review segment. In the first direction, we proposed a short-text topic modeling method to extract topics from review corpora that consist of primary complaints of consumers. In the second direction, we focused on building sentiment analysis models to predict the opinions of consumers from their reviews. Our deep learning models can provide good prediction accuracy as well as a human-understandable explanation for the prediction. 
In the third direction, we develop an aspect detection method to automatically extract sentences that mention certain features consumers are interested in, from reviews, which can help customers efficiently navigate through reviews and help businesses identify the advantages and disadvantages of their products. Topic Modeling Sentiment Analysis Aspect Detection Model Interpretation
27	Analyzing and Navigating Electronic Theses and Dissertations Ahuja, Aman 21 July 2023 (has links) Electronic Theses and Dissertations (ETDs) contain valuable scholarly information that can be of immense value to the scholarly community. Millions of ETDs are now publicly available online, often through one of many digital libraries. However, since a majority of these digital libraries are institutional repositories with the objective being content archiving, they often lack end-user services needed to make this valuable data useful for the scholarly community. To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing, document recommendation, as well as services to make navigation of long PDF documents easier. In recent years, with advances in the field of machine learning for text data, several techniques have been proposed to support such end-user services. However, limited research has been conducted towards integrating such techniques with digital libraries. This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs, as well as to support end-user services for digital libraries, such as document browsing and long document navigation. First, we review several machine learning models that can be used to support such services. Next, to support a comprehensive evaluation of different models, as well as to train models that are tailored to the ETD data, we introduce several new datasets from the ETD domain. To minimize the resources required to develop high quality training datasets required for supervised training, a novel AI-aided annotation method is also discussed. Finally, we propose techniques and frameworks to support the various digital library services such as search, browsing, and recommendation. 
The key contributions of this research are as follows:
- A system to help with parsing long scholarly documents such as ETDs by means of object-detection methods trained to extract digital objects from long documents. The parsed documents can be used for further downstream tasks such as long document navigation, figure and/or table search, etc.
- Datasets to support supervised training of object detection models on scholarly documents of multiple types, such as born-digital and scanned. In addition to manually annotated datasets, a framework (along with the resulting dataset) for AI-aided annotation is also proposed.
- A web-based system for information extraction from long PDF theses and dissertations, into a structured format such as XML, aimed at making scholarly literature more accessible to users with disabilities.
- A topic-modeling based framework to support exploration tasks such as searching and/or browsing documents (and document portions, e.g., chapters) by topic, document recommendation, topic recommendation, and describing temporal topic trends.
/ Doctor of Philosophy / Electronic Theses and Dissertations (ETDs) contain valuable scholarly information that can be of immense value to the research community. Millions of ETDs are now publicly available online, often through one of many online digital libraries. However, since a majority of these digital libraries are institutional repositories with the objective being content archiving, they often lack end-user services needed to make this valuable data useful for the scholarly community. To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing, document recommendation, as well as services to make navigation of long PDF documents easier and more accessible. 
Several advances in the field of machine learning for text data in recent years have led to the development of techniques that can serve as the backbone of such end-user services. However, limited research has been conducted towards integrating such techniques with digital libraries. This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs, by parsing the information contained in the long PDF documents that make up ETDs, into a more compute-friendly format. This would enable researchers and developers to build end-user services for digital libraries. We also propose a framework to support document browsing and long document navigation, which are some of the important end-user services required in digital libraries. Topic Modeling Object Detection
28	TimeLink: Visualizing Diachronic Word Embeddings and Topics Williams, Lemara Faith 11 June 2024 (has links) The task of analyzing a collection of documents generated over time is daunting. A natural way to ease the task is by summarizing documents into the topics that exist within these documents. The temporal aspect of topics can frame relevance based on when topics are introduced and when topics stop being mentioned. It creates trends and patterns that can be traced by individual key terms taken from the corpus. If trends are being established, there must be a way to visualize them through the key terms. Creating a visual system to support this analysis can help users quickly gain insights from the data, significantly easing the burden of the original analysis technique. However, creating a visual system for terms is not easy. Work has been done to develop word embeddings, allowing researchers to treat words as numeric vectors. This makes it possible to create simple charts based on word embeddings, like scatter plots. However, these methods are inefficient due to loss of effectiveness with multiple time slices and point overlap. A visualization method that addresses these problems while also visualizing diachronic word embeddings in an interesting way with added semantic meaning is hard to find. TimeLink addresses these problems. It is proposed as a dashboard system to help users gain insights from the movement of diachronic word embeddings. It comprises a Sankey diagram showing the path of a selected key term to a cluster in a time period. This local cluster is also mapped to a global topic based on an original corpus of documents from which the key terms are drawn. On the dashboard, different tools are given to users to aid in a focused analysis, such as filtering key terms and emphasizing specific clusters. 
TimeLink provides insightful visualizations focused on temporal word embeddings while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time. / Master of Science / The task of analyzing documents collected over time is daunting. Grouping documents into topics can help frame relevance based on when topics are introduced and when they fade. Creating topics also makes it possible to visualize trends and patterns. Creating a visual system to support this analysis can help users quickly gain insights from the data, significantly easing the burden of the original analysis technique of browsing individual documents. A visualization system for this analysis typically focuses on the terms that affect established topics. Some visualization methods, like scatter plots, implement this but can be inefficient due to loss of effectiveness as more data is introduced. TimeLink is proposed as a dashboard system to aid users in drawing insights from the development of terms over time. In addition to addressing problems in other visualizations, it visualizes the movement of terms intuitively and adds semantic meaning. TimeLink provides insightful visualizations focused on the movement of terms while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time. High Dimensional Visualizations Clustering Diachronic Word Embeddings Topic Modeling
29	Growing local food: charting meaning emergence through the dynamics of discourse, rhetoric and framing Karmali, Shazia 28 August 2020 (has links) This dissertation seeks to understand how new meanings emerge in the context of institutional change. Existing research seeking to understand shifts in meaning has primarily accessed meaning, across numerous contexts, via the three key constructs of discourse, rhetoric, or framing. Within the context of the emergence of the local food movement in Canada, I employ a mixed methods approach using term frequencies, topic modelling and qualitative content analysis, within a computational grounded theory framework for Big Data analysis. My data consists of all articles containing any mention of the term “local food” in popular Canadian press over 37 years from 1978-2014, a database totalling 31,421 articles. My results show that firstly, new meanings pertaining to local food emerged rapidly over the 37-year period. The emergence of a new meaning for local food, associated with the politicization of food production occurred in the second half of my dataset, whereas the first half was marked by connotations of poverty and hunger, associated with the local food bank. Secondly, unexpected actors were found to significantly impact the propulsion of meaning change, by establishing new vocabularies surrounding the term “local food”. Finally, this dissertation shows that the new meanings associated with local food emerged as a result of discursive opportunities, momentarily arising through the confluence of discourse, rhetoric and framing. I propose an emergent process model of meaning change and, further, propose that discursive opportunity structures can be better understood through the metaphor of an emergent property. / Graduate / 2022-08-01 Discourse Rhetoric Framing Dynamic Topic Modeling Topic Modeling Big data Discursive Opportunity Structure Institutional Change Meaning Institutional Theory
30	Pretopology and Topic Modeling for Complex Systems Analysis : Application on Document Classification and Complex Network Analysis / Prétopologie et modélisation de sujets pour l'analyse de systèmes complexes : application à la classification de documents et à l'analyse de réseaux complexes Bui, Quang Vu 27 September 2018 (has links) Les travaux de cette thèse présentent le développement d'algorithmes de classification de documents d'une part, ou d'analyse de réseaux complexes d'autre part, en s'appuyant sur la prétopologie, une théorie qui modélise le concept de proximité. Le premier travail développe un cadre pour la classification de documents en combinant une approche de topicmodeling et la prétopologie. Notre contribution propose d'utiliser des distributions de sujets extraites à partir d'un traitement topic-modeling comme entrées pour des méthodes de classification. Dans cette approche, nous avons étudié deux aspects : déterminer une distance adaptée entre documents en étudiant la pertinence des mesures probabilistes et des mesures vectorielles, et effet réaliser des regroupements selon plusieurs critères en utilisant une pseudo-distance définie à partir de la prétopologie. Le deuxième travail introduit un cadre général de modélisation des Réseaux Complexes en développant une reformulation de la prétopologie stochastique, il propose également un modèle prétopologique de cascade d'informations comme modèle général de diffusion. De plus, nous avons proposé un modèle agent, Textual-ABM, pour analyser des réseaux complexes dynamiques associés à des informations textuelles en utilisant un modèle auteur-sujet et nous avons introduit le Textual-Homo-IC, un modèle de cascade indépendant de la ressemblance, dans lequel l'homophilie est fondée sur du contenu textuel obtenu par un topic-model. 
/ The work of this thesis presents the development of algorithms for document classification on the one hand and complex network analysis on the other, based on pretopology, a theory that models the concept of proximity. The first work develops a framework for document clustering by combining Topic Modeling and Pretopology. Our contribution proposes using topic distributions extracted by topic modeling as input for classification methods. In this approach, we investigated two aspects: determining an appropriate distance between documents by studying the relevance of probabilistic and vector-based measures, and performing groupings according to several criteria using a pseudo-distance defined from pretopology. The second work introduces a general framework for modeling Complex Networks by developing a reformulation of stochastic pretopology and proposes the Pretopology Cascade Model as a general model for information diffusion. In addition, we proposed an agent-based model, Textual-ABM, to analyze complex dynamic networks associated with textual information using an author-topic model, and introduced Textual-Homo-IC, a resemblance-based independent cascade model in which homophily is measured based on textual content obtained through Topic Modeling. Prétopologie Topic Modeling Allocation de Dirichlet latente Clustering de documents Réseaux complexes Diffusion de l'information Pretopology Topic Modeling Latent Dirichlet Allocation Document Clustering Complex Networks Information diffusion
