Global ETD Search

11	AUTOMATED EVALUATION OF NEUROLOGICAL DISORDERS THROUGH ELECTRONIC HEALTH RECORD ANALYSIS Md Rakibul Islam Prince (18771646) 03 September 2024 (has links) <p dir="ltr">Neurological disorders present a considerable challenge due to their variety and diagnostic complexity, especially for older adults. Early prediction of the onset and ongoing assessment of the severity of these disease conditions can allow timely interventions. Currently, most of the assessment tools are time-consuming, costly, and not suitable for use in primary care. To reduce this burden, the present thesis introduces passive digital markers (PDMs) for different disease conditions that can effectively automate the severity assessment and risk prediction from different modalities of electronic health records (EHR). The focus of the first phase of the present study is on developing passive digital markers for the functional assessment of patients suffering from bipolar disorder and schizophrenia. The second phase of the study explores different architectures for passive digital markers that can predict patients at risk for dementia. The functional severity PDM uses only a single EHR modality, namely medical notes, in order to assess the severity of the functioning of schizophrenia, bipolar type I, or mixed bipolar patients. In this case, the input is a single medical note from the electronic medical record of the patient. This note is submitted to a hierarchical BERT model which classifies at-risk patients. A hierarchical attention mechanism is adopted because medical notes can exceed the maximum number of tokens allowed by most language models, including BERT. The functional severity PDM follows three steps. First, a sentence-level embedding is produced for each sentence in the note using a token-level attention mechanism. Second, an embedding for the entire note is constructed using a sentence-level attention mechanism. 
Third, the final embedding is classified using a feed-forward neural network which estimates the impairment level of the patient. When used prior to the onset of the disease, this PDM is able to differentiate between severe and moderate functioning levels with an AUC of 76%. Disease-specific severity assessment PDMs are only applicable after the onset of the disease and have AUCs of nearly 85% for schizophrenia and bipolar patients. The dementia risk prediction PDM considers multiple EHR modalities including socio-demographic data, diagnosis codes and medical notes. Moreover, the observation period and prediction horizon are varied for a better understanding of the practical limitations of the model. This PDM is able to identify patients at risk of dementia with AUCs ranging from 70% to 92% as the observation period approaches the index date. The present study introduces methodologies for the automation of important clinical outcomes such as the assessment of the general functioning of psychiatric patients and the prediction of risk for dementia using only routine care data.</p> Natural language processing Deep learning Neural networks Semi- and unsupervised learning language model integration Large Language Models (LLMs) machine learning and AI Dementia -- Prevention Schizophrenia Patients Schizophrenia bipolar disorder patients Psychiatric patient BERT models Llama-2
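The three steps described in the abstract above (token-level attention, sentence-level attention, then classification) can be illustrated with a minimal sketch. This is not the thesis's model: the hand-written token vectors stand in for BERT outputs, the query vectors and the logistic layer are hypothetical placeholders for learned parameters, and the function names `attention_pool`, `embed_note`, and `classify` are invented for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, query):
    """Weighted average of vectors; weights are softmax(dot(vector, query))."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def embed_note(note, token_query, sentence_query):
    # Step 1: token-level attention gives one embedding per sentence.
    sentence_embeddings = [attention_pool(sent, token_query)[0] for sent in note]
    # Step 2: sentence-level attention gives one embedding for the whole note.
    return attention_pool(sentence_embeddings, sentence_query)


def classify(note_embedding, weights, bias):
    # Step 3: a single logistic unit stands in for the feed-forward classifier.
    z = dot(note_embedding, weights) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy note: two sentences of 2-dimensional token vectors (assumed values).
note = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]],
]
note_embedding, sentence_weights = embed_note(note, [1.0, 0.0], [0.0, 1.0])
probability = classify(note_embedding, [0.3, -0.2], 0.1)
```

Because pooling happens per sentence first, no single attention pass has to span the whole note, which is the property the abstract cites as the reason for a hierarchical design.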
12	Multimodal Data Management in Open-world Environment K M A Solaiman (16678431) 02 August 2023 (has links) <p>The availability of abundant multimodal data, including textual, visual, and sensor-based information, holds the potential to improve decision-making in diverse domains. Extracting data-driven decision-making information from heterogeneous and changing datasets in real-world data-centric applications requires achieving complementary functionalities of multimodal data integration, knowledge extraction and mining, situationally-aware data recommendation to different users, and uncertainty management in the open-world setting. To achieve a system that encompasses all of these functionalities, several challenges need to be effectively addressed: (1) How to represent and analyze heterogeneous source contents and application context for multimodal data recommendation? (2) How to predict and fulfill current and future needs as new information streams in without user intervention? (3) How to integrate disconnected data sources and learn relevant information to specific mission needs? (4) How to scale from processing petabytes of data to exabytes? (5) How to deal with uncertainties in open-world that stem from changes in data sources and user requirements?</p> <p><br></p> <p>This dissertation tackles these challenges by proposing novel frameworks, learning-based data integration and retrieval models, and algorithms to empower decision-makers to extract valuable insights from diverse multimodal data sources. The contributions of this dissertation can be summarized as follows: (1) We developed SKOD, a novel multimodal knowledge querying framework that overcomes the data representation, scalability, and data completeness issues while utilizing streaming brokers and RDBMS capabilities with entity-centric semantic features as an effective representation of content and context. 
Additionally, as part of the framework, a novel text attribute recognition model called HART was developed, which leveraged language models and syntactic properties of large unstructured texts. (2) In the SKOD framework, we incrementally proposed three different approaches for data integration of the disconnected sources from their semantic features to build a common knowledge base with the user information need: (i) EARS: A mediator approach using schema mapping of the semantic features and SQL joins was proposed to address scalability challenges in data integration; (ii) FemmIR: A data integration approach for more susceptible and flexible applications that utilizes neural network-based graph matching techniques to learn coordinated graph representations of the data. It introduces a novel graph creation approach from the features and a novel similarity metric among data sources; (iii) WeSJem: This approach allows zero-shot similarity matching and data discovery by using contrastive learning to embed data samples and query examples in a high-dimensional space using features as a novel source of supervision instead of relevance labels. (3) Finally, to manage uncertainties in multimodal data management for open-world environments, we characterized novelties in multimodal information retrieval based on data drift. Moreover, we proposed a novelty detection and adaptation technique as an augmentation to WeSJem.</p> <p>The effectiveness of the proposed frameworks, models, and algorithms was demonstrated through real-world system prototypes that solved open problems requiring large-scale human endeavors and computational resources. 
Specifically, these prototypes assisted law enforcement officers in automating investigations and finding missing persons.</p> Knowledge representation and reasoning Natural language processing Data mining and knowledge discovery Information extraction and fusion Recommender systems Collaborative and social computing Knowledge and information management Context learning Semi- and unsupervised learning Multimodal Information Retrieval Data Integration Text Attribute Extraction Missing Persons Situational Knowledge Extraction Representation Learning
13	LEVERAGING MACHINE LEARNING FOR ENHANCED SATELLITE TRACKING TO BOLSTER SPACE DOMAIN AWARENESS Charles William Grey (16413678) 23 June 2023 (has links) <p>Our modern society is more dependent on its assets in space than ever. For example, the Global Positioning System (GPS) many rely on for navigation uses data from a 24-satellite constellation. Additionally, our current infrastructure for gas pumps, cell phones, ATMs, traffic lights, weather data, etc. all depends on satellite data from various constellations. As a result, it is increasingly necessary to accurately track and predict the space domain. In this thesis, after discussing how space object tracking and object position prediction are currently done, I propose a machine learning-based approach to improving space object position prediction over the standard SGP4 method, whose predictions remain accurate for only about 24 hours. Using this approach, we are able to show that meaningful improvements over the standard SGP4 model can be achieved using a machine learning model based on a type of recurrent neural network called a long short-term memory (LSTM) model. I also provide distance predictions for 4 different space objects over time frames of 15 and 30 days. Future work in this area is likely to include extending and validating this approach on additional satellites to construct a more general model, testing a wider range of models to determine limits on accuracy across a broad range of time horizons, and proposing similar methods less dependent on antiquated data formats like the TLE.</p> Neural networks Semi- and unsupervised learning LSTM TLE LSTM RNN Long Short Term Memory (LSTM) Network Machine Learning Satellite Tracking SGP4 Space Object Tracking low earth orbit (LEO)
14	PROGRAM ANOMALY DETECTION FOR INTERNET OF THINGS Akash Agarwal (13114362) 01 September 2022 (has links) <p>Program anomaly detection, which models normal program executions to detect deviations at runtime as cues for possible exploits, has become a popular approach for software security. However, to leverage high-performance modeling and complete tracing, existing techniques focus on subsets of applications, e.g., on system calls or calls to predefined libraries. Due to this limited scope, they are insufficient to detect subtle control-oriented and data-oriented attacks that introduce new illegal call relationships at the application level. Also, such techniques are hard to apply on devices that lack a clear separation between the OS and the application layer. This dissertation advances the design and implementation of program anomaly detection techniques by providing application context for library and system calls, making them powerful for detecting advanced attacks targeted at manipulating intra- and inter-procedural control-flow and decision variables. </p> <p><br></p> <p>This dissertation has two main parts. The first part describes LANCET, a statically initialized generic calling context program anomaly detection technique based on Hidden Markov Modeling that provides security against control-oriented attacks at program runtime. It also establishes an efficient execution tracing mechanism facilitated through source code instrumentation of applications. The second part describes EDISON, a program anomaly detection framework that provides security against data-oriented attacks using graph representation learning and language models for intra- and inter-procedural behavioral modeling, respectively.</p> <p>This dissertation makes three high-level contributions. 
First, the concise descriptions demonstrate the design, implementation, and extensive evaluation of an aggregation-based anomaly detection technique using fine-grained generic calling context-sensitive modeling that allows for scaling the detection over entire applications. Second, the precise descriptions show the design, implementation, and extensive evaluation of a detection technique that maps runtime traces to the program’s control-flow graph and leverages graphical feature representation to learn dynamic program behavior. Finally, this dissertation provides details and experience for designing program anomaly detection frameworks, from high-level concepts and design to low-level implementation techniques.</p> Software and application security System and network security Deep learning Neural networks Semi- and unsupervised learning Application Security Software security Anomaly detection (Computer security) Software instrumentation Cyber-physical systems (CPS) Deep Learning Applications IoT Security data-oriented attacks control-oriented attacks Memory Corruption Hidden Markov model, HMM
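The abstract above does not spell out how LANCET scores a trace, so the following is only a generic sketch of Hidden Markov Model anomaly scoring: a trace of call events is scored with the scaled forward algorithm and flagged when its average log-likelihood per event falls below a threshold. The two-state model, the call names, every probability, and the threshold are made-up assumptions for illustration.

```python
import math


def forward_log_likelihood(observations, start, trans, emit):
    """Scaled forward algorithm; returns log P(observations | model)."""
    n_states = len(start)
    alpha = [start[s] * emit[s].get(observations[0], 0.0) for s in range(n_states)]
    log_likelihood = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s].get(observations[t], 0.0)
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model assigns zero probability to this trace
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_likelihood


def per_event_score(trace, model):
    return forward_log_likelihood(trace, *model) / len(trace)


def is_anomalous(trace, model, threshold):
    return per_event_score(trace, model) < threshold


# Hypothetical model of "normal" behaviour (all numbers are assumptions).
START = [0.9, 0.1]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [
    {"open": 0.6, "read": 0.3, "close": 0.05, "exec": 0.05},
    {"open": 0.05, "read": 0.5, "close": 0.4, "exec": 0.05},
]
MODEL = (START, TRANS, EMIT)
THRESHOLD = -1.5  # assumed cut-off on average log-likelihood per event

normal_trace = ["open", "read", "read", "close"]
suspicious_trace = ["open", "exec", "close"]
normal_score = per_event_score(normal_trace, MODEL)
suspicious_score = per_event_score(suspicious_trace, MODEL)
```

Dividing by trace length keeps scores comparable across traces of different lengths; in practice the threshold would be chosen from held-out normal traces rather than fixed by hand.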
15	VISUAL ANALYTICS OF BIG DATA FROM MOLECULAR DYNAMICS SIMULATION Catherine Jenifer Rajam Rajendran (5931113) 03 February 2023 (has links) <p>Protein malfunction can cause human diseases, which makes the protein a target in the process of drug discovery. In-depth knowledge of how proteins function can contribute widely to the understanding of the mechanisms of these diseases. Protein functions are determined by protein structures and their dynamic properties. Protein dynamics refers to the constant physical movement of atoms in a protein, which may result in the transition between different conformational states of the protein. These conformational transitions are critically important for proteins to function. Understanding protein dynamics can help us understand and interfere with the conformational states and transitions, and thus with the function of the protein. If we can understand the mechanism of the conformational transition of a protein, we can design molecules to regulate this process and thereby regulate protein function for new drug discovery. Protein dynamics can be simulated by Molecular Dynamics (MD) simulations.</p> <p>The MD simulation data generated are spatial-temporal and therefore very high dimensional. To analyze the data, distinguishing various atomic interactions within a protein by interpreting their 3D coordinate values plays a significant role. Since the data set is very large, the essential step is to find ways to interpret the data by developing more efficient algorithms to reduce the dimensionality and user-friendly visualization tools to find patterns and trends that are not usually attainable by traditional methods of data processing. Given the typically allosteric, long-range nature of the interactions that lead to large conformational transitions, pinpointing the underlying forces and pathways responsible for the global conformational transition at the atomic level is very challenging. 
To address these problems, we developed a new program called Probing Long-distance interactions by Tapping into Paired-Distances (PLITIP), which applies various analytical techniques to the simulation data to better understand the mechanism of protein dynamics at the atomic level. PLITIP contains a set of new tools based on the analysis of paired distances, which removes the interference of the translation and rotation of the protein itself and can therefore capture the absolute changes within the protein.</p> <p>Firstly, we developed a tool called Decomposition of Paired Distances (DPD). This tool generates a distance matrix of all paired residues from our simulation data. This paired distance matrix is therefore not subject to the interference of the translation or rotation of the protein and can capture the absolute changes within the protein. This matrix is then decomposed by DPD using Principal Component Analysis (PCA) to reduce dimensionality and to capture the largest structural variation. To showcase how DPD works, we applied it to two protein systems, HIV-1 protease and 14-3-3σ, both of which undergo tremendous structural changes and conformational transitions, as displayed by their MD simulation trajectories. The largest structural variation and conformational transition were captured by the first principal component in both cases. In addition, structural clustering and ranking of representative frames by their PC1 values revealed the long-distance nature of the conformational transition and identified the key candidate regions that might be responsible for the large conformational transitions.</p> <p>Secondly, to facilitate further analysis and identification of the long-distance path, a tool called Pearson Coefficient Spiral (PCP) was developed; it generates and visualizes the Pearson coefficient to measure the linear correlation between any two sets of residue pairs. 
PCP allows users to fix one residue pair and examine the correlation of its change with other residue pairs.</p> <p>Thirdly, a set of visualization tools that generate paired atomic distances for the shortlisted candidate residues and capture significant interactions among them was developed. The first tool is the Residue Interaction Network Graph for Paired Atomic Distances (NG-PAD), which not only generates paired atomic distances for the shortlisted candidate residues, but also displays significant interactions in a network graph for convenient visualization. Second, the Chord Diagram for Interaction Mapping (CD-IP) was developed to map the interactions to protein secondary structural elements and to further narrow down important interactions. Third, Distance Plotting for Direct Comparison (DP-DC) plots any two paired distances of the user’s choice, at either the residue or the atomic level, to facilitate identification of similar or opposite patterns of distance change along the simulation time. Together, the PLITIP tools above enabled us to identify critical residues contributing to the large conformational transitions in both HIV-1 protease and 14-3-3σ proteins.</p> <p>Besides the above major project, a side project of developing tools to study protein pseudo-symmetry is also reported. It has been proposed that symmetry provides protein stability, opportunities for allosteric regulation, and even functionality. This tool helps us answer the questions of why there is a deviation from perfect symmetry in proteins and how to quantify it.</p> Applications in life sciences Spatial data and applications Semi- and unsupervised learning Visual Analytics Data Visualization Principal Component Analysis Parallel Computing Pearson Coefficient Correlation Protein Structure Analysis Molecular Dynamics Simulation Study Paired-Distance Spatial-Temporal Data Pseudo-Symmetry
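Two ideas from the abstract above lend themselves to a small worked sketch: paired distances are unchanged when the whole structure is rotated or translated, and the Pearson coefficient measures how the distances of two pairs co-vary over a trajectory. The code below is not PLITIP; the function names, the three-point "structure", and the toy trajectory are assumptions made for illustration, and the PCA step of DPD is omitted.

```python
import math


def paired_distances(coords):
    """All pairwise Euclidean distances (i < j) for one frame of 3D points."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


def rigid_transform(coords, angle, shift):
    """Rotate about the z axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# One frame: distances are the same before and after a rigid-body motion.
frame = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5), (3.0, -1.0, 2.0)]
moved = rigid_transform(frame, angle=0.8, shift=(5.0, -2.0, 1.0))
max_difference = max(
    abs(a - b) for a, b in zip(paired_distances(frame), paired_distances(moved))
)

# Toy trajectory: pair (0, 1) moves apart while pair (0, 2) moves together.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0 + t, 0.0, 0.0), (0.0, 4.0 - t, 0.0)] for t in range(4)
]
per_frame = [paired_distances(f) for f in trajectory]
d01 = [row[0] for row in per_frame]  # distance between points 0 and 1
d02 = [row[1] for row in per_frame]  # distance between points 0 and 2
correlation = pearson(d01, d02)
```

In this toy case the two distances change in exactly opposite directions, so the coefficient sits at the negative extreme; fixing one pair and scanning all others in this way mirrors the usage the abstract describes for PCP.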
16	Dynamics of Forest Ecosystems Under Global Change: Applications of Artificial Intelligence in Mapping, Classification, and Projection Akane Ota Abbasi (17123185) 10 October 2023 (has links) <p dir="ltr">Global forest ecosystems provide essential ecosystem services that contribute to water and climate regulation, food production, recreation, and raw materials. They also serve as crucial habitats for numerous terrestrial species of amphibians, birds, and mammals worldwide. However, recent decades have witnessed unprecedented changes in forest ecosystems due to climate change, shifts in species distribution patterns, increased planted forest areas, and various disturbances such as forest fires, insect infestations, and urbanization. These changes can have far-reaching impacts on ecological networks, human well-being, and the well-being of global forest ecosystems. To address these challenges, I present four studies to quantify forest dynamics through mapping, classification, and projection, using artificial intelligence tools in combination with a vast amount of training data. (I) I present a spatially continuous map of planted forest distribution across East Asia, produced by integrating multiple sources of planted and natural forest data. I found that China contributed 87% of the total planted forest areas in East Asia, most of which are located in the lowland tropical/subtropical regions and Sichuan Basin. I also estimated the dominant genus in each planted forest location. (II) I used continent-wide forest inventory data to compare the range shifts of forest types and their constituent tree species in North America in the past 50 years. I found that forest types shifted more than three times as fast as the average of their constituent tree species. This marked difference was attributable to a predominant positive covariance between tree species ranges and the change of species relative abundance. 
(III) Based on individual-level field surveys of trees and breeding birds across North America, I characterized New World wood-warbler (<i>Parulidae</i>) species richness and its potential drivers. I identified forest type as the most powerful predictor of New World wood-warbler species richness, which adds valuable evidence to the ongoing physiognomy versus composition debate among ornithologists. (IV) In the appendix, I utilized continent-wide forest inventory data from North America and South America and the combination of supervised and unsupervised machine learning algorithms to produce the first data-driven map of forest types in the Americas. I revealed the distribution of forest types, which are useful for cost-effective forest and biodiversity management and planning. Taken together, these studies provide insight into the dynamics of forest ecosystems at a large geographic scale and have implications for effective decision-making in conservation, management, and global restoration programs in the midst of ongoing global change.</p> Forest biodiversity Forest ecosystems Modelling and simulation Deep learning Neural networks Semi- and unsupervised learning forest dynamics modeling Global Change Climate Change Machine Learning Biodiversity Forest Ecology forest ecosystem modeling Planted Forests Forest type classification Species Distribution Deep Learning Forest Inventory & Analysis Program Forest Inventory Parulidae Species Richness Habitat physiognomy Habitat Heterogeneity Habitat Composition Markowitz portfolio selection
17	EXPLORING GRAPH NEURAL NETWORKS FOR CLUSTERING AND CLASSIFICATION Fattah Muhammad Tahabi (14160375) 03 February 2023 (has links) <p><strong>Graph Neural Networks</strong> (GNNs) have become exceedingly popular and prominent deep learning techniques for analyzing structural graph data because of their ability to solve complex real-world problems. Because graphs provide an efficient approach to contriving abstract hypothetical concepts, modern research overcomes the limitations of classical graph theory, which requires prior knowledge of the graph structure before employing traditional algorithms. GNNs, an impressive framework for representation learning of graphs, have already produced many state-of-the-art techniques to solve node classification, link prediction, and graph classification tasks. GNNs can learn meaningful representations of graphs incorporating topological structure, node attributes, and neighborhood aggregation to solve supervised, semi-supervised, and unsupervised graph-based problems. In this study, the usefulness of GNNs has been analyzed primarily from two aspects - <strong>clustering and classification</strong>. 
We focus on these two techniques, as they are the most popular strategies in data mining to discern collected data and employ predictive analysis.</p> Biomechanical engineering Neural engineering Health promotion Preventative health care Applications in health Spatial data and applications Evolutionary computation Natural language processing Planning and decision making Data engineering and data science Data mining and knowledge discovery Graph, social and multimedia data Information retrieval and web search Knowledge and information management Context learning Deep learning Neural networks Semi- and unsupervised learning Data structures and algorithms Graph neural network Node classification Graph clustering Temporal graphs dynamic graphs NODE2VEC Graph Attention Mechanism Hunting BiLSTM model EHR data colorectal Cancer Cancers Cancer symptoms symptom Symptom cluster studies Coauthorship networks network analysis Word2vec Hierarchical Clustering method Dunn index semantic analysis text mining Natural Language Processing Tool UMLS identifiers umls Clinical Data Management
