Global ETD Search

1	New Procedures for Data Mining and Measurement Error Models with Medical Imaging Applications Wang, Xiaofeng 15 July 2005 No description available. Statistics Spatial-temporal data Medical imaging Registration Smoothing Measurement error models Deconvolution Semiparametrics
2	The Evolution of Urban-Rural Space Olson, Jeffrey L. January 2013 No description available. Geography
3	Semiparametric Varying Coefficient Models for Matched Case-Crossover Studies Ortega Villa, Ana Maria 23 November 2015 Semiparametric modeling combines parametric and nonparametric models: some functions follow a known form and others follow an unknown form. In this dissertation we made contributions to semiparametric modeling for matched case-crossover data. In matched case-crossover studies, it is generally accepted that the covariates on which a case and associated controls are matched cannot exert a confounding effect on independent predictors included in the conditional logistic regression model. Any stratum effect is removed by the conditioning on the fixed number of sets of the case and controls in the stratum. However, some matching covariates, such as time and/or spatial location, often play an important role as effect modifiers. Failing to include them leads to incorrect statistical estimation, prediction and inference. Hence in this dissertation, we propose several approaches that allow the inclusion of time and spatial location as well as other effect modifications such as heterogeneous subpopulations among the data. To address modification due to time, three methods are developed: the first is a parametric approach, the second is a semiparametric penalized approach and the third is a semiparametric Bayesian approach. We demonstrate the advantage of the one-stage semiparametric approaches using both a simulation study and an epidemiological example of a 1-4 bi-directional case-crossover study of childhood aseptic meningitis with drinking water turbidity. To address modifications due to time and spatial location, two methods are developed: the first one is a semiparametric spatial-temporal varying coefficient model for a small number of locations. 
The second method is a semiparametric spatial-temporal varying coefficient model, and is appropriate when the number of locations among the subjects is medium to large. We demonstrate the accuracy of these approaches by using simulation studies and, when appropriate, an epidemiological example of a 1-4 bi-directional case-crossover study. Finally, to explore further effect modifications by heterogeneous subpopulations among strata, we propose a nonparametric Bayesian approach constructed with Dirichlet process priors, which clusters subpopulations and assesses heterogeneity. We demonstrate the accuracy of our approach using a simulation study, as well as an example of a 1-4 bi-directional case-crossover study. / Ph. D. Bayesian Nonparametric Conditional logistic regression Matched case-control study Regression splines Spatial-temporal data Varying Coefficient Model
4	Impacts of Climate Change on US Commercial and Residential Building Energy Demand January 2016 Energy consumption in buildings, accounting for 41% of 2010 primary energy consumption in the United States (US), is particularly vulnerable to climate change due to the direct relationship between space heating/cooling and temperature. Past studies have assessed the impact of climate change on long-term mean and/or peak energy demands. However, these studies usually neglected spatial variations in the “balance point” temperature, population distribution effects, air-conditioner (AC) saturation, and the extremes at smaller spatiotemporal scales, making the implications of local-scale vulnerability incomplete. Here I develop empirical relationships between building energy consumption and temperature to explore the impact of climate change on long-term mean and extremes of energy demand, and test the sensitivity of these impacts to various factors. I find increases in summertime electricity demand exceeding 50% and decreases in wintertime non-electric energy demand of more than 40% in some states by the end of the century. The occurrence of the most extreme (appearing once-per-56-years) electricity demand increases more than 2600 fold, while the occurrence of the once per year extreme events increases more than 70 fold by the end of this century. If the changes in population and AC saturation are also accounted for, the impact of climate change on building energy demand will be exacerbated. Using the individual building energy simulation approach, I also estimate the impact of climate change on different building types at over 900 US locations. Large increases in building energy consumption are found in the summer, especially during the daytime (e.g., >100% increase for warehouses, 5-6 pm). 
Large variation of impact is also found within climate zones, suggesting a potential bias when estimating climate-zone scale changes with a small number of representative locations. As a result of climate change, the building energy expenditures increase in some states (as much as $3 billion/year) while in others, costs decline (as much as $1.4 billion/year). Integrated across the contiguous US, these variations result in a net savings of roughly $4.7 billion/year. However, this must be weighed against the cost (exceeding $19 billion) of adding electricity generation capacity in order to maintain the electricity grid’s reliability in summer. / Dissertation/Thesis / Doctoral Dissertation Environmental Social Science 2016 Environmental science Climate change Energy building energy consumption climate change impact climate change mitigation electricity demand extreme weather spatial temporal data analysis
5	Preemptivní bezpečnostní analýza dopravního chování z trajektorií / Preemptive Safety Analysis of Road Users' Behavior from Trajectories Zapletal, Dominik January 2018 This work deals with the problem of preemptive safety analysis of road users' behaviour. The safety analysis is based on processing road users' trajectories extracted from aerial videos captured by drones. A system for detecting traffic conflicts from spatial-temporal data is presented. The standard approach to evaluating proactive traffic conflict indicators is extended by simulating the movement of traffic objects in the scene using Ackermann steering geometry in order to obtain more accurate results.
6	Statistical Inference for Change Points in High-Dimensional Offline and Online Data Li, Lingjun 07 April 2020 No description available. Mathematics Statistics Change point analysis Change-point detection Spatial-temporal data Large p small n High-dimensional data Average run length Expected detection delay
7	Visual Analytics of Big Data from Molecular Dynamics Simulation Rajendran, Catherine Jenifer Rajam 12 1900 Indiana University-Purdue University Indianapolis (IUPUI) / Protein malfunction can cause human diseases, which makes the protein a target in the process of drug discovery. In-depth knowledge of how proteins function can contribute broadly to the understanding of the mechanism of these diseases. Protein functions are determined by protein structures and their dynamic properties. Protein dynamics refers to the constant physical movement of atoms in a protein, which may result in the transition between different conformational states of the protein. These conformational transitions are critically important for the proteins to function. Understanding protein dynamics can help to understand and interfere with the conformational states and transitions, and thus with the function of the protein. If we can understand the mechanism of conformational transition of a protein, we can design molecules to regulate this process and regulate the protein functions for new drug discovery. Protein dynamics can be simulated by Molecular Dynamics (MD) simulations. The MD simulation data generated are spatial-temporal and therefore very high dimensional. To analyze the data, distinguishing various atomic interactions within a protein by interpreting their 3D coordinate values plays a significant role. Since the data are extremely large, the essential step is to find ways to interpret the data by generating more efficient algorithms to reduce the dimensionality and developing user-friendly visualization tools to find patterns and trends, which are not usually attainable by traditional methods of data processing. Given the typically allosteric, long-range nature of the interactions that lead to large conformational transitions, pinpointing the underlying forces and pathways responsible for the global conformational transition at the atomic level is very challenging. 
To address these problems, various analytical techniques are applied to the simulation data to better understand the mechanism of protein dynamics at the atomic level, by developing a new program called Probing Long-distance interactions by Tapping into Paired-Distances (PLITIP). It contains a set of new tools based on the analysis of paired distances, which removes the interference of the translation and rotation of the protein itself and can therefore capture the absolute changes within the protein. Firstly, we developed a tool called Decomposition of Paired Distances (DPD). This tool generates a distance matrix of all paired residues from our simulation data. This paired distance matrix is therefore not subject to the interference of the translation or rotation of the protein and can capture the absolute changes within the protein. This matrix is then decomposed by DPD using Principal Component Analysis (PCA) to reduce dimensionality and to capture the largest structural variation. To showcase how DPD works, we use two protein systems, HIV-1 protease and 14-3-3σ, both of which undergo large structural changes and conformational transitions, as displayed by their MD simulation trajectories. The largest structural variation and conformational transition were captured by the first principal component in both cases. In addition, structural clustering and ranking of representative frames by their PC1 values revealed the long-distance nature of the conformational transition and pinpointed the key candidate regions that might be responsible for the large conformational transitions. Secondly, to facilitate further analysis and identification of the long-distance path, a tool called Pearson Coefficient Spiral (PCP) was developed that generates and visualizes Pearson coefficients to measure the linear correlation between any two sets of residue pairs. PCP allows users to fix one residue pair and examine the correlation of its change with other residue pairs. 
Thirdly, a set of visualization tools was developed that generate paired atomic distances for the shortlisted candidate residues and capture significant interactions among them. The first tool is the Residue Interaction Network Graph for Paired Atomic Distances (NG-PAD), which not only generates paired atomic distances for the shortlisted candidate residues, but also displays significant interactions as a Network Graph for convenient visualization. Second, the Chord Diagram for Interaction Mapping (CD-IP) was developed to map the interactions to protein secondary structural elements and to further narrow down important interactions. Third, Distance Plotting for Direct Comparison (DP-DC) plots any two paired distances of the user’s choice, at either the residue or atomic level, to facilitate identification of similar or opposite patterns of distance change over the simulation time. All the above tools of PLITIP enabled us to identify critical residues contributing to the large conformational transitions in both HIV-1 protease and 14-3-3σ proteins. Besides the above major project, a side project of developing tools to study protein pseudo-symmetry is also reported. It has been proposed that symmetry provides protein stability, opportunities for allosteric regulation, and even functionality. This tool helps us to answer the questions of why there is a deviation from perfect symmetry in proteins and how to quantify it. Visual Analytics Data Visualization Principal Component Analysis Parallel Computing Protein Structure Analysis Molecular Dynamics Simulation Study Spatial-Temporal Data Paired-Distances Pseudo-Symmetry in Proteins
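The Decomposition of Paired Distances (DPD) idea described in the abstract of result 7 can be illustrated with a minimal sketch. This is not the thesis's PLITIP code: the function names, the toy trajectory, and the choice to flatten each frame's residue-pair distances into one row before PCA are all assumptions made for illustration. The sketch only shows why paired distances are unaffected by rigid translation and rotation, and how PCA (here via a plain SVD) yields per-frame scores along the first principal component.

```python
import numpy as np

def paired_distance_features(traj):
    """Map coordinates (n_frames, n_residues, 3) to distances (n_frames, n_pairs)."""
    n_res = traj.shape[1]
    i, j = np.triu_indices(n_res, k=1)        # each unordered residue pair once
    diff = traj[:, i, :] - traj[:, j, :]
    return np.linalg.norm(diff, axis=-1)

def pca(features, n_components=2):
    """Per-frame scores and explained-variance ratios via SVD of centered data."""
    centered = features - features.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical toy trajectory: 50 frames, 6 "residues", one residue drifting away.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 3))
traj = np.repeat(base[None, :, :], 50, axis=0) + 0.01 * rng.normal(size=(50, 6, 3))
traj[:, 5, 0] += np.linspace(0.0, 5.0, 50)    # slow "conformational" motion

# Rigidly rotate and translate every frame; paired distances should not change.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = traj @ rot.T + np.array([10.0, -3.0, 2.0])

features = paired_distance_features(traj)
scores, explained = pca(features)
```

In this sketch, `scores[:, 0]` plays the role of the PC1 value per frame that the abstract uses to rank representative frames; a real analysis would read coordinates from an MD trajectory file rather than generate them.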
8	Learning with Sparsity: Structures, Optimization and Applications Chen, Xi 01 July 2013 The development of modern information technology has enabled collecting data of unprecedented size and complexity. Examples include web text data, microarray & proteomics, and data from scientific domains (e.g., meteorology). To learn from these high dimensional and complex data, traditional machine learning techniques often suffer from the curse of dimensionality and unaffordable computational cost. However, learning from large-scale high-dimensional data promises big payoffs in text mining, gene analysis, and numerous other consequential tasks. Recently developed sparse learning techniques provide us a suite of tools for understanding and exploring high dimensional data from many areas in science and engineering. By exploring sparsity, we can always learn a parsimonious and compact model which is more interpretable and computationally tractable at application time. When it is known that the underlying model is indeed sparse, sparse learning methods can provide us a more consistent model and much improved prediction performance. However, the existing methods are still insufficient for modeling complex or dynamic structures of the data, such as those evidenced in pathways of genomic data, gene regulatory network, and synonyms in text data. This thesis develops structured sparse learning methods along with scalable optimization algorithms to explore and predict high dimensional data with complex structures. In particular, we address three aspects of structured sparse learning: 1. Efficient and scalable optimization methods with fast convergence guarantees for a wide spectrum of high-dimensional learning tasks, including single or multi-task structured regression, canonical correlation analysis as well as online sparse learning. 2. 
Learning dynamic structures of different types of undirected graphical models, e.g., conditional Gaussian or conditional forest graphical models. 3. Demonstrating the usefulness of the proposed methods in various applications, e.g., computational genomics and spatial-temporal climatological data. In addition, we also design specialized sparse learning methods for text mining applications, including ranking and latent semantic analysis. In the last part of the thesis, we also present the future direction of the high-dimensional structured sparse learning from both computational and statistical aspects. Machine Learning Sparse Learning Optimization Structure Regression Multi-task Regression Canonical Correlation Analysis Undirected Graphical Models First-order Method Stochastic Optimization Text Mining Ranking Latent Semantic Analysis Spatial-temporal Data Computational Genomics Computer Sciences
9	TEMPORAL EVENT MODELING OF SOCIAL HARM WITH HIGH DIMENSIONAL AND LATENT COVARIATES Xueying Liu (13118850) 09 September 2022 The counting process is fundamental to many real-world problems involving event data. The Poisson process, used as the background intensity of the Hawkes process, is the most commonly used point process. The Hawkes process, a self-exciting point process, can be fitted to temporal event data, spatial-temporal event data, and event data with covariates. We study a Hawkes process fitted to heterogeneous drug overdose data via a novel semi-parametric approach. The counting process is also related to survival data, since both study the occurrences of events over time. We fit a Cox model to temporal event data with a large corpus that is processed into high dimensional covariates. We study the significant features that influence the intensity of events. Data mining and knowledge discovery Graph, social and multimedia data Information extraction and fusion Deep learning Semi- and unsupervised learning Hawkes Process Latent Covariates Social Harms Spatial-Temporal Data Cox Proportional Hazard Model Temporal Event Sequence Counting Process
10	VISUAL ANALYTICS OF BIG DATA FROM MOLECULAR DYNAMICS SIMULATION Catherine Jenifer Rajam Rajendran (5931113) 03 February 2023 Protein malfunction can cause human diseases, which makes the protein a target in the process of drug discovery. In-depth knowledge of how proteins function can contribute broadly to the understanding of the mechanism of these diseases. Protein functions are determined by protein structures and their dynamic properties. Protein dynamics refers to the constant physical movement of atoms in a protein, which may result in the transition between different conformational states of the protein. These conformational transitions are critically important for the proteins to function. Understanding protein dynamics can help to understand and interfere with the conformational states and transitions, and thus with the function of the protein. If we can understand the mechanism of conformational transition of a protein, we can design molecules to regulate this process and regulate the protein functions for new drug discovery. Protein dynamics can be simulated by Molecular Dynamics (MD) simulations. The MD simulation data generated are spatial-temporal and therefore very high dimensional. To analyze the data, distinguishing various atomic interactions within a protein by interpreting their 3D coordinate values plays a significant role. Since the data are extremely large, the essential step is to find ways to interpret the data by generating more efficient algorithms to reduce the dimensionality and developing user-friendly visualization tools to find patterns and trends, which are not usually attainable by traditional methods of data processing. Given the typically allosteric, long-range nature of the interactions that lead to large conformational transitions, pinpointing the underlying forces and pathways responsible for the global conformational transition at the atomic level is very challenging. 
To address these problems, various analytical techniques are applied to the simulation data to better understand the mechanism of protein dynamics at the atomic level, by developing a new program called Probing Long-distance interactions by Tapping into Paired-Distances (PLITIP). It contains a set of new tools based on the analysis of paired distances, which removes the interference of the translation and rotation of the protein itself and can therefore capture the absolute changes within the protein. Firstly, we developed a tool called Decomposition of Paired Distances (DPD). This tool generates a distance matrix of all paired residues from our simulation data. This paired distance matrix is therefore not subject to the interference of the translation or rotation of the protein and can capture the absolute changes within the protein. This matrix is then decomposed by DPD using Principal Component Analysis (PCA) to reduce dimensionality and to capture the largest structural variation. To showcase how DPD works, we use two protein systems, HIV-1 protease and 14-3-3σ, both of which undergo large structural changes and conformational transitions, as displayed by their MD simulation trajectories. The largest structural variation and conformational transition were captured by the first principal component in both cases. In addition, structural clustering and ranking of representative frames by their PC1 values revealed the long-distance nature of the conformational transition and pinpointed the key candidate regions that might be responsible for the large conformational transitions. Secondly, to facilitate further analysis and identification of the long-distance path, a tool called Pearson Coefficient Spiral (PCP) was developed that generates and visualizes Pearson coefficients to measure the linear correlation between any two sets of residue pairs. 
PCP allows users to fix one residue pair and examine the correlation of its change with other residue pairs. Thirdly, a set of visualization tools was developed that generate paired atomic distances for the shortlisted candidate residues and capture significant interactions among them. The first tool is the Residue Interaction Network Graph for Paired Atomic Distances (NG-PAD), which not only generates paired atomic distances for the shortlisted candidate residues, but also displays significant interactions as a Network Graph for convenient visualization. Second, the Chord Diagram for Interaction Mapping (CD-IP) was developed to map the interactions to protein secondary structural elements and to further narrow down important interactions. Third, Distance Plotting for Direct Comparison (DP-DC) plots any two paired distances of the user’s choice, at either the residue or atomic level, to facilitate identification of similar or opposite patterns of distance change over the simulation time. All the above tools of PLITIP enabled us to identify critical residues contributing to the large conformational transitions in both HIV-1 protease and 14-3-3σ proteins. Besides the above major project, a side project of developing tools to study protein pseudo-symmetry is also reported. It has been proposed that symmetry provides protein stability, opportunities for allosteric regulation, and even functionality. This tool helps us to answer the questions of why there is a deviation from perfect symmetry in proteins and how to quantify it. Applications in life sciences Spatial data and applications Semi- and unsupervised learning Visual Analytics Data Visualization Principal Component Analysis Parallel Computing Pearson Coefficient Correlation Protein Structure Analysis Molecular Dynamics Simulation Study Paired-Distance Spatial-Temporal Data Pseudo-Symmetry
