Global ETD Search

1	COUNTING SORGHUM LEAVES FROM RGB IMAGES BY PANOPTIC SEGMENTATION Ian Ostermann (15321589) 19 April 2023 <p dir="ltr">Meeting the nutritional requirements of a growing population in a changing climate has been the foremost concern of agricultural research in recent years. One answer to some of the many questions posed by this existential threat is to breed crops that produce food more efficiently with respect to land and water use. A key aspect of this optimization is plant geometry, such as canopy architecture, which, although grounded in the organism's actual 3D structure, does not necessarily require a 3D representation to measure. Although deep learning is a powerful tool for answering phenotyping questions that do not require an explicit intermediate 3D representation, training a network traditionally requires a large number of hand-segmented ground-truth images. To bypass the enormous time and expense of hand-labeling datasets, we used a procedural sorghum image pipeline, developed by another student in our group, that produces images similar enough to the ground-truth images from the phenotyping facility that a network trained only on automatically generated data can be applied directly to real data. The synthetic data were used to train a deep segmentation network to identify which pixels belong to which leaves. The segmentations were then processed to find the number of leaves identified in each image, for use in the leaf-counting task of high-throughput phenotyping. Overall, our method performs comparably to human annotators: its predictions fall within a 90% confidence interval of the true leaf count in 97% of images, while being faster and cheaper. This adds another expensive-to-collect phenotypic trait to the list of those that can be collected automatically.</p> Applications in life sciences Computer vision Image processing panoptic segmentation image segmentation task High Throughput Phenotyping
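The final leaf-counting step reduces to counting distinct leaf instances in the panoptic output. A minimal sketch, assuming a hypothetical label encoding in which every pixel stores `class_id * 1000 + instance_id` (similar to the Cityscapes panoptic convention) and a hypothetical leaf class id of 1; the thesis does not state its encoding or any size filter, so both are illustrative:

```python
from collections import Counter

# Hypothetical encoding (an assumption, not from the thesis):
# pixel value = class_id * LABEL_DIVISOR + instance_id.
LEAF_CLASS = 1          # hypothetical class id for "leaf"
LABEL_DIVISOR = 1000


def count_leaves(panoptic_map, min_pixels=20):
    """Count distinct leaf instances in a panoptic label map (a list of rows of ints).

    Segments covering fewer than `min_pixels` pixels are ignored, which
    suppresses tiny spurious fragments in the segmentation.
    """
    # Number of pixels belonging to each segment id.
    pixel_counts = Counter(value for row in panoptic_map for value in row)
    leaves = [
        segment
        for segment, n_pixels in pixel_counts.items()
        if segment // LABEL_DIVISOR == LEAF_CLASS and n_pixels >= min_pixels
    ]
    return len(leaves)
```

The `min_pixels` threshold is an illustrative choice; a real pipeline would tune it (or drop it) against annotated images.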
2	DATA DRIVEN TECHNIQUES FOR THE ANALYSIS OF ORAL DOSAGE DRUG FORMULATIONS Ziyi Cao (16986465) 20 September 2024 <p dir="ltr">This thesis focuses on developing novel data-driven methods for analyzing oral drug formulations by employing techniques such as Fourier transform analysis and generative adversarial learning. Data-driven measurements have been addressing challenges in advanced manufacturing and analysis for pharmaceutical development for the last two decades. Data science combined with analytical chemistry holds the key to solving major problems in the next wave of industrial research and development. Data acquisition is expensive in pharmaceutical development, so leveraging data science to extract information in data-deprived circumstances is key to improving such data-driven measurements. Among multiple measurement techniques, chemical imaging is an informative tool for analyzing oral drug formulations. However, chemical imaging often falls into data-deprived situations, where data are limited by time-consuming sample preparation or the related chemical synthesis. An integrated imaging approach, which folds data science techniques into chemical measurements, could lead to informative and cost-effective data-driven measurements. This thesis describes the development of data-driven chemical imaging techniques for the analysis of oral drug formulations via Fourier transform analysis and generative adversarial learning. Chapter 1 begins with a brief introduction to the techniques commonly used in the pharmaceutical industry, their limitations, and how those limitations are being addressed. Chapter 2 discusses how the Fourier transform fluorescence recovery after photobleaching (FT-FRAP) technique can be used to monitor phase-separated drug-polymer aggregation. 
Chapter 3 builds on the innovation presented in Chapter 2 and illustrates how the analysis can be improved by incorporating diffractive optical elements into the patterned illumination. While the previous chapters address the dynamics of drug product formulations, Chapter 4 presents innovations in the composition analysis of oral drug products using novel generative adversarial learning methods for linear analyses.</p> Analytical spectrometry Applications in life sciences Medical physics Machine Learning in Chemistry chemical imaging analysis Amorphous Solid Dispersions hyperspectral raman analysis Generative Adversarial Networks (GAN) Data Driven Designs Chemometrics
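In FT-FRAP, a periodic photobleach pattern concentrates the signal at one spatial frequency of the image's Fourier transform, and recovery can be read from how that peak's amplitude decays. A minimal sketch, assuming simple free (Fickian) diffusion, for which the amplitude at spatial frequency xi decays as A(t) = A0 * exp(-4 * pi^2 * xi^2 * D * t); the function name and the single-exponential model are illustrative assumptions, and phase-separated aggregates may well need a multi-component model:

```python
import math


def diffusion_from_ft_frap(times, amplitudes, spatial_freq):
    """Estimate a diffusion coefficient D from the decay of one Fourier peak.

    Assumes free diffusion: A(t) = A0 * exp(-4 * pi^2 * xi^2 * D * t), where xi
    (spatial_freq) is the bleach-pattern frequency in cycles per unit length.
    Fits ln A against t by ordinary least squares and converts the slope to D.
    """
    logs = [math.log(a) for a in amplitudes]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx  # equals -4 * pi^2 * xi^2 * D under the model
    return -slope / (4 * math.pi ** 2 * spatial_freq ** 2)
```

With xi in inverse micrometres and t in seconds, D comes out in square micrometres per second.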
3	Myson Burch Thesis Myson C Burch (16637289) 08 August 2023 <p>With the completion of the Human Genome Project and many additional efforts since, there is an abundance of genetic data that can be leveraged to revolutionize healthcare. Significant efforts are now under way to develop state-of-the-art techniques that reveal connections between genetics and complex diseases such as diabetes, heart disease, or common psychiatric conditions, which depend on multiple genes interacting with environmental factors. These methods help pave the way towards diagnosis, cure, and ultimately prediction and prevention of complex disorders. As part of this effort, we address high-dimensional genomics questions through mathematical modeling, statistical methodologies, combinatorics, and scalable algorithms. More specifically, we develop innovative techniques at the intersection of technology and life sciences, using biobank-scale data from genome-wide association studies (GWAS) and machine learning, to better understand human health and disease. <br> <br> At its core, a GWAS tests for association between the genotyped variants of each individual and the trait of interest. GWAS have been used extensively to estimate the signed effects of trait-associated alleles and to map genes to disorders; over the past decade, about 10,000 strong associations between genetic variants and one (or more) complex traits have been reported. One of the key challenges in GWAS is population stratification, which can lead to spurious genotype-trait associations. Our work proposes a simple clustering-based approach that corrects for stratification better than existing methods. This method takes linkage disequilibrium (LD) into account when computing the distance between individuals in a sample. 
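The clustering idea can be illustrated with a small agglomerative hierarchical clustering routine that repeatedly merges the two closest groups of individuals. This is a sketch, not CluStrat itself: it uses plain Euclidean distance between genotype vectors (coded 0/1/2) in place of the LD-aware distance CluStrat uses, and average linkage is an assumption, since the linkage criterion is not stated here. The `dist` parameter is where an LD-aware distance would be plugged in.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance; a placeholder for an LD-aware distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clusters(samples, k, dist=euclidean):
    """Average-linkage agglomerative clustering, merged until k clusters remain.

    samples: list of genotype vectors, one per individual.
    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(samples))]
    # Pairwise distances between individual samples, computed once.
    d = [[dist(a, b) for b in samples] for a in samples]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if best is None or score < best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

A naive loop like this costs roughly cubic time in the number of individuals, so it only illustrates the idea; biobank-scale use needs more scalable machinery.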
Our approach, called CluStrat, performs agglomerative hierarchical clustering (AHC) using a regularized Mahalanobis distance-based genetic relationship matrix (GRM), which captures the population-level covariance (LD) matrix for the available genotype data.<br> <br> Linear mixed models (LMMs) are a popular and powerful method for conducting GWAS in the presence of population structure, but they are computationally expensive relative to simpler techniques. We implement matrix sketching in LMMs (MaSk-LMM) to mitigate their most expensive computations. Matrix sketching is an approximation technique in which random projections compress the original dataset into a significantly smaller one that still preserves some of its properties up to a guaranteed approximation ratio. The technique applies naturally to problems in genetics, where a large biobank can be treated as a matrix whose rows represent samples and whose columns represent SNPs. Because biobanks contain very many individuals and markers, these matrices are very large and can benefit from matrix sketching. Our approach tackles the LMM bottleneck directly by sketching the samples of the genotype matrix and by sketching the markers when computing the relatedness (kinship) matrix, the GRM. <br> <br> Predictive analytics have been used to improve healthcare by supporting decision-making, enhancing patient outcomes, and relieving pressure on the healthcare system. The prevalence of complex diseases varies greatly around the world. Understanding the basis of this difference in prevalence can help disentangle the interactions among the factors that cause complex disorders and identify groups of people who may be at greater risk of developing certain disorders. 
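The marker-sketching step can be illustrated for the relatedness matrix. Assuming a standardized n x m genotype matrix X and a GRM defined as X X^T / m (a common convention, assumed here), multiplying X by an m x s Gaussian matrix S with N(0, 1/s) entries gives an unbiased estimate, since E[S S^T] = I. This pure-Python sketch uses a dense Gaussian projection for clarity; the projection MaSk-LMM actually uses is not specified here, and the function names are hypothetical.

```python
import random


def grm(X):
    """Exact relatedness matrix X X^T / m for an n x m genotype matrix (list of rows)."""
    m = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / m for xj in X] for xi in X]


def sketched_grm(X, s, seed=0):
    """Estimate X X^T / m after compressing the m markers to s random directions.

    S is an m x s matrix of independent N(0, 1/s) entries, so E[S S^T] = I and
    (X S)(X S)^T / m is an unbiased estimate of X X^T / m.
    """
    rng = random.Random(seed)
    m = len(X[0])
    S = [[rng.gauss(0.0, 1.0) / s ** 0.5 for _ in range(s)] for _ in range(m)]
    # Sketch each sample: its m marker values become s projected coordinates.
    XS = [[sum(row[k] * S[k][c] for k in range(m)) for c in range(s)] for row in X]
    # The GRM is then formed from the n x s sketch instead of the n x m matrix.
    return [[sum(a * b for a, b in zip(u, v)) / m for v in XS] for u in XS]
```

Each entry of the estimate has a relative error on the order of 1/sqrt(s), so s trades accuracy against the cost of working with an n x s matrix; the savings appear when s is much smaller than m.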
This could form the basis for early intervention strategies targeting higher-risk populations, with significant benefits for public health.<br> <br> This dissertation broadens our understanding of empirical population genetics. It brings a data-driven perspective to a variety of problems in genetics, such as confounding due to genetic structure. It highlights current computational barriers in open problems in genetics and provides robust, scalable, and efficient methods to ease the analysis of genotype data.</p> Applications in health Applications in life sciences Data engineering and data science computational biology and chemistry Statistical methods and models numerical linear algebra Genetics & Genomics big data challenges
4	VISUAL ANALYTICS OF BIG DATA FROM MOLECULAR DYNAMICS SIMULATION Catherine Jenifer Rajam Rajendran (5931113) 03 February 2023 <p>Protein malfunction can cause human diseases, which makes proteins targets in drug discovery. In-depth knowledge of how proteins function can contribute substantially to understanding the mechanisms of these diseases. Protein function is determined by protein structure and its dynamic properties. Protein dynamics refers to the constant physical movement of the atoms in a protein, which may result in transitions between different conformational states. These conformational transitions are critically important for proteins to function. Understanding protein dynamics can help us understand and interfere with conformational states and transitions, and thus with protein function. If we understand the mechanism of a protein's conformational transitions, we can design molecules that regulate this process, and thereby the protein's function, for new drug discovery. Protein dynamics can be simulated with Molecular Dynamics (MD) simulations.</p> <p>The MD simulation data are spatiotemporal and therefore very high-dimensional. Interpreting the 3D coordinates to distinguish the various atomic interactions within a protein plays a significant role in analyzing the data. Because the data are enormous, the essential step is to make them interpretable by developing more efficient dimensionality-reduction algorithms and user-friendly visualization tools that reveal patterns and trends not usually attainable with traditional data-processing methods. Because the interactions that lead to large conformational transitions are typically allosteric and long-range, pinpointing the underlying forces and pathways responsible for a global conformational transition at the atomic level is very challenging. 
To address these problems, we applied various analytical techniques to the simulation data to better understand the mechanism of protein dynamics at the atomic level, developing a new program called Probing Long-distance interactions by Tapping into Paired-Distances (PLITIP). PLITIP contains a set of new tools based on the analysis of paired distances, which remove the interference of the translation and rotation of the protein as a whole and can therefore capture the absolute changes within the protein.</p> <p>Firstly, we developed a tool called Decomposition of Paired Distances (DPD). This tool generates a distance matrix of all residue pairs from the simulation data. Because this paired distance matrix is not affected by translation or rotation of the protein, it captures the absolute changes within the protein. DPD then decomposes this matrix using Principal Component Analysis (PCA) to reduce dimensionality and capture the largest structural variation. To showcase DPD, we applied it to two protein systems, HIV-1 protease and 14-3-3σ, both of which undergo large structural changes and conformational transitions in their MD simulation trajectories. In both cases, the largest structural variation and conformational transition were captured by the first principal component (PC1). In addition, structural clustering and ranking of representative frames by their PC1 values revealed the long-distance nature of the conformational transition and pinpointed the key candidate regions that might be responsible for the large conformational transitions.</p> <p>Secondly, to facilitate identification of the long-distance path, we developed a tool called Pearson Coefficient Spiral (PCP), which computes and visualizes Pearson correlation coefficients to measure the linear correlation between the distance changes of any two residue pairs. 
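The core of DPD, as described, is a rotation- and translation-invariant feature vector per frame followed by PCA. A minimal pure-Python sketch, assuming each frame is a list of residue coordinates (for example, one representative atom per residue); PC1 is found here by power iteration on the covariance matrix, an implementation choice of this sketch rather than necessarily DPD's, and the function names are hypothetical:

```python
import math
from itertools import combinations


def paired_distances(frame):
    """All residue-residue distances of one frame, flattened in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(frame, 2)]


def pc1_scores(frames, iters=500):
    """Project each frame's paired-distance vector onto the first principal component.

    Returns (scores, pc1). PC1 is the leading eigenvector of the feature covariance
    matrix, obtained by power iteration.
    """
    data = [paired_distances(f) for f in frames]
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - mean[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no structural variation at all
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centred]
    return scores, v
```

The sign of PC1 is arbitrary, so only the ordering of frames along it, as in ranking representative frames, is meaningful (up to reversal).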
PCP allows users to fix one residue pair and examine how its distance change correlates with those of other residue pairs.</p> <p>Thirdly, we developed a set of visualization tools that generate paired atomic distances for the shortlisted candidate residues and capture significant interactions among them. The first tool, the Residue Interaction Network Graph for Paired Atomic Distances (NG-PAD), not only generates paired atomic distances for the shortlisted candidate residues but also displays significant interactions as a network graph for convenient visualization. The second, the Chord Diagram for Interaction Mapping (CD-IP), maps the interactions to protein secondary structural elements to further narrow down the important interactions. The third, Distance Plotting for Direct Comparison (DP-DC), plots any two paired distances of the user's choice, at either the residue or the atomic level, to help identify similar or opposite patterns of distance change over the simulation time. Together, these PLITIP tools enabled us to identify critical residues contributing to the large conformational transitions in both the HIV-1 protease and 14-3-3σ proteins.</p> <p>Besides this main project, a side project developing tools to study protein pseudo-symmetry is also reported. It has been proposed that symmetry provides proteins with stability, opportunities for allosteric regulation, and even functionality. This tool helps answer why proteins deviate from perfect symmetry and how to quantify the deviation.</p> Applications in life sciences Spatial data and applications Semi- and unsupervised learning Visual Analytics Data Visualization Principal Component Analysis Parallel Computing Pearson Coefficient Correlation Protein Structure Analysis Molecular Dynamics Simulation Study Paired-Distance Spatial-Temporal Data Pseudo-Symmetry
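The correlation step behind PCP can be sketched as follows: track the distance of one fixed residue pair over the trajectory and compute its Pearson correlation coefficient with the distance of every other pair. This sketch assumes frames are lists of residue coordinates, skips pairs whose distance never changes (the coefficient is undefined for them), and leaves out the spiral visualization entirely; the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate_with_pair(frames, fixed_pair):
    """Correlate the distance of `fixed_pair` with every other residue pair over time.

    frames: list of frames, each a list of (x, y, z) residue coordinates.
    fixed_pair: (i, j) with i < j; its distance must vary over the frames.
    Returns {(i, j): r} for every other pair whose distance varies.
    """
    n_res = len(frames[0])
    series = {
        (i, j): [math.dist(f[i], f[j]) for f in frames]
        for i, j in combinations(range(n_res), 2)
    }
    ref = series[fixed_pair]
    result = {}
    for pair, values in series.items():
        # Skip the reference itself and constant distances (r is undefined there).
        if pair == fixed_pair or max(values) - min(values) < 1e-12:
            continue
        result[pair] = pearson(ref, values)
    return result
```

Values near +1 or -1 flag pairs whose distances change in concert with, or opposite to, the chosen pair, making them candidates for the long-distance coupling described above.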
