Global ETD Search

1	Biological Inference from Single Cell RNA-Sequencing Levitin, Hanna M. January 2020 (has links) Tissues are heterogeneous communities of cells that work together to achieve a higher-order function. Large-scale single cell RNA-sequencing (scRNA-seq) offers an unprecedented opportunity to systematically map the transcriptional programs underlying this diversity. However, extracting biological signal from noisy, high-dimensional scRNA-seq data requires carefully designed, statistically robust methodology that makes appropriate assumptions both for the data and for the biological question of interest. This thesis explores computational approaches to finding biological signal in scRNA-seq datasets. Chapter 2 focuses on preprocessing and cell-centric approaches to downstream analysis that have become a mainstay of analytical pipelines for scRNA-seq, and includes dissections of lineage diversity in high grade glioma and in the largest neural stem cell niche in the adult mouse brain. Notably, the former study suggests that heterogeneity in high grade glioma arises from at least two distinct biological processes: aberrant neural development and mesenchymal transformation. Chapter 3 presents a flexible approach for de novo discovery of gene expression programs without an a priori structure across cells, revealing subtle properties of a spatially sampled high grade glioma that would not have been apparent with previous approaches. Chapter 4 leverages our prior work and a unique tissue resource to build a unified reference map of human T cell functional states across tissues and ages. We discover and validate a novel pan-T cell activation marker and a previously undescribed kinetic intermediate in CD4+ T cell activation. Finally, ongoing work defines key programs of gene expression in tissue-associated T cells in infants and adults and predicts their candidate regulators. Biology--Classification Bioinformatics--Methodology RNA Computational biology T cells--Research
2	Understanding the Utility of Social Risk Factors Documented in Clinical Notes to Predict Hospitalization and Emergency Department Visits in Home Healthcare Hobensack, Mollie January 2023 (has links) Background: Approximately 5 million older adults receive home healthcare (HHC) annually in the United States, and nearly 90% of HHC recipients are 65 years or older. HHC encompasses in-home interdisciplinary services such as skilled nursing, social work, and physical, speech, and occupational therapy. One in every five patients is hospitalized during their time in HHC. Researchers have explored machine learning models that use data in the electronic health record (EHR) to aid clinicians in identifying patients at high risk for hospitalization and emergency department (ED) visits. Failure to consider social risk factors can exacerbate health inequities. Some studies suggest that including social risk factors in machine learning models can help to mitigate bias in model performance among individuals from racial and ethnic minority groups. Prior literature has reported that a majority of social information is documented in clinical notes. In the HHC setting, there is a gap in understanding how social risk factors are documented in clinical notes and whether adding social risk factors in machine learning models can improve model performance. Thus, this dissertation aims to: 1) summarize the literature on machine learning conducted in the HHC setting, 2) extract social risk factors documented in HHC clinical notes, and 3) examine how social risk factors influence machine learning model performance. Methods: The data for this dissertation come from one HHC agency in New York, New York, including approximately 65,000 unique patients and 2.3 million clinical notes. The Biopsychosocial Model guided this study by providing a framework to report the features included in the machine learning models.
To address the first aim, a scoping review was conducted to summarize the literature on machine learning applied to EHR data in the HHC setting. To address the second aim, a natural language processing system was developed to extract social risk factors from HHC clinical notes. Then, logistic regression was utilized to examine the association between the social risk factors documented in clinical notes and hospitalization and ED visits. Finally, to address the third aim, social risk factors were included in four machine learning models to predict hospitalization and ED visit risk in HHC. A sub-analysis was conducted to explore the utility of social risk factors in machine learning models across individuals from different racial and ethnic groups. Results: The results from all three aims suggest that there has been a rise in machine learning applied in HHC, but few studies have incorporated clinical notes. There are gaps in implementing machine learning models in practice and standardizing social risk factors in documentation. HHC clinicians are documenting the following social risk factors in 4% of their clinical notes: Social Environment, Physical Environment, Education and Literacy, Food Insecurity, and Access to Care. These social risk factors are significantly associated with hospitalization and ED visits; however, their contribution showed minimal differences in machine learning model performance. Conclusion: This dissertation study demonstrates the feasibility and utility of leveraging HHC clinicians’ clinical notes to understand social risk factors. Further exploration is needed to tease out the nuances in how HHC clinicians perceive, assess, and document social risk factors in the EHR. Stakeholders are encouraged to standardize social risk factors and develop informatics tools tailored to the HHC setting to improve the identification of patients at risk for hospitalization and ED visits. 
Nursing Home health aides Machine learning Hospital patients Racism in medicine Bioinformatics--Methodology
3	Towards continuous sensing for human health: platforms for early detection and personalized treatment of disease Behnam, Vira January 2024 (has links) Wearable technology offers the promise of decentralized and personalized healthcare, which can both alleviate current burdens on medical resources, and also help individuals to be more informed about their health. The heterogeneity of disease phenotypes necessitates adaptations to both diagnosing and surveilling disease, but to ensure user adoption and behavioral change, there needs to be a convenient way to amass such health information continuously. This can be in part accomplished by the development of continuously monitoring, compact wearable medical sensors and analytics technology that provide updates on analyte and biosignal measurements at regular intervals in situ. This dissertation investigates methods for collecting and analyzing information from wearable devices with these principles in mind. In Aim 1, we developed new methods for analysis of cardiovascular biosignals. Current methods of estimating left ventricular mass index (LVMI, a strong risk factor for cardiac outcomes), rely on the analysis of echocardiographic signals. Though still the gold standard, echocardiography can typically only be performed in the clinic, making it inconvenient to obtain frequent measurements of LVMI. Frequent measurements can be useful for monitoring cardiac risk, particularly for high-risk individuals, so we investigated the feasibility of predicting LVMI using a deep learning-based approach through ambulatory blood pressure readings, a one-time laboratory test and demographic information. We find that adding blood pressure waveform information in conjunction with multitask learning improved prediction errors (compared to baseline linear regression and neural network models), pointing to its potential as a clinical tool. 
Using transfer learning, we developed a model that does not require waveform data, but achieved similar prediction accuracies as methods that do require such data – opening the door to use cases that eliminate the need for wearing a blood pressure cuff continuously during the measurement period. Overall, such a technique has the potential to provide information to individuals who are at high risk of cardiac outcomes both inside and outside the clinic. In Aims 2 and 3, we developed a minimally invasive hydrogel patch for continuous monitoring of calcium, as proof-of-concept for wearable measurement of a wide variety of analytes typically assayed in the lab – a technology that can facilitate treatment and management of many prevalent diseases. Specifically, in Aim 2, we engineered a DNA polyacrylamide hydrogel microneedle array that sensed physiologically relevant calcium levels, for potential use by individuals who have hypoparathyroidism, a condition in which blood calcium levels are low and calcium supplements are needed. A negative mold was made using a CNC mill, the positive mold was cast in silicone, and the aptamer along with acrylamide and bis-acrylamide was seeded into the silicone mold. The DNA hydrogel was then fabricated using a simple UV curing protocol. The optimized DNA hydrogel was specific to calcium, used simple fabrication methods and had a fast, reversible signal response. Finally, in Aim 3, we developed the DNA hydrogel sensor into a wearable, integrated system with real-time fluorescence monitoring for testing in vivo. The microneedle array needed to be hydrated for the DNA aptamer to function, but polyacrylamide was too weak in its hydrated state to effectively pierce through skin epidermis. We demonstrated a method for strengthening our hydrogel system with polyethylene glycol diacrylate (PEGDA), while maintaining an optically translucent gel for detection purposes. 
We conducted piercing studies with a skin phantom on different microneedle array sizes and shapes, and determined that a 3x3 array of beveled microneedles required the least amount of force to pierce through a skin phantom. A custom complementary metal-oxide semiconductor (CMOS) system was developed to capture real-time fluorescence signals from the microneedle array, which correlated to calcium levels in vitro. This setup was then validated in a rat study. In this dissertation, we demonstrated methods for monitoring human biosignals using signal processing techniques, material innovations and integrated sensing platforms. While a work in progress, this dissertation is a step towards realizing the goal of decentralized, connected health for earlier detection and better management of disease. Biomedical engineering Wearable technology Calcium Cardiology Bioinformatics--Methodology Polyethylene glycol Deep learning (Machine learning)
4	Inferring Race and Ethnicity from Clinical Notes: Annotation, Model Auditing, and Ethical Implications Bear Don't Walk, Oliver J. January 2022 (has links) Many areas of clinical informatics research rely on accurate and complete race and ethnicity (RE) patient information, such as estimating disease risk, assessing quality and performance metrics, and identifying health disparities. Structured data in the electronic health record (EHR) is an easily accessible source for patient-level information, however RE information is often missing or inaccurate in structured EHR data. Furthermore, current federal standards on RE categories have been acknowledged as inadequate, and in need of more granular realizations. While more difficult to extract data from, clinical notes provide a rich, nuanced and subjective source of information that can be leveraged to increase granularity and/or recover RE information missing in structured data. State-of-the-art clinical natural language processing (NLP) approaches can enable researchers to extract RE information from clinical notes, however, NLP has also been shown to inherit, exacerbate, and create new biased and harmful associations, especially in modern deep learning approaches. This thesis explores the relationships between direct and indirect explicit mentions of RE and RE inferences in clinical text annotated by humans, and leverages an approach to audit deep NLP models for their learned associations. We develop gold-standard annotations for information related to RE (RE indicators) and RE labels. We leverage four RE indicators: country of origin, spoken language, direct race, and direct ethnicity mention. We find high agreement between annotators for RE label assignments, and that sentences assigned RE categories have drastically different distributions of RE indicators. 
Furthermore, we find high agreement between structured and unstructured sources of RE information, and that unstructured data can be used to recover missing RE information in structured data. Leveraging the gold-standard RE annotations, we train a model to label sentences with RE information and audit the model to examine the alignment between salient features and RE indicators. While our models attain good classification performance, this does not translate into high overlap with RE indicators. We find evidence for learned associations that are benign mistakes, helpful but not strictly correct, and potentially harmful mistakes if not addressed by future work. Bioinformatics--Methodology Diseases--Risk factors Ethnic groups--Health and hygiene Health and race Medical ethics
5	Metodologias de bioinformatica para detecção e estudo de sequencias repetitivas em loci genicos de transcritos quimericos / Bioinformatics methodologies for detection and study of repetitive sequences in gene loci of chimeric transcripts Herai, Roberto Hirochi 15 August 2018 (has links) Advisor: Michel Eduardo Beleza Yamagishi / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Biologia / Made available in DSpace on 2018-08-15T17:21:19Z (GMT). No. of bitstreams: 1 Herai_RobertoHirochi_D.pdf: 3625854 bytes, checksum: 3f19d10a9b0bb7f77091197cd302f66e (MD5) Previous issue date: 2010 / Abstract: The large amount of biological data generated recently has shown that genomes are full of repetitive sequences, such as microsatellites and mobile genetic elements, which would be statistically highly improbable if genomes were generated from a random distribution of nucleotides. This observation motivated the classification of such sequences and the construction of several bioinformatics tools, along with storage mechanisms based on database management systems (DBMS) that make it possible to locate and store the sequences for later study. However, it was the biological confirmation of the importance of repetitive sequences, as in the RNA interference mechanism (reverse-complement, or inverted, repeats), that drew greater interest from the scientific community. Currently, there is strong evidence associating repetitive sequences with interesting biological phenomena, such as RNA processing by cis-splicing and the formation of chimeric transcripts, which are frequent in lower organisms and very rare in higher organisms. Such transcripts can be generated by trans-splicing or, as conjectured in this work, by the transposition of mobile genetic elements (for example, transposons or retrotransposons). 
Accordingly, this work proposes bioinformatics methodologies, made available on the web, to detect chimeric transcripts in the genomes of different organisms, in both draft and high-quality assemblies, and to study the repetitive sequences that occur in the gene loci of the transcripts involved in forming a chimeric sequence. Using full-length cDNA libraries from human and bovine, the proposed tools identified new chimeric transcript candidates that come from cells of normal tissues and do not follow canonical splice sites in the fusion region of the transcripts involved. Moreover, the detected sequences show a high concentration of reverse-complement repetitive sequence pairs in the gene loci of the two transcripts that form the chimeric sequence. The tools can be applied to other organisms and can guide experimental work that attempts to confirm new chimeric transcripts, in both lower and higher organisms / Doctorate / Bioinformatics / Doctor of Genetics and Molecular Biology Bioinformatics - Methodology Chimeric transcripts Selfish genetic elements Analytical performance prediction Nucleic acid repetitive sequences
