Global ETD Search

1	Characterizing low copy DNA signal using simulated and experimental data
Peters, Kelsey, 13 July 2017
Sir Alec Jeffreys was the first to describe human identification with deoxyribonucleic acid (DNA) in his seminal work in 1985 (1); the result was the birth of forensic DNA analysis. Since then, DNA has become the primary substance used to conduct human identification testing. Forensic DNA analysis has evolved since the work of Jeffreys and now incorporates the analysis of 15 to 24 short tandem repeat (STR) locations, or loci (2-4). The simultaneous amplification and subsequent electrophoresis of tens of STR polymorphisms results in analyses that are highly discriminating. DNA target masses of 0.5 to 2 nanograms (ng) are sufficient to obtain a full STR profile (4); however, pertinent information can still be obtained if low copy numbers of DNA are collected from the crime scene or evidentiary material (4-9). Despite the sensitivity of polymerase chain reaction (PCR) and capillary electrophoresis (CE) based technology, low copy DNA signal can be difficult to interpret because signal-to-noise ratios are frequently low. Given the complicated nature of low template signal, the DNA laboratory process must be optimized so that high-fidelity signal is regularly produced; studies designed to home in on optimized laboratory conditions are presented herein. The STR regions of a set of samples containing 0.0078 ng of DNA were amplified for 29 cycles; the amplified fragments were separated using two types of CE platforms: an ABI 3130 Genetic Analyzer and an ABI 3500 Genetic Analyzer. The result is a genetic trace, or electropherogram (EPG), composed of three signal components: noise, artifact, and allele. The EPGs were analyzed using two peak detection software programs.
In addition, a tool termed Simulating Evidentiary Electropherograms (SEEIt) (10, 11) was used to simulate the EPG signal obtained when one copy of DNA is processed through the forensic pipeline. SEEIt was parameterized to simulate data corresponding to two laboratory scenarios: the amplification of a single copy of DNA injected on an ABI 3130 Genetic Analyzer and on an ABI 3500 Genetic Analyzer. In total, 20,000 allele peaks and 20,000 noise peaks were generated for each CE platform. Comparison of simulated and experimental data was used to elucidate features that are difficult to ascertain by experimental work alone. The data demonstrate that experimental signal obtained with the ABI 3500 platform is, on average, a factor of four larger than signal obtained from the ABI 3130 platform. When a histogram of the signal is plotted, a multimodal distribution is observed. The first mode is hypothesized to be the result of noise, while the second, third, and subsequent modes are the signal obtained when one, two, or more target DNA molecules are amplified. By evaluating the data in this way, full signal resolution between noise and allelic signal is visualized. Therefore, this methodology may be used to: 1) optimize post-PCR laboratory conditions to obtain excellent resolution between noise and allelic signal; and 2) determine an analytical threshold (AT) that results in few false detections and few cases of allelic dropout. A χ² test for independence of the experimental signal in noise positions and the experimental signal within allele positions < 12 relative fluorescence units (RFU), i.e. signal in the noise regime, indicates the populations are not independent when sufficient signal-to-noise resolution is obtained. Once sufficient resolution is achieved, optimized ATs may be acquired by evaluating and minimizing the false negative and false positive detection rates.
Here, a false negative is defined as the non-detection of an allele and a false positive is defined as the detection of noise. An AT of 15 RFU was found to be optimal for samples injected on the ABI 3130 for at least 10 seconds (sec): 99.42% of noise peaks did not exceed this critical value, while allelic dropout was kept to a minimum, 36.97%, at this AT. Similarly, for signal obtained from the ABI 3500, 99.41% and 99.0% of noise fell under an AT of 50 RFU for data analyzed with GeneMapper ID-X (GM) and OSIRIS (OS), respectively. Allelic dropout was 36.34% and 36.55% for GM and OS, respectively, at this AT.
Biology; Forensic DNA; Analytical threshold; Limit of detection; Low copy DNA; Signal to noise; Single cell analysis
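The threshold evaluation described in entry 1 can be illustrated with a minimal sketch. It is not the author's procedure: the peak heights are hypothetical toy values, the rule that a peak counts as detected when its height is at or above the AT is an assumption, and so is the choice to minimize the unweighted sum of the two error rates. The function names `detection_rates` and `best_threshold` are invented for illustration.

```python
from typing import Sequence, Tuple


def detection_rates(noise: Sequence[float], alleles: Sequence[float],
                    at: float) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at threshold `at`.

    Assumption: a peak is "detected" when its height is >= at.
    False positive = a noise peak that is detected.
    False negative = an allele peak that is not detected (dropout).
    """
    false_pos = sum(h >= at for h in noise) / len(noise)
    false_neg = sum(h < at for h in alleles) / len(alleles)
    return false_pos, false_neg


def best_threshold(noise: Sequence[float], alleles: Sequence[float],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate AT with the smallest summed error rate.

    Assumption: both error types are weighted equally; on ties the
    first candidate in `candidates` wins.
    """
    return min(candidates,
               key=lambda at: sum(detection_rates(noise, alleles, at)))


# Hypothetical peak heights in RFU, for illustration only.
noise_rfu = [3, 5, 6, 8, 9, 11, 12, 14, 16, 20]
allele_rfu = [10, 13, 18, 25, 40, 55, 70, 90, 120, 150]

if __name__ == "__main__":
    for at in (5, 10, 15, 20, 25):
        fp, fn = detection_rates(noise_rfu, allele_rfu, at)
        print(f"AT={at:>3} RFU  false positive={fp:.2f}  dropout={fn:.2f}")
    print("selected AT:", best_threshold(noise_rfu, allele_rfu,
                                         (5, 10, 15, 20, 25)))
```

In practice a laboratory might weight dropout and false detection differently; swapping the unweighted sum for a weighted cost would change only the `key` function.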
2	A comparative analysis of the cost-based and simplified upper limit approaches for calculating analytical threshold in support of forensic DNA short tandem repeat analysis
Gordon, Daniel Bernard, 01 February 2023
The determination and application of the Analytical Threshold (AT) is a vital part of the forensic deoxyribonucleic acid (DNA) internal validation process. The AT is the relative fluorescence unit (RFU) signal at which allelic peaks can be confidently distinguished from baseline noise. Several methods of calculating the AT are currently in use within the forensic DNA community. These methods may use DNA negative sample data, DNA positive sample data, or both in their calculations. In this study, two of the DNA positive-based AT calculation techniques were chosen for assessment and comparison: the simplified upper limit approach (ULA) and the cost-based approach. ATs were calculated for each dye channel using a dilution series of three single-source DNA samples ranging from 0.05 to 0.8 ng. The ATs calculated via the cost-based approach were consistently lower than those determined via the ULA. As a result, the incidence of allelic drop-out at these AT values was also consistently lower, with an equivalent or only marginally increased incidence of baseline noise drop-in. These results indicate that the cost-based approach may be a more effective and practical method of calculating the AT than the ULA, particularly in the analysis of low DNA template samples.
Biology; Analytical threshold; Cost-based approach; Forensic DNA; Internal validation; Short tandem repeat; Upper limit approach
3	Structure learning of Bayesian networks via data perturbation / Aprendizagem estrutural de Redes Bayesianas via perturbação de dados
Gross, Tadeu Junior, 29 November 2018
Structure learning of Bayesian Networks (BNs) is an NP-hard problem, and the use of sub-optimal strategies is essential in domains involving many variables. One such strategy is to generate multiple approximate structures and then reduce the ensemble to a representative structure. The occurrence frequency (over the ensemble of structures) can be used as the criterion for accepting a dominant directed edge between two nodes, thus obtaining the single structure. In this doctoral research, an analogy with an adapted one-dimensional random walk was used to derive analytically an appropriate decision threshold for this occurrence frequency. The resulting closed-form expression was validated across benchmark datasets using the Matthews Correlation Coefficient as the performance metric. In experiments using a recent medical dataset, the BN resulting from the analytical cutoff frequency captured the expected associations among nodes and also achieved better prediction performance than BNs learned with thresholds neighbouring the computed one. In the literature, the feature counted across the perturbed structures has been the edges, not the directed edges (arcs) as in this thesis. The modified strategy was also applied to a dataset of elderly individuals to identify potential relationships between variables of medical interest, but using an increased threshold instead of the one predicted by the proposed formula; this caution reflects the possible social implications of the findings. The motivation behind this application is that, although the proportion of elderly individuals in the population has increased substantially in the last few decades, the risk factors that should be managed in advance to ensure a natural process of mental decline due to ageing remain unknown.
In the learned structural model, the probabilistic dependence mechanism between two variables of medical interest was investigated graphically: the suspected risk factor known as Metabolic Syndrome and the indicator of mental decline referred to as Cognitive Impairment. This investigation employed the concept known in the context of BNs as D-separation. The study revealed that the dependence between Metabolic Syndrome and Cognitive Variables indeed exists and depends on both Body Mass Index and age.
Analytical threshold; Associations discovery; Bayesian network; Cognitive impairment; D-separation; Data perturbation via bootstrap replicas; Directed acyclic graph; Learning of robust structures; Metabolic syndrome; Model averaging; Population ageing; Risk factors; Stability of arcs
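The ensemble reduction described in entry 3 (counting how often each directed arc occurs across perturbed structures and keeping the dominant ones) can be sketched as below. This is only an illustration under stated assumptions: the thesis derives a closed-form cutoff frequency that the abstract does not give, so `threshold` here is a free, hypothetical parameter; the function name `consensus_arcs` and the toy structures are likewise invented. An arc is treated as "dominant" when it is more frequent than its reverse, which is one plausible reading of the abstract.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Arc = Tuple[str, str]  # (parent, child)


def consensus_arcs(structures: Iterable[Set[Arc]], threshold: float) -> Set[Arc]:
    """Keep arcs whose occurrence frequency across `structures` exceeds
    `threshold` and that occur more often than the reversed arc.

    Assumption: `threshold` is supplied by the caller; it is NOT the
    closed-form cutoff derived in the thesis.
    """
    structures = list(structures)
    n = len(structures)
    # Count each arc at most once per structure.
    counts = Counter(arc for s in structures for arc in set(s))
    kept = set()
    for (u, v), c in counts.items():
        frequency = c / n
        if frequency > threshold and c > counts.get((v, u), 0):
            kept.add((u, v))
    return kept


# Hypothetical ensemble of four structures learned from perturbed data.
ensemble = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "B")},
    {("A", "B"), ("B", "C")},
    {("B", "A"), ("B", "C")},
]

if __name__ == "__main__":
    print(sorted(consensus_arcs(ensemble, threshold=0.5)))
```

Note that thresholding arcs independently does not by itself guarantee that the resulting graph is acyclic; a full implementation would need a separate check for that.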
