71 |
Crash Risk Analysis of Coordinated Signalized Intersections. Qiming Guo, 08 December 2023
<p dir="ltr">The emergence of time-dependent data provides researchers with unparalleled opportunities to investigate safety performance on roadway infrastructure at a disaggregated level. A disaggregated crash risk analysis uses both time-dependent data (e.g., hourly traffic, speed, weather conditions, and signal controls) and fixed data (e.g., geometry) to estimate hourly crash probability. Despite abundant research on crash risk analysis, coordinated signalized intersections require further investigation due to the complexity of the safety problem and the relatively small number of past studies on their risk factors. This dissertation aimed to develop robust crash risk prediction models to better understand the risk factors of coordinated signalized intersections and to identify practical safety countermeasures. Crashes were first categorized into three types (same-direction, opposite-direction, and right-angle) within several crash-generating scenarios. The data were organized in hourly observations and included road geometric features, traffic movement volumes, speeds, weather precipitation and temperature, and signal control settings. Assembling hourly observations for modeling crash risk required synchronizing and linking data sources organized at different time resolutions. Three non-crash sampling strategies were applied to three statistical models (Conditional Logit, Firth Logit, and Mixed Logit) and two machine learning models (Random Forest and Penalized Support Vector Machine). Important risk factors were identified, including the presence of light rain, traffic volume, speed variability, and the downstream vehicle arrival pattern. The Firth Logit model was selected for implementation in signal coordination practice.
This model proved the most robust based on its out-of-sample prediction performance and its inclusion of important risk factors. Example applications of the recommended crash risk model, building daily risk profiles and estimating the safety benefits of improved coordination plans, demonstrated its practicality and usefulness to practicing engineers in improving safety at coordinated signals.</p>
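The Firth Logit model recommended above penalizes the likelihood with the Jeffreys prior, which keeps coefficient estimates finite even for rare events such as hourly crash occurrences. A minimal sketch of the modified-score Newton iteration in plain NumPy follows; this is a generic illustration, not the dissertation's actual implementation, and the data it is run on would be synthetic:

```python
import numpy as np

def firth_logit(X, y, n_iter=50, tol=1e-8):
    """Firth-penalized logistic regression via modified-score Newton steps.

    X: (n, p) design matrix including an intercept column; y: 0/1 outcomes.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        pi = 1.0 / (1.0 + np.exp(-eta))
        W = pi * (1.0 - pi)
        cov = np.linalg.inv(X.T @ (W[:, None] * X))   # inverse Fisher information
        # hat diagonal of W^{1/2} X (X'WX)^{-1} X' W^{1/2}
        h = np.einsum("ij,jk,ik->i", X, cov, X) * W
        # Firth-modified score: ordinary score plus Jeffreys-prior correction
        U = X.T @ (y - pi + h * (0.5 - pi))
        step = cov @ U
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Setting the `h * (0.5 - pi)` correction term to zero recovers ordinary maximum likelihood; keeping it is what prevents the infinite estimates that plague rare-event logistic fits.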
|
72 |
Improvement of Bacteria Detection Accuracy and Speed Using Raman Scattering and Machine Learning. Mandour, Aseel, 15 September 2022
Bacteria identification plays an essential role in preventing health complications and saving patients' lives. The most widely used identification method, bacterial culture, suffers from long processing times, so an effective, rapid, and non-invasive alternative is needed. Raman spectroscopy is a promising candidate for bacteria identification because it is effective and rapid and because, much like a human fingerprint, the Raman spectrum is unique to every material.
In my lab at the University of Ottawa, we focus on the use of Raman scattering for biosensing in order to achieve high identification accuracy for different types of bacteria. Based on the unique Raman fingerprint of each bacteria type, different types of bacteria can be identified successfully. However, using the Raman spectrum to identify bacteria poses a few challenges. First, the Raman signal is weak, so enhancing the signal intensity is essential, e.g., by using surface-enhanced Raman scattering (SERS). Moreover, the Raman signal can be contaminated by different noise sources. In addition, the signal consists of a large number of features and is non-linear due to the correlation between the Raman features. Using machine learning (ML) along with SERS, we can overcome these challenges in the identification process and achieve high bacteria identification accuracy.
In this thesis, I present a method to improve the identification of different bacteria types using a support vector machine (SVM) ML algorithm based on SERS. I also present dimension reduction techniques to reduce the complexity and processing time while maintaining high identification accuracy in the classification process. I consider four bacteria types: Escherichia coli (EC), Cutibacterium acnes (CA, formerly known as Propionibacterium acnes), methicillin-resistant Staphylococcus aureus (MRSA), and methicillin-sensitive Staphylococcus aureus (MSSA). MRSA and MSSA are combined into a single class named MS in the classification. We focus on these bacteria types because they are the most common in joint infections.
Using binary classification, I present the simulation results for three binary models: EC vs CA, EC vs MS, and MS vs CA. Using the full data set, binary classification achieved an accuracy of more than 95% for all three models. When the sample set was reduced based on the samples' signal-to-noise ratio (SNR) to decrease complexity, a classification accuracy of more than 95% for the three models was still achieved using less than 60% of the original data set. The recursive feature elimination (RFE) algorithm was then used to reduce complexity in the feature dimension. Because a small number of features were weighted much more heavily than the rest, the number of features used in the classification could be significantly reduced while maintaining high classification accuracy.
I also present the classification accuracy of the multiclass one-versus-all (OVA) method, i.e., EC vs all, MS vs all, and CA vs all. Using the complete data set, the OVA method achieved a classification accuracy of more than 90%. As in the binary classification, dimension reduction was applied to the input samples. Using the SNR reduction, the input samples were reduced by more than 60% while maintaining a classification accuracy higher than 80%. Furthermore, when the RFE algorithm was used to reduce complexity in the feature dimension and only the top-weighted 5% of features of the full data set were used, a classification accuracy of more than 90% was achieved. Finally, by combining both dimension reductions, a classification accuracy above 92% was achieved on a significantly reduced data set.
Both the dimension reduction and the improvement in classification accuracy between different types of bacteria using the ML algorithm and SERS could have a significant impact in fulfilling the demand for accurate, fast, and non-destructive identification of bacteria samples in the medical field, in turn potentially reducing health complications and saving patients' lives.
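The SVM-plus-RFE pipeline described above can be illustrated with scikit-learn on synthetic stand-in spectra. The real SERS data are not reproduced here; the feature positions, shift magnitudes, and class structure below are all invented for the demonstration:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, d = 200, 500                      # 200 spectra, 500 Raman-shift bins
y = rng.integers(0, 2, n)            # two bacteria classes (e.g., EC vs CA)
X = rng.normal(size=(n, d))          # noise floor
X[y == 1, 10] += 2.0                 # invented discriminative peaks
X[y == 1, 42] += 1.5

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0, stratify=y)
scaler = StandardScaler().fit(Xtr)
# keep 25 features, removing 10% of the remaining features per RFE iteration
rfe = RFE(SVC(kernel="linear"), n_features_to_select=25, step=0.1)
rfe.fit(scaler.transform(Xtr), ytr)
acc = rfe.score(scaler.transform(Xte), yte)
kept = np.flatnonzero(rfe.support_)  # surviving Raman-shift bins
```

`rfe.support_` marks which bins survive elimination, mirroring the thesis's finding that a small set of heavily weighted features carries most of the discriminative power.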
|
73 |
Graphical Tools, Incorporating Cost and Optimizing Central Composite Designs for Split-Plot Response Surface Methodology Experiments. Liang, Li, 14 April 2005
In many industrial experiments, completely randomized designs (CRDs) are impractical due to restrictions on randomization or the existence of one or more hard-to-change factors. In these situations, split-plot experiments are more realistic. The two separate randomizations in split-plot experiments lead to an error structure different from that of CRDs, which affects not only response modeling but also the choice of design. In this dissertation, two graphical tools, three-dimensional variance dispersion graphs (3-D VDGs) and fraction of design space (FDS) plots, are adapted for split-plot designs (SPDs). They are used for examining and comparing different variations of central composite designs (CCDs) with standard, V-, and G-optimal factorial levels. The graphical tools are shown to be informative for evaluating, and for developing strategies to improve, the prediction performance of SPDs. The overall cost of an SPD involves two types of experimental units, and an individual whole plot is often more expensive than an individual subplot and measurement. Therefore, considering only the total number of observations is unlikely to be the best way to reflect the cost of split-plot experiments. In this dissertation, a cost formulation involving the weighted sum of the number of whole plots and the total number of observations is discussed, and three cost-adjusted optimality criteria are proposed. The effects of different cost scenarios on the choice of design are shown in two examples. In practice it is often difficult for the experimenter to optimize a single aspect of the design. A realistic strategy is to select a design with good balance across multiple estimation and prediction criteria. Variations of the CCDs with the best cost-adjusted performance for estimation and prediction are studied for the combination of D-, G-, and V-optimality criteria and for each individual criterion. / Ph. D.
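One way to read the cost-adjusted idea is to normalize a design's D-criterion by its cost. The sketch below is a simplified, completely randomized view that ignores the split-plot error structure central to the dissertation, and the cost weights are invented; it scores a standard central composite design per unit cost:

```python
import numpy as np
from itertools import product

def ccd(k, alpha, n_center=1):
    # central composite design: 2^k factorial + 2k axial + center runs
    factorial = np.array(list(product([-1.0, 1.0], repeat=k)))
    axial = np.zeros((2 * k, k))
    for i in range(k):
        axial[2 * i, i], axial[2 * i + 1, i] = -alpha, alpha
    return np.vstack([factorial, axial, np.zeros((n_center, k))])

def d_per_cost(design, cost):
    # expand to the full second-order model matrix, then (det X'X)^(1/p) / cost
    n, k = design.shape
    cols = [np.ones(n)] + [design[:, i] for i in range(k)]
    cols += [design[:, i] * design[:, j] for i in range(k) for j in range(i + 1, k)]
    cols += [design[:, i] ** 2 for i in range(k)]
    X = np.column_stack(cols)
    p = X.shape[1]
    return np.linalg.det(X.T @ X) ** (1.0 / p) / cost

D = ccd(2, np.sqrt(2.0))
# hypothetical cost: weighted sum of whole plots and total observations
score = d_per_cost(D, cost=2.0 * 4 + 1.0 * len(D))
```

Comparing `score` across candidate designs under different whole-plot-to-subplot cost ratios is the spirit of the cost-adjusted criteria, though the dissertation's versions account for the split-plot covariance.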
|
74 |
Time-Varying Coefficient Models for Recurrent Events. Liu, Yi, 14 November 2018
I have developed time-varying coefficient models for recurrent event data to evaluate the temporal profiles of the recurrence rate and covariate effects. There are three major parts in this dissertation. The first two parts propose a mixed Poisson process model with gamma frailties for single-type recurrent events; the third part proposes a Bayesian joint model based on multivariate log-normal frailties for multi-type recurrent events. In the first part, I propose an approach based on penalized B-splines to obtain smooth estimates of both the time-varying coefficients and the log baseline intensity, with an EM algorithm developed for parameter estimation. One issue with this approach is that the estimation procedure is conditional on smoothing parameters, which have to be selected by cross-validation or by optimizing a chosen performance criterion; the procedure can be computationally demanding when there are many time-varying coefficients. To achieve objective estimation of the smoothing parameters, in the second part I propose a mixed-model representation for penalized splines: spline coefficients are treated as random effects and smoothing parameters are estimated as variance components. An EM algorithm embedded with a penalized quasi-likelihood approximation is developed to estimate the model parameters. The third part proposes a Bayesian joint model with time-varying coefficients for multi-type recurrent events, with Bayesian penalized splines used to estimate the time-varying coefficients and the log baseline intensity. One challenge with Bayesian penalized splines is that the smoothness of a spline fit is quite sensitive to the subjective choice of hyperparameters. I establish a procedure to determine the hyperparameters objectively through a robust prior specification. A Markov chain Monte Carlo procedure based on Metropolis-adjusted Langevin algorithms is developed to sample from the high-dimensional distribution of spline coefficients.
The procedure includes a joint sampling scheme to achieve better convergence and mixing properties. Simulation studies in the second and third parts have confirmed satisfactory model performance in estimating time-varying coefficients under different curvature and event-rate conditions. The models in the second and third parts were applied to data from a commercial truck driver naturalistic driving study. The application results reveal that drivers with 7 hours or less of sleep prior to a shift have a significantly higher intensity after 8 hours of on-duty driving and that their intensity remains higher after taking a break. In addition, the results show drivers' self-selection on sleep time, total driving hours in a shift, and breaks. These applications provide crucial insight into the impact of sleep time on driving performance for commercial truck drivers and highlight the on-road safety implications of insufficient sleep and breaks while driving. This dissertation provides flexible and robust tools to evaluate the temporal profile of intensity for recurrent events. / PHD / The overall objective of this dissertation is to develop models to evaluate the time-varying profiles of event occurrences and the time-varying effects of risk factors on event occurrences. There are three major parts in this dissertation. The first two parts are designed for a single event type. They are based on approaches in which the whole model is conditional on a certain tuning parameter. The value of this tuning parameter has to be pre-specified by users and strongly influences the model results. Instead of pre-specifying the value, I develop an approach that objectively estimates the optimal value of the tuning parameter and obtains model results simultaneously. The third part proposes a model for multi-type events. One challenge is that the model results are quite sensitive to the subjective choice of hyperparameters.
I establish a procedure to determine the hyperparameters objectively. Simulation studies have confirmed satisfactory model performance in estimating the temporal profiles of both event occurrences and the effects of risk factors. The models were applied to data from a commercial truck driver naturalistic driving study. The results reveal that drivers with 7 hours or less of sleep prior to a shift have a significantly higher intensity after 8 hours of on-duty driving and that their driving risk remains higher after taking a break. In addition, the results show drivers' self-selection on sleep time, total driving hours in a shift, and breaks. These applications provide crucial insight into the impact of sleep time on driving performance for commercial truck drivers and highlight the on-road safety implications of insufficient sleep and breaks while driving. This dissertation provides flexible and robust tools to evaluate the temporal profiles of both event occurrences and the effects of risk factors.
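The penalized B-spline idea behind the first part, a rich B-spline basis shrunk by a difference penalty on its coefficients, can be sketched for a simple curve-fitting case. This is a generic Eilers-Marx-style illustration with a fixed smoothing parameter, not the dissertation's EM algorithm for recurrent-event intensities, and it assumes SciPy >= 1.8 for `BSpline.design_matrix`:

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, n_basis=20, degree=3, lam=0.1):
    """P-spline fit: B-spline basis + 2nd-order difference penalty."""
    xl, xr = x.min() - 1e-9, x.max() + 1e-9
    inner = np.linspace(xl, xr, n_basis - degree + 1)
    t = np.r_[[xl] * degree, inner, [xr] * degree]       # clamped knot vector
    B = BSpline.design_matrix(x, t, degree).toarray()    # (n, n_basis) basis
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)         # difference penalty
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef
```

Larger `lam` shrinks the fit toward a straight line; in the dissertation the analogous smoothing parameters are not fixed but estimated as variance components in a mixed-model representation.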
|
75 |
Computational Methods for Maximum Likelihood Estimation in Generalized Linear Mixed Models. Otava, Martin, January 2011
Abstract of the diploma thesis. Title: Computational Methods for Maximum Likelihood Estimation in Generalized Linear Mixed Models. Author: Bc. Martin Otava. Department: Department of Probability and Mathematical Statistics. Supervisor: RNDr. Arnošt Komárek, Ph.D., Department of Probability and Mathematical Statistics. Abstract: When the maximum likelihood method is used for generalized linear mixed models, the maximization problem can be analytically intractable. As a solution, iterative and approximate methods are used; the latter are the core of this thesis. The widely used methods are introduced in detail and in general form, with algorithms useful in practical cases. The case of non-Gaussian random effects is also discussed. The approximate methods are demonstrated on real data sets, and conclusions about bias and consistency are supported by a simulation study. Keywords: generalized linear mixed model, penalized quasi-likelihood, adaptive Gauss-Hermite quadrature
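The Gauss-Hermite idea from the keywords can be shown on the simplest GLMM-style integral: the mean of a logistic response with a normal random effect. The sketch below uses the ordinary (non-adaptive) rule with invented parameter values; the adaptive variant discussed in such theses would first recenter and rescale the nodes at the integrand's mode:

```python
import numpy as np

def logistic_normal_mean(mu, sigma, n_nodes=20):
    """E[1 / (1 + exp(-(mu + sigma*B)))] for B ~ N(0, 1), via Gauss-Hermite.

    Substituting b = sqrt(2)*t turns the N(0,1) integral into the
    physicists' form  (1/sqrt(pi)) * integral f(sqrt(2) t) exp(-t^2) dt,
    which hermgauss nodes/weights approximate directly.
    """
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    vals = 1.0 / (1.0 + np.exp(-(mu + sigma * np.sqrt(2.0) * t)))
    return (w @ vals) / np.sqrt(np.pi)
```

With a smooth integrand like the logistic function, 20 nodes already agree with brute-force numerical integration to several decimal places, which is why quadrature-based GLMM likelihoods are practical.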
|
76 |
Models for Estimating the Incidence of HIV Infection in France from HIV and AIDS Surveillance Data. Sommen, Cécile, 09 December 2009
The knowledge of the dynamics of the HIV/AIDS epidemic is crucial for planning current and future health care needs. The HIV incidence, i.e. the number of new HIV infections over time, determines the trajectory and the extent of the epidemic but is difficult to measure. The back-calculation method has been widely developed and used to estimate the past pattern of HIV infections and to project the future incidence of AIDS from information on the incubation period distribution and AIDS incidence data. In recent years, the incubation period from HIV infection to AIDS has changed dramatically due to the increased use of antiretroviral therapy, which lengthens the time from HIV infection to the development of AIDS. It has therefore become more difficult to use AIDS diagnoses as the basis for back-calculation. More recently, the idea of integrating information on the dates of HIV diagnosis has improved the precision of estimates. In recent years, most western countries have set up systems for monitoring HIV infection. In France, the mandatory reporting of newly diagnosed HIV infections, coupled with virological surveillance to distinguish recent infections from older ones, was introduced in March 2003. The goal of this PhD thesis is to develop new methods for estimating HIV incidence that combine surveillance data on HIV and AIDS diagnoses with the serologic markers collected in virological surveillance, in order to better capture the evolution of the epidemic in the most recent periods.
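The classical back-calculation step described above is a deconvolution: observed AIDS diagnoses are the infection curve convolved with the incubation distribution, and the infections are recovered by constrained least squares. Everything in the sketch below, the incubation distribution, the infection curve, and the noiseless data, is invented for illustration:

```python
import numpy as np
from scipy.optimize import nnls

T = 30                                                  # years of surveillance
u = np.arange(T + 1) / 10.0
f = np.diff(1.0 - np.exp(-u ** 2))                      # hypothetical incubation pmf
true_h = 100.0 * np.exp(-0.5 * ((np.arange(T) - 10) / 4.0) ** 2)  # infections/year
# lower-triangular convolution matrix: AIDS_t = sum_s f[t-s] * h_s
A = np.array([[f[t - s] if 0 <= t - s < T else 0.0
               for s in range(T)] for t in range(T)])
aids = A @ true_h                                       # expected AIDS diagnoses
h_est, _ = nnls(A, aids)                                # back-calculated infections
```

With noisy data this inverse problem is ill-conditioned, which is why practical back-calculation adds smoothing (e.g., penalized likelihood) and, as in this thesis, auxiliary HIV-diagnosis and serologic-marker data.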
|
77 |
Some extensions in measurement error models. Tomaya, Lorena Yanet Cáceres, 14 December 2018
In this dissertation, we make three different contributions to measurement error models (MEMs). First, we carry out maximum penalized likelihood inference in MEMs under the normality assumption. The methodology is based on the method proposed by Firth (1993), which can be used to improve some asymptotic properties of maximum likelihood estimators. In the second contribution, we develop two new estimation methods based on generalized fiducial inference for the precision parameters and the variability product under the Grubbs model in the two-instrument case. One method is based on a fiducial generalized pivotal quantity and the other is built on the method of the generalized fiducial distribution. Comparisons with two existing approaches are reported. Finally, we study inference in a heteroscedastic MEM with known error variances. Instead of the normal distribution for the random components, we develop a model that assumes a skew-t distribution for the true covariate and a centered Student's t distribution for the error terms. The proposed model accommodates skewness and heavy-tailedness in the data, and the degrees of freedom of the two distributions can differ. We use the maximum likelihood method to estimate the model parameters and compute the estimates via an EM-type algorithm. All proposed methodologies are assessed numerically through simulation studies and illustrated with real data sets from the literature.
|
78 |
Diagnostic analysis in semiparametric normal models. Noda, Gleyce Rocha, 18 April 2013
In this master's dissertation we present diagnostic methods for semiparametric models under normal errors, in particular semiparametric models with one nonparametric explanatory variable, also known as partial linear models. We use cubic splines for the nonparametric fit, and penalized likelihood functions are applied to obtain maximum likelihood estimators with their respective approximate standard errors. The properties of the hat matrix are also derived for this kind of model, with the aim of using it as a tool for diagnostic analysis. Normal probability plots with simulated envelopes were also adapted to evaluate model suitability. Finally, two illustrative examples are presented in which the fits are compared with usual normal linear models, in the context of both the simple normal additive model and the partial linear model.
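The hat matrix of a penalized fit, central to the diagnostics above, has the ridge-type form H = B (B'B + lam*P)^{-1} B': its diagonal gives leverages and its trace the effective degrees of freedom. A generic numerical sketch follows, with a polynomial basis and identity penalty standing in for the cubic-spline basis and penalty used in the dissertation:

```python
import numpy as np

def smoother_hat(B, P, lam):
    # hat matrix of a penalized least-squares smoother
    return B @ np.linalg.solve(B.T @ B + lam * P, B.T)

x = np.linspace(0.0, 1.0, 50)
B = np.vander(x, 8, increasing=True)   # stand-in basis (8 columns)
P = np.eye(8)                          # stand-in penalty matrix
H = smoother_hat(B, P, 1.0)
leverages = np.diag(H)                 # pointwise influence of each observation
edf = np.trace(H)                      # effective degrees of freedom
```

As `lam` grows the trace shrinks, mirroring a smoother fit; leverages near 1 flag observations that largely determine their own fitted values, the usual diagnostic reading.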
|
79 |
Elliptical contoured semiparametric additive mixed models. Pulgar, Germán Mauricio Ibacache, 14 August 2009
In this work we extend the models proposed by Zhang et al. (1998) to a more general class of models, known as semiparametric additive mixed models with elliptical errors, in order to allow error distributions with heavier or lighter tails than the normal one. Penalized likelihood equations are applied to derive the maximum likelihood estimates, which appear to be robust against outlying observations in the sense of the Mahalanobis distance. In order to study the sensitivity of the penalized estimates under some usual perturbation schemes in the model or data, local influence curvatures are derived and some diagnostic graphics are proposed. Motivating examples preliminarily analyzed under normal errors are reanalyzed under appropriate elliptical errors, and the local influence approach is used to compare the sensitivity of the model estimates.
|
80 |
Testing new genetic and genomic approaches for trait mapping and prediction in wheat (Triticum aestivum) and rice (Oryza spp). Ladejobi, Olufunmilayo Olubukola, January 2018
Advances in molecular marker technologies have led to the development of high-throughput genotyping techniques such as Genotyping by Sequencing (GBS), driving the application of genomics in crop research and breeding. They have also supported novel mapping approaches, including Multi-parent Advanced Generation Inter-Cross (MAGIC) populations, which increase precision in identifying markers to inform plant breeding practices. In the first part of this thesis, a high-density physical map derived from GBS was used to identify QTLs controlling key agronomic traits of wheat in a genome-wide association study (GWAS) and to demonstrate the practicability of genomic selection for predicting the trait values. The results from GBS were compared to a previous study conducted on the same association mapping panel using a less dense physical map derived from diversity arrays technology (DArT) markers. GBS detected more QTLs than the DArT markers, although some QTLs were detected by the DArT markers alone. Prediction accuracies from the two marker platforms were mostly similar and largely dependent on trait genetic architecture. The second part of this thesis focused on MAGIC populations, which incorporate diversity and novel allelic combinations from several generations of recombination. Pedigrees representing a wild rice MAGIC population were used to model MAGIC populations by simulation, to assess the level of recombination and the creation of novel haplotypes. Wild rice species are an important reservoir of beneficial genes that have been variously introgressed into rice varieties using bi-parental population approaches. The level of recombination was found to depend strongly on the number of crosses made and on the resulting population size. Creating MAGIC populations requires adequate planning in order to make a sufficient number of crosses to capture optimal haplotype diversity.
The third part of the thesis considers models that have been proposed for genomic prediction. Ridge regression best linear unbiased prediction (RR-BLUP) is based on the assumption that all genotyped molecular markers contribute equally to the variation of a phenotype. Information from underlying candidate molecular markers is, however, of greater significance and can be used to improve prediction accuracy. Here, an existing Differentially Penalized Regression (DiPR) model, which modifies a standard RR-BLUP package to allow two or more marker sets from different platforms to be weighted independently, was used. The DiPR model performed better than single or combined marker sets for predicting most of the traits in both a MAGIC population and an association mapping panel. Overall, the work presented in this thesis shows that while these techniques hold great promise, they should be carefully evaluated before being introduced into breeding programmes.
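The DiPR idea, ridge shrinkage with a separate penalty per marker platform, reduces to a block-diagonal ridge solve. The sketch below is a toy version with simulated SNP matrices and hand-picked penalties, not the published DiPR package or its tuning procedure:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m1, m2 = 120, 300, 40
Z1 = rng.integers(0, 3, (n, m1)).astype(float)   # genome-wide SNPs coded 0/1/2
Z2 = rng.integers(0, 3, (n, m2)).astype(float)   # hypothetical candidate-marker set
u1 = rng.normal(0.0, 0.05, m1)
u2 = rng.normal(0.0, 0.5, m2)                    # candidate markers carry larger effects
y = Z1 @ u1 + Z2 @ u2 + rng.normal(0.0, 1.0, n)

def dipr(Z1, Z2, y, lam1, lam2):
    # differentially penalized ridge: each marker set gets its own shrinkage
    Z = np.hstack([Z1, Z2])
    pen = np.r_[np.full(Z1.shape[1], lam1), np.full(Z2.shape[1], lam2)]
    u = np.linalg.solve(Z.T @ Z + np.diag(pen), Z.T @ y)
    return Z @ u

# lighter penalty on the candidate set lets its larger effects through
yhat = dipr(Z1, Z2, y, lam1=50.0, lam2=5.0)
r = np.corrcoef(y, yhat)[0, 1]
```

Setting `lam1 == lam2` recovers ordinary RR-BLUP's equal-contribution assumption; in practice the two penalties would be chosen by cross-validation rather than by hand.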
|