51

Understanding Fixed Object Crashes with SHRP2 Naturalistic Driving Study Data

Hao, Haiyan 30 August 2018 (has links)
Fixed-object crashes have long been considered a major roadway safety concern. While previous studies tended to address such crashes in the context of roadway departures and relied heavily on police-reported accident data, this study integrated SHRP2 NDS and RID data for the analyses, which fully depict the scenarios before, during, and after a crash. A total of 1,639 crash and near-crash events and 1,050 baseline events were acquired. Three analysis methods (logistic regression, support vector machine (SVM), and artificial neural network (ANN)) were employed for two responses: crash occurrence and severity level. Logistic regression identified 16 and 10 significant variables at the 0.1 significance level, relevant to the driver, roadway, and environment, for the two responses respectively, and led to a series of findings regarding the effects of explanatory variables on fixed-object event occurrence and the associated severity level. SVM classifiers and ANN models were also constructed to predict these two responses, and sensitivity analyses were performed for the SVM classifiers to infer the contributing effects of the input variables. All three methods obtained satisfactory prediction performance, around 88% for fixed-object event occurrence and 75% for event severity level, which indicates the effectiveness of NDS event data for depicting crash scenarios and supporting roadway safety analyses. / Master of Science / Fixed-object crashes happen when a single vehicle strikes a roadway feature such as a curb or a median, or runs off the road and hits a roadside feature such as a tree or utility pole. They have long been considered a major highway safety concern due to their high frequency, fatality rate, and associated property cost. Previous studies of fixed-object crashes tended to address them in the context of roadway departures and relied heavily on police-reported accident data. However, many fixed-object crashes involve objects in the roadway, such as traffic control devices and roadway debris. Police-reported accident data are also weak in depicting the scenarios before and during crashes, and many minor crashes go unreported. The Second Strategic Highway Research Program (SHRP2) Naturalistic Driving Study (NDS) is the largest NDS project launched across the country to date, aimed at studying driver behavior- and performance-related safety problems under real-world conditions. The data acquisition systems (DASs) installed on participating vehicles continuously collect vehicle kinematics, roadway, traffic, environment, and driver behavior data, which enable researchers to examine crash scenarios closely. This study integrated SHRP2 NDS and Roadway Information Database (RID) data to conduct a comprehensive analysis of fixed-object crashes. A total of 1,639 crash and near-crash events involving fixed objects and animals, and 1,050 baseline events, were used. Three analysis methods (logistic regression, support vector machine (SVM), and artificial neural network (ANN)) were employed for two responses: crash occurrence and severity level. The logistic regression analyses identified 16 and 10 variables at the 0.1 significance level for the fixed-object event occurrence and severity level models, respectively; the influence of the explanatory variables is discussed in detail. SVM classifiers and ANN models were also constructed to predict fixed-object crash occurrence and severity level, and sensitivity analyses were performed for the SVM classifiers to infer the contributing effects of the input variables. All three methods achieved satisfactory prediction accuracies of around 88% for crash occurrence and 75% for crash severity level, which suggests the effectiveness of NDS event data for depicting crash scenarios and supporting roadway safety analyses.
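The comparison described above can be illustrated with a short sketch. The snippet below fits the same three model families (logistic regression, SVM, ANN) to a synthetic stand-in for the event table and reports holdout accuracy; the features, simulated data, and hyperparameters are hypothetical and are not taken from the SHRP2 NDS/RID data.

```python
# Sketch of the three-model comparison on synthetic stand-in data (not SHRP2 data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2689  # 1,639 crash/near-crash events + 1,050 baselines in the study
X = rng.normal(size=(n, 8))                    # stand-ins for driver/roadway/environment variables
logit = X @ rng.normal(size=8) - 0.5
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # 1 = fixed-object event, 0 = baseline

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm":      make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "ann":      make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes=(16,),
                                                              max_iter=2000, random_state=0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, model.predict(X_te)), 3))
```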
52

Bayesian analysis of multinomial regression with gamma utilities. / CUHK electronic theses & dissertations collection

January 2012 (has links)
In multinomial regression of racetrack betting, different distributions of utilities have been proposed: the exponential distribution, which is equivalent to Harville's model (Harville, 1973), the gamma distribution (Stern, 1990), and the normal distribution (Henery, 1981). Harville's model has the drawback that it ignores the increasing randomness of the competitions for the second and third place (Benter, 1994). Stern's model using gamma utilities with shape parameter greater than 1 and Henery's model using normal utilities have been shown to produce a better fit (Bacon-Shone, Lo and Busche, 1992; Lo and Bacon-Shone, 1994; Lo, 1994). In this thesis, we use the Bayesian methodology to provide predictions of the winning probabilities of horses from historical observed data. The gamma utility is adopted throughout the thesis. A convenient method of selecting Metropolis-Hastings proposal distributions for multinomial models is developed; a similar method was first exploited by Scott (2008). We augment the gamma distributed utilities in the likelihood as latent variables. The gamma utility is transformed to a variable that follows the generalized extreme value distribution described by Mihram (1975), through which we obtain a linear regression model. The least squares estimate of the parameters is easily obtained from this linear model, and its asymptotic sampling distribution is discussed. The Metropolis-Hastings proposal distribution is generated conditional on the variance of this estimator. Finally, samples from the posterior distribution of the regression parameters are obtained. The proposed method is tested through betting simulations using data from the Hong Kong horse racing market. / Xu, Wenjun. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 46-48). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese.
/ Chapter 1 --- Introduction --- p.1 / Chapter 2 --- Hong Kong Horse Racing Market and Models in Horse Racing --- p.4 / Chapter 2.1 --- Hong Kong Horse Racing Market --- p.4 / Chapter 2.2 --- Models in Horse Racing --- p.6 / Chapter 3 --- Metropolis-Hastings Algorithm in Multinomial Regression with Gamma Utilities --- p.10 / Chapter 3.1 --- Notations and Posterior Distribution --- p.10 / Chapter 3.2 --- Metropolis-Hastings Algorithm --- p.11 / Chapter 4 --- Application --- p.15 / Chapter 4.1 --- Variables --- p.16 / Chapter 4.2 --- Markov Chain Simulation --- p.17 / Chapter 4.3 --- Model Selection --- p.27 / Chapter 4.4 --- Estimation Result --- p.31 / Chapter 4.5 --- Betting Strategies and Comparisons --- p.33 / Chapter 5 --- Conclusion --- p.41 / Appendix A --- p.43 / Appendix B --- p.44 / Bibliography --- p.46
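As an illustration of the gamma-utility model discussed above, the following Monte Carlo sketch estimates win probabilities when each horse's utility is independently gamma distributed with a common shape parameter. The shape value, the horse strengths, and the convention that the largest simulated utility wins are assumptions of this sketch, not details taken from the thesis, which instead samples the posterior with a Metropolis-Hastings algorithm.

```python
# Monte Carlo sketch of win probabilities under gamma-distributed utilities
# (a Stern-type model). All parameter values here are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
r = 4.0                                    # assumed common gamma shape (> 1, per Stern)
strength = np.array([1.5, 1.2, 1.0, 0.8])  # hypothetical horse strengths (gamma means)
n_sims = 200_000

# utility[s, j] ~ Gamma(shape=r, scale=strength[j] / r), independent across horses
utilities = rng.gamma(shape=r, scale=strength / r, size=(n_sims, strength.size))
winners = utilities.argmax(axis=1)         # assumed convention: largest utility wins
win_prob = np.bincount(winners, minlength=strength.size) / n_sims
print(np.round(win_prob, 3))
```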
53

Predicting open-source software quality using statistical and machine learning techniques

Phadke, Amit Ashok, January 2004 (has links)
Thesis (M.S.) -- Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
54

A customer equity-based segmentation of service consumers: an application of multicriterion clusterwise regression for joint segmentation settings

Voorhees, Clay M. Cronin, J. Joseph. January 2006 (has links)
Thesis (Ph. D.)--Florida State University, 2006. / Advisor: J. Joseph Cronin Jr., Florida State University, College of Business, Dept. of Marketing. Title and description from dissertation home page (viewed Sept. 27, 2006). Document formatted into pages; contains xi, 209 pages. Includes bibliographical references.
55

Proposta para uso da corrente crítica no gerenciamento de múltiplos projetos / Proposal for use of the critical chain in multiple projects management

Cooper Ordonez, Robert Eduardo, 1973- 23 July 2013 (has links)
Advisor: Olívio Novaski / Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Engenharia Mecânica / Abstract: The present work proposes a model for using Critical Chain concepts in the management of multiple-project systems. To this end, applied qualitative research was carried out in a real environment. The variables studied were defined from a review of the scientific literature and a field study conducted before the model was applied; the model seeks, through a systemic view, to better manage the uncertainty present in the estimated durations of project activities. Action-research guidelines were used to verify how the model performs, and the collected data were subsequently analyzed with the statistical technique of binary logistic regression. This technique made it possible to determine the level of impact of the influence variables on the system response, as well as the relationships among those variables. The analyzed data suggest that the proposed model works properly and that the results could be transferred from the studied context to other contexts, thus contributing to the improvement of the Critical Chain method / Doctorate / Materials and Manufacturing Processes / Doctor of Mechanical Engineering
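As a small illustration of the binary logistic regression step mentioned in the abstract, the sketch below fits such a model to hypothetical influence variables with statsmodels and reports coefficients and odds ratios; the variables and data are placeholders, not the thesis's field data.

```python
# Minimal binary logistic regression sketch with hypothetical influence variables.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 120
X = rng.normal(size=(n, 3))                                       # stand-ins for influence variables
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))  # binary system response

result = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(result.summary())
print("odds ratios:", np.round(np.exp(result.params), 2))          # effect size per variable
```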
56

Bayesian logistic regression models for credit scoring

Webster, Gregg January 2011 (has links)
The Bayesian approach to logistic regression modelling for credit scoring is useful when there are data quantity issues. Data quantity issues might occur when a bank is opening in a new location or there is a change in the scoring procedure. Making use of prior information (available from the coefficients estimated on other data sets, or from expert knowledge about the coefficients), a Bayesian approach is proposed to improve credit scoring models. To achieve this, a data set is split into two sets, "old" data and "new" data. Priors are obtained from a model fitted on the "old" data. This model is assumed to be a scoring model used by a financial institution in its current location. The financial institution is then assumed to expand into a new economic location where there is limited data. The priors from the model on the "old" data are then combined in a Bayesian model with the "new" data to obtain a model which represents all the available information. The predictive performance of this Bayesian model is compared to that of a model which does not make use of any prior information. It is found that the use of relevant prior information improves predictive performance when the size of the "new" data is small. As the size of the "new" data increases, the importance of including prior information decreases.
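A minimal sketch of the idea: coefficients fitted on the plentiful "old" data become the centres of Gaussian priors, and the scarce "new" data are combined with those priors, here via a simple maximum a posteriori (MAP) fit. The MAP shortcut, the prior variance, and the simulated data are assumptions of this illustration rather than the thesis's exact Bayesian model.

```python
# "Old data as prior" sketch: Gaussian priors centred at old-data coefficients,
# combined with scarce new data by maximising the log-posterior (MAP estimate).
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
true_beta = np.array([1.0, -0.7, 0.4])

def simulate(n):
    X = rng.normal(size=(n, 3))
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))
    return X, y

X_old, y_old = simulate(5000)   # plentiful "old" data
X_new, y_new = simulate(60)     # scarce "new" data

# Prior centre: coefficients estimated on the old data (large C ~ unpenalised fit).
beta_old = LogisticRegression(C=1e6).fit(X_old, y_old).coef_.ravel()
tau2 = 0.5                      # assumed prior variance for each coefficient

def neg_log_posterior(beta):
    eta = X_new @ beta
    log_lik = np.sum(y_new * eta - np.log1p(np.exp(eta)))       # Bernoulli logit likelihood
    log_prior = -0.5 * np.sum((beta - beta_old) ** 2) / tau2    # Gaussian prior at beta_old
    return -(log_lik + log_prior)

beta_map = minimize(neg_log_posterior, x0=beta_old).x
print("old-data estimate:", np.round(beta_old, 2))
print("MAP with prior:   ", np.round(beta_map, 2))
```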
57

Symptom Cluster Analysis for Depression Treatment Outcomes and Growth Mixture Models for Analyzing the Association between Social Media Use Patterns and Anxiety Symptoms in Young Adults

Chen, Ying January 2024 (has links)
This dissertation research aims to develop systematic methods to analyze mental disorder and social media use data in young adults in a dynamic way. The first part of the dissertation is a comprehensive review of modeling methods for longitudinal data. The second part describes the methods we used to identify symptom clusters that characterize treatment trajectories and to predict antidepressant treatment response in patients with depression. Manhattan distance and bottom-up hierarchical clustering methods were used to identify the symptom clusters, and penalized logistic regressions were conducted to identify the top baseline predictors of treatment outcomes. The third part presents an application of the Tweedie distribution with generalized linear models, together with growth mixture models, for analyzing the association between social media use patterns and mental health status. The fourth part discusses future work and research directions.
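A brief sketch of two of the analysis steps named above (Manhattan-distance, bottom-up hierarchical clustering and penalized logistic regression) on synthetic data; the linkage method, penalty choice, and all variable names are assumptions for illustration only.

```python
# Bottom-up hierarchical clustering with Manhattan (cityblock) distance on a
# synthetic symptom-score matrix, plus an L1-penalised logistic regression screen.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
# rows: symptoms, columns: mean severity at successive visits (hypothetical)
symptom_trajectories = rng.normal(size=(17, 6))

Z = linkage(symptom_trajectories, method="complete", metric="cityblock")
labels = fcluster(Z, t=3, criterion="maxclust")     # cut the tree into 3 symptom clusters
print("cluster labels:", labels)

# Penalised logistic regression to screen baseline predictors of a binary outcome
# (L1 penalty as one common choice; the dissertation's exact penalty is not shown).
X = rng.normal(size=(200, 30))                      # hypothetical baseline predictors
y = rng.binomial(1, 0.4, size=200)                  # hypothetical remission indicator
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
print("selected predictors:", np.flatnonzero(lasso.coef_.ravel()))
```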
58

Detection of erroneous payments utilizing supervised and unsupervised data mining techniques

Yanik, Todd E. 09 1900 (has links)
Approved for public release; distribution is unlimited. / In this thesis we develop a procedure for detecting erroneous payments in the Defense Finance and Accounting Service, Internal Review's (DFAS IR) Knowledge Base Of Erroneous Payments (KBOEP), with the use of supervised (Logistic Regression) and unsupervised (Classification and Regression Trees (C & RT)) modeling algorithms. S-Plus software was used to construct a supervised model of the vendor payment data using Logistic Regression, along with the Hosmer-Lemeshow test for assessing the predictive ability of the model. The Clementine data mining software was used to construct both supervised and unsupervised models of the vendor payment data using the Logistic Regression and C & RT algorithms. The Logistic Regression algorithm in Clementine generated a model with predictive probabilities, which were compared against those of the C & RT algorithm. In addition to comparing the predictive probabilities, Receiver Operating Characteristic (ROC) curves were generated for both models to determine which model provided the best results in terms of a coincidence matrix's true positive, true negative, false positive, and false negative fractions. The best-performing modeling technique was C & RT, and the resulting model was given to DFAS IR to assist in reducing the manual record selection process currently being used. A recommended ruleset was provided, along with a detailed explanation of the algorithm selection process. / Lieutenant Commander, United States Navy
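The model comparison can be sketched as follows: a logistic regression and a CART-style classification tree are fitted to a synthetic stand-in for the vendor payment data and scored with a confusion ("coincidence") matrix and ROC/AUC. The features, data, and tree depth are hypothetical, and the snippet uses scikit-learn rather than S-Plus or Clementine.

```python
# Logistic regression vs. CART-style tree, scored with AUC and a confusion matrix,
# on synthetic stand-in data (not the DFAS vendor payment records).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(5)
n = 3000
X = rng.normal(size=(n, 6))                                         # stand-ins for payment attributes
y = rng.binomial(1, 1 / (1 + np.exp(-(1.2 * X[:, 0] - X[:, 1]))))   # 1 = erroneous payment

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for name, model in {
    "logistic": LogisticRegression(max_iter=1000),
    "cart":     DecisionTreeClassifier(max_depth=4, random_state=0),
}.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]
    print(name, "AUC:", round(roc_auc_score(y_te, prob), 3))
    print(confusion_matrix(y_te, model.predict(X_te)))              # [[TN FP] [FN TP]]
```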
59

High-dimensional classification and attribute-based forecasting

Lo, Shin-Lian 27 August 2010 (has links)
This thesis consists of two parts. The first part focuses on high-dimensional classification problems in microarray experiments. The second part deals with forecasting problems involving a large number of categories in the predictors. Classification problems in microarray experiments refer to discriminating subjects with different biologic phenotypes or known tumor subtypes, as well as to predicting the clinical outcomes or prognostic stages of subjects. One important characteristic of microarray data is that the number of genes is much larger than the sample size. The penalized logistic regression method is known for simultaneous variable selection and classification; however, its performance declines as the number of variables increases. With this concern, in the first study, we propose a new classification approach that employs the penalized logistic regression method iteratively with a controlled size of gene subsets to maintain variable selection consistency and classification accuracy. The second study is motivated by a modern microarray experiment that includes two layers of replicates. This new experimental setting renders most existing classification methods, including penalized logistic regression, inappropriate for direct application, because the assumption of independent observations is violated. To solve this problem, we propose a new classification method that incorporates random effects into penalized logistic regression so that the heterogeneity among different experimental subjects and the correlations from repeated measurements can be taken into account. An efficient hybrid algorithm is introduced to tackle the computational challenges in estimation and integration. Applications to a breast cancer study show that the proposed classification method obtains smaller models with higher prediction accuracy than a method based on the assumption of independent observations. The second part of this thesis develops a new forecasting approach for large-scale datasets associated with a large number of predictor categories and with structured predictors. The new approach goes beyond conventional tree-based methods by incorporating a general linear model and hierarchical splits to make trees more comprehensive, efficient, and interpretable. Through an empirical study in the air cargo industry and a simulation study containing several different settings, the new approach produces higher forecasting accuracy and higher computational efficiency than existing tree-based methods.
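One way to picture the iterative, controlled-subset idea from the first study is the simplified sketch below: genes are screened in fixed-size chunks with L1-penalized logistic regression, the selected genes are pooled, and a final penalized model is fitted on the pooled set. The chunk size, penalty strength, and screening scheme are assumptions of this illustration, not the thesis's exact algorithm.

```python
# Simplified sketch: iterate penalised logistic regression over controlled gene subsets,
# pool the selected genes, then refit on the pooled set. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n, p = 100, 2000                           # samples << genes, as in microarray data
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 1] + X[:, 2]))))

chunk_size, kept = 200, []
for start in range(0, p, chunk_size):      # controlled subset size per fit
    idx = np.arange(start, min(start + chunk_size, p))
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X[:, idx], y)
    kept.extend(idx[np.flatnonzero(fit.coef_.ravel())])

final = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X[:, kept], y)
print("genes retained after screening:", len(kept),
      "-> nonzero in final model:", np.count_nonzero(final.coef_))
```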
60

A computational approach to discovering p53 binding sites in the human genome

Lim, Ji-Hyun January 2013 (has links)
The tumour suppressor p53 protein plays a central role in the DNA damage response/checkpoint pathways leading to DNA repair, cell cycle arrest, apoptosis and senescence. The activation of p53-mediated pathways is primarily facilitated by the binding of tetrameric p53 to two 'half-sites', each consisting of a decameric p53 response element (RE). Functional REs are directly adjacent or separated by a small number of 1-13 'spacer' base pairs (bp). The p53 RE is detected by exact or inexact matches to the palindromic sequence represented by the regular expression [AG][AG][AG]C[AT][TA]G[TC][TC][TC] or a position weight matrix (PWM). The use of matrix-based and regular expression pattern-matching techniques, however, leads to an overwhelming number of false positives. A more specific model, which combines multiple factors known to influence p53-dependent transcription, is required for accurate detection of the binding sites. In this thesis, we present a logistic regression based model which integrates sequence information and epigenetic information to predict human p53 binding sites. Sequence information includes the PWM score and the spacer length between the two half-sites of the observed binding site. To integrate epigenetic information, we analyzed the surrounding region of the binding site for the presence of mono- and trimethylation patterns of histone H3 lysine 4 (H3K4). Our model showed a high level of performance on both a high-resolution data set of functional p53 binding sites from the experimental literature (ChIP data) and the whole human genome. Comparing our model with a simpler sequence-only model, we demonstrated that the prediction accuracy of the sequence-only model could be improved by incorporating epigenetic information, such as the two histone modification marks H3K4me1 and H3K4me3.
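The sequence side of such a model can be sketched as follows: scan a DNA string for two half-sites matching the stated consensus [AG][AG][AG]C[AT][TA]G[TC][TC][TC] separated by 0-13 spacer bases, and score each half-site with a position weight matrix. The PWM below is built from the consensus itself as a neutral placeholder; the thesis's empirically derived matrix and its epigenetic (H3K4me1/H3K4me3) features are not reproduced.

```python
# Scan for a full p53 site (two half-sites, 0-13 bp spacer) and score half-sites
# with a placeholder PWM derived from the consensus pattern itself.
import re
import numpy as np

HALF_SITE = r"[AG][AG][AG]C[AT][TA]G[TC][TC][TC]"
FULL_SITE = re.compile(rf"({HALF_SITE})([ACGT]{{0,13}})({HALF_SITE})")

BASE = {"A": 0, "C": 1, "G": 2, "T": 3}
consensus = ["AG", "AG", "AG", "C", "AT", "TA", "G", "TC", "TC", "TC"]

# Placeholder PWM: allowed bases share most of the probability at each position,
# disallowed bases get a small pseudocount; scores are log-odds against background 0.25.
pwm = np.full((4, 10), 0.02)
for i, allowed in enumerate(consensus):
    for b in allowed:
        pwm[BASE[b], i] = (1 - 0.02 * (4 - len(allowed))) / len(allowed)
pwm = np.log(pwm / 0.25)

def pwm_score(site):
    return sum(pwm[BASE[b], i] for i, b in enumerate(site))

sequence = "TTAGGGCATGTCTAGGCATGCCCAAA"   # toy sequence containing two adjacent half-sites
for m in FULL_SITE.finditer(sequence):
    left, spacer, right = m.group(1), m.group(2), m.group(3)
    print(m.start(), left, f"spacer={len(spacer)}", right,
          "score =", round(pwm_score(left) + pwm_score(right), 2))
```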
