51

A SPATIO-TEMPORAL MODEL FOR AVERAGE SPEED PREDICTION ON ROADS

PEDRO HENRIQUE FONSECA DA SILVA DINIZ 06 June 2016
Many factors may influence a vehicle's speed on a road, but two of them are observed daily by drivers: its location and the time of day. Obtaining a model that returns the average speed as a function of position and time is still a challenging task. Such models apply in many scenarios: estimated time of arrival, shortest-path routing, traffic prediction, and accident detection, to cite a few. This study proposes a prediction model based on a spatio-temporal partition and mean/instantaneous speeds collected from historical GPS data. The main advantage of the proposed model is that it is very simple to compute. Moreover, experimental results obtained from fuel delivery trucks, over the whole of 2013 in Brazil, indicate that most observations can be predicted with this model within an acceptable error tolerance.
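The heart of such a model can be sketched as a lookup table of historical mean speeds keyed by road segment and time-of-day bin. The sketch below is a minimal illustration of that idea; the 15-minute bin width, the segment identifiers, and the class and method names are assumptions for illustration, not the thesis's actual partition.

```python
from collections import defaultdict

class SpatioTemporalSpeedModel:
    """Predict average speed by averaging historical GPS speeds
    within each (road segment, time-of-day bin) cell."""

    def __init__(self, bin_minutes=15):
        self.bin_minutes = bin_minutes
        self._sums = defaultdict(float)   # cell -> sum of observed speeds
        self._counts = defaultdict(int)   # cell -> number of observations

    def _cell(self, segment_id, minute_of_day):
        return (segment_id, minute_of_day // self.bin_minutes)

    def fit(self, records):
        """records: iterable of (segment_id, minute_of_day, speed_kmh)."""
        for segment_id, minute_of_day, speed in records:
            cell = self._cell(segment_id, minute_of_day)
            self._sums[cell] += speed
            self._counts[cell] += 1

    def predict(self, segment_id, minute_of_day, default=None):
        """Mean historical speed for the cell, or `default` if unseen."""
        cell = self._cell(segment_id, minute_of_day)
        if self._counts[cell] == 0:
            return default
        return self._sums[cell] / self._counts[cell]

model = SpatioTemporalSpeedModel(bin_minutes=15)
model.fit([("BR-116:km410", 8 * 60 + 5, 62.0),
           ("BR-116:km410", 8 * 60 + 12, 58.0)])
print(model.predict("BR-116:km410", 8 * 60 + 7))  # 60.0
```

Simplicity is the point the abstract emphasizes: prediction is a constant-time table lookup, and fitting is a single pass over the historical records.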
52

Alternative Methods via Random Forest to Identify Interactions in a General Framework and Variable Importance in the Context of Value-Added Models

January 2013
This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students' test scores as outcome variables and teachers' contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAM teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that do not depend on a given model, introducing two variable importance measures (VIMs): the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constitutive trees of a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained from three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model is misspecified. The second study develops two novel interaction measures. These measures could be used within, but are not restricted to, the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear. Both measures are unique in their construction; they take into account not only the outcome values but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions. / Dissertation/Thesis / Ph.D. Statistics 2013
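The precise definitions of the node-proportion and covariate-proportion VIMs are developed in the thesis itself; the sketch below illustrates a measure in the same spirit, under the assumption that what matters is how often a covariate appears on the root-to-leaf path of a forest's terminal nodes. The helper name and simulated data are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def terminal_node_proportion(forest, n_features):
    """For each feature, the fraction of terminal nodes (across all trees)
    whose root-to-leaf path contains at least one split on that feature."""
    hits = np.zeros(n_features)
    total_leaves = 0
    for est in forest.estimators_:
        tree = est.tree_
        # depth-first walk carrying the set of features used on the path
        stack = [(0, frozenset())]
        while stack:
            node, used = stack.pop()
            if tree.children_left[node] == -1:  # terminal node
                total_leaves += 1
                for f in used:
                    hits[f] += 1
            else:
                used = used | {tree.feature[node]}
                stack.append((tree.children_left[node], used))
                stack.append((tree.children_right[node], used))
    return hits / total_leaves

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print(terminal_node_proportion(rf, X.shape[1]))  # features 0 and 1 dominate
```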
53

Data-driven identification of endophenotypes of Alzheimer’s disease progression: implications for clinical trials and therapeutic interventions

Geifman, Nophar, Kennedy, Richard E., Schneider, Lon S., Buchan, Iain, Brinton, Roberta Diaz 15 January 2018
Background: Given the complex and progressive nature of Alzheimer's disease (AD), a precision medicine approach to diagnosis and treatment requires identifying patient subgroups with biomedically distinct and actionable phenotype definitions. Methods: Longitudinal patient-level data for 1160 AD patients receiving placebo or no treatment, with follow-up of up to 18 months, were extracted from an integrated clinical trials dataset. We used latent class mixed modelling (LCMM) to identify patient subgroups demonstrating distinct patterns of change over time in disease severity, as measured by the Alzheimer's Disease Assessment Scale-cognitive subscale score. The optimal number of subgroups (classes) was selected as the model with the lowest Bayesian Information Criterion (BIC). Other patient-level variables were used to define the subgroups' distinguishing characteristics and to investigate interactions between patient characteristics and patterns of disease progression. Results: The LCMM yielded three distinct subgroups of patients, with 10.3% in Class 1, 76.5% in Class 2, and 13.2% in Class 3. While all classes demonstrated some degree of cognitive decline, each demonstrated a different pattern of change in cognitive scores, potentially reflecting different subtypes of AD patients. Class 1 represents rapid decliners, with a steep decline in cognition over time, who tended to be younger and better educated. Class 2 represents slow decliners, while Class 3 represents severely impaired slow decliners: patients with a rate of decline similar to Class 2 but with worse baseline cognitive scores. Class 2 had a significantly higher proportion of patients with a history of statin use; Class 3 showed lower levels of blood monocytes and serum calcium, and higher blood glucose levels. Conclusions: Our results, 'learned' from clinical data, indicate the existence of at least three subgroups of Alzheimer's patients, each demonstrating a different trajectory of disease progression. This hypothesis-generating approach has detected distinct AD subgroups that may prove to be discrete endophenotypes linked to specific aetiologies. These findings could enable stratification within a clinical trial or study context, which may help identify new targets for intervention and guide better care.
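The study fitted latent class mixed models (for which R's lcmm package is a common tool). As a minimal stand-in illustrating the BIC-driven choice of the number of classes, the sketch below summarizes each simulated patient's cognitive trajectory by a least-squares intercept and slope, then fits Gaussian mixtures with increasing numbers of components; all data and parameter choices are illustrative assumptions, not the paper's model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
n_patients = 300
visits = np.array([0, 6, 12, 18])  # months of follow-up
true_slopes = rng.choice([0.2, 0.8, 0.3], size=n_patients)
scores = 20 + true_slopes[:, None] * visits + rng.normal(0, 1, (n_patients, 4))

# per-patient least-squares intercept and slope summarize each trajectory
A = np.column_stack([np.ones_like(visits), visits])
coefs = np.linalg.lstsq(A, scores.T, rcond=None)[0].T  # (n_patients, 2)

# fit mixtures with 1..5 latent classes and keep the lowest BIC
bics = {}
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(coefs)
    bics[k] = gm.bic(coefs)
best_k = min(bics, key=bics.get)
print(bics, "-> selected number of classes:", best_k)
```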
54

Financial performance of the portfolio in the evaluation of credit analysis and concession models: an approach based on statistical learning

Rodrigo Alves Silva 05 September 2014
Credit analysis and concession models relate a borrower's profile to the probability of default on contracted obligations, identifying the risk associated with the borrower and helping the firm decide whether to approve or deny a credit request. This research field has gained importance both nationally, with the intensification of credit activity in Brazil and the large share of public banks in that process, and internationally, with growing concern about the potential damage to the economy from default events. This context has driven the construction and adaptation of several models and methods for credit risk analysis of both consumers and companies. These models are typically tested and compared on prediction accuracy or other statistical optimization metrics, a procedure that may not be efficient from a financial standpoint and that complicates the firm's interpretation and decision about which model is best, creating a gap between the model-selection decision and the firm's financial objectives. Given that financial performance is a foremost indicator of any management procedure, this study aimed to fill this gap by analyzing the financial performance of loan portfolios formed by statistical learning techniques currently used for credit risk classification and analysis in national and international research. The selected techniques (discriminant analysis, logistic regression, Naïve Bayes, kdB-1, and kdB-2 Bayesian networks, SVC, and SVM) were applied to the German Credit Data Set, and the results were first analyzed and compared in terms of accuracy and misclassification costs. The study then proposes four financial metrics (RFC, PLR, RAROC, and IS) and finds that the results produced by each technique vary across them. These variations change the efficiency ranking, and hence the preferred technique, demonstrating the importance of these metrics for analyzing and selecting optimal classification models.
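The definitions of RFC, PLR, RAROC, and IS are specific to the thesis and are not reproduced here; the sketch below only illustrates the underlying point that the classifier chosen by accuracy need not be the one that maximizes portfolio value. The loan amounts, interest rate, loss-given-default, and the two synthetic scoring models are all illustrative assumptions.

```python
import numpy as np

def accuracy(y_true, approve):
    # "correct" = approve good borrowers (0) and deny defaulters (1)
    return np.mean(approve == (y_true == 0))

def portfolio_profit(y_true, approve, amounts, rate=0.12, lgd=0.7):
    """Profit of the approved portfolio: interest earned on good loans
    minus loss-given-default on approved defaulters (illustrative values)."""
    good = approve & (y_true == 0)
    bad = approve & (y_true == 1)
    return rate * amounts[good].sum() - lgd * amounts[bad].sum()

rng = np.random.default_rng(7)
y = rng.binomial(1, 0.3, size=1000)              # 1 = default
amounts = rng.uniform(1_000, 20_000, size=1000)  # loan sizes
score_a = 0.6 * y + rng.normal(0, 0.35, 1000)    # two hypothetical models
score_b = 0.5 * y + rng.normal(0, 0.30, 1000)

for name, score in [("A", score_a), ("B", score_b)]:
    approve = score < 0.5  # approve applicants scored as low-risk
    print(name, f"acc={accuracy(y, approve):.3f}",
          f"profit={portfolio_profit(y, approve, amounts):,.0f}")
```

Because profit weighs each decision by loan size and asymmetric costs, the model ranking it induces can differ from the ranking by accuracy, which is the gap the thesis's financial metrics address.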
55

New Statistical Methods of Single-subject Transcriptome Analysis for Precision Medicine

Li, Qike January 2017
Precision medicine provides targeted treatment for an individual patient based on disease mechanisms, improving health care. Matched transcriptomes derived from a single subject make it possible to uncover patient-specific dynamic changes associated with disease status. Yet conventional statistical methodologies remain largely unavailable for single-subject transcriptome analysis due to the "single-observation" challenge. We hypothesize that, with statistical learning approaches and large-scale inference, one can learn useful information from single-subject transcriptome data by identifying differentially expressed genes (DEGs) and pathways (DEPs) between two transcriptomes of an individual. This dissertation is an ensemble of my research work in single-subject transcriptome analytics, comprising three projects with varying focuses. The first project describes a two-step approach to identify DEPs, employing a parametric Gaussian mixture model followed by Fisher's exact tests. The second project relaxes the parametric assumption and develops a nonparametric algorithm based on k-means, which is more flexible and robust. The third project proposes a novel variance-stabilizing framework to transform raw gene counts before identifying DEGs; the transformation strategically bypasses the challenge of variance estimation in single-subject transcriptome analysis. In this dissertation, I present the main statistical methods and computational algorithms for all three projects, as well as their real-data applications to personalized treatments.
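A minimal sketch of the first project's two-step idea as described above: fit a two-component Gaussian mixture to classify genes as differentially expressed, then apply Fisher's exact test for pathway enrichment. Using absolute log fold-changes as the mixture input and the simulated pathway membership are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import fisher_exact
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
n_genes = 2000
logfc = rng.normal(0, 0.3, n_genes)          # baseline: unchanged genes
de_idx = rng.choice(n_genes, 150, replace=False)
logfc[de_idx] += rng.choice([-2, 2], 150)    # truly changed genes

# Step 1: two-component mixture on |log fold-change|; the component with
# the larger mean is taken as the "differentially expressed" class.
gm = GaussianMixture(n_components=2, random_state=0).fit(np.abs(logfc)[:, None])
de_comp = int(np.argmax(gm.means_.ravel()))
is_deg = gm.predict(np.abs(logfc)[:, None]) == de_comp

# Step 2: Fisher's exact test for enrichment of DEGs in one pathway
# (pathway membership is simulated here for illustration).
pathway = np.zeros(n_genes, dtype=bool)
pathway[rng.choice(de_idx, 40, replace=False)] = True   # an enriched pathway
table = [[np.sum(pathway & is_deg), np.sum(pathway & ~is_deg)],
         [np.sum(~pathway & is_deg), np.sum(~pathway & ~is_deg)]]
odds, p = fisher_exact(table, alternative="greater")
print(f"pathway enrichment p-value: {p:.2e}")
```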
56

The limits of sensory processing during sleep: how far can we interact with memory contents?

Farthouat, Juliane 16 December 2016
Over the last decade, new tools have been developed to boost the memory consolidation processes that take place during sleep. Based on recent knowledge about residual sensory processing in sleeping adults, and on theoretical accounts holding that memory consolidation during sleep occurs via the reactivation of memory content, new paradigms have been proposed to bias this spontaneous reactivation using sensory stimulation. However, the limits and necessary conditions for successful memory reactivation remain unclear. Also, while boosting consolidation processes has been widely studied, it remains to be tested whether it is possible, on the contrary, to interfere with memory content using stimulation. Finally, recent studies suggest not only that auditory stimulation during sleep can boost memory, but also that new associations can be created between stimuli and breathing responses; whether more complex associations between stimuli can be created while sleeping remains an open question. In this doctoral dissertation, I present the three main studies of my thesis, in which we probed the limits of auditory processing during sleep and how it can interact with memory content. In the first study, we attempted to interfere with the consolidation of learned word pairs by presenting interfering word pairs during sleep, and we studied the neural oscillations associated with the reactivation of memory content. In the second study, we developed a new paradigm using auditory frequency-tagged (steady-state) responses and magnetoencephalography to track the segmentation processes that take place when listening to an auditory statistical stream and to visualize their temporal evolution. In the third and last study, we used steady-state response analyses to test whether statistical learning of such regularities can be achieved during sleep. / Doctorat en Sciences psychologiques et de l'éducation
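Frequency tagging rests on a simple signal property: if listeners segment a statistical stream into larger units (e.g., tri-syllabic words), a spectral peak emerges at the unit rate in addition to the syllable rate. The sketch below computes a signal-to-noise ratio at candidate tagging frequencies from a single simulated channel; the 4 Hz syllable rate, recording parameters, and function name are illustrative assumptions, not the studies' actual analysis pipeline.

```python
import numpy as np

def tag_snr(signal, fs, f_target, n_neighbors=10):
    """SNR at a tagging frequency: spectral amplitude at f_target
    divided by the mean amplitude of the neighboring frequency bins."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    i = np.argmin(np.abs(freqs - f_target))
    neighbors = np.r_[spectrum[i - n_neighbors:i],
                      spectrum[i + 1:i + 1 + n_neighbors]]
    return spectrum[i] / neighbors.mean()

# Illustrative stream: syllables at 4 Hz grouped into tri-syllabic
# "words", so successful segmentation adds a response at 4/3 Hz.
fs, dur = 250, 120
t = np.arange(fs * dur) / fs
signal = (np.sin(2 * np.pi * 4 * t)              # syllable-rate response
          + 0.3 * np.sin(2 * np.pi * 4 / 3 * t)  # word-rate response
          + np.random.default_rng(0).normal(0, 1, len(t)))
print("syllable-rate SNR:", round(tag_snr(signal, fs, 4.0), 2))
print("word-rate SNR:    ", round(tag_snr(signal, fs, 4 / 3), 2))
```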
57

Partition clustering of High Dimensional Low Sample Size data based on P-Values

Von Borries, George Freitas January 1900
Doctor of Philosophy / Department of Statistics / Haiyan Wang / This thesis introduces a new partitioning algorithm to cluster variables in high dimensional low sample size (HDLSS) data and high dimensional longitudinal low sample size (HDLLSS) data. HDLSS data contain a large number of variables with a small number of replications per variable, and HDLLSS data refer to HDLSS data observed over time. Clustering plays an important role in analyzing high dimensional low sample size data, as seen commonly in microarray experiments, mass spectrometry data, and pattern recognition. Most current clustering algorithms for HDLSS and HDLLSS data are adaptations from traditional multivariate analysis, where the number of variables is not high and sample sizes are relatively large. Current algorithms show poor performance when applied to high dimensional data, especially with small sample sizes, and available algorithms often exhibit poor clustering accuracy and stability for non-normal data. Simulations show that traditional clustering algorithms used on high dimensional data are not robust to monotone transformations. The proposed clustering algorithm PPCLUST is a powerful tool for clustering HDLSS data; it uses p-values from nonparametric rank tests of homogeneous distribution as a measure of similarity between groups of variables. Inheriting the robustness of rank procedures, the new algorithm is robust to outliers and invariant to monotone transformations of the data. PPCLUSTEL is an extension of PPCLUST for clustering HDLLSS data; a nonparametric test of no simple effect of group is developed, and the p-value from the test is used as a measure of similarity between groups of variables. PPCLUST and PPCLUSTEL are able to cluster a large number of variables in the presence of very few replications, and PPCLUSTEL requires neither a large number of time points nor equally spaced ones. PPCLUST and PPCLUSTEL do not suffer from loss of power due to distributional assumptions, general multiple comparison problems, or difficulty in controlling heteroscedastic variances. Applications to data from previous microarray studies show promising results, and simulation studies reveal that the algorithms outperform a series of benchmark algorithms applied to HDLSS data, exhibiting high clustering accuracy and stability.
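PPCLUST's exact partitioning rules are developed in the thesis; the sketch below illustrates only its core ingredient, using the p-value of a Kruskal-Wallis rank test of homogeneous distribution as the similarity between groups of variables and merging agglomeratively while some pair still looks homogeneous. The merging scheme and threshold are assumptions for illustration.

```python
import numpy as np
from itertools import combinations
from scipy.stats import kruskal

def p_value_cluster(X, alpha=0.05):
    """Agglomerative clustering of variables (rows of X): similarity
    between two groups is the Kruskal-Wallis p-value of homogeneity of
    their pooled replications; merge while the best p-value exceeds alpha."""
    clusters = [[i] for i in range(X.shape[0])]
    while len(clusters) > 1:
        best_p, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            samples = [X[i] for i in clusters[a]] + [X[i] for i in clusters[b]]
            p = kruskal(*samples).pvalue
            if p > best_p:
                best_p, best_pair = p, (a, b)
        if best_p < alpha:      # no two groups look homogeneous -> stop
            break
        a, b = best_pair
        clusters[a] += clusters[b]
        del clusters[b]         # b > a, so index a is unaffected
    return clusters

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (4, 6)),    # 4 variables, 6 replications each
               rng.normal(3, 1, (4, 6))])   # a second, shifted group
print(p_value_cluster(X))  # expected: variables {0..3} and {4..7} grouped
```

Because ranks are unchanged by monotone transformations, a p-value similarity inherits exactly the invariance properties the abstract claims for PPCLUST.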
58

Identifying illicit graphic in the online community using the neural network framework

Vega Ezpeleta, Emilio January 2017
In this paper, two convolutional neural networks are estimated to classify whether or not an image contains a swastika. The images are gathered from the gaming platform Steam and by scraping a web search engine. The architecture of the networks is kept moderate, and the difference between the models is the final layer: the first model uses an average-type operation, while the second uses a conventional fully-connected layer. The results show that the two models perform similarly, with test error in the 6-9% range.
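The thesis's exact architectures are not given here, so the sketch below only contrasts the two kinds of final layer it describes, assuming the "average type operation" is global average pooling; the layer sizes and input resolution are illustrative.

```python
import torch
import torch.nn as nn

def conv_trunk():
    # a small, illustrative convolutional feature extractor
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    )

# Model 1: global average pooling head (an "average type operation")
gap_model = nn.Sequential(
    conv_trunk(),
    nn.Conv2d(32, 2, 1),            # map features to 2 class channels
    nn.AdaptiveAvgPool2d(1),        # average each channel over space
    nn.Flatten(),                   # -> (batch, 2) logits
)

# Model 2: conventional fully-connected head
fc_model = nn.Sequential(
    conv_trunk(),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
    nn.Linear(64, 2),               # -> (batch, 2) logits
)

x = torch.randn(4, 3, 64, 64)       # a batch of 64x64 RGB images
print(gap_model(x).shape, fc_model(x).shape)  # both: torch.Size([4, 2])
```

The pooling head has far fewer parameters than the fully-connected one, which is one plausible reason the two "moderate" networks could reach similar test error.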
59

Efficient Algorithms for Learning Combinatorial Structures from Limited Data

Asish Ghoshal 15 May 2019
Recovering combinatorial structures from noisy observations is a recurrent problem in many application domains, including, but not limited to, natural language processing, computer vision, genetics, health care, and automation. For instance, dependency parsing in natural language processing entails recovering parse trees from sentences, which are inherently ambiguous. From a computational standpoint, such problems are typically intractable and call for designing efficient approximation or randomized algorithms with provable guarantees. From a statistical standpoint, algorithms that recover the desired structure using an optimal number of samples are of paramount importance.

We tackle several such problems in this thesis and obtain computationally and statistically efficient procedures. We demonstrate optimality of our methods by proving fundamental lower bounds on the number of samples needed by any method for recovering the desired structures. Specifically, the thesis makes the following contributions:

(i) We develop polynomial-time algorithms for learning linear structural equation models, a widely used class of models for performing causal inference, that recover the correct directed acyclic graph structure under identifiability conditions weaker than existing ones. We also show that the sample complexity of our method is information-theoretically optimal.

(ii) We develop polynomial-time algorithms for learning the underlying graphical game from observations of the behavior of self-interested agents. The key combinatorial problem here is to recover the Nash equilibria set of the true game from behavioral data. We obtain fundamental lower bounds on the number of samples required for learning games and show that our method is statistically optimal.

(iii) Lastly, departing from the generative model framework, we consider the problem of structured prediction, where the goal is to learn predictors that map a given input directly to a complex structured object. We develop efficient learning algorithms that learn structured predictors by approximating the partition function, and we obtain generalization guarantees for our method. We demonstrate that randomization can improve not only efficiency but also generalization to unseen data.
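The thesis's own algorithms are not reproduced here; as a flavor of contribution (i), the sketch below implements a related, well-known procedure for linear Gaussian SEMs with equal noise variances, recovering the causal order by repeatedly selecting the variable with the smallest conditional variance given the variables ordered so far (in the spirit of Chen, Drton & Wang). This is a simplified stand-in, not the method developed in the thesis.

```python
import numpy as np

def equal_variance_order(X):
    """Causal order for a linear Gaussian SEM with equal noise variances:
    greedily pick the variable with the smallest conditional variance
    given the variables already ordered."""
    n, d = X.shape
    cov = np.cov(X, rowvar=False)
    order, remaining = [], list(range(d))
    while remaining:
        best, best_var = None, np.inf
        for j in remaining:
            if order:
                S = np.ix_(order, order)
                c = cov[np.ix_([j], order)]
                v = cov[j, j] - (c @ np.linalg.solve(cov[S], c.T)).item()
            else:
                v = cov[j, j]       # a source has minimal marginal variance
            if v < best_var:
                best, best_var = j, v
        order.append(best)
        remaining.remove(best)
    return order

rng = np.random.default_rng(4)
n = 5000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = -0.6 * x1 + rng.normal(size=n)
X = np.column_stack([x2, x0, x1])   # shuffle columns: true order is 1, 2, 0
print(equal_variance_order(X))      # expected: [1, 2, 0]
```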
60

Object-based suppression in auditory selective attention: The influence of statistical learning

Daly, Heather R. January 2019
No description available.
