61

Performance financeira da carteira na avaliação de modelos de análise e concessão de crédito: uma abordagem baseada em aprendizagem estatística / Financial performance portfolio to evaluate and select analyses and credit models: An approach based on Statistical Learning

Rodrigo Alves Silva 05 September 2014
Credit analysis and granting models seek to associate a borrower's profile with the probability of default on contracted obligations, thereby identifying the risk associated with the borrower and helping the firm decide whether to approve or deny the credit application. This field of research has gained importance both in Brazil - where credit activity has intensified, with strong participation of public banks - and internationally, with growing concern about the potential economic damage of default events. Against this background, many models and methods have been built or adapted for credit risk analysis of both consumers and companies. These models are typically tested and compared on prediction accuracy or other statistical optimization metrics, a procedure that may not be efficient from a financial standpoint and that makes it harder for the firm to interpret the results and decide which model is best, creating a gap between the choice of model and the firm's financial objectives. Given that financial performance is one of the main indicators of any managerial procedure, this study aimed to fill this gap by analyzing the financial performance of credit portfolios formed by statistical learning techniques currently used for credit risk classification and analysis in Brazilian and international research. The selected techniques (discriminant analysis, logistic regression, Naïve Bayes, kdB-1 and kdB-2 Bayesian networks, SVC and SVM) were applied to the German Credit Data Set, and the results were first analyzed and compared in terms of accuracy and misclassification costs. The study additionally proposed four financial metrics (RFC, PLR, RAROC and IS) and found that the results, and hence the efficiency ranking of the techniques, vary across these metrics, demonstrating the importance of considering such metrics when analyzing and selecting optimal classification models.
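The comparison pipeline this abstract describes can be sketched in a few lines. The following is a minimal illustration, assuming the OpenML copy of the German Credit Data Set (dataset id "credit-g") and the Statlog convention that accepting a bad borrower costs five times as much as rejecting a good one; it covers only two of the seven techniques, and the thesis's financial metrics (RFC, PLR, RAROC, IS) are not reproduced here.

```python
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Statlog German Credit data: 1000 applicants labeled 'good'/'bad'.
data = fetch_openml("credit-g", version=1, as_frame=True)
X = pd.get_dummies(data.data)              # one-hot encode the categoricals
y = (data.target == "bad").astype(int)     # 1 = bad risk (default)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Statlog convention: accepting a bad borrower costs 5x rejecting a good one.
COST_FN, COST_FP = 5, 1

models = [("logistic regression",
           make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
          ("SVM (RBF kernel)",
           make_pipeline(StandardScaler(), SVC()))]
for name, clf in models:
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    accuracy = (pred == y_te).mean()
    cost = (COST_FN * ((pred == 0) & (y_te == 1)).sum()
            + COST_FP * ((pred == 1) & (y_te == 0)).sum())
    print(f"{name}: accuracy={accuracy:.3f}, misclassification cost={cost}")
```

Ranking the same models by accuracy and by total cost can already produce different winners, which is the gap between statistical and financial evaluation that the thesis targets.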
62

New Statistical Methods of Single-subject Transcriptome Analysis for Precision Medicine

Li, Qike January 2017
Precision medicine provides targeted treatment for an individual patient based on disease mechanisms, advancing health care. Matched transcriptomes derived from a single subject make it possible to uncover patient-specific dynamic changes associated with disease status. Yet conventional statistical methodologies remain largely unavailable for single-subject transcriptome analysis due to the "single-observation" challenge. We hypothesize that, with statistical learning approaches and large-scale inference, one can learn useful information from single-subject transcriptome data by identifying differentially expressed genes (DEGs) and pathways (DEPs) between the two transcriptomes of an individual. This dissertation is an ensemble of my research work in single-subject transcriptome analytics, comprising three projects with different focuses. The first project describes a two-step approach to identify DEPs, employing a parametric Gaussian mixture model followed by Fisher's exact tests. The second project relaxes the parametric assumption and develops a nonparametric algorithm based on k-means, which is more flexible and robust. The third project proposes a novel variance-stabilizing framework that transforms raw gene counts before identifying DEGs; the transformation strategically bypasses the challenge of variance estimation in single-subject transcriptome analysis. In this dissertation, I present the main statistical methods and computational algorithms for all three projects, together with their real-data applications to personalized treatment.
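As a rough illustration of the first project's two-step idea, the sketch below fits a two-component Gaussian mixture to simulated per-gene log fold-changes and then tests one pathway for DEG enrichment with Fisher's exact test. The data, thresholds, and variable names are invented for illustration; the dissertation's actual model is more elaborate.

```python
import numpy as np
from scipy.stats import fisher_exact
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
log_fc = rng.normal(0, 0.3, 5000)            # most genes: no real change
log_fc[:300] += rng.normal(2.0, 0.5, 300)    # a block of altered genes

# Step 1: two-component mixture over |log fold-change| separates a "null"
# component from a "changed" component; genes in the latter are flagged.
gmm = GaussianMixture(n_components=2, random_state=0)
labels = gmm.fit_predict(np.abs(log_fc).reshape(-1, 1))
changed = labels == np.argmax(gmm.means_.ravel())  # component w/ larger mean
print(f"flagged {changed.sum()} genes as differentially expressed")

# Step 2: Fisher's exact test for one illustrative pathway (genes 0..99).
pathway = np.zeros_like(changed)
pathway[:100] = True
table = [[(changed & pathway).sum(), (changed & ~pathway).sum()],
         [(~changed & pathway).sum(), (~changed & ~pathway).sum()]]
_, p = fisher_exact(table)
print(f"pathway enrichment p-value: {p:.3g}")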
63

Statistical Learning with Artificial Neural Network Applied to Health and Environmental Data

Sharaf, Taysseer 01 January 2015
This study illustrates the use of artificial neural networks in statistical methodology, specifically in survival analysis and time series analysis, both of which play important and wide-ranging roles in many real-life applications. We begin by applying artificial neural networks to survival analysis. The literature contains two important methodologies for doing so based on the discrete survival time method. We illustrate the idea of the discrete survival time method and show how one can estimate the discrete model using an artificial neural network. We compare the two methodologies and extend one of them to estimate survival time under competing risks. Fitting a model with an artificial neural network involves two parts: the network architecture and the learning algorithm. Neural networks are usually trained with a nonlinear optimization algorithm, such as a quasi-Newton algorithm; other learning algorithms are based on Bayesian inference. In this study we present a new learning technique that mixes the two available methodologies for Bayesian training of neural networks. We performed our analysis on real-world data: patients diagnosed with skin cancer in the United States, drawn from the SEER database maintained under the supervision of the National Cancer Institute. The second part of this dissertation applies artificial neural networks to time series analysis. We present a new method of training recurrent artificial neural networks with Hybrid Monte Carlo sampling and compare our findings with the popular autoregressive integrated moving average (ARIMA) model, using monthly average carbon dioxide data collected from NOAA.
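The discrete survival time method mentioned above is commonly implemented by expanding each subject into one record per interval survived and fitting a binary classifier for the discrete hazard. The sketch below shows that expansion on simulated data with an off-the-shelf MLP; the dissertation's Bayesian and Hybrid Monte Carlo training schemes, and the SEER data, are not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)                 # one covariate per subject
time = rng.integers(1, 11, size=n)     # observed interval, 1..10
event = rng.random(n) < 0.7            # True = event, False = censored

# "Person-period" expansion: one row per interval the subject was at risk;
# the label is 1 only in the interval where the event occurred.
rows, labels = [], []
for xi, ti, ei in zip(x, time, event):
    for t in range(1, ti + 1):
        rows.append([xi, t])
        labels.append(1 if (ei and t == ti) else 0)

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(np.array(rows), np.array(labels))

# Discrete hazard h(t|x) for a new subject (x = 0.5) over intervals 1..10,
# and the implied survival curve S(t) = prod_{s<=t} (1 - h(s|x)).
grid = np.array([[0.5, t] for t in range(1, 11)])
hazard = clf.predict_proba(grid)[:, 1]
survival = np.cumprod(1 - hazard)
print(np.round(survival, 3))
```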
64

Analysis of stock forum texts to examine correlation to stock prices

Norlander, Olof January 2016
In this thesis, four classification methods from statistical learning have been used to examine correlations between stock forum discussions and stock prices. The classifiers Naive Bayes, support vector machine, AdaBoost and random forest were applied to text data from two different stock forums to see whether the text had any predictive power for the stock prices of five different companies. Both the volatility and the direction of the price - whether it would go up or down - over a day were measured. The highest accuracy for predicting high or low volatility, 85.2 %, was obtained with random forest; for price direction, the highest accuracy was 69.2 %, using the support vector machine. The average accuracy was 58.6 % for predicting price direction and 73.4 % for predicting volatility. This thesis was made in collaboration with the company Scila, which works with stock market security.
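A toy version of such a pipeline, with an invented six-day corpus and invented volatility labels standing in for the scraped forum data, might look like this:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# One document per trading day: all forum posts about a stock, concatenated.
days = ["strong report buy buy", "quiet day nothing new",
        "rumors of profit warning sell", "stable outlook hold",
        "merger talk heavy volume", "calm trading low volume"]
high_vol = [1, 0, 1, 0, 1, 0]   # 1 = high next-day volatility (invented)

# Bag-of-words (TF-IDF) features feeding one of the four classifiers.
model = make_pipeline(TfidfVectorizer(),
                      RandomForestClassifier(n_estimators=200, random_state=0))
model.fit(days, high_vol)
print(model.predict(["profit warning and heavy selling"]))  # classify a new day
```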
65

The limits of sensory processing during sleep: how far can we interact with memory contents?

Farthouat, Juliane 16 December 2016
Over the last decade, new tools have been developed to boost the memory consolidation processes that take place during sleep. Building on recent knowledge about residual sensory processing in sleeping adults, and on theoretical accounts holding that memories are consolidated through their reactivation during sleep, new paradigms have emerged that bias this spontaneous reactivation using external stimulation. However, the limits of and necessary conditions for this process have not been clearly defined. Moreover, while boosting memory consolidation during sleep has been widely studied, the possibility of, on the contrary, interfering with memory content remains open. Finally, recent studies suggest that auditory stimulation during sleep can not only boost memory but also create new associations between a stimulus and a breathing response; whether more complex associations between stimuli can be established remains to be shown.
In this doctoral dissertation, I present the three main studies of my thesis, in which we sought to determine the limits of sensory processing during sleep and its possible interactions with memory content. In the first study, we examined whether learning of word pairs can be disrupted by re-presenting interfering word pairs during sleep, and we studied the brain oscillations associated with the reactivation of memory content. In the second study, we developed a paradigm and an analysis technique based on auditory frequency-tagged (steady-state evoked) responses in magnetoencephalography to index the segmentation that occurs while listening to a statistical stream, as well as the temporal progression of the associated learning process. In the third and last study, we used steady-state evoked responses to test whether the detection of statistical regularities is possible during sleep. / Doctorate in Psychological and Educational Sciences
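To make the frequency-tagging logic concrete: if syllables arrive at a fixed rate and hidden "words" span three syllables, successful segmentation shows up as extra spectral power at one third of the syllable rate. The sketch below demonstrates that measurement on a simulated signal; the rates and the signal itself are invented, not taken from these studies' MEG pipeline.

```python
import numpy as np

fs, dur = 100, 240                     # 100 Hz sampling, 240 s of "recording"
t = np.arange(fs * dur) / fs
syll, word = 4.0, 4.0 / 3.0            # syllable rate, tri-syllabic word rate
rng = np.random.default_rng(2)
signal = (np.sin(2 * np.pi * syll * t)           # response to every syllable
          + 0.3 * np.sin(2 * np.pi * word * t)   # weak segmentation response
          + rng.normal(0, 1.0, t.size))          # background noise

spec = np.abs(np.fft.rfft(signal)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
for f in (word, syll):
    i = np.argmin(np.abs(freqs - f))
    # Compare the tagged bin against its neighbors (a common SNR measure).
    noise = np.mean([spec[i + k] for k in (-3, -2, 2, 3)])
    print(f"{f:.2f} Hz: SNR ~ {spec[i] / noise:.1f}")
```

A peak at the word rate, absent for a listener who does not segment the stream, is the signature the second and third studies look for.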
66

Partition clustering of High Dimensional Low Sample Size data based on P-Values

Von Borries, George Freitas January 1900
Doctor of Philosophy / Department of Statistics / Haiyan Wang / This thesis introduces a new partitioning algorithm to cluster variables in high dimensional low sample size (HDLSS) data and high dimensional longitudinal low sample size (HDLLSS) data. HDLSS data contain a large number of variables with small number of replications per variable, and HDLLSS data refer to HDLSS data observed over time. Clustering technique plays an important role in analyzing high dimensional low sample size data as is seen commonly in microarray experiment, mass spectrometry data, pattern recognition. Most current clustering algorithms for HDLSS and HDLLSS data are adaptations from traditional multivariate analysis, where the number of variables is not high and sample sizes are relatively large. Current algorithms show poor performance when applied to high dimensional data, especially in small sample size cases. In addition, available algorithms often exhibit poor clustering accuracy and stability for non-normal data. Simulations show that traditional clustering algorithms used in high dimensional data are not robust to monotone transformations. The proposed clustering algorithm PPCLUST is a powerful tool for clustering HDLSS data, which uses p-values from nonparametric rank tests of homogeneous distribution as a measure of similarity between groups of variables. Inherited from the robustness of rank procedure, the new algorithm is robust to outliers and invariant to monotone transformations of data. PPCLUSTEL is an extension of PPCLUST for clustering of HDLLSS data. A nonparametric test of no simple effect of group is developed and the p-value from the test is used as a measure of similarity between groups of variables. PPCLUST and PPCLUSTEL are able to cluster a large number of variables in the presence of very few replications and in case of PPCLUSTEL, the algorithm require neither a large number nor equally spaced time points. PPCLUST and PPCLUSTEL do not suffer from loss of power due to distributional assumptions, general multiple comparison problems and difficulty in controlling heterocedastic variances. Applications with available data from previous microarray studies show promising results and simulations studies reveal that the algorithm outperforms a series of benchmark algorithms applied to HDLSS data exhibiting high clustering accuracy and stability.
67

Measurability Aspects of the Compactness Theorem for Sample Compression Schemes

Kalajdzievski, Damjan January 2012
In 1998, Ben-David and Litman proved that a concept space has a sample compression scheme of size $d$ if and only if every finite subspace has a sample compression scheme of size $d$. In this compactness theorem, measurability of the hypotheses of the constructed sample compression scheme is not guaranteed, yet measurability of the hypotheses is a necessary condition for learnability. In this thesis we discuss when a sample compression scheme, created from compression schemes on finite subspaces via the compactness theorem, has measurable hypotheses. We show that if $X$ is a standard Borel space with a $d$-maximum and universally separable concept class $\mathcal{C}$, then $(X,\mathcal{C})$ has a sample compression scheme of size $d$ with universally Borel measurable hypotheses. Additionally, we introduce a new variant of compression scheme called a copy sample compression scheme.
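For reference, the two results the abstract refers to can be stated as follows; this is a paraphrase in the abstract's notation, not the thesis's exact wording.

```latex
% Requires amsthm; \mathcal{C} is a concept class over a domain X.
\begin{theorem}[Ben-David--Litman compactness, 1998]
A concept class $\mathcal{C} \subseteq 2^{X}$ admits a sample compression
scheme of size $d$ if and only if every finite subspace
$(Y, \mathcal{C}|_{Y})$, with $Y \subseteq X$ finite, admits a sample
compression scheme of size $d$.
\end{theorem}

\begin{theorem}[Measurable version, as stated in the abstract]
If $X$ is a standard Borel space and $\mathcal{C}$ is a $d$-maximum,
universally separable concept class on $X$, then $(X, \mathcal{C})$ has a
sample compression scheme of size $d$ whose hypotheses are universally
Borel measurable.
\end{theorem}
```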
68

Identifying illicit graphic in the online community using the neural network framework

Vega Ezpeleta, Emilio January 2017
In this paper, two convolutional neural networks are estimated to classify whether or not an image contains a swastika. The images are gathered from the gaming platform Steam and by scraping a web search engine. The architecture of the networks is kept moderate, and the difference between the models is the final layer: the first model uses an average-type operation, while the second uses a conventional fully-connected layer. The results show that the performance of the two models is similar, with test errors in the 6-9 % range.
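The architectural difference described above might look as follows, under the assumption that the "average type operation" is global average pooling over a one-channel convolutional map; the layer sizes here are invented, not taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_backbone():
    # A small, fresh convolutional feature extractor for each model.
    return [layers.Conv2D(16, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D()]

# Model 1: global average pooling straight into the sigmoid output.
gap_model = models.Sequential(
    [tf.keras.Input(shape=(64, 64, 3))] + conv_backbone() + [
        layers.Conv2D(1, 1),                 # one map for the single class
        layers.GlobalAveragePooling2D(),
        layers.Activation("sigmoid")])

# Model 2: conventional flatten + fully-connected head.
fc_model = models.Sequential(
    [tf.keras.Input(shape=(64, 64, 3))] + conv_backbone() + [
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid")])

for m in (gap_model, fc_model):
    m.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
    m.summary()
```

The pooled head has far fewer parameters than the fully-connected head, which is one common reason the two perform similarly on a moderate-sized dataset.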
69

Efficient Algorithms for Learning Combinatorial Structures from Limited Data

Asish Ghoshal (5929691) 15 May 2019
Recovering combinatorial structures from noisy observations is a recurrent problem in many application domains, including, but not limited to, natural language processing, computer vision, genetics, health care, and automation. For instance, dependency parsing in natural language processing entails recovering parse trees from sentences which are inherently ambiguous. From a computational standpoint, such problems are typically intractable and call for designing efficient approximation or randomized algorithms with provable guarantees. From a statistical standpoint, algorithms that recover the desired structure using an optimal number of samples are of paramount importance.

We tackle several such problems in this thesis and obtain computationally and statistically efficient procedures. We demonstrate optimality of our methods by proving fundamental lower bounds on the number of samples needed by any method for recovering the desired structures. Specifically, the thesis makes the following contributions:

(i) We develop polynomial-time algorithms for learning linear structural equation models --- which are a widely used class of models for performing causal inference --- that recover the correct directed acyclic graph structure under identifiability conditions that are weaker than existing conditions. We also show that the sample complexity of our method is information-theoretically optimal.

(ii) We develop polynomial-time algorithms for learning the underlying graphical game from observations of the behavior of self-interested agents. The key combinatorial problem here is to recover the Nash equilibria set of the true game from behavioral data. We obtain fundamental lower bounds on the number of samples required for learning games and show that our method is statistically optimal.

(iii) Lastly, departing from the generative model framework, we consider the problem of structured prediction where the goal is to learn predictors from data that predict complex structured objects directly from a given input. We develop efficient learning algorithms that learn structured predictors by approximating the partition function and obtain generalization guarantees for our method. We demonstrate that randomization can not only improve efficiency but also generalization to unseen data.
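As a heavily simplified illustration of contribution (i): in a linear Gaussian SEM with equal noise variances, a sink (terminal) vertex attains the smallest diagonal entry of the precision matrix, which lets a causal order be peeled off one vertex at a time. The sketch below demonstrates that single step on simulated data; it is one idea from this line of work, not necessarily the dissertation's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
# Ground-truth chain X0 -> X1 -> X2, unit noise variance everywhere.
x0 = rng.normal(0, 1, n)
x1 = 0.8 * x0 + rng.normal(0, 1, n)
x2 = -0.6 * x1 + rng.normal(0, 1, n)
X = np.column_stack([x0, x1, x2])

# Under homoscedastic noise, diag(precision)[i] = (1 + sum of squared
# outgoing weights of i) / sigma^2, so a vertex with no children (a sink)
# attains the minimum.
precision = np.linalg.inv(np.cov(X, rowvar=False))
sink = np.argmin(np.diag(precision))
print(f"estimated terminal vertex: X{sink}")   # expect X2
```

Repeating the step on the remaining variables (after regressing out the removed sink) recovers a full topological order, from which the DAG edges can be estimated by regression.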
70

Object-based suppression in auditory selective attention: The influence of statistical learning

Daly, Heather R. January 2019
No description available.
