31

Let's Have a party! An Open-Source Toolbox for Recursive Partytioning

Hothorn, Torsten, Zeileis, Achim, Hornik, Kurt January 2007 (has links) (PDF)
Package party, implemented in the R system for statistical computing, provides basic classes and methods for recursive partitioning along with reference implementations for three recently-suggested tree-based learners: conditional inference trees and forests, and model-based recursive partitioning. / Series: Research Report Series / Department of Statistics and Mathematics
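The reference implementations described in this abstract are R code in the party package. As a language-neutral illustration of the split-selection idea behind conditional inference trees (test each covariate for association with the response, split on the most significant one, stop when nothing is significant), here is a minimal Python sketch; the function names, permutation test, and thresholds are illustrative assumptions, not the package's API.

```python
import numpy as np

def perm_pvalue(x, y, n_perm=499, rng=None):
    """Permutation p-value for association between x and y (statistic: |Pearson r|)."""
    rng = np.random.default_rng(rng)
    obs = abs(np.corrcoef(x, y)[0, 1])
    perm = [abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(n_perm)]
    return (1 + sum(p >= obs for p in perm)) / (n_perm + 1)

def ctree_sketch(X, y, alpha=0.05, min_node=20):
    """Recursive partitioning: split on the covariate with the smallest Bonferroni-adjusted
    permutation p-value; stop when no covariate is significantly associated with y."""
    n, p = X.shape
    if n >= min_node:
        pvals = [perm_pvalue(X[:, j], y) for j in range(p)]
        j = int(np.argmin(pvals))
        cuts = np.unique(X[:, j])[:-1]
        if pvals[j] * p < alpha and len(cuts) > 0:
            # pick the cut-point that minimises the pooled within-child variance
            def cost(c):
                left = X[:, j] <= c
                return np.var(y[left]) * left.sum() + np.var(y[~left]) * (~left).sum()
            cut = min(cuts, key=cost)
            left = X[:, j] <= cut
            return {"var": j, "cut": float(cut),
                    "left": ctree_sketch(X[left], y[left], alpha, min_node),
                    "right": ctree_sketch(X[~left], y[~left], alpha, min_node)}
    return {"prediction": float(np.mean(y))}

# Illustrative data: the response depends only on the first of three covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float) + 0.1 * rng.normal(size=200)
print(ctree_sketch(X, y))
```
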
32

Coping with the computational and statistical bipolar nature of machine learning

Machart, Pierre 21 December 2012 (has links)
Machine Learning has its roots in a broad spectrum of fields including Artificial Intelligence, Pattern Recognition, Statistics, and Optimisation. From the earliest stages of Machine Learning, both computational issues and generalisation properties have been identified as central to the field. While the former address the computability, complexity (from a fundamental perspective), or computational efficiency (from a more practical standpoint) of learning systems, the latter aim at understanding and characterising how well the solutions they provide perform on new, unseen data. In recent years, the emergence of large-scale datasets in Machine Learning has deeply reshaped the principles of Learning Theory. Once possible constraints on training time are taken into account, one has to deal with trade-offs more complex than those classically addressed by Statistics. As a direct consequence, designing efficient algorithms (both in theory and in practice) able to handle large-scale datasets requires dealing jointly with the statistical and computational aspects of Learning. The present thesis aims at unravelling, analysing and exploiting some of the connections that naturally exist between these two aspects. More precisely, in a first part, we extend stability analysis, which relates certain algorithmic properties to the generalisation abilities of learning algorithms, to a novel (and fine-grained) performance measure, namely the confusion matrix. In a second part, we present a novel approach to learning a kernel-based regression function that serves the learning task at hand and exploits the structure of
33

The Effect of Reputation Shocks to Rating Agencies on Corporate Disclosures

Sethuraman, Subramanian January 2016 (has links)
This paper explores the effect of credit rating agencies' (CRA) reputation on the discretionary disclosures of corporate bond issuers. Academics, practitioners, and regulators disagree on the informational role played by major CRAs and the usefulness of credit ratings in influencing investors' perception of the credit risk of bond issuers. Using management earnings forecasts as a measure of discretionary disclosure, I find that investors demand more (less) disclosure from bond issuers when ratings become less (more) credible. In addition, using content analytics, I find that bond issuers disclose more qualitative information during periods of low CRA reputation to help investors better assess credit risk. That corporate managers alter their voluntary disclosure in response to CRA reputation shocks is consistent with credit ratings providing incremental information to investors and reducing adverse selection in lending markets. Overall, my findings suggest that managers rely on voluntary disclosure as a credible mechanism to reduce information asymmetry in bond markets. / Dissertation
34

Attitude and Adoption: Understanding Climate Change Through Predictive Modeling

Jackson B Bennett (7042994) 12 August 2019 (has links)
Climate change has emerged as one of the most critical issues of the 21st century. It stands to impact communities across the globe, forcing individuals and governments alike to adapt to a new environment. While it is critical for governments and organizations to make strides to change business as usual, individuals also have the ability to make an impact. The goal of this thesis is to study the beliefs that shape climate-related attitudes and the factors that drive the adoption of sustainable practices and technologies using a foundation in statistical learning. Previous research has studied the factors that influence both climate-related attitude and adoption, but comparatively little has been done to leverage recent advances in statistical learning and computing ability to advance our understanding of these topics. As increasingly large amounts of relevant data become available, it will be pivotal not only to use these emerging sources to derive novel insights on climate change, but to develop and improve statistical frameworks designed with climate change in mind. This thesis presents two novel applications of statistical learning to climate change, one of which includes a more general framework that can easily be extended beyond the field of climate change. Specifically, the work consists of two studies: (1) a robust integration of social media activity with climate survey data to relate climate-talk to climate-thought and (2) the development and validation of a statistical learning model to predict renewable energy installations using social, environmental, and economic predictors. The analysis presented in this thesis supports decision makers by providing new insights on the factors that drive climate attitude and adoption.
35

Performance financeira da carteira na avaliação de modelos de análise e concessão de crédito: uma abordagem baseada em aprendizagem estatística / Portfolio financial performance in the evaluation of credit analysis and granting models: an approach based on statistical learning

Silva, Rodrigo Alves 05 September 2014 (has links)
Credit analysis and granting models seek to associate the borrower's profile with the probability of default on contracted obligations, thereby identifying the risk associated with the borrower and helping the firm decide whether to approve or deny the credit request. This field of research has recently gained importance both in Brazil, where credit activity has intensified with strong participation of public banks, and internationally, owing to growing concerns about the potential economic damage caused by default events. As a result, many models and methods have been built and adapted for credit risk analysis of both consumers and companies. These models are tested and compared on the basis of predictive accuracy or other statistical optimization metrics, a procedure that may not be efficient from a financial standpoint and that makes it harder for the firm to interpret the results and decide which model is best, creating a gap between the choice of model and the firm's financial objectives. Given that financial performance is one of the main indicators of any management procedure, this study aimed to fill this gap by analyzing the financial performance of credit portfolios formed by statistical learning techniques currently used for credit risk classification and analysis in national and international research. The selected techniques (discriminant analysis, logistic regression, Naïve Bayes, kdB-1 and kdB-2 Bayesian networks, SVC, and SVM) were applied to the German Credit Data Set, and the results were first analyzed and compared in terms of accuracy and misclassification costs. In addition, the study proposes four financial metrics (RFC, PLR, RAROC, and IS), which produced different results across the techniques. These results suggest changes in the efficiency ranking, and therefore in the order of preference, of the techniques, demonstrating the importance of considering these metrics when analyzing and selecting optimal classification models.
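As a rough illustration of the accuracy and misclassification-cost comparison described above, the sketch below fits one of the named techniques (logistic regression) to the German Credit Data Set and scores it with the conventional 5:1 cost matrix for that dataset. Fetching the data from OpenML under the name "credit-g" is an assumption, and the thesis's financial metrics (RFC, PLR, RAROC, IS) are not reproduced here.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# German Credit Data ("credit-g" on OpenML): 1000 applicants, target "good"/"bad".
data = fetch_openml("credit-g", version=1, as_frame=True)
X, y = data.data, (data.target == "bad").astype(int)   # 1 = bad risk

cat = X.select_dtypes(include="category").columns
num = X.columns.difference(cat)
model = make_pipeline(
    make_column_transformer((OneHotEncoder(handle_unknown="ignore"), cat),
                            (StandardScaler(), num)),
    LogisticRegression(max_iter=1000),
)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
model.fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()

accuracy = (tn + tp) / len(y_te)
# Conventional cost matrix for this dataset: accepting a bad risk costs 5 units,
# rejecting a good applicant costs 1 unit.
misclass_cost = 5 * fn + 1 * fp
print(f"accuracy={accuracy:.3f}  misclassification cost={misclass_cost}")
```
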
36

Individual differences in the use of distributional information in linguistic contexts

Hall, Jessica Erin 01 May 2018 (has links)
Statistical learning experiments have demonstrated that children and infants are sensitive to the types of statistical regularities found in natural language. These experiments often rely on statistical information based on linear dependencies, e.g. that x predicts y either immediately or after some intervening items, whereas learning to use language creatively relies on the ability to form grammatical categories (e.g. verbs, nouns) that share distributions. Distributional learning has not been explored in children or in individuals with developmental language disorder (DLD). Proposed statistical learning deficits in individuals with DLD are thought to have downstream effects related to poorer comprehension, but this relationship has not been shown experimentally. In this project, children and adults with DLD and their same-age typically developing (TD) peers complete an artificial grammar learning task that employs a made-up language and an online comprehension task that employs real language. In the artificial grammar learning task, participants are tested to determine whether they have learned the statistical regularities of trained stimuli and formed categories based upon these regularities. We hypothesize that if individuals with DLD have difficulty utilizing distributional information from novel input, then they will show less evidence of forming new categories than TD peers. Our second hypothesis is that if regularities are learned based on experience, then adults and children will show similar learning because they will have the same exposure to the artificial language. In the online comprehension task, participants use a computer mouse to choose a preferred interpretation of an ambiguous sentence that most adults interpret a certain way due to linguistic experience. We hypothesize that if individuals with DLD have overall poorer linguistic experience compared to TD individuals, then they will show weaker effects of these biases than their peers. Finally, we use measurements from both tasks to test whether they are correlated, with the additional goal of showing that language comprehension and statistical learning are related. This study provides information about differences between individuals with DLD and their TD peers, and between adults and children, in the ability to use distributional information from both accumulated and novel input. In doing so, we reveal the role of input and experience in using distributional information in linguistic environments.
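As a toy illustration (unrelated to the study's actual artificial grammar or stimuli) of what using distributional information means here, the sketch below builds context profiles for made-up words and shows that words sharing a distribution end up with similar profiles, which is the raw material for forming categories.

```python
import numpy as np
from collections import defaultdict
from itertools import product

# Toy "artificial language": a-words precede X-words, b-words precede Y-words.
a_words, x_words = ["alt", "ush"], ["dup", "ker", "fen"]
b_words, y_words = ["ong", "erd"], ["jic", "tam", "lum"]
corpus = [f"{a} {x}" for a, x in product(a_words, x_words)] + \
         [f"{b} {y}" for b, y in product(b_words, y_words)]

# Distributional profile of each word: counts of the words that follow and precede it.
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
profile = defaultdict(lambda: np.zeros(2 * len(vocab)))
for s in corpus:
    w1, w2 = s.split()
    profile[w1][idx[w2]] += 1               # what follows w1
    profile[w2][len(vocab) + idx[w1]] += 1  # what precedes w2

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(profile["dup"], profile["ker"]))  # high: same distributional category
print(cosine(profile["dup"], profile["jic"]))  # low: different category
```
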
37

Neural Networks

Jordan, Michael I., Bishop, Christopher M. 13 March 1996 (has links)
We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.
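A minimal sketch of the statistical view this abstract describes, assuming the simplest possible "network" (a single sigmoid unit) and a Bernoulli model of the data: learning is gradient ascent on the log-likelihood, i.e. a search for parameter values that look probable in the light of the data. The synthetic data and step sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Probabilistic assumption about the data: y ~ Bernoulli(sigmoid(w_true . x)).
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ w_true)))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Learning = maximum likelihood: ascend the average Bernoulli log-likelihood
#   (1/n) sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ].
w = np.zeros(d)
lr = 0.5
for _ in range(2000):
    p = sigmoid(X @ w)
    grad = X.T @ (y - p) / n      # gradient of the average log-likelihood
    w += lr * grad

print("estimated weights:", np.round(w, 2))  # should land near w_true, up to sampling noise
```
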
38

Learning from Incomplete Data

Ghahramani, Zoubin, Jordan, Michael I. 24 January 1995 (has links)
Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives---the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner. These algorithms are based on mixture modeling and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster, Laird, and Rubin 1977)---both for the estimation of mixture components and for coping with the missing data.
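A compact sketch of the likelihood-based approach in the simplest setting the paper builds on, a single multivariate Gaussian rather than a mixture: EM alternates between filling in missing entries with their conditional expectations (plus a conditional-covariance correction) and re-estimating the parameters. The data generation below is an illustrative assumption.

```python
import numpy as np

def em_gaussian_missing(X, n_iter=50):
    """Maximum-likelihood mean/covariance of a multivariate Gaussian when X (n x d)
    contains NaNs, via EM: the E-step replaces missing entries by their conditional
    expectation and accumulates the conditional covariance; the M-step re-estimates
    the parameters from the completed statistics."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)                 # initialise ignoring missing entries
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        X_hat = X.copy()
        corr = np.zeros((d, d))                # sum of conditional covariances
        for i in range(n):
            m, o = miss[i], ~miss[i]
            if not m.any():
                continue
            if not o.any():                    # row entirely missing
                X_hat[i] = mu
                corr += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            K = S_mo @ np.linalg.solve(S_oo, np.eye(o.sum()))
            X_hat[i, m] = mu[m] + K @ (X[i, o] - mu[o])
            corr[np.ix_(m, m)] += sigma[np.ix_(m, m)] - K @ S_mo.T
        mu = X_hat.mean(axis=0)
        centred = X_hat - mu
        sigma = (centred.T @ centred + corr) / n
    return mu, sigma

# Example: a correlated 2-D Gaussian with roughly 30% of entries deleted at random.
rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.8], [0.8, 1.5]])
Z = rng.multivariate_normal([0.0, 2.0], true_cov, size=400)
Z[rng.random(Z.shape) < 0.3] = np.nan
mu_hat, cov_hat = em_gaussian_missing(Z)
print(np.round(mu_hat, 2), np.round(cov_hat, 2))
```
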
39

A Note on Support Vector Machines Degeneracy

Rifkin, Ryan, Pontil, Massimiliano, Verri, Alessandro 11 August 1999 (has links)
When training Support Vector Machines (SVMs) over non-separable data sets, one sets the threshold $b$ using any dual cost coefficient that is strictly between the bounds of $0$ and $C$. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by $\mathbf{w} = \mathbf{0}$, and the resulting (degenerate) SVM will classify all future points identically (to the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unbounded polyhedron, which we characterize in terms of its extreme points and rays.
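An illustrative construction, not taken from the paper, of such a degenerate problem: both classes occupy the same two points, the positive class supplies more examples, and the fitted linear SVM collapses to a weight vector near zero and sends every future point to the majority class.

```python
import numpy as np
from sklearn.svm import SVC

# Both classes sit on the same two points, but the positive class supplies three
# times as many examples, so no hyperplane can beat "always predict +1".
X = np.array([[-1.0], [1.0]] * 4)                 # eight one-dimensional points
y = np.array([1, 1, 1, 1, 1, 1, -1, -1])          # 3 positives and 1 negative at each point

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("w:", clf.coef_.ravel())                    # essentially 0: the hyperplane is degenerate
print("y_i * alpha_i:", clf.dual_coef_.ravel())   # dual coefficients of the support vectors
print("predictions:", clf.predict([[-5.0], [0.0], [5.0]]))  # all go to the majority class
```
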
40

Novel Computational Analyses of Allergens for Improved Allergenicity Risk Assessment and Characterization of IgE Reactivity Relationships

Soeria-Atmadja, Daniel January 2008 (has links)
Immunoglobulin E (IgE) mediated allergy is a major and seemingly increasing health problem in Western countries. The combined usage of databases of molecular and clinical information on allergens (allergenic proteins), as well as new experimental platforms capable of generating huge amounts of allergy-related data from a single blood test, holds great potential to enhance our knowledge of this complex disease. To benefit maximally from this development, however, both novel and improved methods for computational analysis are urgently required. This thesis concerns two types of important and practical computational analyses of allergens: allergenicity/IgE-cross-reactivity risk assessment and characterization of IgE-reactivity patterns. Both directions rely on the development and implementation of bioinformatics and statistical learning algorithms, which are applied either to amino acid sequence information of allergenic proteins or to quantified human blood serum levels of specific IgE antibodies to allergen preparations (purified extracts of allergenic sources, such as peanut or birch). The main application of computational risk assessment of allergenicity is to prevent unintentional introduction of allergen-encoding transgenes into genetically modified (GM) food crops. Two separate classification procedures for potential protein allergenicity are introduced. Both protocols rely on multivariate classification algorithms that are trained to discriminate allergens from presumed non-allergens based on their amino acid sequence. Both classification procedures are thoroughly evaluated, and the second protocol shows state-of-the-art performance in comparison to current top-ranked methods. Moreover, several pitfalls in performance estimation of classifiers are demonstrated and procedures to circumvent them are suggested. Visualization and characterization of IgE-reactivity patterns among allergen preparations are enabled by applying bioinformatics and statistical learning methods to a multivariate dataset holding recorded blood serum IgE levels of over 1000 sensitized individuals, each measured against 89 allergen preparations. Moreover, a novel framework for divisive hierarchical clustering, including graphical representation of the resulting output, is introduced, which greatly simplifies analysis of this dataset. Important IgE-reactivity relationships within several groups of allergen preparations are identified, including well-known groups of clinically relevant cross-reactivities.
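A heavily simplified sketch of the sequence-based classification idea: proteins represented by their amino-acid composition, with a classifier cross-validated on them. The sequences below are placeholders, and the thesis's actual feature representations, classifiers, and evaluation safeguards are more elaborate.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """20-dimensional amino-acid composition vector (relative frequency of each residue)."""
    seq = seq.upper()
    return np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float) / max(len(seq), 1)

# Placeholder sequences; real work would use curated allergen and non-allergen sets.
allergens     = ["MKLLVLSLCFATLA", "MKTLALSLLAAGVA", "MKVLALSLAFVGLA"]
non_allergens = ["MGDVEKGKKIFIMK", "MSDNEDNFDGDDFD", "MTEYKLVVVGAGGV"]
X = np.vstack([composition(s) for s in allergens + non_allergens])
y = np.array([1] * len(allergens) + [0] * len(non_allergens))

# Cross-validation keeps training and evaluation sequences separate, one of the
# performance-estimation safeguards discussed in the thesis (its protocols also
# control for sequence similarity between folds, which this sketch does not).
scores = cross_val_score(SVC(kernel="rbf", gamma="scale"), X, y, cv=3)
print("CV accuracy:", scores.mean())
```
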
