1 |
Ordinal Regression to Evaluate Student Ratings Data. Bell, Emily Brooke, 07 July 2008 (has links) (PDF)
Student evaluations are the most common and often the only method used to evaluate teachers. In these evaluations, which typically occur at the end of every term, students rate their instructors on criteria accepted as constituting exceptional instruction in addition to an overall assessment. This presentation explores factors that influence student evaluations using the teacher ratings data of Brigham Young University from Fall 2001 to Fall 2006. This project uses ordinal regression to model the probability of an instructor receiving a good, average, or poor rating. Student grade, instructor status, class level, student gender, total enrollment, term, GE class status, and college are used as explanatory variables.
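The cumulative-logit (proportional-odds) model underlying an analysis like this can be sketched in a few lines; the thresholds and coefficients below are hypothetical illustrations, not estimates from the BYU ratings data:

```python
import numpy as np

def cumulative_logit_probs(x, thresholds, beta):
    """Proportional-odds model: P(Y <= j | x) = sigmoid(theta_j - beta @ x);
    per-category probabilities are differences of adjacent cumulative ones."""
    eta = np.dot(beta, x)
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(thresholds, float) - eta)))
    cum = np.append(cum, 1.0)                      # P(Y <= last category) = 1
    return np.diff(np.concatenate(([0.0], cum)))   # one probability per rating

# Hypothetical values: ratings ordered poor < average < good,
# two standardized covariates (e.g. student grade, class level).
probs = cumulative_logit_probs(x=np.array([0.5, -1.0]),
                               thresholds=np.array([-1.0, 1.0]),
                               beta=np.array([0.8, 0.3]))
```

Note that a single coefficient vector shifts all the cumulative probabilities together, which is the proportional-odds assumption.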
|
2 |
Ordinary least squares regression of ordered categorical data: inferential implications for practice. Larrabee, Beth R. January 1900 (has links)
Master of Science / Department of Statistics / Nora Bello / Ordered categorical responses are frequently encountered in many disciplines. Examples of interest in agriculture include quality assessments, such as for soil or food products, and evaluation of lesion severity, such as teat-end status in dairy cattle. Ordered categorical responses are characterized by multiple categories or levels recorded on a ranked scale that, while conveying relative order, is not informative of the magnitude of, or proportionality between, levels. A number of statistically sound models for ordered categorical responses have been proposed, such as logistic regression and probit models, but these are commonly underutilized in practice. Instead, the ordinary least squares linear regression model is often employed with ordered categorical responses despite violation of basic model assumptions. In this study, the inferential implications of this approach are investigated using a simulation study that evaluates robustness based on realized Type I error rate and statistical power. The design of the simulation study is motivated by applied research cases reported in the literature. A variety of plausible scenarios were considered for simulation, including various shapes of the frequency distribution and different numbers of categories of the ordered categorical response. Using a real dataset on frequency of antimicrobial use in feedlots, I demonstrate the inferential performance of ordinary least squares linear regression on ordered categorical responses relative to a probit model.
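A minimal version of the kind of simulation described here can be sketched as follows; the sample size, number of simulations, and skewed category frequencies are illustrative assumptions, not the thesis's actual design:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n, alpha = 500, 60, 0.05
rejections = 0
for _ in range(n_sims):
    x = rng.normal(size=n)
    # Ordinal response (1..5) drawn independently of x with a skewed
    # frequency distribution, so every rejection is a Type I error.
    y = rng.choice([1, 2, 3, 4, 5], size=n, p=[0.50, 0.25, 0.15, 0.07, 0.03])
    if stats.linregress(x, y.astype(float)).pvalue < alpha:  # OLS slope t-test
        rejections += 1
type1_rate = rejections / n_sims   # realized Type I error rate
```

Comparing `type1_rate` against the nominal 0.05 level across distribution shapes and category counts is the robustness check the abstract describes.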
|
3 |
Lyckans land? : En ekonometrisk studie över nationshemvistens påverkan på upplevd lycka. Pistol, Andreas, January 2010 (has links)
Does the country people live in affect the probability of them experiencing happiness? Can a country variable in an ordinal regression model be affected when microeconomic and macroeconomic factors are added to the model? The possible outcomes are either that the country variable's effect shrinks when the additional predictors are added, or that it stays the same. The micro data are collected from the European Social Survey database; the macro data are collected from the World Bank. The country variable becomes less substantial when additional variables are added to the model. Apart from the country variable, the variable with the most influence over expected happiness is whether the individual often socializes with friends. In some cases, the added variables significantly reduce the influence of the country variable.
|
5 |
Evaluation and ranking of minor-league hitters using a statistical model. Johnson, Gary Brent, January 1900 (has links)
Master of Science / Department of Statistics / Thomas M. Loughin / Traditionally, major-league scouts have evaluated young “position players,” those who are not pitchers, using the “Five Tools”: hitting for average, hitting for power, running, throwing, and fielding. However, “sabermetricians,” those who study the science of baseball, e.g. Bill James, have been trying to evaluate position players using quantifiable measures of performance. In this study, a factor analysis was used to determine underlying characteristics of minor-league hitters. The underlying factors were determined to be slugging ability, lead-off hitting ability, “patience” at the plate, and pure-hitting ability. Additionally, an ordinal response was created from the number of at-bats and on-base plus slugging percentage in the majors during the 2002-05 seasons. The underlying characteristics along with other variables such as a player’s age, position, and level in the minors are used in a cumulative logit logistic regression model to predict a player’s probability of notable success in the majors. The model is built upon data from the 2002 minor-league season and data from the 2002, 2003, 2004, and 2005 major-league seasons.
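The factor-analysis step can be illustrated on synthetic data; the latent skills, loadings, and statistic names below are hypothetical stand-ins, not the study's actual variables:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 300
power, contact = rng.normal(size=n), rng.normal(size=n)   # latent skills
noise = lambda: rng.normal(scale=0.3, size=n)
stats_matrix = np.column_stack([
    0.9 * power + noise(),                  # home-run rate (hypothetical)
    0.8 * power + noise(),                  # slugging percentage
    0.9 * contact + noise(),                # batting average
    0.7 * contact + noise(),                # strikeout avoidance
    0.5 * power + 0.5 * contact + noise(),  # OPS-like composite
    rng.normal(size=n),                     # pure noise column
])
fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(stats_matrix)     # per-player factor scores
```

In the study, factor scores like these, along with age, position, and minor-league level, feed the cumulative logit model of major-league success.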
|
6 |
Unbiased Recursive Partitioning: A Conditional Inference Framework. Hothorn, Torsten, Hornik, Kurt, Zeileis, Achim, January 2004 (has links) (PDF)
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, but these lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions, and therefore the models, induced by the two approaches are structurally different, indicating the need for unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on animal abundance, glaucoma classification, node-positive breast cancer and mammography experience are re-analyzed. / Series: Research Report Series / Department of Statistics and Mathematics
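The core idea, separating permutation-test-based variable selection from the search for a cutpoint, can be sketched as follows; this is a simplified illustration of the framework using a correlation statistic, not the authors' implementation:

```python
import numpy as np

def select_split_variable(X, y, n_perm=999, alpha=0.05, rng=None):
    """Pick the covariate most associated with y via a permutation test;
    return None (stop splitting) if nothing survives a Bonferroni-adjusted
    level. Selecting the variable before searching for the best cutpoint
    is what avoids the bias towards covariates with many possible splits."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, p = X.shape
    pvals = []
    for j in range(p):
        stat = abs(np.corrcoef(X[:, j], y)[0, 1])
        perm_stats = [abs(np.corrcoef(rng.permutation(X[:, j]), y)[0, 1])
                      for _ in range(n_perm)]
        pvals.append((1 + sum(s >= stat for s in perm_stats)) / (n_perm + 1))
    best = int(np.argmin(pvals))
    return best if pvals[best] <= alpha / p else None   # Bonferroni stop rule

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 1] + rng.normal(size=100)   # only covariate 1 is informative
chosen = select_split_variable(X, y)
```

Because the stopping decision is a multiple-test procedure rather than a pruning step, the tree stops growing when no covariate shows significant association.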
|
7 |
Logistic Regression Analysis to Determine the Significant Factors Associated with Substance Abuse in School-Aged Children. Maxwell, Kori Lloyd Hugh, 17 April 2009 (has links)
Substance abuse is the overindulgence in and dependence on a drug or chemical, leading to detrimental effects on the individual's health and the welfare of those around him or her. Logistic regression analysis is an important tool for analyzing the relationship between various explanatory variables and nominal response variables. The objective of this study is to use this statistical method to determine the factors considered to be significant contributors to the use or abuse of substances in school-aged children, and to determine what measures can be implemented to minimize their effect. Logistic regression models were built for the three main types of substances considered in this study: tobacco, alcohol, and drugs. This facilitated the identification of the significant factors that appear to influence substance use in children.
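A sketch of this kind of analysis on synthetic data; `peer_use` and `monitoring` are hypothetical predictors chosen for illustration, not the study's actual covariates:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 500
# Hypothetical risk/protective factors for a binary tobacco-use outcome.
peer_use = rng.binomial(1, 0.4, size=n).astype(float)
monitoring = rng.normal(size=n)
logit = -1.0 + 1.5 * peer_use - 0.8 * monitoring
tobacco_use = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = np.column_stack([peer_use, monitoring])
model = LogisticRegression().fit(X, tobacco_use)
odds_ratios = np.exp(model.coef_[0])  # OR > 1: factor raises the odds of use
```

Significance of each factor would then be judged from Wald tests or confidence intervals on the fitted coefficients, with one such model per substance type.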
|
9 |
Effective and Efficient Optimization Methods for Kernel Based Classification Problems. Tayal, Aditya, January 2014 (has links)
Kernel methods are a popular choice in solving a number of problems in statistical machine learning. In this thesis, we propose new methods for two important kernel based classification problems: 1) learning from highly unbalanced large-scale datasets and 2) selecting a relevant subset of input features for a given kernel specification.
The first problem is known as the rare class problem, which is characterized by a highly skewed or unbalanced class distribution. Unbalanced datasets can introduce significant bias in standard classification methods. In addition, due to the growth of data in recent years, large datasets with millions of observations have become commonplace. We propose an approach that addresses both the bias and the computational complexity of rare class problems by optimizing the area under the receiver operating characteristic curve and by using a rare-class-only kernel representation, respectively. We justify the proposed approach theoretically and computationally. Theoretically, we establish an upper bound on the difference between selecting a hypothesis from a reproducing kernel Hilbert space and from a hypothesis space which can be represented using a subset of kernel functions. This bound shows that for a fixed number of kernel functions, it is optimal to first include functions corresponding to rare class samples. We also discuss the connection of a subset kernel representation with the Nyström method for a general class of regularized loss minimization methods. Computationally, we illustrate that the rare class representation produces statistically equivalent test error results on highly unbalanced datasets compared to using the full kernel representation, but with significantly better time and space complexity. Finally, we extend the method to rare-class ordinal ranking, and apply it to a recent public competition problem in health informatics.
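The rare-class-only representation can be illustrated on synthetic unbalanced data; this sketch fits a plain logistic regression on rare-class kernel features, standing in for the thesis's AUC-optimizing objective:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(4)
# Highly unbalanced toy data: 1000 majority vs. 30 rare-class points.
X = np.vstack([rng.normal(loc=0.0, size=(1000, 2)),
               rng.normal(loc=2.5, size=(30, 2))])
y = np.r_[np.zeros(1000), np.ones(30)]

# Rare-class-only representation: kernel features computed against the
# 30 rare points instead of all 1030 samples, shrinking the problem.
Phi = rbf_kernel(X, X[y == 1], gamma=0.5)                # shape (1030, 30)
clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(Phi, y)
auc = roc_auc_score(y, clf.decision_function(Phi))       # training-set AUC
```

The dimensionality of the kernel representation scales with the (small) rare class rather than the full sample, which is the source of the time and space savings claimed above.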
The second problem studied in the thesis is known in the literature as the feature selection problem. Embedding feature selection in kernel classification leads to a non-convex optimization problem. We specify a primal formulation and solve the problem using a second-order trust-region algorithm. To improve efficiency, we use the two-block Gauss-Seidel method, breaking the problem into a convex support vector machine subproblem and a non-convex feature selection subproblem. We reduce the possibility of saddle-point convergence and improve solution quality by sharing an explicit functional margin variable between block iterates. We illustrate how our algorithm improves upon state-of-the-art methods.
|
10 |
Feature extraction and supervised learning on fMRI : from practice to theory / Estimation de variables et apprentissage supervisé en IRMf : de la pratique à la théorie. Pedregosa-Izquierdo, Fabian, 20 February 2015 (has links)
Until the advent of non-invasive neuroimaging modalities, knowledge of the human brain came from the study of its lesions, post-mortem analyses and invasive experimentation. Nowadays, modern imaging techniques such as fMRI reveal several aspects of the human brain at progressively higher spatio-temporal resolution. However, in order to answer increasingly complex neuroscientific questions, the technical improvements in acquisition must be matched with novel data analysis methods. In this thesis we examine different applications of machine learning to the processing of fMRI data. We propose novel extensions and investigate the theoretical properties of different models. The goal of an fMRI experiment is to answer a neuroscientific question. However, it is usually not possible to perform hypothesis testing directly on the data output by the fMRI scanner. Instead, fMRI data enters a processing pipeline in which it undergoes several transformations before conclusions are drawn. Often the data acquired through the fMRI scanner passes through a feature extraction step in which time-independent activation coefficients are extracted from the fMRI signal.
The first contribution of this thesis is the introduction of a model named Rank-1 GLM (R1-GLM) for the joint estimation of time-independent activation coefficients and the hemodynamic response function (HRF). We quantify the improvement of this approach with respect to existing procedures on different fMRI datasets. The second part of this thesis is devoted to the problem of fMRI-based decoding, i.e., the task of predicting some information about the stimuli from brain activation maps. From a statistical standpoint, this problem is challenging due to the high dimensionality of the data, often thousands of variables, while the number of images available for training is small, typically a few hundred. We examine the case in which the target variable consists of discretely ordered values. The second contribution of this thesis is to propose the following two metrics to assess the performance of a decoding model: the absolute error and pairwise disagreement. We describe several models that optimize a convex surrogate of these loss functions and examine their performance on different fMRI datasets. Motivated by the success of some ordinal regression models for the task of fMRI-based decoding, we turn to studying some theoretical properties of these methods. The property we investigate is known as consistency, or Fisher consistency, and relates the minimization of a loss to the minimization of its surrogate. The third, and most theoretical, contribution of this thesis is to examine the consistency properties of a rich family of surrogate loss functions that are used in the context of ordinal regression. We give sufficient conditions for the consistency of the surrogate loss functions considered. This allows us to give theoretical reasons for some empirically observed differences in performance between surrogates.
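The two decoding metrics named here can be sketched as follows; the handling of predicted ties is one plausible choice and may differ from the thesis's exact definitions:

```python
import numpy as np

def absolute_error(y_true, y_pred):
    """Mean absolute difference between ordinal labels."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def pairwise_disagreement(y_true, y_pred):
    """Fraction of pairs with y_true[i] < y_true[j] whose predictions are
    not strictly increasing (predicted ties count as disagreements here)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    i, j = np.where(y_true[:, None] < y_true[None, :])
    return float(np.mean(y_pred[i] >= y_pred[j]))

# Toy ordinal labels: one pair of adjacent categories is swapped.
y_true = [1, 2, 3, 3]
y_pred = [1, 3, 2, 3]
```

Absolute error penalizes by distance on the ordinal scale, while pairwise disagreement only looks at ordering, so the two can rank decoding models differently.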
|