Global ETD Search

11	Learning Algorithms Using Chance-Constrained Programs Jagarlapudi, Saketha Nath 07 1900 (has links) This thesis explores Chance-Constrained Programming (CCP) in the context of learning. It is shown that chance-constraint approaches lead to improved algorithms for three important learning problems: classification with specified error rates, large dataset classification and Ordinal Regression (OR). Using moments of training data, the CCPs are posed as Second Order Cone Programs (SOCPs). Novel iterative algorithms for solving the resulting SOCPs are also derived. Borrowing ideas from robust optimization theory, the proposed formulations are made robust to moment estimation errors. A maximum margin classifier with specified false positive and false negative rates is derived. The key idea is to employ chance-constraints for each class, which imply that the actual misclassification rates do not exceed the specified rates. The formulation is applied to the case of biased classification. The problems of large dataset classification and ordinal regression are addressed by deriving formulations which employ chance-constraints for clusters in training data rather than constraints for each data point. Since the number of clusters can be substantially smaller than the number of data points, the resulting formulation size and number of inequalities are very small. Hence the formulations scale well to large datasets. The scalable classification and OR formulations are extended to feature spaces, and the kernelized duals turn out to be instances of SOCPs with a single cone constraint. Exploiting this special structure, fast iterative solvers that outperform generic SOCP solvers are proposed. Compared to state-of-the-art learners, the proposed algorithms achieve speedups of up to 10,000 times when the specialized SOCP solvers are employed. The proposed formulations involve second order moments of data and hence are susceptible to moment estimation errors. 
A generic way of making the formulations robust to such estimation errors is illustrated. Two novel confidence sets for moments are derived, and it is shown that when either of the confidence sets is employed, the robust formulations also yield SOCPs. Machine Learning Classification Dataset Classification Ordinal Regression (OR) Chance-Constrained Programming (CCP) Classification - Algorithms Ordinal Regression - Algorithms Machine Learning - Algorithms Second Order Cone Programs (SOCPs) Maximum Margin Classification Focused Crawling Large Datasets Error Rates Computer Science
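For orientation, the step from a moment-based chance constraint to a second order cone constraint can be sketched with the standard multivariate Chebyshev bound. The symbols w, b, \mu, \Sigma and \eta are notation assumed here for illustration; the thesis's own formulations may differ in detail.

```latex
% Chance constraint for one class (or cluster) with mean \mu and covariance \Sigma,
% required to hold for every distribution of x having these two moments:
\[
  \Pr\!\left( w^{\top} x + b \ge 1 \right) \ge \eta
\]
% By the multivariate Chebyshev bound this holds whenever
\[
  w^{\top} \mu + b \;\ge\; 1 + \kappa \sqrt{w^{\top} \Sigma\, w},
  \qquad \kappa = \sqrt{\frac{\eta}{1 - \eta}},
\]
% which is a second order cone constraint in (w, b).
```

Because the constraint depends on the data only through \mu and \Sigma, one such constraint can stand in for a whole cluster of points, which is consistent with the scalability argument in the abstract.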
12	Modelos de regressão para variáveis categóricas ordinais com aplicações ao problema de classificação / Regression models for ordinal categorical variables with applications to the classification problem Okura, Roberta Irie Sumi 11 April 2008 (has links) In this work, we present some methods for analysing data with an ordinal categorical response variable. We describe the main Regression Models currently in use that account for the ordering of the response categories, among them Cumulative Models and Sequential Models. We also discuss the problem of discriminating and classifying elements into ordinal groups, commenting on the most common predictors for this kind of data. We further present the technique known as Optimal Discriminant Analysis and its improved version, based on the use of bootstrap methods. Finally, we apply some of the described techniques to real financial data, with the aim of classifying prospective customers, at the time they acquire a credit card, as high, medium or low risk. 
For this application, we discuss the advantages and disadvantages of the models used in terms of quality of classification. classification ordinal categorical variables ordinal discrimination ordinal regression models
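As a rough illustration of the Cumulative Models mentioned above, the sketch below computes class probabilities under a cumulative (proportional-odds) logit model. The coefficients, thresholds and feature values are made up for the example and are not taken from the thesis.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_logit_probs(x, beta, thresholds):
    """Class probabilities under a cumulative logit model.

    Assumes P(Y <= j | x) = sigmoid(theta_j - beta . x) with increasing
    thresholds theta_j; class probabilities are successive differences.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    cum = [sigmoid(t - eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Hypothetical three-class example (e.g. low / medium / high risk).
probs = cumulative_logit_probs(x=[0.5, 1.2], beta=[0.8, -0.4], thresholds=[-1.0, 1.0])
predicted = max(range(len(probs)), key=probs.__getitem__)
```

A single coefficient vector is shared across all thresholds, which is what lets the model respect the ordering of the categories.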
14	Oversampling Methods for Imbalanced Dataset Classification and their Application to Gynecological Disorder Diagnosis Nekooeimehr, Iman 29 June 2016 (has links) In many applications, the dataset for classification may be highly imbalanced where most of the instances in the training set may belong to some of the classes (majority classes), while only a few instances are from the other classes (minority classes). Conventional classifiers will strongly favor the majority class and ignore the minority instances. The imbalance problem can occur in both binary data classification and also in ordinal regression. Ordinal regression is a supervised approach for learning the ordinal relationship between classes. Extensive research has been performed for addressing imbalanced datasets for binary classification; however, current methods do not address within-class imbalance and between-class imbalance at the same time. Similarly, there has been very little research work on addressing imbalanced datasets for ordinal regression. Although current standard oversampling methods can be used to improve the dataset class distribution, they do not consider the ordinal relationship between the classes. The class imbalance problem is a big challenge in classification problems. Most of the clinical datasets are highly imbalanced, which can weaken the performance of classifiers significantly. In this research, the imbalanced dataset classification problem is also examined in the context of a clinical application, particularly pelvic organ prolapse diagnosis. Pelvic organ prolapse (POP) is a major health problem that affects between 30-50% of women in the U.S. Although clinical examination is currently used to diagnose POP, there is still little evidence on specific risk factors that are directly related to particular types of POP and their severity or stages (Stage 0-IV). 
Data from dynamic MRI related to the movement of pelvic organs has the potential to improve POP prediction, but it is currently analyzed manually, which limits its exploration and use to small datasets. Moreover, POP is a disorder with multiple stages that are ordinal and whose distribution is highly imbalanced. The main goal of this research is two-fold. The first goal is to design new oversampling methods for imbalanced datasets for both binary classification and ordinal regression. The second goal is to automatically track, segment, and classify the trajectory of multiple organs on dynamic MRI to quantitatively describe pelvic organ movement. The extracted image-based data along with the designed oversampling methods will be used to improve the diagnosis of POP. The proposed research consists of three major objectives: 1) to design a new oversampling technique for binary imbalanced dataset classification; 2) to design a novel oversampling technique for ordinal regression with imbalanced datasets; and 3) to design a two-stage method to automatically track and segment multiple pelvic organs on dynamic MRI for improving the prediction of multi-stage POP with imbalanced datasets. The proposed research aims to provide robust oversampling techniques and image processing models that can (1) effectively handle highly imbalanced datasets for both binary classification and ordinal regression, and (2) automatically track and segment multiple deformable structures for feature extraction from low-contrast and nonhomogeneous images and classify them using the resulting trajectories. This research will lay the foundation for a computer-aided decision support system that can automatically extract and analyze image and clinical data to improve the prediction of disorders where the dataset is highly imbalanced, through personalized and evidence-based assessment. 
Binary Classification Ordinal Regression Pelvic Organ Prolapse Object Tracking Trajectory Analysis Computer Sciences Industrial Engineering Medicine and Health Sciences
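To make the idea of oversampling concrete, here is a minimal sketch of a generic SMOTE-style interpolation between minority-class neighbours. It is not the method proposed in the dissertation, which additionally targets within-class imbalance and the ordinal relationship between classes; the function name, parameters and data are hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority points, sorted by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_like_oversample(minority, n_new=6)
```

Every synthetic point lies on a segment between two real minority points, so the class is densified without copying instances verbatim.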
15	Variable Selection for High-Dimensional Data with Error Control Fu, Han 23 September 2022 (has links) No description available. Biostatistics Public Health Statistics Genetics Variable selection false discovery rate ordinal regression survival analysis cure fraction knockoff filter
16	Podmíněnosti spokojenosti se životem v Česku se zaměřením na geografické faktory / Determinants of life satisfaction in Czechia with the focus on geographical factors Procházka, Petr January 2015 (has links) The objective of this thesis is to analyse determinants of subjective well-being in Czechia and to compare them with other empirical evidence from Czechia and abroad. The main theoretical approaches include those emphasising "psychological" factors and those emphasising factors outside the human personality. Data from the Public Opinion Research Centre, covering more than 2,000 respondents in Czechia from 2013 and 2014, were analysed statistically. Measures of so-called global and local subjective well-being were the dependent variables. Independent variables included "geographical" and demographic variables and other dummies. It was confirmed that people who live in more populated buildings, have lower spatial mobility, are older, have a lower employment status or are unemployed, have less education, or are left-wing oriented also tend to report lower subjective well-being. Gender and income had a variable effect on subjective well-being. Theoretical assumptions regarding settlement size, mode of commuting and religion were not confirmed.
17	Development of Wastewater Collection Network Asset Database, Deterioration Models and Management Framework Younis, Rizwan January 2010 (has links) The dynamics around managing urban infrastructure are changing dramatically. Today's infrastructure management challenges, in the wake of shrinking coffers and stricter stakeholders' requirements, include finding better condition assessment tools and prediction models, and effective and intelligent use of hard-earned data to ensure the sustainability of urban infrastructure systems. Wastewater collection networks, an important and critical component of urban infrastructure, have been neglected, and as a result, municipalities in North America and other parts of the world have accrued significant liabilities and infrastructure deficits. To reduce cost of ownership, to cope with heightened accountability, and to provide reliable and sustainable service, these systems need to be managed in an effective and intelligent manner. The overall objective of this research is to present a new strategic management framework and related tools to support multi-perspective maintenance, rehabilitation and replacement (M, R&R) planning for wastewater collection networks. The principal objectives of this research include: (1) Developing a comprehensive wastewater collection network asset database consisting of high quality condition assessment data to support the work presented in this thesis, as well as future research in this area. (2) Proposing a framework and related system to aggregate heterogeneous data from municipal wastewater collection networks to develop a better understanding of their historical and future performance. (3) Developing statistical models to understand the deterioration of wastewater pipelines. (4) Investigating how strategic management principles and theories can be applied to effectively manage wastewater collection networks, and proposing a new management framework and related system. 
(5) Demonstrating the application of the strategic management framework and economic principles along with the proposed deterioration model to develop long-term financial sustainability plans for wastewater collection networks. A relational database application, WatBAMS (Waterloo Buried Asset Management System), consisting of high quality data from the City of Niagara Falls wastewater collection system is developed. The wastewater pipelines' inspections were completed using a relatively new Side Scanner and Evaluation Technology camera that has advantages over the traditional Closed Circuit Television cameras. Appropriate quality assurance and quality control procedures were developed and adopted to capture, store and analyze the condition assessment data. To aggregate heterogeneous data from municipal wastewater collection systems, a data integration framework based on a data warehousing approach is proposed. A prototype application, BAMS (Buried Asset Management System), based on XML technologies and specifications shows an implementation of the proposed framework. Using wastewater pipeline condition assessment data from the City of Niagara Falls wastewater collection network, the limitations of ordinary and binary logistic regression methodologies for deterioration modeling of wastewater pipelines are demonstrated. Two new empirical models based on the ordinal regression modeling technique are proposed. A new multi-perspective (that is, operational/technical, social/political, regulatory, and finance) strategic management framework based on a modified balanced-scorecard model is developed. The proposed framework is based on the findings of the first Canadian National Asset Management workshop held in Hamilton, Ontario in 2007. The application of the balanced-scorecard model along with additional management tools, such as strategy maps, dashboard reports and business intelligence applications, is presented using data from the City of Niagara Falls. 
Using economic principles and example management scenarios, application of the Monte Carlo simulation technique along with the proposed deterioration model is presented to forecast financial requirements for long-term M, R&R plans for wastewater collection networks. A myriad of asset management systems and frameworks were found for transportation infrastructure. However, to date few efforts have been concentrated on understanding the performance behaviour of wastewater collection systems and developing effective and intelligent M, R&R strategies. Incomplete inventories, and the scarcity and poor quality of existing datasets on wastewater collection systems, were found to be critical and limiting issues in conducting research in this field. It was found that the existing deterioration models either violated model assumptions or relied on assumptions that could not be verified due to limited data of questionable quality. The degradation of Reinforced Concrete pipes was found to be affected by age, whereas for Vitrified Clay pipes the degradation was not age dependent. The results of the financial simulation model show that the City of Niagara Falls can save millions of dollars in the long term by following a pro-active M, R&R strategy. The work presented in this thesis provides an insight into how an effective and intelligent management system can be developed for wastewater collection networks. The proposed framework and related system will lead to the sustainability of wastewater collection networks and assist municipal public works departments to proactively manage their wastewater collection networks. Wastewater Collection Networks Deterioration Models Condition Assessment Database Ordinal Regression Cumulative Logit Model Continuation Ratio Model Data Integration Balanced Scorecard Management Framework Financial Sustainability Model Monte Carlo Simulations Civil Engineering
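The combination of a deterioration model with Monte Carlo simulation to forecast M, R&R spending can be sketched roughly as below. The deterioration rule here (a fixed yearly probability of dropping one condition grade) is a deliberately simple stand-in, not the ordinal regression models proposed in the thesis, and all numbers and names are hypothetical.

```python
import random


def simulate_total_cost(n_pipes, years, p_worsen, replace_cost, rng):
    """One simulated scenario: each pipe starts at grade 1 (good) and may
    worsen by one grade per year; a pipe reaching grade 5 is replaced."""
    states = [1] * n_pipes
    total = 0.0
    for _ in range(years):
        for i in range(n_pipes):
            if rng.random() < p_worsen:
                states[i] += 1
            if states[i] >= 5:
                total += replace_cost
                states[i] = 1  # replaced pipe returns to grade 1
    return total


def forecast(n_runs=500, seed=1, **scenario):
    """Mean and 95th-percentile total cost over many simulated scenarios."""
    rng = random.Random(seed)
    costs = sorted(simulate_total_cost(rng=rng, **scenario) for _ in range(n_runs))
    mean = sum(costs) / n_runs
    p95 = costs[int(0.95 * n_runs) - 1]
    return mean, p95


mean_cost, p95_cost = forecast(n_pipes=100, years=30, p_worsen=0.1, replace_cost=50_000.0)
```

Reporting a high percentile alongside the mean is one way to express the budget risk that a single deterministic forecast would hide.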
18	Exploring the Correlation Between Reading Ability and Mathematical Ability : KTH Master thesis report Sol, Richard, Rasch, Alexander January 2023 (has links) Reading and mathematics are two essential subjects for academic success and cognitive development. Several studies show a correlation between the reading ability and mathematical ability of pupils (Korpershoek et al., 2015; Ní Ríordáin & O’Donoghue, 2009; Reikerås, 2006; Walker et al., 2008). The didactical part of this thesis presents a study investigating a correlation between reading ability and mathematical ability among pupils in upper secondary schools in Sweden. This study collaborated with Lexplore AB to use machine learning and eye-tracking to measure reading ability. Mathematical ability was measured with Mathematics 1c grades and Stockholmsprovet, which is a diagnostic mathematics test. Although no correlation was found, there are several insights about selection and measures following the result that may improve future studies on the subject. This thesis finds that the result could have been affected by a biased selection of the participants. This thesis also suggests that the measure through machine learning and eye-tracking used in the study may not fully capture the concept of reading ability as defined in previous studies. The technological aspect of this thesis focuses on modifying and improving the model used to calculate users’ reading ability scores. As the model’s estimation tends to plateau after the fifth year of compulsory school, the study aims to maintain the same level of progression observed before this point. Previous research indicates that silent reading, being unconstrained by vocalization, is faster than reading aloud. To address this progression flattening, a grid search algorithm was employed to adjust hyperparameters and assign appropriate weight to silent and aloud reading. 
The findings emphasize that reading aloud should be prioritized in the weighted average and the corresponding hyperparameters adjusted accordingly. Furthermore, gathering more data for older pupils can improve the machine learning model by accounting for individual reading strategies. Introducing different word complexity factors can also enhance the model's performance. Reading ability Mathematical ability Model optimization Eye-tracking Machine-learning models Reading Fluency Reading comprehension Formative assessment Ordinal Regression Spearman's correlation coefficient Grid search Engineering and Technology
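The grid search over the weighting of aloud and silent reading might look like the sketch below. The abstract does not give the actual objective or hyperparameters, so the squared-error loss against target scores, the sample values and all names are assumptions made purely for illustration.

```python
def combined_score(aloud, silent, w_aloud):
    """Weighted average of an aloud-reading and a silent-reading measure."""
    return w_aloud * aloud + (1.0 - w_aloud) * silent


def grid_search_weight(samples, targets, grid):
    """Return the aloud-reading weight from grid that minimises the squared
    error between combined scores and (hypothetical) target scores."""
    def loss(w):
        return sum(
            (combined_score(aloud, silent, w) - t) ** 2
            for (aloud, silent), t in zip(samples, targets)
        )
    return min(grid, key=loss)


# Hypothetical (aloud, silent) measurements and target scores.
samples = [(100.0, 140.0), (120.0, 150.0), (90.0, 160.0)]
targets = [110.0, 127.5, 107.5]
grid = [i / 20 for i in range(21)]  # candidate weights 0.00, 0.05, ..., 1.00
best_w = grid_search_weight(samples, targets, grid)
```

An exhaustive grid is practical here because only one weight is tuned; with more hyperparameters the grid grows multiplicatively.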
19	Penalized mixed-effects ordinal response models for high-dimensional genomic data in twins and families Gentry, Amanda E. 01 January 2018 (has links) The Brisbane Longitudinal Twin Study (BLTS) was conducted in Australia and was funded by the US National Institute on Drug Abuse (NIDA). Adolescent twins were sampled as a part of this study and surveyed about their substance use as part of the Pathways to Cannabis Use, Abuse and Dependence project. The methods developed in this dissertation were designed for the purpose of analyzing a subset of the Pathways data that includes demographics, cannabis use metrics, personality measures, and imputed genotypes (SNPs) for 493 complete twin pairs (986 subjects). The primary goal was to determine what combination of SNPs and additional covariates may predict cannabis use, measured on an ordinal scale as "never tried," "used moderately," or "used frequently". To conduct this analysis, we extended the ordinal Generalized Monotone Incremental Forward Stagewise (GMIFS) method to mixed models. This extension allows an unpenalized set of covariates to be coerced into the model and provides flexibility for user-specified correlation patterns between twins in a family. The proposed methods are applicable to high-dimensional (genomic or otherwise) data with an ordinal response and a specific, known covariance structure within clusters. ordinal regression penalization mixed models twin modeling cannabis use GWAS Applied Statistics Biostatistics Categorical Data Analysis Medical Genetics Other Applied Mathematics Other Public Health Personality and Social Contexts Psychiatric and Mental Health Statistical Models Substance Abuse and Addiction
