  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
191

A nonlinear appearance model for age progression

Bukar, Ali M., Ugail, Hassan 15 October 2017 (has links)
Recently, automatic age progression has gained popularity due to its numerous applications. Among these is the search for missing people: in the UK alone, up to 300,000 people are reported missing every year. Although many algorithms have been proposed, most methods are affected by image noise, illumination variations, and, most importantly, facial expressions. To this end we propose an age progression framework that utilizes the image de-noising and expression-normalizing capabilities of kernel principal component analysis (Kernel PCA). Here, Kernel PCA, a nonlinear form of PCA that explores higher-order correlations between input variables, is used to build a model that captures the shape and texture variations of the human face. The extracted facial features are then used to perform age progression via a regression procedure. To evaluate the performance of the framework, rigorous tests are conducted on the FGNET ageing database. Furthermore, the proposed algorithm is used to progress images of Mary Boyle, a six-year-old who went missing over 39 years ago and is considered Ireland's youngest missing person. The algorithm presented in this paper could potentially aid, among other applications, the search for missing people worldwide.
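As a rough illustration of the two-stage pipeline this abstract describes — Kernel PCA features followed by a regression step — the following scikit-learn sketch uses synthetic stand-in data. The array shapes, kernel, and ridge regressor are assumptions for illustration, not the authors' actual face model.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))          # stand-in for stacked shape+texture vectors
ages = rng.uniform(5, 60, size=200)     # stand-in for subject ages

# Step 1: Kernel PCA with an inverse transform, so a noisy or expressive face
# can be projected into the model and reconstructed (de-noised pre-image).
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=1e-2,
                 fit_inverse_transform=True)
Z = kpca.fit_transform(X)               # nonlinear facial features
X_denoised = kpca.inverse_transform(Z)  # pre-image reconstruction

# Step 2: regress the latent features against age; progression then amounts
# to moving along the fitted direction in feature space.
reg = Ridge(alpha=1.0).fit(Z, ages)
print(Z.shape, X_denoised.shape)
```

The key ingredient is `fit_inverse_transform=True`, which lets Kernel PCA map de-noised feature vectors back to image space.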
192

Model robust regression: combining parametric, nonparametric, and semiparametric methods

Mays, James Edward January 1995 (has links)
In obtaining a regression fit to a set of data, ordinary least squares regression depends directly on the parametric model formulated by the researcher. If this model is incorrect, a least squares analysis may be misleading. Alternatively, nonparametric regression (kernel or local polynomial regression, for example) has no dependence on an underlying parametric model, but instead depends entirely on the distances between regressor coordinates and the prediction point of interest. This procedure avoids the necessity of a reliable model but, in using no information from the researcher, may fit irregular patterns in the data. The proper combination of these two regression procedures can overcome their respective problems. Considered is the situation where the researcher has an idea of which model should explain the behavior of the data, but this model is not adequate throughout the entire range of the data. An extension of partial linear regression and two methods of model robust regression are developed and compared in this context. These methods involve parametric fits to the data and nonparametric fits to either the data or the residuals. The two fits are then combined in the most efficient proportions via a mixing parameter. Performance is based on bias and variance considerations. / Ph. D.
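A minimal sketch of the residual-based combination described above: a parametric (here, straight-line) fit plus a mixing fraction of a kernel smooth of its residuals. The bandwidth, kernel, and mixing value are illustrative choices, not the thesis's fitted quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 60)
# The researcher's line is misspecified: the truth has a sinusoidal bump.
y = 2 + 3 * x + 0.8 * np.sin(6 * x) + rng.normal(0, 0.1, 60)

# Parametric fit (ordinary least squares on the assumed model).
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_par = X @ beta
resid = y - y_par

# Nonparametric (Nadaraya-Watson) smooth of the residuals.
def nw_smooth(x0, xs, ys, h=0.08):
    w = np.exp(-0.5 * ((x0 - xs) / h) ** 2)
    return np.sum(w * ys) / np.sum(w)

y_res = np.array([nw_smooth(xi, x, resid) for xi in x])

lam = 0.9                      # mixing parameter in [0, 1]
y_mrr = y_par + lam * y_res    # model robust fit

sse_par = np.sum((y - y_par) ** 2)
sse_mrr = np.sum((y - y_mrr) ** 2)
print(sse_par, sse_mrr)
```

With `lam = 0` this reduces to the pure parametric fit, and with `lam = 1` the residual smooth fully corrects the local lack of fit; the thesis's contribution concerns choosing that proportion efficiently.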
193

Breaking the barrier? Women’s career in a male dominated profession: A quantitative study of the Swedish Armed Forces

Wennman, Marica January 2024 (has links)
Gender-segregated labor markets remain a barrier to economic equality, significantly contributing to the persistent income disparities between men and women. While extensive literature has documented the prevalent wage gaps, it often attributes these disparities to educational attainment, career tenure, and familial obligations. This thesis focuses on the Swedish Armed Forces, a predominantly male-dominated organization, providing unique insights into organizational structures not extensively documented in the current literature. Using individual-level data from Statistics Sweden, a linear regression analysis investigates the gender wage gap, followed by a quantile regression to further explore gender variations across the income distribution. The results reveal a significant gender wage gap, although it has decreased over time and can be explained by individual characteristics. Persistent wage disparities in the labor market can often be attributed to the ongoing gender segregation in certain professions, where women, as minorities, tend to earn less. This uneven distribution, where men predominantly occupy higher-ranked and higher-paid positions, exacerbates income inequality. This structural imbalance not only reflects existing societal norms but also highlights the economic impact of occupational segregation, which continues to disadvantage women.
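The quantile-regression step described above can be sketched as a linear program minimizing the pinball loss. The wage data below are simulated with a constant gap, purely to show the mechanics; they are not the Statistics Sweden microdata.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_reg(X, y, tau):
    """Fit beta for the tau-th conditional quantile via the standard LP form."""
    n, p = X.shape
    # variables: beta (free), u_plus >= 0, u_minus >= 0
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])   # X beta + u+ - u- = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(2)
n = 200
male = rng.integers(0, 2, n)
wage = 30 + 5 * male + rng.normal(0, 2, n)   # a simulated, constant "gap" of 5
X = np.column_stack([np.ones(n), male])

for tau in (0.1, 0.5, 0.9):
    print(tau, quantile_reg(X, wage, tau))
```

Fitting at several `tau` values is what lets the thesis compare the gap at the bottom, middle, and top of the income distribution rather than only at the mean.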
194

A Sequential Modeling Approach to Explain Complex Processes and Systems

Bae, Eric 12 August 2024 (has links)
The ability to accurately predict the critical quality characteristics of aircraft engines is essential for modeling the degradation of engine performance over time. The acceptable margins for error grow smaller with each new generation of engines. This paper focuses on turbine gas temperature (TGT). The goal is to improve the first-principles predictions through the incorporation of the pure thermodynamics, as well as available information from the engine health monitoring (EHM) data and appropriate maintenance records. The first step in the approach is to develop a proper thermodynamics model to explain and to predict the observed TGTs. The resulting residuals provide the fundamental information on degradation. The current engineering models are ad hoc adaptations of the underlying thermodynamics, not properly tuned by actual data. Interestingly, the pure thermodynamics model uses only two variables: atmospheric temperature and a critical pressure ratio. The resulting predictions of TGT are at least comparable, and sometimes superior, to those of these ad hoc models. The next steps recognize that there are multiple sources of variability, some nested within others. Examples include version to version of the engine, engine to engine within version, route to route across versions and engines, maintenance cycle to maintenance cycle within engine, and flight segment to flight segment within maintenance cycle. The EHM data provide an opportunity to explain the various sources of variability through appropriate regression models. Different EHM variables explain different contributions to the variability in the residuals, which provides fundamental insights into the causes of the degradation over time. The resulting combination of the pure thermodynamics model with proper modeling based on the EHM data yields significantly better predictions of the observed TGT, allowing analysts to see the impact of the causes of the degradation much more clearly.
/ Doctor of Philosophy / AEM is a major civilian aircraft gas turbine engine manufacturer, serving different airliners and airlines. However, one of its newest models has had performance issues; the engines degraded faster than the in-house model had anticipated, leading to more frequent maintenance and causing significant financial losses to the company. The key objectives of our research project are to produce a model with higher predictive capability than AEM's in-house predictive model (DTGT), and to develop a model selection algorithm that allows for direct comparisons among models of vastly different architectures. There are three major components to our research: 1) interdisciplinary studies merging the theory of thermodynamics and regression, 2) sequential modeling, and 3) the modified Mallows's Cp. We propose a layered sequential approach to the regression modeling, where one regression model is followed by another regression on the residuals of the previous model. We also propose the modified Mallows's Cp, a modification of Mallows's Cp, as a viable model selection criterion. Our results demonstrate that the sequential approach both outperformed AEM's in-house model and proved more useful than traditional multiple linear regression. Our results also demonstrate that the modified Mallows's Cp prefers a smaller number of parameters than other standard model selection criteria without sacrificing the predictive capability of its models.
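The layered sequential approach can be sketched in miniature: a first-stage model using only the two "thermodynamic" inputs, then a second regression fitted to its residuals using an extra monitoring covariate. Variable names, coefficients, and data here are invented for illustration, not the AEM engine data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
atm_temp = rng.normal(15, 10, n)        # atmospheric temperature (stand-in)
press_ratio = rng.normal(30, 3, n)      # critical pressure ratio (stand-in)
wear = rng.uniform(0, 1, n)             # stand-in EHM degradation covariate
tgt = 600 + 2.0 * atm_temp + 4.0 * press_ratio + 25.0 * wear + rng.normal(0, 3, n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: "thermodynamics" model with only the two physical variables.
X1 = np.column_stack([np.ones(n), atm_temp, press_ratio])
b1 = ols(X1, tgt)
resid1 = tgt - X1 @ b1                  # degradation signal left unexplained

# Stage 2: regress the stage-1 residuals on the EHM covariate.
X2 = np.column_stack([np.ones(n), wear])
b2 = ols(X2, resid1)
final_pred = X1 @ b1 + X2 @ b2

rmse1 = np.sqrt(np.mean(resid1 ** 2))
rmse2 = np.sqrt(np.mean((tgt - final_pred) ** 2))
print(rmse1, rmse2)
```

The point of the layering is interpretability: the stage-1 residuals isolate what the physics cannot explain, so the stage-2 coefficients speak directly to degradation.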
195

New Methods for Learning from Heterogeneous and Strategic Agents

Divya, Padmanabhan January 2017 (has links) (PDF)
In this doctoral thesis, we address several representative problems that arise in the context of learning from multiple heterogeneous agents. These problems are relevant to many modern applications such as crowdsourcing and internet advertising. In scenarios such as crowdsourcing, a planner is interested in learning a task, and a set of noisy agents provide the training data for this learning task. Any learning algorithm making use of the data provided by these noisy agents must account for their noise levels. The noise levels of the agents are unknown to the planner, leading to a non-trivial difficulty. Further, the agents are heterogeneous, as they differ in their noise levels. A key challenge in such settings is to learn the noise levels of the agents while simultaneously learning the underlying model. Another challenge arises when the agents are strategic. For example, when the agents are required to perform a task, they could be strategic about the effort they put in. As another example, when required to report the costs incurred in performing the task, the agents could be strategic and may not report the costs truthfully. In general, the performance of learning algorithms can be severely affected if the information elicited from the agents is incorrect. We address these challenges in the following representative learning problems. Multi-label Classification from Heterogeneous Noisy Agents: Multi-label classification is a well-known supervised machine learning problem where each instance is associated with multiple classes. Since several labels can be assigned to a single instance, one of the key challenges in this problem is to learn the correlations between the classes. We first assume labels from a perfect source and propose a novel topic model called Multi-Label Presence-Absence Latent Dirichlet Allocation (ML-PA-LDA).
In the current day scenario, a natural source for procuring the training dataset is through mining user-generated content or directly through users in a crowdsourcing platform. In the more practical scenario of crowdsourcing, an additional challenge arises as the labels of the training instances are provided by noisy, heterogeneous crowd-workers with unknown qualities. With this as the motivation, we further adapt our topic model to the scenario where the labels are provided by multiple noisy sources and refer to this model as ML-PA-LDA-MNS (ML-PA-LDA with Multiple Noisy Sources). With experiments on standard datasets, we show that the proposed models achieve superior performance over existing methods. Active Linear Regression with Heterogeneous, Noisy and Strategic Agents: In this work, we study the problem of training a linear regression model by procuring labels from multiple noisy agents or crowd annotators, under a budget constraint. We propose a Bayesian model for linear regression from multiple noisy sources and use variational inference for parameter estimation. When labels are sought from agents, it is important to minimize the number of labels procured as every call to an agent incurs a cost. Towards this, we adopt an active learning approach. In this specific context, we prove the equivalence of well-studied criteria of active learning such as entropy minimization and expected error reduction. For the purpose of annotator selection in active learning, we observe a useful connection with the multi-armed bandit framework. Due to the nature of the distribution of the rewards on the arms, we resort to the Robust Upper Confidence Bound (UCB) scheme with truncated empirical mean estimator to solve the annotator selection problem. This yields provable guarantees on the regret. We apply our model to the scenario where annotators are strategic and design suitable incentives to induce them to put in their best efforts.
Ranking with Heterogeneous Strategic Agents: We look at the problem where a planner must rank multiple strategic agents, a problem with many applications including sponsored search auctions (SSA). Stochastic multi-armed bandit (MAB) mechanisms have been used in the literature to solve this problem. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of Ω(T^(2/3)), where T is the number of time steps. This happens because these mechanisms address the worst-case scenario, where the means of the agents' stochastic rewards are separated by a very small amount that depends on T. We instead take a detour and allow the planner to indicate the resolution, Δ, with which the agents must be distinguished. This immediately leads us to introduce the notion of Δ-regret. We propose a dominant strategy incentive compatible (DSIC) and individually rational (IR) deterministic MAB mechanism, based on ideas from the Upper Confidence Bound (UCB) family of MAB algorithms. The proposed mechanism, Δ-UCB, achieves a Δ-regret of O(log T). We first establish the results for single-slot SSA and then non-trivially extend them to the case of multi-slot SSA.
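A compact sketch of the UCB index underlying the mechanisms above. This is plain UCB1 on bounded Bernoulli rewards, not the thesis's Robust UCB with truncated means or the Δ-UCB mechanism itself; the arm means are invented.

```python
import math
import random

def ucb_run(means, horizon, seed=4):
    """Run UCB1 on Bernoulli arms; return how often each arm was pulled."""
    random.seed(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                          # play each arm once to start
        else:
            # index = empirical mean + exploration bonus
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if random.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb_run([0.3, 0.5, 0.8], horizon=5000)
print(counts)
```

The exploration bonus shrinks as an arm is sampled, so pulls concentrate on the best agent while the regret from the others grows only logarithmically in the horizon — the behavior the Δ-regret bound refines.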
196

Modeling Melodic Accents in Jazz Solos / Modellering av melodiska accenter i jazzsolon

Berrios Salas, Misael January 2023 (has links)
This thesis looks at how accurately one can model accents in jazz solos, more specifically the sound level. Further understanding the structure of jazz solos can give a way of pedagogically presenting differences within music styles and even between performers. Some studies have tried to model perceived accents in different music styles, that is, to model how listeners perceive some tones as accentuated and more important than others. Other studies have looked at how the sound level correlates with other attributes of the tone. But to our knowledge, no other studies have modeled actual accents within jazz solos, nor have other studies had such a large amount of training data. The training data used is a set of 456 solos from the Weimar Jazz Database, a database containing tone data and metadata from monophonic solos performed with multiple instruments. The features used for the training algorithms are features obtained from the software Director Musices, created at the Royal Institute of Technology in Sweden; features obtained from the software "melfeature", created at the University of Music Franz Liszt Weimar in Germany; and features built upon tone data or solo metadata from the Weimar Jazz Database. A comparison between these is made. Three learning algorithms are used: Multiple Linear Regression (MLR), Support Vector Regression (SVR), and eXtreme Gradient Boosting (XGBoost). The first two are simpler regression models, while the last is an award-winning tree boosting algorithm. The tests resulted in XGBoost having the highest accuracy when combining all the available features, minus some features that were removed since they did not improve the accuracy. The accuracy was around 27% with a high standard deviation, indicating considerable variation between solos: some were predicted with an accuracy of about 67%, while in others not a single tone was predicted correctly.
But as a general model, the accuracy is too low for practical use. Either the methods were not optimal, or jazz solos differ too much for a general pattern to be found. / Detta examensarbete undersöker hur väl man kan modellera accenter i jazzsolos, mer specifikt ljudnivån. En bredare förståelse för strukturen i jazzsolos kan ge ett sätt att pedagogiskt presentera skillnaderna mellan olika musikstilar och även mellan olika artister. Andra studier har försökt modellera uppfattade accenter inom olika musikstilar. Det vill säga, modellera hur åhörare upplever vissa toner som accentuerade och viktigare än andra. Andra studier har undersökt hur ljudnivån är korrelerad till andra attribut hos tonen. Men såvitt vi vet, så finns det inga andra studier som modellerar faktiska accenter inom jazzsolos, eller som haft samma stora mängd träningsdata. Träningsdatan som använts är ett set av 456 solos tagna från Weimar Jazz Database. Databasen innehåller data på toner och metadata från monofoniska solos genomförda med olika instrument. Särdragen som använts för träningsalgoritmerna är särdrag erhållna från mjukvaran Director Musices skapad på Kungliga Tekniska Högskolan i Sverige; särdrag erhållna från mjukvaran "melfeature" skapad på University of Music Franz Liszt Weimar i Tyskland; och särdrag skapade utifrån datat i Weimar Jazz Database. En jämförelse mellan dessa har också gjorts. Tre inlärningsalgoritmer har använts, Multiple Linear Regression (MLR), Support Vector Regression (SVR), och eXtreme Gradient Boosting (XGBoost). De första två är enklare regressionsalgoritmer, medan den senare är en prisbelönt trädförstärkningsalgoritm. Testen resulterade i att eXtreme Gradient Boosting (XGBoost) skapade en modell med högst noggrannhet givet alla tillgängliga särdrag som träningsdata minus vissa särdrag som tagits bort då de inte förbättrar noggrannheten. Den erhållna noggrannheten låg på runt 27% med en hög standardavvikelse. Detta pekar på att det finns stora skillnader mellan att förutsäga ljudnivån mellan de olika solin. Vissa solin gav en noggrannhet på runt 67% medan andra inte erhöll en endaste ljudnivå korrekt i hela solot. Men som en generell modell är noggrannheten för låg för att användas i praktiken. Antingen är de valda metoderna inte de bästa, eller så är jazzsolin för olika för att hitta ett generellt mönster som går att förutsäga.
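The model comparison in this abstract can be illustrated on synthetic data. scikit-learn's `GradientBoostingRegressor` stands in for XGBoost here, and the "tone features" and target are simulated — not the Weimar Jazz Database features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 400
X = rng.normal(size=(n, 6))                      # stand-in tone features
# A nonlinear "sound level" target that a purely linear model cannot capture.
y = X[:, 0] * X[:, 1] + np.sin(2 * X[:, 2]) + 0.1 * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

lin = LinearRegression().fit(X_tr, y_tr)
gb = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

mse_lin = np.mean((lin.predict(X_te) - y_te) ** 2)
mse_gb = np.mean((gb.predict(X_te) - y_te) ** 2)
print(mse_lin, mse_gb)
```

Tree boosting picks up the feature interactions that MLR misses, which is consistent with the thesis finding XGBoost most accurate among the three algorithms.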
197

The relationship between volatility of price multiples and volatility of stock prices : A study of the Swedish market from 2003 to 2012

Yang, Yue, Gonta, Viorica January 2013 (has links)
The purpose of our study was to examine the relationship between the volatility of price multiples and the volatility of stock prices in the Swedish market from 2003 to 2012. Our focus was on the price-to-earnings ratio and the price-to-book ratio. Some previous studies showed a link between price multiples and the volatility of stock prices; this made us question whether there should also be a link between the volatility of the price multiples and the volatility of the stock prices. The importance of this subject is accentuated by the financial crisis, as we provide investors with information regarding the movements of price multiples and stock prices. Moreover, we test whether the volatility of the price multiples can be used to create a prediction model for the volatility of stock prices. We also fill a gap, as there is no previous literature on this topic. We conducted a quantitative study using statistical tests, such as the correlation test and the linear regression test. For our data sample we chose the Sweden Datastream index. We first calculated the volatility using the GARCH model and then continued with our statistical tests. The results of our tests show that there is a relationship between the volatility of the price multiples and the volatility of the stock prices in the Swedish market over the past ten years. Our findings show that the correlation coefficients vary across industries and over time in both strength and direction. The second part of our tests concerns the linear regression tests, mainly calculating the coefficient of determination. Our results show that the volatility of the price multiples does explain changes in the volatility of stock prices. Thus, the volatility of the P/E ratio and the volatility of the P/B ratio can be used in creating a prediction model for the volatility of stock prices. Nevertheless, we also find that this model is best suited to periods when the economic situation is unstable (i.e. crisis, bad economic outlook), as both the correlation coefficient and the coefficient of determination had their highest values in the last five years, with a peak in 2008.
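A hedged sketch of the volatility step: the GARCH(1,1) conditional-variance recursion applied to a simulated return series, followed by a correlation between two volatility series. The parameters (omega, alpha, beta) and the return processes are assumed values for illustration, not estimates from the Swedish data.

```python
import numpy as np

def garch11_variance(returns, omega=0.05, alpha=0.1, beta=0.85):
    """sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]"""
    sigma2 = np.empty_like(returns)
    sigma2[0] = np.var(returns)
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(6)
r_stock = rng.normal(0, 1, 500)                            # stand-in stock returns
r_multiple = 0.7 * r_stock + 0.3 * rng.normal(0, 1, 500)   # related P/E "returns"

vol_stock = np.sqrt(garch11_variance(r_stock))
vol_multiple = np.sqrt(garch11_variance(r_multiple))

corr = np.corrcoef(vol_stock, vol_multiple)[0, 1]
print(round(corr, 2))
```

In the study proper the GARCH parameters are estimated by maximum likelihood per series; the fixed values here just demonstrate how a volatility series is produced before the correlation and regression tests.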
198

[en] ADJUSTING LOAD SERIES BY THE CALENDAR AND TEMPERATURE EFFECTS / [pt] AJUSTE DAS SÉRIES DE CARGA DE ENERGIA ELÉTRICA INFLUENCIADAS PELOS OFENSORES CALENDÁRIO E TEMPERATURA

THIAGO GOMES DE ARAUJO 08 January 2015 (has links)
[pt] O objetivo do presente trabalho é a geração de uma série mensal de carga elétrica livre das variações de calendário e de temperatura. Para tal, foram comparadas duas abordagens, uma totalmente empírica e outra híbrida com métodos empíricos e modelagens de regressão dinâmica, para identificar a mais adequada para a retirada desses ofensores. Os dados utilizados são provenientes de observações diárias de cada um dos quatro subsistemas que integram o Sistema Interligado Nacional (SIN), porém a ideia é produzir séries mensais do SIN e não apenas de cada um dos subsistemas. A série trimestral do PIB foi utilizada para decidir qual abordagem melhor ajustou os dados de Carga. A série mensal de carga ajustada do SIN será utilizada para subsidiar decisões, de compra e venda de energia nos leilões, das empresas distribuidoras de energia elétrica. / [en] This thesis proposes a method to generate a monthly load series free of variations coming from two sources: calendar and temperature. Two approaches were considered, one totally empirical and a hybrid one that uses an empirical procedure to remove the calendar effect and a dynamic regression model to remove the temperature effects. The data set comes from daily observations of each of the four subsystems that form the SIN (Brazilian Integrated Grid); however, the final task is to obtain a single monthly series for the SIN, not only the four subsystem monthly series. The quarterly GDP (PIB) series was used to check the performance of the two proposed methods. Such adjusted series are important tools to support the decisions of electricity distribution companies on the purchase and sale of energy in the energy auctions.
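An illustrative sketch of the adjustment idea: regress a monthly load series on calendar (working-day count) and temperature regressors, then subtract their fitted contributions to obtain an adjusted series. The data and coefficients are simulated, not the SIN subsystem series, and the real hybrid approach uses a richer dynamic regression.

```python
import numpy as np

rng = np.random.default_rng(7)
months = 120
trend = np.linspace(100, 140, months)                # underlying load level
workdays = rng.integers(19, 24, months)              # calendar driver
temp = 25 + 5 * np.sin(2 * np.pi * np.arange(months) / 12)  # seasonal temperature
load = (trend + 1.5 * (workdays - 21) + 0.8 * (temp - 25)
        + rng.normal(0, 1, months))

# Regression on the two "offenders": calendar and temperature.
X = np.column_stack([np.ones(months), workdays - 21, temp - 25])
beta = np.linalg.lstsq(X, load, rcond=None)[0]

# Remove only the calendar and temperature contributions; keep the level.
adjusted = load - beta[1] * (workdays - 21) - beta[2] * (temp - 25)
print(np.std(load - trend), np.std(adjusted - trend))
```

Centering the regressors (at 21 working days and 25 degrees) means the adjustment removes deviations from a reference month without shifting the overall level of the series.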
199

Modélisation du management des risques industriels et de la responsabilité sociale des entreprises : Cas des entreprises libanaises / Modeling the management of industrial risks and corporate social responsibility : The case of Lebanese companies

Bou Nader, Raymond 18 December 2017 (has links)
Notre thèse consiste à étudier la pratique actuelle de la RSE dans le contexte des compagnies libanaises à caractère industriel, et à examiner la relation entre les pratiques RSE d’une part et le management des risques d’autre part, en utilisant des techniques de statistiques inférentielles, des analyses factorielles exploratoires et des modèles de régression linéaire multiple. C’est dans ce dernier cas que la contribution principale de cette recherche a été réalisée. Ainsi, cette recherche a permis de percevoir la RSE comme étant plus qu’un simple outil de marketing et de relations publiques mais aussi un vrai outil influant le risque dans les entreprises. Notre recherche élargit la base de connaissances dans ce domaine dans le contexte libanais, en mettant l’accent sur le management et les pratiques de l’entreprise en terme de gestion du risque, afin de mieux gérer par la RSE les impacts sociaux, environnementaux, et communautaires de leurs activités. Les résultats de cette étude permettront aux chercheurs de créer une base théorique et empirique plus forte sur laquelle les recherches futures sur le sujet de la RSE et du management des risques par la RSE peuvent être développées. / The aim of our thesis is to study the current practice of CSR in the context of Lebanese industrial companies and to examine the relationship between CSR practices and risk management, using statistical techniques such as inferential tests, exploratory factor analysis, and multiple linear regression models. It is in the latter that the main contribution of this research has been made. This research makes it possible to perceive CSR as more than just a marketing and public relations tool: it is also a real tool influencing risk in companies. Our research broadens the knowledge base in this field in the Lebanese context, focusing on the management and practices of companies in terms of risk management, in order to better manage, through CSR, the social, environmental, and community impacts of their activities.
The results of this study will enable researchers to create a stronger theoretical and empirical basis on which future research on the subject of CSR and risk management through CSR can be developed.
200

Estimation of Regression Coefficients under a Truncated Covariate with Missing Values

Reinhammar, Ragna January 2019 (has links)
By means of a Monte Carlo study, this paper investigates the relative performance of Listwise Deletion, the EM-algorithm and the default algorithm in the MICE-package for R (PMM) in estimating regression coefficients under a left truncated covariate with missing values. The intention is to investigate whether the three frequently used missing data techniques are robust against left truncation when missing values are MCAR or MAR. The results suggest that no technique is superior overall in all combinations of factors studied. The EM-algorithm is unaffected by left truncation under MCAR but negatively affected by strong left truncation under MAR. Compared to the default MICE-algorithm, the performance of EM is more stable across distributions and combinations of sample size and missing rate. The default MICE-algorithm is improved by left truncation but is sensitive to missingness pattern and missing rate. Compared to Listwise Deletion, the EM-algorithm is less robust against left truncation when missing values are MAR. However, the decline in performance of the EM-algorithm is not large enough for the algorithm to be completely outperformed by Listwise Deletion, especially not when the missing rate is moderate. Listwise Deletion might be robust against left truncation but is inefficient.
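A toy Monte Carlo in the spirit of the study above: a left-truncated covariate, MCAR missingness, and listwise-deletion OLS. The truncation point, missing rate, sample size, and replication count are illustrative choices, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(8)

def one_replication(n=500, missing_rate=0.3, truncate_at=-0.5):
    """Return the listwise-deletion OLS slope for one simulated data set."""
    x = rng.normal(size=n * 3)
    x = x[x > truncate_at][:n]            # left-truncated covariate
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    miss = rng.random(n) < missing_rate   # MCAR: missingness ignores x and y
    x_obs, y_obs = x[~miss], y[~miss]     # listwise deletion keeps complete cases
    X = np.column_stack([np.ones_like(x_obs), x_obs])
    return np.linalg.lstsq(X, y_obs, rcond=None)[0][1]

slopes = np.array([one_replication() for _ in range(200)])
print(slopes.mean())
```

Under MCAR, deleting cases at random leaves the slope estimator unbiased even with truncation, at the cost of efficiency — consistent with the paper's observation that listwise deletion is robust but inefficient. A MAR mechanism (missingness depending on x) is where the techniques start to diverge.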
