
Machine Learning Models in Fullerene/Metallofullerene Chromatography Studies

Liu, Xiaoyang 08 August 2019 (has links)
Machine learning methods are now extensively applied across scientific research to build models. Unlike conventional models, machine learning models take a data-driven approach: the algorithms can extract patterns from available data that are otherwise hard to recognize. This data-driven approach enlarges the role of algorithms and computers and accelerates computation by offering an alternative view of the problem. In this thesis, we explore the possibility of applying machine learning models to the prediction of chromatographic retention behavior. Chromatographic separation is a key technique for the discovery and analysis of fullerenes. In previous studies, differential equation models have achieved great success in predicting chromatographic retention. However, most differential equation models require experimental measurements or theoretical computations for many parameters, which are not easy to obtain. Fullerenes/metallofullerenes are rigid, spherical molecules built entirely from carbon atoms, which makes predicting their chromatographic retention behavior, as well as other properties, much simpler than for flexible molecules with many conformational variations. In this thesis, I propose that the polarizability of a fullerene molecule can be estimated directly from its structure. Structural motifs are used to simplify the model, and the motif-based models provide satisfying predictions. The data set contains 31947 isomers with their polarizability data and is split into a training set with 90% of the data points and a complementary testing set. In addition, a second testing set of large fullerene isomers is prepared; it is used to test whether a model trained on small fullerenes gives adequate predictions on large fullerenes. / Machine learning models can be applied in a wide range of areas, including scientific research.
In this thesis, machine learning models are applied to predict the chromatographic behavior of fullerenes from their molecular structures. Chromatography is a common technique for separating mixtures; the separation arises from differences in the interactions between molecules and a stationary phase. In real experiments, a mixture usually contains a large family of compounds, and identifying the target compound requires substantial work and resources. Models are therefore extremely important in chromatography studies. Traditional models are built on physical principles and involve several parameters, which must be measured experimentally or computed theoretically; both routes are time-consuming and difficult. For fullerenes, my previous studies have shown that the chromatography model can be simplified so that only one parameter, the polarizability, is required. A machine learning approach is introduced to enhance the model by predicting the molecular polarizabilities of fullerenes from their structures. The structure of a fullerene is represented by several local structures. Several types of machine learning models are built and tested on our data set, and the results show that a neural network gives the best predictions.
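The workflow the abstract describes (structure-derived features, a 90/10 train/test split, evaluation on held-out isomers) can be sketched in plain Python. Everything below is an invented stand-in: the two motif features, their coefficients, and the noise level are illustrative assumptions, not the thesis's actual data or model.

```python
import random

random.seed(0)

# Hypothetical setup: each isomer is summarized by counts of two local
# structural motifs; its polarizability is assumed roughly linear in
# those counts (coefficients below are invented for illustration).
def make_isomer():
    m1 = random.randint(20, 60)   # e.g. a pentagon-hexagon edge count
    m2 = random.randint(0, 12)    # e.g. an adjacent-pentagon pair count
    alpha = 1.8 * m1 + 0.7 * m2 + 80.0 + random.gauss(0.0, 0.5)
    return (m1, m2), alpha

data = [make_isomer() for _ in range(1000)]

# 90/10 split, as in the thesis.
cut = int(0.9 * len(data))
train, test = data[:cut], data[cut:]

# Least-squares fit of alpha ~ w1*m1 + w2*m2 + b via the normal
# equations, solved by Gauss-Jordan elimination (no numpy needed).
def lstsq(rows):
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for (m1, m2), y in rows:
        x = (m1, m2, 1.0)
        for i in range(3):
            xty[i] += x[i] * y
            for j in range(3):
                xtx[i][j] += x[i] * x[j]
    a = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, 4):
                    a[r][c] -= f * a[col][c]
    return [a[i][3] / a[i][i] for i in range(3)]

w1, w2, b = lstsq(train)
mae = sum(abs(w1 * m1 + w2 * m2 + b - y)
          for (m1, m2), y in test) / len(test)
print(round(mae, 2))  # mean absolute error on the held-out 10%
```

The same split logic extends to the thesis's second test: train only on small isomers, then evaluate on a separate set of larger ones.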

Three Essays on Econometric Modeling and Application: Health and Consumer Behaviors

Kim, Namhoon 18 April 2018 (has links)
In the three chapters of my dissertation, I use nonlinear econometric models to analyze individual behaviors, including health behaviors (vaccination and preventive care) and consumer behaviors (financial literacy), together with the corresponding interventions. In the first chapter, I propose an econometric model that investigates the effect of paid sick leave on workers' decision to receive the seasonal flu vaccination. For this investigation, I apply a Bayesian nonlinear structural regression model with one outcome equation and two endogenous equations. The estimation results indicate that having paid sick leave affects workers' vaccination decisions differently depending on their income levels. Low-income workers are willing to be vaccinated because they perceive the high cost of claiming paid sick leave. However, high-income workers are willing to be vaccinated because paid sick leave reduces the cost of vaccination for seasonal flu. In the second chapter, I propose new econometric regression models that investigate the effect of "Don't Know" or "Refuse" (DK/RF) responses on parameter identification. I estimate the effect of group characteristics and financial education on young respondents' objective financial knowledge and use my proposed models to uncover the actual effects and the biases. This study examines six questions about personal finance and selects covariates from the 2015 National Financial Capability Study (NFCS). Because these questions include DK/RF responses, a simple regression model that ignores them could lead to misleading conclusions about, for example, gender and income differences and the effectiveness of financial education in schools.
In the last chapter, I investigate the effect of three health-related interventions (a doctor's recommendation, information about human papillomavirus (HPV), and HPV vaccination) on the misuse of cervical cancer screening, including too-early screening, unnecessary HPV tests, annual Pap tests, and no Pap smear, practices that are not recommended for women younger than 30 years. I examine the National Health Interview Survey conducted in 2015 and apply binary and multinomial logistic regression models. From the estimation results, I observe that a doctor's recommendation plays a significant role in increasing the probability of receiving cervical cancer screening, while it also induces too-early screening, unnecessary HPV testing, and overuse of Pap smears. / Ph. D.
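The binary-outcome part of the modeling above can be illustrated with a minimal logistic regression fit by gradient ascent. This is a deliberately simplified sketch, not the dissertation's Bayesian structural model: the data-generating coefficients for income and the paid-sick-leave indicator are invented.

```python
import math
import random

random.seed(1)

# Hypothetical data: P(vaccinated) as a logistic function of
# standardized income and a paid-sick-leave (PSL) indicator.
# The true coefficients below are invented for illustration.
def simulate(n):
    rows = []
    for _ in range(n):
        income = random.gauss(0.0, 1.0)
        psl = 1.0 if random.random() < 0.5 else 0.0
        logit = -0.3 + 0.8 * psl + 0.5 * income
        p = 1 / (1 + math.exp(-logit))
        rows.append((income, psl, 1.0 if random.random() < p else 0.0))
    return rows

rows = simulate(5000)

# Maximize the log-likelihood with plain gradient ascent.
b0 = b_psl = b_inc = 0.0
lr = 0.5
for _ in range(400):
    g0 = g1 = g2 = 0.0
    for income, psl, y in rows:
        p = 1 / (1 + math.exp(-(b0 + b_psl * psl + b_inc * income)))
        g0 += y - p
        g1 += (y - p) * psl
        g2 += (y - p) * income
    n = len(rows)
    b0 += lr * g0 / n
    b_psl += lr * g1 / n
    b_inc += lr * g2 / n

print(round(b_psl, 2), round(b_inc, 2))  # recovered coefficients
```

A real analysis would, as in the chapter, model the endogeneity of paid sick leave rather than treat it as exogenous.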

Model robust regression: combining parametric, nonparametric, and semiparametric methods

Mays, James Edward January 1995 (has links)
In obtaining a regression fit to a set of data, ordinary least squares regression depends directly on the parametric model formulated by the researcher. If this model is incorrect, a least squares analysis may be misleading. Alternatively, nonparametric regression (kernel or local polynomial regression, for example) has no dependence on an underlying parametric model, but instead depends entirely on the distances between regressor coordinates and the prediction point of interest. This procedure avoids the necessity of a reliable model, but in using no information from the researcher, may fit to irregular patterns in the data. The proper combination of these two regression procedures can overcome their respective problems. Considered is the situation where the researcher has an idea of which model should explain the behavior of the data, but this model is not adequate throughout the entire range of the data. An extension of partial linear regression and two methods of model robust regression are developed and compared in this context. These methods involve parametric fits to the data and nonparametric fits to either the data or residuals. The two fits are then combined in the most efficient proportions via a mixing parameter. Performance is based on bias and variance considerations. / Ph. D.
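The combination described above can be sketched directly: a least-squares line (the parametric part, deliberately misspecified here), a Gaussian-kernel smoother (the nonparametric part), and a convex combination of the two governed by a mixing parameter. The data, bandwidth, and the grid-search choice of the mixing parameter are illustrative assumptions; the dissertation chooses the proportions from bias and variance considerations rather than raw training error.

```python
import math
import random

random.seed(2)

# Invented data: a linear trend plus a sinusoidal departure the
# parametric model misses.
xs = [i / 40 for i in range(41)]
ys = [0.5 + 2 * x + 1.5 * math.sin(6 * x) + random.gauss(0, 0.1)
      for x in xs]

# Parametric part: simple least-squares line.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
intercept = ybar - slope * xbar
par = [intercept + slope * x for x in xs]

# Nonparametric part: Nadaraya-Watson kernel smoother.
def kernel_fit(x0, h=0.05):
    ws = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)

nonpar = [kernel_fit(x) for x in xs]

# Combined fit: (1 - lam) * parametric + lam * nonparametric,
# with lam picked on a grid by residual sum of squares.
def sse(lam):
    return sum((y - ((1 - lam) * p + lam * k)) ** 2
               for y, p, k in zip(ys, par, nonpar))

lam = min((i / 20 for i in range(21)), key=sse)
print(lam, round(sse(lam), 3))
```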

Breaking the barrier? Women’s career in a male dominated profession: A quantitative study of the Swedish Armed Forces

Wennman, Marica January 2024 (has links)
Gender-segregated labor markets remain a barrier to economic equality, significantly contributing to the persistent income disparities between men and women. While extensive literature has documented the prevalent wage gaps, it often attributes these disparities to educational attainment, career tenure, and familial obligations. This thesis focuses on the Swedish Armed Forces, a predominantly male-dominated organization, providing unique insights into organizational structures not extensively documented in the current literature. Using individual-level data from Statistics Sweden, a linear regression analysis investigates the gender wage gap, followed by a quantile regression to further explore gender variations across the income distribution. The results reveal a significant gender wage gap, although it has decreased over time and can be explained by individual characteristics. Persistent wage disparities in the labor market can often be attributed to the ongoing gender segregation in certain professions, where women, as minorities, tend to earn less. This uneven distribution, in which men predominantly occupy higher-ranked and higher-paid positions, exacerbates income inequality. This structural imbalance not only reflects existing societal norms but also highlights the economic impact of occupational segregation, which continues to disadvantage women.
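The quantile-regression step rests on the check (pinball) loss: the constant that minimizes it is the sample tau-quantile, which is why different tau values describe different points of the wage distribution rather than only its mean. A tiny sketch with invented wage figures:

```python
# Invented monthly wages for illustration (not Statistics Sweden data).
wages = [21000, 23500, 24000, 26000, 27500, 29000, 31000, 36000, 52000]

# Check (pinball) loss for a candidate constant c at quantile level tau.
def pinball(c, tau):
    return sum(tau * (w - c) if w >= c else (1 - tau) * (c - w)
               for w in wages)

def quantile_fit(tau):
    # For a constant model the minimizer lies at a data point,
    # so an exhaustive search over the data suffices.
    return min(wages, key=lambda c: pinball(c, tau))

print(quantile_fit(0.5))   # the sample median
print(quantile_fit(0.9))   # the upper part of the distribution
```

In the full model the constant is replaced by a linear predictor (gender, tenure, rank, and so on), and minimizing the same loss at several tau values traces how the gender coefficient changes across the income distribution.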

New Methods for Learning from Heterogeneous and Strategic Agents

Divya, Padmanabhan January 2017 (has links) (PDF)
In this doctoral thesis, we address several representative problems that arise in the context of learning from multiple heterogeneous agents. These problems are relevant to many modern applications such as crowdsourcing and internet advertising. In scenarios such as crowdsourcing, a planner is interested in learning a task, and a set of noisy agents provides the training data for this learning task. Any learning algorithm making use of the data provided by these noisy agents must account for their noise levels. The noise levels of the agents are unknown to the planner, which leads to a non-trivial difficulty. Further, the agents are heterogeneous, as they differ in their noise levels. A key challenge in such settings is to learn the noise levels of the agents while simultaneously learning the underlying model. Another challenge arises when the agents are strategic. For example, when the agents are required to perform a task, they could be strategic about the effort they put in. As another example, when required to report the costs incurred in performing the task, the agents could be strategic and may not report the costs truthfully. In general, the performance of the learning algorithms could be severely affected if the information elicited from the agents is incorrect. We address these challenges in the following representative learning problems. Multi-label Classification from Heterogeneous Noisy Agents: Multi-label classification is a well-known supervised machine learning problem in which each instance is associated with multiple classes. Since several labels can be assigned to a single instance, one of the key challenges is to learn the correlations between the classes. We first assume labels from a perfect source and propose a novel topic model called Multi-Label Presence-Absence Latent Dirichlet Allocation (ML-PA-LDA).
Nowadays, a natural source for procuring a training dataset is mining user-generated content or collecting labels directly from users on a crowdsourcing platform. In the more practical crowdsourcing scenario, an additional challenge arises, as the labels of the training instances are provided by noisy, heterogeneous crowd-workers of unknown quality. Motivated by this, we further adapt our topic model to the scenario where the labels are provided by multiple noisy sources and refer to this model as ML-PA-LDA-MNS (ML-PA-LDA with Multiple Noisy Sources). With experiments on standard datasets, we show that the proposed models achieve superior performance over existing methods. Active Linear Regression with Heterogeneous, Noisy and Strategic Agents: In this work, we study the problem of training a linear regression model by procuring labels from multiple noisy agents or crowd annotators under a budget constraint. We propose a Bayesian model for linear regression from multiple noisy sources and use variational inference for parameter estimation. When labels are sought from agents, it is important to minimize the number of labels procured, as every call to an agent incurs a cost. Towards this, we adopt an active learning approach. In this specific context, we prove the equivalence of well-studied active learning criteria such as entropy minimization and expected error reduction. For annotator selection in active learning, we observe a useful connection with the multi-armed bandit framework. Due to the nature of the reward distributions on the arms, we resort to the Robust Upper Confidence Bound (UCB) scheme with a truncated empirical mean estimator to solve the annotator selection problem. This yields provable guarantees on the regret. We apply our model to the scenario where annotators are strategic and design suitable incentives to induce them to put in their best efforts.
Ranking with Heterogeneous Strategic Agents: We look at the problem where a planner must rank multiple strategic agents, a problem with many applications including sponsored search auctions (SSA). Stochastic multi-armed bandit (MAB) mechanisms have been used in the literature to solve this problem. Existing stochastic MAB mechanisms with a deterministic payment rule necessarily suffer a regret of Ω(T^(2/3)), where T is the number of time steps. This happens because these mechanisms address the worst-case scenario, where the means of the agents' stochastic rewards are separated by a very small amount that depends on T. We instead take a detour and allow the planner to indicate the resolution, Δ, with which the agents must be distinguished. This immediately leads us to introduce the notion of Δ-regret. We propose a dominant strategy incentive compatible (DSIC) and individually rational (IR) deterministic MAB mechanism, based on ideas from the Upper Confidence Bound (UCB) family of MAB algorithms. The proposed mechanism, Δ-UCB, achieves a Δ-regret of O(log T). We first establish the results for single-slot SSA and then non-trivially extend them to the case of multi-slot SSA.
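The bandit view of annotator selection can be sketched with the plain UCB1 index, the simplest member of the UCB family the thesis builds on. This is an illustrative stand-in, not the thesis's robust truncated-mean variant or its incentive-compatible mechanism; the annotator qualities are invented Bernoulli means.

```python
import math
import random

random.seed(3)

# Each "arm" is an annotator whose usefulness is a Bernoulli reward
# with an unknown mean (values below are invented).
means = [0.15, 0.4, 0.85]
T = 3000
counts = [0] * 3
sums = [0.0] * 3

for t in range(1, T + 1):
    if t <= 3:
        arm = t - 1   # pull each arm once to initialize
    else:
        # UCB1 index: empirical mean plus an exploration bonus that
        # shrinks as an arm accumulates pulls.
        arm = max(range(3), key=lambda a:
                  sums[a] / counts[a] +
                  math.sqrt(2 * math.log(t) / counts[a]))
    reward = 1.0 if random.random() < means[arm] else 0.0
    counts[arm] += 1
    sums[arm] += reward

print(counts)  # the best annotator should dominate the pull counts
```

In the Δ-regret setting, pulls of any arm whose mean lies within Δ of the best are not penalized, which is what allows the O(log T) guarantee.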

Modeling Melodic Accents in Jazz Solos / Modellering av melodiska accenter i jazzsolon

Berrios Salas, Misael January 2023 (has links)
This thesis looks at how accurately one can model accents in jazz solos, more specifically the sound level. Further understanding the structure of jazz solos can give a way of pedagogically presenting differences within music styles and even between performers. Some studies have tried to model perceived accents in different music styles, in other words, to model how listeners perceive some tones as somehow accentuated and more important than others. Other studies have looked at how the sound level correlates to other attributes of the tone. But to our knowledge, no other studies have modeled actual accents within jazz solos, nor have other studies had such a large amount of training data. The training data used is a set of 456 solos from the Weimar Jazz Database, a database containing tone data and metadata from monophonic solos performed on multiple instruments. The features used for the training algorithms are features obtained from the software Director Musices, created at the Royal Institute of Technology in Sweden; features obtained from the software "melfeature", created at the University of Music Franz Liszt Weimar in Germany; and features built upon tone data or solo metadata from the Weimar Jazz Database. A comparison between these is made. Three learning algorithms are used: Multiple Linear Regression (MLR), Support Vector Regression (SVR), and eXtreme Gradient Boosting (XGBoost). The first two are simpler regression models, while the last is an award-winning tree boosting algorithm. The tests resulted in eXtreme Gradient Boosting (XGBoost) having the highest accuracy when combining all the available features, minus some features that were removed since they did not improve the accuracy. The accuracy was around 27% with a high standard deviation, indicating considerable variation between solos: some were predicted with about 67% accuracy, while in others not a single tone was predicted correctly.
But as a general model, the accuracy is too low for practical use. Either the methods were not the optimal ones, or jazz solos differ too much for a general pattern to be found. / Detta examensarbete undersöker hur väl man kan modellera accenter i jazzsolon, mer specifikt ljudnivån. En bredare förståelse för strukturen i jazzsolon kan ge ett sätt att pedagogiskt presentera skillnaderna mellan olika musikstilar och även mellan olika artister. Andra studier har försökt modellera uppfattade accenter inom olika musikstilar. Det vill säga, modellera hur åhörare upplever vissa toner som accentuerade och viktigare än andra. Andra studier har undersökt hur ljudnivån är korrelerad till andra attribut hos tonen. Men såvitt vi vet, så finns det inga andra studier som modellerar faktiska accenter inom jazzsolon, eller som haft samma stora mängd träningsdata. Träningsdatan som använts är ett set av 456 solon tagna från Weimar Jazz Database. Databasen innehåller data på toner och metadata från monofoniska solon genomförda med olika instrument. Särdragen som använts för träningsalgoritmerna är särdrag erhållna från mjukvaran Director Musices skapad på Kungliga Tekniska Högskolan i Sverige; särdrag erhållna från mjukvaran "melfeature" skapad på University of Music Franz Liszt Weimar i Tyskland; och särdrag skapade utifrån datat i Weimar Jazz Database. En jämförelse mellan dessa har också gjorts. Tre inlärningsalgoritmer har använts, Multiple Linear Regression (MLR), Support Vector Regression (SVR), och eXtreme Gradient Boosting (XGBoost). De första två är enklare regressionsalgoritmer, medan den senare är en prisbelönt trädförstärkningsalgoritm. Testen resulterade i att eXtreme Gradient Boosting (XGBoost) skapade en modell med högst noggrannhet givet alla tillgängliga särdrag som träningsdata minus vissa särdrag som tagits bort då de inte förbättrar noggrannheten. Den erhållna noggrannheten låg på runt 27% med en hög standardavvikelse.
Detta pekar på att det finns stora skillnader mellan att förutsäga ljudnivån mellan de olika solin. Vissa solin gav en noggrannhet på runt 67% medan andra erhöll inte en endaste ljudnivå korrekt i hela solot. Men som en generell modell är noggrannheten för låg för att användas i praktiken. Antingen är de valda metoderna inte de bästa, eller så är jazzsolin för olika för att hitta ett generellt mönster som går att förutsäga.
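The linear-baseline-versus-boosting comparison above can be sketched with boosted regression stumps, a bare-bones stand-in for XGBoost. The nonlinear "sound level" target and the features are synthetic inventions for illustration only.

```python
import math
import random

random.seed(4)

# Invented one-dimensional data with a nonlinear target.
xs = [i / 60 for i in range(60)]
ys = [math.sin(5 * x) + 0.3 * x + random.gauss(0, 0.05) for x in xs]

def mse(pred):
    return sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)

# Baseline: simple linear regression.
n, xbar, ybar = len(xs), sum(xs) / len(xs), sum(ys) / len(ys)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar
linear_mse = mse([a + b * x for x in xs])

# Boosting: repeatedly fit a depth-1 stump to the current residuals.
def best_stump(res):
    best = None
    for s in xs:  # candidate split points
        left = [r for x, r in zip(xs, res) if x <= s]
        right = [r for x, r in zip(xs, res) if x > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + \
              sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    return best[1:]

pred = [ybar] * n
for _ in range(50):
    res = [y - p for y, p in zip(ys, pred)]
    s, lm, rm = best_stump(res)
    # Shrinkage of 0.3, in the spirit of gradient boosting.
    pred = [p + 0.3 * (lm if x <= s else rm)
            for x, p in zip(xs, pred)]

boosted_mse = mse(pred)
print(round(linear_mse, 4), round(boosted_mse, 4))
```

On a nonlinear target like this, the boosted model's training error drops well below the linear fit's, which mirrors why XGBoost came out ahead in the thesis's comparison.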

The relationship between volatility of price multiples and volatility of stock prices : A study of the Swedish market from 2003 to 2012

Yang, Yue, Gonta, Viorica January 2013 (has links)
The purpose of our study was to examine the relationship between the volatility of price multiples and the volatility of stock prices in the Swedish market from 2003 to 2012, focusing on the price-to-earnings ratio and the price-to-book ratio. Some previous studies showed a link between price multiples and the volatility of stock prices, which led us to ask whether there is also a link between the volatility of the price multiples and the volatility of the stock prices. The importance of this subject is accentuated by the financial crisis, as we provide investors with information regarding the movements of price multiples and stock prices. Moreover, we test whether the volatility of the price multiples can be used to build a prediction model for the volatility of stock prices. We also fill a gap, as there is no previous literature on this topic. We conducted a quantitative study using statistical tests such as correlation and linear regression. For our data sample we chose the Sweden Datastream index. We first estimated volatility using a GARCH model and then proceeded with the statistical tests. The results show that there has been a relationship between the volatility of the price multiples and the volatility of the stock prices in the Swedish market over the past ten years. The correlation coefficients vary across industries and over time in both strength and direction. The second part of our tests concerns the linear regressions, mainly the coefficient of determination. Our results show that the volatility of the price multiples does explain changes in the volatility of stock prices; thus, the volatility of the P/E ratio and the volatility of the P/B ratio can be used to build a prediction model for the volatility of stock prices. Nevertheless, we also find that this model is best suited to periods when the economic situation is unstable (i.e.
crisis or a bad economic outlook), as both the correlation coefficient and the coefficient of determination had their highest values in the last five years, peaking in 2008.
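The volatility step above can be sketched with the GARCH(1,1) recursion sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}, applied to a return series, followed by a Pearson correlation of two volatility series. The parameter values and the simulated "stock" and "P/E" returns below are invented; a real study would estimate omega, alpha, and beta by maximum likelihood.

```python
import math
import random

random.seed(5)

# GARCH(1,1) conditional volatility for a return series,
# with assumed (not estimated) parameters.
def garch_vol(returns, omega=0.00001, alpha=0.08, beta=0.9):
    s2 = omega / (1 - alpha - beta)   # start at unconditional variance
    out = []
    for r in returns:
        s2 = omega + alpha * r * r + beta * s2
        out.append(math.sqrt(s2))
    return out

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

# Hypothetical stand-ins for stock returns and P/E "returns",
# constructed with a common component so their volatilities co-move.
stock_r = [random.gauss(0, 0.02) for _ in range(500)]
pe_r = [0.7 * r + random.gauss(0, 0.015) for r in stock_r]

corr = pearson(garch_vol(stock_r), garch_vol(pe_r))
print(round(corr, 3))
```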

[en] ADJUSTING LOAD SERIES BY THE CALENDAR AND TEMPERATURE EFFECTS / [pt] AJUSTE DAS SÉRIES DE CARGA DE ENERGIA ELÉTRICA INFLUENCIADAS PELOS OFENSORES CALENDÁRIO E TEMPERATURA

THIAGO GOMES DE ARAUJO 08 January 2015 (has links)
[pt] O objetivo do presente trabalho é a geração de uma série mensal de carga elétrica livre das variações de calendário e de temperatura. Para tal, foram comparadas duas abordagens, uma totalmente empírica e outra híbrida com métodos empíricos e modelagens de regressão dinâmica, para identificar a mais adequada para a retirada desses ofensores. Os dados utilizados são provenientes de observações diárias de cada um dos quatro subsistemas que integram o Sistema Interligado Nacional (SIN), porém a ideia é produzir séries mensais do SIN e não apenas de cada um dos subsistemas. A série trimestral do PIB foi utilizada para decidir qual abordagem melhor ajustou os dados de Carga. A série mensal de carga ajustada do SIN será utilizada para subsidiar decisões, de compra e venda de energia nos leilões, das empresas distribuidoras de energia elétrica. / [en] This thesis proposes a method to generate a monthly load series free of variations coming from two sources: the calendar and the temperature. Two approaches were compared: one fully empirical, and a hybrid one that uses an empirical procedure to remove the calendar effect and a dynamic regression model to remove the temperature effect. The data set consists of daily observations from each of the four subsystems that form the SIN (the Brazilian Interligated Grid, Sistema Interligado Nacional). However, the final goal is to obtain a single monthly series for the SIN, not only the four subsystems' monthly series. The quarterly GDP (PIB) series was used to check the performance of the two proposed methods. The adjusted monthly load series is an important tool to support electricity distribution companies' decisions on buying and selling energy in the energy auctions.
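The calendar-adjustment idea can be illustrated by regressing the monthly load on the number of business days and subtracting the estimated calendar effect. The load figures and business-day counts below are invented; the thesis additionally removes a temperature effect via dynamic regression, which is omitted here for brevity.

```python
import random

random.seed(6)

# Invented one year of data: business days per month and a monthly
# load driven partly by those days (true effect: 40 units per day).
bdays = [22, 20, 21, 22, 21, 20, 23, 22, 21, 23, 20, 21]
load = [5000 + 40 * d + random.gauss(0, 15) for d in bdays]

# Simple regression of load on business days.
n = len(bdays)
dbar, lbar = sum(bdays) / n, sum(load) / n
beta = sum((d - dbar) * (l - lbar) for d, l in zip(bdays, load)) / \
       sum((d - dbar) ** 2 for d in bdays)

# Adjusted series: remove the deviation explained by the calendar.
adjusted = [l - beta * (d - dbar) for d, l in zip(bdays, load)]
print(round(beta, 1))  # estimated load per extra business day
```

The adjusted series is, by construction, less variable than the raw one, since the calendar-explained component has been removed.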

Modélisation du management des risques industriels et de la responsabilité sociale des entreprises : Cas des entreprises libanaises / Modeling the management of industrial risks and corporate social responsibility : The case of Lebanese companies

Bou Nader, Raymond 18 December 2017 (has links)
Notre thèse consiste à étudier la pratique actuelle de la RSE dans le contexte des compagnies libanaises à caractère industriel, et à examiner la relation entre les pratiques RSE d’une part et le management des risques d’autre part, en utilisant des techniques de statistiques inférentielles, des analyses factorielles exploratoires et des modèles de régression linéaire multiple. C’est dans ce dernier cas que la contribution principale de cette recherche a été réalisée. Ainsi, cette recherche a permis de percevoir la RSE comme étant plus qu’un simple outil de marketing et de relations publiques mais aussi un vrai outil influant sur le risque dans les entreprises. Notre recherche élargit la base de connaissances dans ce domaine dans le contexte libanais, en mettant l’accent sur le management et les pratiques de l’entreprise en termes de gestion du risque, afin de mieux gérer par la RSE les impacts sociaux, environnementaux, et communautaires de leurs activités. Les résultats de cette étude permettront aux chercheurs de créer une base théorique et empirique plus forte sur laquelle les recherches futures sur le sujet de la RSE et du management des risques par la RSE peuvent être développées. / The aim of our thesis is to study the current practice of CSR in the context of Lebanese industrial companies and to examine the relationship between CSR practices and risk management, using statistical techniques such as inferential tests, exploratory factor analysis, and multiple linear regression models. It is in the latter that the main contribution of this research has been made. This research makes it possible to perceive CSR as more than just a marketing and public relations tool: it is also a real tool influencing risk in companies. Our research broadens the knowledge base in this field in the Lebanese context, focusing on companies' management and practices in terms of risk management, in order to better manage, through CSR, the social, environmental, and community impacts of their activities.
The results of this study will enable researchers to create a stronger theoretical and empirical basis on which future research on the subject of CSR and risk management through CSR can be developed.
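The regression-plus-inference step named above can be sketched in miniature: a "risk management" score regressed on a "CSR practice" score across firms, with the slope tested against zero. All scores are invented; the thesis works with factor scores from exploratory factor analysis and multiple predictors rather than this single-predictor toy.

```python
# Invented per-firm scores (e.g. 1-5 survey scales) for illustration.
csr = [2.1, 3.4, 2.8, 4.0, 3.1, 4.5, 2.5, 3.8, 4.2, 3.0]
risk = [2.4, 3.6, 2.9, 4.1, 3.3, 4.3, 2.2, 3.5, 4.4, 3.2]

n = len(csr)
xbar, ybar = sum(csr) / n, sum(risk) / n
sxx = sum((x - xbar) ** 2 for x in csr)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(csr, risk))
beta = sxy / sxx                 # slope of risk on CSR
alpha = ybar - beta * xbar

# Inferential step: t-statistic for H0: beta = 0.
resid = [y - (alpha + beta * x) for x, y in zip(csr, risk)]
s2 = sum(r * r for r in resid) / (n - 2)
t_stat = beta / (s2 / sxx) ** 0.5
print(round(beta, 2), round(t_stat, 1))
```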

Estimation of Regression Coefficients under a Truncated Covariate with Missing Values

Reinhammar, Ragna January 2019 (has links)
By means of a Monte Carlo study, this paper investigates the relative performance of Listwise Deletion, the EM-algorithm and the default algorithm in the MICE-package for R (PMM) in estimating regression coefficients under a left truncated covariate with missing values. The intention is to investigate whether the three frequently used missing data techniques are robust against left truncation when missing values are MCAR or MAR. The results suggest that no technique is superior overall in all combinations of factors studied. The EM-algorithm is unaffected by left truncation under MCAR but negatively affected by strong left truncation under MAR. Compared to the default MICE-algorithm, the performance of EM is more stable across distributions and combinations of sample size and missing rate. The default MICE-algorithm is improved by left truncation but is sensitive to missingness pattern and missing rate. Compared to Listwise Deletion, the EM-algorithm is less robust against left truncation when missing values are MAR. However, the decline in performance of the EM-algorithm is not large enough for the algorithm to be completely outperformed by Listwise Deletion, especially not when the missing rate is moderate. Listwise Deletion might be robust against left truncation but is inefficient.
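A minimal Monte Carlo in the spirit of this study: a left-truncated covariate with MCAR missingness, estimated by listwise deletion. Under MCAR the complete cases are a random subsample, so the slope estimate stays unbiased even though truncation narrows the covariate's range. The sample size, truncation point, and missing rate below are invented settings, and the study's EM and PMM comparisons are not reproduced here.

```python
import random

random.seed(7)

def one_replicate(n=400, trunc=-0.5, miss=0.3):
    data = []
    while len(data) < n:
        x = random.gauss(0, 1)
        if x <= trunc:               # left truncation of the covariate
            continue
        y = 1.0 + 2.0 * x + random.gauss(0, 1)
        if random.random() < miss:   # MCAR missingness in x:
            continue                 # listwise deletion drops the row
        data.append((x, y))
    # OLS slope on the complete cases.
    xbar = sum(x for x, _ in data) / len(data)
    ybar = sum(y for _, y in data) / len(data)
    sxx = sum((x - xbar) ** 2 for x, _ in data)
    sxy = sum((x - xbar) * (y - ybar) for x, y in data)
    return sxy / sxx

estimates = [one_replicate() for _ in range(200)]
mean_slope = sum(estimates) / len(estimates)
print(round(mean_slope, 3))  # should sit near the true slope of 2
```

Under MAR (missingness depending on y, say), the same listwise-deletion estimator would generally be biased, which is the contrast the paper's simulations probe.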
