Global ETD Search

1	Imputation of Missing Data with Application to Commodity Futures / Imputation av saknad data med tillämpning på råvaruterminer Östlund, Simon January 2016
In recent years, additional requirements have been imposed on financial institutions, including Central Counterparty clearing houses (CCPs), in an attempt to establish quantitative measures of their exposure to different types of risk. One of these requirements is to perform stress tests that check resilience under a stressed market or crisis. However, financial markets develop over time, so some instruments traded today are absent at the chosen historical date because they were introduced after the event under consideration. Starting from current routines, the main goal of this thesis is to provide a more sophisticated method for imputing (filling in) missing historical data as preparatory work for stress testing. The models considered include two methods currently regarded as state-of-the-art, based on maximum likelihood estimation (MLE) and multiple imputation (MI), together with a third, alternative approach involving copulas. The methods are applied to historical return data for commodity futures contracts from the Nordic energy market. Using conventional error metrics and out-of-sample log-likelihood, it is in general very hard to distinguish the performance of the methods or to draw any conclusion about how good the models are relative to each other. Even though the Student's t-distribution generally seems a more adequate assumption for the data than the normal distribution, all the models show quite poor performance at first glance. However, when the conditional distributions are analysed more thoroughly and each model is evaluated by extracting specific quantile values, the performance of each method improves significantly. Comparing the models when imputing more extreme quantile values, it can be concluded that all methods produce satisfactory results, although the g-copula and t-copula models seem to be more robust than the respective linear models.
Missing Data; Bayesian Statistics; Conditional Distribution; Robust Regression; MCMC; Copulas. Probability Theory and Statistics
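As a rough illustration of the conditional-quantile idea mentioned in the abstract above, the following sketch imputes a missing return from one observed return. It assumes a plain bivariate normal model, which is a simplification and not the thesis's own method (the abstract refers to Student's t and copula models). The function name `conditional_quantile` and all parameter values are hypothetical.

```python
from statistics import NormalDist


def conditional_quantile(x_obs, mu_x, mu_y, sigma_x, sigma_y, rho, q):
    """Quantile q of Y given X = x_obs under an assumed bivariate normal model."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Conditional mean: the mean of Y shifted by its regression on the observed X.
    cond_mean = mu_y + rho * (sigma_y / sigma_x) * (x_obs - mu_x)
    # Conditional standard deviation: shrinks as |rho| grows.
    cond_sd = sigma_y * (1.0 - rho ** 2) ** 0.5
    return NormalDist(cond_mean, cond_sd).inv_cdf(q)


# Hypothetical inputs: the observed contract fell 5% on the stress date.
median_fill = conditional_quantile(-0.05, 0.0, 0.0, 0.02, 0.03, 0.8, 0.50)
stressed_fill = conditional_quantile(-0.05, 0.0, 0.0, 0.02, 0.03, 0.8, 0.01)
print(median_fill, stressed_fill)
```

Choosing `q = 0.5` returns the conditional median, while a small `q` such as 0.01 gives a more extreme imputed value of the kind a stress test might call for.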
2	Aprendizado semi-supervisionado para o tratamento de incerteza na rotulação de dados de química medicinal / Semi supervised learning for uncertainty on medicinal chemistry labelling Souza, João Carlos Silva de 09 March 2017
In the last 30 years, the area of machine learning has developed in a way comparable to Physics in the early twentieth century. This progress has made it possible to solve real-world problems that machines previously could not solve, because purely statistical models had difficulty fitting the training data satisfactorily. Among these advances is the use of machine learning techniques in Medicinal Chemistry, involving methods for analysing, representing and predicting molecular information through computational resources. Data used in the biological context have particular characteristics that can influence the result of their analysis. These include the complexity of molecular information, the imbalance of the classes involved, and the existence of incomplete or uncertainly labeled data. If not properly treated, such adversities may impair the process of identifying candidate compounds for new drugs. This work considers a semi-supervised machine learning technique that reduces the impact of uncertainty in data labeling by applying a method to estimate more reliable labels for the chemical compounds in the training set. To limit the effects of data imbalance, a cost-sensitive approach was incorporated into the label estimation process in order to avoid bias in favor of the majority class. After the labeling uncertainty was addressed, classifiers based on Extreme Learning Machines were constructed, aiming for good approximation ability with reduced processing time relative to other commonly applied classification approaches. Finally, the performance of the constructed classifiers was evaluated by analysing the results obtained, comparing the scenario with the original data against scenarios with the new labels obtained by the semi-supervised estimation process.
Semi-supervised learning; Expectation and Maximization; Extreme Learning Machines; Medicinal Chemistry; Uncertainty handling

