131

MODELING HETEROTACHY IN PHYLOGENETICS

Zhou, Yan 04 1900 (has links)
Heterotachy, substitution rate variation across sites and time, has been shown to be a frequent phenomenon in real data. Failure to model heterotachy can potentially cause phylogenetic artefacts. Currently, several models handle heterotachy: the mixture branch length (MBL) model and several variant forms of the covarion model. In this project, our objective is to find a model that efficiently handles heterotachous signals in the data and thereby improves phylogenetic inference. To achieve our goal, two studies were conducted. In the first study, we compare the MBL, covarion and homotachous models using AIC, BIC and cross-validation. Based on our results, we conclude that the MBL model, in which sites have different branch lengths along the entire tree, is over-parameterized: in real data, the heterotachous signals that interfere with phylogenetic inference are generally limited to a small area of the tree. In the second study, we relax the assumption that the covarion parameters are homogeneous over sites, and develop a mixture covarion model using a Dirichlet process.
In order to evaluate different heterogeneous models, we design several posterior predictive discrepancy tests to study different aspects of molecular evolution using stochastic mappings. The posterior predictive discrepancy tests demonstrate that the covarion mixture +Γ model is able to adequately model the substitution variation within and among sites. Our research permits a detailed view of heterotachy in real datasets and gives directions for future heterotachous models. The posterior predictive discrepancy tests provide diagnostic tools to assess models in detail. Furthermore, both of our studies reveal the non-specificity of heterogeneous models and, consequently, the presence of interactions between different heterogeneous models. Our studies strongly suggest that different heterogeneous features in the data should be handled simultaneously.
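As a rough illustration of the model-ranking step described in this abstract (and not the author's actual pipeline), the sketch below scores competing substitution models by AIC and BIC from their maximized log-likelihoods; the log-likelihood values, parameter counts and alignment length are invented placeholders.

```python
# Hedged sketch: ranking competing models (e.g. homotachous, covarion, MBL)
# by AIC and BIC, given maximized log-likelihoods. All numbers are
# illustrative placeholders, not results from the thesis.
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2*logL."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k*ln(n) - 2*logL."""
    return k * math.log(n) - 2 * loglik

# (log-likelihood, number of free parameters) -- hypothetical values
models = {
    "homotachous": (-12453.2, 35),
    "covarion":    (-12390.7, 37),
    "MBL":         (-12361.4, 95),
}
n_sites = 2000  # alignment length used as the sample size

for name, (ll, k) in models.items():
    print(f"{name:12s}  AIC={aic(ll, k):10.1f}  BIC={bic(ll, k, n_sites):10.1f}")
# The model with the smallest AIC/BIC is preferred; BIC penalizes the
# parameter-rich MBL model more heavily, in line with the study's conclusion
# that MBL tends to be over-parameterized.
```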
132

高齡死亡模型與年金保險應用之研究 / A Study of Elderly Mortality Models and Their Applications in Annuity Insurance

陳怡萱, Chen, Yi Xuan Unknown Date (has links)
Traditionally in Asia, families played the main role in caring for their own elderly (i.e., parents and grandparents), but declining fertility rates and longer life expectancy make it difficult for families to care for the elderly alone. The elderly themselves and the government need to share the burden caused by the aging population. In fact, most of Taiwan's major social policies of the past 20 years target the elderly, such as National Health Insurance (launched in 1995), the Labor Pension Act (2005) and National Pension Insurance (2008). Their planning and financial solvency rely on reliable mortality models and projections for the elderly population. However, many mortality models do not take mortality improvements into account and thus underestimate the cost. In this study, we look for elderly mortality models that can reflect the mortality improvements of recent years and use them to price annuity products. Two types of mortality models are of interest: relational models and stochastic models. The first group includes the Gompertz model, the Coale-Kisker model and the Discount Sequence; the other group includes the Lee-Carter and CBD models. We use these mortality models to project future mortality rates in Taiwan, Japan and the U.S., together with the block bootstrap and ARIMA models for projection. The model comparison is based on cross-validation, and both short-term and long-term projections are considered. The results show that the Discount Sequence, the Lee-Carter model and the CBD model give the best fits to the mortality rates of the three countries, although the three models show no clear difference in annuity premium pricing. For both short-term and long-term forecasts, the Discount Sequence outperforms the Lee-Carter model.
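The sketch below illustrates one of the relational models named in the abstract, the Gompertz law, fitted by least squares on the log scale and pushed through a crude annuity factor; the mortality rates, interest rate and survival approximation are invented assumptions, not the thesis's data or pricing method.

```python
# Hedged sketch: fitting the Gompertz law m(x) = B * c**x to elderly death
# rates by least squares on the log scale. The rates are synthetic
# placeholders, not Taiwan/Japan/US data from the thesis.
import numpy as np

rng = np.random.default_rng(0)
ages = np.arange(65, 96)                        # ages 65..95
true_B, true_c = 2.5e-5, 1.105                  # assumed "true" parameters
m_x = true_B * true_c ** ages * np.exp(rng.normal(0, 0.05, ages.size))

# log m(x) = log B + x * log c  ->  simple linear regression
X = np.column_stack([np.ones_like(ages, dtype=float), ages])
beta, *_ = np.linalg.lstsq(X, np.log(m_x), rcond=None)
B_hat, c_hat = np.exp(beta[0]), np.exp(beta[1])
print(f"estimated B = {B_hat:.3e}, c = {c_hat:.4f}")

# Crude annuity check: discounted survival probabilities from the fitted
# hazard (a rough approximation, for illustration only).
v = 1 / 1.02                                    # assumed 2% interest rate
p = np.exp(-np.cumsum(B_hat * c_hat ** ages))   # crude survival curve
factor = float(np.sum(v ** np.arange(1, ages.size + 1) * p))
print("toy annuity factor at age 65:", round(factor, 3))
```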
133

The development of mass spectrometry-based methodologies for the high throughput quantitation of peptides in biological matrices

Howard, James W. January 2018 (has links)
The aim of this research was the development of mass spectrometry-based methodologies for the high-throughput quantitation of peptides in biological matrices. Glucagon and GLP-1, which are of interest as biomarkers and in the development of therapeutics, were chosen as model peptides. The immunoassays traditionally used to quantify these peptides often perform poorly, necessitating the development of alternative methodologies. Application of mass spectrometry-based methodologies to these analytes has, however, been limited, primarily due to sensitivity challenges, but also due to analytical challenges associated with their endogenous nature and instability in biological matrices. Chapter 2 describes the development and qualification of the first liquid chromatography coupled tandem mass spectrometry (LC-MS/MS) method for the quantitation of endogenous glucagon from human plasma. A novel 2D extraction procedure was developed to ensure robustness and sensitivity, whilst a novel surrogate matrix quantitation strategy took into account the endogenous nature of the analyte. A lower limit of quantitation (LLOQ) of 25 pg/mL was qualified, which was a considerable improvement over that previously reported in the literature (250 pg/mL) for an LC-MS/MS method. Clinical samples were cross-validated against a conventional radioimmunoassay (RIA), and similar pharmacokinetic (PK) profiles resulted, demonstrating that the methods were complementary. In Chapter 2 glucagon instability in biological matrix was noted. To characterise this further, in Chapter 3 in vitro glucagon metabolites were identified using high-resolution mass spectrometry (HRMS). Metabolites observed by others (glucagon(19-29), glucagon(3-29) and [pGlu]3-glucagon(3-29)) in alternative matrices were identified, alongside novel metabolites (glucagon(20-29) and glucagon(21-29)). Cross-interference of these metabolites in immunoassays may help to explain their poor performance, whilst knowledge of metabolism may also aid the development of future stabilisation strategies. The method developed in Chapter 2 was refined in Chapter 4 to improve sensitivity, robustness and throughput, and to add GLP-1 as a secondary analyte. The sensitivity achieved (glucagon: 15 pg/mL LLOQ; GLP-1: 25 pg/mL LLOQ) is the highest reported for both peptides for an extraction avoiding immunoenrichment. Specificity of endogenous glucagon quantitation was assured using a novel approach with a supercharging mobile phase additive to access a sensitive qualifier transition. A cross-validation against established immunoassays using physiological study samples demonstrated some similarities between the methods. Differences between the immunoassay results exemplified the need to develop alternative methodologies. The resulting LC-MS/MS method is considered a viable alternative to immunoassays for the quantitation of endogenous glucagon, dosed glucagon and/or dosed GLP-1 in human plasma.
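As a loose illustration of the surrogate-matrix calibration idea behind an LLOQ claim (not the thesis's validated procedure), the sketch below fits a 1/x²-weighted calibration line and back-calculates accuracy at the lowest standard; all concentrations and peak-area ratios are invented.

```python
# Hedged sketch: a 1/x^2-weighted linear calibration curve of the kind used
# in quantitative LC-MS/MS bioanalysis, with back-calculated accuracy at the
# lowest calibration standard as a stand-in for LLOQ qualification.
import numpy as np

conc = np.array([25, 50, 100, 250, 500, 1000, 2500], dtype=float)  # pg/mL
area_ratio = np.array([0.011, 0.021, 0.044, 0.108, 0.215, 0.430, 1.080])

w = 1.0 / conc ** 2                                  # 1/x^2 weighting
W = np.diag(w)
X = np.column_stack([np.ones_like(conc), conc])
# weighted least squares: solve (X'WX) beta = X'Wy
intercept, slope = np.linalg.solve(X.T @ W @ X, X.T @ W @ area_ratio)

back_calc = (area_ratio - intercept) / slope
accuracy = 100 * back_calc / conc
print("back-calculated accuracy (%):", np.round(accuracy, 1))
# Acceptance would typically require the lowest standard (here 25 pg/mL) to
# fall within roughly +/-20% accuracy, in the spirit of LLOQ criteria in
# regulated bioanalysis.
```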
134

Model selection

Hildebrand, Annelize 11 1900 (has links)
In developing an understanding of real-world problems, researchers develop mathematical and statistical models. Various model selection methods exist which can be used to obtain a mathematical model that best describes the real-world situation in one sense or another. These methods aim to assess the merits of competing models by concentrating on a particular criterion. Each selection method is associated with its own criterion and is named accordingly. The better-known ones include Akaike's Information Criterion, Mallows' Cp and cross-validation, to name a few. The value of the criterion is calculated for each model, and the model corresponding to the minimum value of the criterion is then selected as the "best" model. / Mathematical Sciences / M. Sc. (Statistics)
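A minimal sketch of the "compute the criterion, pick the minimum" rule described above, using Mallows' Cp and 10-fold cross-validation on simulated data; the candidate subsets and data are illustrative assumptions, not taken from the dissertation.

```python
# Hedged sketch: scoring nested linear models with Mallows' Cp and k-fold
# cross-validation, then keeping the model with the smallest criterion.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, p_full = 200, 6
X = rng.normal(size=(n, p_full))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)  # two real signals

def rss(X_sub):
    model = LinearRegression().fit(X_sub, y)
    return np.sum((y - model.predict(X_sub)) ** 2)

sigma2 = rss(X) / (n - p_full - 1)           # error variance from the full model
candidates = {"x0": [0], "x0+x1": [0, 1], "x0+x1+x2": [0, 1, 2],
              "all": list(range(p_full))}

for name, cols in candidates.items():
    X_sub = X[:, cols]
    p = len(cols) + 1                         # +1 for the intercept
    cp = rss(X_sub) / sigma2 - n + 2 * p      # Mallows' Cp
    cv_mse = -cross_val_score(LinearRegression(), X_sub, y,
                              scoring="neg_mean_squared_error", cv=10).mean()
    print(f"{name:10s}  Cp={cp:7.2f}  10-fold CV MSE={cv_mse:.3f}")
# Both criteria should point to the x0+x1 model, mirroring the
# "select the minimum of the criterion" rule described in the abstract.
```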
135

Explorando caminhos de mínima informação em grafos para problemas de classificação supervisionada / Exploring minimum information paths in graphs for supervised classification problems

Hiraga, Alan Kazuo 05 May 2014 (has links)
Financiadora de Estudos e Projetos / Classification is a very important step in pattern recognition, as it aims to categorize objects from a set of inherent features by labeling them. This process can be supervised, when there is a set of labeled training samples that satisfactorily represent the classes, semi-supervised, when the number of labeled samples is limited or nearly nonexistent, or unsupervised, when there are no labeled samples. This project proposes to explore minimum information paths in graphs for classification problems, through the definition of a supervised, non-parametric, graph-based classification method following a contextual approach. The method constructs a graph from the set of training samples, where samples are represented by vertices and edges link samples that belong to a neighborhood system. From the constructed graph, the method computes the local observed Fisher information, a measure based on the Potts model, for every vertex, identifying how much information each sample carries. Generally, vertices of different classes connected by an edge have a high information level (borders). The edges are then weighted by a function that penalizes connections between high-information vertices. During this process it is possible to identify and select highly informative vertices, which are chosen as prototype vertices, namely the vertices that define the class boundaries. Once the prototypes are defined, each prototype conquers the remaining samples by offering the shortest path in terms of information, so that when a sample is conquered it receives the label of the winning prototype, completing the classification. To evaluate the proposed method, statistical methods for estimating error rates, such as hold-out, k-fold and leave-one-out cross-validation, are considered. The results obtained indicate that the method can be a viable alternative to existing classification techniques.
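The sketch below captures only the general shape of the method described above: a kNN graph over training samples, prototype vertices, and label propagation along cheapest paths via Dijkstra's algorithm. The edge cost is a plain squared distance, not the thesis's Potts-based local observed Fisher information, and the toy data are invented.

```python
# Hedged sketch: prototypes "conquer" the remaining vertices of a kNN graph
# along cheapest paths (Dijkstra); conquered vertices inherit the label of
# the winning prototype. Simplified edge costs, illustrative data only.
import heapq
import numpy as np

def knn_graph(X, k=5):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]      # k nearest neighbors per vertex
    return d, nbrs

def classify(X, y_partial, k=5):
    """y_partial holds class labels for prototype vertices, -1 elsewhere."""
    d, nbrs = knn_graph(X, k)
    labels = y_partial.copy()
    dist = np.full(len(X), np.inf)
    heap = []
    for i in np.flatnonzero(y_partial >= 0):      # prototypes seed the search
        dist[i] = 0.0
        heapq.heappush(heap, (0.0, i))
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > dist[u]:
            continue
        for v in nbrs[u]:
            new_cost = cost + d[u, v] ** 2        # simplified path cost
            if new_cost < dist[v]:
                dist[v] = new_cost
                labels[v] = labels[u]             # conquered via u's prototype
                heapq.heappush(heap, (new_cost, v))
    return labels

# toy usage: two Gaussian blobs, one labeled prototype per class
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = -np.ones(60, dtype=int)
y[0], y[59] = 0, 1
pred = classify(X, y)
print("label counts:", dict(zip(*np.unique(pred, return_counts=True))))
```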
136

Video Recommendation Based on Object Detection

Nyberg, Selma January 2018 (has links)
In this thesis, various machine learning domains have been combined in order to build a video recommender system based on object detection. The work combines two extensively studied research fields, recommender systems and computer vision, which are also rapidly growing and popular techniques in commercial markets. To investigate the performance of the approach, three different content-based recommender systems have been implemented at Spotify, based on the following video features: object detections, titles and descriptions, and user preferences. These systems have then been evaluated and compared against each other, together with their hybridized result. Two algorithms have been implemented, the prediction algorithm and the top-N algorithm, where the former is the more reliable source for evaluating the system's performance. The evaluation shows that the overall performance scores for predicting the users' liked and disliked videos range from about 40 % to 70 % for the prediction algorithm and from about 15 % to 70 % for the top-N algorithm. The approach based on object detection performs worse than the other approaches. Hence, there seems to be a low correlation between user preferences and video content in terms of object detection data, so this data is not very suitable for describing the content of videos in the recommender system. However, the results of this study cannot be generalized to other systems before the approach has been evaluated in other environments and on various datasets. Moreover, there is plenty of room for refinements and improvements to the system, as well as many interesting research areas for future work.
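As a generic illustration of the content-based "prediction" idea (not the system implemented at Spotify), the sketch below builds a user profile from liked and disliked videos described by object-detection counts and scores videos by cosine similarity; all feature vectors are invented.

```python
# Hedged sketch of a content-based prediction step: represent each video by a
# feature vector (e.g. object-detection class counts), build a user profile
# from liked/disliked videos, and predict preference by cosine similarity.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# rows = videos, columns = detected object classes (counts) -- invented data
video_features = np.array([
    [3, 0, 1, 0],   # video 0: mostly "person" detections
    [0, 4, 0, 1],   # video 1: mostly "car" detections
    [2, 1, 0, 0],
    [0, 3, 1, 2],
], dtype=float)
liked, disliked = [0, 2], [1]

profile = video_features[liked].mean(axis=0) - video_features[disliked].mean(axis=0)

for vid in range(len(video_features)):
    score = cosine(profile, video_features[vid])
    print(f"video {vid}: predicted preference score = {score:+.2f}")
# A top-N recommender would simply rank unseen videos by this score; the
# abstract's finding suggests object-count features alone correlate only
# weakly with user preference.
```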
137

Modelos de agrupamento e classificação para os bairros da cidade do Rio de Janeiro sob a ótica da Inteligência Computacional: Lógica Fuzzy, Máquinas de Vetores Suporte e Algoritmos Genéticos / Clustering and classification models for the neighborhoods of the city of Rio de Janeiro from the perspective of Computational Intelligence: Fuzzy Logic, Support Vector Machine and Genetic Algorithms

Natalie Henriques Martins 19 June 2015 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Since 2011, events of great significance for the city of Rio de Janeiro have taken place or are still to take place, such as the United Nations Rio+20 conference and sporting events of worldwide importance (the FIFA World Cup, the Olympic and Paralympic Games). These events attract financial resources to the city and generate jobs, infrastructure improvements and real-estate appreciation, of both land and buildings. When choosing a residential property in a given neighborhood, buyers assess not only the property itself but also the urban amenities available in the area. In this context, it was possible to define a qualitative linguistic interpretation of the neighborhoods of the city of Rio de Janeiro by integrating three Computational Intelligence techniques for the assessment of benefits: Fuzzy Logic, Support Vector Machines and Genetic Algorithms. The database was built from information on the web and from government institutes, covering the cost of residential properties and the benefits and weaknesses of the city's neighborhoods. Fuzzy Logic was first implemented as an unsupervised clustering model, using Ellipsoidal Rules under the Extension Principle with the Mahalanobis Distance, to configure inferentially the linguistic groups (Good, Fair and Poor) according to twelve urban characteristics. Building on this discrimination, a Support Vector Machine integrated with Genetic Algorithms was used as a supervised method to search for and select the smallest subset of the clustering variables that best classifies the neighborhoods (principle of parsimony). Analysis of the error rates allowed the best classification model with a reduced variable space to be chosen, resulting in a subset containing information on: HDI, number of bus lines, educational institutions, average price per square meter, open-air spaces, entertainment venues and crime. The modeling that combined the three Computational Intelligence techniques ranked the neighborhoods of Rio de Janeiro with acceptable error rates, supporting decision-making in the purchase and sale of residential property. As for public transportation in the city, it was possible to see that the road network is still given priority.
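The sketch below illustrates only the supervised step described above: a small genetic algorithm searching binary feature masks, scored by cross-validated SVM accuracy with a parsimony penalty. The dataset, penalty weight and GA settings are invented assumptions, not the thesis's configuration.

```python
# Hedged sketch: GA-wrapped feature selection for an SVM classifier.
# Fitness = cross-validated accuracy minus a penalty per selected variable,
# in the spirit of the principle of parsimony. Illustrative data only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=300, n_features=12, n_informative=5,
                           random_state=0)          # 12 stand-in "urban" variables

def fitness(mask):
    if mask.sum() == 0:
        return -1.0
    acc = cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=5).mean()
    return acc - 0.01 * mask.sum()                   # reward smaller subsets

pop = rng.integers(0, 2, size=(20, X.shape[1]))      # population of binary masks
for gen in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]           # keep the best half
    pa = parents[rng.integers(0, 10, 10)]             # uniform crossover
    pb = parents[rng.integers(0, 10, 10)]
    children = np.where(rng.random(pa.shape) < 0.5, pa, pb)
    flip = rng.random(children.shape) < 0.1           # bit-flip mutation
    children = np.where(flip, 1 - children, children)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected variable indices:", np.flatnonzero(best))
```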
138

Funções de predição espacial de propriedades do solo / Spatial prediction functions of soil properties

Rosa, Alessandro Samuel 27 January 2012 (has links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico / The possibility of mapping soil properties using soil spatial prediction functions (SSPFe) is a reality. But can SSPFe estimate soil properties such as the particle-size distribution (psd) on a young, unstable, geologically and pedologically complex geomorphic surface? What would be considered good performance in such a situation, and what alternatives do we have to improve it? With the present study I try to find answers to these questions. To do so I used a set of 339 soil samples from a small catchment in the hillslope areas of central Rio Grande do Sul. Multiple linear regression models were built using land-surface parameters (elevation, convergence index, stream power index). The SSPFe explained more than half of the data variance, a performance similar to that of the conventional soil mapping approach. For some size fractions the SSPFe performance can reach 70%. The largest uncertainties occur in the areas of greatest geological heterogeneity. Significant improvements in the predictions can therefore only be achieved if accurate geological data are made available. Meanwhile, SSPFe built on land-surface parameters are efficient in estimating the psd of soils in regions of complex geology and high instability. However, there are still questions I couldn't answer. Is soil mapping important for solving the main social and environmental issues of our time? And if our activities were subject to social control, as in a direct democracy, would they be worthy of receiving any attention?
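A minimal sketch of an SSPFe-style model as described above: multiple linear regression of a particle-size fraction on terrain attributes, judged by cross-validated R²; the terrain values and the "clay" response are synthetic stand-ins for the 339 catchment samples.

```python
# Hedged sketch: multiple linear regression of a particle-size fraction on
# land-surface parameters, with cross-validated R^2 as the performance
# measure. Synthetic data, not the thesis's catchment samples.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 339
elevation = rng.uniform(100, 500, n)
convergence = rng.normal(0, 10, n)          # convergence index
spi = rng.lognormal(1.0, 0.5, n)            # stream power index
clay = 0.08 * elevation - 0.5 * convergence - 2.0 * np.log(spi) + rng.normal(0, 8, n)

X = np.column_stack([elevation, convergence, np.log(spi)])
r2 = cross_val_score(LinearRegression(), X, clay, cv=10, scoring="r2")
print(f"10-fold cross-validated R^2: {r2.mean():.2f} (+/- {r2.std():.2f})")
# In the thesis, comparable functions explained roughly half of the data
# variance, reaching about 70% for some size fractions.
```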
139

Bank Customer Churn Prediction : A comparison between classification and evaluation methods

Tandan, Isabelle, Goteman, Erika January 2020 (has links)
This study aims to assess which supervised statistical learning method (random forest, logistic regression or K-nearest neighbors) is best at predicting bank customer churn. Additionally, the study evaluates which cross-validation approach, k-fold or leave-one-out, yields the most reliable results. Predicting customer churn has grown in popularity since new technology, regulation and changing demand have led to increased competition for banks. Banks therefore have all the more reason to acknowledge the importance of maintaining their customer base.

The findings of this study are that an unrestricted random forest model estimated using k-fold cross-validation is preferable in terms of performance measures, computational efficiency and theoretical considerations. Although k-fold and leave-one-out cross-validation yield similar results, k-fold cross-validation is preferred because of its computational advantages.

For future research, methods that generate models with both good interpretability and high predictive power would be beneficial, in order to identify which customers end their engagement as well as to understand why. Moreover, it would be interesting to analyze at which dataset size leave-one-out and k-fold cross-validation yield the same results.
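A minimal sketch of the comparison described above: random forest, logistic regression and K-nearest neighbors scored under both k-fold and leave-one-out cross-validation; the synthetic "churn" data and model settings are assumptions, not the bank dataset used in the study.

```python
# Hedged sketch: comparing three classifiers under k-fold and leave-one-out
# cross-validation. A small synthetic, imbalanced dataset stands in for the
# real bank churn data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=200, n_features=10, weights=[0.8, 0.2],
                           random_state=0)   # ~20% churners, invented

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}
schemes = {"10-fold": KFold(n_splits=10, shuffle=True, random_state=0),
           "LOOCV": LeaveOneOut()}

for m_name, model in models.items():
    for s_name, cv in schemes.items():
        acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        print(f"{m_name:20s} {s_name:7s} accuracy = {acc:.3f}")
# As the study notes, the two CV schemes tend to give similar accuracies,
# but k-fold is far cheaper: LOOCV refits each model n times.
```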
140

Vyhledávání obrazu na základě podobnosti / Image search using similarity measures

Harvánek, Martin January 2014 (has links)
Four methods based on low-level image features are implemented: circular sectors, color moments, color coherence vectors and Gabor filters. The optimal parameters of each method were found by measuring the classification accuracy of learning operators with the cross-validation operator in RapidMiner, and the methods were then evaluated with these parameters. The implemented methods are evaluated, in terms of total average precision, on ten image categories: ancient, beach, bus, dinosaur, elephant, flower, food, horse, mountain and natives. A modification of the original circular sectors method (HSB color space plus the median statistic) increases classification accuracy by 8 %. A weighted combination of color moments, circular sectors and Gabor filters gives the best total average precision of all implemented methods, at 70.48 %.
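As a loose reconstruction of two of the descriptors named above (not the thesis implementation), the sketch below computes color moments inside circular sectors and combines per-method distances with fixed weights; the images and weights are invented.

```python
# Hedged sketch: color moments (mean, std, skewness per channel) computed
# inside circular sectors around the image center, plus a weighted combination
# of per-method distances for retrieval. Generic illustration only.
import numpy as np

def color_moments(pixels):
    """First three moments per channel of an (N, 3) pixel array."""
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    skew = np.cbrt(((pixels - mean) ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])

def circular_sector_features(img, n_sectors=8):
    """Color moments per angular sector around the image center."""
    h, w, _ = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    angles = np.arctan2(yy - h / 2, xx - w / 2)            # -pi..pi
    sector = ((angles + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    feats = [color_moments(img[sector == s]) for s in range(n_sectors)]
    return np.concatenate(feats)

def combined_distance(f1, f2, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of per-method distances (here: three feature slices)."""
    parts = np.array_split(np.abs(f1 - f2), 3)
    return sum(w * p.mean() for w, p in zip(weights, parts))

# toy usage with random "images"
rng = np.random.default_rng(0)
img_a, img_b = rng.random((64, 64, 3)), rng.random((64, 64, 3))
fa, fb = circular_sector_features(img_a), circular_sector_features(img_b)
print("feature length:", fa.size, "| combined distance:", round(combined_distance(fa, fb), 4))
```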
