1 |
Can Leaf Spectroscopy Predict Leaf and Forest Traits Along a Peruvian Tropical Forest Elevation Gradient? Doughty, Christopher E., Santos-Andrade, P. E., Goldsmith, G. R., Blonder, B., Shenkin, A., Bentley, L. P., Chavana-Bryant, C., Huaraca-Huasco, W., Díaz, S., Salinas, N., Enquist, B. J., Martin, R., Asner, G. P., Malhi, Y. 11 1900 (has links)
High-resolution spectroscopy can be used to measure leaf chemical and structural traits. Such leaf traits are often highly correlated with other traits, such as photosynthesis, through the leaf economics spectrum. We measured VNIR (visible-near infrared) leaf reflectance (400-1,075 nm) of sunlit and shaded leaves in ~150 dominant species across ten 1-ha plots along a 3,300 m elevation gradient in Peru (4,284 individual leaves). We used partial least squares (PLS) regression to compare leaf reflectance to chemical traits, such as nitrogen and phosphorus; structural traits, including leaf mass per area (LMA), branch wood density, and leaf venation; and higher-level traits such as leaf photosynthetic capacity, leaf water repellency, and woody growth rates. Empirical models using leaf reflectance predicted leaf N and LMA (r² > 30% and %RMSE < 30%), weakly predicted leaf venation, photosynthesis, and branch density (r² between 10% and 35% and %RMSE between 10% and 65%), and did not predict leaf water repellency or woody growth rates (r² < 5%). The prediction of higher-level traits such as photosynthesis and branch density is likely due to these traits' correlations with LMA, a trait readily predicted with leaf spectroscopy.
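A minimal sketch of this kind of spectra-to-trait PLS model, on synthetic stand-in data rather than the Peru dataset; the component count and the range-normalized %RMSE definition are assumptions.

```python
# Sketch: predict a leaf trait (here "LMA") from VNIR reflectance with PLS.
# Spectra and trait values are synthetic, not the paper's measurements.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
wavelengths = np.arange(400, 1076)                 # 400-1075 nm, 1 nm steps
n_leaves = 300
X = rng.normal(size=(n_leaves, wavelengths.size)).cumsum(axis=1)  # smooth fake spectra
lma = 0.05 * X[:, 500] + rng.normal(scale=0.5, size=n_leaves)     # fake trait signal

pls = PLSRegression(n_components=10)               # arbitrary component count
pred = cross_val_predict(pls, X, lma, cv=10).ravel()

ss_res = np.sum((lma - pred) ** 2)
ss_tot = np.sum((lma - lma.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
rmse_pct = 100 * np.sqrt(ss_res / n_leaves) / (lma.max() - lma.min())  # range-normalized
print(f"cross-validated r2 = {r2:.2f}, %RMSE = {rmse_pct:.1f}")
```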
|
2 |
Seleção de variáveis preditivas com base em índices de importância das variáveis e regressão PLS / Selecting the most relevant predictive variables based on variable importance indices and PLS regression. Zimmer, Juliano January 2012 (has links)
This dissertation proposes methods for selecting predictive variables based on variable importance indices and PLS (Partial Least Squares) regression. Starting from a literature review on PLS and variable importance indices, a method called Backward Elimination (EB) is proposed, which systematically eliminates variables in the order defined by variable importance indices. A new variable importance index, derived from the PLS regression parameters, is benchmarked against other indices reported in the literature. Two variations of the EB method are proposed and tested through simulation: (i) the EBM method (backward elimination by minima), which identifies the subset of variables maximizing predictive accuracy regardless of the percentage of variables retained, and (ii) the EBDE method (backward elimination by Euclidean distance), which selects the subset of variables minimizing the Euclidean distance between the points of the profile generated by variable elimination and a hypothetical ideal point defined by the user. Applied to four real manufacturing data sets, EBDE emerges as the recommended method: it retains, on average, only 13% of the original variables while increasing prediction accuracy by an average of 32% relative to using all variables.
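An EBM-style backward elimination can be sketched as below, using the absolute PLS regression coefficient as a stand-in importance index (the thesis proposes its own index) and 5-fold cross-validated R² as the accuracy indicator.

```python
# Sketch of EBM-style backward elimination: rank variables by an importance
# index, drop the weakest one at a time, keep the subset with the best CV score.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, n_components=2, min_vars=2):
    active = list(range(X.shape[1]))
    best_score, best_subset = -np.inf, active[:]
    while len(active) > min_vars:
        pls = PLSRegression(n_components=min(n_components, len(active)))
        score = cross_val_score(pls, X[:, active], y, cv=5, scoring="r2").mean()
        if score > best_score:                       # track the best subset so far
            best_score, best_subset = score, active[:]
        pls.fit(X[:, active], y)
        weakest = np.abs(pls.coef_).ravel().argmin()  # least important variable
        del active[weakest]
    return best_subset, best_score

# usage (X, y assumed): subset, score = backward_elimination(X, y)
```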
|
3 |
Statut de la faillite en théorie financière : approches théoriques et validations empiriques dans le contexte français / The status of bankruptcy in financial theory: theoretical approaches and empirical validations in the French context. Ben Jabeur, Sami 27 May 2011 (has links)
In the current economic climate, a growing number of firms face economic and financial difficulties that can, in some cases, lead to bankruptcy. These difficulties rarely arise suddenly: before a firm is declared bankrupt, it faces financial problems of increasing severity, such as default on a debt, temporary insolvency, or a liquidity shortage. Identifying the causes of failure is not straightforward, since the factors that provoke it cannot be enumerated exhaustively; the causes are multiple, and their accumulation further compromises the firm's survival. The scale of this phenomenon and its impact on the economy as a whole justify the need to understand and explain it by analyzing its causes and origins. The aim of our study is to classify firms in difficulty according to their degree of viability and to understand the causes of the deterioration of their situation. We compare three models (linear discriminant analysis, the Logit model, and PLS regression), which allows us, based on the correct-classification rates obtained, to choose the best model while identifying the origins and causes of these failures.
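As a rough illustration of this three-way benchmark, the sketch below cross-validates linear discriminant analysis, a Logit model, and a PLS-based classifier on a matrix `X` of financial ratios with binary failure labels `y` (both assumed inputs, not the thesis's data). Treating PLS regression on a 0/1 label thresholded at 0.5 as the classifier (PLS-DA) is an assumption; the thesis does not specify its exact decision rule.

```python
# Sketch: compare correct-classification rates of LDA, Logit, and PLS-DA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

def classification_rate(model, X, y):
    pred = cross_val_predict(model, X, y, cv=10)      # out-of-fold class labels
    return (pred == y).mean()

def plsda_rate(X, y, n_components=3):
    scores = cross_val_predict(PLSRegression(n_components), X, y.astype(float), cv=10)
    return ((scores.ravel() > 0.5).astype(int) == y).mean()  # threshold at 0.5

# rate_lda   = classification_rate(LinearDiscriminantAnalysis(), X, y)
# rate_logit = classification_rate(LogisticRegression(max_iter=1000), X, y)
# rate_pls   = plsda_rate(X, y)
```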
|
4 |
Modellering av volym samt max- och medeldjup i svenska sjöar : en statistisk analys med hjälp av geografiska informationssystem / Modeling volume, max- and mean-depth in Swedish lakes : a statistical analysis with geographical information systems. Sandström, Sara January 2017 (has links)
Lake volume and lake depth are important variables that define a lake and its ecosystem. Sweden has around 100 000 lakes, but only around 8 000 of them have measured data for volume, max- and mean-depth. Collecting data for the remaining lakes is presently too time-consuming and expensive, so a predictive method is needed. A previous study by Sobek et al. (2011) found a model predicting lake volume from map-derived parameters with a high degree of explanation for the mean volume of groups of 15 lakes or more; however, the predictions for an individual lake, as well as for max- and mean-depth, were not accurate enough. The purpose of this study was to derive better models based on new map material with higher resolution. The variables used were derived with GIS-based calculations and then analyzed with multivariate statistical methods: PCA, PLS regression, and multiple linear regression. A model predicting the volume of an individual lake with better accuracy than previous studies was found; the variables best explaining the variation in lake volume were lake area and the median slope of an individual zone around each lake (R²=0.87, p<0.00001). The model predicting max-depth from lake area, the median slope of an individual zone around each lake, and height differences in the area closest to each lake also had a higher degree of explanation than in previous studies (R²=0.42). Mean-depth had no significant correlation with map-derived parameters but was strongly correlated with max-depth. Reference: Sobek, S., Nisell, J. & Fölster, J. (2011). Predicting the volume and depths of lakes from map-derived parameters. Inland Waters, vol. 1, pp. 177-184.
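A sketch of the reported volume model as a linear regression on lake area and median slope; the log transforms are an assumption, since the abstract does not give the exact model form.

```python
# Sketch: regress lake volume on lake area and median zone slope.
# Log10 transforms of area and volume are an assumed functional form.
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_volume_model(area_m2, median_slope_deg, volume_m3):
    X = np.column_stack([np.log10(area_m2), median_slope_deg])
    y = np.log10(volume_m3)
    model = LinearRegression().fit(X, y)
    return model, model.score(X, y)   # fitted model and in-sample R^2

# usage (arrays assumed): model, r2 = fit_volume_model(area, slope, volume)
```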
|
5 |
Modèles de prédiction pour l'évaluation génomique des bovins laitiers français : application aux races Holstein et Montbéliarde / Prediction models for the genomic evaluation of French dairy cattle : application to the Holstein and Montbéliarde breeds. Colombani, Carine 16 October 2012 (has links)
The rapid evolution of sequencing and genotyping techniques raises new challenges for the development of selection methods for livestock. By sequence comparison, it is now possible to identify polymorphic sites in each species and to mark the genome with molecular markers called SNPs (Single Nucleotide Polymorphisms). Methods for selecting animals from this molecular information require a complete representation of the genetic effects. Meuwissen et al. (2001) introduced the concept of genomic selection, proposing to predict simultaneously all the effects of the marked regions and then to build a "genomic" index by summing the effects of each region. The challenge in genomic evaluation is to have the best prediction method available in order to obtain accurate genetic values for an effective selection of candidate animals. The overall objective of this thesis is to explore and evaluate new genomic approaches capable of predicting tens of thousands of genetic effects from the phenotypes of hundreds of individuals. It is part of the ANR project AMASGEN, whose aim is to extend the marker-assisted selection used until now in French dairy cattle and to develop an accurate prediction method. To this end, a varied panel of methods is explored by estimating their predictive abilities: PLS (Partial Least Squares) and sparse PLS regression, as well as Bayesian approaches (Bayesian LASSO and BayesCπ), are compared with two methods commonly used in genetic improvement, namely BLUP based on pedigree information and genomic BLUP based on SNP information. These methodologies provide effective prediction models even when the number of observations is much smaller than the number of variables; they rely on the theory of Gaussian linear mixed models or on variable selection methods, summarizing the massive SNP information through the construction of new variables. The data studied in this work come from two French dairy cattle breeds (1,172 Montbéliarde bulls and 3,940 Holstein bulls) genotyped on about 40,000 polymorphic SNP markers. All the genomic methods tested here produce more accurate evaluations than the method based on pedigree information alone. A slight predictive advantage of the Bayesian methods is observed on some traits, but they remain too demanding in computation time to be applied routinely in a genomic selection scheme. The advantage of variable selection methods is their ability to cope with the ever-growing volume of SNP data; moreover, they can highlight reduced sets of markers, identified on the basis of their estimated effects, that have a strong impact on the traits studied. It would therefore be possible to develop a method for predicting genomic values on the basis of QTLs detected by these approaches.
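As a sketch of the benchmark idea on simulated data: PLS regression against a ridge-regression stand-in for genomic BLUP (ridge on SNP genotypes, i.e., SNP-BLUP, is equivalent to GBLUP under standard assumptions). The dimensions, allele frequency, and shrinkage constant below are illustrative choices, not values from the thesis.

```python
# Sketch: compare PLS and a ridge stand-in for genomic BLUP on simulated
# bulls-by-SNPs genotypes (0/1/2 coding) with a sparse true genetic signal.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_bulls, n_snps = 500, 5000                    # far fewer SNPs than the real ~40 000
X = rng.binomial(2, 0.3, size=(n_bulls, n_snps)).astype(float)
true_effects = rng.normal(scale=0.05, size=n_snps) * (rng.random(n_snps) < 0.02)
y = X @ true_effects + rng.normal(size=n_bulls)

for name, model in [("PLS", PLSRegression(n_components=20)),
                    ("ridge (SNP-BLUP)", Ridge(alpha=n_snps))]:  # heuristic shrinkage
    acc = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: CV r2 = {acc:.2f}")
```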
|
6 |
Amélioration et développement de méthodes de sélection du nombre de composantes et de prédicteurs significatifs pour une régression PLS et certaines de ses extensions à l'aide du bootstrap / Improvement and development of selection methods for both the number of components and significant predictors for a PLS regression and some extensions with bootstrap techniques. Magnanensi, Jérémy 18 December 2015 (has links)
Owing to its characteristics, Partial Least Squares (PLS) regression has become a statistical methodology of choice for the analysis of data sets from genomic studies. The reliability of PLS regression and of some of its extensions rests, among other things, on a robust determination of a hyperparameter, the number of components. Such a determination remains an important objective to this day, since no existing criterion can be considered globally satisfactory. We therefore developed a new criterion for selecting the number of PLS components, based on the bootstrap technique and characterized in particular by high stability. We then adapted and used it to develop and improve procedures for selecting significant predictors, opening the way to a more reliable and robust identification of the probe sets involved in the studied feature of a pathology.
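A generic version of the idea, not Magnanensi's exact criterion: score each candidate number of components on bootstrap out-of-bag rows and keep the smallest count whose error is within one standard error of the minimum.

```python
# Sketch of a bootstrap-based stopping rule for the number of PLS components.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def bootstrap_ncomp(X, y, max_comp=10, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    errs = np.full((max_comp, n_boot), np.nan)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                    # bootstrap resample
        oob = np.setdiff1d(np.arange(n), idx)          # out-of-bag rows
        if oob.size == 0:
            continue
        for k in range(1, max_comp + 1):
            pls = PLSRegression(n_components=k).fit(X[idx], y[idx])
            resid = y[oob] - pls.predict(X[oob]).ravel()
            errs[k - 1, b] = np.mean(resid ** 2)
    mean_err = np.nanmean(errs, axis=1)
    se = np.nanstd(errs, axis=1) / np.sqrt(n_boot)
    threshold = mean_err.min() + se[mean_err.argmin()]  # one-standard-error rule
    return int(np.argmax(mean_err <= threshold)) + 1    # smallest adequate K
```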
|
7 |
Fast characterization of the organic matter, instrumentation and modelling for the AD process performances prediction / Caractérisation rapide de la matière organique, instrumentation et modélisation pour la prédiction des performances des procédés de digestion anaérobie. Charnier, Cyrille 25 November 2016 (has links)
Anaerobic digestion is one of the pillars of the European circular economy, producing methane and organic fertilizers from waste. The development of this sector relies on co-digestion and on optimizing the feeding of the processes. This requires estimating the biological state of the digester, characterizing the substrates, and using predictive models that simulate digester performance, needs for which current solutions are not suitable. In this thesis, a titration sensor coupling pH and electrical conductivity was designed to estimate the concentrations of volatile fatty acids, inorganic carbon, and ammonia nitrogen, improving the accuracy of volatile fatty acid estimation by 14.5 compared with current sensors. Coupled with biogas analysis, it allows the biological state of the process to be estimated. In parallel, a near-infrared spectroscopic analysis was developed to estimate carbohydrate, protein, and lipid contents, chemical oxygen demand, and methane production yield and kinetics, reducing substrate characterization time to a few minutes. This fast substrate characterization is used to implement a modified version of the IWA Anaerobic Digestion Model No. 1 (ADM1), which predicts digester performance under optimal digestion conditions. Coupling the biological state estimation with this approach corrects the prediction by taking the current state of the digester into account. This approach provides a powerful predictive tool for the advanced control of anaerobic digestion units and for the optimization of the feeding recipe.
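The abstract's "methane production yield and kinetics" targets can be illustrated with a first-order batch model, a common simplification that is not the modified ADM1 used in the thesis; the times and volumes below are made-up example data.

```python
# Sketch: fit first-order kinetics B(t) = B0 * (1 - exp(-k t)) to cumulative
# methane from a batch test, recovering a yield B0 and a rate constant k.
import numpy as np
from scipy.optimize import curve_fit

def first_order(t, b0, k):
    return b0 * (1.0 - np.exp(-k * t))

t_days = np.array([0, 2, 5, 10, 15, 20, 30], dtype=float)      # example times
bmp = np.array([0, 80, 160, 240, 280, 300, 315], dtype=float)  # mL CH4 / g COD (example)

(b0, k), _ = curve_fit(first_order, t_days, bmp, p0=(300.0, 0.1))
print(f"yield B0 = {b0:.0f} mL CH4/g COD, rate k = {k:.2f} 1/day")
```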
|
8 |
Using machine learning to determine fold class and secondary structure content from Raman optical activity and Raman vibrational spectroscopy. Kinalwa-Nalule, Myra January 2012 (has links)
The objective of this project was to apply machine learning methods to determine protein secondary structure content and protein fold class from ROA and Raman vibrational spectral data. Raman and ROA are sensitive to biomolecular structure: the bands of each spectrum correspond to structural elements in proteins, and combined they give a fingerprint of the protein. However, there are many bands about which little is known. There is a need, therefore, to find ways of extracting information from spectral bands and to investigate which regions of the spectra contain the most useful structural information. Support Vector Machine (SVM) classification and Random Forest (RF) classification were used to mine protein fold class information, and Partial Least Squares (PLS) regression was used to determine the secondary structure content of proteins. The classification methods were used to group proteins into α-helix, β-sheet, α/β, and disordered fold classes; the PLS regression was used to determine percentage structural content from Raman and ROA spectral data. The analyses were performed on spectral bin widths of 10 cm⁻¹ and on the spectral amide regions I, II, and III; the full spectra and different combinations of the amide regions were also analysed. The SVM analyses, classification and regression, generally did not perform well. SVM classification models, for example, had Matthews Correlation Coefficient (MCC) values below 0.5, though this is still better than a negative value, which would indicate prediction at the level of random chance. The SVM regression analyses also performed very poorly, with average R² values below 0.5 (R² is the squared Pearson correlation coefficient and shows how well predicted and observed structural content values correlate; an R² value of 1 indicates a perfect correlation and therefore a good prediction model). The Partial Least Squares regression analyses yielded much improved results with very high accuracies: analyses of the full spectrum and of the spectral amide regions produced high R² values of 0.8-0.9 for both ROA and Raman spectral data. This high accuracy was also seen in the analysis of the 850-1,100 cm⁻¹ backbone region for both ROA and Raman spectra, which indicates that this region could make an important contribution to protein structure analysis. PLS regression analysis of second-derivative Raman spectra showed further improved performance, with high-accuracy R² values of 0.81-0.97. The Random Forest algorithm used here for classification performed well. The 2-dimensional plots used to visualise the classification clusters showed clear clusters in some analyses; for example, tighter clustering was observed for the amide I, amide I & III, and amide I & II & III spectral regions than for the amide II, amide III, and amide II & III analyses. The Random Forest algorithm also determines variable importance, which showed which spectral bins were crucial in the classification decisions. The ROA Random Forest analyses generally performed better than the Raman Random Forest analyses: ROA analyses reached 75% correctly classified proteins at best, while Raman analyses reached 50%. The analyses presented in this thesis have shown that Raman and ROA vibrational spectra contain information about protein secondary structure, and that these data can be extracted using mathematical methods such as the machine learning techniques presented here.
The machine learning methods applied in this project were used to mine information about protein secondary structure, and the work presented here demonstrates that these techniques are useful and could be powerful tools in the determination of protein structure from spectral data.
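A minimal sketch of the two tasks, assuming `X` is an (n_proteins, n_bins) matrix of spectra binned to 10 cm⁻¹, `helix_frac` the measured helix fractions, and `fold_labels` the four fold classes; the component count and forest size are arbitrary choices.

```python
# Sketch: PLS regression for secondary-structure content (scored by squared
# Pearson correlation, as in the thesis) and Random Forest classification for
# fold class (scored by MCC), both cross-validated.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import matthews_corrcoef

def structure_content_r2(X, helix_frac, n_components=8):
    pred = cross_val_predict(PLSRegression(n_components), X, helix_frac, cv=10)
    return np.corrcoef(pred.ravel(), helix_frac)[0, 1] ** 2

def fold_class_mcc(X, fold_labels):
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    pred = cross_val_predict(rf, X, fold_labels, cv=10)
    importances = rf.fit(X, fold_labels).feature_importances_  # per-bin importance
    return matthews_corrcoef(fold_labels, pred), importances
```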
|