Global ETD Search

21	Understanding patterns of aggregation in count data Sebatjane, Phuti 06 1900 (has links) The term aggregation refers to overdispersion, and the two terms are used interchangeably in this thesis. To address the prevalence of infectious parasite species faced by most rural livestock farmers, we model the distribution of faecal egg counts of 15 parasite species (13 internal parasites and 2 ticks) common in sheep and goats. Aggregation and excess zeroes are addressed through the use of generalised linear models. The abundance of each species was modelled using six different distributions, the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-altered Poisson (ZAP) and zero-altered negative binomial (ZANB), and their fits were then compared. Excess zero models (ZIP, ZINB, ZAP and ZANB) fitted better than the standard count models (Poisson and negative binomial) in all 15 cases. We further investigated how the distributional assumption affects aggregation and zero inflation. Aggregation and zero inflation (measured by the dispersion parameter k and the zero inflation probability) were found to vary greatly with the distributional assumption; this in turn changed the fixed-effects structure. Serial autocorrelation between adjacent observations was then taken into account by fitting observation-driven time series models to the data. Simultaneously accounting for autocorrelation, overdispersion and zero inflation proved successful, as zero-inflated autoregressive models performed better than zero-inflated models in most cases. Apart from contributing to scientific knowledge, the predictability of parasite burden will help farmers implement effective disease management interventions. Researchers confronted with the task of analysing count data with excess zeroes can use the findings of this illustrative study as a guideline, irrespective of their research discipline. 
Statistical methods, from model selection and the quantification of zero inflation through to accounting for serial autocorrelation, are described and illustrated. / Statistics / M.Sc. (Statistics) Aggregations Autoregressive models Akaike information criterion Correlation Count data Exponential family Generalised linear models Goats Internal parasites Hosts Negative binomial distribution Overdispersion Poisson distribution Sheep Time series Zero inflation 519.537 Correlation (Statistics) Akaike Information Criterion Exponential functions Negative binomial distribution Poisson distribution Livestock -- Parasites Time-series analysis Binomial distribution
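The standard-versus-excess-zero comparison described above can be shown in miniature. The snippet below is a minimal sketch, not the thesis's code: it fits only intercept-only Poisson and zero-inflated Poisson (ZIP) models to a hypothetical vector of counts, estimates the ZIP by the EM algorithm, and compares the two by AIC. The function names and the toy data are assumptions; the negative binomial, zero-altered (hurdle) and autoregressive variants, and all covariates, are omitted.

```python
import math

def poisson_loglik(counts, lam):
    # log P(Y = y) = y*log(lam) - lam - log(y!)
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def zip_loglik(counts, pi, lam):
    # A zero is either a structural zero (probability pi) or a Poisson zero.
    ll = 0.0
    for y in counts:
        if y == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += math.log(1 - pi) + y * math.log(lam) - lam - math.lgamma(y + 1)
    return ll

def fit_poisson(counts):
    lam = sum(counts) / len(counts)  # the Poisson MLE is the sample mean
    return lam, poisson_loglik(counts, lam)

def fit_zip(counts, iters=500):
    """EM algorithm for an intercept-only zero-inflated Poisson model."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5, total / n
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: closed-form updates of pi and the Poisson mean
        pi = n_zero * z / n
        lam = total / (n - n_zero * z)
    return pi, lam, zip_loglik(counts, pi, lam)

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

# Hypothetical egg counts with many zeros (not data from the thesis).
counts = [0] * 60 + [1, 2, 3, 4, 5, 6, 2, 3, 4, 5] * 4
lam_p, ll_p = fit_poisson(counts)
pi_z, lam_z, ll_z = fit_zip(counts)
print("Poisson AIC:", round(aic(ll_p, 1), 1))
print("ZIP AIC:", round(aic(ll_z, 2), 1), "pi =", round(pi_z, 3), "lambda =", round(lam_z, 3))
```

Because the Poisson model is the ZIP model with the zero-inflation probability fixed at 0, the ZIP is preferred by AIC only when its extra parameter raises the maximized log-likelihood by more than 1.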
22	Logistic regression to determine significant factors associated with share price change Muchabaiwa, Honest 19 February 2014 (has links) This thesis investigates the factors associated with annual changes in the share price of Johannesburg Stock Exchange (JSE) listed companies. In this study, a share is deemed to have increased in value when the company's share price at the end of the financial year is higher than at the end of the previous year. Secondary data sourced from the McGregor BFA website, covering 2004 to 2011, were used. Deciding which share to buy is the biggest challenge faced by both investment companies and individuals investing on the stock exchange. This thesis uses binary logistic regression to identify the variables associated with an increase in share price. The dependent variable was the annual change in share price (ACSP), and the independent variables were the assets per capital employed ratio, debt per assets ratio, debt per equity ratio, dividend yield, earnings per share, earnings yield, operating profit margin, price earnings ratio, return on assets, return on equity and return on capital employed. Several variable selection methods were used, and the backward elimination method produced the best model. The probability of a share increasing in value was found to be higher when shareholders anticipate a higher return on capital employed and higher earnings per share. It was, however, noted that the share price is negatively affected by dividend yield and earnings yield. Since the odds of an increase in share price are higher when return on capital employed and earnings per share are higher, investors and investment companies are encouraged to choose companies with high earnings per share and the best returns on capital employed. The final model had a classification rate of 68.3%, and the validation sample produced a classification rate of 65.2%. / Mathematical Sciences / M.Sc. 
(Statistics) Logistic regression Binary logistic regression Share price Stock exchange Akaike’s Information Criterion Wald Test Score test Enter method Stepwise logistic regression 519.536 Stock exchange Logistic regression analysis Market share Akaike Information Criterion
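Backward elimination by AIC, one of the variable selection approaches mentioned above, can be sketched as follows. This is a hypothetical, standard-library-only illustration: the logistic model is fitted by Newton-Raphson, predictors are dropped one at a time while the AIC keeps decreasing, and the classification rate uses a 0.5 cutoff. The abstract does not state the software, stopping rule or cutoff used in the thesis, so these are assumptions, and the three simulated predictors stand in for the financial ratios.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logit(X, y, iters=25):
    """Newton-Raphson maximum likelihood; each row of X starts with 1 (intercept)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1 - mu) * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    loglik = 0.0
    for xi, yi in zip(X, y):
        mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        loglik += math.log(mu) if yi else math.log(1 - mu)
    return beta, loglik

def design(rows, cols):
    return [[1.0] + [r[c] for c in cols] for r in rows]

def aic_for(rows, y, cols):
    _, loglik = fit_logit(design(rows, cols), y)
    return 2 * (len(cols) + 1) - 2 * loglik

def backward_eliminate(rows, y, names):
    """Drop one predictor at a time while doing so lowers the AIC."""
    cols = list(range(len(names)))
    best = aic_for(rows, y, cols)
    while cols:
        cand, drop = min((aic_for(rows, y, [c for c in cols if c != d]), d) for d in cols)
        if cand >= best:
            break
        best = cand
        cols.remove(drop)
    return [names[c] for c in cols], best

def classification_rate(beta, X, y, cutoff=0.5):
    hits = sum((sigmoid(sum(b * v for b, v in zip(beta, xi))) >= cutoff) == bool(yi)
               for xi, yi in zip(X, y))
    return hits / len(y)

# Hypothetical data: x1 and x2 matter, x3 is pure noise.
rng = random.Random(1)
rows, y = [], []
for _ in range(300):
    x = [rng.gauss(0, 1) for _ in range(3)]
    rows.append(x)
    y.append(1 if rng.random() < sigmoid(0.3 + 1.5 * x[0] - 1.0 * x[1]) else 0)
names = ["x1", "x2", "x3"]
kept, best_aic = backward_eliminate(rows, y, names)
cols = [names.index(v) for v in kept]
beta, _ = fit_logit(design(rows, cols), y)
print(kept, round(best_aic, 1), round(classification_rate(beta, design(rows, cols), y), 3))
```

For each retained predictor, exp of its coefficient is the estimated odds ratio per unit increase, which is how statements such as "higher earnings per share raise the odds of a price increase" are read off a fitted model. In practice a statistics package would replace the hand-rolled solver.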
23	Identification des grands utilisateurs de soins de santé chez les patients souffrant de la douleur chronique non cancéreuse et suivis en soins de première ligne (Identification of heavy health care users among patients with chronic non-cancer pain followed in primary care) Antaky, Elie 03 1900 (has links) Background: Chronic non-cancer pain (CNCP) has major social and economic impacts. Identifying patients at high risk of being heavy health care users could be very useful: by improving their care, their direct health care costs could eventually be reduced. 
Purpose: To identify bio-psycho-social factors predicting the risk of being a heavy health care user among primary care CNCP patients. Methods: Patients reporting moderate to severe CNCP for at least 6 months with an active analgesic prescription from a primary care physician were recruited in community pharmacies on the territory of the Réseau universitaire intégré de santé (RUIS) of the Université de Montréal between May 2009 and January 2010. This territory comprises six regions: Mauricie and Centre-du-Québec, Laval, Montréal, the Laurentians, Lanaudière and Montérégie. Upon recruitment, participants' bio-psycho-social characteristics were documented through self-administered and telephone questionnaires. Direct health care costs were estimated from the health care services provided to patients in the year preceding and the year following recruitment, using the database of the Régie d’Assurance maladie du Québec, RAMQ (the Quebec public health insurer). These costs included pain-related hospitalizations, emergency room visits, ambulatory care, and medication prescribed for pain treatment and for managing analgesic side effects. Heavy health care users were defined as those in the highest quartile of annual direct health care costs in the year following recruitment. Multivariate logistic regression models, with the Akaike information criterion used for model selection, were developed to identify the predictors of heavy health care use. Results: The median annual direct health care cost was CAD 7,627 for heavy health care users (n = 63), compared with CAD 1,554 for standard health care users (n = 188). The final predictive model of the risk of being a heavy health care user included pain located in the lower body (odds ratio (OR) = 3.03; 95% CI: 1.20 - 7.65), pain-related disability (OR = 1.24; 95% CI: 1.03 - 1.48), and health care costs in the previous year (OR = 17.67; 95% CI: 7.90 - 39.48). 
Other retained variables were sex, comorbidity, depression level, and patients’ attitudes towards medical pain cure. Conclusion: Patients with CNCP in the lower body and a greater impact of pain on their daily functioning were more likely to be heavy users of health care and services. Annual direct health care cost in the previous year was also a significant predictor. Improving pain management in this patient population may improve their health and eventually reduce their costs to the health care system. Chronic non-cancer pain Primary care Direct health care costs Cost predictors Cohort study Akaike information criterion
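Two calculations underlie this study's design and reported results: flagging the top cost quartile as heavy users, and expressing logistic coefficients as odds ratios with 95% confidence intervals. The sketch below assumes a Wald interval, exp(beta ± 1.96·SE), which the abstract does not state explicitly; the function names are hypothetical, and the cost threshold uses the default method of Python's `statistics.quantiles`, which may differ from the quartile rule used in the thesis.

```python
import math
from statistics import quantiles

def heavy_user_flags(annual_costs):
    """True for patients whose annual cost is above the third quartile (top quartile)."""
    q3 = quantiles(annual_costs, n=4)[2]
    return [c > q3 for c in annual_costs]

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def log_or_from_ci(lower, upper, z=1.96):
    """Recover (beta, se) on the log-odds scale from a reported Wald 95% CI."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Lower-body pain: reported OR = 3.03, 95% CI 1.20 - 7.65
beta, se = log_or_from_ci(1.20, 7.65)
print(round(beta, 3), round(se, 3), [round(v, 2) for v in odds_ratio_ci(beta, se)])
```

Applied to the reported lower-body pain interval, the recovered point estimate exp(beta) is about 3.03, matching the reported OR; this is consistent with, though it does not prove, a Wald interval on the log-odds scale.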
24	Mensuração da biomassa e construção de modelos para construção de equações de biomassa / Biomass measurement and model selection for biomass equations Vismara, Edgar de Souza 07 May 2009 (has links) Interest in quantifying forest biomass has grown considerably in recent years, driven directly by the potential of forests to accumulate atmospheric carbon in their biomass. Forest biomass can be assessed directly, through inventory, or through empirical prediction models. Building biomass prediction models involves measuring the variables and fitting and selecting statistical models. From a destructive sample of 200 trees of ten different forest species from the region of Linhares, ES, empirical models for predicting aboveground biomass were built for future use in reforestation projects. The model-building process comprised an analysis of the data collection and model fitting techniques, as well as an analysis of model selection based on the Akaike Information Criterion (AIC). For data collection, the volumetric and gravimetric techniques were tested, using five wood discs collected per tree at different positions along the stem. For the gravimetric technique, different ways of combining the moisture content of the discs to determine biomass were studied; the best was the arithmetic mean of the base, middle and top discs. For the volumetric technique, different ways of combining the basic densities of the discs into a stem density were studied; in terms of stem density, the arithmetic mean of the basic densities of the five discs was the best technique. 
However, when stem density is multiplied by stem volume to obtain biomass, using the basic density of the middle disc was superior to all other techniques. Using a mean basic density for the species to determine biomass via the volumetric technique proved inferior to any technique that uses stem density information from individual trees. Finally, seven models for predicting tree aboveground biomass by component were fitted, based on the Spurr and Schumacher-Hall functions, with and without height as a predictor. Of these models, four were Gaussian and three were lognormal. The same seven models were also fitted with the penetration measurement as a predictor, for a total of fourteen models tested. The Schumacher-Hall model was generally superior to the Spurr model. Height was effective in explaining tree biomass only in combination with the penetration measurement. The selected models were from the group that included wood penetration as a predictor and, except for the leaf biomass model, all proved suitable for predicting aboveground biomass in reforestation areas. / Forest biomass measurement requires a destructive procedure; forest inventories and biomass surveys therefore apply indirect procedures to determine the biomass of the different forest components (wood, branches, leaves, roots, etc.). The usual approach consists of taking a destructive sample to measure tree attributes and establishing an empirical relationship between biomass and other attributes that can be measured directly on standing trees, e.g., stem diameter and tree height. 
The biomass of felled trees can be determined by two techniques: the gravimetric technique, which weighs the components in the field and takes a sample to determine water content in the laboratory; and the volumetric technique, which measures the volume of the component in the field and takes a sample to determine wood specific gravity (wood basic density) in the laboratory. The gravimetric technique applies to all tree components, while the volumetric technique is usually restricted to the stem and large branches. In this study, both techniques are examined in a sample of 200 trees of 10 different species from the region of Linhares, ES. In each tree, five cross-sections of the stem were taken to find the best procedure for determining water content in the gravimetric technique and wood specific gravity in the volumetric technique. The Akaike Information Criterion (AIC) was also used to compare different statistical models for predicting tree biomass. For stem water content, the best procedure was the arithmetic mean of the water content of the cross-sections at the base, middle and top of the stem. For wood specific gravity, the best procedure was the arithmetic mean of all five cross-section discs of the stem; however, for biomass, i.e., the product of stem volume and wood specific gravity, the best procedure was to use the wood specific gravity of the middle cross-section disc. Using an average wood specific gravity per species gave worse results than any procedure that used wood specific gravity at the individual tree level. Seven models, as variations of the Spurr and Schumacher-Hall volume equations, were tested for the different tree components: wood (stem and large branches), small branches, leaves and total biomass. 
In general, Schumacher-Hall models were better than Spurr-based models, and models that included only diameter (DBH) performed better than models with both diameter and height. When a measure of wood penetration, as a surrogate for wood density, was added to the models, the models with all three variables (diameter, height and penetration) became the best. Aboveground biomass AIC Akaike information criterion Atlantic rain forest Basic density Biomass Tropical forests Gravimetry Gravimetric technique Model selection Prediction models Stem water content Volumetry Volumetric technique
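The Spurr versus Schumacher-Hall comparison can be illustrated for their lognormal (log-linear) forms, where both models share the response ln B, so their AIC values are directly comparable. This is a hypothetical sketch: the model forms used here (ln B = b0 + b1 ln D + b2 ln H for Schumacher-Hall, ln B = b0 + b1 ln(D²H) for Spurr), the simulated trees and the function names are assumptions, and the penetration variable and the Gaussian (untransformed) variants fitted in the thesis are omitted.

```python
import math
import random

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y; returns (coef, rss)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination on the augmented system (X'X is symmetric positive definite).
    m = [xtx[i] + [xty[i]] for i in range(p)]
    for c in range(p):
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            for k in range(c, p + 1):
                m[r][k] -= f * m[c][k]
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (m[r][p] - sum(m[r][k] * coef[k] for k in range(r + 1, p))) / m[r][r]
    rss = sum((yi - sum(b * v for b, v in zip(coef, r))) ** 2 for r, yi in zip(X, y))
    return coef, rss

def gaussian_aic(rss, n, n_coef):
    """AIC of a Gaussian linear model; the error variance counts as one extra parameter."""
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return 2 * (n_coef + 1) - 2 * loglik

def fit_log_models(dbh, height, biomass):
    """Compare log-scale Schumacher-Hall and Spurr models on the same response ln(B)."""
    ln_b = [math.log(b) for b in biomass]
    n = len(ln_b)
    # Schumacher-Hall: ln B = b0 + b1 ln D + b2 ln H
    X_sh = [[1.0, math.log(d), math.log(h)] for d, h in zip(dbh, height)]
    # Spurr (combined variable): ln B = b0 + b1 ln(D^2 H)
    X_sp = [[1.0, math.log(d * d * h)] for d, h in zip(dbh, height)]
    coef_sh, rss_sh = ols(X_sh, ln_b)
    coef_sp, rss_sp = ols(X_sp, ln_b)
    return {
        "schumacher_hall": (coef_sh, gaussian_aic(rss_sh, n, 3)),
        "spurr": (coef_sp, gaussian_aic(rss_sp, n, 2)),
    }

# Hypothetical trees (not the Linhares sample): B = 0.05 D^2.4 H^0.2 with lognormal error.
rng = random.Random(42)
dbh = [rng.uniform(5, 40) for _ in range(60)]
height = [rng.uniform(5, 25) for _ in range(60)]
biomass = [0.05 * d ** 2.4 * h ** 0.2 * math.exp(rng.gauss(0, 0.1)) for d, h in zip(dbh, height)]
for name, (coef, aic) in fit_log_models(dbh, height, biomass).items():
    print(name, [round(c, 3) for c in coef], round(aic, 1))
```

The simulated height exponent (0.2) is far from half the diameter exponent (2.4 / 2 = 1.2), which is what the combined-variable Spurr form implies, so the Schumacher-Hall AIC should be lower here; that outcome reflects the simulation, not the study's data. Comparing a lognormal model with a Gaussian model fitted to raw B would require adding the Jacobian term −Σ ln Bᵢ to the lognormal log-likelihood so that both likelihoods refer to B.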
25	Automated construction of generalized additive neural networks for predictive data mining / Jan Valentine du Toit Du Toit, Jan Valentine January 2006 (has links) In this thesis, Generalized Additive Neural Networks (GANNs) are studied in the context of predictive data mining. A GANN is a novel neural network implementation of a Generalized Additive Model. GANNs were originally constructed interactively by inspecting partial residual plots. This methodology involves subjective human judgment, is time-consuming, and can lead to suboptimal models. The newly developed automated construction algorithm addresses these difficulties by performing model selection based on an objective model selection criterion. Partial residual plots are used only after the best model has been found, to gain insight into the relationships between the inputs and the target. Models are organized in a search tree, which a greedy search procedure explores to identify good models in a relatively short time. The automated construction algorithm, implemented in the SAS® language, is nontrivial, effective, and comparable to other model selection methodologies found in the literature. This implementation, called AutoGANN, has a simple, intuitive, and user-friendly interface. The AutoGANN system is further extended with an approximation to Bayesian Model Averaging. This technique accounts for uncertainty about which variables should be included in the model and about the model structure. Model averaging uses in-sample model selection criteria to create a combined model with better predictive ability than any single model. In the field of credit scoring, the standard theory of scorecard building is left unchanged; instead, a pre-processing step is introduced to arrive at a more accurate scorecard that discriminates better between good and bad applicants. The pre-processing step exploits GANN models to achieve significant reductions in marginal and cumulative bad rates. 
The time it takes to develop a scorecard may be reduced by using the automated construction algorithm. / Thesis (Ph.D. (Computer Science))--North-West University, Potchefstroom Campus, 2006. Akaike Information Criterion AIC Automated construction algorithm Bayesian Model Averaging Credit scoring Data mining Generalized Additive Neural Network GANN Generalized Additive Model GAM Interactive construction algorithm Model averaging Neural network Partial residuals Predictive modeling Schwarz information criterion SBC
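The two ideas behind AutoGANN, a greedy search over candidate model structures scored by an objective criterion and an approximation to Bayesian Model Averaging, can be sketched generically. This is not the AutoGANN implementation, which is written in SAS: the per-input encoding (0 = input dropped, 1 = linear, 2 = nonlinear), the hill-climbing rule and the function names are all assumptions. The averaging weights use the common approximation of posterior model probabilities by exp(−ΔSBC/2), normalized over the candidate models.

```python
import math

def sbc(loglik, n_params, n_obs):
    """Schwarz Bayesian criterion: -2 ln L + k ln n (lower is better)."""
    return -2 * loglik + n_params * math.log(n_obs)

def greedy_search(n_inputs, score, levels=(0, 1, 2)):
    """Hill-climb over per-input complexity levels, changing one input at a time.

    Hypothetical encoding: 0 = input dropped, 1 = linear effect,
    2 = nonlinear effect (e.g. an extra hidden node). `score` returns a criterion
    such as SBC for a candidate structure; lower is better.
    """
    current = tuple([1] * n_inputs)
    best = score(current)
    improved = True
    while improved:
        improved = False
        for i in range(n_inputs):
            for lev in levels:
                if lev == current[i]:
                    continue
                cand = current[:i] + (lev,) + current[i + 1:]
                s = score(cand)
                if s < best:
                    best, current, improved = s, cand, True
    return current, best

def averaging_weights(criteria):
    """Approximate posterior model probabilities: exp(-delta/2), normalized."""
    best = min(criteria)
    raw = [math.exp(-(c - best) / 2) for c in criteria]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_prediction(predictions, weights):
    """Weighted average of per-model predictions (one list per model)."""
    return [sum(w * p[i] for w, p in zip(weights, predictions))
            for i in range(len(predictions[0]))]

# Toy usage with a made-up score whose optimum is (0, 2, 1).
target = (0, 2, 1)
structure, crit = greedy_search(3, lambda m: sum((a - b) ** 2 for a, b in zip(m, target)))
weights = averaging_weights([100.0, 102.0, 110.0])
print(structure, crit, [round(w, 3) for w in weights])
```

In a real application, `score` would fit the network implied by each structure and return its SBC or AIC; the toy score here only checks that the search reaches a known optimum.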
28	Modelagem de sistemas dinamicos não lineares utilizando sistemas fuzzy, algoritmos geneticos e funções de base ortonormal / Modeling of nonlinear dynamic systems using fuzzy systems, genetic algorithms and orthonormal basis functions Medeiros, Anderson Vinicius de 23 January 2006 (has links) Advisors: Wagner Caradori do Amaral, Ricardo Jose Gabrielli Barreto Campello / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Abstract: This work introduces a methodology for generating and optimizing Takagi-Sugeno (TS) fuzzy models with Orthonormal Basis Functions (OBF) for nonlinear dynamic systems using a genetic algorithm. Orthonormal basis functions are used because they give models properties such as the absence of output feedback and the ability to achieve a reasonable approximation capability with only a few parameters. TS fuzzy models add to these properties interpretability and ease of knowledge representation in linguistic form. Genetic algorithms are a well-established method for tuning the parameters of TS fuzzy models. In this context, a genetic algorithm was developed to optimize two architectures: the OBF TS fuzzy model and its extension, the Generalized OBF TS fuzzy model. Local linear and nonlinear models in the consequents of the fuzzy rules were analyzed, as well as the difference between local and global estimation (using least squares) of the parameters of these local models. Each architecture had its own chromosome representation in the genetic algorithm, and for both a fitness function based on the Akaike information criterion was developed.
With respect to the genetic operators, the arithmetic crossover was modified to maintain population diversity, and the Gaussian mutation used a distribution that varied over the generations and differed for each gene. In addition, for the first architecture, a method for simplifying solutions using similarity measures was introduced. The methodology was evaluated by modeling two nonlinear dynamic systems: a polymerization process and a magnetic levitator. / Master's / Automation / Master of Electrical Engineering Dynamic systems theory Fuzzy systems Genetic algorithms Mathematical optimization System identification Nonlinear systems Dynamic systems modeling Takagi-Sugeno fuzzy model Orthonormal basis functions Optimization of models Akaike information criterion Similarity measures
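The genetic-algorithm components described in this abstract can be sketched as follows: an AIC-based fitness, arithmetic crossover, and a Gaussian mutation whose spread varies across generations and per gene. This is a hypothetical illustration. The AIC form N ln(SSE/N) + 2p is a common choice for least-squares models but is an assumption here; the crossover shown is the standard operator, not the diversity-preserving modification developed in the dissertation (which the abstract does not detail); and the linearly shrinking schedule with per-gene range scaling is assumed.

```python
import math
import random

def aic_fitness(residuals, n_params):
    """AIC for a least-squares model with Gaussian errors, constant terms dropped:
    N ln(SSE / N) + 2p. Lower values indicate a better fit/complexity trade-off."""
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    return n * math.log(sse / n) + 2 * n_params

def arithmetic_crossover(a, b, rng):
    """Standard arithmetic crossover: children are convex combinations of the parents."""
    lam = rng.random()
    child1 = [lam * x + (1 - lam) * y for x, y in zip(a, b)]
    child2 = [(1 - lam) * x + lam * y for x, y in zip(a, b)]
    return child1, child2

def gaussian_mutation(genes, bounds, gen, max_gen, rng, rate=0.1, sigma0=0.2):
    """Gaussian mutation whose spread is set per gene (from that gene's range)
    and shrinks linearly as the generation index approaches max_gen."""
    shrink = sigma0 * (1 - gen / max_gen)
    mutated = []
    for g, (lo, hi) in zip(genes, bounds):
        if rng.random() < rate:
            g += rng.gauss(0.0, shrink * (hi - lo))
            g = min(max(g, lo), hi)  # keep the gene inside its bounds
        mutated.append(g)
    return mutated

rng = random.Random(0)
bounds = [(0.0, 1.0), (-5.0, 5.0), (10.0, 20.0)]
p1, p2 = [0.2, -1.0, 12.0], [0.8, 3.0, 18.0]
c1, c2 = arithmetic_crossover(p1, p2, rng)
early = gaussian_mutation(c1, bounds, gen=0, max_gen=100, rng=rng, rate=1.0)
late = gaussian_mutation(c1, bounds, gen=100, max_gen=100, rng=rng, rate=1.0)
print(aic_fitness([0.5, -0.2, 0.1, -0.4], n_params=2), c1, early, late)
```

With this schedule the mutation explores widely in early generations and leaves genes unchanged at the final one; other schedules, such as exponential decay, would fit the abstract's description equally well.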
29	Mensuração da biomassa e construção de modelos para construção de equações de biomassa / Biomass measurement and models selection for biomass equations Edgar de Souza Vismara 07 May 2009 (has links) O interesse pela quantificação da biomassa florestal vem crescendo muito nos últimos anos, sendo este crescimento relacionado diretamente ao potencial que as florestas tem em acumular carbono atmosférico na sua biomassa. A biomassa florestal pode ser acessada diretamente, por meio de inventário, ou através de modelos empíricos de predição. A construção de modelos de predição de biomassa envolve a mensuração das variáveis e o ajuste e seleção de modelos estatísticos. A partir de uma amostra destrutiva de de 200 indivíduos de dez essências florestais distintas advindos da região de Linhares, ES., foram construídos modelos de predição empíricos de biomassa aérea visando futuro uso em projetos de reflorestamento. O processo de construção dos modelos consistiu de uma análise das técnicas de obtenção dos dados e de ajuste dos modelos, bem como de uma análise dos processos de seleção destes a partir do critério de Informação de Akaike (AIC). No processo de obtenção dos dados foram testadas a técnica volumétrica e a técnica gravimétrica, a partir da coleta de cinco discos de madeira por árvore, em posições distintas no lenho. Na técnica gravimétrica, estudou-se diferentes técnicas de composição do teor de umidade dos discos para determinação da biomassa, concluindo-se como a melhor a que utiliza a média aritmética dos discos da base, meio e topo. Na técnica volumétrica, estudou-se diferentes técnicas de composição da densidade do tronco com base nas densidades básicas dos discos, concluindo-se que em termos de densidade do tronco, a média aritmética das densidades básicas dos cinco discos se mostrou como melhor técnica. 
Entretanto, quando se multiplica a densidade do tronco pelo volume deste para obtenção da biomassa, a utilização da densidade básica do disco do meio se mostrou superior a todas as técnicas. A utilização de uma densidade básica média da espécie para determinação da biomassa, via técnica volumétrica, se apresentou como uma abordagem inferior a qualquer técnica que utiliza informação da densidade do tronco das árvores individualmente. Por fim, sete modelos de predição de biomassa aérea de árvores considerando seus diferentes compartimentos foram ajustados, a partir das funções de Spurr e Schumacher-Hall, com e sem a inclusão da altura como variável preditora. Destes modelos, quatro eram gaussianos e três eram lognormais. Estes mesmos sete modelos foram ajustados incluindo a medida de penetração como variável preditora, totalizando quatorze modelos testados. O modelo de Schumacher-Hall se mostrou, de maneira geral, superior ao modelo de Spurr. A altura só se mostrou efetiva na explicação da biomassa das árvores quando em conjunto com a medida de penetração. Os modelos selecionados foram do grupo que incluíram a medida de penetração no lenho como variável preditora e , exceto o modelo de predição da biomassa de folhas, todos se mostraram adequados para aplicação na predição da biomassa aérea em áreas de reflorestamento. / Forest biomass measurement implies a destructive procedure, thus forest inventories and biomass surveys apply indirect procedure for the determination of biomass of the different components of the forest (wood, branches, leaves, roots, etc.). The usual approch consists in taking a destructive sample for the measurment of trees attributes and an empirical relationship is established between the biomass and other attributes that can be directly measured on standing trees, e.g., stem diameter and tree height. 
The biomass of felled trees can be determined by two techniques. The gravimetric technique weighs the components in the field and takes a sample to determine water content in the laboratory. The volumetric technique measures the volume of the component in the field and takes a sample to determine wood specific gravity (wood basic density) in the laboratory. The gravimetric technique applies to all tree components, while the volumetric technique is usually restricted to the stem and large branches. In this study, the two techniques are examined in a sample of 200 trees of 10 different species from the region of Linhares, ES. Five cross-sections of the stem were taken from each tree. These were used to find the best procedure for determining water content in the gravimetric technique and wood specific gravity in the volumetric technique. The Akaike Information Criterion (AIC) was also used to compare different statistical models for predicting tree biomass. For stem water content, the best procedure was the arithmetic mean of the water content of the cross-sections at the base, middle and top of the stem. For wood specific gravity, the best procedure was the arithmetic mean of all five cross-section discs of the stem. For biomass, i.e., the product of stem volume and wood specific gravity, the best procedure was instead to use the wood specific gravity of the middle cross-section disc. Using an average wood specific gravity per species gave worse results than any procedure that used wood specific gravity at the individual-tree level. Seven models, variations of the Spurr and Schumacher-Hall volume equations, were tested for the different tree components: wood (stem and large branches), small branches, leaves and total biomass.
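The disc-composition rules reported above can be sketched in Python. For the gravimetric technique, the stem moisture is the mean of the base, middle and top discs. For the volumetric technique, the middle-disc density or the five-disc mean density is multiplied by stem volume. This is a minimal illustration under stated assumptions, not the author's procedure. Moisture content is taken on a wet basis (water mass / fresh mass), the labels of the two disc positions the abstract does not name are hypothetical, and the example numbers are made up.

```python
from statistics import mean

# Five discs per stem, base to top. Only "base", "middle" and "top" are
# named in the abstract; "lower" and "upper" are hypothetical placeholders.
DISC_ORDER = ["base", "lower", "middle", "upper", "top"]


def dry_mass_gravimetric(fresh_mass_kg, moisture_by_disc):
    """Gravimetric technique: dry mass from the field fresh mass.

    Assumes wet-basis moisture content (water mass / fresh mass); for a
    dry-basis value mc, use fresh_mass_kg / (1 + mc) instead.
    Stem moisture = arithmetic mean of the base, middle and top discs.
    """
    mc = mean(moisture_by_disc[p] for p in ("base", "middle", "top"))
    return fresh_mass_kg * (1.0 - mc)


def stem_density(density_by_disc, method="middle"):
    """Compose stem basic density (kg/m^3) from the disc basic densities."""
    if method == "middle":
        return density_by_disc["middle"]
    if method == "mean":
        return mean(density_by_disc[p] for p in DISC_ORDER)
    raise ValueError(f"unknown method: {method}")


def biomass_volumetric(stem_volume_m3, density_kg_m3):
    """Volumetric technique: biomass = stem volume x wood basic density."""
    return stem_volume_m3 * density_kg_m3


# Hypothetical example tree
moisture = {"base": 0.50, "middle": 0.40, "top": 0.60}
density = {"base": 620, "lower": 590, "middle": 560, "upper": 520, "top": 480}
dry = dry_mass_gravimetric(100.0, moisture)                       # 50.0 kg
b_mid = biomass_volumetric(0.5, stem_density(density, "middle"))  # 280.0 kg
b_mean = biomass_volumetric(0.5, stem_density(density, "mean"))   # 277.0 kg
print(dry, b_mid, b_mean)
```

Keeping the composition rule as a parameter makes it easy to compare the middle-disc and five-disc options side by side, which is the comparison the study reports.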
In general, Schumacher-Hall models were better than Spurr-based models, and models that included only diameter (DBH) performed better than models with both diameter and height. A measure of penetration into the wood, serving as a surrogate for wood density, was then added to the models. With it, the models with three variables (diameter, height and penetration) became the best. Biomass Tropical forests Gravimetry Model selection Volumetry. Aboveground biomass AIC Akaike information criterion Atlantic rain forest. Basic density Gravimetric technique Model Selection Prediction models Stem water content Volumetric technique
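The following is a minimal sketch of comparing a Spurr-type and a Schumacher-Hall-type model by AIC. It assumes the common forms B = b0 + b1·D²H, fitted here as a Gaussian model, and ln B = b0 + b1 ln D + b2 ln H, fitted as a lognormal model. It uses synthetic data rather than the Linhares sample and leaves out the penetration variable. The point it illustrates is that a lognormal model's log-likelihood must be moved back to the biomass scale, by subtracting Σ ln B, before its AIC can be compared with that of a Gaussian model.

```python
import numpy as np


def gaussian_loglik_ols(X, y):
    """Least-squares fit of y = X @ beta + e; returns beta and the maximised
    Gaussian log-likelihood (error variance at its ML estimate RSS / n)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = y.size
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return beta, loglik


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


# Synthetic trees, for illustration only (not the thesis data).
rng = np.random.default_rng(42)
n = 200
dbh = rng.uniform(5.0, 40.0, n)                                    # cm
height = (1.3 + 25.0 * (1.0 - np.exp(-0.05 * dbh))) * rng.lognormal(0.0, 0.1, n)  # m
biomass = np.exp(-2.0 + 2.2 * np.log(dbh) + 0.6 * np.log(height)
                 + rng.normal(0.0, 0.25, n))                       # kg

# Spurr form, Gaussian errors: B = b0 + b1 * D^2 * H + e
X_spurr = np.column_stack([np.ones(n), dbh**2 * height])
_, ll_spurr = gaussian_loglik_ols(X_spurr, biomass)

# Schumacher-Hall form, lognormal: ln B = b0 + b1 ln D + b2 ln H + e
X_sh = np.column_stack([np.ones(n), np.log(dbh), np.log(height)])
beta_sh, ll_log = gaussian_loglik_ols(X_sh, np.log(biomass))
# Change of variables to the biomass scale: f(B) = f(ln B) / B,
# so subtract sum(ln B); otherwise the two AICs are not comparable.
ll_sh = ll_log - np.sum(np.log(biomass))

aic_spurr = aic(ll_spurr, 3)  # b0, b1, sigma
aic_sh = aic(ll_sh, 4)        # b0, b1, b2, sigma
print(f"AIC Spurr = {aic_spurr:.1f}, AIC Schumacher-Hall = {aic_sh:.1f}")
```

Because these synthetic data were generated from the log-linear form with multiplicative error, the Schumacher-Hall AIC comes out lower. Without the Σ ln B term, the comparison would set likelihoods of two different response variables against each other.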
30	Dynamic prediction of repair costs in heavy-duty trucks Saigiridharan, Lakshidaa January 2020 (has links) Pricing of repair and maintenance (R&M) contracts is one of the most important processes carried out at Scania. Scania currently predicts repair costs with experience-based methods. These methods compute average repair costs for contracts terminated in the recent past and do not involve statistical modelling. This approach is difficult to apply to a reference population of rigid Scania trucks. The purpose of this study is therefore to build suitable statistical models that predict the repair costs of four variants of rigid Scania trucks. The study gathers repair data from multiple sources and performs feature selection with the Akaike Information Criterion (AIC). This selection extracts the features that most influence repair costs for each truck variant. The study showed that including operational features as factors could further influence contract pricing. The hurdle Gamma model, widely used to handle zero inflation in Generalized Linear Models (GLMs), is fitted to the data, which contain many zero as well as non-zero values. The data also have an inherent hierarchical structure, with observations grouped by individual chassis, so a hierarchical hurdle Gamma model is implemented as well. Both statistical models perform much better than the experience-based prediction method when evaluated with the mean absolute error (MAE) and root mean square error (RMSE). A final comparison uses the AIC to draw conclusions about the goodness of fit and predictive performance of the two statistical models.
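AIC-based feature selection, as described above, can be sketched as backward elimination. The procedure starts from all candidate features and repeatedly drops the one whose removal lowers AIC the most. It stops when no single removal helps. For brevity, this sketch uses an ordinary least-squares (Gaussian) model rather than the hurdle Gamma model, and the feature names and synthetic data are hypothetical.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC of a least-squares (Gaussian) linear model y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = y.size
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = X.shape[1] + 1  # coefficients plus the error variance
    return 2.0 * k - 2.0 * loglik


def backward_select(X, y, names):
    """Backward elimination by AIC. Column 0 is the intercept and is always kept."""
    keep = list(range(X.shape[1]))
    best = gaussian_aic(X[:, keep], y)
    while len(keep) > 1:
        # AIC of each model that drops exactly one non-intercept column
        trials = [(gaussian_aic(X[:, [c for c in keep if c != j]], y), j)
                  for j in keep[1:]]
        cand, drop = min(trials)
        if cand >= best:  # no single removal lowers AIC: stop
            break
        best = cand
        keep.remove(drop)
    return [names[j] for j in keep], best


# Hypothetical operational features; only "mileage" drives cost here.
rng = np.random.default_rng(0)
n = 300
mileage = rng.uniform(0.0, 10.0, n)
load = rng.uniform(0.0, 1.0, n)
idle = rng.uniform(0.0, 1.0, n)
cost = 5.0 + 2.0 * mileage + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), mileage, load, idle])
selected, best_aic = backward_select(X, cost, ["intercept", "mileage", "load", "idle"])
print(selected, round(best_aic, 1))
```

AIC charges 2 per parameter, so whether a weak feature survives can depend on the sample. The strongly informative feature is retained either way.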
On these measures (MAE, RMSE and AIC), the hierarchical hurdle Gamma model gave the best predictions. Hurdle Gamma model Hierarchical hurdle Gamma model Generalized linear model (GLM) Supervised Machine Learning feature selection Akaike Information Criterion (AIC) prediction of repair costs heavy-duty trucks truck variants operational features Probability Theory and Statistics
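The two parts of a hurdle Gamma model can be sketched in an intercept-only form. A binary part models whether a cost is positive, and a Gamma distribution models the positive costs. The prediction is the expected cost P(y > 0)·E[y | y > 0], which is then scored with MAE and RMSE. This is a deliberate simplification. The thesis's models include covariates, and the hierarchical version adds chassis-level effects. Here the Gamma shape is estimated with a known closed-form approximation instead of exact maximum likelihood, and the cost values are hypothetical.

```python
import math


def fit_hurdle_gamma(y):
    """Intercept-only hurdle Gamma fit (requires zeros, and unequal positive values).

    Binary part: p_pos = P(y > 0); its ML estimate is the share of positive values.
    Positive part: Gamma with mean mu and shape k. The ML estimate of mu is the
    mean of the positive values; k uses the closed-form approximation
    k ~ (3 - s + sqrt((s - 3)^2 + 24 s)) / (12 s), with s = ln(mean) - mean(ln y).
    """
    pos = [v for v in y if v > 0]
    p_pos = len(pos) / len(y)
    mu = sum(pos) / len(pos)
    s = math.log(mu) - sum(math.log(v) for v in pos) / len(pos)
    k = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    return p_pos, mu, k


def hurdle_gamma_loglik(y, p_pos, mu, k):
    """Log-likelihood: log(1 - p_pos) for zeros, log(p_pos) + Gamma log-density otherwise."""
    scale = mu / k
    ll = 0.0
    for v in y:
        if v == 0:
            ll += math.log(1.0 - p_pos)
        else:
            ll += (math.log(p_pos) + (k - 1.0) * math.log(v) - v / scale
                   - math.lgamma(k) - k * math.log(scale))
    return ll


def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


costs = [0, 0, 100, 200, 300]                     # hypothetical repair costs
p_pos, mu, k = fit_hurdle_gamma(costs)            # p_pos = 0.6, mu = 200.0
pred = [p_pos * mu] * len(costs)                  # expected cost 120.0 for every contract
aic_value = 2 * 3 - 2 * hurdle_gamma_loglik(costs, p_pos, mu, k)  # 3 parameters
print(mae(costs, pred), rmse(costs, pred), aic_value)  # MAE = 104.0, RMSE ~ 116.6
```

Keeping the zero/non-zero decision separate from the size of positive costs is what lets the model handle the many zero-cost records without distorting the Gamma fit.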
