Global ETD Search

1	Využití klasifikačních stromů v marketingovém průzkumu trhu / Usage classification trees in market analysis PROKOPOVÁ, Kateřina January 2008 In my thesis I dealt with the use of classification trees in market analysis, focusing on the mobile services sector. The aim of my work was to use the modern CART (classification and regression trees) methodology to identify the important factors influencing consumer behavior. Based on a questionnaire survey, I concluded that consumers choose mobile phone services based on their net monthly income, age, occupation, the services they require, and whether they use the service for personal or business purposes.
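CART grows a tree by repeatedly choosing the split that most reduces node impurity. As a minimal sketch (not the software used in the thesis), the Python below scores every threshold on one numeric predictor by weighted Gini impurity, one of the impurity criteria available in CART. The income figures, tariff labels and helper names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(xs, ys):
    """Return (threshold, weighted impurity) of the best split x <= t."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate tied values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: net monthly income (thousands) vs. chosen tariff type.
income = [12, 15, 18, 25, 30, 42]
tariff = ["prepaid", "prepaid", "prepaid", "contract", "contract", "contract"]
print(best_threshold(income, tariff))  # (21.5, 0.0)
```

A full tree repeats this search over every predictor at every node and then prunes the result.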
2	Fast growing and interpretable oblique trees via logistic regression models Truong, Alfred Kar Yin January 2009 The classification tree is an attractive method for classification because its predictions are more transparent than those of most other classifiers. The most widely accepted approaches to tree growth use axis-parallel splits to partition continuous attributes. Since the interpretability of a tree diminishes as it grows larger, researchers have sought ways of growing trees with oblique splits, which are better able to partition observations. The focus of this thesis is to grow oblique trees in a fast and deterministic manner and to propose ways of making them more interpretable. Finding good oblique splits is a computationally difficult task. Various authors have proposed ways of doing this, either by performing stochastic searches or by solving problems that effectively produce oblique splits at each stage of tree growth. A new approach to finding such splits is proposed that restricts attention to a small but comprehensive set of splits. Empirical evidence shows that good oblique splits are found in most cases. Empirical evidence also shows that, when observations come from a small number of classes, oblique trees can be grown in a matter of seconds. As interpretability is the main strength of classification trees, it is important that the oblique trees grown are interpretable. Because the proposed approach to finding oblique splits uses logistic regression, well-founded variable selection techniques can be introduced to classification trees. This allows concise oblique splits to be found at each stage of tree growth, so that more interpretable oblique trees can be grown directly. In addition, cost-complexity pruning ideas developed for axis-parallel trees have been adapted to make oblique trees more interpretable.
A major practical component of this thesis is the oblique.tree package for R, which allows casual users to experiment with oblique trees in a way that was not previously possible. 530.0724
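The thesis's own split search restricts attention to a small candidate set and is implemented in the oblique.tree R package; the Python below does not reproduce either. It is only a minimal sketch, assuming plain gradient-descent logistic regression, of the general idea that a fitted logistic-regression hyperplane w·x + b = 0 can serve as an oblique split at a node. The data and the helper names (`fit_logistic`, `oblique_split`) are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def oblique_split(X, y):
    """Use the fitted hyperplane w.x + b = 0 as the split rule for one node."""
    w, b = fit_logistic(X, y)
    goes_right = [sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 for xi in X]
    left = [i for i, r in enumerate(goes_right) if not r]
    right = [i for i, r in enumerate(goes_right) if r]
    return w, b, left, right

# Hypothetical two-class data separated by the line x1 + x2 = 0; no single
# axis-parallel threshold on x1 or on x2 separates the classes perfectly.
X = [(1.0, 1.0), (1.2, -0.2), (-0.2, 1.2), (0.5, 0.5),
     (-1.0, -1.0), (-1.2, 0.2), (0.2, -1.2), (-0.5, -0.5)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b, left, right = oblique_split(X, y)
print(left, right)  # [4, 5, 6, 7] [0, 1, 2, 3]
```

Because both weights are non-zero, the split involves two attributes at once; variable selection of the kind the abstract mentions would aim to drop weights that contribute little, keeping each split concise.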
3	A statistical investigation of the risk factors for tuberculosis van Woerden, Irene January 2013 Tuberculosis (TB) is often called a disease of poverty and is the main cause of death from infectious diseases among adults. In 1993 the World Health Organisation (WHO) declared TB a global emergency; however, there were still approximately 1.4 million deaths due to TB in 2011. This thesis contains a detailed study of the existing literature on the global risk factors for TB. The risk factors identified in the literature review that were also available in the NFHS-3 survey were then analysed to determine how well respondents at high risk of TB could be identified. We also looked at the stigma and misconceptions surrounding TB, and include detailed reports from the existing literature on how a person's wealth, health, education, nutrition, and HIV status affect how likely the person is to have TB. Differences in the risk-factor distributions of the TB and non-TB populations were examined, and classification trees, nearest neighbours, and logistic regression models were trialled to determine whether respondents at high risk of TB could be identified. Finally, gender-specific statistically likely directed acyclic graphs were created to visualise the most likely associations between the variables. TB Tuberculosis NFHS-3 classification trees nearest neighbours logistic regression Directed Acyclic Graphs
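For readers unfamiliar with the nearest-neighbours method named in this abstract, here is a minimal k-nearest-neighbours classifier in Python: a respondent receives the majority label among the k closest training respondents (Euclidean distance). The two standardised risk-factor scores, the labels and k = 3 are hypothetical choices, not values from the NFHS-3 analysis.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort (distance, label) pairs so the closest neighbours come first.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical standardised scores on two risk factors for six respondents.
train_X = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (2.0, 1.8), (1.7, 2.2), (2.1, 2.0)]
train_y = ["low", "low", "low", "high", "high", "high"]
print(knn_predict(train_X, train_y, (1.9, 1.9)))  # high
```

In practice the features would need comparable scales, since Euclidean distance otherwise lets the widest-ranging variable dominate.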
4	An Exploration of Statistical Modelling Methods on Simulation Data Case Study: Biomechanical Predator–Prey Simulations January 2018 Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology, in large part because of the “big data” demands of the various “-omics” (e.g., genomics, transcriptomics, and metabolomics). However, in other fields of biology where empirical data sets are conventionally smaller, more traditional statistical methods of inference are still very effective and widely used. Nevertheless, as the cost of high-performance computing has decreased, these fields have started to employ simulation models to generate insights into questions that have been elusive in the laboratory and the field. Although these computational models allow exquisite control over large numbers of parameters, they also generate data at a qualitatively different scale than most experts in these fields are accustomed to. Thus, more sophisticated methods from big-data statistics have an opportunity to better serve the often-forgotten area of bioinformatics that might be called “in-silicomics”. As a case study, this thesis develops methods for the analysis of large amounts of data generated by a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate how other biomechanical parameters relating to the agility of animals in predator–prey pairs are better predictors of pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and of potentially relevant interaction effects renders these methods impractical.
Consequently, newer modelling techniques such as LASSO regularization are used and compared with the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and the instances in the leaves of classification trees provide the basis for future simulations with an economical number of additional runs. In general, this thesis shows the increased utility of these sophisticated statistical techniques for simulated ecological data compared with the approaches traditionally used in these fields. These techniques, combined with methods from industrial Design of Experiments, will help ecologists extract novel insights from simulations that combine habitat complexity, population structure, and biomechanics. / Dissertation/Thesis / Masters Thesis Industrial Engineering 2018 Biostatistics Biomechanics Ecology Classification Trees Data Science LASSO Logistic Regression Simulation Variable Selection
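LASSO adds an L1 penalty, λ times the sum of |β_j|, to the least-squares loss. This shrinks some coefficients exactly to zero, so variable selection happens within a single fit rather than through forward, backward or stepwise searches. A minimal cyclic coordinate-descent sketch in Python follows, assuming centred data with no intercept and the objective (1/2n)‖y − Xβ‖² + λ‖β‖₁. The tiny orthogonal design is hypothetical.

```python
def soft_threshold(rho, lam):
    """S(rho, lam) = sign(rho) * max(|rho| - lam, 0)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - X beta||^2 + lam * ||beta||_1.

    Assumes y and every column of X are centred and no column is all zeros.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of x_j with the partial residual that excludes beta_j.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            # Adjust the residual for the change in beta_j.
            resid = [r - v * (new - beta[j]) for v, r in zip(col, resid)]
            beta[j] = new
    return beta

# Hypothetical orthogonal design; y = 3*x1 + 0.2*x2 (x3 is irrelevant).
X = [(1, 1, 1), (-1, 1, -1), (1, -1, -1), (-1, -1, 1)]
y = [3.2, -2.8, 2.8, -3.2]
print(lasso_cd(X, y, lam=0.5))  # approximately [2.5, 0.0, 0.0]
```

With λ = 0.5 the weak x2 effect is dropped entirely and the strong x1 effect is shrunk by λ. With λ = 0 the same routine returns the least-squares fit.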
5	Statistické klasifikační metody / Statistical Classification Methods Barvenčík, Oldřich January 2010 The thesis deals with selected classification methods, describing the basics of cluster analysis, discriminant analysis, and the theory of classification trees. Their use is demonstrated by classifying simulated data, with the calculations performed in the program STATISTICA. The practical part of the thesis compares the methods on real data sets of various sizes. The classification methods are then applied to a real task: predicting air pollution from the weather forecast.
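As a minimal illustration of discriminant analysis (the thesis itself used STATISTICA), the sketch below computes Fisher's two-class linear discriminant for 2-D data, w = Sw^-1 (m1 - m0), where Sw is the pooled within-class scatter matrix. A point is assigned to class 1 when its projection exceeds that of the midpoint between the class means. Equal priors, the toy data and the helper names are assumptions.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def fisher_lda_2d(class0, class1):
    """Two-class Fisher discriminant for 2-D data.

    Returns (w, c) with w = Sw^-1 (m1 - m0) and c = w . midpoint of the means.
    """
    m0, m1 = mean_vector(class0), mean_vector(class1)
    # Pooled within-class scatter matrix Sw = [[a, b], [b, d]].
    a = b = d = 0.0
    for rows, m in ((class0, m0), (class1, m1)):
        for x in rows:
            u, v = x[0] - m[0], x[1] - m[1]
            a += u * u
            b += u * v
            d += v * v
    det = a * d - b * b  # assumed non-zero (non-degenerate scatter)
    diff0, diff1 = m1[0] - m0[0], m1[1] - m0[1]
    # Inverse of a symmetric 2x2 matrix: (1/det) * [[d, -b], [-b, a]].
    w = ((d * diff0 - b * diff1) / det, (a * diff1 - b * diff0) / det)
    c = w[0] * (m0[0] + m1[0]) / 2 + w[1] * (m0[1] + m1[1]) / 2
    return w, c

def lda_predict(w, c, x):
    """Assign class 1 when the projection w . x exceeds the cut-off c."""
    return int(w[0] * x[0] + w[1] * x[1] > c)

# Hypothetical 2-D measurements for two classes.
class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
class1 = [(5, 6), (6, 5), (6, 7), (7, 6)]
w, c = fisher_lda_2d(class0, class1)
print(w, c)  # (1.0, 1.0) 8.0
```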
6	The link between carbon management strategy, company characteristics and corporate financial performance Matthews, Natalie Georgette 23 February 2013 That companies need to respond to the issue of climate change is no longer in question, and with multiple carbon management activities to choose from, companies need to select the most appropriate carbon management strategy to meet the challenges of a carbon-constrained future. Because of South Africa’s vulnerability, as a developing country, to the impacts of climate change, and because of business’s pivotal role in addressing this urgent issue, it is important to characterise corporate responses to climate change. The contextual factors that influence carbon management strategy decisions need to be understood so that appropriate policy decisions can be taken to encourage innovation related to climate change opportunities. To this end, secondary data in the form of qualitative responses from 70 large South African listed companies to the Carbon Disclosure Project 2011 questionnaire were analysed for this study during September and October 2012. The detailed responses were first mined with a text-mining statistical program to identify the five carbon management activities currently practised by the companies. A cluster analysis of these activities revealed four general response strategies to climate change and carbon emission reduction pressures. The companies were found to have a strong focus on saving energy, with less focus on higher-order sustainability activities. While market capitalisation, turnover, sector and carbon commitment were shown to correlate with, and indeed predict, the carbon management strategy chosen by companies, no significant link was found between carbon management strategy and corporate financial performance. / Dissertation (MBA)--University of Pretoria, 2012.
/ Gordon Institute of Business Science (GIBS) / unrestricted UCTD Classification trees Carbon management Cluster analysis Corporate financial performance Corporate carbon management strategy
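The four response strategies came from a cluster analysis, but the abstract does not say which clustering algorithm was used. The following is therefore only a sketch of one common choice, k-means (Lloyd's algorithm), in Python. Each hypothetical company is coded as a 0/1 vector over five carbon management activities. The starting centroids are chosen by hand so that the result is deterministic.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update.

    Stops when the centroids no longer change (or after max_iter rounds).
    """
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical companies: 1 = practises activity j (five activities).
points = [(1, 1, 0, 0, 0), (1, 1, 1, 0, 0), (1, 0, 0, 0, 0),
          (1, 1, 1, 1, 1), (1, 1, 1, 1, 0), (0, 1, 1, 1, 1)]
labels, centroids = kmeans(points, [points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

For binary activity profiles, other distances (for example Jaccard) with hierarchical clustering are also common. The Euclidean k-means here is just the simplest illustration.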
7	Predicting Customer Satisfaction from Dental Implants Perception Data Elmassad, Omnya January 2013 In recent years, measuring customer satisfaction has become one of the key concerns of market research studies. One of the basic features of leading companies is their success in fulfilling their customers’ demands. For that reason, companies attempt to find out which essential factors dominate their customers’ purchasing habits. Millennium Research Group (MRG), a global authority on medical technology market intelligence, uses a web-based survey tool to collect information about customers’ level of satisfaction. One of their surveys is designed to gather information about practitioners’ level of satisfaction with different brands of dental implants. The Dental Implants dataset obtained from the survey tool has thirty-four attributes, and practitioners were asked to rank or specify their level of satisfaction by assigning a score to each attribute. The basic question asked by the company was whether the attributes were useful for predicting customer behavior. The aim of this study is to assess the reliability and accuracy of these measures, to build a model for future predictions, and then to determine the attributes that are most influential in the practitioners’ purchasing decisions. Classification and regression trees (CART) and partial least squares regression (PLSR) are the two statistical approaches used in this study to build a prediction model for the Dental Implants dataset. The prediction models generated by both techniques have relatively small predictive power, which may be perceived as an indication of deficiency in the dataset. However, small predictive power is generally expected in market research studies. The research then attempts to find ways to improve the power of these models to get more accurate results.
The model generated by CART analysis tends to have better predictive power and is more suitable for future predictions. Although PLSR provides extremely small predictive power, it helps identify the most important attributes that influence the practitioners’ purchasing decisions. Improvements in prediction are sought by restricting the cases in the data to subsets that show better alignment between predictors and customer purchasing behaviour. / Master of Science (MSc) Customer Satisfaction Dental Implants Classification Trees Regression Trees PLSR Market Research Applied Statistics
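In PLSR, the weights of the first latent component are proportional to the covariance between each attribute and the response. This is why PLSR can rank influential attributes even when its predictive power is low. The sketch below computes only the first PLS1 component for a single centred response and centred attributes. The toy data and the helper name are hypothetical, and a real analysis would extract and validate several components.

```python
import math

def pls1_first_component(X, y):
    """First PLS1 component for centred X (n x p) and centred y.

    Returns the unit weight vector w (proportional to X^T y), the scores
    t = X w, and the inner-regression coefficient q = (t . y) / (t . t).
    """
    p = len(X[0])
    cov = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in cov))
    w = [c / norm for c in cov]
    t = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    q = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    return w, t, q

# Hypothetical centred scores on two attributes; y = 3*x1 + 1*x2.
X = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
y = [4, -2, 2, -4]
w, t, q = pls1_first_component(X, y)
print([round(v, 4) for v in w], round(q, 4))  # [0.9487, 0.3162] 3.1623
```

The larger weight on the first attribute marks it as the more influential one. Predictions from this single component are q * t.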
8	Ditch detection using refined LiDAR data: A bachelor’s thesis at Jönköping University / Dikesdetektion med hjälp av raffinerad LiDAR-data Flyckt, Jonatan, Andersson, Filip January 2019 In this thesis, a method for detecting ditches using digital elevation data derived from LiDAR scans was developed in collaboration with the Swedish Forest Agency. The objective was to compare a machine-learning-based method with a state-of-the-art automated method, and to determine which LiDAR-based features are the strongest ditch predictors. To this end, the digital elevation data were used to develop several new features, which served as inputs to a random forest classifier. The classifier's output was processed to remove noise before a binarisation step produced the final ditch prediction. Several metrics, including Cohen's kappa, were calculated to evaluate the performance of the method, and these were compared with the metrics obtained from a reproduction of the state-of-the-art automated method. The 95% confidence interval for the population Cohen's kappa was calculated to be [0.567, 0.645]. Overall, features based on the Impoundment attribute derived from the digital elevation data were the strongest ditch predictors. Our method outperformed the state-of-the-art automated method by a wide margin. This thesis shows that AI and machine learning applied to digital elevation data can detect ditches to a substantial extent. machine learning geographic information systems GIS classification trees supervised learning Computer Systems
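Cohen's kappa compares the observed agreement p_o between predicted and reference labels with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below computes it from a confusion matrix, with a 95% interval based on a simple large-sample standard error, sqrt(p_o(1 - p_o) / (n(1 - p_e)^2)). The thesis may have used a different interval method, and the ditch/non-ditch pixel counts are hypothetical.

```python
import math

def cohens_kappa(matrix):
    """Cohen's kappa and an approximate 95% CI for a square confusion matrix."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    po = sum(matrix[i][i] for i in range(k)) / n  # observed agreement
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)  # chance agreement
    kappa = (po - pe) / (1 - pe)
    # Simple large-sample standard error; other estimators exist.
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)

# Hypothetical counts: rows = reference (ditch, no ditch), cols = predicted.
print(cohens_kappa([[40, 10], [5, 45]]))  # kappa 0.7, CI roughly (0.560, 0.840)
```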
9	Comparação de métodos de mapeamento digital de solos através de variáveis geomorfométricas e sistemas de informações geográficas / Comparison of digital soil mapping methods using geomorphometric variables and geographic information systems Coelho, Fabrício Fernandes January 2010 Soil maps are important sources of information for land-use planning and management, but they are expensive to produce. With the aim of producing soil maps from existing maps, this study tests and compares single-stage classification methods (multiple multinomial logistic regression and Bayes) and multiple-stage classification methods (CART, J48 and LMT), using a geographic information system and terrain (geomorphometric) parameters, to produce soil maps with both the original and a simplified legend.
In the ArcGIS environment, the terrain parameters and the original soil map were sampled to build training sets for the algorithms. The results obtained with the Weka software were implemented in ArcGIS to generate the digital soil maps, and error matrices were generated to analyse the accuracy of the maps. The terrain parameters that best explained the spatial distribution of soil classes were slope, profile and planar curvature, elevation, and topographic wetness index. The multiple-stage classification methods showed small improvements in overall accuracy but large improvements in the Kappa index. Simplifying the original legend significantly increased the producer's and user's accuracies, but produced only small improvements in overall accuracy and the Kappa index. Soil classification Geomorphology Digital mapping Remote sensing Geographic information system Digital elevation model Terrain parameters Single stage classification Classification trees
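Overall, producer's and user's accuracy are all read from the error matrix. Producer's accuracy is the share of reference cells of a class that the map labels correctly, and user's accuracy is the share of mapped cells of a class that are correct. The sketch below assumes that rows hold the predicted map classes and columns the reference classes, and that no class has a zero row or column total. The counts for three soil classes are hypothetical.

```python
def accuracy_report(m):
    """Overall, producer's and user's accuracy from a square error matrix.

    Convention assumed here: rows = classes on the produced map,
    columns = reference (original map) classes.
    """
    k = len(m)
    n = sum(sum(row) for row in m)
    diag = [m[i][i] for i in range(k)]
    row_tot = [sum(row) for row in m]
    col_tot = [sum(m[i][j] for i in range(k)) for j in range(k)]
    overall = sum(diag) / n
    producers = [diag[j] / col_tot[j] for j in range(k)]  # 1 - omission error
    users = [diag[i] / row_tot[i] for i in range(k)]      # 1 - commission error
    return overall, producers, users

# Hypothetical error matrix for three soil classes.
m = [[50, 5, 5],
     [10, 30, 0],
     [0, 10, 40]]
print(accuracy_report(m))  # overall accuracy 0.8
```

Merging similar classes into a simplified legend moves off-diagonal counts onto the diagonal. This is consistent with the per-class accuracies rising when the legend is simplified.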
10	Bayesian Logistic Regression Model for Siting Biomass-using Facilities Huang, Xia 01 December 2010 Key sources of oil for western markets are located in complex geopolitical environments that increase economic and social risk. The combination of economic, environmental, social and national security concerns in petroleum-based economies has created a renewed emphasis on alternative sources of energy, including biomass. The stability of sustainable biomass markets hinges on improved methods to predict and visualize business risk and cost to the supply chain. This thesis develops Bayesian logistic regression models, compared with classical maximum likelihood models, to quantify the significant factors that influence the siting of biomass-using facilities and to predict potential locations for three types of biomass-using facilities in the 13-state Southeastern United States. Group I comprises all biomass-using mills, biorefineries using agricultural residues, and wood-using bioenergy/biofuels plants. Group II comprises pulp and paper mills and biorefineries that use agricultural and wood residues. Group III comprises food processing mills and biorefineries that use agricultural and wood residues. The spatial resolution of this research is the 5-digit ZIP Code Tabulation Area (ZCTA), and there are 9,416 ZCTAs in the 13-state Southeastern study region. For both the classical and Bayesian approaches, the data were split into a training set and a separate validation (hold-out) set using a pseudo-random number-generating function in SAS® Enterprise Miner. Four predefined priors were constructed. Bayesian estimation assuming a Gaussian prior gives the highest correct classification rate for Group I (86.40%); Bayesian estimation assuming a non-informative uniform prior gives the highest rate for Group II (95.97%); and Bayesian estimation assuming a Gaussian prior gives the highest rate for Group III (92.67%).
Given the comparatively low sensitivity for Group II and Group III, a hybrid model that integrates classification trees and local Bayesian logistic regression was developed as part of this research to further improve predictive power. The hybrid model increases the sensitivity for Group II from 58.54% to 64.40%, and significantly improves both the specificity and the sensitivity for Group III, from 98.69% to 99.42% and from 39.35% to 46.45%, respectively. Twenty-five optimal locations for the biomass-using facility groupings at the 5-digit ZCTA resolution, based on the best-fitting Bayesian logistic regression model and the hybrid model, are predicted and plotted for the 13-state Southeastern study region. biorefineries agricultural residues site location prediction Bayesian logistic regression models Classification Trees Applied Statistics Statistical Methodology Statistical Models
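Suppose the coefficients are given a Gaussian N(0, σ²) prior. Then the posterior mode of a logistic regression maximises the log-likelihood minus ‖w‖²/(2σ²), which makes it an L2-penalised fit. The sketch below computes only that MAP point estimate by gradient descent, and then the sensitivity and specificity used to compare models. It is not the thesis's method: its Bayesian models, fitted in SAS, may have used full posterior sampling instead. The function names, the prior variance, the flat prior on the intercept and the data are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neg_log_posterior_grad(X, y, w, b, prior_var):
    """Gradient of -log posterior: sum_i (p_i - y_i) x_i + w / prior_var.

    The intercept b has a flat prior, so it gets no penalty term.
    """
    gw = [wj / prior_var for wj in w]  # from the Gaussian log-prior
    gb = 0.0
    for xi, yi in zip(X, y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
        for j in range(len(w)):
            gw[j] += err * xi[j]
        gb += err
    return gw, gb

def map_logistic(X, y, prior_var=1.0, lr=0.1, iters=5000):
    """MAP estimate under w ~ N(0, prior_var * I), by plain gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(iters):
        gw, gb = neg_log_posterior_grad(X, y, w, b, prior_var)
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical standardised site covariates; 1 = a facility is present.
X = [(0.5, 1.0), (1.5, 0.2), (2.0, 1.5), (1.0, 2.0),
     (-0.5, -1.0), (-1.5, 0.3), (-1.0, -2.0), (0.2, -0.5)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = map_logistic(X, y)
preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]
print(w, b, sensitivity_specificity(y, preds))
```

A smaller prior variance shrinks the weights more strongly toward zero. A very large one approaches the classical maximum likelihood fit.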
