Global ETD Search

311	Interpreting Random Forest Classification Models Using a Feature Contribution Method Palczewska, Anna Maria, Palczewski, J., Marchese-Robinson, R.M., Neagu, Daniel 18 February 2014 (has links) Model interpretation is one of the key aspects of the model evaluation process. The explanation of the relationship between model variables and outputs is relatively easy for statistical models, such as linear regressions, thanks to the availability of model parameters and their statistical significance. For “black box” models, such as random forest, this information is hidden inside the model structure. This work presents an approach for computing feature contributions for random forest classification models. It allows for the determination of the influence of each variable on the model prediction for an individual instance. By analysing feature contributions for a training dataset, the most significant variables can be determined, and their typical contributions towards predictions made for individual classes, i.e., class-specific feature contribution “patterns”, can be discovered. These patterns represent a standard behaviour of the model and allow for an additional assessment of the model reliability for new data. Interpretation of feature contributions for two UCI benchmark datasets shows the potential of the proposed methodology. The robustness of results is demonstrated through an extensive analysis of feature contributions calculated for a large number of generated random forest models. Random forest Classification Variable importance Feature contribution Cluster analysis
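The idea of per-instance feature contributions in entry 311 can be illustrated with a small sketch. It assumes one common formulation, which the abstract itself does not spell out: each node stores the fraction of training instances belonging to the class of interest, and the change in that fraction from a node to the child an instance is routed to is credited to the split feature. The nested-dict tree layout, the names `tree_contributions` and `forest_contributions`, and the toy tree are all hypothetical and are not taken from the paper.

```python
def tree_contributions(tree, instance):
    """Walk one tree and credit each split's change in class fraction to its feature."""
    contributions = {}
    node = tree
    while "feature" in node:  # internal nodes carry a split; leaves do not
        feature = node["feature"]
        goes_left = instance[feature] <= node["threshold"]
        child = node["left"] if goes_left else node["right"]
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    # root class fraction, per-feature contributions, class fraction at the leaf
    return tree["value"], contributions, node["value"]


def forest_contributions(forest, instance):
    """Average the root fractions and the contributions over all trees."""
    prior = 0.0
    totals = {}
    for tree in forest:
        root_value, contributions, _ = tree_contributions(tree, instance)
        prior += root_value / len(forest)
        for feature, delta in contributions.items():
            totals[feature] = totals.get(feature, 0.0) + delta / len(forest)
    return prior, totals


# Hypothetical toy tree: "value" is the assumed fraction of class-1 training
# instances that reached the node.
TOY_TREE = {
    "value": 0.5, "feature": "x1", "threshold": 2.0,
    "left": {"value": 0.2},
    "right": {
        "value": 0.8, "feature": "x2", "threshold": 1.0,
        "left": {"value": 0.6},
        "right": {"value": 1.0},
    },
}
```

Under this formulation the root fraction plus the sum of the contributions equals the class fraction at the leaf the instance reaches, so the prediction decomposes additively over the features.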
312	Algorithms for the degree-constrained minimum spanning tree and the hierarchical clustering problems using the nearest-neighbor techniques Mao, Li Jen 01 January 1999 (has links) No description available. Cluster analysis Parallel algorithms Computer Sciences Physical Sciences and Mathematics
313	Normal Mixture Models for Gene Cluster Identification in Two Dimensional Microarray Data Harvey, Eric Scott 01 January 2003 (has links) This dissertation focuses on methodology specific to microarray data analyses that organize the data in preliminary steps and proposes a cluster analysis method which improves the interpretability of the cluster results. Cluster analysis of microarray data allows samples with similar gene expression values to be discovered and may serve as a useful diagnostic tool. Since microarray data is inherently noisy, data preprocessing steps including smoothing and filtering are discussed. Comparing the results of different clustering methods is complicated by the arbitrariness of the cluster labels. Methods for re-labeling clusters to assess the agreement between the results of different clustering techniques are proposed. Microarray data involve large numbers of observations and generally present as arrays of light intensity values reflecting the degree of activity of the genes. These measurements are often two dimensional in nature since each is associated with an individual sample (cell line) and gene. The usual hierarchical clustering techniques do not easily adapt to this type of problem. These techniques allow only one dimension of the data to be clustered at a time and lose information due to the collapsing of the data in the opposite dimension. A novel clustering technique based on normal mixture distribution models is developed. This method clusters observations that arise from the same normal distribution and allows the data to be simultaneously clustered in two dimensions. The model is fitted using the Expectation/Maximization (EM) algorithm. For every cluster, the posterior probability that an observation belongs to that cluster is calculated. These probabilities allow the analyst to control the cluster assignments, including the use of overlapping clusters. 
A user-friendly program, 2-DCluster, was written to support these methods. This program was written for Microsoft Windows 2000 and XP systems and supports one and two dimensional clustering. The program and sample applications are available at http://etd.vcu.edu. An electronic copy of this dissertation is available at the same address. cluster analysis microarrays biostatistics two dimensional cluster analysis em algorithm mixture models Biostatistics Physical Sciences and Mathematics Statistics and Probability
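The EM fitting and posterior probabilities mentioned in entry 313 can be sketched for the simplest case. This is a one-dimensional normal mixture only, not the dissertation's two-dimensional method and not the 2-DCluster program; the function names, the starting values, and the variance floor `min_var` are assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(data, means, variances, weights):
    """E-step: posterior probability of each component for each observation."""
    rows = []
    for x in data:
        joint = [w * normal_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
        total = sum(joint)
        rows.append([p / total for p in joint])
    return rows


def fit_mixture(data, means, variances, weights, iterations=50, min_var=1e-6):
    """Alternate E- and M-steps for a fixed number of iterations."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iterations):
        resp = posteriors(data, means, variances, weights)
        for j in range(len(means)):
            # M-step: re-estimate each component from its weighted observations.
            # This sketch assumes every component keeps some posterior mass.
            mass = sum(row[j] for row in resp)
            weights[j] = mass / len(data)
            means[j] = sum(row[j] * x for row, x in zip(resp, data)) / mass
            spread = sum(row[j] * (x - means[j]) ** 2 for row, x in zip(resp, data))
            variances[j] = max(spread / mass, min_var)  # floor avoids a zero variance
    return means, variances, weights
```

Calling `posteriors` with the fitted parameters returns one row per observation. An analyst could assign each observation to the component with the largest probability, or to every component above a chosen threshold, which is one way overlapping clusters could be obtained.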
314	A novel framework for binning environmental genomic fragments Yang, Bin, 杨彬 January 2010 (has links) published_or_final_version / Computer Science / Master / Master of Philosophy Genomics - Data processing. Genomes - Data processing. Microbial ecology - Data processing. Cluster analysis - Data processing. Cluster analysis - Computer programs.
315	Hodnocení výsledků metod shlukové analýzy / Evaluation of Cluster Analysis Methods Löster, Tomáš January 2004 (has links) Cluster analysis includes a range of methods and practices that are used primarily for the classification of objects, and it plays an important role in many areas. Since the resulting assignment of objects to clusters may vary depending on the selected methods and their settings, it is appropriate to assess the results obtained. This paper proposes new coefficients for evaluating these results when objects are characterized by qualitative variables or by variables of different types. These coefficients can be used either to compare different methods (in terms of better outcomes) or to find the optimal number of clusters. All of them are based on measuring variability, which is also used for measuring the dissimilarity of objects and clusters. The newly proposed evaluation methods are applied to real data sets (of different sizes, with different numbers of variables, including variables of different types), and the behavior of the coefficients under different conditions is examined. For some of these data sets the classification of objects into clusters is known, and for others it is unknown. Based on the analysis carried out, the modified CHF coefficient can be considered the best coefficient for evaluating clustering results with variables of different types. A local maximum, according to which the clustering results are evaluated, almost always exists. The analysis showed that in most cases this value agrees with the known classification of objects into clusters. For the other coefficients, the existence of local extremes depends on the specific data set and is not guaranteed.
316	Automatic summarization of mouse gene information for microarray analysis by functional gene clustering and ranking of sentences in MEDLINE abstracts : a dissertation Yang, Jianji 06 1900 (has links) (PDF) Ph.D. / Medical Informatics and Clinical Epidemiology / Tools to automatically summarize gene information from the literature have the potential to help genomics researchers better interpret gene expression data and investigate biological pathways. Even though several useful human-curated databases of information about genes already exist, these have significant limitations. First, their construction requires intensive human labor. Second, curation of genes lags behind the rapid publication rate of new research and discoveries. Finally, most of the curated knowledge is limited to information on single genes. As such, most original and up-to-date knowledge on genes can only be found in the immense amount of unstructured, free-text biomedical literature. Genomic researchers frequently encounter the task of finding information on sets of differentially expressed genes from the results of common high-throughput technologies like microarray experiments. However, finding information on a set of genes by manually searching and scanning the literature is a time-consuming and daunting task for scientists. For example, PubMed, the first choice of literature research for biologists, usually returns hundreds of references for a search on a single gene in reverse chronological order. Therefore, a tool to summarize the available textual information on genes could be valuable for scientists. In this study, we adapted automatic summarization technologies to the biomedical domain to build a query-based, task-specific automatic summarizer of information on mouse genes studied in microarray experiments - mouse Gene Information Clustering and Summarization System (GICSS). 
GICSS first clusters a set of differentially expressed genes by Medical Subject Heading (MeSH), Gene Ontology (GO), and free text features into functionally similar groups; next it presents summaries for each gene as ranked sentences extracted from MEDLINE abstracts, with the ranking emphasizing the relation between genes, similarity to the function cluster the gene belongs to, and recency. GICSS is available as a web application with links to the PubMed (www.pubmed.gov) website for each extracted sentence. It integrates two related steps of the microarray data analysis process: functional gene clustering and gene information gathering. The information from the clustering step was used to construct the context for summarization. The evaluation of the system was conducted with scientists who were analyzing their real microarray datasets. The evaluation results showed that GICSS can provide meaningful clusters for real users in the genomic research area. In addition, the results indicated that presenting sentences from the abstract can provide more important information to the user than just showing the title in the default PubMed format. Both domain-specific and non-domain-specific terminologies contributed to the selection of informative sentences. Summarization may serve as a useful tool to help scientists access information at the time of microarray data analysis. Further research includes setting up the automatic update of MEDLINE records; extending and fine-tuning the feature parameters for sentence scoring using the available evaluation data; and expanding GICSS to incorporate textual information from other species. Finally, dissemination and integration of GICSS into the current workflow of the microarray analysis process will help to make GICSS a truly useful tool for the targeted users, biomedical genomics researchers.
317	Análise da situação financeira da Cooperativa Agroindustrial Lar em relação a 31 cooperativas agropecuárias do estado do Paraná: uma análise aplicando um modelo de previsão de insolvência / Analysis of the financial situation of the Lar Agro-Industrial Cooperative compared to 31 agricultural cooperatives from the state of Paraná: an analysis applying an insolvency prevision model Vieira, Daliana Carla 09 March 2007 (has links) The aim of this dissertation was to analyze the financial situation of the Lar Agro-Industrial Cooperative in relation to 31 cooperatives from the State of Paraná, using an insolvency prediction model, over the period 2000 to 2004. To this end, selected financial indexes of the cooperatives studied were examined against the standard indexes. The Gimenes and Opazo (2001) insolvency prediction model was used to determine whether each cooperative was financially solvent or insolvent. Finally, the similarity among the group of agricultural cooperatives was analyzed through multivariate Cluster Analysis. The results lead to the general conclusion that, in the period analyzed, the Lar Agro-Industrial Cooperative showed a highly satisfactory financial performance, a characteristic particular to this cooperative, since no high similarity between the Lar cooperative and the other cooperatives was found. Agricultural cooperatives Insolvency prevision Standard indexes Cluster analysis
318	Nástroj pro shlukovou analýzu / Cluster Analysis Tool Hezoučký, Ladislav January 2010 (has links) This master's thesis deals with cluster analysis of data. It explains the basic concepts and methods of this domain. The result of the thesis is a cluster analysis tool that implements the K-Medoids and DBSCAN methods. Results obtained on real data are compared with those of the programs Rapid Miner and SAS Enterprise Miner.
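For readers unfamiliar with DBSCAN, one of the two methods named in entry 318, the following is a minimal textbook-style sketch. It is not the thesis's tool. The function names and the brute-force neighbor search are assumptions chosen for brevity, and the label `-1` for noise is a convention of this sketch.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks points in no cluster."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels
```

The brute-force `region_query` makes the sketch quadratic in the number of points. A practical tool would typically use a spatial index instead.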
319	Socio-cultural impacts of museums for their local communities : the case of the Royal Albert Memorial Museum, Exeter Hutchison, Fiona Catherine January 2013 (has links) In the English museums sector, an impetus for impact assessment stems from an internal ethos towards producing positive impacts for the public. Furthermore, as institutions largely dependent on national and local government funding, museums have increasingly been called to demonstrate their impacts to policy makers. Economic impact and valuation procedures are employed to help meet these demands. However, consideration of non-economic impacts has not kept pace. Reasons include the contested priorities in the sector, a fluctuating policy landscape and too exclusive a focus on theoretical debates rather than empirical research. Indeed, a great deal of attention and time has already been allocated to impact assessment with little accumulation of evidence at a museum-specific or national level. Accordingly, this research set out to reveal a detailed understanding of socio-cultural impacts of museums for their local communities. A thorough meta-synthesis of nineteen academic and non-academic sources revealed the limitations of previous studies. These limitations relate to sampling, method choice, sophistication of analysis and transparency in reporting. Often, only potential impacts were identified. The Royal Albert Memorial Museum (RAMM), in the southwest city of Exeter, offered a suitable research site for this large-scale study. Household surveys administered by Drop and Collect ensured the elicitation of views from residents across the city. A range of statistical analysis techniques was applied to cross-sectional samples (n=435, n=384). The main contribution of this research is to demonstrate a replicable approach to eliciting views from the public regarding the impacts of their local museum. 
Future evaluation can follow this model, which neither focuses on economic impacts nor arrives at a monetised valuation. Cluster Analysis proves a preferable way of grouping the public compared with traditional segmentations based on socio-demographic or behavioural characteristics. Furthermore, socio-cultural impacts are effectively assessed, monitored and prioritised through Gap Analysis. Factor Analysis reveals that the latent constructs of Personal-fulfilment, Objects and their Surrounding Narratives, Self-actualisation, Learning and Networked Leisure drive these impacts. Therefore, this research meets the museum management challenge of finding a suitable design for assessment of impacts in relation to different communities. 069
320	Spatial clustering and the development of small businesses in Khayelitsha Mans, Gerbrand 04 1900 (has links) Thesis (MBA)--Stellenbosch University, 2015. / ENGLISH ABSTRACT: Khayelitsha was developed as a dormitory town on the outskirts of Cape Town in the late 1980s with little intention by the government of the time to actively stimulate local economic development within the area. Since 1994 one of the biggest South African challenges has been to ensure that dormitory townships, like Khayelitsha, are developed appropriately to create jobs and to allow for the evolution of quality living environments. Many types of government investment initiatives came to life in the past 20 years, complemented by initiatives to draw in private sector investment in these areas. Nevertheless, the economic development discrepancy between Khayelitsha and other areas in Cape Town remains stark. This study shows that to date development initiatives have not focused enough on the stimulation and development of local entrepreneurial enterprises. Clustering of these enterprises occurs around key areas, like shopping centres, which act as a catalytic factor for other support initiatives aimed at SMME development. The study identifies key areas of local small and micro-business clustering in Khayelitsha and evaluates the underlying growth factors. It then presents key suggestions regarding policy interventions to support local entrepreneurial development. These suggestions are two-pronged. Spatial interventions focus on recommendations regarding development nodes, activity routes and alternative zoning practices. General business support initiatives relate to access to finance, education and training, mentoring, business incubators and business networks. In general the study highlights the importance of public-private partnerships in small business support. Small business development Cluster analysis Entrepreneurship -- Khayelitsha Small business -- Growth -- Khayelitsha UCTD
