
Random forest och glesa datarepresentationer / Random forest using sparse data structures

In silico experimentation is the process of using computational and statistical models to predict medicinal properties in chemicals. As a means of reducing lab work and increasing the success rate, this process has become an important part of modern drug development. There are various ways of representing molecules; the problem that motivated this paper derives from collecting substructures of the chemical into what is known as fractional representations. Assembling large sets of molecules represented in this way results in sparse data, where a large portion of the set is null values. Storing these null values consumes an excessive amount of computer memory, which limits the size of the data sets that can be used when constructing predictive models. In this study, we suggest a set of criteria for evaluating random forest implementations to be used for in silico predictive modeling on sparse data sets, with regard to computer memory usage, model construction time and predictive accuracy. A novel random forest system was implemented to meet the suggested criteria, and experiments were made to compare our implementation to existing machine learning algorithms to establish our implementation's correctness. Experimental results show that our random forest implementation can create accurate prediction models on sparse data sets, with lower memory usage overhead than implementations using a common matrix representation, and in less time than the existing random forest implementations it was evaluated against. We highlight design choices made to accommodate sparse data structures and data sets in the random forest ensemble technique, and therein present potential improvements to feature selection in sparse data sets. / Program: Systemarkitekturutbildningen
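As an illustration of the memory argument in the abstract, the following minimal Python sketch contrasts a dense matrix with a per-row mapping that stores only non-null cells. It is not the thesis implementation: the dictionary-per-row layout and the names `to_sparse`, `feature_value` and `stored_cells` are assumptions made for this example only.

```python
def to_sparse(dense_rows):
    """Keep only the non-zero cells of each row as {column_index: value}."""
    return [
        {col: value for col, value in enumerate(row) if value != 0}
        for row in dense_rows
    ]


def feature_value(sparse_row, col):
    """Look up a feature; columns that were not stored are treated as 0."""
    return sparse_row.get(col, 0)


def stored_cells(sparse_rows):
    """Count the cells that are actually held in memory."""
    return sum(len(row) for row in sparse_rows)


# Hypothetical toy data: 3 molecules x 4 substructure features, mostly zero.
dense = [
    [0, 0, 1, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
]
sparse = to_sparse(dense)

dense_cells = sum(len(row) for row in dense)
print(dense_cells, stored_cells(sparse))
```

A tree learner built on such a structure can still read any feature through `feature_value`, while the storage cost grows with the number of non-null cells rather than with the full matrix size.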

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hb-16672
Date January 2012
Creators Linusson, Henrik; Rudenwall, Robin; Olausson, Andreas
Publisher Högskolan i Borås, Institutionen Handels- och IT-högskolan; University of Borås/School of Business and IT
Source Sets DiVA Archive at Uppsala University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
Relation Kandidatuppsats; 2012KSAI01
