
In silico modeling for uncertain biochemical data

<p>Analyzing and modeling data is a well-established research area, and a vast variety of methods have been developed over the last decades. Most of these methods assume fixed positions of data points; only recently has uncertainty in data attracted attention as a potentially useful source of information. To provide deeper insight into this subject, this thesis addresses the following essential question: Can information on the uncertainty of feature values be exploited to improve in silico modeling? To this end, a state-of-the-art random forest algorithm is developed using Matlab®. In addition, three techniques for handling uncertain numeric features are presented and incorporated into different modified versions of random forests. To test the hypothesis, six real-world data sets were provided by AstraZeneca. The data describe biochemical features of chemical compounds, including the results of an Ames test, a widely used technique for determining the mutagenicity of chemical substances. Each of the data sets contains a single uncertain numeric feature, represented as an expected value and an error estimate. The modified algorithms are then applied to the six data sets to obtain classifiers able to predict the outcome of an Ames test. The hypothesis is tested using a paired t-test, and the results reveal that information on uncertainty can indeed improve the performance of in silico models.</p>
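
The paired t-test mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's Matlab code: the function name `paired_t_statistic` and the accuracy values are hypothetical, chosen only to show how per-data-set scores of a baseline and a modified classifier would be compared pairwise.

```python
import math
import statistics


def paired_t_statistic(baseline, modified):
    """Return (t, degrees_of_freedom) for two paired samples.

    Each position in `baseline` and `modified` is assumed to hold the
    score of the two classifiers on the same data set.
    """
    if len(baseline) != len(modified):
        raise ValueError("paired samples must have equal length")
    diffs = [m - b for b, m in zip(baseline, modified)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies on six data sets (NOT results from the thesis).
baseline_acc = [0.70, 0.72, 0.68, 0.75, 0.71, 0.69]
modified_acc = [0.72, 0.73, 0.70, 0.76, 0.74, 0.70]

t_stat, dof = paired_t_statistic(baseline_acc, modified_acc)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The statistic would then be compared against Student's t distribution with n - 1 degrees of freedom to obtain a p-value. Pairing by data set removes the variation between data sets from the comparison, which matters when only six data sets are available.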

Identifier oai:union.ndltd.org:UPSALLA/oai:DiVA.org:his-3099
Date January 2009
Creators Gusenleitner, Daniel
Publisher University of Skövde, School of Life Sciences
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, text
