The development, implementation, and performance evaluation of new techniques for the
location of hot spots in proteins and exons in DNA using digital filters are presented.
The application of bandpass notch (BPN) digital filters for locating hot spots in proteins
is first investigated. A technique is proposed for designing the appropriate BPN filter for a
specific protein sequence in which the area under the amplitude response is minimized to
achieve maximum selectivity for a chosen stability margin. The minimization is performed
using the golden-section search. A tuning technique is also proposed for improving the
accuracy of the BPN filter. The tuning is carried out using a least-squares polynomial
model. Several example protein sequences are used to illustrate these techniques.
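
As a rough illustration of the one-dimensional minimization mentioned above, the following is a minimal Python sketch of a golden-section search. The function name `golden_section_minimize` and the quadratic test objective are assumptions made for illustration only. In the thesis the objective is the area under the BPN amplitude response, which is not reproduced here.

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    c = hi - inv_phi * (hi - lo)            # lower interior point
    d = lo + inv_phi * (hi - lo)            # upper interior point
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; the old c becomes the new upper point.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; the old d becomes the new lower point.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0

# Hypothetical stand-in objective with its minimum at 0.3.
best = golden_section_minimize(lambda t: (t - 0.3) ** 2, 0.0, 1.0)
```

Each iteration reuses one of the two previous function values, so only one new evaluation of the objective is needed per step.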
BPN filters are then employed for locating exons in DNA. An additional step of lowpass
filtering is introduced in order to detect the strength of the bandpass filtered signal as a
function of nucleotide location. For the character-to-numerical mapping, the application
of the electron-ion interaction potentials (EIIPs) of the nucleotides as well as their binary
sequences is investigated.
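
The exon-location pipeline described above (numerical mapping, bandpass filtering, then lowpass filtering of the output strength) can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis design: the filter is a generic second-order resonator centred at the period-3 frequency 2π/3, the pole radius `r` and the moving-average width are arbitrary illustrative choices, and the function names are hypothetical. The EIIP values are the commonly published ones for A, C, G, and T.

```python
import math

# Commonly published EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def bandpass_period3(x, r=0.99):
    """Generic second-order resonator with poles at r*exp(+/- j*2*pi/3).

    Illustrative stand-in for a BPN filter; r is an assumed pole radius.
    """
    w0 = 2.0 * math.pi / 3.0
    a1 = -2.0 * r * math.cos(w0)  # denominator coefficient of z^-1
    a2 = r * r                    # denominator coefficient of z^-2
    gain = (1.0 - r * r) / 2.0    # gives roughly unit gain at w0
    y = []
    for n, xn in enumerate(x):
        xn2 = x[n - 2] if n >= 2 else 0.0
        yn1 = y[n - 1] if n >= 1 else 0.0
        yn2 = y[n - 2] if n >= 2 else 0.0
        # Zeros at z = 1 and z = -1 reject the DC and half-rate components.
        y.append(gain * (xn - xn2) - a1 * yn1 - a2 * yn2)
    return y

def moving_average(x, width):
    """Causal moving average, used here as a simple lowpass stage."""
    out, acc = [], 0.0
    for n, xn in enumerate(x):
        acc += xn
        if n >= width:
            acc -= x[n - width]
        out.append(acc / width)
    return out

def period3_power(dna, r=0.99, width=30):
    """Map a DNA string to EIIP values and return smoothed period-3 power."""
    x = [EIIP[base] for base in dna.upper()]
    y = bandpass_period3(x, r)
    return moving_average([v * v for v in y], width)

# Synthetic example: a period-3 repeat flanked by period-4 repeats.
seq = "ACGT" * 50 + "ATG" * 60 + "ACGT" * 50
power = period3_power(seq)
```

In this synthetic sequence the smoothed power rises inside the `ATG` repeat and stays low in the flanking regions, which is the behaviour a threshold on the output would exploit.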
The performance of the techniques is then evaluated using metrics such as sensitivity,
specificity, accuracy, precision, and computational efficiency. These metrics are used in
conjunction with the so-called receiver operating characteristic (ROC) technique to establish
a reliable framework for the comparisons. For exon location, a technique based on the
short-time discrete Fourier transform (STDFT) reported in the literature is also included in
the comparison. The effect of using different window functions on the prediction accuracy
of the technique is explored. Using a set of examples, it is shown that BPN filters predict
short exons with better accuracy than the STDFT. The test dataset comprises 66 protein
sequences and 160 DNA sequences obtained from the Protein Data Bank and the HMR195
database, respectively. Results show that among the techniques considered, BPN filters
perform best for the location of both protein hot spots and DNA exons in terms of accuracy
and computational efficiency. User-friendly MATLAB implementations of the techniques
incorporating graphical interfaces are also described.
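
The evaluation metrics named above can be computed from per-position scores and ground-truth labels as in the following minimal Python sketch. The function names and the example scores are hypothetical, and the thesis may define its ROC-based comparison differently. Here a position is predicted positive when its score meets the threshold, and the area under the ROC curve is obtained with the trapezoidal rule.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted and not is_positive:
            fp += 1
        elif not predicted and is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision, and accuracy (0.0 if undefined)."""
    def ratio(num, den):
        return num / den if den else 0.0
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }

def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs over all thresholds."""
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        m = metrics(*confusion_counts(scores, labels, t))
        points.append((1.0 - m["specificity"], m["sensitivity"]))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area

# Hypothetical scores and labels for four positions.
example = metrics(*confusion_counts([0.9, 0.6, 0.4, 0.2],
                                    [True, False, True, False], 0.5))
```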
Optimized numerical mapping schemes are proposed for exon location using both EIIP
values and binary sequences. Characteristic numerical values are obtained for the four nucleotides
using a training procedure in which the prediction accuracy is maximized using
a quasi-Newton algorithm based on the Broyden-Fletcher-Goldfarb-Shanno updating formula.
A training set of 80 DNA sequences is chosen from the HMR195 database and the
objective function is formulated using the ROC technique. The procedure is initialized using
EIIP values. Unbiased testing of the optimized values is carried out using a test set
that has no overlap with the training set. Simulation results show that the optimized values
yield more accurate exon locations than those obtained using the actual EIIP values. In
addition, they perform significantly better than a set of existing optimized complex values.
By employing a similar strategy to optimize the weights of the binary sequences, it
is shown that, in practice, only three out of four binary sequences are necessary to obtain
accurate estimates of exon locations. Consequently, a computational saving of 25% can be
achieved, which is substantial considering that DNA sequences encountered in practice are
very long.
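
The 25% figure can be illustrated with a minimal Python sketch, assuming that each binary (indicator) sequence is filtered separately and that the weighted output powers are summed. That structure, the function names, and the weights shown are assumptions for illustration and are not the optimized values from the thesis. A sequence whose weight is zero is never filtered, so three filter passes replace four.

```python
def indicator_sequences(dna):
    """One binary sequence per nucleotide: 1 where it occurs, 0 elsewhere."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}

def weighted_power(dna, weights, bandpass):
    """Sum weighted output powers, skipping sequences whose weight is zero.

    `bandpass` is any caller-supplied function mapping a list to a list.
    Returns the combined signal and the number of filter passes performed.
    """
    seqs = indicator_sequences(dna)
    total = [0.0] * len(dna)
    passes = 0
    for base, w in weights.items():
        if w == 0.0:
            continue  # this binary sequence is never filtered
        passes += 1
        y = bandpass(seqs[base])
        total = [t + w * v * v for t, v in zip(total, y)]
    return total, passes

# Hypothetical weights with one sequence dropped; identity stands in for the filter.
weights = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 0.0}
total, passes = weighted_power("ACGTAC", weights, lambda s: list(s))
saving = 1.0 - passes / 4.0  # fraction of filter passes avoided
```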
Identifier | oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/3324 |
Date | 30 May 2011 |
Creators | Ramachandran, Parameswaran |
Contributors | Antoniou, Andreas, Lu, Wu-Sheng |
Source Sets | University of Victoria |
Language | English |
Detected Language | English |
Type | Thesis |
Rights | Available to the World Wide Web |