Global ETD Search

201	Sparse Modeling in Classification, Compression and Detection
Chen, Jihong, 12 July 2004
The principal focus of this thesis is the exploration of sparse structures in a variety of statistical modelling problems. While more comprehensive models can solve a larger number of problems, their computation may be ill-posed in most practical instances because informative features in the data are sparse. If this sparse structure can be exploited, the models can often be solved very efficiently. The thesis comprises four projects. First, feature sparsity is incorporated to improve the performance of support vector machines when many noise features are present. The second project is an empirical study of how to construct an optimal cascade structure. The third project designs a progressive, rate-distortion-optimized shape coder by combining the zero-tree algorithm with the beamlet structure. Finally, the longest-run statistic is applied to detect a filamentary structure in a two-dimensional rectangular region. All four projects share a common idea: extracting an efficient summary from a large amount of data. The main contributions of this work are the development and implementation of novel techniques for efficiently solving several difficult problems that arise in statistical signal and image processing.
Keywords: Longest run; Detection; Coding; Beamlet; Cascade; Feature selection; Support vector machine; Sparsity
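The longest-run statistic mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the data along a candidate path have already been thresholded into a 0/1 sequence and that, under the noise-only hypothesis, the bits are independent Bernoulli(p) trials. The function names are hypothetical, and this is not the beamlet-based detector of the thesis: it only computes the statistic and its exact tail probability by dynamic programming over the current run length.

```python
def longest_run(bits):
    """Length of the longest run of consecutive 1s in a 0/1 sequence."""
    best = current = 0
    for b in bits:
        current = current + 1 if b else 0
        best = max(best, current)
    return best


def prob_run_at_least(n, p, r):
    """P(longest run of 1s >= r) among n independent Bernoulli(p) trials (assumed noise model)."""
    if r <= 0:
        return 1.0
    # state[k]: probability that the current run has length k and that
    # no run of length r has appeared yet.
    state = [1.0] + [0.0] * (r - 1)
    for _ in range(n):
        nxt = [0.0] * r
        for k, mass in enumerate(state):
            nxt[0] += mass * (1 - p)      # a 0 resets the run
            if k + 1 < r:
                nxt[k + 1] += mass * p    # a 1 extends the run
            # when k + 1 == r the run reaches length r, so this mass is dropped
        state = nxt
    return 1.0 - sum(state)


bits = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
r = longest_run(bits)
print(r, prob_run_at_least(len(bits), 0.5, r))
```

A detector built on this idea would flag a path when the tail probability of its observed longest run falls below a chosen significance level; a small value means such a run is unlikely to be produced by noise alone.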
202	A Novelty Detection Approach to Seizure Analysis from Intracranial EEG
Gardner, Andrew Britton, 12 April 2004. 146 pages. Directed by Dr. George Vachtsevanos and Dr. Brian Litt.
A framework for support vector machine classification of time-series events is proposed and applied to analyze physiological signals recorded from epileptic patients. In contrast to previous work, this research formulates seizure analysis as a novelty detection problem, which allows seizure detection and prediction to be treated uniformly and can accommodate multichannel and/or multimodal measurements. Theoretical properties of the support vector machine algorithm employed provide a straightforward means of controlling the detector's false alarm rate. The resulting novelty detection system was evaluated both offline and online on a corpus of 1077 hours of intracranial electroencephalogram (IEEG) recordings from 12 patients diagnosed with medically resistant temporal lobe epilepsy who were undergoing evaluation for epilepsy surgery. These patients collectively had 118 seizures during the recording period. The performance of the novelty detection framework was assessed with an emphasis on four key metrics: (1) sensitivity (probability of correct detection), (2) mean detection latency, (3) early-detection fraction (prediction or detection of a seizure prior to electrographic onset), and (4) false positive rate. Both the offline and online novelty detectors achieved state-of-the-art seizure detection performance. In particular, the online detector achieved 97.85% sensitivity, a mean latency of -13.3 seconds (that is, detection on average before electrographic onset), and a 40% early-detection fraction, at an average of 1.74 false positive predictions per hour (Fph). These results demonstrate that a novelty detection approach is not only feasible for seizure analysis but also improves upon the state of the art as an effective, robust technique.
Additionally, an extension of the basic novelty detection framework proved to be a simple, effective tool for examining the spread of seizure onsets, which may be useful for automatically identifying seizure focus channels in patients with focal epilepsies. It is anticipated that this research will aid in localizing seizure onsets and provide more efficient algorithms for use in a real device.
Keywords: 1-class SVM; Novelty detection; Seizure analysis; Seizure detection; Support vector machine
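The four evaluation metrics listed in this abstract can be computed from per-seizure detection times with a short Python sketch. The `SeizureEvent` record and `detector_metrics` function are hypothetical names; latency is assumed to be detection time minus electrographic onset (so negative values mean early detection), and the early-detection fraction is assumed to be taken over detected seizures, a denominator the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SeizureEvent:
    onset_s: float                # electrographic onset time, in seconds
    detected_s: Optional[float]   # first alarm attributed to this seizure, or None if missed


def detector_metrics(events: List[SeizureEvent], false_alarms: int, hours: float) -> dict:
    """Sensitivity, mean latency, early-detection fraction and false positives per hour."""
    detected = [e for e in events if e.detected_s is not None]
    latencies = [e.detected_s - e.onset_s for e in detected]  # negative = before onset
    n_det = len(latencies)
    return {
        "sensitivity": len(detected) / len(events),
        "mean_latency_s": sum(latencies) / n_det if n_det else float("nan"),
        # Assumption: fraction of *detected* seizures flagged before onset.
        "early_fraction": sum(1 for lat in latencies if lat < 0) / n_det if n_det else float("nan"),
        "fph": false_alarms / hours,
    }


events = [SeizureEvent(100.0, 90.0), SeizureEvent(500.0, 510.0),
          SeizureEvent(900.0, None), SeizureEvent(1200.0, 1180.0)]
print(detector_metrics(events, false_alarms=3, hours=2.0))
```

The toy events are invented for illustration and bear no relation to the thesis's 118 recorded seizures.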
203	Search and Analysis of the Sequence Space of a Protein Using Computational Tools
Dubey, Anshul, 25 August 2006
A new approach to Directed Evolution that utilizes several machine learning algorithms is proposed. Directed Evolution improves a protein for catalytic purposes by introducing random mutations into its sequence to create variants. Through these mutations, Directed Evolution explores the sequence space, defined as all possible sequences for a given number of amino acids. Each variant sequence is assigned to one of two classes, positive or negative, according to its activity or stability. By applying machine learning algorithms for feature selection to the sequences of these variants, the attributes (amino acids) in the sequence that are important for classifying a variant as positive or negative can be identified. Support Vector Machines (SVMs) were used to identify the important individual amino acids of a protein that must be preserved to maintain its activity. The results for beta-lactamase show that such residues can be identified with high accuracy using a small number of variant sequences. Another class of machine learning problems, Boolean Learning, was used to extend this approach to identifying interactions between different amino acids in a protein's sequence from the variant sequences. Simulations showed that such interactions can be identified for any protein with a reasonable number of variant sequences. For experimental verification of this approach, two fluorescent proteins, mRFP and DsRed, were used to generate variants, which were screened for fluorescence. Using Boolean Learning, an interacting pair that is important for fluorescence was identified. Experiments and simulations also showed that knowing such pairs can increase the fraction of active variants in the library.
A Boolean Learning algorithm that can learn Boolean functions from data in the presence of classification noise was also developed for this application.
Keywords: OCAT; Interacting residues of a protein; Important residues of a protein; Boolean learning; Support vector machines; Machine learning; Directed evolution
204	Hybrid Data Mining and MSVM for Short Term Load Forecasting
Yang, Ren-fu, 21 June 2010
The accuracy of load forecasts significantly affects power companies' ability to carry out power development plans, reduce operating costs, and supply reliable power to customers. Short-term load forecasting predicts load demand for periods of one hour or less. This study presents a new approach to load forecasting. A Support Vector Machine (SVM) was used for the initial load estimation, and Particle Swarm Optimization (PSO) was then adopted to search for optimal SVM parameters. In load forecasting, the amount of training data is the most important factor affecting computation time: using more data for model training should yield better forecasts, but it requires more computing time and is less efficient. Data mining can reduce both the data requirement and the computing time. The proposed Modified Support Vector Machine (MSVM) approach is shown to provide more accurate load forecasts.
Keywords: Data Mining; Short Term Load Forecasting; Particle Swarm Optimization; Support Vector Machine
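The PSO parameter search described in this abstract can be sketched as a generic particle swarm minimizer in plain Python. The inertia and acceleration constants are common textbook choices, not values from the thesis, and `surrogate_cv_error` is a hypothetical stand-in for the cross-validated SVM forecast error as a function of (log10 C, log10 gamma); in practice each evaluation would train and validate an SVM.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm's best position
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def surrogate_cv_error(params):
    # Hypothetical stand-in for SVM validation error; replace with a real evaluation.
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, err = pso_minimize(surrogate_cv_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best, err)
```

Searching over log-scaled parameters is a common design choice because SVM performance tends to vary over orders of magnitude in C and gamma.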
205	Prediction for the Essential Protein with the Support Vector Machine
Yang, Zih-Jie, 06 September 2011
Essential proteins profoundly affect cellular life, but they are hard to identify. Protein-protein interaction is one way to reveal whether a protein is essential. Many researchers use feature sets composed of topological properties of protein-protein interaction networks to predict essential proteins. However, the functionality of a protein is also a clue to its essentiality. In this thesis, we build SVM models for predicting essential proteins with a feature set that contains sequence properties that can influence protein function, topology properties, and protein properties. In our experiments, we use the Scere20070107 dataset, which contains 4873 proteins and 17166 interactions, downloaded from the DIP database. The ratio of essential to nonessential proteins is nearly 1:4, so the dataset is imbalanced. On the imbalanced dataset, the best F-measure, MCC, AIC, and BIC values of our models are 0.5197, 0.4671, 0.2428, and 0.2543, respectively. We also build a balanced dataset with a 1:1 ratio, on which the best F-measure, MCC, AIC, and BIC values of our models are 0.7742, 0.5484, 0.3603, and 0.3828, respectively. Our results are superior to all previous results on various measures.
Keywords: bioinformatics; essential protein; protein-protein interaction; support vector machine; feature set
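F-measure and MCC, two of the measures reported in this abstract, follow standard confusion-matrix formulas; a minimal Python sketch is below. MCC is often preferred on imbalanced data such as the 1:4 essential/nonessential split because it uses all four cells of the matrix. Returning 0 when a denominator vanishes is a common convention assumed here. The AIC and BIC values are not reproduced, since the abstract does not say how they were computed, and the counts in the example are invented.

```python
import math


def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0 when any marginal total is zero (convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(f_measure(40, 10, 20), mcc(40, 10, 20, 130))
```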
206	ICA-clustered Support Vector Regressions in Time Series Stock Price Forecasting
Chen, Tse-Cheng, 29 August 2012
Financial time-series forecasting has long been studied because of its importance for making informed investment decisions. This kind of problem, however, is intrinsically challenging because of the dynamic nature of the data. Most past research focuses on artificial neural network (ANN)-based approaches, which have been criticized for weak explanatory power and limited generalization ability. This research therefore proposes a hybrid approach to stock price forecasting. Independent component analysis (ICA) is employed to reveal the latent structure of the observed time series and to remove noise and redundancy from that structure, which in turn assists cluster analysis. Support vector regression (SVR) models are then applied to enhance generalization ability, with a separate model built from the time-series data of the companies in each cluster. Two experiments are conducted. The results show that SVR achieves robust accuracy. More importantly, SVR models built on ICA-based clustered data perform better than a single SVR model built on all the data. The proposed approach thus enhances the generalization ability of the forecasting models, which supports the feasibility of its application.
Keywords: support vector regression; cluster analysis; independent component analysis; time-series forecasting; financial time-series
207	Forward-Selection-Based Feature Selection for Genre Analysis and Recognition of Popular Music
Chen, Wei-Yu, 09 September 2012
In this thesis, a genre recognition approach for Japanese popular music using an SVM (support vector machine) with forward feature selection is proposed. First, various common acoustic features, including sub-bands, energy, rhythm, tempo, and formants, are extracted from the digital signals of popular songs. The proposed forward feature selection technique then selects the set of features most appropriate for genre identification. Experiments on a database of 296 Japanese popular songs demonstrate that the proposed algorithm achieves a recognition accuracy of approximately 78.81%, and that the accuracy remains stable as the number of test songs increases.
Keywords: RBF (radial basis function); SVM (support vector machine); forward selection; Genre recognition; feature selection
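Forward feature selection, as named in this abstract, is commonly implemented as a greedy wrapper; the Python sketch below shows that generic form, not necessarily the thesis's exact procedure. `score_fn` is a hypothetical callback that would, in practice, return the cross-validated accuracy of an RBF-SVM trained on the given feature subset. The additive `toy_accuracy` and its weights are invented purely so the example runs.

```python
def forward_select(features, score_fn, max_features=None):
    """Greedy forward selection: add the single best feature until the score stops improving."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every one-feature extension of the current subset.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        score, best_f = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break  # no candidate improves the score
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = score
    return selected, best_score


# Hypothetical per-feature contributions; a real score_fn would train and validate a classifier.
TOY_WEIGHTS = {"tempo": 0.30, "rhythm": 0.25, "energy": 0.20,
               "sub-bands": 0.10, "formants": -0.05}


def toy_accuracy(subset):
    return sum(TOY_WEIGHTS[f] for f in subset)


selected, score = forward_select(list(TOY_WEIGHTS), toy_accuracy)
print(selected, score)
```

Each round costs one model evaluation per remaining feature, which is why forward selection is usually stopped as soon as no addition improves the score.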
208	Software and Hardware Designs of a Vehicle Detection System Based on Single Camera Image Sequence
Yeh, Kuan-Fu, 10 September 2012
In this thesis, we present a vehicle detection and tracking system based on image processing and pattern recognition of single-camera image sequences. Both software design and hardware implementation are considered. In the hypothesis generation (HG) and hypothesis verification (HV) steps, we use shadow detection combined with the proposed constrained vehicle width/distance ratio to eliminate unreasonable hypotheses. Furthermore, we use an SVM classifier, a popular machine learning technique, to verify the generated hypotheses more precisely. In the vehicle tracking step, we use a limited tracking duration and a periodic vehicle detection mechanism. These tracking methods spare our driver-assistance system from repeatedly executing complex vehicle detection operations and thus increase system performance without sacrificing too much when a wrong object is tracked. Based on profiling of the software execution time, we implement the most critical part, the intensity-conversion and edge-detection preprocessing, in hardware. The complete software/hardware embedded system is realized on an FPGA prototype board, so that the whole system can achieve real-time processing without excessive hardware cost.
Keywords: Machine Learning; Vehicle Detection; Pattern Recognition; Image Processing; Driving Assistance System; Support Vector Machine (SVM)
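The preprocessing stage that this abstract moves into hardware, intensity conversion followed by edge detection, can be sketched in plain Python. The abstract does not name the conversion weights or the edge operator, so ITU-R BT.601 luma weights and a Sobel operator with the |Gx| + |Gy| magnitude approximation (cheaper than a square root, which suits hardware) are assumptions here. The example frame is a synthetic vertical edge.

```python
def to_intensity(rgb):
    """RGB rows of (r, g, b) tuples -> intensity rows, using ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def sobel_magnitude(img):
    """Approximate Sobel gradient magnitude |Gx| + |Gy|; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


black, white = (0, 0, 0), (255, 255, 255)
frame = [[black, black, white, white] for _ in range(3)]  # synthetic vertical edge
edges = sobel_magnitude(to_intensity(frame))
print(edges[1])
```

Both stages are fixed per-pixel arithmetic over a small neighbourhood, which is what makes them natural candidates for an FPGA pipeline once profiling shows they dominate execution time.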
209	The Disulfide Connectivity Prediction with Support Vector Machine and Behavior Knowledge Space
Chen, Hong-Yu, 12 September 2012
A disulfide bond in a protein is a single covalent bond formed by the oxidation of two cysteines. It plays an important role in protein folding and structural stability, and may regulate protein functions. Predicting disulfide connectivity is difficult because the number of possible connectivity patterns grows rapidly with the number of cysteines. We discover several rules that discriminate among the patterns with high accuracy across many methods. We implement multiple SVM methods and use the behavior knowledge space (BKS) method to fuse these classifiers. For comparison with previous work, we apply this hybrid method to the SP39 dataset with 4-fold cross-validation. It raises the accuracy to 71.5%, a significant improvement over the 65.9% of the best previous work.
Keywords: behavior knowledge space; support vector machine; connectivity pattern; disulfide bond; cysteine
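Two points in this abstract can be made concrete with a small Python sketch. First, if all 2n cysteines in a chain are paired into n bonds, the number of possible connectivity patterns is the double factorial (2n - 1)!! = 1 · 3 · 5 ··· (2n - 1), which explains the rapid growth. Second, behavior knowledge space (BKS) fusion keeps a lookup table indexed by the tuple of base-classifier decisions and predicts the true label seen most often for that tuple in training. The class name and the majority-vote fallback for unseen tuples are assumptions, not details from the thesis.

```python
from collections import Counter, defaultdict


def num_connectivity_patterns(n_bonds):
    """(2n - 1)!!: ways to pair 2n cysteines into n disulfide bonds."""
    count = 1
    for k in range(1, 2 * n_bonds, 2):
        count *= k
    return count


class BKSFusion:
    """Behavior knowledge space: map each tuple of classifier outputs to the most frequent true label."""

    def __init__(self):
        self.table = defaultdict(Counter)

    def fit(self, decisions, labels):
        for combo, y in zip(decisions, labels):
            self.table[tuple(combo)][y] += 1
        return self

    def predict(self, combo):
        cell = self.table.get(tuple(combo))  # .get does not create empty cells
        if cell:
            return cell.most_common(1)[0][0]
        # Assumed fallback for combinations never seen in training: majority vote.
        return Counter(combo).most_common(1)[0][0]


print([num_connectivity_patterns(n) for n in range(1, 6)])
# Three hypothetical base classifiers, each voting 0 or 1.
bks = BKSFusion().fit([(0, 1, 1), (0, 1, 1), (0, 1, 1), (1, 1, 0)], [0, 0, 1, 1])
print(bks.predict((0, 1, 1)), bks.predict((1, 0, 0)))
```

In the toy data the combination (0, 1, 1) maps to label 0 even though a plain majority vote would say 1. This shows how BKS can learn systematic disagreement patterns among classifiers.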
210	Detecting Near-Duplicate Documents using Sentence-Level Features and Machine Learning
Liao, Ting-Yi, 23 October 2012
Effectively finding near-duplicate documents in large document collections has become a very important issue. In this paper, we propose a new method to detect near-duplicate documents in large-scale datasets. The method has three parts: feature selection, similarity measurement, and discriminant derivation. In feature selection, each document is first preprocessed to remove symbols, stop words, and so on. We then compute the weight of each term within its sentence and choose the terms with the highest weights in that sentence; these terms form one feature of the document, and the collection of such features forms the document's feature set. Similarity measurement applies a similarity function to compute a similarity value between two feature sets. Discriminant derivation uses a support vector machine (SVM), a supervised learning method that trains a classifier from training patterns, to decide whether a document is a near-duplicate. For characterizing documents, sentence-level features are more effective than term-level features. In addition, learning a discriminant with an SVM avoids the trial-and-error effort that conventional methods require to find a threshold, that is, a discriminant value that defines the relation between documents. Experimental results show that our method detects near-duplicate documents more effectively than other methods.
Keywords: Near-duplicate; threshold; trial-and-error; support vector machine; feature selection; stop words; similarity function
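The feature-selection and similarity-measurement steps described in this abstract can be sketched in Python as follows. The abstract does not define its term weight or similarity function. This sketch therefore assumes that a term's weight is its frequency in the whole document and that similarity is the Jaccard index between feature sets. The stop-word list, `k`, and the function names are hypothetical. The SVM discriminant step is omitted; in the described method, similarity values like these would be the SVM's inputs in place of a hand-tuned threshold.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on"}  # tiny illustrative list


def sentence_features(text, k=2):
    """Feature set: one frozenset of the k highest-weighted terms per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokenized = [[w for w in re.findall(r"[a-z0-9]+", s.lower()) if w not in STOP_WORDS]
                 for s in sentences]
    # Assumed term weight: frequency of the term across the whole document.
    doc_tf = Counter(w for sent in tokenized for w in sent)
    features = set()
    for sent in tokenized:
        if not sent:
            continue
        top = sorted(set(sent), key=lambda w: (-doc_tf[w], w))[:k]  # ties broken alphabetically
        features.add(frozenset(top))
    return features


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


f1 = sentence_features("The cat sat on the mat. Stock prices rose sharply.")
f2 = sentence_features("A cat sat on a mat! Stock prices rose sharply.")
f3 = sentence_features("Dogs bark loudly. Stock prices rose sharply.")
print(jaccard(f1, f2), jaccard(f1, f3))
```

Because each feature is a whole sentence's top terms, small edits such as changed punctuation or stop words leave the feature unchanged, while a rewritten sentence changes it entirely.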
