21

Application of MapReduce to Ranking SVM for Large-Scale Datasets

Hu, Su-Hsien 10 August 2010 (has links)
Nowadays, search engines rely increasingly on machine learning techniques to construct ranking models for web pages, using past user queries and clicks as training data. Among the various learning-to-rank methods for information retrieval, the ranking support vector machine (Ranking SVM) has attracted considerable attention in the information retrieval community. One difficulty with Ranking SVM is that constructing a ranking model is computationally expensive, because the number of training pairs grows rapidly when the training dataset is large. We adopt the MapReduce programming model to address this difficulty. MapReduce is a distributed computing framework introduced by Google and commonly adopted in cloud computing centers. It deals easily with large-scale datasets across a large number of computers, and it hides the messy details of parallelization, fault tolerance, data distribution, and load balancing from the programmer, allowing him or her to focus only on the underlying problem to be solved. In this paper, we apply MapReduce to Ranking SVM for processing large-scale datasets. We specify the Map function to solve the dual subproblems involved in Ranking SVM, and the Reduce function to aggregate all outputs sharing the same intermediate key from the Map functions of the distributed machines. Experimental results show that our proposed approach improves the efficiency of Ranking SVM.
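The map-then-aggregate-by-key pattern the abstract describes can be illustrated with a minimal sketch. This is not the thesis's dual decomposition; it is a hypothetical hinge-loss subgradient computation over preference pairs, used only to show Map functions emitting (key, value) pairs and a Reduce function summing values that share the same intermediate key:

```python
from collections import defaultdict

def map_pairs(partition, w):
    """Mapper: for each preference pair (x_hi, x_lo), emit per-feature
    subgradient contributions of a linear ranking model w (hinge loss)."""
    for x_hi, x_lo in partition:
        diff = [a - b for a, b in zip(x_hi, x_lo)]
        margin = sum(wi * di for wi, di in zip(w, diff))
        if margin < 1.0:  # pair violates the margin
            for j, dj in enumerate(diff):
                yield j, -dj  # intermediate key = feature index

def reduce_by_key(mapped):
    """Reducer: sum all values sharing the same intermediate key."""
    acc = defaultdict(float)
    for key, val in mapped:
        acc[key] += val
    return dict(acc)

# Two "machines", each holding a partition of preference pairs
# (x_hi should rank above x_lo).
partitions = [
    [([1.0, 0.0], [0.0, 1.0])],
    [([0.5, 0.2], [0.1, 0.9])],
]
w = [0.0, 0.0]
mapped = [kv for p in partitions for kv in map_pairs(p, w)]
grad = reduce_by_key(mapped)
```

In a real MapReduce deployment the shuffle phase performs the grouping by key; here `reduce_by_key` stands in for it.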
22

Disulfide Bond Prediction with Hybrid Models

Wang, Chong-Jie 06 September 2011 (has links)
Disulfide bonds are special covalent cross-links between two cysteines in a protein. This kind of bonding plays an important role in protein folding and stabilization. Predicting the connectivity pattern is a very difficult problem because the number of possible patterns grows rapidly with the number of cysteines. In this thesis, we propose a new approach to this problem, based on hybrid models with SVM. With this strategy, we can improve prediction accuracy by selecting appropriate models. To evaluate the performance of our method, we apply 4-fold cross-validation on the SP39 dataset, which contains 446 proteins. We achieve accuracies of 70.8% and 65.9% for pair-wise and pattern-wise prediction, respectively, which is better than previous works.
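The "fast growth of possible patterns" can be made concrete: pairing n cysteines (n even) into disulfide bonds admits (n-1)(n-3)...1 perfect matchings, the double factorial. A short computation of this count:

```python
def num_patterns(n_cysteines):
    """Number of ways to pair n cysteines into disulfide bonds:
    (n-1) * (n-3) * ... * 1, the double factorial (n-1)!!."""
    assert n_cysteines % 2 == 0, "need an even number of cysteines"
    count = 1
    for k in range(n_cysteines - 1, 0, -2):
        count *= k
    return count

# Growth is super-exponential in the number of cysteines:
sizes = {n: num_patterns(n) for n in (2, 4, 6, 8, 10)}
# 2 -> 1, 4 -> 3, 6 -> 15, 8 -> 105, 10 -> 945
```

This is why exhaustive pattern enumeration quickly becomes infeasible and model-based prediction is needed.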
23

All-atom Backbone Prediction with Improved Tool Preference Classification

Chen, Kai-Yu 07 September 2011 (has links)
The all-atom protein backbone reconstruction problem (PBRP) is to reconstruct the 3D coordinates of all backbone atoms, including the N, C, and O atoms, for a protein whose primary sequence and α-carbon coordinates are given. A variety of methods for solving PBRP have been proposed, such as Adcock's method, SABBAC, BBQ, Chang's method, and Yen's method. In a recent work, Yen et al. found that the results of Chang's method are not always better than those of SABBAC, so they applied a tool preference classification to determine which tool is more suitable for predicting the structure of a given protein. In this thesis, we adopt BBQ (Backbone Building from Quadrilaterals) and Chang's method as our candidate prediction tools. In addition, the tool preferences for the different atoms (N, C, O) are determined separately. We call each preference classifier an atom classifier, built with a support vector machine (SVM). According to the preference classification of each atom classifier, the proper prediction tool, either BBQ or Chang's method, is used to construct that atom of the target protein. By combining the results for all atoms, the backbone structure of the protein is reconstructed. The datasets for our experiments are extracted from CASP7, CASP8, and CASP9, which consist of 30, 24, and 55 proteins, respectively. The proteins in the datasets contain only standard amino acids. We improve the average RMSDs of Yen's results from 0.4019 to 0.3682 on CASP7, from 0.4543 to 0.4202 on CASP8, and from 0.4155 to 0.3601 on CASP9.
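The RMSD figures reported above measure coordinate deviation between reconstructed and native atoms. A minimal sketch of that metric, assuming the two structures are already superimposed (the data below is invented for illustration):

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) atom coordinates, assumed already superimposed."""
    assert len(coords_a) == len(coords_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

# Hypothetical reconstructed vs. native positions of two backbone atoms.
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
native    = [(0.0, 0.0, 0.0), (1.5, 0.4, 0.3)]
value = rmsd(predicted, native)
```

Lower values mean the reconstructed backbone sits closer to the native one, which is the sense in which 0.3682 improves on 0.4019.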
24

The Prediction of the Department Score of the College Entrance Examination in Taiwan

Chen, Yun-Shiuan 11 September 2012 (has links)
Prediction systems for the College Entrance Examination (CEE) are popular during the graduation season, July of every year in Taiwan. These systems give students suggestions according to their examination scores. There are several CEE prediction systems in Taiwan, but most of them are not built on rigorous theory. In 2005, Zen et al. constructed a prediction model using a statistical method, which was later verified and improved by Lin in 2008. In this thesis, we introduce the scoring mechanism of the College Entrance Examination and explain how to construct a prediction system under this mechanism. We also compare the previous system with ours. We apply an empirical method and SVR as our first two approaches, and then propose a new method. In our experiments, we consider the scores published by the CEE center from 2004 to 2008. We use the root mean square error (RMSE) to evaluate the performance of our method, and we use the values generated by our method to reveal information about the schools and departments.
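The evaluation criterion named above, RMSE, can be sketched in a few lines. The department scores below are invented for illustration only:

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and published scores."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

# Hypothetical predicted vs. published minimum scores for three departments.
pred = [402.5, 388.0, 415.0]
true = [400.0, 390.0, 410.0]
error = rmse(pred, true)
```

A lower RMSE means the predicted department scores track the published ones more closely.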
25

Backdoor Detection based on SVM

Tzeng, Zhong-Chiang 29 July 2005 (has links)
With the advance of computer technology and the wide use of the Internet, network security becomes more and more significant. According to relevant statistics, malicious code such as viruses, worms, backdoors, and Trojans launches a lot of attacks. Backdoors are especially critical: not only can they cross firewalls and antivirus software, they can also steal confidential information, misuse network resources, and launch attacks such as DDoS (Distributed Denial of Service). In this research, we analyze the properties and categories of backdoors and the application of data mining and support vector machines to intrusion detection. This research focuses on detecting backdoor connection behavior, and we propose a detection architecture. The architecture is based on SVM, a machine learning method grounded in statistical learning theory and proposed by Vapnik to address problems with neural network techniques. In our system modules, we choose IPAudit as the network monitor and libsvm as the SVM classifier. The packets captured by IPAudit are classified into interactive or non-interactive flows by libsvm, and the result is compared with a legal service list to determine whether a connection is a backdoor connection. We compare the accuracy of SVM, C4.5, and Naïve Bayes.
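The final decision step described above, comparing classified flows against a legal service list, can be sketched as follows. The port numbers, the allowlist, and the flow records are hypothetical; the SVM classification itself is assumed to have already happened upstream:

```python
# Hypothetical allowlist of ports where interactive traffic is legitimate
# (e.g. SSH on 22, Telnet on 23).
LEGAL_INTERACTIVE_SERVICES = {22, 23}

def flag_backdoors(flows):
    """Flag flows the classifier labeled interactive but whose destination
    port is not on the legal service list.

    flows: list of (dst_port, classified_interactive) tuples.
    """
    return [port for port, interactive in flows
            if interactive and port not in LEGAL_INTERACTIVE_SERVICES]

# One legitimate interactive flow, one non-interactive flow,
# and one interactive flow on an unexpected high port.
flows = [(22, True), (80, False), (31337, True)]
suspects = flag_backdoors(flows)
```

An interactive flow on a port with no sanctioned interactive service is the signature of a backdoor session in this architecture.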
26

An Efficient Algorithm for Determining Protein Structure Similarity

Lo, Yu-chieh 27 August 2006 (has links)
Proteins are fundamental materials of life, and there are many kinds of proteins in the body; if one of them malfunctions, it can cause physical problems. Therefore, many scientists try to analyze the functions of proteins. It is believed that a protein's structure determines its function: the more similar two structures are, the more similar their functions are. The prediction and comparison of protein structures are therefore important topics in bioinformatics. Typically, distance RMSD (Root Mean Square Deviation) is the measure most scientists use for the distance between two structures. In this thesis, we propose a new algorithm to compare two protein structures, based on the comparison of curves in space. To test and verify our method, we randomly choose some families from the CATH database and try to identify them. Experimental results show that our method outperforms RMSD. Furthermore, we also use an SVM (Support Vector Machine) tool to help us obtain better classification.
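The baseline measure named above, distance RMSD, compares the two structures' internal pairwise-distance matrices rather than superimposed coordinates, so no alignment step is needed. A minimal sketch (coordinates invented for illustration):

```python
import math

def distance_rmsd(coords_a, coords_b):
    """Distance RMSD: RMS difference between the intra-structure
    pairwise distance matrices of two equally long coordinate lists."""
    n = len(coords_a)
    assert n == len(coords_b) and n > 1
    sq, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            sq += (d_a - d_b) ** 2
            pairs += 1
    return math.sqrt(sq / pairs)

# A structure and a rigidly translated copy: internal distances are
# unchanged, so the distance RMSD is zero.
coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
coords_b = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)]
drmsd = distance_rmsd(coords_a, coords_b)
```

Invariance under translation and rotation is exactly what makes this the common baseline the thesis compares against.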
27

Enhancement of Incremental Learning Algorithm for Support Vector Machines Using Fuzzy Set Theory

Chuang, Yu-Ming 03 February 2009 (has links)
Over the past few years, a considerable number of studies have applied Support Vector Machines (SVMs) in many domains to improve classification or prediction. However, SVMs demand high computational time and memory when datasets are large. Although incremental learning techniques are viewed as one possible solution for reducing the computational complexity of this scalability problem, few studies have considered that some examples close to the decision hyperplane, other than the support vectors (SVs), might contribute to the learning process. Consequently, we propose three novel algorithms, named Mixed Incremental Learning (MIL), Half-Mixed Incremental Learning (HMIL), and Partition Incremental Learning (PIL), which improve Syed's incremental learning method based on fuzzy set theory. We expect to achieve better accuracy than other methods. In the experiments, the proposed algorithms are evaluated on five standard machine learning benchmark datasets to demonstrate their effectiveness. Experimental results show that HMIL has superior classification accuracy compared with the other incremental or active learning algorithms. In particular, even for datasets on which other research reports already achieve high accuracy, HMIL and PIL can further improve performance.
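The core idea, keeping near-hyperplane examples beyond the support vectors, can be sketched with a fuzzy membership that decays with distance from the current decision boundary. The Gaussian-shaped membership function, the width, and the threshold below are assumptions for illustration, not the thesis's exact formulation:

```python
import math

def fuzzy_membership(x, w, b, width=1.0):
    """Membership in the 'informative' set decays with the example's
    distance from the hyperplane w.x + b = 0 (Gaussian-shaped choice)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    d = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
    return math.exp(-(d / width) ** 2)

def select_increment(batch, w, b, threshold=0.5):
    """Keep examples whose membership exceeds a threshold; in an
    incremental scheme these would join the previous increment's
    support vectors for retraining."""
    return [x for x in batch if fuzzy_membership(x, w, b) >= threshold]

w, b = [1.0, 0.0], 0.0            # hyperplane: x0 = 0
batch = [[0.2, 3.0], [5.0, 1.0]]  # one near the boundary, one far away
kept = select_increment(batch, w, b)
```

The far example contributes essentially nothing to the margin and is discarded, which is how such schemes shrink the retraining set.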
28

Evaluation of Random Forests for Detection and Localization of Cattle Eyes

Sandsveden, Daniel January 2015 (has links)
In a time when cattle herds grow continually larger, the need for automatic methods to detect diseases is ever increasing. One possible method for discovering diseases is to use thermal images together with automatic head and eye detectors. In this thesis an eye detector and a head detector are implemented using the Random Forests classifier. During the implementation the classifier is evaluated using three different descriptors: Histogram of Oriented Gradients, Local Binary Patterns, and a descriptor based on pixel differences. An alternative classifier, the Support Vector Machine, is also evaluated for comparison against Random Forests. The results show that Histogram of Oriented Gradients performs well as a description of cattle heads, while Local Binary Patterns performs well as a description of cattle eyes. The provided descriptor performs almost equally well in both cases. The results also show that Random Forests performs approximately as well as the Support Vector Machine when the Support Vector Machine is paired with Local Binary Patterns for both heads and eyes. Finally, the results indicate that it is easier to detect and locate cattle heads than cattle eyes. For eyes, combining a head detector and an eye detector is shown to give better results than using an eye detector alone: heads are first detected in images, and the eye detector is then applied only to areas classified as heads.
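Of the descriptors compared above, the Local Binary Pattern is simple enough to sketch directly: each pixel gets an 8-bit code from thresholding its eight neighbours against its own value. The bit ordering below (clockwise from the top-left) is one common convention, not necessarily the one used in the thesis:

```python
def lbp_code(patch):
    """8-neighbour Local Binary Pattern code for the centre pixel of a
    3x3 grayscale patch: each neighbour >= centre sets one bit,
    walking clockwise from the top-left neighbour."""
    c = patch[1][1]
    ring = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, v in enumerate(ring):
        if v >= c:
            code |= 1 << bit
    return code

# A hypothetical 3x3 grayscale patch; the histogram of such codes over
# a window forms the descriptor fed to the classifier.
patch = [[9, 9, 1],
         [1, 5, 1],
         [9, 9, 9]]
code = lbp_code(patch)
```

In practice the per-pixel codes are pooled into a histogram over the detection window, and that histogram is the feature vector the Random Forest or SVM consumes.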
29

Informative correlation extraction from and for Forex market analysis

Lei, Song January 2010 (has links)
The forex market is a complex, evolving, non-linear dynamical system, and forecasting it is difficult due to high data intensity, noise and outliers, unstructured data, and a high degree of uncertainty. However, the exchange rate of a currency is often surprisingly similar to its own history or to the variation of another currency, which implies that correlation knowledge is valuable for forex market trend analysis. In this research, we propose a computational correlation analysis for intelligent correlation extraction from all available economic data. The proposed correlation is a synthesis of channel correlation and weighted Pearson's correlation, where the channel correlation traces the trend similarity of time series and the weighted Pearson's correlation filters noise during correlation extraction. In the forex market analysis, we consider three particular aspects of correlation knowledge: (1) historical correlation, correlation with previous market data; (2) cross-currency correlation, correlation with relevant currencies; and (3) macro correlation, correlation with macroeconomic variables. To evaluate the validity of the extracted correlation knowledge, we compare Support Vector Regression (SVR) against correlation-aided SVR (cSVR) for forex time series prediction, where correlation, in addition to the observed forex time series data, is used for training the SVR. The experiments are carried out on five futures contracts (NZD/AUD, NZD/EUD, NZD/GBP, NZD/JPY and NZD/USD) within the period from January 2007 to December 2008. The comparison shows that the proposed correlation is computationally significant for forex market analysis, in that cSVR performs consistently better than SVR alone on exchange rate prediction for all five contracts, in terms of the error functions MSE, RMSE, NMSE, MAE and MAPE.
However, the cSVR prediction occasionally differs significantly from the actual price, which suggests that despite the significance of the proposed correlation, how to use correlation knowledge for market trend analysis remains a challenging problem that, in practice, limits further understanding of the forex market. In addition, the selection of macroeconomic factors and the determination of the time period for analysis are two computationally essential points worth addressing in future forex market correlation analysis.
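Of the two correlation components named in the abstract, the weighted Pearson's correlation can be sketched briefly (the channel correlation is not shown, and the choice of weights here is an assumption; with equal weights the formula reduces to the ordinary Pearson coefficient):

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation where observation i carries weight w[i];
    down-weighting noisy observations filters them out of the estimate."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)

# Two perfectly linearly related (hypothetical) rate series under
# equal weights: the coefficient is exactly 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
r = weighted_pearson(x, y, [1.0, 1.0, 1.0, 1.0])
```

In the cSVR setting, correlations like this one, computed against historical, cross-currency, and macroeconomic series, become extra inputs alongside the raw exchange-rate history.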
