101 |
Modelagem da produtividade da cultura da cana de açúcar por meio do uso de técnicas de mineração de dados / Modeling sugarcane yield through Data Mining techniques. Ralph Guenther Hammer, 27 July 2016 (has links)
The understanding of the hierarchy of importance of the factors that influence sugarcane yield can support its modeling, thus contributing to the optimization of agricultural planning at the sector's production units and to the improvement of crop yield estimates. The objectives of this study were to rank the variables that condition sugarcane yield according to their relative importance and to develop mathematical models for predicting sugarcane yield. To this end, three Data Mining techniques were applied in the analysis of databases from sugar mills in the State of São Paulo, Brazil. Meteorological and crop management variables were analyzed with the Data Mining techniques Random Forest, Boosting and Support Vector Machines, and the resulting models were tested through comparison with an independent data set, using the coefficient of correlation (r), the Willmott index (d), the Camargo confidence index (C), the mean absolute error (MAE) and the root mean square error (RMSE). Finally, the predictive performance of these models was compared with that of an agrometeorological model applied to the same data set. Of all the variables analyzed, the number of cuts was the most important factor in every Data Mining technique. The comparison between the observed yields and those estimated by the Data Mining models resulted in an RMSE ranging from 19.70 to 20.03 t ha-1 in the most general approach, which encompasses all regions of the database. The predictive performance of the Data Mining algorithms was therefore superior to that of the agrometeorological model, whose RMSE was about 70% higher (≈ 34 t ha-1).
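As an illustration of the agreement statistics listed above, the sketch below computes r, Willmott's d, Camargo's confidence index (C = r x d), MAE and RMSE for a pair of observed and estimated yield series. It is a minimal example with invented numbers, not the thesis data or code.

```python
# Minimal sketch of the evaluation statistics used to compare estimated and
# observed yields. All yield values below are illustrative.
import numpy as np

def evaluation_metrics(observed, predicted):
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    r = np.corrcoef(o, p)[0, 1]                              # Pearson correlation
    d = 1.0 - np.sum((p - o) ** 2) / np.sum(
        (np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2)  # Willmott's index
    c = r * d                                                # Camargo's confidence index
    mae = np.mean(np.abs(p - o))                             # mean absolute error
    rmse = np.sqrt(np.mean((p - o) ** 2))                    # root mean square error
    return {"r": r, "d": d, "C": c, "MAE": mae, "RMSE": rmse}

# Illustrative yields in t/ha (not the thesis data):
obs = [75.0, 82.0, 68.0, 90.0, 71.0]
est = [78.0, 80.0, 72.0, 85.0, 74.0]
print(evaluation_metrics(obs, est))
```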
102 |
Employing Multiple Kernel Support Vector Machines for Counterfeit Banknote Recognition. Su, Wen-pin, 29 July 2008 (has links)
Finding an efficient method to detect counterfeit banknotes is imperative. In this study, we propose a multiple kernel weighted support vector machine for counterfeit banknote recognition. A variation of the SVM that optimizes the false alarm rate, called FARSVM, is proposed to minimize both the false negative rate and the false positive rate. Each banknote is divided into m × n partitions, and each partition comes with its own kernels. The optimal weight of each kernel matrix in the combination is obtained through the semidefinite programming (SDP) learning method. The amount of time and space required by the original SDP is very demanding, so we adopt two strategies to reduce these requirements: the first is to assume the non-negativity of the kernel weights, and the second is to set the sum of the weights equal to 1. Experimental results show that regions with zero kernel weights are easy to imitate with today's digital imaging technology, while regions with nonzero kernel weights are difficult to imitate. These results also show that the proposed approach outperforms the single kernel SVM and the standard SVM with SDP on Taiwanese banknotes.
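The weighted multiple-kernel construction described above can be sketched as follows: one RBF kernel per banknote partition, combined as K = sum_i w_i K_i with w_i >= 0 and sum_i w_i = 1, and then fed to an SVM with a precomputed kernel. The data, the partitioning and the weights below are placeholders; in the study the weights are learned by semidefinite programming, which is not reproduced here.

```python
# Hedged sketch of combining per-partition kernels with fixed, hand-chosen
# weights standing in for the SDP-learned ones. All data are synthetic.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_partitions, feats_per_part = 60, 4, 16
X = rng.normal(size=(n_samples, n_partitions * feats_per_part))
y = rng.integers(0, 2, size=n_samples)       # 0 = genuine, 1 = counterfeit (toy labels)

# One kernel matrix per partition (block of features).
kernels = [
    rbf_kernel(X[:, i * feats_per_part:(i + 1) * feats_per_part], gamma=0.1)
    for i in range(n_partitions)
]

# Non-negative weights summing to 1 (placeholders for the SDP-learned weights).
w = np.array([0.4, 0.3, 0.2, 0.1])
K = sum(wi * Ki for wi, Ki in zip(w, kernels))   # combined kernel matrix

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```

Learning the weights rather than fixing them is what allows the model to assign zero weight to partitions that are easy to imitate, as reported in the abstract.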
103 |
Support vector classification analysis of resting state functional connectivity fMRI. Craddock, Richard Cameron, 17 November 2009 (has links)
Since its discovery in 1995, resting state functional connectivity derived from functional MRI data has become a popular neuroimaging method for studying psychiatric disorders. Current methods for analyzing resting state functional connectivity in disease involve thousands of univariate tests and the specification of regions of interest to employ in the analysis. There are several drawbacks to these methods. First, the mass univariate tests employed are insensitive to the information present in distributed networks of functional connectivity. Second, the null hypothesis testing employed to select functional connectivity differences between groups does not evaluate the predictive power of the identified functional connectivities. Third, the specification of regions of interest is confounded by experimenter bias in terms of which regions should be modeled, and by experimental error in terms of the size and location of these regions. The objective of this dissertation is to improve the methods for functional connectivity analysis using multivariate predictive modeling, feature selection, and whole brain parcellation.
A method of applying support vector classification (SVC) to resting state functional connectivity data was developed in the context of a neuroimaging study of depression. The interpretability of the obtained classifier was optimized using feature selection techniques that incorporate reliability information. The problem of selecting regions of interest for whole brain functional connectivity analysis was addressed by clustering whole brain functional connectivity data to parcellate the brain into contiguous, functionally homogeneous regions. This newly developed framework was applied to derive a classifier capable of correctly separating the functional connectivity patterns of patients with depression from those of healthy controls 90% of the time. The features most relevant to the obtained classifier match those identified in previous studies, but also include several regions not previously implicated in the functional networks underlying depression.
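A rough, synthetic-data sketch of the pipeline described above: regional time series are reduced to pairwise correlations (the functional connectivity features), a simple univariate filter stands in for the reliability-based feature selection, and a linear support vector classifier separates the two groups. None of this is the dissertation's code; it only illustrates the shape of the analysis.

```python
# Illustrative SVC analysis of connectivity features built from synthetic data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_subjects, n_regions, n_timepoints = 40, 20, 120
labels = np.repeat([0, 1], n_subjects // 2)      # 0 = control, 1 = patient (toy)

def connectivity_features(ts):
    """Upper triangle of the region-by-region correlation matrix."""
    corr = np.corrcoef(ts.T)
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu]

X = np.array([
    connectivity_features(rng.normal(size=(n_timepoints, n_regions)))
    for _ in range(n_subjects)
])

model = make_pipeline(SelectKBest(f_classif, k=50), SVC(kernel="linear"))
scores = cross_val_score(model, X, labels, cv=5)
print("cross-validated accuracy:", scores.mean())
```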
104 |
Αναγνώριση γονιδιακών εκφράσεων νεοπλασιών σε microarrays / Identification of tumor gene expression from microarrays. Τσακανίκας, Παναγιώτης, 16 May 2007 (has links)
The substantial growth of molecular pathology in recent years is intertwined with the development of microarray technology. This technology provides a new high-throughput avenue such that: i. large-scale analysis of the isotopic abundance of messenger RNA (mRNA), as an indicator of gene expression, becomes possible (cDNA arrays); ii. polymorphisms or mutations can be detected within a population of genes using single nucleotide polymorphisms (Single Nucleotide Polymorphism arrays); iii. "loss" or "gain", or changes in the copy number of a particular gene associated with a disease, can be examined (CGH arrays). Microarray technology is likely to become a cornerstone of molecular research in the coming years, because DNA microarrays are used for the quantitative determination of tens of thousands of DNA or RNA sequences in a single analysis (experiment). From a series of such experiments it is possible to determine the mechanisms that control the activation of genes in an organism. Moreover, the use of microarrays for surveying gene expression is a rapidly developing technology that has moved from specialized to conventional biological laboratories. In this thesis we mainly address the two general stages of the analysis of the microarray images produced by such experiments, whose goal is to extract information from them. These two stages are: i. image processing and extraction of information from the image; ii. analysis of the resulting information and identification of gene expression. For the first stage, we review the main methods currently used by commercial and academic software packages, which give the best results so far. For the second stage, after reviewing the most important methods in use, we implement a method of our own and compare it with the existing ones.
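As a small, hypothetical illustration of the kind of quantity the image-processing stage hands to the analysis stage, the sketch below turns two-channel spot intensities (foreground and background) into background-corrected log2 expression ratios. All intensity values are invented.

```python
# Hypothetical per-spot intensities of a two-channel cDNA array, reduced to
# background-corrected log2 ratios (toy numbers, not from the thesis).
import numpy as np

red_fg   = np.array([1200.0,  900.0, 3000.0, 450.0])   # red foreground
red_bg   = np.array([ 100.0,  120.0,  110.0,  90.0])   # red background
green_fg = np.array([ 800.0, 1100.0, 1500.0, 500.0])   # green foreground
green_bg = np.array([  95.0,  105.0,  100.0,  85.0])   # green background

red   = np.clip(red_fg - red_bg, 1, None)      # background correction,
green = np.clip(green_fg - green_bg, 1, None)  # floored to avoid log of <= 0

log_ratio = np.log2(red / green)               # per-gene expression ratio
print(log_ratio)
```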
105 |
Uma metodologia de projetos para circuitos com reconfiguração dinâmica de hardware aplicada a support vector machines. / A design methodology for circuits with dynamic reconfiguration of hardware applied to support vector machines. José Artur Quilici Gonzalez, 07 November 2006 (has links)
Systems based on general-purpose processors are characterized by flexibility to design changes, but with computational performance below that of optimized dedicated circuits. The implementation of algorithms in reconfigurable devices, known as Field Programmable Gate Arrays (FPGAs), offers a solution with a trade-off between the processor's flexibility and the dedicated circuit's performance. With FPGAs it is possible to have the hardware resources configured by software, with a smaller granularity than that of the general-purpose processor and greater flexibility than that of dedicated circuits. Current versions of FPGAs present a reconfiguration time small enough to make dynamic reconfiguration feasible, i.e., even while the device is executing an algorithm, the way its resources are laid out can be modified, offering the possibility of temporally partitioning an algorithm. New lines of FPGAs are already manufactured with the option of partial dynamic reconfiguration, i.e., it is possible to reconfigure selected areas of an FPGA at any time while the remaining area continues in operation. However, for this new technology to become widely adopted, a proper design methodology is necessary, one that offers efficient solutions to the new developments in digital design.
In particular, one of the main difficulties presented by this approach concerns how to partition the algorithm so as to minimize the time necessary to complete its task. This manuscript offers a design methodology for dynamically reconfigurable devices, with an emphasis on the problem of the temporal partitioning of circuits, having as a target application a family of algorithms, used mainly in Bioinformatics, represented by the binary classifier known as the Support Vector Machine. Some partitioning techniques for Dynamically Reconfigurable FPGAs, specifically applicable to the partitioning of FSMs, were developed to guarantee that a control-flow-dominated design can be mapped onto a single FPGA without modifying its functionality.
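For reference, the computation that an SVM-on-FPGA design must implement is the binary decision function f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b). The sketch below evaluates it in software with invented support vectors, coefficients and an RBF kernel; the temporal partitioning of this datapath, which is the subject of the thesis, is not shown.

```python
# Software reference model of an SVM binary decision function. All model
# parameters are invented for illustration.
import numpy as np

def rbf(u, v, gamma=0.5):
    return np.exp(-gamma * np.sum((u - v) ** 2))

# Toy trained model: support vectors, signed coefficients (alpha_i * y_i), bias.
support_vectors = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
dual_coef = np.array([0.8, -0.5, -0.3])
bias = 0.1

def decide(x):
    score = sum(c * rbf(sv, x) for c, sv in zip(dual_coef, support_vectors)) + bias
    return 1 if score >= 0 else -1

print(decide(np.array([0.2, 0.9])), decide(np.array([1.0, 0.8])))
```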
106 |
Vytvoření předpovědi průměrných měsíčních průtoků pro řízení zásobní funkce fiktivní vodní nádrže / Creating predictions of average monthly flows for the control of the storage function of a fictitious water reservoir. Hrabinová, Barbora, January 2018 (has links)
The diploma thesis focuses on predictions of mean monthly flows for the purpose of controlling the storage function, considering different positions of fictitious reservoirs in the catchment area. One of the reservoirs is situated in the upper part of the catchment and the second in the middle part. Predictions are made with the support vector machine method in RStudio, using the R language. The predicted flow values were evaluated with the correlation coefficient, the coefficient of determination and the root mean square error, and a simulation of the operation of the storage function was then carried out and evaluated with a total sum of squares modified for water management problems. Finally, the two reservoirs were compared to assess the suitability of the method.
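The prediction setup can be sketched as follows; the thesis works in R/RStudio, so this Python version with a synthetic flow series and an SVM regressor on lagged monthly flows is only an illustration of the procedure and of the evaluation by correlation coefficient, coefficient of determination and RMSE.

```python
# Illustrative SVR forecast of mean monthly flows from the previous months,
# using a synthetic flow series (not the thesis data).
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.svm import SVR

rng = np.random.default_rng(2)
months = np.arange(240)
# Synthetic monthly flows with a yearly cycle plus noise (invented units).
flows = 10 + 4 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, months.size)

lag = 3                                              # predict from the 3 previous months
X = np.array([flows[i - lag:i] for i in range(lag, len(flows))])
y = flows[lag:]

split = 200                                          # first 200 months for training
model = SVR(kernel="rbf", C=10.0).fit(X[:split], y[:split])
pred = model.predict(X[split:])

r = np.corrcoef(y[split:], pred)[0, 1]               # correlation coefficient
r2 = r2_score(y[split:], pred)                       # coefficient of determination
rmse = np.sqrt(mean_squared_error(y[split:], pred))  # root mean square error
print(f"r = {r:.3f}, R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```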
107 |
Investigation of integrated water level sensor solution for submersible pumps : A study of how sensors can be combined to withstand build-up materials and improve reliability in harsh environment / Undersökning av integrerad vattennivåsensorlösning för dränkbar pump. Abelin, Sarah, January 2017 (has links)
Monitoring the water level in harsh environments in order to handle the start and stop function of drainage pumps has been a major issue. Several environmental factors are present which affect and disturb sensor measurements. Current solutions with mechanical float switches, mounted outside the pumps, wear out, get entangled and account for more than half of all emergency call-outs to pumping stations. Since pumps are frequently moved around, a new sensor solution is needed that can be integrated within the pump housing and is able to continuously monitor the water level, in order to optimize the operation of the pump and to decrease wear, cost and energy consumption. This thesis presents an investigation of how different sensor techniques can be combined to improve the reliability of water level monitoring and handle the start and stop function of drainage pumps in harsh environments. The main focus has been to identify suitable water level sensing techniques and to investigate how sensors are affected by materials building up on the pump surface and covering the sensor probes. A support vector machine algorithm is implemented to fuse sensor data in order to increase the reliability of the sensor solution in contaminated conditions. The results show that a combination of a pressure sensor and a capacitive sensor is the most suitable for withstanding build-up materials. Under operating conditions in which the sensors were covered with soft or viscous build-ups, the sensors were able to monitor the water level through the build-up material. No solution was found that could satisfactorily monitor the water level through solidified build-up materials.
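A minimal sketch, on synthetic readings, of the sensor-fusion step mentioned above: a support vector machine combines a pressure reading and a capacitive reading to decide whether the water is above an assumed start level of 1 m, even when the capacitive channel carries a build-up bias. The sensor models and thresholds are invented, not taken from the thesis.

```python
# SVM fusion of two simulated water-level sensors into a start/stop decision.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n = 400
level = rng.uniform(0.0, 2.0, n)                                  # true water level (m)
pressure = 9.81 * level + rng.normal(0, 0.5, n)                   # noisy hydrostatic pressure
capacitive = level + rng.normal(0, 0.1, n) + 0.3 * rng.random(n)  # build-up bias on the probe
X = np.column_stack([pressure, capacitive])
y = (level > 1.0).astype(int)                                     # 1 = above assumed start level

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("hold-out accuracy:", clf.score(X_te, y_te))
```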
108 |
Integrative Modeling and Analysis of High-throughput Biological Data. Chen, Li, 21 January 2011 (has links)
Computational biology is an interdisciplinary field that focuses on developing mathematical models and algorithms to interpret biological data so as to understand biological problems. With current high-throughput technology development, different types of biological data can be measured on a large scale, which calls for more sophisticated computational methods to analyze and interpret the data. In this dissertation, we propose novel methods to integrate, model and analyze multiple types of biological data, including microarray gene expression data, protein-DNA interaction data and protein-protein interaction data. These methods will help improve our understanding of biological systems.
First, we propose a knowledge-guided multi-scale independent component analysis (ICA) method for biomarker identification on time course microarray data. Guided by a knowledge gene pool related to a specific disease under study, the method can determine disease relevant biological components from ICA modes and then identify biologically meaningful markers related to the specific disease. We have applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification.
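A hedged sketch of the knowledge-guided idea in the paragraph above, using scikit-learn's FastICA and synthetic data as stand-ins for the authors' multi-scale procedure: components are ranked by how strongly a pool of known disease genes loads on them, and candidate markers are read off the top-ranked component.

```python
# Rank ICA components of a genes-by-samples matrix by the loadings of a
# knowledge gene pool, then list candidate biomarkers. Data are synthetic.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(4)
n_genes, n_samples = 500, 12
expression = rng.normal(size=(n_genes, n_samples))
knowledge_genes = np.arange(20)                      # indices of known markers (toy)

ica = FastICA(n_components=5, random_state=0)
S = ica.fit_transform(expression)                    # gene loadings, shape (genes, components)

# Score each component by the mean absolute loading of the knowledge genes.
scores = np.abs(S[knowledge_genes]).mean(axis=0)
best = int(np.argmax(scores))
print("most disease-relevant component:", best, "scores:", np.round(scores, 3))

# Candidate biomarkers: genes with the largest loadings on that component.
candidates = np.argsort(-np.abs(S[:, best]))[:10]
print("top candidate genes:", candidates)
```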
Second, we propose a novel method for transcriptional regulatory network identification by integrating gene expression data and protein-DNA binding data. The approach is built upon a multi-level analysis strategy designed for suppressing false positive predictions. With this strategy, a regulatory module becomes increasingly significant as more relevant gene sets are formed at finer levels. At each level, a two-stage support vector regression (SVR) method is utilized to reduce false positive predictions by integrating binding motif information and gene expression data; a significance analysis procedure is followed to assess the significance of each regulatory module. The resulting performance on simulation data and yeast cell cycle data shows that the multi-level SVR approach outperforms other existing methods in the identification of both regulators and their target genes. We have further applied the proposed method to breast cancer cell line data to identify condition-specific regulatory modules associated with estrogen treatment. Experimental results show that our method can identify biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer.
Third, we propose a bootstrapping Markov Random Field (MRF)-based method for subnetwork identification on microarray data by incorporating protein-protein interaction data. Methodologically, an MRF-based network score is first derived by considering the dependency among genes to increase the chance of selecting hub genes. A modified simulated annealing search algorithm is then utilized to find the optimal/suboptimal subnetworks with maximal network score. A bootstrapping scheme is finally implemented to generate confident subnetworks. Experimentally, we have compared the proposed method with other existing methods, and the resulting performance on simulation data shows that the bootstrapping MRF-based method outperforms other methods in identifying the ground truth subnetwork and hub genes. We have then applied our method to breast cancer data to identify significant subnetworks associated with drug resistance. The identified subnetworks not only show good reproducibility across different data sets, but also indicate several pathways and biological functions potentially associated with the development of breast cancer and drug resistance. In addition, we propose to develop network-constrained support vector machines (SVM) for cancer classification and prediction, by taking into account the network structure to construct classification hyperplanes. The simulation study demonstrates the effectiveness of our proposed method. The study on the real microarray data sets shows that our network-constrained SVM, together with the bootstrapping MRF-based subnetwork identification approach, can achieve better classification performance compared with conventional biomarker selection approaches and SVMs.
We believe that the research presented in this dissertation not only provides novel and effective methods to model and analyze different types of biological data, but the extensive experiments on several real microarray data sets also show its potential to improve the understanding of biological mechanisms related to cancers by generating novel hypotheses for further study. / Ph. D.
109 |
Mahalanobis kernel-based support vector data description for detection of large shifts in mean vector. Nguyen, Vu, 01 January 2015 (has links)
Statistical process control (SPC) applies the science of statistics to process control in order to provide higher-quality products and better services. The K chart is one of the many important tools that SPC offers. The K chart is based on Support Vector Data Description (SVDD), a popular data classification method inspired by the Support Vector Machine (SVM). As with any method associated with SVM, SVDD benefits from a wide variety of kernel choices, and the choice of kernel determines the effectiveness of the whole model. Among the most popular choices is the Euclidean distance-based Gaussian kernel, which enables SVDD to obtain a flexible data description and thus enhances its overall predictive capability. This thesis explores an even more robust approach by incorporating the Mahalanobis distance-based kernel (hereinafter referred to as the Mahalanobis kernel) into SVDD and compares it with SVDD using the traditional Gaussian kernel. Each method's sensitivity is benchmarked by average run lengths obtained from multiple Monte Carlo simulations. Data for these simulations are generated from multivariate normal, multivariate Student's t, and multivariate gamma populations using R, a popular software environment for statistical computing. One case study is also discussed, using a real data set received from the Halberg Chronobiology Center. Compared to the Gaussian kernel, the Mahalanobis kernel makes SVDD, and thus the K chart, significantly more sensitive to shifts in the mean vector and in the covariance matrix.
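The kernel substitution discussed above can be sketched as follows: the Gaussian kernel's Euclidean distance is replaced by the Mahalanobis distance computed from the in-control training covariance, k(x, y) = exp(-(x - y)^T S^{-1} (x - y) / (2 sigma^2)). scikit-learn's OneClassSVM with a precomputed kernel is used here as a stand-in for SVDD (for kernels with k(x, x) = 1 the two formulations are closely related), and the data and the mean shift are synthetic.

```python
# One-class boundary with a Mahalanobis-distance Gaussian kernel, applied to a
# synthetic in-control sample and a mean-shifted sample.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(5)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])                    # correlated in-control process
X_train = rng.multivariate_normal([0, 0], cov, size=200)
X_new = rng.multivariate_normal([1.5, 1.5], cov, size=20)   # shifted mean vector

S_inv = np.linalg.inv(np.cov(X_train, rowvar=False))

def mahalanobis_kernel(A, B, sigma2=1.0):
    D = A[:, None, :] - B[None, :, :]                # pairwise differences
    d2 = np.einsum("ijk,kl,ijl->ij", D, S_inv, D)    # squared Mahalanobis distances
    return np.exp(-d2 / (2.0 * sigma2))

K_train = mahalanobis_kernel(X_train, X_train)
model = OneClassSVM(kernel="precomputed", nu=0.05).fit(K_train)

K_new = mahalanobis_kernel(X_new, X_train)
flagged = int((model.predict(K_new) == -1).sum())
print("flagged as out of control:", flagged, "of", len(X_new))
```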
110 |
"Investigação de estratégias para a geração de máquinas de vetores de suporte multiclasses" / Investigation of strategies for the generation of multiclass support vector machinesLorena, Ana Carolina 16 February 2006 (has links)
Several problems involve the classification of data into categories, also called classes. Given a dataset containing data whose classes are known, Machine Learning (ML) algorithms can be employed to induce a classifier able to predict the class of new data from the same domain, thus performing the desired discrimination. Among the several ML techniques applied to classification problems, Support Vector Machines (SVMs) are known for their high generalization ability. They were originally conceived for the solution of problems with only two classes, also named binary problems. However, several problems require the discrimination of examples into more than two categories or classes. This thesis investigates and proposes strategies for the generalization of SVMs to problems with more than two classes, known as multiclass problems. The focus of this work is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are then combined to obtain the final classification. The proposed strategies aim to adapt the decompositions to each multiclass application considered, using information on the performance obtained in its solution or extracted from its examples. The implemented algorithms were evaluated on general datasets and on real applications from the Bioinformatics domain. The results obtained open up many possibilities for future work. Among the benefits observed is the obtainment of simpler decompositions, which require fewer binary classifiers in the multiclass solution.
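The decomposition strategies that the thesis builds on can be illustrated with the two standard schemes, one-vs-rest and one-vs-one, each of which combines several binary SVMs into a multiclass decision; the adaptive decompositions proposed in the thesis are not reproduced here. The data set below is a toy three-class problem.

```python
# Standard binary decompositions of a multiclass problem with SVM base learners.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Toy three-class problem standing in for a real multiclass application.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

schemes = {
    "one-vs-rest (k binary SVMs)": OneVsRestClassifier(SVC(kernel="rbf")),
    "one-vs-one (k(k-1)/2 binary SVMs)": OneVsOneClassifier(SVC(kernel="rbf")),
}
for name, clf in schemes.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy = {acc:.3f}")
```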