51 |
Aplicação de máquina de vetores de suporte na investigação da atividade gênica do câncer de colo de intestino / Application of support vector machines to the investigation of gene activity in colon cancer
Vieira, Sylvio André Garcia, 30 March 2011
Data mining is the process of discovering correlated patterns among the data in a database. GEO is a public biological database maintained by NCBI, from which we retrieved data on thirty-two colorectal adenoma patients, with expression readings from probes corresponding to genes, extracted from RNA. Data deposited in biological databases do not by themselves yield useful information, so the data were selected according to several criteria, such as the reliability of the information collected and the amount of information present across the largest number of probes, and were finally filtered by highest expression reading. After the database had been processed and the genes selected, the SVM classifier was applied in R in order to identify, within this small set of genes, their possible association with the presence of colon adenoma.
The classification results showed that the characteristics of the genes are quite distinct and that activity varies considerably from gene to gene. This variation nevertheless follows regular patterns, which allowed the algorithm to identify them and to suggest the genes' involvement in the adenoma process.
|
52 |
Redes neurais e support vector machines como técnicas de diagnósticos em medições industriais de nível por tecnologia tipo radar sem contato e apoio à decisão para a melhoria de sua aplicação / Neural networks and support vector machines as diagnosing tool for industrial level measurement through non-contacting radar type and support to the decision for its better application
Denis Borg, 02 December 2016
The aim of this thesis is to detect and classify level measurement problems in free-space wave propagation radars using ANN (artificial neural networks) and SVM (support vector machines) combined with statistical pre-processing. In the first scenario, a controlled environment was built to obtain preliminary data. Three further scenarios used real industrial data.
Several neural network topologies were tested in the four scenarios, and the ANN proved effective, reaching accuracy rates of 100% for the first scenario, 93.51% for the second, 99.75% for the third and 99.94% for the fourth. For the same four scenarios, the SVM classification results were 100%, 84.41%, 93.74% and 96.40%. The results show that the technique developed can be applied to real level measurement scenarios. After the problems have been classified by the ANN or the SVM, we recommend using some of the icons from the international standard NAMUR NE107 to report the different problem classes produced by the techniques in this thesis. We propose embedding these techniques in asset management software to improve the reliability of level measurement, anticipate instrument maintenance routines and increase plant safety by adequately reporting level measurement problems to users and mapping the phases of the process.
|
53 |
Uso de imagens de fluorescência para monitoramento da evolução do cancro cítrico / Use of fluorescence imaging for monitoring the evolution of citrus canker
Caio Bruno Wetterich, 29 February 2012
Citrus canker is considered one of the most important citrus diseases because of its ability to spread on farms and the damage it causes to plants and fruit. The losses can be severe, because the main control measures adopted by the responsible agencies involve eradicating infected plants and neighboring plants, which renders large productive areas economically unviable. Brazilian law requires an extensive protocol of activities before a diagnosis is confirmed, and delays in confirmation favor the spread of the disease. Any effort to accelerate detection should therefore have a major impact in this area.
This is the motivation of our work, in which we apply fluorescence imaging spectroscopy to citrus leaves to evaluate the diagnostic capability of the technique in asymptomatic plants infected with citrus canker in the laboratory. The goal is to determine the minimum time required between infection and accurate diagnosis of the disease. The study covered experiments with both destructive and non-destructive samples. The results show that the technique can be applied to the detection of citrus canker.
|
54 |
Extraction of Key-Frames from an Unstable Video Feed
Vempati, Nikhilesh, 28 September 2017
The APOLI project deals with Automated Power Line Inspection using highly automated Unmanned Aerial Systems. Besides real-time damage assessment through on-board exploitation of high-resolution image data, post-processing of the video data is necessary. This Master's thesis deals with the implementation of an Isolator Detector Framework and a workflow in the Automotive Data and Time-Triggered Framework (ADTF) that loads a video directly from a camera or from storage and extracts the key frames that contain objects of interest. This is done by implementing an object detection system in C++ and creating ADTF filters that detect the objects of interest and extract the key frames using a supervised learning platform. The use case is the extraction, from video samples, of frames that contain images of isolators on power transmission lines.
|
55 |
[en] SUPERVISED LEARNING INCREMENTAL FEATURE INDUCTION AND SELECTION / [pt] INDUÇÃO E SELEÇÃO INCREMENTAIS DE ATRIBUTOS NO APRENDIZADO SUPERVISIONADO
EDUARDO NEVES MOTTA, 13 January 2017
Inducing non-linear features from basic features is a way of obtaining more accurate predictive models for classification problems. However, feature induction may rapidly lead to a huge number of features, usually causing overfitting and models with low generalization power. To prevent this side effect, regularization techniques are employed to obtain a trade-off between a reduced feature set representative of the domain and generalization power.
In this work, we describe a supervised machine learning approach that incrementally induces and selects feature conjunctions derived from base features. This approach integrates decision trees, support vector machines and feature selection using sparse perceptrons in a machine learning framework named IFIS (Incremental Feature Induction and Selection). Using IFIS, we generate regularized non-linear models with high performance using a linear algorithm.
We evaluate our system on two natural language processing tasks in two languages. For the first task, POS tagging, we use two corpora, the WSJ corpus for English and Mac-Morpho for Portuguese. Our results are competitive with the state of the art on both, achieving accuracies of 97.14 per cent and 97.13 per cent, respectively. In the second task, dependency parsing, we use the CoNLL 2006 Shared Task Portuguese corpus, achieving better results than those reported during that competition and results competitive with the state of the art for this task, with a UAS score of 92.01 per cent.
Applying model regularization with a sparse perceptron, we obtain SVM models up to 10 times smaller while maintaining their accuracy. We achieve model reduction by regularizing the feature domains, which are reduced by up to 99 per cent. With the regularized models, the physical model size shrinks by up to 82 per cent, and the prediction time of the compact model is cut by up to 84 per cent. Downsizing domains and models also improves feature engineering, through analysis of the compact domains and incremental introduction of new features.
|
57 |
An Iterative Feature Perturbation Method for Gene Selection from Microarray Data
Canul Reich, Juana, 11 June 2010
Gene expression microarray datasets often consist of a limited number of samples relative to a large number of expression measurements, usually on the order of thousands of genes. These characteristics pose a challenge to any classification model as they might negatively impact its prediction accuracy. Therefore, dimensionality reduction is a core process prior to any classification task.
This dissertation introduces the iterative feature perturbation method (IFP), an embedded gene selector that iteratively discards non-relevant features. IFP considers relevant features as those which after perturbation with noise cause a change in the predictive accuracy of the classification model. Non-relevant features do not cause any change in the predictive accuracy in such a situation.
We apply IFP to 4 cancer microarray datasets: colon cancer (cancer vs. normal), leukemia (subtype classification), Moffitt colon cancer (prognosis predictor) and lung cancer (prognosis predictor). We compare results obtained by IFP to those of SVM-RFE and the t-test using a linear support vector machine as the classifier in all cases. We do so using the original entire set of features in the datasets, and using a preselected set of 200 features (based on p values) from each dataset. When using the entire set of features, the IFP approach results in comparable accuracy (and higher at some points) with respect to SVM-RFE on 3 of the 4 datasets. The simple t-test feature ranking typically produces classifiers with the highest accuracy across the 4 datasets. When using 200 features chosen by the t-test, the accuracy results show up to 3% performance improvement for both IFP and SVM-RFE across the 4 datasets. We corroborate these results with an AUC analysis and a statistical analysis using the Friedman/Holm test.
Similar to the application of the t-test, we used the methods information gain and ReliefF as filters and compared all three. Results of the AUC analysis show that IFP and SVM-RFE obtain the highest AUC value when applied to the t-test-filtered datasets. This result is additionally corroborated by statistical analysis.
The percentage of overlap between the gene sets selected by any two methods across the four datasets indicates that different sets of genes can and do result in similar accuracies.
We created ensembles of classifiers using the bagging technique with IFP, SVM-RFE and the t-test, and showed that their performance is at least equivalent to that of the non-bagging cases, and better in some cases.
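The perturbation idea behind IFP can be illustrated with a minimal sketch. This is not the dissertation's implementation: a nearest-centroid classifier stands in for the linear SVM, the noise level and the toy data in the usage note are made up, and the names `perturbation_relevance` and `iterative_selection` are hypothetical. A feature counts as relevant when adding noise to it changes the accuracy of the classifier, and the least relevant feature is discarded at each iteration.

```python
import random

def centroids(X, y):
    """Per-class mean vector (stand-in classifier; the thesis uses a linear SVM)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, x):
    # Assign the class whose centroid is closest in squared Euclidean distance.
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

def perturbation_relevance(model, X, y, noise=5.0, seed=0):
    """Absolute accuracy change when each feature alone is perturbed with Gaussian noise."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    scores = []
    for j in range(len(X[0])):
        noisy = [x[:j] + [x[j] + rng.gauss(0.0, noise)] + x[j + 1:] for x in X]
        scores.append(abs(base - accuracy(model, noisy, y)))
    return scores

def iterative_selection(X, y, keep):
    """Repeatedly retrain and drop the feature whose perturbation matters least."""
    active = list(range(len(X[0])))
    while len(active) > keep:
        Xa = [[x[j] for j in active] for x in X]
        scores = perturbation_relevance(centroids(Xa, y), Xa, y)
        active.pop(scores.index(min(scores)))
    return active
```

On toy data where only the first feature separates the classes, for example ten rows `[0.0, 0.0, 0.0]` labelled 0 and ten rows `[1.0, 0.0, 0.0]` labelled 1, perturbing the constant features leaves the accuracy untouched, so `iterative_selection(X, y, keep=1)` retains the first feature.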
|
58 |
Predicting Student Performance Using Machine Learning: A Comparative Study Between Classification Algorithms
Hayder, Alabbas, January 2022
The research question of this thesis was to evaluate and compare two ML algorithms, Support Vector Machine (SVM) and Artificial Neural Network (ANN), in terms of accuracy, precision, recall, F1 score and prediction when trained to classify binary datasets.
The dataset was fetched from Ladok and consisted of anonymized higher-education student credits from a multitude of courses. The algorithms were run on TensorFlow with Keras as the API and were built, trained and evaluated on Google Colab. The source code was written in Python. The non-technical goal of the study was to find a prediction pattern for student performance and to provide a technical framework tool for giving feedback to students and university faculty. The research question was broken down into three sub-questions. The first was whether ML algorithms are an appropriate way to find these student patterns; the answer was yes, because the algorithms were suited to the small dataset size. The second was how to implement SVM and ANN, which was solved using TensorFlow with the Keras API. The third concerned the amount of data needed for the algorithms to draw conclusions and make predictions; the size was judged sufficient because the trained accuracy was higher than the baseline accuracy for both algorithms. On the main research question, the SVM model outperformed the ANN model on all the metrics mentioned. This was attributed to the SVM having a linearly increasing multiparameter that matched the increased inputs, which was not the case for the structure of the ANN.
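The evaluation metrics named in the abstract can be computed from binary labels as in the sketch below. The function names are hypothetical, and the baseline is assumed here to be the majority-class rate, which the abstract does not specify.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def baseline_accuracy(y_true):
    """Accuracy of always predicting the most frequent class (assumed baseline)."""
    positives = sum(y_true)
    return max(positives, len(y_true) - positives) / len(y_true)
```

A trained model is only informative when its accuracy exceeds `baseline_accuracy`, which is the comparison the third sub-question relies on.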
|
59 |
MACHINE LEARNING ALGORITHMS and THEIR APPLICATIONS in CLASSIFYING CYBER-ATTACKS on a SMART GRID NETWORK
Aribisala, Adedayo, Khan, Mohammad S., Husari, Ghaith, 01 January 2021
Smart grid architecture and Software-defined Networking (SDN) have evolved into a centrally controlled infrastructure that captures and extracts data in real time through sensors, smart meters, and virtual machines. These advances pose a risk and increase the vulnerability of these infrastructures to sophisticated cyberattacks such as distributed denial of service (DDoS), false data injection attack (FDIA), and data replay. Integrating machine learning with a network intrusion detection system (NIDS) can improve the system's accuracy and precision when detecting suspicious signatures and network anomalies. Analyzing data in real time using hyperparameters trained and tested on a network traffic dataset applies to most network infrastructures. The NSL-KDD dataset used here holds various classes, attack types, and protocol suites such as TCP, HTTP, and POP, which are critical to packet transmission on a smart grid network. In this paper, we leveraged existing machine learning (ML) algorithms, namely Support Vector Machine (SVM), K-nearest neighbor (KNN), Random Forest (RF), Naïve Bayes (NB), and Bagging, to perform a detailed performance comparison of the selected classifiers. We propose a multi-level hybrid model of SVM integrated with RF for improved accuracy and precision during network filtering. The hybrid SVM-RF model returned an average accuracy of 94% in 10-fold cross-validation and 92.75% in an 80-20% split during class classification.
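The two evaluation protocols mentioned above, 10-fold cross-validation and an 80-20% split, can be sketched as index generators. This covers only the splitting step under assumed defaults (shuffling with a fixed seed); it does not reproduce the paper's SVM-RF model, and the function names are hypothetical.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def holdout_split(n, test_fraction=0.2, seed=0):
    """Single shuffled train/test split, 80-20 by default."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - round(n * test_fraction)
    return idx[:cut], idx[cut:]
```

With cross-validation the reported figure is the mean accuracy over the k test folds, whereas the hold-out split gives a single estimate from one partition, which is one reason the two numbers can differ.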
|
60 |
New support vector machine formulations and algorithms with application to biomedical data analysis
Guan, Wei, 13 June 2011
The Support Vector Machine (SVM) classifier seeks the separating hyperplane $w \cdot x = r$ that maximizes the margin distance $1/\|w\|_2^2$. It can be formalized as an optimization problem that minimizes the hinge loss $\sum_i (1 - y_i f(x_i))_+$ plus the $L_2$-norm of the weight vector. SVM is now a mainstay method of machine learning. The goal of this dissertation work is to solve different biomedical data analysis problems efficiently using extensions of SVM, in which we augment the standard SVM formulation based on the application requirements. The biomedical applications we explore in this thesis include cancer diagnosis, biomarker discovery, and energy function learning for protein structure prediction.
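The objective described above can be written out as a short function. This is a sketch under stated assumptions: the decision function is taken to be f(x) = w·x − r to match the hyperplane w·x = r, the penalty is taken as the squared L2 norm, and the trade-off weight `lam` is a hypothetical parameter that the abstract does not mention.

```python
def svm_objective(w, r, X, y, lam=1.0):
    """Sum of hinge losses max(0, 1 - y_i * f(x_i)) plus lam * ||w||_2^2.

    Labels y_i are expected in {-1, +1}; f(x) = w.x - r (assumed form).
    """
    def f(x):
        return sum(wi * xi for wi, xi in zip(w, x)) - r

    hinge = sum(max(0.0, 1.0 - yi * f(xi)) for xi, yi in zip(X, y))
    penalty = lam * sum(wi * wi for wi in w)
    return hinge + penalty
```

Points classified correctly with a functional margin of at least 1 contribute nothing to the hinge term, so only margin violations and the size of `w` drive the objective.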
Ovarian cancer diagnosis is problematic because the disease is typically asymptomatic, especially at early stages of progression and/or recurrence. We investigate a sample set consisting of 44 women diagnosed with serous papillary ovarian cancer and 50 healthy women or women with benign conditions. We profile the relative metabolite levels in the patient sera using a high-throughput ambient ionization mass spectrometry technique, Direct Analysis in Real Time (DART). We then reduce the diagnostic classification on these metabolic profiles to a functional classification problem and solve it with the functional Support Vector Machine (fSVM) method. The assay distinguished between the cancer and control groups with an unprecedented 99% accuracy (100% sensitivity, 98% specificity) under leave-one-out cross-validation. This approach has significant clinical potential as a cancer diagnostic tool.
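The reported rates are consistent with a leave-one-out confusion matrix in which all 44 cancer samples and 49 of the 50 controls are classified correctly. That breakdown is an inference from the rounded percentages, not a table taken from the thesis, and the function name is hypothetical.

```python
def diagnostic_rates(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Assumed counts: 44 cancer cases all detected, 1 of 50 controls misclassified.
rates = diagnostic_rates(tp=44, fn=0, tn=49, fp=1)
```

Under these counts the accuracy is 93/94, which rounds to the 99% quoted in the abstract.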
High-throughput technologies provide simultaneous evaluation of thousands of potential biomarkers to distinguish different patient groups. To assist biomarker discovery from these low-sample-size, high-dimensional cancer data, we first explore a convex relaxation of the $L_0$-SVM problem and solve it using mixed-integer programming techniques. We further propose a more efficient $L_0$-SVM approximation, fractional norm SVM, by replacing the $L_2$-penalty with an $L_q$-penalty ($q \in (0,1)$) in the optimization formulation. We solve it through the Difference of Convex functions (DC) programming technique. Empirical studies on synthetic data sets as well as real-world biomedical data sets support the effectiveness of our proposed $L_0$-SVM approximation methods over other commonly used sparse SVM methods such as the $L_1$-SVM method.
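A small numerical sketch shows why a fractional penalty can approximate the L0 count of non-zero weights. It assumes the penalty has the form sum of |w_i|^q, and it does not cover the DC programming solver used in the thesis. The function name and the example weights are hypothetical.

```python
def lq_penalty(w, q):
    """Sum of |w_i|**q over the non-zero weights (assumed form of the L_q penalty)."""
    return sum(abs(v) ** q for v in w if v != 0.0)

w = [0.0, 2.0, -0.5, 0.0]      # two non-zero weights
l2 = lq_penalty(w, 2.0)        # squared L2 norm: depends strongly on magnitudes
l1 = lq_penalty(w, 1.0)        # L1 norm
near_l0 = lq_penalty(w, 0.01)  # close to 2, the number of non-zero weights
```

As q decreases toward 0, each non-zero term tends to 1 regardless of its magnitude, so the penalty increasingly counts selected features instead of measuring their size.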
A critical open problem in ab initio protein folding is protein energy function design. We reduce the problem of learning an energy function for ab initio folding to a standard machine learning problem, learning-to-rank. Based on the application requirements, we constrain the reduced ranking problem with non-negative weights and develop two efficient algorithms for non-negativity-constrained SVM optimization. We conduct the empirical study on an energy data set for random conformations of 171 proteins that fall into the ab initio folding class. We compare our approach with the optimization approach used in the protein structure prediction tool TASSER. Numerical results indicate that our approach was able to learn energy functions with improved rank statistics (evaluated by pairwise agreement) as well as improved correlation between the total energy and structural dissimilarity.
|