  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Improving Multi-class Text Classification with Naive Bayes

Rennie, Jason D. M. 01 September 2001
There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy of such systems, and our understanding of them, greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, to improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways to extend and improve it. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem that explains the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.
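For readers unfamiliar with the technique this abstract studies, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is not the thesis's implementation: the function names `train_nb` and `predict_nb`, the smoothing choice, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Collect per-class document counts and word counts.

    docs is a list of (label, tokens) pairs; this layout is hypothetical.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior estimated from class frequencies.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (class, word) pairs non-zero.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this example.
docs = [
    ("sports", "goal match team win".split()),
    ("sports", "team score goal".split()),
    ("politics", "vote election party".split()),
    ("politics", "party win election vote".split()),
]
model = train_nb(docs)
prediction = predict_nb(model, "team goal".split())
```

The independence assumption shows up in the inner loop: each word contributes its own log-probability term, with no interaction between words.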
2

Likelihood-based classification of single trees in hemi-boreal forests

Vallin, Simon January 2015
Determining the species of individual trees is important for forest management. In this thesis we investigate whether it is possible to discriminate between Norway spruce, Scots pine and deciduous trees from airborne laser scanning data by using a unique probability density function estimated for each species. We estimate the probability density functions in three different ways: by fitting a beta distribution, by histogram density estimation and by kernel density estimation. All these methods classify single laser returns (and not segments of laser returns). The resulting classification is compared with a reference method based on features extracted from airborne laser scanning data. We measure how well a method performs by the overall accuracy, that is, the proportion of correctly predicted trees. The highest overall accuracy among the methods developed in this thesis, 83.4 percent, is obtained with histogram density estimation. This can be compared with the best result from the reference method, an overall accuracy of 84.1 percent. The fact that we achieve a high level of correctly classified trees indicates that it is possible to use these types of methods for identification of tree species.
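The idea of classifying by class-specific density estimates can be sketched as follows. This is an illustrative reading of the abstract, not the thesis's method: the feature (a value scaled to [0, 1]), the bin count, the additive smoothing, the training values, and the choice to sum log-likelihoods over a tree's returns are all assumptions.

```python
import math


def fit_histogram(values, bins=5, lo=0.0, hi=1.0, alpha=1.0):
    """Histogram density estimate on [lo, hi]; alpha smooths empty bins."""
    counts = [alpha] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    # Dividing by (total * width) makes the densities integrate to 1.
    return [c / (total * width) for c in counts], lo, width


def log_likelihood(hist, values):
    """Sum of log densities of the values under one class histogram."""
    density, lo, width = hist
    last = len(density) - 1
    return sum(
        math.log(density[min(int((v - lo) / width), last)]) for v in values
    )


def classify(hists, values):
    """Pick the class whose density gives the observations the highest likelihood."""
    return max(hists, key=lambda label: log_likelihood(hists[label], values))


# Hypothetical per-return feature values in [0, 1] for two species.
training = {
    "spruce": [0.1, 0.15, 0.2, 0.25, 0.3, 0.35],
    "pine": [0.6, 0.7, 0.75, 0.8, 0.85, 0.9],
}
hists = {label: fit_histogram(vals) for label, vals in training.items()}
label = classify(hists, [0.7, 0.8, 0.65])
```

Swapping `fit_histogram` for a fitted beta distribution or a kernel density estimate would leave `classify` unchanged, which is why the three estimators named in the abstract can be compared on equal terms.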
3

Modelamiento y Estudio de la Red de Interacciones Proteicas del Complejo NRC/MASC / Modelling and Study of the Protein Interaction Network of the NRC/MASC Complex

Campos Valenzuela, Jaime Alberto January 2010
This thesis investigates the synaptic system and raises new hypotheses about the relationship between the organization of the postsynaptic density and the triggering of cognitive disorders such as schizophrenia, Alzheimer's disease, and mental retardation. The motivation is to begin developing new therapies that target the mechanism of these diseases and not only their consequences. In particular, this work explores new methodologies for inferring protein-protein interactions and applies those putative relationships to the study of the glutamate receptor structure NRC/MASC (NMDA receptor complex/MAGUK associated signalling complex), since over the last decade the fundamental role of the neurotransmitter glutamate in cognitive processes, and therefore the importance of its reception, has been established. To meet these objectives, a new protocol was proposed that combines two novel methodologies. First, the Naïve Bayes classifier was applied to infer human protein-protein interactions, yielding a broader interaction network with a confidence parameter for each of its elements. Second, using this inferred network together with other networks available in the literature, a systemic study was carried out of the NRC/MASC unit and of how it is affected in subjects with cognitive disorders. For this, a clustering algorithm was used that allowed the functional modules of the complex to be defined. The first result was a human protein-protein interaction network comprising a number of proteins comparable to those previously reported. The information available in these networks was integrated into a single model.
The nodes belonging to the NRC/MASC receptor complex were selected and grouped into 12 highly specialized modules by the clustering algorithm. Analysis of the characteristics of each module identified a new organization not reported in the literature: a large receptor module forms the input layer for the glutamate signal, followed by a modulation layer, and finally a layer of effector modules. In addition, a hybrid layer was designated, with clusters that have a dual function as both modulators and effectors. These results make it possible to define a new functional model of the receptor, featuring a large number of signalling pathways and increased complexity of the intermodular relationships. Furthermore, the clusters found to be highly correlated with cognitive disorders were the receptor module and the modulator cluster composed of 3 G proteins. Finally, this thesis has proposed a functional model for the NRC/MASC receptor unit whose composition and organizational characteristics differ from those previously reported. These characteristics make this model a novel tool for studying the complex mechanisms behind diseases such as schizophrenia and mental retardation.
4

Analysis of Immunosignaturing Case Studies

January 2012
Immunosignaturing is a technology that allows the humoral immune response to be observed through the binding of antibodies to random-sequence peptides. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random-sequence peptides in a multiplexed fashion. There are computational and statistical challenges to the analysis of immunosignaturing data. The overall aim of my dissertation is to develop novel computational and statistical methods for immunosignaturing data to assess its potential for diagnostics and drug discovery. Firstly, I found that the Naive Bayes classification algorithm, which leverages the biological independence of the probes on our array in such a way as to gather more information, outperforms other classification algorithms in speed and accuracy. Secondly, using this classifier, I tested the specificity and sensitivity of the immunosignaturing platform for its ability to resolve four different diseases (pancreatic cancer, pancreatitis, type 2 diabetes and panIN) that target the same organ (pancreas). These diseases were separated with >90% specificity from controls and from each other. Thirdly, I observed that the immunosignatures of type 2 diabetes and cardiovascular complications are unique, consistent, and reproducible and can be separated with 100% accuracy from controls. But when these two complications arise in the same person, the resultant immunosignature is quite different from that of individuals with only one disease. I developed a method to trace back from informative random peptides in disease signatures to the potential antigen(s). Hence, I built a decipher system to trace random peptides in the type 1 diabetes immunosignature to known antigens. Immunosignaturing, unlike ELISA, can detect not only the presence of a response but also the absence of a response during a disease.
I observed that not only higher but also lower peptide intensities can be mapped to antigens in type 1 diabetes. To study the potential of immunosignaturing for population diagnostics, I studied the effect of age, gender and geographical location on immunosignaturing data. For its potential as a health monitoring technology, I proposed a single metric, the coefficient of variation, that has shown potential to change significantly when a person enters a disease state. / Dissertation/Thesis / Ph.D. Biological Design 2012
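The coefficient of variation mentioned above is the standard deviation divided by the mean. The sketch below shows the calculation on made-up peptide intensities; the use of the sample standard deviation, the function name, and the numbers are assumptions, since the abstract does not say how the metric was computed.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return statistics.stdev(values) / statistics.mean(values)


# Invented peptide intensities for two hypothetical samples.
stable_sample = [100.0, 102.0, 98.0, 101.0, 99.0]
variable_sample = [100.0, 160.0, 40.0, 130.0, 70.0]

cv_stable = coefficient_of_variation(stable_sample)
cv_variable = coefficient_of_variation(variable_sample)
```

Because the metric is scale-free, it can be compared across arrays with different overall brightness, which is one plausible reason to prefer it as a single monitoring number.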
5

Optimización del clasificador “naive bayes” usando árbol de decisión C4.5 / Optimizing the Naive Bayes classifier using the C4.5 decision tree

Alarcón Jaimes, Carlos January 2015
The Naive Bayes classifier is one of the most effective classification models, owing to its simplicity, resistance to noise, short processing time, and high predictive power. The Naive Bayes classifier makes a strong assumption of independence among the predictor variables given the class, which generally does not hold. Much research seeks to improve the classifier's predictive power by relaxing this independence assumption, for example by choosing a subset of variables that are independent or approximately independent. This work presents a method that seeks to optimize the Naive Bayes classifier using the C4.5 decision tree. The method selects a subset of variables from the data set using the induced C4.5 decision tree and then applies the Naive Bayes classifier to the selected variables. Using the C4.5 decision tree first removes redundant and/or irrelevant variables from the data set and retains those that are most informative for classification tasks, thereby improving the classifier's predictive power. The method is illustrated using three data sets from the UCI Machine Learning Repository of the University of California, Irvine, and one data set from the National Household Survey of Peru's National Institute of Statistics and Informatics (Encuesta Nacional de Hogares, ENAHO-INEI), and is implemented with the WEKA program.
6

Using machine learning to classify news articles

Lagerkrants, Eleonor, Holmström, Jesper January 2016
In today’s society a large portion of the world’s population gets its news on electronic devices. This opens up the possibility of enhancing the reading experience by personalizing news for readers based on their previous preferences. We conducted an experiment to find out how accurately a Naïve Bayes classifier can select articles that a user might find interesting. Our experiments were done on two users who read and classified 200 articles as interesting or not interesting. Those articles were divided into four datasets of sizes 50, 100, 150 and 200. We used a Naïve Bayes classifier with 16 different settings configurations to classify the articles into two categories. From these experiments we found several settings configurations that showed good results. One settings configuration was chosen as a good general setting for this kind of problem. We found that for datasets larger than 50 there was no significant increase in classification confidence.
7

The prediction of HLA genotypes from next generation sequencing and genome scan data

Farrell, John J. 22 January 2016
Genome-wide association studies have very successfully found highly significant disease associations with single nucleotide polymorphisms (SNP) in the Major Histocompatibility Complex for adverse drug reactions, autoimmune diseases and infectious diseases. However, the extensive linkage disequilibrium in the region has made it difficult to unravel the HLA alleles underlying these diseases. Here I present two methods to comprehensively predict 4-digit HLA types from the two types of experimental genome data widely available. The Virtual SNP Imputation approach was developed for genome scan data and demonstrated a high precision and recall (96% and 97% respectively) for the prediction of HLA genotypes. A reanalysis of 6 genome-wide association studies using the HLA imputation method identified 18 significant HLA allele associations for 6 autoimmune diseases: 2 in ankylosing spondylitis, 2 in autoimmune thyroid disease, 2 in Crohn's disease, 3 in multiple sclerosis, 2 in psoriasis and 7 in rheumatoid arthritis. The EPIGEN consortium also used the Virtual SNP Imputation approach to detect a novel association of HLA-A*31:01 with adverse reactions to carbamazepine. For the prediction of HLA genotypes from next generation sequencing data, I developed a novel approach using a naïve Bayes algorithm called HLA-Genotyper. The validation results covered whole genome, whole exome and RNA-Seq experimental designs in the European and Yoruba population samples available from the 1000 Genomes Project. The RNA-Seq data gave the best results with an overall precision and recall near 0.99 for Europeans and 0.98 for the Yoruba population. I then successfully used the method on targeted sequencing data to detect significant associations of idiopathic membranous nephropathy with HLA-DRB1*03:01 and HLA-DQA1*05:01 using the 1000 Genomes European subjects as controls. 
Using the results reported here, researchers may now readily unravel the association of HLA alleles with many diseases from genome scans and next generation sequencing experiments without the expensive and laborious HLA typing of thousands of subjects. Both algorithms enable the analysis of diverse populations to help researchers pinpoint HLA loci with biological roles in infection, inflammation, autoimmunity, aging, mental illness and adverse drug reactions.
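Precision and recall figures like those quoted above can be computed from predicted and reference allele calls. The sketch below is one plausible way to do so and is not taken from the thesis: the per-allele counting scheme, the function name `precision_recall`, and the sample data are assumptions (HLA-A*31:01 appears in the abstract; the other allele names are used only as illustrative labels).

```python
from collections import Counter


def precision_recall(truth, calls):
    """Per-allele precision and recall.

    truth and calls map a sample id to a list of alleles; a sample may
    have fewer calls than true alleles when the predictor abstains.
    """
    correct = made = expected = 0
    for sample, true_alleles in truth.items():
        called = calls.get(sample, [])
        # Multiset intersection counts each matching allele once.
        overlap = Counter(true_alleles) & Counter(called)
        correct += sum(overlap.values())
        made += len(called)
        expected += len(true_alleles)
    precision = correct / made if made else 0.0
    recall = correct / expected if expected else 0.0
    return precision, recall


# Invented reference genotypes and predictions for three samples.
truth = {
    "s1": ["A*01:01", "A*02:01"],
    "s2": ["A*31:01", "A*31:01"],
    "s3": ["A*03:01", "A*24:02"],
}
calls = {
    "s1": ["A*01:01", "A*02:01"],
    "s2": ["A*31:01", "A*02:01"],  # one wrong call
    "s3": ["A*03:01"],             # one allele left uncalled
}
precision, recall = precision_recall(truth, calls)
```

In this toy data a wrong call lowers both numbers, while an abstention lowers only recall, which is why the two are reported separately.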
8

COMBATING DISINFORMATION : Detecting fake news with linguistic models and classification algorithms / BEKÄMPNING AV DISINFORMATION : Upptäcka falska nyheter med språkliga modeller och klassificeringsalgoritmer

Svärd, Mikael, Rumman, Philip January 2017
The purpose of this study is to examine the possibility of accurately distinguishing fabricated news from authentic news stories using Naive Bayes classification algorithms. This involves a comparative study of two different machine learning classification algorithms. The work also contains an overview of how linguistic text analytics can be used for detection, and an attempt was made to extract interesting information using word frequencies. It also discusses how different actors and parties in business and government are affected by, and how they handle, deception caused by fake news articles. The study further tries to ascertain what collective steps could be taken towards a functioning solution to combat fake news. The results were inconclusive, and the simple Naive Bayes algorithms used did not yield fully satisfactory results. Word frequencies alone did not give enough information for detection. They were, however, found to be potentially useful as part of a larger set of algorithms and strategies for handling misinformation.
9

Product categorisation using machine learning / Produktkategorisering med hjälp av maskininlärning

Stefan, Vasic, Nicklas, Lindgren January 2017
Machine learning is a method in data science for analysing large data sets and extracting hidden patterns and common characteristics in the data. Corporations often have access to databases containing great amounts of data that could contain valuable information. Navetti AB wants to investigate the possibility of automating its product categorisation by evaluating different types of machine learning algorithms. This could increase both time and cost efficiency. This work resulted in three different prototypes, each using a different machine learning algorithm with the ability to categorise products automatically. The prototypes were tested and evaluated based on their ability to categorise products and their performance in terms of speed. Different techniques for preprocessing data are also evaluated and tested. An analysis of the tests shows that, given a suitable algorithm and enough data, it is possible to automate the manual categorisation.
10

Reconhecimento automático de defeitos de fabricação em painéis TFT-LCD através de inspeção de imagem / Automatic recognition of manufacturing defects in TFT-LCD panels through image inspection

SILVA, Antonio Carlos de Castro da 15 January 2016
The early detection of defects in the parts used in manufacturing assembly lines is crucial for assuring the good quality of the final product. This work therefore presents a platform developed for automatically detecting manufacturing defects in TFT-LCD (Thin Film Transistor-Liquid Crystal Display) panels by image inspection. The platform is based on cameras. The panel under inspection is positioned in a closed chamber to avoid interference from ambient light. The inspection steps comprise image acquisition by the cameras, definition of the region of interest (frame detection), feature extraction, image analysis, classification of defects, and the decision to approve or reject the panel. Features are extracted from both the RGB images and grayscale images. For each RGB component, the pixel intensity is analyzed and the variance is calculated; a panel is rejected if it shows a variation of 5% relative to the reference values. The classification is performed using the Naive Bayes algorithm. The results show an accuracy of 94.23% in defect detection. Incorporation of the platform described here into Samsung's mass production line in Manaus is under study.
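The per-channel variance check described in the abstract might look roughly like the sketch below. The abstract does not say exactly which quantity is compared against the reference, so treating the 5% rule as a relative deviation of each channel's variance is an assumption, as are the function names and the pixel values.

```python
import statistics


def channel_variances(pixels):
    """Population variance of each of the R, G and B channels.

    pixels is a list of (r, g, b) tuples; this layout is hypothetical.
    """
    return tuple(statistics.pvariance(channel) for channel in zip(*pixels))


def is_rejected(pixels, reference, tolerance=0.05):
    """Reject when any channel variance deviates from its reference
    by more than the given relative tolerance (assumed reading of the 5% rule)."""
    return any(
        abs(value - ref) > tolerance * ref
        for value, ref in zip(channel_variances(pixels), reference)
    )


# Invented pixel data: a good panel and one with a bright red defect.
good_panel = [(100, 100, 100), (102, 101, 99), (98, 99, 101), (100, 100, 100)]
defective_panel = [(100, 100, 100), (102, 101, 99), (98, 99, 101), (140, 100, 100)]

reference = channel_variances(good_panel)
good_rejected = is_rejected(good_panel, reference)
defect_rejected = is_rejected(defective_panel, reference)
```

In the platform described, a check of this kind would precede the Naive Bayes step, which classifies the type of defect.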
