  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Método de mineração de dados para diagnóstico de câncer de mama baseado na seleção de variáveis / A data mining method for breast cancer diagnosis based on selected features

Holsbach, Nicole January 2012
This dissertation proposes data mining methods for breast cancer (BC) diagnosis based on feature selection. Starting from a systematic literature review, it suggests a method for selecting features to classify observations (patients) into two outcome classes, benign or malignant, based on the cytopathological analysis of patients' breast cell samples. The method relies on four operational steps: (i) split the original dataset into training and testing sets and apply PCA (Principal Component Analysis) to the training set; (ii) generate feature importance indices based on the PCA weights and the percentage of variance explained by the retained components; (iii) classify the training set using KNN (k-Nearest Neighbors) or DA (Discriminant Analysis) and compute the classification accuracy, then eliminate the feature with the lowest importance index, classify the data again, and recompute the accuracy, repeating this process until only one feature is left; and (iv) choose the subset of features yielding the maximum classification accuracy and classify the testing set using those features. When applied to the WBCD (Wisconsin Breast Cancer Database), the proposed method achieved an average accuracy of 97.77% while retaining an average of 5.8 features. A variation of the method uses four different types of polynomial kernels to remap the original dataset; steps (i) to (iv) are then applied to the remapped data. When applied to the WBCD, this variation achieved an average accuracy of 98.09% while retaining an average of 17.24 of the 54 variables generated by the recommended polynomial kernel.
The proposed method can assist the physician in making the diagnosis by selecting a smaller number of variables (those involved in decision-making) that yields the highest classification accuracy.
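Steps (iii) and (iv) of the abstract above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the importance indices are assumed to be already computed (for instance from PCA weights and explained variance, as in step (ii)), accuracy on the training set is estimated by leave-one-out k-NN, and ties are resolved in favor of the smaller subset. All function names are hypothetical.

```python
from collections import Counter
from math import dist


def loo_knn_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of k-NN using only the given feature indices (assumption)."""
    correct = 0
    for i, row in enumerate(X):
        point = [row[j] for j in features]
        others = [m for m in range(len(X)) if m != i]
        # Rank all other observations by distance in the reduced feature space.
        others.sort(key=lambda m: dist(point, [X[m][j] for j in features]))
        vote = Counter(y[m] for m in others[:k]).most_common(1)[0][0]
        correct += vote == y[i]
    return correct / len(X)


def backward_elimination(X, y, importance, k=3):
    """Drop the least important feature one at a time and keep the most accurate subset."""
    # Features ordered from most to least important (importance is a precomputed input).
    remaining = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    best_subset, best_acc = list(remaining), loo_knn_accuracy(X, y, remaining, k)
    while len(remaining) > 1:
        remaining = remaining[:-1]  # remove the feature with the lowest importance index
        acc = loo_knn_accuracy(X, y, remaining, k)
        if acc >= best_acc:  # on ties, prefer the smaller subset (assumption)
            best_subset, best_acc = list(remaining), acc
    return sorted(best_subset), best_acc
```

The selected subset would then be used to classify the held-out testing portion, which this sketch leaves out.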
112

Metric space indexing for nearest neighbor search in multimedia context / Indexação de espaços métricos para busca de vizinho mais próximo em contexto multimídia

Silva, Eliezer de Souza da, 1988- 26 August 2018
Advisor: Eduardo Alves do Valle Junior / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Previous issue date: 2014 / Abstract: The increasing availability of multimedia content poses a challenge for information retrieval research. Users want not only to have access to multimedia documents but also to make sense of them, so the ability to find specific content in extremely large collections of textual and non-textual documents is paramount. At such large scales, multimedia information retrieval systems must rely on the ability to perform similarity search efficiently. However, multimedia documents are often represented by high-dimensional feature vectors, or by other complex representations in metric spaces. Providing efficient similarity search for that kind of data is extremely challenging. In this project, we explore one of the most cited families of solutions for similarity search, Locality-Sensitive Hashing (LSH), which is based on the creation of hash functions that assign, with higher probability, the same key to data that are similar. LSH is available only for a handful of distance functions, but, where available, it has been found to be extremely efficient for architectures with uniform access cost to the data. Most existing LSH functions are restricted to vector spaces.
We propose two novel LSH methods (VoronoiLSH and VoronoiPlex LSH) for generic metric spaces based on metric hyperplane partitioning (random centroids and k-medoids). We present a comparison with well-established LSH methods in vector spaces and with recent competing methods for metric spaces. We develop a theoretical probabilistic model of the behavior of the proposed algorithms and show some relations and bounds for the probability of hash collision. Among the algorithms proposed for generalizing LSH to metric spaces, this theoretical development is new. Although the problem is very challenging, our results demonstrate that it can be successfully tackled. This dissertation presents the development of the method, its theoretical formulation, and an experimental discussion of the proposed methods' performance / Master's / Computer Engineering / Master in Electrical Engineering
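The idea of hashing by metric partitioning with random centroids can be illustrated with a small sketch. It is only an assumption-laden reading of the abstract, not the dissertation's VoronoiLSH code: each table draws random centroids from the data, an object's key is the index of its closest centroid, and a query is compared only against objects sharing its key. The class and function names are hypothetical, and the Hamming distance stands in for any metric.

```python
import random
from collections import defaultdict


def hamming(s, t):
    """Hamming distance between equal-length strings (a metric that needs no vector space)."""
    return sum(a != b for a, b in zip(s, t))


class VoronoiHashTable:
    """One hash table: objects are keyed by the index of their closest centroid."""

    def __init__(self, data, distance, num_centroids, rng):
        self.distance = distance
        # Centroids are sampled from the data itself, so only a distance function is needed.
        self.centroids = rng.sample(data, num_centroids)
        self.buckets = defaultdict(list)
        for obj in data:
            self.buckets[self.key(obj)].append(obj)

    def key(self, obj):
        return min(range(len(self.centroids)),
                   key=lambda i: self.distance(obj, self.centroids[i]))

    def candidates(self, query):
        return self.buckets.get(self.key(query), [])


def lsh_nearest(query, tables, distance):
    """Approximate nearest neighbor: scan only the buckets the query hashes to."""
    pool = [obj for table in tables for obj in table.candidates(query)]
    return min(pool, key=lambda obj: distance(query, obj)) if pool else None
```

Using several independent tables raises the chance that a true neighbor shares at least one bucket with the query, at the cost of more candidates to scan.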
113

Exploração de dados multivariados de fontes e extratos de antocianinas ultilizando análise de componentes princiaipais e método do vizinho mais proximo / Exploring multivariate data of sources and extracts of anthocyanins using principal components analysis and method of nearest neighbor

Favaro, Martha Maria Andreotti, 1981- 20 August 2018
Advisor: Adriana Vitorino Rossi / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Química / Previous issue date: 2012 / Abstract: Anthocyanins (ACYS) are natural dyes responsible for the color of fruits, vegetables, flowers and grains. New prospects for the use of anthocyanins in various industries stimulate analytical studies to systematize the identification and classification of sources and extracts of these dyes. The sources of ACYS used in this work were typical Brazilian fruits: mulberry (Morus nigra), blackberry (Rubus sp.), jaboticaba (Myrciaria cauliflora), jambolan (Syzygium cumini), jussara fruit (Euterpe edulis Mart.), strawberry (Fragaria x ananassa Duch) and grapes (Vitis vinifera and Vitis vinifera L. 'Brazil'); vegetables: red lettuce (Lactuca sativa), eggplant (Solanum melongena), purple onion (Allium cepa), radish (Raphanus sativus) and red cabbage (Brassica oleracea); and flowers: Busy Lizzie (Impatiens walleriana), geranium (Pelargonium hortorum and Pelargonium peltatum L.), hibiscus (Hibiscus sinensis and Hibiscus syriacus) and hydrangea (Hydrangea macrophylla). The literature describes several techniques for analyzing ACYS in plants and their extracts, with emphasis on high performance liquid chromatography (HPLC), mass spectrometry (MS) and spectrophotometry (UV-Vis). All of these techniques were applied in this work, along with reflectance spectrophotometry and micellar electrokinetic chromatography (MEKC), a capillary electromigration technique.
The chemometric tools used in data handling were principal component analysis (PCA) and the k-nearest neighbor method (KNN). The chemometric classification models obtained were robust, with prediction errors of less than 30%, making it possible to identify the sources of ACYS, the extraction solvent, the age of the extracts, and information about their stability and storage conditions. The results show that data obtained from simple analytical techniques, such as absorption spectrophotometry, and from techniques requiring no sample preparation, such as diffuse reflectance in the visible region, are comparable to results from more sophisticated and expensive techniques such as HPLC and MEKC, and even surpass the potential of some information obtained by MS / Doctorate / Analytical Chemistry / Doctor of Sciences
114

Aplikace heuristických metod na rozvozní úlohu s časovými okny / Application of Heuristic Methods for Vehicle Routing Problem with Time Windows

Chytrá, Alena January 2008
This thesis demonstrates the practical use of the vehicle routing problem with time windows (VRPTW) and its solution by a heuristic method. It describes the theoretical principles of integer models, the mathematical definitions of the VRP with one or more vehicles and of the VRPTW, and some heuristics for the VRP. The practical part solves a VRP using the nearest neighbor heuristic. Product distribution is planned according to the firm's settings in Prague. The conclusion compares the existing situation with the computed solution, which shows the benefits of using the described methods.
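A nearest neighbor heuristic for routing with time windows can be sketched as below. This is a generic textbook-style illustration under stated assumptions, not the thesis's model: travel time equals Euclidean distance, vehicles have no capacity limit, the return trip to the depot is ignored, and a new vehicle is started whenever no remaining customer can be reached in time. The function name and data layout are hypothetical.

```python
from math import dist


def nearest_neighbor_vrptw(depot, customers):
    """customers: dict name -> (x, y, earliest, latest, service_time). Returns a list of routes."""
    unvisited = set(customers)
    routes = []
    while unvisited:
        route, position, clock = [], depot, 0.0
        while True:
            # Customers the vehicle can still reach before their time window closes.
            feasible = [c for c in unvisited
                        if clock + dist(position, customers[c][:2]) <= customers[c][3]]
            if not feasible:
                break
            # Closest feasible customer; the name breaks ties deterministically.
            nxt = min(feasible, key=lambda c: (dist(position, customers[c][:2]), c))
            x, y, earliest, _, service = customers[nxt]
            arrival = clock + dist(position, (x, y))
            clock = max(arrival, earliest) + service  # wait if early, then serve
            position = (x, y)
            route.append(nxt)
            unvisited.remove(nxt)
        if not route:
            raise ValueError("some customers cannot be reached within their time windows")
        routes.append(route)
    return routes
```

The greedy choice is fast but myopic, which is why such a solution is usually compared against the existing plan or refined by an improvement heuristic.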
115

Klasifikační metody analýzy vrstvy nervových vláken na sítnici / A Classification Methods for Retinal Nerve Fibre Layer Analysis

Zapletal, Petr January 2010
This thesis deals with classification of the retinal nerve fibre layer. Texture features from six texture analysis methods are used for classification. All methods calculate a feature vector from the input images. This feature vector is characteristic of each cluster (class). Classification is performed by three supervised learning algorithms and one unsupervised learning algorithm. The first tested algorithm is Ho-Kashyap. The next is the Bayes classifier NDDF (Normal Density Discriminant Function). The third is the nearest neighbor algorithm k-NN, and the last tested classifier is the K-means algorithm, which belongs to clustering. For better compactness of this thesis, three methods for selecting training patterns in the supervised learning algorithms are implemented. The methods are based on the Repeated Random Subsampling Cross Validation, K-Fold Cross Validation and Leave-One-Out Cross Validation algorithms. All algorithms are quantitatively compared in terms of classification error.
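The three validation schemes named in the abstract differ only in how sample indices are divided into training and test sets. The sketch below shows one plausible way to generate those splits; the function names, the contiguous fold layout, and the seeded random generator are assumptions for illustration, not details taken from the thesis.

```python
import random


def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, nearly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def leave_one_out_splits(n_samples):
    """Leave-one-out is k-fold with exactly one sample per fold."""
    return k_fold_splits(n_samples, n_samples)


def random_subsampling_splits(n_samples, test_size, repeats, seed=0):
    """Repeated random subsampling: draw an independent random test set each round."""
    rng = random.Random(seed)
    for _ in range(repeats):
        test = sorted(rng.sample(range(n_samples), test_size))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test
```

In k-fold and leave-one-out every sample is tested exactly once, whereas repeated random subsampling may test some samples several times and others never.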
116

Sinkhole Hazard Assessment in Minnesota Using a Decision Tree Model

Gao, Yongli, Alexander, E. Calvin 01 May 2008
An understanding of what influences sinkhole formation and the ability to accurately predict sinkhole hazards are critical to environmental management efforts in the karst lands of southeastern Minnesota. Based on the distribution of distances to the nearest sinkhole, sinkhole density, bedrock geology and depth to bedrock in southeastern Minnesota and northwestern Iowa, a decision tree model has been developed to construct maps of sinkhole probability in Minnesota. The decision tree model was converted to cartographic models and implemented in ArcGIS to create a preliminary sinkhole probability map for Goodhue, Wabasha, Olmsted, Fillmore, and Mower Counties. This model quantifies bedrock geology, depth to bedrock, sinkhole density, and neighborhood effects in southeastern Minnesota but excludes potential controlling factors such as structural control, topographic settings, human activities and land use. The sinkhole probability map needs to be verified and updated as more sinkholes are mapped and more information about sinkhole formation is obtained.
117

Découverte d'évènements par contenu visuel dans les médias sociaux / Visual-based event mining in social media

Trad, Riadh 05 June 2013
The evolution of the web from what was typically known as a one-way means of communication to a conversational mode has radically changed the way we handle information. Social media sites such as Flickr and Facebook offer spaces for exchanging and disseminating information. This information is increasingly rich, but also personal, and is most often organized around real-life events. An event can thus be seen as a set of personal and local views captured by different users. Identifying these different instances would therefore make it possible to reconstruct a global view of the event. In particular, linking different instances of the same event would benefit many applications, such as search, browsing, and content filtering and suggestion. The main objective of this thesis is to identify the multimedia content associated with an event in large image collections. The first contribution is an event retrieval method based on visual content. The second contribution is a scalable and distributed approach for constructing K-nearest-neighbor graphs. The third contribution is a collaborative method for selecting relevant content. In particular, we address the problems of automatic generation of event summaries and content suggestion in social media. / The ease of publishing content on social media sites brings to the Web an ever-increasing amount of user-generated content captured during, and associated with, real-life events. Social media documents shared by users often reflect their personal experience of the event. Hence, an event can be seen as a set of personal and local views, recorded by different users. These event records are likely to exhibit similar facets of the event but also specific aspects.
By linking different records of the same event occurrence we can enable rich search and browsing of social media event content. Specifically, linking all the occurrences of the same event would provide a general overview of the event. In this dissertation we present a content-based approach for leveraging the wealth of social media documents available on the Web for event identification and characterization. To match event occurrences in social media, we develop a new visual-based method for retrieving events in huge photo collections, typically in the context of user-generated content. The main contributions of the thesis are the following: (1) a new visual-based method for retrieving events in photo collections, (2) a scalable and distributed framework for nearest neighbors graph construction for high-dimensional data, (3) a collaborative content-based filtering technique for selecting relevant social media documents for a given event.
118

Data mining inom tillverkningsindustrin : En fallstudie om möjligheten att förutspå kvalitetsutfall i produktionslinjer

Janson, Lisa, Mathisson, Minna January 2021
As the transition towards Industry 4.0 proceeds, the possibility of using machine learning as a tool for analyzing industrial data and further developing industrial production grows. In this work, a case study was conducted at Volvo Group in Köping to investigate the possibility of predicting quality outcomes when the hub and mainshaft are pressed together. Three different machine learning models were implemented and compared with one another. A dataset containing assembly data from Volvo's production site in Köping was used to train and evaluate the models. However, the low evaluation scores obtained indicate that the quality outcome could not be predicted from the variables included in that dataset alone. To determine whether the poor evaluation metrics resulted from the absence of a pattern between the included variables and the quality outcome, or from the models being unable to find the pattern, a fabricated dataset was also used. It contained three additional variables whose values were fabricated so that a known, synthetic causality existed between two of the variables and the quality outcome. When trained and evaluated on the fabricated dataset, all models were able to find the pattern that was known to exist. From this it was concluded that the poor result was not due to the models' performance but to the absence of any relationship in the dataset of real assembly data. Support vector machine was the model that performed best, given the evaluation metrics chosen in this study.
Consequently, if the traceability of the components were to be enhanced in the future and more machines in the production line were to transmit production data to a connected system, the study could be conducted again with additional variables and a larger dataset. The fact that the models included in this study succeeded in finding patterns in the dataset when such patterns were known to exist motivates the use of the same models in future studies. Furthermore, it can be concluded that, with enhanced traceability of the components and an increasingly connected factory, machine learning models could be used as components in larger business monitoring systems in order to achieve efficiency gains.
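The reasoning behind the fabricated dataset can be illustrated with a small control experiment. The sketch below is not the study's setup: it uses a plain 1-nearest-neighbor classifier as a stand-in for the study's models, three uniformly random variables, and a planted rule in which two of them determine the outcome. All names, sizes, and the rule itself are assumptions.

```python
import random
from math import dist


def one_nn_accuracy(train, test):
    """train/test: lists of (features, label). Accuracy of a 1-nearest-neighbor classifier."""
    hits = 0
    for features, label in test:
        _, predicted = min(train, key=lambda item: dist(features, item[0]))
        hits += predicted == label
    return hits / len(test)


def make_dataset(n, rng, planted):
    """Three random variables; the label either depends on two of them or on nothing."""
    rows = []
    for _ in range(n):
        x = [rng.random(), rng.random(), rng.random()]
        if planted:
            label = int(x[0] + x[1] > 1.0)  # synthetic causality from two variables
        else:
            label = rng.randrange(2)  # outcome unrelated to every variable
        rows.append((x, label))
    return rows


def sanity_check(seed=42):
    """Return (accuracy with a planted relationship, accuracy without any relationship)."""
    rng = random.Random(seed)
    planted = one_nn_accuracy(make_dataset(400, rng, True), make_dataset(200, rng, True))
    noise = one_nn_accuracy(make_dataset(400, rng, False), make_dataset(200, rng, False))
    return planted, noise
```

If a model scores well on the planted data but near chance on the real data, the likelier explanation is that the real variables carry no usable signal, which mirrors the conclusion drawn in the abstract.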
119

Statistics of Quantum Energy Levels of Integrable Systems and a Stochastic Network Model with Applications to Natural and Social Sciences

Ma, Tao 18 October 2013
No description available.
120

Investigating the performance of matrix factorization techniques applied on purchase data for recommendation purposes

Holländer, John January 2015
Automated systems for producing product recommendations to users are a relatively new area within the field of machine learning. Matrix factorization techniques have been studied to a large extent on data consisting of explicit feedback such as ratings, but to a lesser extent on implicit feedback data consisting of, for example, purchases. The aim of this study is to investigate how well matrix factorization techniques perform compared to other techniques when used for producing recommendations based on purchase data. We conducted experiments on data from an online bookstore as well as an online fashion store, by running algorithms processing the data and using evaluation metrics to compare the results. We present results proving that for many types of implicit feedback data, matrix factorization techniques are inferior to various neighborhood and association rule techniques for producing product recommendations. We also present a variant of a user-based neighborhood recommender system algorithm (UserNN), which in all tests we ran outperformed both the matrix factorization algorithms and the k-nearest neighbors algorithm in both accuracy and speed. Depending on the dataset used, UserNN achieved a precision approximately 2-22 percentage points higher than those of the matrix factorization algorithms, and 2 percentage points higher than that of the k-nearest neighbors algorithm. UserNN also outperformed the other algorithms in speed, with time consumption 3.5 to 5 times lower than that of the k-nearest neighbors algorithm, and several orders of magnitude lower than those of the matrix factorization algorithms.
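A user-based neighborhood recommender on implicit purchase data can be sketched as follows. This is a generic illustration, not the UserNN variant from the thesis, whose details the abstract does not give: similarity is assumed to be the Jaccard index between purchase sets, candidate items are scored by the summed similarity of the neighbors who bought them, and precision is measured against a held-out set. All names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of purchased items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, user, k=2, n=3):
    """purchases: dict user -> set of items. Recommend up to n items via the k most similar users."""
    own = purchases[user]
    neighbors = sorted(
        ((jaccard(own, items), other) for other, items in purchases.items() if other != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, other in neighbors:
        for item in purchases[other] - own:  # only items the user has not bought yet
            scores[item] += similarity
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:n]


def precision_at_n(recommended, held_out):
    """Share of recommended items that appear among the user's held-out purchases."""
    return len(set(recommended) & held_out) / len(recommended) if recommended else 0.0
```

With purchase data there are no ratings to predict, so set-overlap similarity and ranking metrics such as precision are a natural fit.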
