91 |
在Spark大數據平台上分析DBpedia開放式資料:以電影票房預測為例 / Analyzing DBpedia Linked Open Data (LOD) on Spark: Movie Box Office Prediction as an Example
劉文友, Liu, Wen Yu Unknown Date
In recent years, Linked Open Data (LOD) has been recognized as holding substantial potential value. Collecting and integrating diverse LOD sources so that analysts can extract and analyze the data has become an important research challenge. LOD is represented in the Resource Description Framework (RDF) format, which can be queried with SPARQL. However, large-scale RDF data still lacks a high-performance, scalable, integrated system for storage, querying, and analysis, and research on analytics pipelines for big RDF data remains incomplete. This study addresses these issues using movie box office prediction as an example: it links the DBpedia LOD dataset with an external movie database (for example, IMDb) and performs large-scale graph analytics on the Apache Spark platform. Box office prediction models are first built with two algorithms, Naïve Bayes and Bayesian networks, and the Bayesian Information Criterion (BIC) is used to find the best Bayesian network structure. Multiclass ROC curves and AUC values are then computed to evaluate the accuracy of the prediction models.
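To illustrate the model-selection step mentioned in this abstract, here is a minimal sketch of choosing between candidate network structures by BIC. It assumes the common convention BIC = k ln(n) - 2 ln(L), where lower is better; the thesis may use a different sign convention. The candidate names, log-likelihoods, and parameter counts are hypothetical values invented for illustration, not results from the thesis.

```python
import math


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Return BIC = k * ln(n) - 2 * ln(L); lower is better under this convention."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


# Hypothetical candidate structures: name -> (log-likelihood, free parameters).
candidates = {
    "sparse_network": (-1250.0, 12),
    "dense_network": (-1235.0, 40),
}
num_samples = 1000  # hypothetical training-set size

scores = {name: bic(ll, k, num_samples) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # structure with the lowest BIC
```

In this toy setup the denser network fits slightly better, but its extra parameters cost more than the gain in likelihood, so the sparser structure is selected.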
|
92 |
Evaluation of computational methods for data prediction
Erickson, Joshua N. 03 September 2014
Given the overall increase in the availability of computational resources, and the importance of forecasting the future, it should come as no surprise that prediction is considered to be one of the most compelling and challenging problems for both academia and industry in the world of data analytics. But how is prediction done, what factors make it easier or harder to do, how accurate can we expect the results to be, and can we harness the available computational resources in meaningful ways? With efforts ranging from those designed to save lives in the moments before a near field tsunami to others attempting to predict the performance of Major League Baseball players, future generations need to have realistic expectations about prediction methods and analytics. This thesis takes a broad look at the problem, including motivation, methodology, accuracy, and infrastructure. In particular, a careful study involving experiments in regression, the prediction of continuous, numerical values, and classification, the assignment of a class to each sample, is provided. The results and conclusions of these experiments cover only the included data sets and the applied algorithms as implemented by the Python library. The evaluation includes accuracy and running time of different algorithms across several data sets to establish tradeoffs between the approaches, and determine the impact of variations in the size of the data sets involved. As scalability is a key characteristic required to meet the needs of future prediction problems, a discussion of some of the challenges associated with parallelization is included. / Graduate / 0984 / erickson@uvic.ca
|
93 |
[en] HYBRID INTELLIGENT SYSTEM FOR CLASSIFICATION OF NON-RESIDENTIAL ELECTRICITY CUSTOMERS PAYMENT PROFILES / [pt] SISTEMA INTELIGENTE HÍBRIDO PARA CLASSIFICAÇÃO DO PERFIL DE PAGAMENTO DOS CONSUMIDORES NÃO-RESIDENCIAIS DE ENERGIA ELÉTRICA
NORMA ALICE DA SILVA CARVALHO 26 March 2018
The objective of this research is to classify the payment profiles of non-residential electricity customers using knowledge stored in the databases of electricity distribution utilities. The work was motivated by the utilities' need for a model that supports the formulation of strategies to reduce non-payment and late payment. The proposed methodology is a hybrid intelligent system composed of intercommunicating modules that use the stored knowledge to segment customers and then achieve the proposed objective. The system begins with the neural module, which allocates the consumer units to groups according to their similarities (bill amount, consumption, measured demand/contracted demand, energy intensity, and share of the electricity bill in the customer's budget). Next, the Bayesian module produces a score between 0 and 1 that predicts each unit's payment profile from the generated groups and the categorical attributes that characterize the units (economic activity, tariff structure, mesoregion, legal form, and company size). The results show that the proposed system achieves a reasonable success rate in classifying customer profiles and is therefore an important tool for formulating strategies to tackle non-payment and late payment. The hybrid system is general in nature and could be adapted and implemented in other markets.
|
94 |
Predictive models for career progression
Soliman, Zakaria 08 1900
No description available.
|
95 |
Redes probabilísticas de K-dependência para problemas de classificação binária / K-dependence probabilistic networks for binary classification problems
Souza, Anderson Luiz de 28 February 2012
Universidade Federal de Sao Carlos / Classification consists of discovering prediction rules that assist planning and decision-making; it is an indispensable tool and a widely discussed subject in the literature. A special case is credit risk rating, where the interest lies in identifying good and bad payers through binary classification methods. In many application settings, such as finance, several techniques can be used, including discriminant analysis, probit analysis, logistic regression, and neural networks. However, Probabilistic Networks, also known as Bayesian Networks, have proven to be a practical classification method with successful applications in several fields. In this work, we present the application of Probabilistic Networks to classification, specifically the technique known as K-dependence Bayesian Networks (KDB networks), and compare its performance with conventional techniques in the contexts of credit scoring and medical diagnosis. As results, we present applications of the technique to real and artificial datasets, along with its performance when assisted by the bagging procedure.
|
96 |
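Reconhecimento de padrões em rede social científica: aplicação do algoritmo Naive Bayes para classificação de papers no Mendeley / Pattern recognition in a scientific social network: applying the Naive Bayes algorithm to classify papers on Mendeley
Sombra, Tobias Ribeiro 22 March 2018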
This work presents exploratory research using the Naive Bayes algorithm to classify documents in Mendeley into up to five output classes, defined by the number of readers of each document. Using a set of attributes found during data collection, classification is performed to try to identify patterns in those attributes and thereby recognize the social logics of scientists, which involve both their behavior and their dynamics in scientific social networks. To carry out this work, a systematic literature review was conducted to survey the state of the art in research on pattern recognition in scientific social networks, and a method was applied that uses algorithms developed for the automatic processing of all data collected from Mendeley.
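As a rough illustration of the kind of classifier this abstract describes, the following is a minimal categorical Naive Bayes with Laplace smoothing. It is a sketch under stated assumptions, not the author's implementation: the two attributes (document type and access status) and the two reader-count classes are hypothetical, whereas the thesis uses up to five classes and attributes gathered from Mendeley.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class attribute values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    vocab = defaultdict(set)             # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab, alpha


def predict(model, row):
    """Return the class with the highest smoothed log-posterior score."""
    class_counts, value_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(vocab[i])
            score += math.log(numerator / denominator)  # log likelihood term
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (document type, access) -> reader-count class.
rows = [
    ("journal", "open"),
    ("journal", "open"),
    ("conference", "closed"),
    ("thesis", "closed"),
]
labels = ["high", "high", "low", "low"]
model = train_naive_bayes(rows, labels)
```

Calling `predict(model, ("journal", "open"))` selects the class whose training documents most often share those attribute values; the smoothing term `alpha` keeps unseen attribute values from driving a class probability to zero.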
|
97 |
Využití vybraných metod strojového učení pro modelování kreditního rizika / Machine Learning Methods for Credit Risk Modelling
Drábek, Matěj January 2017
This master's thesis is divided into three parts. In the first part, I describe P2P lending, its characteristics, basic concepts, and practical implications, and compare the P2P markets in the Czech Republic, the UK, and the USA. The second part covers the theoretical foundations of the chosen machine learning methods: the naive Bayes classifier, classification trees, random forests, and logistic regression. It also describes methods for evaluating the quality of these classification models. The third part is practical and shows the complete workflow of creating a classification model, from data preparation to model evaluation.
|
98 |
Definition Extraction From Swedish Technical Documentation : Bridging the gap between industry and academy approaches
Helmersson, Benjamin January 2016
Terminology is concerned with the creation and maintenance of concept systems, terms and definitions. Automatic term and definition extraction is used to simplify this otherwise manual and sometimes tedious process. This thesis presents an integrated approach of pattern matching and machine learning, utilising feature vectors in which each feature is a Boolean function of a regular expression. The integrated approach is compared with the two more classic approaches, showing a significant increase in recall while maintaining a comparable precision score. Less promising is the negative correlation between the performance of the integrated approach and training size. Further research is suggested.
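The feature representation described in this abstract, where each feature is a Boolean function of a regular expression, can be sketched as follows. The patterns below are hypothetical English examples chosen for readability; the thesis works with Swedish technical documentation and its actual patterns are not given here.

```python
import re

# Hypothetical definition-cue patterns; each one yields a single Boolean feature.
PATTERNS = [
    re.compile(r"\bis (a|an|the)\b", re.IGNORECASE),
    re.compile(r"\b(refers to|is defined as|means)\b", re.IGNORECASE),
    re.compile(r"\bcalled\b", re.IGNORECASE),
]


def to_feature_vector(sentence):
    """Map a sentence to one Boolean per pattern (True if the pattern matches)."""
    return [bool(pattern.search(sentence)) for pattern in PATTERNS]


definition = to_feature_vector("A torque wrench is a tool.")
other = to_feature_vector("Tighten the bolt.")
```

A vector like this can be passed to any classifier that accepts binary features, which is what lets hand-written patterns and machine learning be combined in a single model.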
|
99 |
Efficient Feature Extraction for Shape Analysis, Object Detection and Tracking
Solis Montero, Andres January 2016
During the course of this thesis, two scenarios are considered. In the first one, we contribute to feature extraction algorithms. In the second one, we use features to improve object detection solutions and localization. The two scenarios give rise to four thesis sub-goals. First, we present a new shape skeleton pruning algorithm based on contour approximation and the integer medial axis. The algorithm effectively removes unwanted branches, conserves the connectivity of the skeleton and respects the topological properties of the shape. The algorithm is robust to significant boundary noise and to rigid shape transformations. It is fast and easy to implement. While shape-based solutions via boundary and skeleton analysis are viable solutions to object detection, keypoint features are important for textured object detection. Second, we therefore present a keypoint feature-based planar object detection framework for vision-based localization. We demonstrate that our framework is robust against illumination changes, perspective distortion, motion blur, and occlusions. We increase robustness of the localization scheme in cluttered environments and decrease false detection of targets. We present an off-line target evaluation strategy and a scheme to improve pose. Third, we extend planar object detection to a real-time approach for 3D object detection using a mobile and uncalibrated camera. We develop our algorithm based on two novel naive Bayes classifiers for viewpoint and feature matching that improve performance and decrease memory usage. Our algorithm exploits the specific structure of various binary descriptors in order to boost feature matching by conserving descriptor properties. Our novel naive Bayes classifiers require a database with a small memory footprint because we only store efficiently encoded features. We improve the feature-indexing scheme to speed up the matching process, creating a highly efficient database for objects. Finally, we present a model-free long-term tracking algorithm based on the Kernelized Correlation Filter. The proposed solution improves the correlation tracker in precision, success, accuracy and robustness while increasing frame rates. We integrate an adjustable Gaussian window and sparse features for robust scale estimation, creating a better separation of the target and the background. Furthermore, we include fast descriptors and a packed Fourier spectrum format to boost performance while decreasing the memory footprint. We compare our algorithm with state-of-the-art techniques to validate the results.
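For context on binary descriptor matching, the sketch below shows the standard brute-force baseline: comparing descriptors by Hamming distance and returning the nearest database entry. This is not the thesis's naive Bayes matching scheme, which is designed to improve on this kind of search; the 8-bit descriptors are hypothetical toy values, whereas real binary descriptors are typically hundreds of bits long.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


def nearest_descriptor(query: int, database: list) -> int:
    """Return the index of the database descriptor closest to the query."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical 8-bit descriptors for illustration only.
database = [0b11110000, 0b00001111, 0b10101010]
query = 0b11110001
match_index = nearest_descriptor(query, database)
```

The linear scan costs one XOR and one bit count per database entry, which is why indexing schemes and compact encodings matter once the database grows.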
|
100 |
SLA violation prediction : a machine learning perspective
Askari Hemmat, Reyhane 10 1900
No description available.
|