621 |
Reconhecimento de padrões em rede social científica: aplicação do algoritmo Naive Bayes para classificação de papers no Mendeley / Pattern recognition in a scientific social network: applying the Naive Bayes algorithm to classify papers in Mendeley
Sombra, Tobias Ribeiro, 22 March 2018 (has links)
This work presents an exploratory study using the Naive Bayes algorithm to classify documents in Mendeley into up to five output classes, defined by the number of readers of each document. Using a set of attributes found during data collection, classification is performed to identify patterns in those attributes and thereby recognize the social logics of scientists, covering both their behavior and their dynamics in scientific social networks. To carry out this work, a Systematic Literature Review was conducted to survey the state of the art of research involving pattern recognition in scientific social networks, and a method was applied that uses algorithms developed for the automatic processing of all data collected from Mendeley.
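As a rough illustration of the abstract's core technique, the sketch below implements a categorical naive Bayes classifier that assigns a readership class from document attributes. The attribute names, class labels and records are invented toy data, not the thesis's actual Mendeley fields or its five-class scheme.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy records: (attributes, readership class).
train = [
    ({"discipline": "cs", "type": "journal"}, "high"),
    ({"discipline": "cs", "type": "conference"}, "high"),
    ({"discipline": "bio", "type": "journal"}, "low"),
    ({"discipline": "bio", "type": "thesis"}, "low"),
]

class NaiveBayes:
    def fit(self, data):
        self.class_counts = Counter(c for _, c in data)
        self.feat_counts = defaultdict(Counter)  # (class, attr) -> value counts
        for attrs, c in data:
            for a, v in attrs.items():
                self.feat_counts[(c, a)][v] += 1
        self.total = len(data)
        return self

    def predict(self, attrs):
        best, best_lp = None, float("-inf")
        for c, n in self.class_counts.items():
            lp = math.log(n / self.total)  # class prior
            for a, v in attrs.items():
                counts = self.feat_counts[(c, a)]
                # Laplace smoothing over observed values plus one unseen slot
                lp += math.log((counts[v] + 1) / (n + len(counts) + 1))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

model = NaiveBayes().fit(train)
print(model.predict({"discipline": "cs", "type": "journal"}))  # → high
```

The same structure extends to five output classes by adding labels to the training records.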
|
622 |
Využití vybraných metod strojového učení pro modelování kreditního rizika / Machine Learning Methods for Credit Risk Modelling
Drábek, Matěj, January 2017 (has links)
This master's thesis is divided into three parts. In the first part I describe P2P lending, its characteristics, basic concepts and practical implications, and compare the P2P markets in the Czech Republic, the UK and the USA. The second part covers the theoretical basics of the chosen machine learning methods: the naive Bayes classifier, classification trees, random forests and logistic regression. I also describe methods for evaluating the quality of the classification models listed above. The third part is practical and shows the complete workflow of creating a classification model, from data preparation to model evaluation.
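The model-evaluation side of the thesis can be illustrated with a small, self-contained sketch of two standard quality measures for binary classifiers such as those it compares. The labels and scores below are invented toy values, not the thesis's P2P data.

```python
# Confusion-matrix cells and ROC AUC computed from scratch on toy predictions.
def confusion(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(p == 1 and t == 0 for t, p in zip(y_true, y_pred))
    fn = sum(p == 0 and t == 1 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def auc(y_true, scores):
    # Probability that a random positive outranks a random negative (ties 0.5)
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0, 1, 0]                      # toy defaults (1) vs repayments (0)
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.8]     # toy model scores
pred = [int(s >= 0.5) for s in scores]
tp, tn, fp, fn = confusion(y, pred)
print("accuracy:", (tp + tn) / len(y), "AUC:", auc(y, scores))
```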
|
623 |
Definition Extraction From Swedish Technical Documentation : Bridging the gap between industry and academy approaches
Helmersson, Benjamin, January 2016 (has links)
Terminology is concerned with the creation and maintenance of concept systems, terms and definitions. Automatic term and definition extraction is used to simplify this otherwise manual and sometimes tedious process. This thesis presents an integrated approach of pattern matching and machine learning, utilising feature vectors in which each feature is a Boolean function of a regular expression. The integrated approach is compared with the two more classic approaches, showing a significant increase in recall while maintaining a comparable precision score. Less promising is the negative correlation between the performance of the integrated approach and training size. Further research is suggested.
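The central representation of the integrated approach can be sketched compactly: a feature vector in which each entry is the Boolean result of one regular expression applied to a candidate sentence. The patterns below are illustrative English stand-ins, not the thesis's actual Swedish patterns.

```python
import re

# Each feature is a Boolean function of a regular expression over the sentence.
PATTERNS = [
    r"\bis defined as\b",
    r"\bmeans\b",
    r"\brefers to\b",
    r"^[A-Z][a-z]+ is\b",  # capitalized term followed by a copula
]

def featurize(sentence):
    """Map a candidate sentence to its Boolean regex feature vector."""
    return [bool(re.search(p, sentence)) for p in PATTERNS]

print(featurize("Torque is defined as the rotational force."))
# → [True, False, False, True]
```

Vectors like these would then feed a standard classifier in the machine learning stage.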
|
624 |
Predictive Place-Cell Sequences for Goal-Finding Emerge from Goal Memory and the Cognitive Map: A Computational Model
Gönner, Lorenz; Vitay, Julien; Hamker, Fred, 23 November 2017 (has links) (PDF)
Hippocampal place-cell sequences observed during awake immobility often represent previous experience, suggesting a role in memory processes. However, recent reports of goals being overrepresented in sequential activity suggest a role in short-term planning, although a detailed understanding of the origins of hippocampal sequential activity and of its functional role is still lacking. In particular, it is unknown which mechanism could support efficient planning by generating place-cell sequences biased toward known goal locations, in an adaptive and constructive fashion. To address these questions, we propose a model of spatial learning and sequence generation as interdependent processes, integrating cortical contextual coding, synaptic plasticity and neuromodulatory mechanisms into a map-based approach. Following goal learning, sequential activity emerges from continuous attractor network dynamics biased by goal memory inputs. We apply Bayesian decoding on the resulting spike trains, allowing a direct comparison with experimental data. Simulations show that this model (1) explains the generation of never-experienced sequence trajectories in familiar environments, without requiring virtual self-motion signals, (2) accounts for the bias in place-cell sequences toward goal locations, (3) highlights their utility in flexible route planning, and (4) provides specific testable predictions.
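The read-out step the abstract mentions, Bayesian decoding of position from spike trains, can be sketched with a toy Poisson decoder; the tuning curves, window length and spike counts below are invented, not the model's parameters.

```python
import math

# Toy tuning curves: firing rate (Hz) of each cell at each of 3 position bins.
tuning = {
    "c1": [8.0, 1.0, 0.5],
    "c2": [0.5, 8.0, 1.0],
    "c3": [1.0, 0.5, 8.0],
}
tau = 0.1  # decoding window in seconds

def decode(counts):
    """Maximum-a-posteriori position bin under a flat prior and independent
    Poisson spiking per cell."""
    logpost = [0.0, 0.0, 0.0]
    for cell, n in counts.items():
        for x, rate in enumerate(tuning[cell]):
            lam = rate * tau
            logpost[x] += n * math.log(lam) - lam - math.lgamma(n + 1)
    return max(range(3), key=lambda x: logpost[x])

print(decode({"c1": 0, "c2": 3, "c3": 0}))  # → 1, the bin where c2 fires most
```

Applying such a decoder to simulated spike trains is what allows the direct comparison with experimental sequence data.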
|
625 |
Efficient Feature Extraction for Shape Analysis, Object Detection and Tracking
Solis Montero, Andres, January 2016 (has links)
During the course of this thesis, two scenarios are considered. In the first, we contribute to feature extraction algorithms; in the second, we use features to improve object detection and localization. The two scenarios give rise to four thesis sub-goals. First, we present a new shape skeleton pruning algorithm based on contour approximation and the integer medial axis. The algorithm effectively removes unwanted branches, conserves the connectivity of the skeleton and respects the topological properties of the shape. It is robust to significant boundary noise and to rigid shape transformations, and it is fast and easy to implement. While shape-based solutions via boundary and skeleton analysis are viable for object detection, keypoint features are important for detecting textured objects. Second, therefore, we present a keypoint-feature-based planar object detection framework for vision-based localization. We demonstrate that our framework is robust against illumination changes, perspective distortion, motion blur, and occlusions. We increase the robustness of the localization scheme in cluttered environments and decrease false detections of targets, and we present an off-line target evaluation strategy and a scheme to improve pose estimation. Third, we extend planar object detection to a real-time approach for 3D object detection using a mobile, uncalibrated camera. Our algorithm builds on two novel naive Bayes classifiers, for viewpoint and for feature matching, that improve performance and decrease memory usage. It exploits the specific structure of various binary descriptors to boost feature matching while conserving descriptor properties, and it requires only a database with a small memory footprint because we store only efficiently encoded features. We also improve the feature-indexing scheme to speed up the matching process, creating a highly efficient object database. Finally, we present a model-free long-term tracking algorithm based on the Kernelized Correlation Filter. The proposed solution improves on the correlation tracker in precision, success, accuracy and robustness while increasing frame rates. We integrate an adjustable Gaussian window and sparse features for robust scale estimation, creating a better separation of the target and the background, and we include fast descriptors and a packed Fourier spectrum format to boost performance while decreasing the memory footprint. We compare our algorithm with state-of-the-art techniques to validate the results.
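The naive Bayes matching over binary descriptors mentioned in the third sub-goal can be sketched in miniature: each database feature class stores smoothed per-bit probabilities, and a query descriptor is assigned to the class with the highest log-likelihood. The 4-bit descriptors and class names below are toy assumptions, far shorter than real binary descriptors.

```python
import math

class BinaryNB:
    """Toy naive Bayes matcher over binary descriptors."""
    def __init__(self):
        self.classes = {}  # class name -> per-bit P(bit = 1)

    def add(self, name, samples):
        nbits = len(samples[0])
        # Laplace-smoothed bit frequencies; only these floats need storing,
        # which keeps the database footprint small.
        self.classes[name] = [
            (sum(s[i] for s in samples) + 1) / (len(samples) + 2)
            for i in range(nbits)
        ]

    def match(self, desc):
        def loglik(probs):
            return sum(math.log(p if b else 1 - p) for b, p in zip(desc, probs))
        return max(self.classes, key=lambda n: loglik(self.classes[n]))

db = BinaryNB()
db.add("corner_A", [[1, 1, 0, 0], [1, 1, 1, 0]])
db.add("corner_B", [[0, 0, 1, 1], [0, 1, 1, 1]])
print(db.match([1, 1, 0, 0]))  # → corner_A
```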
|
626 |
Remaining useful life estimation of critical components based on Bayesian approaches / Prédiction de l'état de santé des composants critiques à l'aide de l'approche bayésienne
Mosallam, Ahmed, 18 December 2014 (links)
Constructing prognostics models relies on understanding the degradation process of the monitored critical components in order to correctly estimate the remaining useful life (RUL). Traditionally, a degradation process is represented in the form of physical or expert models. Such models require extensive experimentation and verification that are not always feasible in practice. Another approach, known as data-driven, builds up knowledge about the system degradation over time from component sensor data; it requires that sufficient historical data have been collected. In this work, a two-phase data-driven method for RUL prediction is presented. In the offline phase, the proposed method finds variables that contain information about the degradation behavior using an unsupervised variable selection method. Different health indicators (HIs) are constructed from the selected variables, representing the degradation as a function of time, and are saved in the offline database as reference models. In the online phase, the method estimates the degradation state using a discrete Bayesian filter. It then finds the offline health indicator most similar to the online one, using a k-nearest neighbors (k-NN) classifier and Gaussian process regression (GPR), and uses it as a RUL estimator. The method is verified using PRONOSTIA bearing data as well as battery and turbofan engine degradation data acquired from the NASA data repository. The results show the effectiveness of the method in predicting the RUL.
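The online matching step can be reduced to a minimal sketch: compare the current health indicator against offline reference indicators with a nearest-neighbor distance and read off the remaining life of the closest run. The run names, indicator values and lifetimes below are toy assumptions; the actual method additionally uses a discrete Bayesian filter and Gaussian process regression, omitted here.

```python
# Offline database: run -> (HI trajectory at fixed times, total life in hours).
offline = {
    "bearing1": ([0.1, 0.3, 0.6, 0.9], 400),
    "bearing2": ([0.1, 0.2, 0.3, 0.5], 800),
}

def rul(online_hi, elapsed_hours):
    """1-NN match of the online HI against offline references; the RUL is the
    matched run's total life minus the elapsed time."""
    def dist(ref):
        return sum((a - b) ** 2 for a, b in zip(online_hi, ref))
    best = min(offline, key=lambda r: dist(offline[r][0]))
    return best, offline[best][1] - elapsed_hours

print(rul([0.1, 0.25, 0.35, 0.55], 300))  # → ('bearing2', 500)
```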
|
627 |
Návrh a implementace Data Mining modelu v technologii MS SQL Server / Design and implementation of Data Mining model with MS SQL Server technology
Peroutka, Lukáš, January 2012 (links)
This thesis focuses on the design and implementation of a data mining solution with real-world data. The task is analysed and processed, and its results are evaluated. The mined data set contains study records of students from the University of Economics, Prague (VŠE) over the past three years. The first part of the thesis covers the theory of data mining: the definition of the term and the history and development of the field. Current best practices and methodology are described, as well as methods for assessing data quality and for data pre-processing ahead of the actual data mining task. The most common data mining techniques are introduced, including their basic concepts, advantages and disadvantages. This theoretical basis is then used to implement a concrete data mining solution with educational data. The source data set is described and analysed, and some of the data are chosen as input for the created models. The solution is based on the MS SQL Server data mining platform and its goal is to find, describe and analyse potential associations and dependencies in the data. The results of the respective models are evaluated, including their potential added value. Possible extensions and suggestions for further development of the solution are also mentioned.
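The association idea behind such models can be shown with the two basic rule measures, support and confidence, over toy study records. The course names and co-enrolment sets below are invented, not the VŠE data.

```python
# Toy "transactions": sets of courses taken together by individual students.
transactions = [
    {"math", "stats", "db"},
    {"math", "stats"},
    {"math", "db"},
    {"stats", "db"},
]

def support(itemset):
    """Fraction of transactions containing the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Estimated P(rhs | lhs) for the rule lhs -> rhs."""
    return support(lhs | rhs) / support(lhs)

print(confidence({"math"}, {"stats"}))  # P(stats | math)
```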
|
628 |
Dolování z dat v jazyce Python / Data Mining with Python
Šenovský, Jakub, January 2017 (links)
The main goal of this thesis was to get acquainted with the phases of data mining and with the support for data mining in the programming languages Python and R, and to demonstrate their use in two case studies; a comparison of the two languages in the field of data mining is also included. The data pre-processing phase and the mining algorithms for classification, prediction and clustering are described, and the most significant libraries for Python and R are presented. In the first case study, work with time series is demonstrated using an ARIMA model and neural networks, with accuracy verified using the mean squared error. In the second case study, the results of football matches are classified using k-nearest neighbors, a Bayes classifier, random forests and logistic regression, with classification quality reported via the accuracy score and a confusion matrix. The thesis concludes with an evaluation of the achieved results and suggestions for future improvement of the individual models.
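The k-nearest-neighbors step from the second case study can be sketched in a few lines; the two numeric features (say, home and away form) and the labeled matches are toy assumptions, not the thesis's feature set.

```python
from collections import Counter

# Toy training data: ((home_form, away_form), match outcome).
train = [
    ((3.0, 1.0), "home_win"),
    ((2.5, 1.5), "home_win"),
    ((1.0, 3.0), "away_win"),
    ((1.5, 2.5), "away_win"),
    ((2.0, 2.0), "draw"),
]

def knn(x, k=3):
    """Majority vote among the k training points closest to x."""
    dist = lambda a: sum((u - v) ** 2 for u, v in zip(a, x))
    nearest = sorted(train, key=lambda item: dist(item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn((2.6, 1.2)))  # → home_win
```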
|
629 |
Zpracování uživatelských recenzí / Processing of User Reviews
Cihlářová, Dita, January 2019 (links)
People very often buy goods on the Internet that they cannot see or try, and they therefore rely on the reviews of other customers. However, there may be too many reviews for a human to process quickly and comfortably. The aim of this work is to offer an application that can recognize, in Czech reviews, which features of a product are commented on most and whether each comment is positive or negative. The results can save a lot of time for e-shop customers and provide interesting feedback to the manufacturers of the products.
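A minimal sketch of the task: spot product aspects in a review and decide polarity from a tiny sentiment lexicon. The English words below are illustrative stand-ins; the actual application works with Czech text and learned models rather than a fixed lexicon.

```python
# Toy aspect list and sentiment lexicon (assumptions, not the thesis's data).
ASPECTS = {"battery", "display", "price"}
POSITIVE = {"great", "excellent", "good"}
NEGATIVE = {"poor", "bad", "short"}

def analyse(review):
    """Return (mentioned aspects, overall polarity) for one review."""
    words = review.lower().split()
    aspects = sorted(ASPECTS & set(words))
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return aspects, polarity

print(analyse("Excellent display and good price"))
# → (['display', 'price'], 'positive')
```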
|
630 |
Autonomní jednokanálový deinterleaving / Autonomous Single-Channel Deinterleaving
Tomešová, Tereza, January 2021 (links)
This thesis deals with autonomous single-channel deinterleaving: the separation of a received sequence of impulses originating from more than one emitter into sequences of impulses from individual emitters, without human assistance. Methods used for deinterleaving can be divided into single-parameter and multiple-parameter methods according to the number of parameters used for separation; this thesis primarily deals with multi-parameter methods. DBSCAN and variational Bayes methods were chosen as appropriate for autonomous single-channel deinterleaving, adjusted for the task, and implemented in the Python programming language. Their performance is examined on simulated and real data.
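The DBSCAN route can be sketched end to end on toy pulse descriptors, here (carrier frequency in MHz, pulse width in µs), so that each cluster approximates one emitter. The parameter values and pulses are illustrative assumptions, not the thesis's data or tuning.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster label per point, -1 for noise."""
    labels = [None] * len(points)  # None = unvisited
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    neighbours = lambda i: [j for j in range(len(points))
                            if dist(points[i], points[j]) <= eps]
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] in (None, -1):
                labels[j] = cluster
                more = neighbours(j)
                if len(more) >= min_pts:  # j is a core point: keep expanding
                    queue.extend(k for k in more if labels[k] is None)
        cluster += 1
    return labels

pulses = [(9400, 1.0), (9401, 1.1), (9402, 0.9),   # toy emitter A
          (9700, 5.0), (9701, 5.2), (9699, 4.9)]   # toy emitter B
print(dbscan(pulses, eps=5.0, min_pts=2))  # → [0, 0, 0, 1, 1, 1]
```

Because DBSCAN needs no preset number of clusters, the number of emitters falls out of the data, which is what makes it attractive for autonomous operation.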
|