121 |
Neural Networks for Tea Leaf Classification
Silva, Jesús; Hernández Palma, Hugo; Niebles Núñez, William; Ruiz-Lazaro, Alex; Varela, Noel (07 January 2020)
Classifying the raw material is one of the most important procedures in any tea dryer, since it is responsible for ensuring the quality of the final product. In most tea processing companies this task is currently handled by an expert who works manually and at his own discretion, which has a number of associated drawbacks. This work proposes a solution covering the planning, design, development and testing of a prototype that correctly classifies photographs of raw material samples arriving at a dryer. It uses artificial intelligence (AI) techniques: supervised classification with artificial neural networks, and unsupervised K-means clustering for class preparation. The prototype performed well and is a reliable tool for classifying the raw material arriving at tea dryers.
|
122 |
Applying Machine Learning Algorithms for Anomaly Detection in Electricity Data: Improving the Energy Efficiency of Residential Buildings
Guss, Herman; Rustas, Linus (January 2020)
The purpose of this thesis is to investigate how data from a residential property owner can be utilized to enable better energy management for their building stock. Specifically, this is done through the development of two machine learning models with the objective of detecting anomalies in the existing data of electricity consumption. The dataset consists of two years of residential electricity consumption for 193 substations belonging to the residential property owner Uppsalahem. The first of the developed models uses the K-means method to cluster substations with similar consumption patterns to create electricity profiles, while the second model uses Gaussian process regression to predict electricity consumption over a 24-hour timeframe. The performance of these models is evaluated and the optimal models resulting from this process are implemented to detect anomalies in the electricity consumption data. Two different algorithms for anomaly detection are presented, based on the differing properties of the two earlier models. During the evaluation of the models, it is established that the consumption patterns of the substations display a high variability, making it difficult to accurately model the full dataset. Both models are shown to be able to detect anomalies in the electricity consumption data, but the K-means based anomaly detection model is preferred because it is faster and more reliable. It is concluded that substation electricity consumption is not ideal for anomaly detection, and that if a model should be implemented, it should likely exclude some of the substations with less regular consumption profiles.
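The abstract does not spell out how the K-means profiles are turned into an anomaly rule. One common reading, shown here only as a minimal sketch under that assumption, is to cluster consumption profiles with Lloyd's algorithm and score each profile by its distance to the nearest centroid. The function names, the toy four-value profiles and the choice of Euclidean distance are all illustrative and are not taken from the thesis.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # assignments are stable, so stop early
            break
        centroids = updated
    return centroids


def anomaly_score(profile, centroids):
    """Distance to the closest centroid; larger means less typical (assumed rule)."""
    return min(distance(profile, c) for c in centroids)


# Hypothetical consumption profiles (four readings each), not real data.
profiles = [
    [1.0, 1.2, 2.0, 1.1], [0.9, 1.1, 2.1, 1.0], [1.1, 1.0, 1.9, 1.2],
    [5.0, 6.0, 8.0, 6.1], [5.2, 5.9, 7.8, 6.0], [4.9, 6.1, 8.1, 5.9],
]
centroids = kmeans(profiles, k=2)
print(anomaly_score([30.0, 30.0, 30.0, 30.0], centroids))
```

In practice a threshold on the score (for example a high percentile of the scores seen in historical data) would decide what counts as an anomaly; that choice is likewise an assumption here.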
|
123 |
Využitie strojového učenia v IS firmy hľadajúcej ľudské zdroje [Use of Machine Learning in the Information System of a Company Seeking Human Resources]
Škantárová, Martina (January 2019)
The thesis analyzes the current recruitment process and the possibilities for using machine learning in it. Based on this analysis, it proposes a solution that makes the search for suitable candidates more effective through machine learning, specifically the K-Means algorithm and the SOM neural network.
|
124 |
Analýza textových používateľských hodnotení vybranej skupiny produktov [Analysis of Textual User Reviews of a Selected Group of Products]
Valovič, Roman (January 2019)
This work focuses on the design of a system that identifies frequently discussed product features in product reviews, summarizes them, and displays them to the user in terms of sentiment. The work deals with natural language processing, with a specific focus on the Czech language. The reader is introduced to methods of text preprocessing and their impact on the quality of the analysis results. The most discussed product features are identified by cluster analysis using the K-Means algorithm, under the assumption that sufficiently internally homogeneous clusters represent the individual features of the products. A new area explored in this work is the representation of documents using word embeddings, and the potential of the resulting vector space as input for machine learning algorithms.
|
125 |
Aplicación de Data Science Specialist [Application of Data Science Specialist]
Ccora Camarena, Yuli; Jeri De La Cruz, Nélida; Enriquez Yance, Rosario Grace (14 January 2020)
The research presented here analyzes a problem at the company Travico Perú S.A.C, which has reported a decline in sales across the different services it offers.
The work applied data science methodology to identify the variables that influenced sales of all services from 2016 to 2018. The data set was obtained from the platforms the company works with and from internal control reports, yielding 12 variables with 6429 data points.
In addition, an unsupervised, partition-based machine learning technique, K-means, was used to segment and group the selected variables.
Finally, for the analysis, several graphs of the company's sales results were presented and compared with the results of the clusters. / Research work
|
126 |
Summarization and keyword extraction on customer feedback data: Comparing different unsupervised methods for extracting trends and insight from text
Skoghäll, Therése; Öhman, David (January 2022)
Over the last couple of months Polestar has more than doubled its amount of customer feedback, and the amount is forecast to increase even more. Manually reading this feedback is expensive and time-consuming, so there is a need to analyse it automatically. The company wants to understand the customer and extract the trends and topics that concern the consumer in order to improve the customer experience. Natural Language Processing has developed immensely over the last couple of years, and new state-of-the-art language models have pushed the boundaries in all types of benchmark tasks. In this thesis, three extractive summarization models and three keyword extraction methods were tested for extracting information from text and evaluated with two quantitative measures and human evaluation. The thesis shows that extractive summarization models with a Transformer-based text representation are best at capturing the context in a text. Based on the quantitative results and the company's needs, TextRank with a Transformer-based embedding was chosen as the final extractive summarization model. For keyword extraction, the best overall model was YAKE!, based on the quantitative measure and human validation.
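The idea behind TextRank-style extractive summarization can be sketched in a few lines: sentences become nodes in a graph, edge weights are pairwise similarities, and a PageRank-style iteration scores each sentence. The thesis uses a Transformer-based embedding; the sketch below swaps in a plain bag-of-words cosine similarity so that it runs with the standard library alone. The function names, the damping factor of 0.85 and the example feedback sentences are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def sentence_vector(sentence):
    """Bag-of-words term counts (a simple stand-in for a Transformer embedding)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over the similarity graph."""
    vectors = [sentence_vector(s) for s in sentences]
    n = len(sentences)
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(sentences, top_k=1):
    """Return the top_k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


# Hypothetical feedback sentences, not taken from the thesis data.
feedback = [
    "The charging cable is too short.",
    "The charging cable broke after a week.",
    "Great seats overall.",
    "The charging cable is short and the charging port is hard to reach.",
]
print(summarize(feedback, top_k=2))
```

Replacing `sentence_vector` and `cosine` with similarities computed from sentence embeddings would move the sketch closer to the configuration the abstract describes, without changing the ranking step.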
|
127 |
Towards Efficient Certificate Revocation Status Validation in Vehicular Ad Hoc Networks with Data Mining
Zhang, Qingwei (January 2012)
Vehicular Ad hoc Networks (VANETs) are emerging as a promising approach to improving traffic safety and providing a wide range of wireless applications for drivers and passengers. To perform reliable and trusted vehicular communications, one prerequisite is to ensure a peer vehicle's credibility by validating the digital certificates in messages sent out by other vehicles. However, in vehicular communication systems, certificate validation is more time-consuming than in traditional networks, because each vehicle receives a large number of messages in a short period of time. Another issue that needs to be addressed is the unsuccessful delivery of information between vehicles and other entities on the road as a result of their high mobility. For these reasons, new solutions are needed to accelerate certificate validation. In this thesis, we propose a certificate revocation status validation scheme that uses the concept of clustering, based on data mining practices, to meet the aforementioned requirements. We employ k-means clustering to boost the efficiency of certificate validation, thereby enhancing the security of a vehicular ad hoc network. Additionally, a comprehensive analysis of the security of the proposed scheme is presented. The analytical results demonstrate that this scheme can effectively improve the validation of certificates and thus secure communication in vehicular networks.
|
128 |
Principal points, principal curves and principal surfaces
Ganey, Raeesa (January 2015)
Approximating a distribution is a prominent problem in statistics. This dissertation explores the theory of principal points and principal curves as methods for approximating a distribution. Principal points of a distribution were first introduced by Flury (1990), who tackled the problem of optimal grouping in multivariate data. In essence, principal points are the theoretical counterparts of the cluster means obtained by the k-means algorithm. Principal curves, defined by Hastie (1984), are smooth one-dimensional curves that pass through the middle of a p-dimensional data set, providing a nonlinear summary of the data. In this dissertation, details on the usefulness of principal points and principal curves are reviewed. The application of principal points and principal curves is then extended beyond their original purpose to well-known computational methods in machine learning, such as Support Vector Machines.
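The link between principal points and k-means can be made concrete with the standard definition, stated here in notation chosen for this note rather than taken from the dissertation. The k principal points minimize the expected squared distance to the nearest point, and a standard worked case is k = 2 for the standard normal distribution.

```latex
% k principal points of a random vector X in R^p
\{\xi_1,\dots,\xi_k\}
  = \operatorname*{arg\,min}_{y_1,\dots,y_k \in \mathbb{R}^p}
    \mathbb{E}\Bigl[\min_{1 \le j \le k} \lVert X - y_j \rVert^2\Bigr]

% Self-consistency: each point is the conditional mean of its own region D_j
\xi_j = \mathbb{E}\bigl[X \mid X \in D_j\bigr],
\qquad
D_j = \bigl\{x : \lVert x - \xi_j \rVert \le \lVert x - \xi_i \rVert \ \text{for all } i\bigr\}

% Worked case: X ~ N(0,1), k = 2, symmetric pair \pm a with regions x < 0 and x > 0
a = \mathbb{E}[X \mid X > 0] = \frac{\varphi(0)}{1/2} = \sqrt{\frac{2}{\pi}} \approx 0.798
```

The k-means algorithm applies the same two steps (assign to the nearest point, replace each point by the mean of its region) to a sample, which is why its cluster means can be read as estimates of the principal points.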
|
129 |
Micro-Raman Imaging for Biology with Multivariate Spectral Analysis
Malvaso, Federica (05 May 2015)
Raman spectroscopy is a noninvasive technique that can provide complex information on the vibrational state of molecules. It yields a unique fingerprint that allows the identification of the various chemical components within a given sample. The aim of this thesis is to analyze Raman maps of three pairs of different cells, highlighting differences and similarities through multivariate algorithms. The first pair of analyzed cells are human embryonic stem cells (hESCs), while the other two pairs are induced pluripotent stem cells (iPSCs) derived from T lymphocytes and keratinocytes, respectively. Although two different multivariate techniques were employed, i.e. Principal Component Analysis and Cluster Analysis, the same results were achieved: the iPSCs derived from T lymphocytes show a higher content of genetic material than both the iPSCs derived from keratinocytes and the hESCs. It was equally evident that iPSCs derived from keratinocytes assume a molecular distribution very similar to that of hESCs.
|
130 |
Exploring the weather impact on bike sharing usage through a clustering analysis
Quach, Jessica (January 2020)
Bike sharing systems now exist in many cities around the globe, following their growth in popularity over recent decades. They are attractive to cities and users who want to promote healthier lifestyles, reduce air pollution and gas emissions, and improve traffic. One major challenge for docked bike sharing systems is redistributing bikes and balancing dock stations. Existing studies propose models that help forecast bike usage, strategies for rebalancing bike distribution, and ways to establish or identify usage patterns. Some of these studies propose extending their approach by including weather data; others had limitations and did not include it. This study builds on these proposals to explore how, and to what extent, weather affects bike usage. Bike usage data and weather data were gathered for the city of Washington D.C. and analyzed using the k-means clustering algorithm. K-means is suitable for discovering patterns in data by grouping (clustering) similar instances, an approach the literature review also supported. In this project, the k-means algorithm identified three clusters that correspond to bike usage depending on weather. The results show that the impact of weather on bike usage differed noticeably between clusters. Of the five weather variables, temperature carried the most weight, followed by precipitation. The results also supported the use of k-means as appropriate for this type of study.
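Because k-means relies on distances, variables measured on different scales (for example temperature in degrees and precipitation in millimetres) can dominate or vanish in the clustering unless they are rescaled first. The abstract does not say whether the study rescaled its five weather variables, so the following is only a sketch of one common preprocessing step, z-score standardization, with hypothetical column values.

```python
import math


def standardize(rows):
    """Z-score each column: subtract its mean and divide by its population std dev."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
            for c, m in zip(columns, means)]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]


# Hypothetical daily observations: [temperature, precipitation]
weather = [[10.0, 0.0], [20.0, 5.0], [30.0, 10.0]]
for row in standardize(weather):
    print([round(v, 3) for v in row])
```

After this step every column has mean 0 and unit variance, so each variable contributes comparably to the Euclidean distances that k-means minimizes.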
|