291 |
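Análisis de la relación existente entre el ewom generado por consumidoras de servicios de belleza en Facebook en Lima Metropolitana, de acuerdo con su puntuación o recomendación. Un enfoque desde el Text mining / Analysis of the relationship between the eWOM generated by consumers of beauty services on Facebook in Metropolitan Lima and their score or recommendation: a text mining approach
Torres Shuan, Nicole Ailen 28 November 2019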
Topic: Analysis of the relationship between the eWOM generated by consumers of beauty services on Facebook in Metropolitan Lima and the score or recommendation they assign. A text mining approach.
Objective: To use text mining techniques to process eWOM in order to help explain the degree of relationship between the score/rating and the sentiment of a comment in the beauty sector.
The central theme of this research is electronic word-of-mouth (eWOM) and its consistency with the variable that accompanies it (rating/score). This relationship is measured through a set of proposed indicators, of which one variable, "sentiment", has the greatest influence on the model. To achieve the proposed objective, both qualitative and quantitative studies were carried out. The qualitative study focused on how consumers act when they leave an opinion as a form of value co-creation with companies in the beauty sector. The quantitative study was progressive, since it involved several tools on the way to the final result: first the base of opinions was collected and filters were applied, then the sentiment of the comments was analyzed with the software "Semantria for Excel", and finally a regression analysis was performed with the statistical tool SPSS.
The two types of research helped to strengthen the model, since they made it possible to understand the current behavior of Peruvian users of beauty services in the digital channel and whether their contribution (eWOM) was associated with the sentiment relating to their satisfaction with the service received. At the end of the research, recommendations are proposed at the digital (online) and service (offline) levels to generate greater satisfaction among users. / Research work
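The relationship between comment sentiment and the accompanying score can be illustrated with a small calculation. The sketch below is not the study's SPSS model; it only shows, on made-up data, how a correlation coefficient and a least-squares line between a sentiment score and a star rating might be computed. The function names and the sample values are assumptions introduced for illustration.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ols_fit(xs, ys):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Hypothetical data: one sentiment score in [-1, 1] and one 1-5 rating per comment.
sentiment = [-0.8, -0.3, 0.0, 0.4, 0.9]
rating = [1, 2, 3, 4, 5]

r = pearson_r(sentiment, rating)
slope, intercept = ols_fit(sentiment, rating)
print(f"r = {r:.3f}, rating ~ {intercept:.2f} + {slope:.2f} * sentiment")
```

A correlation close to 1 on real data would suggest that ratings and comment sentiment are consistent with each other; a weak correlation would point to the kind of incoherence the study sets out to measure.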
|
292 |
Srovnání sylabů předmětů na různých univerzitách dolováním znalosti z textu / Comparison of course syllabi at different universities using text mining
Moravcová, Libuše January 2018
The thesis focuses on how to obtain the most accurate information about universities, faculties, fields of study, and the syllabi of particular courses at those universities using text-mining tools. The first part describes the basics of text mining and related topics, the collection and preparation of the underlying text data, and its translation into English. In the next phase, a database is generated from the accumulated data entries. The following step aims to obtain the best-matching results, such as specific phrases. Evaluation and summarization procedures are used at the end of the thesis. Where problems arise, possible solutions or alternatives are suggested.
|
293 |
Získavanie a analýza dát pre oblasť crowdfundingu / Data acquisition and analysis for the crowdfunding domain
Koštial, Martin January 2019
The thesis deals with the acquisition of crowdfunding data and its analysis. The theoretical part describes the available technologies and algorithms for data analysis. The practical part carries out the data collection and applies data mining and text mining algorithms to the collected data.
|
294 |
Analýza textových používateľských hodnotení vybranej skupiny produktov / Analysis of textual user reviews of a selected group of products
Valovič, Roman January 2019
This work focuses on the design of a system that identifies frequently discussed product features in product reviews, summarizes them, and displays them to the user in terms of sentiment. The work deals with natural language processing, with a specific focus on the Czech language. The reader is introduced to methods of text preprocessing and their impact on the quality of the analysis results. The most frequently discussed product features are identified by cluster analysis using the K-Means algorithm, on the assumption that sufficiently internally homogeneous clusters represent individual product features. A new area explored in this work is the representation of documents using word embeddings, and the potential of using the resulting vector space as input for machine learning algorithms.
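The clustering step described above can be sketched in a few lines. This is a minimal illustration of the K-Means iteration, not the thesis's implementation: the two-dimensional points stand in for document vectors (which in the thesis would come from word embeddings), and the initial centroids are passed in explicitly so the run is deterministic. All names and data are assumptions.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vec(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update for a fixed number of rounds.

    Returns the final centroids and the clusters from the last assignment step.
    """
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: sq_dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            mean_vec(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D "document vectors" forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

In a review-analysis setting, each resulting cluster would then be inspected to see whether it corresponds to a single product feature, which is the homogeneity assumption the abstract mentions.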
|
295 |
Ursäkta, vi har lite bråttom : Om automatisering för att effektivisera tillgängliggörandet av affärstryck / Would You Mind Hurrying Up Please : On Automatization as a Means of Improved Efficiency When Cataloging Commercial Ephemera
Hellgren, Andreas January 2019
The demand on research libraries to digitize their collections as a means of increasing the availability of those collections is increasing. However, a prerequisite for this is the cataloging of the collections, a task commonly associated with large demands on time and other resources. One way of handling this might be to apply automatization as a part of the cataloging process. This thesis examines the possibilities of using automatization when cataloging commercial ephemera. For this, focus is directed towards the features of the material; the process of cataloging; and the demands on the catalogued material from its various users, using a theory based on Monica J. Bates' (2002) Cascade model. Through a case study consisting of observations based on contextual inquiry and interviews partly using photo elicitation, automatization of cataloging is found to be a possible way to improve availability, but not without its own complications and demands on resources. In conclusion, suggestions are made for considerations libraries should be aware of before automatization is implemented at research libraries.
|
296 |
Using a Text Mining Approach to Examine Online Learning Research Trends of the Past 20 Years (1997-2016)
Keahey, Heather Lynn 12 1900
The purpose of this research is to identify longitudinal trends relevant to online learning research within 15 highly regarded, peer-reviewed publications in educational technology and online education. Online instruction has become a popular form of education delivery across academic institutions. A review of literature on the topic shows that a trend analysis of online learning research across multiple journals is missing from the corpus. Previous efforts to establish trends in online learning are narrow in focus, using only one journal or a shortened time frame. This metatrend analysis employed text mining techniques to examine twenty years (1997-2016) of published research in an effort to establish past, present and emerging trends within published literature. A general bibliometric analysis is offered, highlighting prolific journals and yearly publication counts. Meaningful trending terms used during the twenty-year period were identified and analyzed. A cluster analysis performed on the extracted data provides a single-layer taxonomy of online learning research. Time trends within the clusters were identified to offer a more in-depth analysis. Trends revealed during the research indicate a changing relationship between online learning and distance education. A strong emphasis on students and learning was noted as a consistent trend throughout the literature. Emerging categories include openness and mobility, game-based learning, and MOOCs. The intention of the research is to offer an overview of trends in online learning research in order to contribute to the ongoing dialogue concerning the development and delivery of online education.
|
297 |
Automatisk synonymgenerering med Word2Vec for query expansion inom e-handel / Automatic synonym generation with Word2Vec for query expansion in e-commerce
Kojic, Kemal, Petersson, Emil January 2018
In this thesis, we examine how well automatic synonym generation with the machine learning method Word2Vec, trained on a Google News data set of one hundred billion words, is suited to query expansion in e-commerce. We use product and event data from a well-known fashion company: synonyms are generated by different methods from search queries logged in the event data, and the resulting thesauri are used in later searches through query expansion. To answer the research questions, a quantitative analysis is performed first, on data such as matched purchases, product hits, no-hits and search time. This data is produced by a search simulator that replays logged events from user sessions in an e-commerce system. The generated thesauri are then filtered by removing synonyms linked to search queries that performed worse in the simulation with synonyms than without. To validate the results of the quantitative analysis, a qualitative analysis is also performed on the differences between the search results produced by the different methods, examining which products the synonyms bring in and how relevant they are. Our tests show that a lower threshold leads to more product hits and fewer no-hits: the number of product hits increased by 4%-10% and the number of no-hits fell by 11%-22%. Where a search query was assigned good synonyms, product relevance improved, since more relevant products appeared in the search result. Where a search query was assigned poorer synonyms, relevance suffered, since some irrelevant products appeared that the user probably does not want to see.
In all tests using automatically generated synonyms, the majority of purchased products appeared in the first half of the search result; however, the number of purchased products in the first position of the search result decreased in all cases.
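The effect of the similarity threshold on query expansion can be shown with a toy example. The sketch below is an illustration only: the three-dimensional vectors are invented stand-ins for Word2Vec embeddings, and the function names are hypothetical. It selects as synonyms the vocabulary words whose cosine similarity to a query term reaches a threshold, then appends them to the query.

```python
import math

# Hypothetical toy vectors; real Word2Vec embeddings have hundreds of dimensions.
EMBEDDINGS = {
    "jacket": (0.9, 0.1, 0.0),
    "coat": (0.8, 0.2, 0.1),
    "blazer": (0.6, 0.5, 0.1),
    "sandal": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def synonyms(term, threshold):
    """Vocabulary words at least `threshold` similar to `term`, best first."""
    if term not in EMBEDDINGS:
        return []
    base = EMBEDDINGS[term]
    scored = [
        (cosine(base, vec), word)
        for word, vec in EMBEDDINGS.items()
        if word != term
    ]
    return [word for score, word in sorted(scored, reverse=True) if score >= threshold]


def expand_query(query, threshold):
    """Return the query terms, each followed by its accepted synonyms."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(synonyms(term, threshold))
    return expanded


print(expand_query("red jacket", threshold=0.9))
print(expand_query("red jacket", threshold=0.8))
```

Lowering the threshold admits more candidate synonyms, which matches the reported trade-off: more product hits and fewer no-hits, at the risk of pulling in less relevant products.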
|
298 |
Semantik und Sentiment: Konzepte, Verfahren und Anwendungen von Text-Mining / Semantics and Sentiment: Concepts, Methods and Applications of Text Mining
Neubauer, Nicolas 06 June 2014
This thesis deals with two subject areas of data mining, or text mining, together with the associated algorithmic methods and concepts, and examines possible application scenarios. On the one hand, it discusses the field of semantic similarity: in short, the question of how to determine algorithmically how much two terms or concepts have to do with each other. Technology built around the knowledge that, for example, "rain" can be a component of "weather" enables a wide variety of applications. The thesis gives an overview of the established literature, roughly divides the research field into the two schools of knowledge-based and statistical methods, and contributes to each by examining existing similarity measures and presenting new ones. A study with human participants, and a data set derived from it, finally provides insight into people's preferences regarding their perception of similarity. On the other hand, there is the field of sentiment mining, which attempts to algorithmically identify and classify moods and opinions in large collections of unstructured text, such as messages from Twitter or other social networks. After a discussion of the related literature, the construction of a new test data set is motivated and the results of collecting it are described. On this new basis, a large number of approaches and classification methods are evaluated in detail. Finally, the practical usability of the results is demonstrated in various application scenarios involving product presentations as well as media or public events such as the German federal election (Bundestagswahl).
|
299 |
Contributions to Computational Methods for Association Extraction from Biomedical Data: Applications to Text Mining and In Silico Toxicology
Raies, Arwa B. 29 November 2018
The task of association extraction involves identifying links between different entities. Here, we make contributions to two applications related to the biomedical field. The first application is in the domain of text mining and aims at extracting associations between methylated genes and diseases from biomedical literature. Gathering such associations can benefit disease diagnosis and treatment decisions. We developed the DDMGD database to provide a comprehensive repository of information related to genes methylated in diseases, gene expression, and disease progression. Using DEMGD, a text mining system that we developed, and with additional post-processing, we extracted ~100,000 such associations from free text. The accuracy of extracted associations is 82% as estimated on 2,500 hand-curated entries. The second application is in the domain of computational toxicology, which aims at identifying relationships between chemical compounds and toxicity effects. Identifying toxicity effects of chemicals is a necessary step in many processes including drug design. To extract these associations, we propose using multi-label classification (MLC) methods. These methods have not undergone comprehensive benchmarking in the domain of predictive toxicology that could help in identifying guidelines for overcoming the existing deficiencies of these methods. Therefore, we performed extensive benchmarking and analysis of ~19,000 MLC models. We demonstrated variability in the performance of these models under several conditions and determined the best performing model, which achieves accuracy of 91% on an independent testing set. Finally, we propose a novel framework, LDR (learning from dense regions), for developing MLC and multi-target regression (MTR) models from datasets with missing labels. The framework is generic, so it can be applied to predict associations between samples and discrete or continuous labels. Our assessment shows that LDR performed better than the baseline approach (i.e., the binary relevance algorithm) when evaluated using four MLC and five MTR datasets. LDR achieved accuracy scores of up to 97% using testing MLC datasets, and R2 scores up to 88% for testing MTR datasets. Additionally, we developed a novel method for minority oversampling to tackle the problem of imbalanced MLC datasets. Our method improved the precision score of LDR by 10%.
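The baseline named in this abstract, binary relevance, treats a multi-label problem as one independent binary problem per label. The sketch below illustrates that decomposition only; it is not the dissertation's code or its LDR framework. The nearest-centroid base classifier, the class and function names, and the toy data are all assumptions chosen to keep the example self-contained.

```python
def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class NearestCentroid:
    """Tiny binary classifier: predict the class whose centroid is closer."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.pos = centroid(pos) if pos else None
        self.neg = centroid(neg) if neg else None
        return self

    def predict(self, x):
        if self.pos is None:  # label never seen as positive in training
            return 0
        if self.neg is None:  # label never seen as negative in training
            return 1
        return 1 if sq_dist(x, self.pos) < sq_dist(x, self.neg) else 0


def fit_binary_relevance(X, Y):
    """Train one independent binary model per label column of Y."""
    n_labels = len(Y[0])
    return [
        NearestCentroid().fit(X, [row[j] for row in Y]) for j in range(n_labels)
    ]


def predict_labels(models, x):
    """Predict every label for one sample by querying each per-label model."""
    return [m.predict(x) for m in models]


# Hypothetical data: two descriptors per compound, two toxicity labels.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
models = fit_binary_relevance(X, Y)
print(predict_labels(models, [0.9, 0.2]))
```

Because each label is modelled separately, binary relevance ignores any dependence between labels, which is one reason it is commonly used as a baseline against which other multi-label methods are compared.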
|
300 |
Complexity penalized methods for structured and unstructured data
Goeva, Aleksandrina 08 November 2017
A fundamental goal of statisticians is to make inferences from the sample about characteristics of the underlying population. This is an inverse problem, since we are trying to recover a feature of the input with the availability of observations on an output. Towards this end, we consider complexity penalized methods, because they balance goodness of fit and generalizability of the solution. The data from the underlying population may come in diverse formats - structured or unstructured - such as probability distributions, text tokens, or graph characteristics. Depending on the defining features of the problem we can choose the appropriate complexity penalized approach, and assess the quality of the estimate produced by it. Favorable characteristics are strong theoretical guarantees of closeness to the true value and interpretability. Our work fits within this framework and spans the areas of simulation optimization, text mining and network inference. The first problem we consider is model calibration under the assumption that given a hypothesized input model, we can use stochastic simulation to obtain its corresponding output observations. We formulate it as a stochastic program by maximizing the entropy of the input distribution subject to moment matching. We then propose an iterative scheme via simulation to approximately solve it. We prove convergence of the proposed algorithm under appropriate conditions and demonstrate the performance via numerical studies. The second problem we consider is summarizing text documents through an inferred set of topics. We propose a frequentist reformulation of a Bayesian regularization scheme. Through our complexity-penalized perspective we lend further insight into the nature of the loss function and the regularization achieved through the priors in the Bayesian formulation. The third problem is concerned with the impact of sampling on the degree distribution of a network. 
Under many sampling designs, we have a linear inverse problem characterized by an ill-conditioned matrix. We investigate the theoretical properties of an approximate solution for the degree distribution found by regularizing the solution of the ill-conditioned least squares objective. Particularly, we study the rate at which the penalized solution tends to the true value as a function of network size and sampling rate.
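The third problem can be written down compactly. The formulation below is a generic sketch under assumed notation, not the dissertation's exact objective: $x$ denotes the true degree counts, $y$ the degree counts observed in the sample, $P$ the matrix induced by the sampling design, and $\lambda \ge 0$ a tuning parameter. The quadratic (ridge) penalty is chosen here only because it gives a closed form; the abstract does not state which penalty is used.

```latex
% Linear inverse problem: the expected observed degree counts are a linear
% transformation of the true degree counts.
\mathbb{E}[y] = P x

% Penalized least squares estimate (generic penalty):
\hat{x}_{\lambda} = \arg\min_{x} \; \lVert P x - y \rVert_2^2 + \lambda \, \mathrm{pen}(x)

% Special case pen(x) = \lVert x \rVert_2^2 (ridge / Tikhonov), closed form:
\hat{x}_{\lambda} = \left( P^{\top} P + \lambda I \right)^{-1} P^{\top} y
```

When $P$ is ill-conditioned, the unpenalized solution ($\lambda = 0$) amplifies noise in $y$; adding $\lambda I$ bounds the smallest eigenvalue of the matrix being inverted from below by $\lambda$, which stabilizes the estimate at the cost of some bias.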
|