Global ETD Search

1	Identifying New Customers Using Machine Learning : A case study on B2B-sales in the Swedish IT-consulting sector / Identifiering av nya kunder med hjälp av maskininlärning : En fallstudie om B2B-försäljning i den svenska IT-konsultsektorn Norlin, Patrik, Paulsrud, Viktor January 2017 (has links) In this thesis, we examine machine learning as a tool for predicting new cus- tomers in a B2B-sales context. Using only publicly available information, we try to solve the problem using two different approaches: 1) a naive clustering based classifier built on K-means and 2) PU-learning with a random forests- adapter. We test these models with different sets of features and evaluate them using statistical measures and a discussion of the business implications. Our main findings conclude that the PU-learning could produce results that are satisfactorily for the purpose of improving the sales process, with the best case of being 4.8 times better than a random baseline classifier. However, the clustering based classifier was not good enough, producing only marginally better results than a random classifier in its best case. We also find that us- ing more variables improved the models, even in high-dimensional spaces with over 60 variables. Machine Learning B2B Industrial Marketing PU-learning Computer Sciences Datavetenskap (datalogi)
2	Detección de opinion spam usando PU-learning Hernández Fusilier, Donato 20 July 2016 (has links) Tesis por compendio / [EN] Abstract The detection of false or true opinions about a product or service has become nowadays a very important problem. Recent studies show that up to 80% of people have changed their final decision on the basis of opinions checked on the web. Some of these opinions may be false, positive in order to promote a product/service or negative to discredit it. To help solving this problem in this thesis is proposed a new method for detection of false opinions, called PU-Learning, which increases the precision by an iterative algorithm. It also solves the problem of lack of labeled opinions. To operate the method proposed only a small set of opinions labeled as positive and another large set of opinions unlabeled are needed. From this last set, missing negative opinions are extracted and used to achieve a two classes binary classification. This scenario has become a very common situation in the available corpora. As a second contribution, we propose a representation based on n-grams of characters. This representation has the advantage of capturing both the content and the writing style, allowing for improving the effectiveness of the proposed method for the detection of false opinions. The experimental evaluation of the method was carried out by conducting three experiments classification of opinions, using two different collections. The results obtained in each experiment allow seeing the effectiveness of proposed method as well as differences between the use of several types of attributes. Because the veracity or falsity of the reviews expressed by users becomes a very important parameter in decision making, the method presented here, can be used in any corpus where you have the above characteristics. / [ES] Resumen La detección de opiniones falsas o verdaderas acerca de un producto o servicio, se ha convertido en un problema muy relevante de nuestra 'época. Según estudios recientes hasta el 80% de las personas han cambiado su decisión final basados en las opiniones revisadas en la web. Algunas de estas opiniones pueden ser falsas positivas, con la finalidad de promover un producto, o falsas negativas para desacreditarlo. Para ayudar a resolver este problema se propone en esta tesis un nuevo método para la detección de opiniones falsas, llamado PU-Learning modificado. Este método aumenta la precisión mediante un algoritmo iterativo y resuelve el problema de la falta de opiniones etiquetadas. Para el funcionamiento del método propuesto se utilizan un conjunto pequeño de opiniones etiquetadas como falsas y otro conjunto grande de opiniones no etiquetadas, del cual se extraen las opiniones faltantes y así lograr una clasificación de dos clases. Este tipo de escenario se ha convertido en una situación muy común en los corpus de opiniones disponibles. Como una segunda contribución se propone una representación basada en n-gramas de caracteres. Esta representación tiene la ventaja de capturar tanto elementos de contenido como del estilo de escritura, permitiendo con ello mejorar la efectividad del método propuesto en la detección de opiniones falsas. La evaluación experimental del método se llevó a cabo mediante tres experimentos de clasificación de opiniones utilizando dos colecciones diferentes. Los resultados obtenidos en cada experimento permiten ver la efectividad del método propuesto así como también las diferencias entre la utilización de varios tipos de atributos. Dado que la falsedad o veracidad de las opiniones vertidas por los usuarios, se convierte en un parámetro muy importante en la toma de decisiones, el método que aquí se presenta, puede ser utilizado en cualquier corpus donde se tengan las características mencionadas antes. / [CA] Resum La detecció d'opinions falses o vertaderes al voltant d'un producte o servei s'ha convertit en un problema força rellevant de la nostra època. Segons estudis recents, fins el 80\% de les persones han canviat la seua decisió final en base a les opinions revisades en la web. Algunes d'aquestes opinions poden ser falses positives, amb la finalitat de promoure un producte, o falses negatives per tal de desacreditarlo. Per a ajudar a resoldre aquest problema es proposa en aquesta tesi un nou mètode de detecció d'opinions falses, anomenat PU-Learning. Aquest mètode augmenta la precisió mitjançant un algoritme iteratiu i resol el problema de la falta d'opinions etiquetades. Per al funcionament del mètode proposat, s'utilitzen un conjunt reduït d'opinions etiquetades com a falses i un altre conjunt gran d'opinions no etiquetades, del qual se n'extrauen les opinions que faltaven i, així, aconseguir una classificació de dues classes. Aquest tipus d'escenari s'ha convertit en una situació molt comuna en els corpus d'opinions de què es disposa. Com una segona contribució es proposa una representació basada en n-gramas de caràcters. Aquesta representació té l'avantatge de capturar tant elements de contingut com a d'estil d'escriptura, permetent amb això millorar l'efectivitat del mètode proposat en la detecció d'opinions falses. L'avaluació experimental del mètode es va dur a terme mitjançant tres experiments de classificació d'opinions utilitzant dues coleccions diferents. Els resultats obtingut en cada experiment permeten veure l'efectivitat del mètode proposat, així com també les diferències entre la utilització de varis tipus d'atributs. Ja que la falsedat o veracitat de les opinions vessades pels usuaris es converteix en un paràmetre molt important en la presa de decisions, el mètode que ací es presenta pot ser utilitzat en qualsevol corpus on es troben les característiques abans esmentades. / Hernández Fusilier, D. (2016). Detección de opinion spam usando PU-learning [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/61990 / Compendio Opinión spam PU-Learning Opiniones falsas y verdaderas Opiniones favorables y desfavorables Minería de opiniones N-gramas de palabras N-gramas de caracteres LENGUAJES Y SISTEMAS INFORMATICOS
3	Clustering and Anomaly detection using Medical Enterprise system Logs (CAMEL) / Klustring av och anomalidetektering på systemloggar Ahlinder, Henrik, Kylesten, Tiger January 2023 (has links) Research on automated anomaly detection in complex systems by using log files has been on an upswing with the introduction of new deep-learning natural language processing methods. However, manually identifying and labelling anomalous logs is time-consuming, error-prone, and labor-intensive. This thesis instead uses an existing state-of-the-art method which learns from PU data as a baseline and evaluates three extensions to it. The first extension provides insight into the performance of the choice of word em-beddings on the downstream task. The second extension applies a re-labelling strategy to reduce problems from pseudo-labelling. The final extension removes the need for pseudo-labelling by applying a state-of-the-art loss function from the field of PU learning. The findings show that FastText and GloVe embeddings are viable options, with FastText providing faster training times but mixed results in terms of performance. It is shown that several of the methods studied in this thesis suffer from sporadically poor performances on one of the datasets studied. Finally, it is shown that using modified risk functions from the field of PU learning provides new state-of-the-art performances on the datasets considered in this thesis. Natural Language processing NLP Anomaly detection log anomaly detection Positive-Unlabelled learning Positive Unlabelled learning PULearning PU Learning PU nnPU CAMEL clustering

1

Page generated in 0.3686 seconds