Global ETD Search

1	AI Approaches for Classification and Attribute Extraction in Text Magnusson, Ludvig, Rovala, Johan January 2017 As the amount of data online grows, so does the urge to use this data for different applications. Machine learning can be used to reconstruct and validate the data of interest. Although the problem is very domain specific, this report attempts to shed some light on what we call strategies for classification, which in broad terms means a set of steps in a process whose end goal is to have classified some part of the original data. As a result, we hope to bring clarity to the classification process both in detail and from a broader perspective. The report investigates two classification objectives, one that depends on many variables found in the input data and one that is more literal and depends on only one or two variables. Specifically, the data we classify are sales-objects. Each sales-object has a text describing the object and a related image. We attempt to place these sales-objects into the correct product category. We also try to derive the year of creation and the object's dimensions, such as height and width. Different approaches are presented in the aforementioned strategies in order to classify such attributes. The results showed that for broader attributes such as product category, supervised learning is indeed an appropriate approach, while the same cannot be said for narrower attributes, which instead had to rely on entity recognition. Experiments on image analytics in conjunction with supervised learning showed image analytics to be a good addition when a higher precision score is required. text classification feature extraction machine learning scikit Software Engineering Programvaruteknik
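As a rough illustration of how "narrower" attributes such as the year of creation and the dimensions could be pulled out of a description text, here is a minimal pattern-matching sketch. It is a stand-in for the entity recognition the abstract mentions, not the authors' method: the patterns, the function name `extract_attributes` and the sample description are all assumptions made for this example.

```python
import re

# Hypothetical patterns, not taken from the thesis: a four-digit year between
# 1000 and 2099, and dimensions written as "<height> x <width> <unit>".
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")
DIM_RE = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*[x×]\s*(\d+(?:[.,]\d+)?)\s*(mm|cm|m)\b",
    re.IGNORECASE,
)


def extract_attributes(description):
    """Return the first year and the first height/width pair found, else None."""
    year_match = YEAR_RE.search(description)
    dim_match = DIM_RE.search(description)

    year = int(year_match.group(1)) if year_match else None
    dimensions = None
    if dim_match:
        # Accept both decimal comma and decimal point.
        height = float(dim_match.group(1).replace(",", "."))
        width = float(dim_match.group(2).replace(",", "."))
        dimensions = (height, width, dim_match.group(3).lower())
    return {"year": year, "dimensions": dimensions}


# Made-up sales-object description.
example = extract_attributes("Oil on canvas, signed and dated 1923, 45,5 x 60 cm.")
```

A pattern like this is brittle: a description such as "1200 x 800 mm" would also trigger the year pattern, which is one reason a real system would need more context than a single regular expression.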
2	Comparative analysis for filtering toxic messages using machine learning models / Jämförande analys för filtrering av olämpliga meddelanden med maskininlärningsmodeller Murman, Mats-Hjalmar, Lundin, Jacob January 2022 Online communication has become prevalent in today's society. The issue with online platforms is that people can express whatever they want without repercussion. Consequently, toxicity on these platforms becomes common. One approach to limiting such inappropriate messages is a filtering method. The thesis discusses how to create a toxicity filter using machine learning, along with an API that filters messages using the resulting models. The study also analyses which models perform best in terms of three metrics: accuracy, precision and recall. The results indicate that KNN performed best when predicting multiple variables, while SVC and Logistic Regression worked best on a single variable. Machine learning is thus a viable method for filtering toxic messages. Machine learning Regression Classifier Toxicity filter API Tensorflow Scikit-learn
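The three metrics named in the abstract can be computed from a list of true labels and a list of predictions. The sketch below shows the standard definitions for a single binary label (toxic or not). The function name `binary_metrics` and the label vectors are invented for illustration and are not results from the thesis.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for one binary label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against empty input and empty denominators.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Made-up labels: 1 = toxic, 0 = not toxic.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(labels, predictions)
```

For a filter, precision and recall pull in different directions: high precision means few harmless messages are blocked, high recall means few toxic messages slip through, so comparing models on accuracy alone can hide the trade-off.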
3	Creating a Back Stock to Increase Order Delivery and Pickup Availability / Framtagning av ett baklager för att öka tillgängligheten av leverans och upphämtning av ordrar Nguyen, John, Lindén, Kasper January 2019 Apotek Hjärtat wants to keep developing its e-commerce website and improve retrieval and delivery of orders to customers. Click and Collect and Click and Express are two options for retrieving e-commerce orders that are available only if all products in the order are present in the store. By implementing a back stock of popular e-commerce items in the stores, all products of an order will more often be present in the store. The back stock will thereby increase the availability of Click and Collect and Click and Express. The goals of the study are to conduct a pilot study, to compare methods and possible solutions, and to implement a model that reaches these goals. The pilot study reviewed previous work on mathematical statistics methods and machine learning methods. The statistical method was carried out with the analytical tool Statistical Package for the Social Sciences (SPSS) and Java. The machine learning method was carried out with Python and the Scikit-learn library, using a regression algorithm to find relations between category sales and pollen forecasts. The statistical and machine learning methods were compared with each other. Both gave identical results, but the machine learning method was more functional and easier to develop further and was consequently chosen. Several models were created for a few selected product categories. For the categories where the models did not work, the predicted amounts of sold products were unrealistic; these amounts could be negative or extremely high when unknown inputs were introduced. A simulation of the back stock was made to estimate how much it would increase the availability of Click and Collect/Click and Express.
The machine learning models would need more data for more accurate predictions. It can nevertheless be concluded that it is possible to predict the amount of sold products in certain categories, such as Allergy and Child Medicine, when the pollen level is taken into account. e-commerce back stock statistics supervised machine learning linear regression Scikit-learn Computer Engineering Datorteknik
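The abstract's remark that predicted sales "could be negative" for unknown inputs is a general property of a fitted straight line, which can be seen in a small least-squares sketch. The thesis used Scikit-learn; the pure-Python fit below is only a stand-in, the pollen and sales numbers are made up, and the `clip_at_zero` option is an assumption of this example rather than something the authors describe.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x, clip_at_zero=False):
    """Evaluate the line; optionally clip because sales cannot be negative."""
    slope, intercept = model
    value = slope * x + intercept
    return max(value, 0.0) if clip_at_zero else value


# Made-up training data: pollen level -> units sold in one category.
pollen = [30.0, 40.0, 50.0, 60.0]
units_sold = [45.0, 65.0, 85.0, 105.0]
model = fit_line(pollen, units_sold)

# A pollen level far below anything seen in training yields a negative forecast.
raw_forecast = predict(model, 5.0)
clipped_forecast = predict(model, 5.0, clip_at_zero=True)
```

Clipping only hides the symptom. The underlying issue is extrapolation outside the range of the training data, which is consistent with the abstract's conclusion that more data would be needed.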
4	Сбор и анализ данных из открытых источников для разработки рекомендательной системы в сфере туризма : магистерская диссертация / Collection and analysis of data from open sources to develop a recommendation system in the field of tourism Крайнов, А. И., Krainov, A. I. January 2023 This thesis aimed to develop an effective recommendation system for tourist attractions based on graphs and machine learning algorithms. The main task was to create a system that can analyze a large set of tourist attraction data extracted from Wikipedia. The work draws on Wikipedia dumps containing information on millions of articles and begins with a review of existing recommender systems and machine learning methods used to provide recommendations in the field of tourism. Specific categories of tourist attractions were then selected and used to build recommendation models. To process and analyze the Wikipedia data, a modern technical stack was used, including Python, the networkx and pandas libraries for working with graphs and data, and the scikit-learn library for applying machine learning algorithms. In addition, the Streamlit framework was used to develop an interactive web interface. The work process included collecting and preprocessing data from Wikipedia, including information about attractions, the connections between them and their characteristics. Selected machine learning algorithms were applied to create a data graph from the downloaded and processed data. The PageRank algorithm was used to determine the importance of each attraction in the graph and to generate personalized recommendations.
The demo user interface, built with the Streamlit framework, allows users to interact with the system, enter queries about places and receive personalized recommendations. A drop-down list selects the attraction for which recommendations are wanted, and a slider adjusts the number of recommendations. MASTER'S THESIS WIKIPEDIA PAGERANK PYTHON WEB INTERFACE NETWORKX PANDAS SCIKIT-LEARN FRAMEWORK STREAMLIT
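One common way to turn PageRank into per-item recommendations is to restart the random walk at the selected attraction (personalized PageRank) and return the highest-ranked other nodes. Whether the thesis does exactly this is not stated in the abstract, so the following is a sketch under that assumption. It is written in plain Python rather than with networkx, and the function names, the parameter defaults and the tiny link graph are all invented for illustration.

```python
def personalized_pagerank(graph, seed, damping=0.85, iterations=100):
    """Power iteration for PageRank with all restart mass on `seed`.

    `graph` maps each node to the list of nodes it links to.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if seed not in nodes:
        raise KeyError(seed)

    restart = {n: (1.0 if n == seed else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: send its mass back to the seed.
                new_rank[seed] += damping * rank[node]
        rank = new_rank
    return rank


def recommend(graph, seed, k=3):
    """Top-k nodes other than the seed, ties broken alphabetically."""
    rank = personalized_pagerank(graph, seed)
    candidates = [n for n in rank if n != seed]
    candidates.sort(key=lambda n: (-rank[n], n))
    return candidates[:k]


# Made-up link structure between attraction articles.
links = {
    "Louvre": ["Orsay Museum", "Eiffel Tower"],
    "Orsay Museum": ["Louvre"],
    "Eiffel Tower": ["Louvre", "Arc de Triomphe"],
    "Arc de Triomphe": ["Eiffel Tower"],
    "Colosseum": ["Roman Forum"],
    "Roman Forum": ["Colosseum"],
}
top_two = recommend(links, "Louvre", k=2)
ranks = personalized_pagerank(links, "Louvre")
```

In this toy graph the `k` argument plays the role of the slider in the demo interface and `seed` the role of the drop-down selection. Attractions that cannot be reached from the seed, such as the two Rome nodes here, receive a rank of zero and are never recommended ahead of reachable ones.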
5	Návrh systému pro doporučování pracovních příležitostí / Design of a system for recommending job opportunities Paulavets, Anastasiya January 2014 This thesis deals with recommender systems in the field of e-recruitment. The main objective is to design a job recommender system for the career portal UNIjobs.cz. First, the theoretical background of recommender systems is provided. The following part discusses specific properties of job recommender systems, as well as existing approaches to recommendation in the e-recruitment environment. The last part of the thesis is dedicated to designing a recommender system for the career portal UNIjobs.cz. The output of that part is the main contribution of the thesis.
6	Bioinformatický nástroj pro klasifikaci bakterií do taxonomických kategorií na základě sekvence genu 16S rRNA / Bioinformatic Tool for Classification of Bacteria into Taxonomic Categories Based on the Sequence of 16S rRNA Gene Valešová, Nikola January 2019 This thesis deals with the automated classification and recognition of bacteria after their DNA has been obtained by sequencing. A new classification method based on the 16S rRNA segment is proposed and described. The presented principle follows the tree structure of taxonomic categories and uses well-known machine learning algorithms to classify bacteria into one of the classes at the lower taxonomic level. The thesis also includes an implementation of the described algorithm and an evaluation of its prediction accuracy. The classification accuracy of various classifier types and their settings is examined, and the setting that achieves the best results is determined. The accuracy of the implemented algorithm is also compared with several existing methods. During validation, the implemented application KTC achieved more than 45% accuracy in genus prediction on both the BLAST 16S and BLAST V4 datasets. Finally, several possible improvements and extensions of the current implementation are mentioned.
7	Analýza sociálních sítí využitím metod rozpoznání vzoru / Social Network Analysis using methods of pattern recognition Križan, Viliam January 2015 This master's thesis deals with recognizing emotions from text in social networks. It describes current feature extraction methods and the lexicons, corpora and classifiers in use. Emotions were recognized by a classifier trained on annotated data from the microblogging network Twitter. An advantage of using Twitter was the geographic delimitation of the data, which makes it possible to track changes in the population's emotions in different cities. The first classification approach was a baseline algorithm that used a simple lexicon. To improve classification, a more complex SVM classifier was used in the second step. The SVM classifiers and the feature extraction and selection were taken from the available Python library Scikit. The training data were collected from the USA with the help of a purpose-built application. The classifier was trained on data labeled at collection time, without manual annotation. Two different implementations of SVM classifiers were used. The resulting classified emotions, across different cities and days, were displayed as colored markers on a map.
8	Near Real-time Detection of Masquerade attacks in Web applications : catching imposters using their browsing behavior Panopoulos, Vasileios January 2016 This thesis details research on the Machine Learning techniques that are central to Anomaly and Masquerade attack detection. The main focus is on Web Applications because of their immense popularity and ubiquity. This popularity has led to an increase in attacks, making them the most targeted entry point for violating a system. Specifically, a group of attacks ranging from identity theft using social engineering to cross-site scripting aims at exploiting and masquerading as users. Masquerading attacks are even harder to detect because of their resemblance to normal sessions, which poses an additional burden. Concerning prevention, the diversity and complexity of those systems make it harder to define reliable protection mechanisms. Additionally, new and emerging attack patterns make manually configured and signature-based systems less effective, since they must be continuously updated with new rules and signatures; left unmanaged, they eventually become obsolete. Finally, the huge amount of traffic makes manual inspection of attacks and false alarms an impossible task. To tackle those issues, Anomaly Detection systems using powerful and proven Machine Learning algorithms are proposed. Within the context of Anomaly Detection and Machine Learning, the thesis first defines several basic concepts such as user behavior, normality, and normal and anomalous behavior. Those definitions set the context in which the proposed method operates and establish the theoretical premises. To ease the transition into the implementation phase, the underlying methodology is also explained in detail.
The implementation is then presented: starting from server logs, a method is described for pre-processing the data into a form suitable for classification. This pre-processing phase was built from several statistical analyses and normalization methods (Univariate Selection, ANOVA) to clean and transform the given logs and perform feature selection. Furthermore, given that the proposed detection method is based on the source and request URLs, a method of aggregation is proposed to limit user privacy and classifier over-fitting issues. Subsequently, two popular classification algorithms (Multinomial Naive Bayes and Support Vector Machines) are tested and compared to determine which one performs better in the given situations. Each of the implementation steps (pre-processing and classification) requires a number of different parameters to be set, and thus a method called Hyper-parameter optimization is defined. This method searches for the parameters that improve the classification results. The training and testing methodology is also outlined alongside the experimental setup. Hyper-parameter optimization and training are the most computationally intensive steps, especially given a large number of samples/users. To overcome this obstacle, a scaling methodology is also defined and evaluated to demonstrate its ability to handle larger data sets. To complete this framework, several other options have also been evaluated and compared with each other to challenge the method and implementation decisions. Examples are the "Transitions-vs-Pages" dilemma, the block restriction effect, the DR usefulness and the optimization of the classification parameters. Moreover, a Survivability Analysis is performed to demonstrate how the produced alarms could be correlated, affecting the resulting detection rates and interval times. The implementation of the proposed detection method and the outlined experimental setup lead to interesting results.
The data set used to produce this evaluation is also provided online to promote further investigation and research in this field. Naive Bayes SVM Support Vector Machines Machine Learning IDS Intrusion Detection System Web Application scikit-learn Elektroteknik och elektronik Communication Systems Kommunikationssystem
9	Automated analysis of battery articles Haglund, Robin January 2020 Journal articles are the formal medium for communicating results among scientists and often contain valuable data. However, manually collecting article data from a large field like lithium-ion battery chemistry is tedious and time-consuming, which is an obstacle when searching for statistical trends and correlations to inform research decisions. To address this, a platform for the automatic retrieval and analysis of large numbers of articles is created and applied to the field of lithium-ion battery chemistry. Example data produced by the platform are presented and evaluated, and sources of error limiting this type of platform are identified, with problems related to text extraction and pattern matching being especially significant. Some solutions to these problems are presented and potential future improvements are proposed. battery chemistry chemical engineering chemistry engineering battery li-ion lithium-ion batteries LFP lithium iron phosphate machine learning regular expressions classification SVM keras scikit-learn Chemical Engineering Kemiteknik
10	Data Engineering and Failure Prediction for Hard Drive S.M.A.R.T. Data Ramanayaka Mudiyanselage, Asanga 08 September 2020 No description available. Computer Science Machine Learning Data Engineering Python Data Analysis Big Data Predictive Analytics Feature Selection Resampling Techniques Hard Drive Failure Prediction SMART Attributes Scikit-Learn PySpark
