Global ETD Search

1	Сбор и анализ данных из открытых источников для разработки рекомендательной системы в сфере туризма : магистерская диссертация / Collection and analysis of data from open sources to develop a recommendation system in the field of tourism Крайнов, А. И., Krainov, A. I. January 2023 (has links) В данной дипломной работе была поставлена цель разработки эффективной рекомендательной системы для туристических достопримечательностей на основе графов и алгоритмов машинного обучения. Основная задача состояла в создании системы, которая может анализировать обширный набор данных о туристических достопримечательностях, извлекаемых из Википедии. Используя дампы Википедии, содержащие информацию о миллионах статей, был выполнен обзор существующих рекомендательных систем и методов машинного обучения, применяемых для предоставления рекомендаций в области туризма. Затем были выбраны определенные категории туристических достопримечательностей, которые были использованы для построения моделей рекомендаций. Для обработки и анализа данных из Википедии был использован современный технический стек инструментов, включающий Python, библиотеки networkx и pandas для работы с графами и данными, а также библиотеку scikit-learn для применения алгоритмов машинного обучения. Кроме того, для разработки интерактивного веб-интерфейса был использован фреймворк Streamlit. Процесс работы включал сбор и предварительную обработку данных из Википедии, включая информацию о достопримечательностях, связях между ними и характеристиках. Для создания графа данных на основе загруженных и обработанных данных были применены выбранные алгоритмы машинного обучения. Алгоритм PageRank был использован для определения важности каждой достопримечательности в графе и формирования персонализированных рекомендаций. Демонстрационный пользовательский интерфейс, разработанный на основе фреймворка Streamlit, позволяет пользователям взаимодействовать с системой, вводить запросы о местах и получать персонализированные рекомендации. С помощью выпадающего списка можно выбрать конкретную достопримечательность, к которой требуется получить рекомендации, а с помощью ползунка можно настроить количество рекомендаций. / This thesis aimed to develop an effective recommendation system for tourist attractions based on graphs and machine learning algorithms. The main challenge was to create a system that can analyze a large set of tourist attraction data extracted from Wikipedia. Using Wikipedia dumps containing information on millions of articles, a review of existing recommender systems and machine learning methods used to provide recommendations in the field of tourism was performed. Specific categories of tourist attractions were then selected and used to build recommendation models. To process and analyze data from Wikipedia, a modern technical stack of tools was used, including Python, the networkx and pandas libraries for working with graphs and data, as well as the scikit-learn library for applying machine learning algorithms. In addition, the Streamlit framework was used to develop an interactive web interface. The work process included the collection and preliminary processing of data from Wikipedia, including information about attractions, connections between them and characteristics. Selected machine learning algorithms were applied to create a data graph based on the downloaded and processed data. The PageRank algorithm was used to determine the importance of each point of interest in the graph and generate personalized recommendations. The demo user interface, developed using the Streamlit framework, allows users to interact with the system, enter queries about places and receive personalized recommendations. Using the drop-down list, you can select a specific attraction for which you want to receive recommendations, and using the slider, you can adjust the number of recommendations. ВИКИПЕДИЯ PAGERANK PYTHON ВЕБ-ИНТЕРФЕЙС NETWORKX PANDAS MASTER'S THESIS WIKIPEDIA PAGERANK PYTHON WEB INTERFACE NETWORKX PANDAS SCIKIT-LEARN FRAMEWORK STREAMLIT
2	Unsupervised anomaly detection for structured data - Finding similarities between retail products Fockstedt, Jonas, Krcic, Ema January 2021 (has links) Data is one of the most contributing factors for modern business operations. Having bad data could therefore lead to tremendous losses, both financially and for customer experience. This thesis seeks to find anomalies in real-world, complex, structured data, causing an international enterprise to miss out on income and the potential loss of customers. By using graph theory and similarity analysis, the findings suggest that certain countries contribute to the discrepancies more than other countries. This is believed to be an effect of countries customizing their products to match the market’s needs. This thesis is just scratching the surface of the analysis of the data, and the number of opportunities for future work are therefore many. relational data similarity analysis data analysis SQL NetworkX graph theory anomaly detection unsupervised retail products real-world data AWS amazon web services similarity learning data statistics data preprocessing similarity analysis algorithm data validation Computer Sciences Datavetenskap (datalogi)

Search results

Unsupervised anomaly detection for structured data - Finding similarities between retail products