21

Analýza postojů českých uživatelů k obchodním řetězcům na základě dat ze sociálních sítí a webových diskusí / Sentiment Analysis of Czech Social Networks and Web Discussions on Retail Chains

Bolješik, Michal January 2017 (has links)
The goal of this thesis is to design and implement a system that analyses web data mentioning Czech grocery chain stores. The implemented system is able to download such data automatically, perform sentiment analysis on it, extract locations and chain-store names, and index the data. The system also includes a user interface showing the results of the analyses. The first part of the thesis surveys the state of the art in collecting data from the web, sentiment analysis, and document indexing. A description of the system's design and implementation follows. The last part of the thesis evaluates the implemented system.
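The pipeline sketched in the abstract (download web data, score its sentiment, spot chain-store mentions) can be illustrated with a minimal Python sketch. It is not the thesis author's code: the URL, the toy English sentiment lexicon, and the chain-store list are assumptions for illustration; a real system for Czech discussions would use a Czech lexicon and a trained classifier.

```python
# Minimal sketch of a scrape-then-score pipeline in the spirit of the system above.
# The URL, the tiny sentiment lexicon and the chain-store list are illustrative
# assumptions, not material from the thesis.
import re
import requests
from bs4 import BeautifulSoup

CHAINS = ["Albert", "Lidl", "Kaufland", "Tesco", "Billa"]      # assumed store names
POSITIVE = {"good", "great", "cheap", "fresh"}                 # toy lexicon
NEGATIVE = {"bad", "expensive", "dirty", "rude"}

def fetch_text(url: str) -> str:
    """Download a page and return its visible text."""
    html = requests.get(url, timeout=10).text
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)

def analyse(text: str) -> dict:
    """Score sentiment with a word-count heuristic and find chain-store mentions."""
    words = re.findall(r"\w+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    mentions = [c for c in CHAINS if c.lower() in text.lower()]
    return {"sentiment": score, "chains": mentions}

if __name__ == "__main__":
    page = fetch_text("https://example.com/discussion")        # placeholder URL
    print(analyse(page))
```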
22

Systém pro integraci webových datových zdrojů / System for Web Data Source Integration

Kolečkář, David January 2020 (has links)
The thesis aims at designing and implementing a web application for the integration of web data sources. For data integration, a method using the domain model of the target information system was applied. The work describes the individual methods used for extracting information from web pages. The text describes the process of designing the system architecture, including a description of the chosen technologies and tools. The main part of the work is the implementation and testing of the final web application, which is written in Java and the Angular framework. The outcome of the work is a web application that allows its users to define web data sources and save data in the target database.
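The core idea, declaring a web data source and mapping its extracted fields onto a target domain model, can be sketched as follows. This is a hedged illustration, not the thesis implementation (which is in Java and Angular): the selectors, the model fields, and the SQLite schema are invented for the example.

```python
# Hedged sketch of configuration-driven integration: a source is declared as a
# mapping from CSS selectors to domain-model fields, and extracted records are
# stored in a local table. All names here are assumptions for illustration.
import sqlite3
from dataclasses import dataclass
import requests
from bs4 import BeautifulSoup

@dataclass
class SourceConfig:
    url: str
    item_selector: str          # selector matching one record on the page
    field_selectors: dict       # domain-model field -> selector inside a record

def extract(cfg: SourceConfig) -> list[dict]:
    soup = BeautifulSoup(requests.get(cfg.url, timeout=10).text, "html.parser")
    records = []
    for item in soup.select(cfg.item_selector):
        record = {}
        for field, selector in cfg.field_selectors.items():
            node = item.select_one(selector)
            record[field] = node.get_text(strip=True) if node else None
        records.append(record)
    return records

def save(records: list[dict], db_path: str = "integration.db") -> None:
    with sqlite3.connect(db_path) as con:
        con.execute("CREATE TABLE IF NOT EXISTS product (name TEXT, price TEXT)")
        con.executemany("INSERT INTO product VALUES (:name, :price)", records)

if __name__ == "__main__":
    cfg = SourceConfig(
        url="https://example.com/catalogue",            # placeholder source
        item_selector="div.product",
        field_selectors={"name": "h3", "price": "span.price"},
    )
    save(extract(cfg))
```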
23

Creating Financial Database for Education and Research: Using WEB SCRAPING Technique

Rodrigues, Lanny Anthony, Polepally, Srujan Kumar January 2020 (has links)
The objective of this thesis is to expand the university's microdata database of publicly available corporate information by means of web scraping. The tool built for this thesis is a web scraper that can access and extract information from websites, using a web application as an interface for client interaction. In our comprehensive work we have demonstrated that the GRI text files cover approximately 7,227 companies; from this total the data is filtered down to "listed" companies. Among the filtered 2,252 companies, some do not have income-statement data. Hence, we have finally collected data on 2,112 companies across 36 sectors and 13 countries. The publicly available income statements for 2016 to 2020 were collected by the GRI microdata department. Collecting such data from a proprietary database may cost more than $24,000 a year, whereas collecting the same from public sources costs almost nothing, which we discuss further in the thesis. In our work we are motivated to collect financial data from the annual financial statements or reports of businesses, which can be used to measure and investigate the trading costs and price changes of securities, mutual funds, futures, cryptocurrencies, and so forth. Stock exchanges, official statements, and business-related news are additional sources of financial data that people scrape. We are helping small investors and students who need financial statements from numerous companies over several years to assess the state of the economy and of individual firms when deciding whether to invest, which is not feasible in a conventional way; they can instead use web scraping to extract financial statements from diverse websites and base their investment decisions on further research and analysis. In this thesis we show that the outcome of the web scraping is kept in a database, and the gathered data can be used for further research, education, and other purposes with continued use of the web scraping technique.
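A minimal sketch of the collection step described above, scraping an income-statement table from a public web page into a local dataset, might look as follows. The URL and the assumption that the first table on the page is the statement are placeholders, not details of the GRI workflow.

```python
# Illustrative sketch of building a local financial dataset from a public page.
# pandas.read_html parses every <table> on the page; picking tables[0] is an
# assumption that holds only for the hypothetical page used here.
import pandas as pd

def scrape_income_statement(url: str) -> pd.DataFrame:
    tables = pd.read_html(url)            # parse all HTML tables on the page
    statement = tables[0]                 # assume the first table is the statement
    statement["source_url"] = url         # keep provenance for later filtering
    return statement

if __name__ == "__main__":
    df = scrape_income_statement("https://example.com/company/income-statement")
    df.to_csv("income_statements.csv", index=False)   # grow the local database
    print(df.head())
```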
24

The Dynamics of Rent Gap Formation in Copenhagen : An empirical look into international investments in the rental market

Bonde-Hansen, Martin January 2021 (has links)
No description available.
25

The One Spider To Rule Them All : Web Scraping Simplified: Improving Analyst Productivity and Reducing Development Time with A Generalized Spider / Spindeln som härskar över dom alla : Webbskrapning förenklat: förbättra analytikerproduktiviteten och minska utvecklingstiden med generaliserade spindlar

Johansson, Rikard January 2023 (has links)
This thesis addresses the process of developing a generalized spider for web scraping, which can be applied to multiple sources, thereby reducing the time and cost involved in creating and maintaining individual spiders for each website or URL. The project aims to improve analyst productivity, reduce development time for developers, and ensure high-quality and accurate data extraction. The research involves investigating web scraping techniques and developing a more efficient and scalable approach to report retrieval. The problem statement emphasizes the inefficiency of the current method with one customized spider per source and the need for a more streamlined approach to web scraping. The research question focuses on identifying patterns in the web scraping process and functions required for specific publication websites to create a more generalized web scraper. The objective is to reduce manual effort, improve scalability, and maintain high-quality data extraction. The problem is resolved using a quantitative approach that involves the analysis and implementation of spiders for each data source. This enables a comprehensive understanding of all potential scenarios and provides the necessary knowledge to develop a general spider. These spiders are then grouped based on their similarity, and through the application of simple logic, they are consolidated into a single general spider capable of handling all the sources. To construct the general spider, a utility library is created, equipped with the essential tools for extracting relevant information such as title, description, date, and PDF links. Subsequently, all the individual information is transferred to configuration files, enabling the execution of the general spider. The findings demonstrate the successful integration of multiple sources and spiders into a unified general spider. However, due to the limited time frame of the project, there is potential for further improvement. Enhancements could include better structuring of the configuration files, expansion of the utility library, or even the integration of AI capabilities to enhance the performance of the general spider. Nevertheless, the current solution is deemed suitable for automated article retrieval and ready to be used.
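The "one general spider plus per-source configuration" idea can be compressed into a short sketch. This is not the thesis code: the two source configurations, their selectors, and the shared extraction helpers are invented for illustration.

```python
# Sketch of a configuration-driven general spider: each source is reduced to a
# few selectors, while shared utility code extracts title, description, date
# and PDF links. All URLs and selectors below are assumptions.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

SOURCES = {
    "source_a": {"url": "https://example.org/reports", "item": "article",
                 "title": "h2", "description": "p.summary", "date": "time"},
    "source_b": {"url": "https://example.net/publications", "item": "div.pub",
                 "title": "a.title", "description": "div.abstract", "date": "span.date"},
}

def general_spider(cfg: dict) -> list[dict]:
    soup = BeautifulSoup(requests.get(cfg["url"], timeout=10).text, "html.parser")
    results = []
    for item in soup.select(cfg["item"]):
        def text(sel: str):
            node = item.select_one(sel)
            return node.get_text(strip=True) if node else None
        results.append({
            "title": text(cfg["title"]),
            "description": text(cfg["description"]),
            "date": text(cfg["date"]),
            # PDF links are located the same way on every source
            "pdf_links": [urljoin(cfg["url"], a["href"])
                          for a in item.select("a[href$='.pdf']")],
        })
    return results

if __name__ == "__main__":
    for name, cfg in SOURCES.items():
        print(name, general_spider(cfg))
```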
26

INVESTIGATORY ANALYSIS OF BIG DATA’S ROLE AND IMPACT ON LOCAL ORGANIZATIONS, INSTITUTIONS, AND BUSINESSES’ DECISION-MAKING AND DAY-TO-DAY OPERATIONS

Markle, Scott Timothy 30 March 2023 (has links)
No description available.
27

A Platform for Aligning Academic Assessments to Industry and Federal Job Postings

Parks, Tyler J. 07 1900 (has links)
The proposed tool provides users with a platform for a side-by-side comparison of classroom assessments and job-posting requirements. Using techniques and methodologies from NLP, machine learning, data analysis, and data mining, the employed algorithm analyzes job postings and classroom assessments, extracts and classifies the skill units within them, and then compares the sets of skills from the different input volumes. This effectively provides a predicted alignment between academic and career sources, both federal and industrial. The compiled tool results indicate an overall accuracy score of 82% and an alignment score of only 75.5% between the input assessments and the overall job postings. In other words, the 50 UNT assessments and 5,000 industry and federal job postings examined demonstrate a compatibility (alignment) of 75.5%, and this measure was calculated by a tool operating at an 82% precision rate.
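The alignment measure described above can be illustrated with a much simpler stand-in: pull skills out of free text with a keyword list and report the share of job-posting skills covered by the assessments. The skill vocabulary and the two sample texts are assumptions, not material from the platform itself.

```python
# Simplified sketch of skill extraction and alignment scoring. The skill list
# and sample texts are placeholders; the real tool uses NLP and ML classifiers.
import re

SKILLS = {"python", "sql", "machine learning", "statistics", "communication"}

def extract_skills(text: str) -> set[str]:
    text = text.lower()
    return {s for s in SKILLS if re.search(r"\b" + re.escape(s) + r"\b", text)}

def alignment(assessment_text: str, posting_text: str) -> float:
    required = extract_skills(posting_text)
    covered = extract_skills(assessment_text) & required
    return len(covered) / len(required) if required else 1.0

if __name__ == "__main__":
    assessment = "Students build regression models in Python and report statistics."
    posting = "We need Python, SQL and machine learning experience."
    print(f"alignment = {alignment(assessment, posting):.1%}")   # 33.3% for this toy pair
```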
28

Evaluating information content of earnings calls to predict bankruptcy using machine learnings techniques

Ghaffar, Arooba January 2022 (has links)
This study investigates the prediction of firms' health, in terms of bankruptcy and non-bankruptcy, based on the sentiments extracted from earnings calls. Bankruptcy prediction has long been a critical topic in accounting and finance. A firm's economic health is its current financial condition and is crucial to its stakeholders such as creditors, investors, shareholders, partners, and even customers and suppliers. Various methodologies and strategies have been proposed in the research domain for predicting company bankruptcy more promptly and accurately. Conventionally, financial risk prediction has been based solely on historical financial data; however, an increasing number of finance papers have also analyzed textual data in the last few years. A company's earnings calls are a key source of information for investigating its current financial condition, how the business is doing, and what the expectations are for the next quarters. During the call, management offers an overview of recent performance and provides guidance on expectations for the next quarter. The earnings call summaries provided by management can be used to extract the CEO's sentiments through sentiment analysis. In the last decade, machine learning based techniques have been proposed to achieve accurate predictions of firms' economic health. Even though most of these techniques work well in a limited context, from a broader perspective they are unable to retrieve the true semantics of the earnings calls, which results in lower accuracy in predicting the actual condition of a firm's economic health. Thus, state-of-the-art machine learning and deep learning techniques have been used in this thesis to improve the accuracy of predicting firms' health from earnings calls. Various machine learning and deep learning methods have been applied to a web-scraped earnings call dataset, and the results show that long short-term memory (LSTM) performs best among the compared models.
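A minimal sketch of the kind of LSTM text classifier compared in such work is shown below: earnings-call text in, a bankruptcy probability out. The two training sentences and every hyperparameter are placeholders; a real run needs the web-scraped earnings-call dataset, and this is not the thesis's actual model.

```python
# Hedged sketch of an LSTM text classifier for bankruptcy vs. non-bankruptcy.
# Tiny toy data and arbitrary hyperparameters; for illustration only.
import tensorflow as tf

texts = [["liquidity is strained and debt covenants were breached"],
         ["record revenue and strong free cash flow this quarter"]]
labels = [1, 0]                                  # 1 = later bankrupt, 0 = healthy

vectorize = tf.keras.layers.TextVectorization(max_tokens=20000,
                                               output_sequence_length=200)
vectorize.adapt([t[0] for t in texts])           # build the vocabulary

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string), # raw strings as input
    vectorize,                                   # strings -> integer sequences
    tf.keras.layers.Embedding(20000, 64),        # token ids -> dense vectors
    tf.keras.layers.LSTM(32),                    # sequence summary
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(texts, labels, epochs=2, verbose=0)

print(model.predict([["management warns of going-concern and default risk"]]))
```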
29

Data mining historical insights for a software keyword from GitHub and Libraries.io; GraphQL / Datautvinning av historiska insikter för ett mjukvara nyckelord från GitHub och Libraries.io; GraphQL

Bodemar, Gustaf January 2022 (has links)
This paper explores an approach to extracting historical insights into a software keyword by data mining GitHub and Libraries.io. We test our method using the keyword GraphQL to see what insights we can gain. We managed to plot several timelines of how repositories and software libraries related to our keyword were created over time, and we could perform a rudimentary analysis of how active those items were. We also extracted programming language data associated with each repository and library from GitHub and Libraries.io. With this data we could, at worst, correlate which programming languages were associated with each item or, at best, predict which implementations of GraphQL they used. Our attempt uncovered many problems and caveats that needed to be dealt with, but we still concluded that extracting historical insights by data mining GitHub and Libraries.io is worthwhile.
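One step of this approach, querying GitHub for repositories matching a keyword and tallying their creation years and primary languages, can be sketched with the public REST search endpoint. Unauthenticated requests are rate-limited and return at most one page of results, so this only illustrates the idea rather than reproducing the paper's full data collection.

```python
# Hedged sketch: snapshot of GitHub repositories matching a keyword, grouped by
# creation year and primary language. One unauthenticated page only.
from collections import Counter
import requests

def keyword_snapshot(keyword: str = "graphql") -> tuple[Counter, Counter]:
    resp = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": keyword, "sort": "stars", "per_page": 100},
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    items = resp.json().get("items", [])
    years = Counter(repo["created_at"][:4] for repo in items)       # creation timeline
    languages = Counter(repo["language"] for repo in items if repo["language"])
    return years, languages

if __name__ == "__main__":
    years, languages = keyword_snapshot()
    print("repositories created per year:", dict(sorted(years.items())))
    print("most common languages:", languages.most_common(5))
```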
30

Získávání znalostí z veřejných semistrukturovaných dat na webu / Knowledge Discovery in Public Semistructured Data on the Web

Kefurt, Pavel January 2016 (has links)
The first part of the thesis deals with the methods and tools that can be used to retrieve data from websites and with the tools used for data mining. The second part is devoted to a practical demonstration of the entire process. The website of the Czech Dance Sport Federation, available at www.csts.cz, is used as the source web site.
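The retrieval step, crawling pages within one site and collecting their tables for later mining, can be sketched as below. The start URL is the federation site named in the abstract; the depth limit and the decision to keep raw tables are assumptions, not the thesis's actual procedure.

```python
# Small sketch of same-domain crawling that gathers HTML tables for later mining.
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

def crawl_tables(start_url: str, max_pages: int = 20) -> list:
    seen, queue, tables = set(), [start_url], []
    domain = urlparse(start_url).netloc
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        except requests.RequestException:
            continue
        tables.extend(soup.find_all("table"))          # raw tables for later mining
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == domain:        # stay within the source site
                queue.append(link)
    return tables

if __name__ == "__main__":
    print(len(crawl_tables("https://www.csts.cz")), "tables collected")
```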
