11 |
Developing a Python based web scraper : A study on the development of a web scraper for TimeEdit. Andersson, Pontus, January 2021
In a world where more and more information is stored on the internet, it is hard for an ordinary user to keep up. Even when the information is available on a single website, that website may lack features or be difficult to read. The idea of scraping websites, newspapers or games for information is not new, and this thesis focuses on building a web scraper with an accompanying website where users can upload their schedule scraped from TimeEdit. The website then presents this scraped data in a visually appealing way. Once the systems are fully developed, they are evaluated to determine whether the goals of the thesis have been met and whether the systems have improved the existing way of handling scheduling in TimeEdit for teachers and students. Finally, future research and work are presented in the conclusion. / The concept of scraping the web is not new; however, with modern programming languages it is possible to build web scrapers that can collect unstructured data and save it in a structured way. TimeEdit, a scheduling platform used by Mid Sweden University, has no feasible way to count how many hours have been scheduled in any given week for a specific course, student, or professor. The goal of this thesis is to build a Python-based web scraper that collects data from TimeEdit and saves it in a structured manner. Users can then upload this text file to a dynamic website, where the data is extracted from the file and saved into a predetermined database unique to that user. The user can then have this data presented in a fast, efficient, and user-friendly way. This platform is developed and evaluated, with the resulting platform being a good and fast way to scan a TimeEdit schedule and evaluate the extracted data. With the platform built, future work is recommended to make it a finished product ready for live use by all types of users.
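As an illustration of the scraping step described above, the following is a minimal Python sketch that fetches a TimeEdit-style schedule page and saves the rows as structured CSV data; the URL, CSS selectors, and column layout are assumptions for demonstration, not TimeEdit's actual markup or the thesis's implementation.

```python
# Minimal sketch of a TimeEdit-style schedule scraper (illustrative only).
# The URL and CSS class names below are assumptions, not TimeEdit's real markup.
import csv
import requests
from bs4 import BeautifulSoup

SCHEDULE_URL = "https://cloud.timeedit.net/example/web/schedule.html"  # hypothetical

def scrape_schedule(url: str) -> list[dict]:
    """Fetch a schedule page and return one dict per scheduled event."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    events = []
    for row in soup.select("tr.event"):        # assumed row selector
        cells = [td.get_text(strip=True) for td in row.select("td")]
        if len(cells) >= 3:
            events.append({"date": cells[0], "time": cells[1], "course": cells[2]})
    return events

def save_csv(events: list[dict], path: str = "schedule.csv") -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["date", "time", "course"])
        writer.writeheader()
        writer.writerows(events)

if __name__ == "__main__":
    save_csv(scrape_schedule(SCHEDULE_URL))
```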
|
12 |
Analýza registra zmlúv (Analysis of the Register of Contracts). Kuciaková, Andrea, January 2019
The thesis describes the acquisition and analysis of data from the public register of contracts and other defined sources. The introduction describes the existing kinds of scraping tools and their use, as well as the topics of Business Intelligence and data warehouses. The next section is devoted to identifying the source data. Subsequently, the procedure and design of the solution are described, on the basis of which the data warehouse was designed. The implementation of the ETL processes and the creation of the final reports are covered in the implementation part.
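To make the ETL idea concrete, here is a minimal extract-transform-load sketch in Python that pulls contract records from a hypothetical JSON endpoint and loads them into a SQLite table; the endpoint, field names, and schema are invented for illustration and do not reflect the actual register of contracts or the thesis's data warehouse design.

```python
# Illustrative sketch of a tiny ETL step: extract contract records, transform
# them, and load them into a SQLite "warehouse" table. The endpoint and field
# names are hypothetical placeholders.
import sqlite3
import requests

SOURCE_URL = "https://example.org/api/contracts?page=1"  # hypothetical endpoint

def extract(url: str) -> list[dict]:
    return requests.get(url, timeout=10).json().get("items", [])

def transform(records: list[dict]) -> list[tuple]:
    # Keep only the fields the reports need and normalise the amount to float.
    return [
        (r.get("id"), r.get("supplier", "").strip(), float(r.get("amount", 0) or 0))
        for r in records
    ]

def load(rows: list[tuple], db_path: str = "warehouse.db") -> None:
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS contracts (id TEXT PRIMARY KEY, supplier TEXT, amount REAL)"
    )
    con.executemany("INSERT OR REPLACE INTO contracts VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract(SOURCE_URL)))
```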
|
13 |
Topic propagation over time in internet security conferences : Topic modeling as a tool to investigate trends for future research / Ämnesspridning över tid inom säkerhetskonferenser med hjälp av topic modeling. Johansson, Richard; Engström Heino, Otto, January 2021
When conducting research, it is valuable to find high-ranked papers closely related to the specific research area without spending too much time reading insignificant papers. To make this process more effective, an automated way to extract topics from documents would be useful, and this is possible using topic modeling. Topic modeling can also be used to reveal topic trends, such as where a topic is first mentioned and who the original author was. In this paper, over 5000 articles are scraped from four different top-ranked internet security conferences using a web scraper built in Python. From the articles, fourteen topics are extracted using the topic modeling library Gensim and LDA Mallet, and the topics are visualized in graphs to find trends about which topics are emerging and which are fading away over twenty years. The result of this research is that topic modeling is a powerful tool for extracting topics and, when put into a time perspective, it makes it possible to identify topic trends, which can be explained when placed in a bigger context.
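For readers unfamiliar with the workflow, a minimal Gensim LDA example on a toy corpus looks roughly like the sketch below; the documents and the number of topics are placeholders, and the thesis itself used LDA Mallet on roughly 5000 scraped papers.

```python
# Minimal sketch of LDA topic extraction with Gensim on a toy corpus.
from gensim import corpora
from gensim.models import LdaModel

documents = [
    "network intrusion detection using machine learning",
    "malware detection with deep neural networks",
    "tls certificate validation and public key infrastructure",
    "phishing email classification using machine learning",
]
texts = [doc.split() for doc in documents]   # real use: tokenise, remove stop words
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=42)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```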
|
14 |
PRICE PREMIUMS FOR MEAT PRODUCTS WITH CARBON FOOTPRINT RELATED LABELS. Maria Berikou (13208586), 27 July 2023
This study investigates the price premiums associated with labels for carbon-relevant practices and other potentially relevant labels on meat products, including organic, grass-fed, gluten-free, and non-GMO claims. Prices and labeling information for beef, pork, chicken, and other meat products in selected stores from 48 states were collected via web scraping and examined for product claims and labels directly or indirectly related to carbon. Market-observed price premiums for reduced-carbon labels or claims of sustainable practices were investigated alongside the impact of geography on product prices.
Our results showed significant price premiums for almost all of the claims investigated. For beef and chicken products, the label with the highest associated price premium was "Less greenhouse gas", and for pork products, the "Non-GMO" label was associated with the highest price premium of those studied.
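A common way to quantify such label premiums is a hedonic-style regression of log price on label dummies; the sketch below illustrates the idea with a fabricated toy dataset and is not the model specification used in the study.

```python
# Hedonic-style sketch of estimating label price premiums with OLS.
# The tiny dataset below is made up for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "price_per_lb": [6.99, 9.49, 5.49, 8.99, 7.49, 10.99],
    "organic":      [0, 1, 0, 1, 0, 1],
    "less_ghg":     [0, 1, 0, 0, 1, 1],   # "Less greenhouse gas" style claim
    "non_gmo":      [0, 0, 1, 1, 0, 1],
})
df["log_price"] = np.log(df["price_per_lb"])

# Coefficients on the dummies approximate percentage price premiums.
model = smf.ols("log_price ~ organic + less_ghg + non_gmo", data=df).fit()
print(model.params)
```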
|
15 |
Deal Organizer : personalized alerts for shoppers. Reyes, Ulises Uriel, 27 November 2012
Deal Organizer is a web-based application that scans multiple websites for online bargains. It allows users to specify their preferences in order to receive notifications based on personalized content.
The application obtains deals from other websites through data extraction techniques that include reading RSS feeds and web scraping. In order to better facilitate content personalization, the application tracks the user's activity by recording clicks on both links to deals and rating buttons, e.g., the Facebook like button.
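As a rough illustration of the RSS-based extraction technique mentioned above (the application itself is built on the Spring Framework, as noted below), a Python sketch using feedparser could look like this; the feed URL is a placeholder.

```python
# Sketch of the RSS side of deal extraction using feedparser.
# The feed URL is a placeholder; any deals RSS feed would work the same way.
import feedparser

FEED_URL = "https://example.com/deals/rss"  # hypothetical feed

def fetch_deals(url: str) -> list[dict]:
    feed = feedparser.parse(url)
    return [
        {"title": entry.get("title", ""), "link": entry.get("link", "")}
        for entry in feed.entries
    ]

for deal in fetch_deals(FEED_URL):
    print(deal["title"], "->", deal["link"])
```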
Due to the dynamic nature of the source websites offering these deals and the ever-evolving web technologies available to software developers, the Deal Organizer application was designed around an interface-based design using the Spring Framework. This yielded an extensible, pluggable, and flexible system that accommodates maintenance and future work gracefully.
The application's performance was evaluated by executing resource-intensive tests in a constrained environment. Results show the application responded positively.
|
16 |
Att Strukturera Data Med Nyckelord: Utvecklandet av en Skrapande Artefakt (Structuring Data With Keywords: The Development of a Scraping Artifact). Bramell, Fredrik; From, From, January 2022
Development of different methods for processing information has long been a central area in computer science. Being able to structure and compile different types of information can streamline many tasks that facilitate various assignments. In addition, the web is getting bigger, and as a result larger amounts of information become more accessible. It also means that it can be more difficult to find and compile relevant information. This raises the questions: Is a layered architecture suitable for extracting semi-structured data from various web-based documents such as HTML and PDF and structuring the content as generically as possible? And how can semi-structured data be found in various forms of documents on the web based on keywords, in order to save the data in tabular form? A review of previous research shows a gap when it comes to processing different levels of structure with the web as a data source. When processing data, previous projects have usually used a layered architecture where each layer has a specific task, and it is this architecture that was chosen for this artifact. To create the artifact, the Design and Creation method is applied, together with a literature study. This method is common in work where the goal is to create an artifact in order to answer research questions. Tests of the artifact are also performed as part of this method and show how well the artifact follows the instructions and whether or not it can answer the research questions. This work has resulted in an artifact that works well and lays a foundation for future work. However, there is room for improvement, such as enabling the artifact to understand context and find more relevant information, as well as future research on how other software can be implemented to streamline and improve the results.
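A minimal sketch of such a layered, keyword-driven pipeline might look like the following in Python; the URL, keywords, and layer boundaries are illustrative assumptions rather than the artifact's actual design, and PDF input would need an additional parser layer.

```python
# Rough sketch of a layered pipeline: a fetch layer, a parse layer, a
# keyword-filter layer, and a storage layer writing tabular output.
import csv
import requests
from bs4 import BeautifulSoup

def fetch_layer(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

def parse_layer(html: str) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(" ", strip=True) for p in soup.find_all("p")]

def filter_layer(paragraphs: list[str], keywords: list[str]) -> list[dict]:
    rows = []
    for text in paragraphs:
        hits = [kw for kw in keywords if kw.lower() in text.lower()]
        if hits:
            rows.append({"keywords": ";".join(hits), "text": text})
    return rows

def storage_layer(rows: list[dict], path: str = "extracted.csv") -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["keywords", "text"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    html = fetch_layer("https://example.org/article")        # placeholder URL
    storage_layer(filter_layer(parse_layer(html), ["scraping", "data"]))
```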
|
17 |
Discovery and Analysis of Social Media Data : How businesses can create customized filters to more effectively use public data / Upptäckt och Utvärdering av Data från Sociala Medier : Hur företag kan skapa egna filter för att bättre nyttja publik data. Wöldern, Lars, January 2018
The availability of prospective customer information on social media platforms has led many marketing and customer-facing departments to utilize social media data in processes such as demographics research, and sales and campaign planning. However, if business needs require further filtration of data beyond what is provided by existing filters, the volume and rate at which data can be manually sifted is constrained by the speed and accuracy of employees and their digital competency. The repetitive nature of filtration work lends itself to automation, which ultimately has the potential to alleviate large productivity bottlenecks, enabling organizations to distill larger volumes of unfiltered data faster and with greater precision. This project employs automation and artificial intelligence to filter LinkedIn profiles using customized selection criteria beyond what is currently available, such as nationality and age. By introducing the ability to produce tailored indices of social media data, automated filtration offers organizations the opportunity to better utilize rich prospective data for more efficient customer review and targeting.
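Once profile data has been collected, the custom filtration step can be as simple as boolean masks over a table; the sketch below uses pandas with entirely fabricated records and criteria, and is only meant to illustrate the idea of customized selection criteria.

```python
# Toy sketch of applying customised selection criteria to already-collected
# profile data with pandas. The records and fields are made up; real LinkedIn
# data is subject to the platform's terms of service.
import pandas as pd

profiles = pd.DataFrame([
    {"name": "A", "title": "Data Engineer",  "location": "Stockholm", "age": 29},
    {"name": "B", "title": "Sales Manager",  "location": "Oslo",      "age": 45},
    {"name": "C", "title": "Data Scientist", "location": "Stockholm", "age": 33},
])

# Custom filter: titles containing "data", located in Stockholm, age under 40.
mask = (
    profiles["title"].str.contains("data", case=False)
    & (profiles["location"] == "Stockholm")
    & (profiles["age"] < 40)
)
print(profiles[mask])
```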
|
18 |
NBA ON-BALL SCREENS: AUTOMATIC IDENTIFICATION AND ANALYSIS OF BASKETBALL PLAYS. Yu, Andrew Seohwan, 15 May 2017
No description available.
|
19 |
Clustering Web Users by Mouse Movement to Detect Bots and Botnet Attacks. Morgan, Justin L, 01 March 2021
The need for website administrators to efficiently and accurately detect the presence of web bots has proven to be a challenging problem. As the sophistication of modern web bots increases, specifically their ability to more closely mimic the behavior of humans, web bot detection schemes are quickly becoming obsolete by failing to maintain effectiveness. Though machine learning-based detection schemes have been a successful approach in recent implementations, web bots are able to apply similar machine learning tactics to mimic human users, thus bypassing such detection schemes. This work seeks to address the issue of machine learning-based bots bypassing machine learning-based detection schemes by introducing a novel unsupervised learning approach that clusters users based on behavioral biometrics. The idea is that, by differentiating users based on their behavior, for example how they use the mouse or type on the keyboard, information can be provided to website administrators to make more informed decisions on declaring whether a user is a human or a bot. This approach is similar to how modern websites require users to log in before browsing, which likewise lets website administrators make informed decisions on declaring whether a user is a human or a bot. An added benefit of this approach is that it is a human observational proof (HOP), meaning that it will not inconvenience the user (user friction) with human interactive proofs (HIP) such as CAPTCHA, or with login requirements.
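As a sketch of the clustering idea, the following groups sessions by a few mouse-movement features with scikit-learn's KMeans; the features and values are fabricated for illustration and do not come from the thesis's dataset.

```python
# Sketch of clustering sessions by mouse-movement features with scikit-learn.
# A real system would derive such features from recorded mouse events
# (speed, pauses, curvature, click intervals, ...).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each row: [mean speed (px/s), speed std, mean pause (ms), path curvature]
features = np.array([
    [420.0, 180.0, 310.0, 0.42],   # human-like: irregular speed, frequent pauses
    [390.0, 150.0, 280.0, 0.51],
    [980.0,  12.0,   5.0, 0.02],   # bot-like: fast, uniform, nearly straight paths
    [1010.0,  9.0,   4.0, 0.01],
])

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # sessions in the same cluster behave similarly
```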
|
20 |
Design and Evaluation of Web-Based Economic Indicators: A Big Data Analysis Approach. Blázquez Soriano, María Desamparados, 15 January 2020
Thesis by compendium /
Aquesta tesi doctoral ha contribuït a generar coneixement sobre la viabilitat de produïr indicadors econòmics amb dades online procedents de llocs web corporatius. Els indicadors que s'han dissenyat pretenen contribuïr a la modernització en la producció d'estadístiques oficials, així com ajudar als decisors polítics i als gerents d'empreses a prendre decisions informades més ràpidament. / [EN] In the Digital Era, the increasing use of the Internet and digital devices is completely transforming the way of interacting in the economic and social framework. Myriad individuals, companies and public organizations use the Internet for their daily activities, generating a stream of fresh data ("Big Data") principally accessible through the World Wide Web (WWW), which has become the largest repository of information in the world. These digital footprints can be tracked and, if properly processed and analyzed, could help to monitor in real time a wide range of economic variables.
In this context, the main goal of this PhD thesis is to generate economic indicators, based on web data, which are able to provide regular, short-term predictions ("nowcasting") about some business activities that are basic for the growth and development of an economy. Concretely, three web-based economic indicators have been designed and evaluated: first, an indicator of firms' export orientation, which is based on a model that predicts if a firm is an exporter; second, an indicator of firms' engagement in e-commerce, which is based on a model that predicts if a firm offers e-commerce facilities in its website; and third, an indicator of firms' survival, which is based on two models that indicate the probability of survival of a firm and its hazard rate. To build these indicators, a variety of data from corporate websites have been retrieved manually and automatically, and subsequently have been processed and analyzed with Big Data analysis techniques.
Results show that the selected web data are highly related to the economic variables under study, and the web-based indicators designed in this thesis are capturing to a great extent their real values, thus being valid for their use by the academia, firms and policy-makers. Additionally, the digital and online nature of web-based indicators makes it possible to provide timely, inexpensive predictions about the economy. This way, they are advantageous with respect to traditional indicators.
This PhD thesis has contributed to generating knowledge about the viability of producing economic indicators with data coming from corporate websites. The indicators that have been designed are expected to contribute to the modernization of official statistics and to help in making earlier, more informed decisions to policy-makers and business managers. / Blázquez Soriano, MD. (2019). Design and Evaluation of Web-Based Economic Indicators: A Big Data Analysis Approach [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/116836 / Compendio
|