41 |
How to Build a Web Scraper for Social Media. Lloyd, Oskar; Nilsson, Christoffer. January 2019
In recent years, scraping websites for information has become increasingly relevant. Along with this growing interest, however, the internet itself has grown substantially, and changes to websites over the years have made them more difficult to scrape. One key reason is that scrapers account for a significant portion of the traffic to many websites, so developers often implement anti-scraping measures, along with the Robots Exclusion Protocol (robots.txt), to stem this traffic. The widespread use of dynamically loaded content (content that loads only after user interaction) poses another problem for scrapers. In this paper, we investigate which issues commonly occur when scraping and crawling websites, specifically social media, and how to solve them. To understand these issues better and to test solutions, we performed a literature review and used a design-and-creation methodology to develop a prototype scraper with the frameworks Scrapy and Selenium. We found that automating interaction with dynamic elements worked best for handling dynamically loaded content. We also theorize that adding an artificial random delay while scraping, so that the intervals between visits to a website vary, would counteract some anti-scraping measures. A smaller part of our research addressed the legality and ethics of scraping. Further thoughts on potential solutions to other issues are also included.
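The randomized-delay idea can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' Scrapy/Selenium prototype: it checks a robots.txt body that has already been downloaded and pairs each permitted URL with a pause drawn uniformly around a base delay. The user-agent string, base delay and spread are assumptions chosen for the example.

```python
import random
from typing import List, Optional, Sequence, Tuple
from urllib.robotparser import RobotFileParser


def build_robot_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded (no network access here)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def jittered_delays(n: int, base: float, spread: float, seed: Optional[int] = None) -> List[float]:
    """Draw n pauses uniformly from [base * (1 - spread), base * (1 + spread)] seconds."""
    rng = random.Random(seed)
    low, high = base * (1 - spread), base * (1 + spread)
    return [rng.uniform(low, high) for _ in range(n)]


def plan_visits(
    urls: Sequence[str],
    robots_txt: str,
    user_agent: str = "example-research-bot",  # hypothetical crawler name
    base_delay: float = 2.0,
    spread: float = 0.5,
    seed: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Keep only the URLs robots.txt allows and pair each with a randomized pause."""
    parser = build_robot_parser(robots_txt)
    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]
    return list(zip(allowed, jittered_delays(len(allowed), base_delay, spread, seed)))


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    plan = plan_visits(["https://example.com/a", "https://example.com/private/b"], robots, seed=1)
    for url, pause in plan:
        # A real crawler would call time.sleep(pause) and then fetch url here.
        print(f"wait {pause:.2f}s, then fetch {url}")
```

In Scrapy itself, the same behaviour is available through settings rather than custom code: DOWNLOAD_DELAY combined with RANDOMIZE_DOWNLOAD_DELAY (which varies the wait between 0.5 and 1.5 times DOWNLOAD_DELAY), and ROBOTSTXT_OBEY for honouring robots.txt.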
|
42 |
A Framework for Fashion Data Gathering, Hierarchical-Annotation and Analysis for Social Media and Online Shop : TOOLKIT FOR DETAILED STYLE ANNOTATIONS FOR ENHANCED FASHION RECOMMENDATION. Wara, Ummul. January 2018
As recommendation systems move from content-based to hybrid, cross-domain approaches, there is a need for a social-network dataset that provides sufficient data together with detailed annotations drawn from a predefined, hierarchical vocabulary of clothing categories and attributes, while also capturing user interactions. However, existing fashion datasets lack either a hierarchical, category-based representation or social-network user interactions. This thesis presents two datasets: one from the photo-sharing platform Instagram, which gathers images of fashionistas together with all available user interactions, and another from the online shop Zalando, with detailed information on every garment. We present the design of a customized crawler that lets the user crawl data by category or attribute. In addition, an efficient, collaborative web solution is designed and implemented to support large-scale, hierarchical, category-based, detail-level annotation of the Instagram data. Because it takes all user interactions into account, the solution provides a detail-level annotation facility that reflects the user's preferences. The web solution is evaluated by the team as well as through the Amazon Mechanical Turk service. The annotated output from different users demonstrates the usability of the web solution in terms of availability and clarity. Beyond data crawling and the annotation web solution, this project analyzes the distribution of the Instagram and Zalando data by clothing category, subcategory and pattern to provide meaningful insight into the data. Researchers can benefit from these datasets if they need a richly annotated dataset that represents a social network and contains detailed clothing information.
/ Given the trend in recommendation-system research, where more and more recommendation systems are hybrid and designed for multiple domains, there is a need to produce a social-media dataset that contains detailed information on clothing categories, clothing attributes and user interactions. Current fashion-oriented datasets lack either a hierarchical category structure or information on user interactions from social networks. This project aims to produce two datasets: one collected from the photo-sharing platform Instagram, containing photos, text and user interactions from fashionistas, and one collected from the clothing range offered by the online store Zalando. We present the design of a web crawler that is adapted to retrieve data from these domains and optimized for fashion and clothing attributes. We also present an efficient web solution designed and implemented to enable annotation of large amounts of Instagram data with very detailed clothing information. Because user interactions are included in the application, our web solution can provide user-adapted annotation of the data. The web solution has been evaluated by the developers and through the Amazon Mechanical Turk service. The annotated data from different users demonstrates the usability of the web solution. In addition to collecting data and developing a system for web-based data annotation, the data distributions in the two fashion domains, Instagram and Zalando, have been analyzed by clothing category with the aim of providing insight into the data. Research in this area can benefit from our results and our datasets; specifically, our datasets can be used in domains that require detailed clothing information and user interactions.
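The abstract does not spell out how the hierarchical vocabulary is stored. A minimal sketch, assuming a three-level hierarchy (category, subcategory, allowed attribute values) kept as nested dictionaries, shows how an annotation tool might reject labels outside the predefined vocabulary and tally annotations per subcategory for the distribution analysis. All category and attribute names here are hypothetical examples, not the thesis's vocabulary.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping

# Hypothetical three-level vocabulary: category -> subcategory -> allowed attribute values.
VOCABULARY: Dict[str, Dict[str, List[str]]] = {
    "upper-body": {
        "shirt": ["plain", "striped", "floral"],
        "jacket": ["plain", "checked"],
    },
    "lower-body": {
        "jeans": ["plain", "ripped"],
        "skirt": ["plain", "floral", "pleated"],
    },
}


def validate_annotation(category: str, subcategory: str, attributes: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the annotation fits the vocabulary."""
    subcategories = VOCABULARY.get(category)
    if subcategories is None:
        return [f"unknown category: {category}"]
    allowed = subcategories.get(subcategory)
    if allowed is None:
        return [f"unknown subcategory {subcategory!r} under {category!r}"]
    # Every attribute must be one of the values allowed for this subcategory.
    return [f"attribute {attr!r} not allowed for {subcategory!r}" for attr in attributes if attr not in allowed]


def count_by_subcategory(annotations: Iterable[Mapping[str, str]]) -> Counter:
    """Tally (category, subcategory) pairs to inspect the label distribution."""
    return Counter((a["category"], a["subcategory"]) for a in annotations)
```

Nesting the vocabulary this way makes the hierarchy itself the validation rule: an attribute is only accepted in the context of the subcategory it belongs to.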
|
43 |
Matching ESCF Prescribed Cyber Security Skills with the Swedish Job Market : Evaluating the Effectiveness of a Language Model. Ahmad, Al Ghaith; Abd ULRAHMAN, Ibrahim. January 2023
Background: As the demand for cybersecurity professionals continues to rise, it is crucial to identify the key skills needed to thrive in this field. This research project sheds light on the cybersecurity skills landscape by analyzing the recommendations of the European Cybersecurity Skills Framework (ECSF), examining the skills most often required in the Swedish job market, and investigating the skills common to both. The project uses the large language model ChatGPT to classify common cybersecurity skills and evaluates its accuracy against human classification. Objective: The primary objective of this research is to examine how well the ECSF aligns with the specific skill demands of the Swedish cybersecurity job market. The study aims to identify common skills and to evaluate how effectively a language model (ChatGPT) categorizes jobs according to ECSF profiles. It also seeks to provide insights for educational institutions and policymakers working to strengthen workforce development in the cybersecurity sector. Methods: The research begins with a review of the ECSF to understand its recommendations and its methodology for defining cybersecurity skills, and to delineate the cybersecurity profiles together with the key skills that the ECSF assigns to each. Next, a Python-based web crawler is implemented to gather cybersecurity job announcements from the Swedish Employment Agency's website. These data are analyzed to identify the cybersecurity skills that Swedish employers most frequently require. The language model (ChatGPT) is then used to classify these positions according to ECSF profiles. In parallel, two human agents categorize the jobs manually to serve as a benchmark for evaluating the language model's accuracy, allowing a comprehensive assessment of its performance.
Results: The study reviews and cites the skills recommended by the ECSF, offering a comprehensive European perspective on key cybersecurity skills (Tables 4 and 5). It also identifies the most in-demand skills in the Swedish job market, as illustrated in Figure 6. The research shows how the ECSF-prescribed skills of the different profiles match those sought in the Swedish cybersecurity market. The skills of the 'Cybersecurity Implementer' and 'Cybersecurity Architect' profiles emerge as particularly critical, together representing over 58% of market demand. The research further highlights skills shared across profiles (Table 7). Conclusion: This study highlights how the ECSF recommendations match the evolving demands of the Swedish cybersecurity job market. Through a review of ECSF-prescribed skills and a thorough examination of the Swedish job landscape, it identifies key areas of alignment. Notably, the skills associated with the 'Cybersecurity Implementer' and 'Cybersecurity Architect' profiles are central, collectively accounting for over 58% of market demand. This underlines the need for educational programs to adapt to industry requirements. The study also advances understanding of how effective a language model is at job categorization. The findings have implications for workforce development strategies and educational policies in the cybersecurity domain, underscoring the role of informed skills development in meeting the evolving needs of the cybersecurity workforce.
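The abstract does not state which agreement measures were used to compare ChatGPT with the two human benchmarks. Under that caveat, a minimal sketch of such an evaluation: accuracy of the model's profile labels against each annotator, Cohen's kappa as a chance-corrected measure of agreement between the two annotators, and each profile's share of all classified ads (the kind of figure behind a "share of market demand" result). The labels below are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of job ads where the model's profile equals the reference profile."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if not a or len(a) != len(b):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined, return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def demand_share(labels: Sequence[str]) -> Dict[str, float]:
    """Share of all classified job ads assigned to each profile."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


if __name__ == "__main__":
    human_1 = ["Implementer", "Architect", "Implementer", "Auditor"]
    human_2 = ["Implementer", "Architect", "Architect", "Auditor"]
    model = ["Implementer", "Implementer", "Implementer", "Auditor"]
    print(accuracy(model, human_1), accuracy(model, human_2), cohens_kappa(human_1, human_2))
```

Reporting kappa between the two humans alongside the model's accuracy indicates how much of the model's disagreement may simply reflect ambiguity in the task.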
|
44 |
Jacking and Equalizing Cylinders for NASA- Crawler Transporter. Rühlicke, Ingo. 03 May 2016
To transport its spacecraft from the Vehicle Assembly Building to the launch pads at Kennedy Space Center, Florida, the National Aeronautics and Space Administration (NASA) has used two special crawler transporters since 1965. Originally developed for the Saturn V rocket, the crawler transporters have been sufficient for every subsequent generation of spacecraft so far. For the new generation of Orion spacecraft, currently under development, however, the crawler transporter's load capacity had to be increased by 50%. For this task, Hunger Hydraulik developed new jacking, equalizing and leveling (JEL) cylinders that provide sufficient load capacity and also include new features to improve the availability, reliability and safety of the system. After design approval and manufacture, the cylinders were tested on a specially developed full-scale (1:1) dynamic test rig; having passed these tests, they then had to prove their performance in the crawler transporter itself. This article describes the general application and introduces the technical requirements of the project as well as the solution that was implemented.
|
45 |
Modélisation de parcours du Web et calcul de communautés par émergence (Modeling Web Crawls and Computing Communities by Emergence). Toufik, Bennouas. 16 December 2005
The Web graph, and more precisely the crawl used to obtain it and the communities it contains, is the subject of this thesis, which is divided into two parts.
The first part analyzes large interaction networks and introduces a new model of Web crawls. It begins by defining the properties shared by interaction networks, then presents several random graph models that generate graphs similar to interaction networks. Finally, it proposes a new model of random crawls.
The second part proposes two models for computing communities by emergence in the Web graph. After a review of importance measures (PageRank and HITS), it presents the gravitational model, in which the nodes of a network are mobile and interact with one another through the links between them. Communities emerge quickly, within a few iterations. The second model improves on the first: each node of the network is given an objective, which is to reach its community.
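The second part of the thesis reviews the classical importance measures PageRank and HITS before introducing its gravitational model. As a reference point, here is a minimal sketch of the standard PageRank power iteration on a small link graph. It is not the thesis's gravitational or goal-driven community model, and the damping factor 0.85 and the convergence tolerance are conventional choices assumed here.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Power-iteration PageRank; pages without out-links spread their rank uniformly."""
    all_nodes = set(links)
    for targets in links.values():
        all_nodes.update(targets)
    nodes = sorted(all_nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed evenly, keeping the total at 1.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


if __name__ == "__main__":
    # Two pages point to "a"; "a" has no out-links.
    print(pagerank({"b": ["a"], "c": ["a"], "a": []}))
```

Because every iteration preserves the total rank, the scores can be read as a probability distribution over pages, which is the "random surfer" interpretation that crawl models build on.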
|
46 |
Uma abordagem para captura automatizada de dados abertos governamentais (An Approach for the Automated Capture of Open Government Data). Ferreira, Juliana Sabino. 07 November 2017
Open government data currently play an important role in public transparency, in addition to being required by law. However, most of this data is published in non-standard formats, in isolated and independent sources, which makes it very hard for third-party system providers to use. This work proposes an approach for capturing open government data automatically, allowing it to be used in various applications.
To that end, a Web Crawler was built to capture and store this open government data, together with an API that makes the data available in JSON format, so that developers can easily use it in their applications.
We also evaluated the API with developers of different levels of experience. / Currently, open government data play a fundamental role in the transparency of public administration, in addition to being a legal obligation. However, much of this data is published in diverse, isolated and independent formats, which hinders its reuse by third-party systems that could make use of the information available on such portals. This work proposes an approach for automatically capturing open government data, allowing its reuse in other applications.
To this end, a Web Crawler was built to capture and store Open Government Data (Dados Abertos Governamentais, DAG), together with the DAG Prefeituras API, which makes this data available in JSON format so that other developers can use it in their applications.
An evaluation of the API's use by developers with different levels of experience was also carried out.
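The abstract does not detail how the crawler copes with the heterogeneous formats. Below is a minimal sketch of one plausible normalization step, assuming portals publish CSV files with differing delimiters and column names: it detects the delimiter with csv.Sniffer, maps known column names onto a common schema, and serializes the result as JSON, the format the API exposes. The column aliases and the target schema (municipality, amount, date) are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical aliases: column names seen on different portals -> one common schema.
FIELD_ALIASES: Dict[str, str] = {
    "municipio": "municipality",
    "cidade": "municipality",
    "valor": "amount",
    "valor_pago": "amount",
    "data": "date",
    "data_pagamento": "date",
}


def normalize_csv(text: str) -> List[Dict[str, str]]:
    """Detect the delimiter, rename known columns and drop unknown ones."""
    sample = "\n".join(text.splitlines()[:5])
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
    records = []
    for row in csv.DictReader(io.StringIO(text), dialect=dialect):
        record = {}
        for column, value in row.items():
            if column is None:  # surplus fields in a malformed row
                continue
            key = FIELD_ALIASES.get(column.strip().lower())
            if key is not None:
                record[key] = (value or "").strip()
        records.append(record)
    return records


def to_json(records: List[Dict[str, str]]) -> str:
    """Serialize normalized records the way a JSON API might return them."""
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    portal_a = "municipio;valor;data;obs\nSorocaba;1500.00;2017-01-10;x\n"
    portal_b = "cidade,valor_pago,data_pagamento\nCampinas,200.50,2017-02-01\n"
    print(to_json(normalize_csv(portal_a) + normalize_csv(portal_b)))
```

Keeping the alias table as data rather than code means a new portal can often be supported by adding entries instead of writing a new parser.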
|
47 |
Characterizing the Third-Party Authentication Landscape : A Longitudinal Study of how Identity Providers are Used in Modern Websites / Longitudinella mätningar av användandet av tredjepartsautentisering på moderna hemsidor (Longitudinal measurements of the use of third-party authentication on modern websites). Josefsson Ågren, Fredrik; Järpehult, Oscar. January 2021
Third-party authentication services are becoming more common because they simplify login by not forcing users to create a new account for every website that uses authentication. Even though this simplifies the login procedure, users still have to be aware of what data is shared between the identity provider (IDP) and the relying party (RP). This thesis presents a tool for collecting data about third-party authentication that outperforms previously developed tools with regard to accuracy, precision and recall. The tool was used to collect information about third-party authentication on a set of websites. The collected data revealed that the third-party login services offered by Facebook and Google are the most common and that Twitter's login service is significantly less common. Twitter's login service shares the most user data with RPs and often gives RPs permission to perform write actions on the user's Twitter account. In addition to the large-scale automatic data collection, three manual data collections were performed and compared with earlier manual data collections spanning a nine-year period. The longitudinal comparison showed that the login services offered by Facebook and Google have been dominant throughout the nine-year period. Apple, Facebook and Google clearly share less user information today than in earlier years. Twitter is the only IDP that has not changed its permission policies, which could be why the use of Twitter's login service on websites has decreased. The results presented in this thesis provide a better understanding of what personal information IDPs exchange, which can help users make well-informed decisions on the web.
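The abstract does not describe how the tool detects login services. As a much-simplified illustration, not the thesis's tool, the sketch below scans static HTML for links or form actions that point at commonly used OAuth authorization endpoints and extracts the requested scope parameter, which is one way to see what data an RP asks an IDP to share. The hosts and path markers in IDP_RULES are assumptions to verify against each provider's current documentation, and static scanning will miss login flows started from JavaScript.

```python
from html.parser import HTMLParser
from typing import Dict, List, Set, Tuple
from urllib.parse import parse_qs, urlparse

# Assumed (hosts, path marker) per identity provider; verify against current provider docs.
IDP_RULES: Dict[str, Tuple[Set[str], str]] = {
    "Google": ({"accounts.google.com"}, "/o/oauth2"),
    "Facebook": ({"facebook.com", "m.facebook.com"}, "/dialog/oauth"),
    "Twitter": ({"api.twitter.com", "twitter.com"}, "oauth"),
    "Apple": ({"appleid.apple.com"}, "/auth/authorize"),
}


class LinkCollector(HTMLParser):
    """Collect <a href> and <form action> URLs from static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "form": "action"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def detect_idps(html: str) -> Dict[str, Set[str]]:
    """Map each detected IDP to the set of scopes requested in its authorization URL(s)."""
    collector = LinkCollector()
    collector.feed(html)
    found: Dict[str, Set[str]] = {}
    for url in collector.urls:
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        for idp, (hosts, path_marker) in IDP_RULES.items():
            if host in hosts and path_marker in parsed.path:
                scopes = found.setdefault(idp, set())
                for value in parse_qs(parsed.query).get("scope", []):
                    # Scopes may be space- or comma-separated depending on the provider.
                    scopes.update(value.replace(",", " ").split())
    return found
```

The requested scopes are only what the RP asks for; the data actually released also depends on what the user grants and on each provider's policy at the time.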
|
48 |
Služba pro ověření spolehlivosti a pečlivosti českých advokátů / A Service for Verification of Czech Attorneys. Jílek, Radim. January 2017
This thesis deals with the design and implementation of an Internet service that makes it possible to objectively assess and verify the reliability and diligence of Czech lawyers based on publicly available data from several courts. The aim of the thesis is to create this service and put it into operation. The result of the work is a set of programs, each of which carries out part of this goal.
|
49 |
Jacking and Equalizing Cylinders for NASA- Crawler Transporter. Rühlicke, Ingo. January 2016
|
50 |
Raupenfahrzeug-Dynamik (Crawler Vehicle Dynamics). Graneß, Henry. 27 March 2018
Crawler undercarriages follow the general principle that chaining track links together with hinged joints creates a roadway carried by the vehicle itself. This makes it possible to move even heavy equipment with large tractive forces over rough, unstable terrain. However, because the track is discretized into links of finite length, the undercarriage inherently runs with considerable unevenness. This produces time-varying loads in the undercarriage that limit the service life of the track, the travel drive and the vehicle's supporting structure, and thus regularly force costly repairs. Addressing this problem, the thesis analyzes and optimizes the driving dynamics of crawler vehicles. It also presents methods that allow computationally efficient simulation of crawler vehicles and their drive systems.
Contents
List of Symbols
List of Abbreviations
1 Introduction
1.1 Properties and fields of application of crawler undercarriages
1.2 Problem statement
1.3 Overall design of Bagger 293
1.4 Crawler undercarriage of Bagger 293
1.5 Crawler undercarriage: the crawler unit (Fahrschiff)
1.6 Refined task definition
2 Fundamentals and state of the art
2.1 Fundamentals of running unevenness in crawler undercarriages
2.1.1 General classification of running unevenness
2.1.2 Internal running resistances
2.1.3 External running resistances
2.1.4 Track pretension
2.2 Previous work describing the running unevenness of crawler undercarriages
2.3 Holistic analysis of crawler vehicles
2.3.1 Holistic system view
2.3.2 Contributions to holistic crawler vehicle analysis
3 Detailed modeling of crawler vehicle components
3.1 Background
3.2 Electrical and control system
3.2.1 Control principle for a single crawler unit
3.2.2 Control principle for the entire undercarriage
3.2.3 PI speed control
3.2.4 P speed-difference control
3.2.5 Steering angle correction
3.2.6 Asynchronous machine
3.2.7 Field-oriented control
3.2.8 Frequency converter
3.2.9 Simulation and analysis of the single-crawler control model
3.3 Undercarriage model
3.3.1 Modeling and topology
3.3.2 Driving simulation without track-pad valleys
3.3.3 Driving simulation with track-pad valleys
3.3.4 Driving simulation on a slope with track-pad valleys
3.3.5 Driving simulation when cornering with track-pad valleys
3.3.6 Sensitivity of the driving behavior
3.3.7 Conclusions on the driving dynamics of a crawler unit
3.4 Mechanical system: gearbox
3.4.1 Modeling and topology
3.4.2 Simulation with a synthetic load case
3.5 Mechanical system: lower carriage and superstructure
3.5.1 Modeling
3.5.2 Frequency-domain simulation
4 Computationally efficient surrogate models of crawler vehicle components
4.1 Background
4.2 Electrical and control system
4.2.1 Methodology
4.2.2 Simulation and evaluation
4.3 Undercarriage model
4.3.1 Methodology
4.3.2 Simulation and evaluation without track-pad valleys
4.3.3 Simulation and evaluation with track-pad valleys
4.4 Gearbox model
4.4.1 Methodology
4.4.2 Simulation and evaluation
4.5 Lower-carriage and superstructure model
4.5.1 Methodology
4.5.2 Simulation and evaluation
5 Holistic driving-dynamics simulation and comparison with measured data
5.1 Model levels
5.1.1 Rheonomically driven crawler unit model
5.1.2 Holistic crawler unit model
5.1.3 Holistic vehicle model
5.2 Simulation
5.2.1 Comparison of the rheonomic and the holistic crawler unit model
5.2.2 Influence of superstructure elasticity on driving behavior
5.2.3 Influence of the phase relationship (parallel travel)
5.2.4 Comparison of measurement and simulation
6 Holistic optimization on the crawler unit model
6.1 Methodology
6.2 Continuous roller path
6.2.1 Background
6.2.2 Testing on the undercarriage surrogate model
6.2.3 Testing on the undercarriage MBS contact model
6.3 PI motor speed control
6.3.1 Background
6.3.2 Testing on the surrogate model with track-pad valley design
6.3.3 Testing on the MBS contact model with track-pad valley design
6.3.4 Testing on the surrogate model with continuous roller path
6.3.5 Testing on the MBS contact model with continuous roller path
6.3.6 Conclusions on PI speed control
6.4 PI state control
6.4.1 Methodology
6.4.2 Testing on the surrogate model with track-pad valley design
6.4.3 Testing on the MBS contact model with track-pad valley design
6.4.4 Testing on the surrogate model with continuous roller path
6.4.5 Testing on the MBS contact model with continuous roller path
6.4.6 Conclusions on PI state control
6.5 Static and static-dynamic track pretension
6.5.1 Background
6.5.2 Testing on the surrogate model
6.5.3 Testing on the MBS contact model
6.5.4 Critical assessment
7 Holistic optimization on the vehicle model
7.1 Methodology
7.2 Continuous roller path
7.3 Continuous roller path and static track pretension
8 Summary and outlook
References
List of Figures
List of Tables
A Evaluation metrics
A.1. Amplitude signal
A.2. Vibration RMS value
A.3. Cross-correlation coefficient
B Analytical calculation of loads during cornering
C Correlations for the CB set
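The abstract attributes the running unevenness to the track being discretized into links of finite length, and Appendix A lists a vibration RMS value and a cross-correlation coefficient among the evaluation metrics. The sketch below is illustrative only: it uses the textbook polygon (chordal) effect of a chain on a sprocket with z teeth, where the track speed varies between ωR and ωR·cos(π/z), which is far simpler than the multibody contact models in the thesis. Defining the RMS value on the mean-removed signal and the correlation coefficient at zero lag are assumptions and may differ from the definitions in Appendix A.

```python
import math
from typing import List, Sequence


def chordal_speed(omega: float, radius: float, teeth: int,
                  samples_per_tooth: int = 50, n_teeth: int = 4) -> List[float]:
    """Track speed v = omega * R * cos(theta) while theta sweeps [-pi/z, pi/z) once per tooth."""
    half_pitch_angle = math.pi / teeth
    speeds = []
    for _ in range(n_teeth):
        for k in range(samples_per_tooth):
            theta = -half_pitch_angle + 2 * half_pitch_angle * k / samples_per_tooth
            speeds.append(omega * radius * math.cos(theta))
    return speeds


def relative_fluctuation(teeth: int) -> float:
    """(v_max - v_min) / v_max for the polygon effect: 1 - cos(pi / z)."""
    return 1 - math.cos(math.pi / teeth)


def vibration_rms(signal: Sequence[float]) -> float:
    """RMS of the fluctuating part of a signal (its mean removed)."""
    mean = sum(signal) / len(signal)
    return math.sqrt(sum((x - mean) ** 2 for x in signal) / len(signal))


def correlation_coefficient(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation at zero lag (Pearson's r) of two equally sampled signals."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("signals must have equal length >= 2")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    for z in (6, 8, 12):
        print(z, round(relative_fluctuation(z), 4))
```

Even this simple model shows why fewer, longer links make the ride rougher: the relative speed fluctuation 1 - cos(π/z) shrinks quickly as the number of teeth z grows.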
|