1

Universalus programuojamas internetinių robotų kūrimo įrankis / Universal programable tool for Web robots

Paškevičius, Marijus 27 May 2005 (has links)
A whole new class of web user is developing. These users are computer programs that have the ability to access the Web in much the same way as a human user with a browser does. There are many names for these kinds of programs, and these names reflect many of the specialized tasks assigned to them. Spiders, bots, and aggregators are all so-called intelligent agents, which execute tasks on the Web without the intervention of a human being. In this research we examine the differences between them and study the possibility of creating a user-friendly programming tool able to generate such programs.
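The kind of program described here can be reduced to a small sketch. The following Python snippet (a hypothetical illustration, not the tool proposed in the thesis) shows the parsing core of a spider: a class that reads a page the way a browser would and records every link it finds, using only the standard library:

```python
from html.parser import HTMLParser

class LinkSpider(HTMLParser):
    """A minimal spider building block: records every href on a page.
    A real bot would also fetch pages over HTTP and follow the links;
    this sketch only shows the parsing half."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Anchor tags carry the links a spider follows.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<a href="/about">About</a> <a href="https://example.org">Elsewhere</a>'
spider = LinkSpider()
spider.feed(page)
print(spider.links)  # ['/about', 'https://example.org']
```

A full bot would loop: fetch a page, extract its links, queue the new ones, and repeat — the parsing step above is the part shared by spiders, bots, and aggregators alike.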
2

NETWORK : Learning from the Architects of Nature

Thorup, Matilda January 2021 (has links)
The aim of this thesis is to attempt to solve technical and spatial issues in an architectural project by looking at a species of spider, Cyrtphora Citricola. This will be done using desk-based research, reference reading and model testing. The work of architect Frei Otto will also be used as a reference for technical and programmatic solutions in the architectural intervention. The thesis will attempt to answer the question, ‘What aspects of technical and spatial adaptability can be brought into an architectural context by studying spiders and their behavior?’ Spider silk is built up through a protein chain hierarchy, making for a unique structural material. As a species, spiders are particularly adaptable to different living conditions. The specific species Cyrtphora Citricola has a distinctive way of building its web, which has a tent-shaped formation. It is very adaptable to different sites and living conditions and shares similarities with the tent and netted roof structures designed by Otto. As a pioneer in the fields of minimal architecture and tensile construction, he claims architecture needs to integrate with nature as well as be light and minimal in order to solve the environmental problems we face in modern society. These theories have influenced this thesis and the resulting architectural project proposal. To gain further understanding of tensile structures, experiments using two different methods of model making have been explored. The first uses string and soap film to test the naturally occurring minimal surface of physical models, and the second uses a similar method by programming computational software to act like the soap film. The project is summarized in one potential usage of the spider in architecture, an elementary school located in the planned neighborhood Tomtebo Strand, Umeå. The plot is currently all forest, which will be used in the project as a statement of adaptability.
As a result of insufficient research surrounding spiders, the project developed into a modern recreation of Otto’s work with tensile construction. The purpose of the architectural project ‘NETWORK’ is to investigate how a large structure can adapt to any location while causing minimal impact. By studying spiders and spider technology and combining that research with the work of Otto, aspects of adaptability, technical function and aesthetic form have been combined to create a project which answers the thesis question.
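The computational soap-film method mentioned in the abstract can be approximated in a few lines of code. As an illustrative assumption (the abstract does not name the software used), the sketch below relaxes a pinned membrane by repeatedly averaging each interior node with its four neighbours — a discrete analogue of a soap film settling into a minimal (harmonic) surface:

```python
def relax_membrane(height, iterations=500):
    """Discrete 'soap film': replace each interior node by the average
    of its four neighbours while the boundary stays pinned. At
    convergence the surface is the minimal-energy (harmonic) one."""
    rows, cols = len(height), len(height[0])
    for _ in range(iterations):
        new = [row[:] for row in height]  # boundary values are kept as-is
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                new[i][j] = (height[i - 1][j] + height[i + 1][j] +
                             height[i][j - 1] + height[i][j + 1]) / 4.0
        height = new
    return height

# A 5x5 membrane with the top edge pinned at height 1 and the rest at 0.
grid = [[0.0] * 5 for _ in range(5)]
grid[0] = [1.0] * 5
settled = relax_membrane(grid)
print(round(settled[2][2], 3))  # the centre settles at 0.25
```

The height decays smoothly away from the raised edge, just as a physical soap film spans its frame with no excess surface; refining the grid refines the approximation.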
3

Protein Composition Correlates with the Mechanical Properties of Spider (<i>Argiope Trifasciata</i>) Dragline Silk

Marhabaie, Mohammad 20 September 2013 (has links)
No description available.
4

Mot effektiv identifiering och insamling av brutna länkar med hjälp av en spindel / Towards effective identification and collection of broken links using a web crawler

Anttila, Pontus January 2018 (has links)
I dagsläget har uppdragsgivaren ingen automatiserad metod för att samla in brutna länkar på deras hemsida, utan detta sker manuellt eller inte alls. Detta projekt har resulterat i en praktisk produkt som idag kan appliceras på uppdragsgivarens hemsida. Produktens mål är att automatisera arbetet med att hitta och samla in brutna länkar på hemsidan. Genom att på ett effektivt sätt samla in alla eventuellt brutna länkar, och placera dem i en separat lista så kan en administratör enkelt exportera listan och sedan åtgärda de brutna länkar som hittats. Uppdragsgivaren kommer att ha nytta av denna produkt då en hemsida utan brutna länkar höjer hemsidans kvalité, samtidigt som den ger besökare en bättre upplevelse. / Today, the customer has no automated method for finding and collecting broken links on their website; this is done manually or not at all. This project has resulted in a practical product that can be applied to the customer’s website. The aim of the product is to automate the work of finding and collecting broken links on the website. By effectively gathering all potentially broken links and placing them in a separate list, an administrator can easily export the list and then fix the broken links that were found. The customer will benefit from this product, since a website without broken links is of higher quality and gives visitors a better experience.
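The product itself is not reproduced in the abstract, but its central loop — check every link and divert the failing ones into a separate, exportable list — can be sketched as follows. The stub status function and URLs are illustrative assumptions, not the customer's actual setup:

```python
def collect_broken_links(links, fetch_status):
    """Check every link and divert the failing ones into a separate list,
    which an administrator could later export and work through.
    fetch_status is injected so the checker runs offline here; in
    production it could wrap an HTTP request and return the status code."""
    broken = []
    for url in links:
        try:
            status = fetch_status(url)
        except OSError:
            # Network failure counts as broken too.
            broken.append((url, "unreachable"))
            continue
        if status >= 400:  # 4xx/5xx responses indicate a broken link
            broken.append((url, status))
    return broken

# Offline demo with a stubbed status function and hypothetical URLs.
statuses = {"https://example.com/ok": 200, "https://example.com/gone": 404}
result = collect_broken_links(sorted(statuses), lambda url: statuses[url])
print(result)  # [('https://example.com/gone', 404)]
```

Injecting the status function keeps the collection logic testable without a live site; swapping in a real HTTP client turns the sketch into a working checker.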
5

[en] ALUMNI TOOL: INFORMATION RECOVERY OF PERSONAL DATA ON THE WEB IN AUTHENTICATED SOCIAL NETWORKS / [pt] ALUMNI TOOL: RECUPERAÇÃO DE DADOS PESSOAIS NA WEB EM REDES SOCIAIS AUTENTICADAS

LUIS GUSTAVO ALMEIDA 02 August 2018 (has links)
[pt] O uso de robôs de busca para coletar informações para um determinado contexto sempre foi um problema desafiante e tem crescido substancialmente nos últimos anos. Por exemplo, robôs de busca podem ser utilizados para capturar dados de redes sociais profissionais. Em particular, tais redes permitem estudar as trajetórias profissionais dos egressos de uma universidade, e responder diversas perguntas, como por exemplo: Quanto tempo um ex-aluno da PUC-Rio leva para chegar a um cargo de relevância? No entanto, um problema de natureza comum a este cenário é a impossibilidade de coletar informações devido a sistemas de autenticação, impedindo um robô de busca de acessar determinadas páginas e conteúdos. Esta dissertação aborda uma solução para capturar dados, que contorna o problema de autenticação e automatiza o processo de coleta de dados. A solução proposta coleta dados de perfis de usuários de uma rede social profissional para armazenamento em banco de dados e posterior análise. A dissertação contempla ainda a possibilidade de adicionar diversas outras fontes de dados dando ênfase a uma estrutura de armazém de dados. / [en] The use of search bots to collect information for a given context has grown substantially in recent years. For example, search bots may be used to capture data from professional social networks. In particular, such social networks facilitate studying the professional trajectories of the alumni of a given university and answering several questions, such as: How long does a former student of PUC-Rio take to arrive at a management position? However, a common problem in this scenario is the inability to collect information due to authentication systems, which prevent a search bot from accessing certain pages and content. This dissertation presents a solution for capturing data that circumvents the authentication problem and automates the data collection process.
The proposed solution collects data from user profiles of a professional social network for later database storage and analysis. The dissertation also contemplates the possibility of adding several other data sources, with emphasis on a data warehouse structure.
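The authenticate-then-collect pattern the dissertation describes can be sketched as follows. Everything here is hypothetical — the fake site, token, and profile fields are stand-ins for the real authenticated network layer:

```python
class AuthenticatedCollector:
    """Log in once, keep the session token, and reuse it for every
    profile request -- the authenticate-then-collect pattern.
    The `site` collaborator stands in for the real network layer."""

    def __init__(self, site):
        self.site = site
        self.token = None

    def login(self, user, password):
        self.token = self.site.login(user, password)

    def collect(self, profile_ids):
        if self.token is None:
            raise RuntimeError("call login() before collecting")
        return [self.site.profile(pid, self.token) for pid in profile_ids]


class FakeSite:
    """In-memory stand-in for an authenticated social network."""

    def login(self, user, password):
        return "token-123"  # a real site would validate the credentials

    def profile(self, pid, token):
        if token != "token-123":
            raise PermissionError("not authenticated")
        return {"id": pid, "name": f"alumnus-{pid}"}


collector = AuthenticatedCollector(FakeSite())
collector.login("bot", "secret")
records = collector.collect([1, 2])
print(records)  # [{'id': 1, 'name': 'alumnus-1'}, {'id': 2, 'name': 'alumnus-2'}]
```

Keeping the session in one place means the collection code never sees credentials, and the collected records can be handed straight to database storage as described.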
6

Generic Data Harvester

Asp, William, Valck, Johannes January 2022 (has links)
This report goes through the process of developing a generic article scraper which shall extract relevant information from an arbitrary web article. The extraction is implemented by searching and examining the HTML of the article, using Python and XPath. The data to be extracted are the title, summary, publishing date and body text of the article. As there is no standard way that websites, and news articles in particular, are built, the extraction needs to be adapted to each article's structure and language. The resulting program should provide a proof-of-concept method of extracting the data, showing that further development is possible. The thesis host company Acuminor works with financial crime intelligence and collects information through articles and reports. To scale up the data collection and minimize the maintenance of the scraping programs, a general article scraper is needed. There exists an open-source alternative called Newspaper, but since it is no longer maintained and, arguably, not properly designed, an internal implementation could be beneficial to the company. The program consists of a main class that imports extractor classes exposing an API for extracting the data. Each extractor is decoupled from the rest in order to keep the program as modular as possible. The extraction of title, summary and date is similar, with the extractors looking for specific HTML tags that carry some common attribute that most websites implement. The text extraction is implemented using a tree built from the text present on the page; the tree is then searched for the node most likely to contain only the body text, using attributes such as amount of text, depth and number of text nodes. The resulting program does not match the performance of Newspaper, but shows promising results on every part of the extraction.
The text extraction is very slow and often extracts too much of the article's text, but it provides a good blueprint for further improvement at the company. Acuminor will be able to have an in-house article extractor that suits its wants and needs. / Den här rapporten går igenom processen att utveckla en generisk artikelskrapare som ska extrahera relevant information från en godtycklig artikelhemsida. Extraheringen implementeras genom att söka igenom och undersöka HTML-koden i artikeln, med hjälp av Python och XPath. Datan som skall extraheras är titeln, sammanfattningen, publiceringsdatumet och brödtexten i artikeln. Eftersom det inte finns något standardsätt som hemsidor, och mer specifikt nyhetsartiklar, är uppbyggda på, måste extraheringen anpassas för varje struktur och språk av artiklar. Det resulterande programmet skall utgöra ett konceptbevis för ett sätt att extrahera datan som visar att framtida utveckling är möjlig. Projektets värdföretag Acuminor jobbar inom finansiell brottsintelligens och samlar in information genom artiklar och rapporter. För att skala upp insamlingen av data och minimera underhållet av skrapningsprogrammen behövs en generell artikelskrapare. Det existerar ett alternativ med öppen källkod kallat Newspaper, men eftersom detta inte längre underhålls och det kan argumenteras att det inte är särskilt väl designat, är en intern implementation fördelaktig för företaget. Programmet består av en huvudklass som importerar extraheringsklasser med ett API för att extrahera datan. Varje extraherare är frikopplad från resten av programmet för att hålla programmet så modulärt som möjligt. Extraheringen av titel, sammanfattning och datum är likartad, där extraherarna letar efter specifika HTML-taggar som innehåller något gemensamt attribut som de flesta hemsidor implementerar.
Textextraheringen är implementerad med ett träd som byggs upp från den existerande texten på sidan och som sedan söks igenom för att hitta den nod som mest troligt innehåller enbart brödtexten, med hjälp av attribut såsom mängd text, djup och antal textnoder. Det resulterande programmet matchar inte prestandan hos Newspaper, men visar lovande resultat i varje del av extraheringen. Textextraheringen är väldigt långsam och hämtar ofta för mycket text från artikeln, men lämnar ett bra underlag för vidare förbättring hos företaget. Allt som allt kommer Acuminor att kunna bygga vidare på en egen artikelextraherare som passar deras behov.
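One way the "common attribute that most websites implement" can be exploited is shown below. This is a standard-library sketch using Open Graph `<meta>` properties as an assumed example; the thesis project itself queries the HTML with XPath, which matches these same tags:

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collects title, summary and publishing date from <meta> tags.
    Open Graph properties serve here as one example of a common
    attribute shared by many news sites."""

    FIELDS = {"og:title": "title",
              "og:description": "summary",
              "article:published_time": "date"}

    def __init__(self):
        super().__init__()
        self.data = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        field = self.FIELDS.get(attr.get("property", ""))
        if field and "content" in attr:
            self.data[field] = attr["content"]

html_page = """<html><head>
<meta property="og:title" content="Spiders at work"/>
<meta property="og:description" content="A short summary."/>
<meta property="article:published_time" content="2022-05-01"/>
</head><body><p>Body text.</p></body></html>"""

extractor = MetaExtractor()
extractor.feed(html_page)
print(extractor.data)
```

Body-text extraction is the harder half: as the report notes, it requires building a tree of text nodes and scoring candidates by text amount, depth, and node count rather than matching a single tag.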
