1

Observed and predicted drawbar pull of crawler track shoes

Kuether, D. O. January 1964 (has links)
Thesis (M.S.)--University of Wisconsin--Madison, 1964. / eContent provider-neutral record in process. Description based on print version record. Bibliography: l. 68-69.
2

A Domain Based Approach to Crawl the Hidden Web

Pandya, Milan 04 December 2006 (has links)
There is a great deal of research on indexing the Web, and increasingly sophisticated Web crawlers are being designed to search and index it faster. However, all of these traditional crawlers crawl only the part of the Web known as the “Surface Web”; they are unable to reach its hidden portion. Traditional crawlers retrieve content only from surface Web pages, which are simply sets of pages connected by hyperlinks, and they therefore ignore the tremendous amount of information hidden behind the search forms found on many Web pages. Most of the published research has focused on detecting such searchable forms and making a systematic search over them. Our approach is based on a Web crawler that analyzes search forms and fills them with appropriate content to retrieve the maximum amount of relevant information from the underlying databases.
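
The crawler itself is not included in the record above; purely as an illustration of the form-filling step it describes, a minimal sketch (in Python, using requests and BeautifulSoup) might look like the following. The keyword list, the helper name, and the handling of form fields are assumptions, not the thesis's implementation.

```python
# Minimal sketch of a form-filling ("hidden Web") crawler step.
# Assumptions: the target page is reachable, its search form uses GET or POST,
# and a small domain keyword list stands in for "appropriate content".
# Requires: requests, beautifulsoup4.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

DOMAIN_KEYWORDS = ["thesis", "dissertation", "crawler"]  # illustrative only

def submit_search_forms(page_url: str):
    """Find search forms on a page and submit each with a domain keyword."""
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for form in soup.find_all("form"):
        action = urljoin(page_url, form.get("action") or page_url)
        method = (form.get("method") or "get").lower()
        # Fill free-text inputs with a keyword; keep other fields' defaults.
        data = {}
        for inp in form.find_all("input"):
            name = inp.get("name")
            if not name:
                continue
            if inp.get("type", "text") in ("text", "search"):
                data[name] = DOMAIN_KEYWORDS[0]
            else:
                data[name] = inp.get("value", "")
        if method == "post":
            resp = requests.post(action, data=data, timeout=10)
        else:
            resp = requests.get(action, params=data, timeout=10)
        results.append((action, resp.status_code, len(resp.text)))
    return results
```

In a full hidden-Web crawler, the result pages returned by such submissions would themselves be parsed and indexed.
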
3

Fresh Analysis of Streaming Media Stored on the Web

Karki, Rabin 13 January 2011 (has links)
With the steady increase in the bandwidth available to end users and to Web sites hosting user-generated content, there appears to be more multimedia content on the Web than ever before. Studies quantifying media stored on the Web, conducted in 1997 and 2003, are now dated, since the nature, size, and number of streaming media objects on the Web have changed considerably. Although there have been more recent studies characterizing specific streaming media sites such as YouTube, only a few studies focus on characterizing the media stored on the Web as a whole. We built customized tools to crawl the Web, identify streaming media content, and extract the characteristics of the streaming media found. We chose 16 different starting points and crawled 1.25 million Web pages from each. Using the custom-built tools, the media objects were identified and analyzed to determine attributes including media type, media length, codecs used for encoding, encoded bitrate, resolution, and aspect ratio. A little over half of the media clips we encountered are video. MP3 and AAC are the most prevalent audio codecs, whereas H.264 and FLV are the most common video codecs. The median size and encoded bitrates of stored media have increased since the last study. Information on the characteristics of stored multimedia and their trends over time can help system designers, and the results can also be useful for empirical Internet measurement studies that attempt to mimic the behavior of streaming media traffic over the Internet.
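
The custom-built measurement tools are not part of the record; as an illustration of the kind of attribute extraction described (media type, duration, codec, bitrate, resolution), a sketch using FFmpeg's ffprobe is shown below. Relying on ffprobe is an assumption made here; the thesis used its own tools.

```python
# Sketch of extracting the media attributes mentioned above from a downloaded
# clip. Uses FFmpeg's ffprobe, which must be installed on the system.
import json
import subprocess

def probe_media(path: str) -> dict:
    """Return duration, bitrate, codecs, and resolution for one media file."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    ).stdout
    info = json.loads(out)
    fmt = info.get("format", {})
    streams = info.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    audio = next((s for s in streams if s.get("codec_type") == "audio"), None)
    return {
        "duration_s": float(fmt.get("duration", 0.0)),
        "bitrate_bps": int(fmt.get("bit_rate", 0)),
        "video_codec": video.get("codec_name") if video else None,
        "audio_codec": audio.get("codec_name") if audio else None,
        "resolution": (video.get("width"), video.get("height")) if video else None,
    }
```
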
4

Effective web crawlers

Ali, Halil, hali@cs.rmit.edu.au January 2008 (has links)
Web crawlers are the component of a search engine that must traverse the Web, gathering documents in a local repository for indexing by a search engine so that they can be ranked by their relevance to user queries. Whenever data is replicated in an autonomously updated environment, there are issues with maintaining up-to-date copies of documents. When documents are retrieved by a crawler and have subsequently been altered on the Web, the effect is an inconsistency in user search results. While the impact depends on the type and volume of change, many existing algorithms do not take the degree of change into consideration, instead using simple measures that consider any change as significant. Furthermore, many crawler evaluation metrics do not consider index freshness or the amount of impact that crawling algorithms have on user results. Most of the existing work makes assumptions about the change rate of documents on the Web, or relies on the availability of a long history of change. Our work investigates approaches to improving index consistency: detecting meaningful change, measuring the impact of a crawl on collection freshness from a user perspective, developing a framework for evaluating crawler performance, determining the effectiveness of stateless crawl ordering schemes, and proposing and evaluating the effectiveness of a dynamic crawl approach. Our work is concerned specifically with cases where there is little or no past change statistics with which predictions can be made. Our work analyses different measures of change and introduces a novel approach to measuring the impact of recrawl schemes on search engine users. Our schemes detect important changes that affect user results. Other well-known and widely used schemes have to retrieve around twice the data to achieve the same effectiveness as our schemes. Furthermore, while many studies have assumed that the Web changes according to a model, our experimental results are based on real web documents. We analyse various stateless crawl ordering schemes that have no past change statistics with which to predict which documents will change, none of which, to our knowledge, has been tested to determine effectiveness in crawling changed documents. We empirically show that the effectiveness of these schemes depends on the topology and dynamics of the domain crawled and that no one static crawl ordering scheme can effectively maintain freshness, motivating our work on dynamic approaches. We present our novel approach to maintaining freshness, which uses the anchor text linking documents to determine the likelihood of a document changing, based on statistics gathered during the current crawl. We show that this scheme is highly effective when combined with existing stateless schemes. When we combine our scheme with PageRank, our approach allows the crawler to improve both freshness and quality of a collection. Our scheme improves freshness regardless of which stateless scheme it is used in conjunction with, since it uses both positive and negative reinforcement to determine which document to retrieve. Finally, we present the design and implementation of Lara, our own distributed crawler, which we used to develop our testbed.
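
Neither the thesis's algorithms nor the Lara crawler are reproduced in this record. As a loose illustration only of the anchor-text idea sketched in the abstract, the toy Python fragment below keeps per-anchor-term change statistics during the current crawl and ranks recrawl candidates by them; the data structures and scoring rule are assumptions, not the scheme evaluated in the thesis.

```python
# Toy illustration of anchor-text-driven recrawl ordering. Each anchor term
# accumulates change/no-change counts as documents are refetched during the
# current crawl (positive and negative reinforcement); candidate URLs are then
# ranked by the mean observed change rate of the terms that link to them.
import hashlib
from collections import defaultdict

anchor_changed = defaultdict(int)    # term -> times a linked doc had changed
anchor_seen = defaultdict(int)       # term -> times a linked doc was refetched
last_hash = {}                       # url  -> content hash from previous fetch
incoming_terms = defaultdict(set)    # url  -> anchor terms of links pointing to it

def record_fetch(url: str, content: bytes) -> None:
    """Update per-anchor-term change statistics after fetching a URL."""
    digest = hashlib.sha1(content).hexdigest()
    changed = url in last_hash and last_hash[url] != digest
    for term in incoming_terms[url]:
        anchor_seen[term] += 1
        if changed:
            anchor_changed[term] += 1  # unchanged fetches lower the rate instead
    last_hash[url] = digest

def change_score(url: str) -> float:
    """Estimated likelihood of change, from the URL's incoming anchor terms."""
    terms = incoming_terms[url]
    rates = [anchor_changed[t] / anchor_seen[t] for t in terms if anchor_seen[t]]
    return sum(rates) / len(rates) if rates else 0.5  # 0.5 = no evidence yet

def next_to_crawl(candidates):
    """Pick the candidate URL judged most likely to have changed."""
    return max(candidates, key=change_score)
```
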
5

Mot effektiv identifiering och insamling av brutna länkar med hjälp av en spindel / Towards effective identification and collection of broken links using a web crawler

Anttila, Pontus January 2018 (has links)
Today, the customer has no automated method for finding and collecting broken links on their website; this is done manually or not at all. This project has resulted in a practical product that can be applied to the customer’s website. The aim of the product is to automate the work of finding and collecting broken links on the site. All potentially broken links are gathered efficiently and placed in a separate list, which an administrator can export at will in order to fix the broken links that were found. The customer will benefit from this product, since a website without broken links is of higher quality, which ultimately gives visitors a better experience.
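
The product itself is not included in the record; a minimal sketch of the same idea, crawling one site and collecting links that fail or return an error status into an exportable list, could look like the following. The page budget, the requests/BeautifulSoup choice, and the error criterion are assumptions.

```python
# Minimal broken-link crawler sketch: crawl pages within one site, follow
# internal links, and collect targets that fail or return an error status
# into a separate list that an administrator could export.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def find_broken_links(start_url: str, max_pages: int = 200):
    site = urlparse(start_url).netloc
    to_visit, seen, broken = [start_url], set(), []
    while to_visit and len(seen) < max_pages:
        page = to_visit.pop()
        if page in seen:
            continue
        seen.add(page)
        try:
            resp = requests.get(page, timeout=10)
        except requests.RequestException:
            broken.append((page, "no response"))
            continue
        if resp.status_code >= 400:
            broken.append((page, resp.status_code))
            continue
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link = urljoin(page, a["href"])
            # Only crawl further within the same site; external links could
            # additionally be checked with a HEAD request if desired.
            if urlparse(link).netloc == site and link not in seen:
                to_visit.append(link)
    return broken
```
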
6

Population ecology of the beech scale (Cryptococcus fagisuga Ldgr.)

Gate, Imogen Mary January 1990 (has links)
No description available.
7

A Novel Hybrid Focused Crawling Algorithm to Build Domain-Specific Collections

Chen, Yuxin 28 March 2007 (has links)
The Web, containing a large amount of useful information and resources, is expanding rapidly. Collecting domain-specific documents and information from the Web is one of the most important methods for building digital libraries for the scientific community. Focused crawlers can selectively retrieve Web documents relevant to a specific domain to build collections for domain-specific search engines or digital libraries. Traditional focused crawlers, which normally adopt the simple Vector Space Model and local Web search algorithms, typically find relevant Web pages with only low precision. Recall is also often low, since they explore a limited sub-graph of the Web that surrounds the starting URL set and ignore relevant pages outside this sub-graph. In this work, we investigated how to apply an inductive machine learning algorithm and a meta-search technique to the traditional focused crawling process in order to overcome the above-mentioned problems and improve performance. We proposed a novel hybrid focused crawling framework based on Genetic Programming (GP) and meta-search. We showed that our hybrid framework can be applied to traditional focused crawlers to accurately find more relevant Web documents for use in digital libraries and domain-specific search engines. The framework is validated through experiments performed on test documents from the Open Directory Project. Our studies have shown that improvement can be achieved relative to the traditional focused crawler if genetic programming and meta-search methods are introduced into the focused crawling process. / Ph. D.
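
The GP-based framework is not reproduced here. As background only, the sketch below shows the plain best-first focused-crawling loop with the simple Vector Space Model scoring that the abstract says such frameworks improve upon; the relevance threshold, the caller-supplied fetch and link-extraction helpers, and the score-inheritance rule are illustrative assumptions.

```python
# Best-first focused-crawling loop with a simple VSM (cosine) scorer.
# A hybrid framework like the one above would replace vsm_score with a
# GP-learned combination and seed the frontier via meta-search.
import heapq
import math
import re
from collections import Counter

def vsm_score(text: str, topic_terms: Counter) -> float:
    """Cosine similarity between a page's term counts and the topic profile."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    dot = sum(words[t] * w for t, w in topic_terms.items())
    norm = math.sqrt(sum(c * c for c in words.values())) * \
           math.sqrt(sum(w * w for w in topic_terms.values()))
    return dot / norm if norm else 0.0

def focused_crawl(seeds, fetch, extract_links, topic_terms, budget=100):
    """Best-first crawl: always expand the highest-scoring frontier URL.
    `fetch(url) -> text` and `extract_links(url, text) -> [urls]` are
    placeholders for a real HTTP client and link extractor."""
    frontier = [(-1.0, url) for url in seeds]   # max-heap via negated scores
    heapq.heapify(frontier)
    seen, collected = set(seeds), []
    while frontier and len(collected) < budget:
        _, url = heapq.heappop(frontier)
        text = fetch(url)
        score = vsm_score(text, topic_terms)
        if score > 0.1:                          # arbitrary relevance threshold
            collected.append((url, score))
        for link in extract_links(url, text):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # inherit parent score
    return collected
```
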
8

Lokman: A Medical Ontology Based Topical Web Crawler

Kayisoglu, Altug 01 September 2005 (has links) (PDF)
Use of an ontology is one approach to overcoming the “search-on-the-net” problem. An ontology-based Web information retrieval system requires a topical Web crawler to construct a high-quality document collection. This thesis focuses on implementing a topical Web crawler with a medical domain ontology in order to determine the advantages of ontological information in Web crawling. The crawler is implemented with a Best-First search algorithm, and its design is optimized for the UMLS ontology. The crawler is evaluated with the Harvest Rate and Target Recall metrics and compared to a non-ontology-based Best-First crawler. The test results showed that using the ontology in the crawler's URL selection algorithm improved crawler performance by 76%.
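
The crawler and its UMLS integration are not part of this record; the two evaluation metrics named above can, however, be illustrated directly. The sketch below uses the usual definitions (harvest rate as the fraction of crawled pages that are relevant, target recall as the fraction of a known target set reached), which is an assumption about the exact variants used in the thesis.

```python
# Harvest rate and target recall under their common definitions.
def harvest_rate(crawled_urls, is_relevant) -> float:
    """Share of crawled pages judged relevant (is_relevant: url -> bool)."""
    if not crawled_urls:
        return 0.0
    return sum(1 for u in crawled_urls if is_relevant(u)) / len(crawled_urls)

def target_recall(crawled_urls, target_urls) -> float:
    """Share of a known target set that the crawl actually reached."""
    targets = set(target_urls)
    if not targets:
        return 0.0
    return len(targets & set(crawled_urls)) / len(targets)
```
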
9

Development of an online reputation monitor / Gerhardus Jacobus Christiaan Venter

Venter, Gerhardus Jacobus Christiaan January 2015 (has links)
The opinions of customers about companies are very important, as they can influence a company’s profit. Companies often gather customer feedback via surveys or other official methods in order to improve their services. However, some customers feel threatened when asked for their opinions publicly and thus prefer to voice them on the internet, where they take comfort in anonymity. This form of customer feedback is difficult to monitor, as the information can be found anywhere on the internet and new information is generated at an astonishing rate. Currently there are companies, such as Brandseye and Brand.Com, that provide online reputation management services. These services have various shortcomings, such as cost and the inability to access historical data; companies are also not allowed to purchase the software and can only use it on a subscription basis. The design proposed in this document is able to scan any number of user-defined websites and save all the information found on them in a series of index files, which can be queried for occurrences of user-defined keywords at any time. Additionally, the software can scan Twitter and Facebook for any number of user-defined keywords and save any occurrences of the keywords to a database. After scanning the internet, the results are passed through a similarity filter, which removes insignificant results as well as any duplicates that might be present. The remaining results are then analysed by a sentiment analysis tool, which determines whether the sentence in which the keyword occurs is positive or negative. The analysed results determine the overall reputation of the keyword that was used. The proposed design has several advantages over current systems:
- Because of the modular design, several tasks can execute at the same time without influencing each other; for example, information can be extracted from the internet while existing results are being analysed.
- By providing the keywords and websites that the system will use, the user has full control over the online reputation management process.
- By saving all the information contained in a website, the user can take historical information into account to determine how a keyword's reputation changes over time. Saving the information also allows the user to search for any keyword without rescanning the internet.
The proposed system was tested and successfully used to determine the online reputation of many user-defined keywords. / MIng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus, 2015
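
The monitor itself is not included in the record; as an illustration of the post-crawl pipeline described (keyword occurrence extraction, similarity filtering, sentiment scoring), a small sketch follows. The word lists, the similarity threshold, and the lexicon-based sentiment stand-in are assumptions rather than the tools the design actually uses.

```python
# Post-crawl pipeline sketch: pull out sentences that mention a keyword,
# drop near-duplicates with a similarity filter, and score the remainder
# with a placeholder sentiment step.
import re
from difflib import SequenceMatcher

POSITIVE = {"good", "great", "excellent", "reliable"}     # illustrative lexicon
NEGATIVE = {"bad", "poor", "terrible", "unreliable"}

def keyword_sentences(text: str, keyword: str):
    """Sentences from scanned text that mention the keyword."""
    return [s.strip() for s in re.split(r"[.!?]", text)
            if keyword.lower() in s.lower()]

def deduplicate(sentences, threshold: float = 0.9):
    """Similarity filter: keep a sentence only if nothing kept is too similar."""
    kept = []
    for s in sentences:
        if all(SequenceMatcher(None, s, k).ratio() < threshold for k in kept):
            kept.append(s)
    return kept

def sentiment(sentence: str) -> int:
    """Crude lexicon score: >0 positive, <0 negative (stand-in for a real tool)."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

def reputation(texts, keyword: str) -> float:
    """Average sentiment over deduplicated keyword mentions across all texts."""
    mentions = deduplicate([s for t in texts for s in keyword_sentences(t, keyword)])
    return sum(sentiment(s) for s in mentions) / len(mentions) if mentions else 0.0
```
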