  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

M-crawler: Crawling Rich Internet Applications Using Menu Meta-model

Choudhary, Suryakant 27 July 2012 (has links)
Web applications have come a long way, both in their adoption as a means of providing information and services and in the technologies used to develop them. With the emergence of richer and more advanced technologies such as Ajax, web applications have become more interactive, responsive, and user-friendly. These applications, often called Rich Internet Applications (RIAs), changed traditional web applications in two primary ways: dynamic manipulation of client-side state and asynchronous communication with the server. At the same time, such techniques introduce new challenges. An important one is the difficulty of automatically crawling these new applications. Crawling is not only important for indexing content but also critical to web application assessment, such as testing for security vulnerabilities or accessibility. Traditional crawlers are no longer sufficient for these newer technologies, and crawling of RIAs is either nonexistent or far from perfect. There is a need for an efficient crawler for web applications developed using these new technologies. Further, as more and more enterprises use these technologies to provide their services, the need for a better crawler becomes pressing. This thesis studies the problems associated with crawling RIAs. Crawling RIAs is fundamentally more difficult than crawling traditional multi-page web applications. The thesis also presents an efficient RIA crawling strategy and compares it with existing methods.
12

The breastfeeding triangle: crawling as a mediator of breastfeeding duration and cognitive development at 2 years of age

Bodnarchuk, Jennifer L. 07 April 2005 (has links)
Longer breastfeeding durations may enhance cognition and accelerate motor development; motor development, and in particular crawling, may lead to dramatic changes in cognition. Based on these empirical relations, the hypothesis that crawling mediates the relationship between breastfeeding duration and cognitive outcome was tested. Specifically, it was hypothesized that longer breastfeeding durations would significantly predict both earlier crawling and higher cognitive scores at 2 years of age, that earlier crawling would also predict higher cognitive scores, and that earlier crawling would account for part of the relationship between longer breastfeeding durations and higher cognitive scores. A sample of 44 full-term infants from Winnipeg, Manitoba was followed longitudinally between birth and 2 years of age. Data on breastfeeding duration and crawling were collected through daily parent checklists, with supplemental breastfeeding information obtained via questionnaires. Near the toddlers’ 2nd birthdays, cognitive abilities were assessed with the MacArthur Communicative Development Inventory: Words and Sentences (Fenson et al., 1993) and the Parent Report of Children’s Abilities (Saudino et al., 1998). All 3 key variables were measured on continuous scales, and a mediational analysis based on Baron and Kenny’s (1986) classic approach of 3 regressions was used. Several covariates were considered for inclusion in the regressions, but none reached significance in preliminary tests and thus none were included. In the first 2 regression analyses, exclusive and partial breastfeeding durations significantly predicted neither cognitive scores (p = .59) nor age of crawling attainment (p = .41). The 3rd regression analysis showed a significant, small-to-medium effect size for earlier crawling attainment predicting higher cognitive scores (p < .05, adjusted R² = .09). However, crawling onset had no effect on the breastfeeding-cognition link. The overall test of the mediation was inconclusive, due to low power. The significant finding between age of crawling onset and cognitive outcomes at 2 years of age may be due to earlier crawling altering the course of development, to reverse causation whereby more cognitively advanced infants are motivated to crawl sooner, or to a 3rd variable affecting both crawling and cognition. Future research should continue to explore motor and cognitive connections in infant development. / May 2005
13

Topic-Oriented Collaborative Web Crawling

Chung, Chiasen January 2001 (has links)
A web crawler is a program that "walks" the Web to gather web resources. To scale to the ever-growing Web, multiple crawling agents may be deployed in a distributed fashion to retrieve web data cooperatively. A common approach is to divide the Web into many partitions, with an agent assigned to crawl within each one. If an agent obtains a web resource that is not from its partition, the resource is transferred to its rightful owner. This thesis proposes a novel approach to distributed web data gathering that partitions the Web by topic. The proposed approach employs multiple focused crawlers to retrieve pages on various topics. When a crawler retrieves a page on another topic, it transfers the page to the appropriate crawler. This approach is known as topic-oriented collaborative web crawling. An implementation of the system was built and experimentally evaluated. To identify the topic of a web page, a topic classifier was incorporated into the crawling system. Because the classifier categorizes only English pages, a language identifier was also introduced to distinguish English pages from non-English ones. The experimental results showed that redundant retrieval was low and that a resource retrieved by an agent is six times more likely to be retained than in a system that uses a conventional hashing approach. These numbers were viewed as strong indications that topic-oriented collaborative web crawling is a viable approach to web data gathering.
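The contrast between hash-based and topic-based partitioning described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the thesis: the keyword table `TOPIC_KEYWORDS`, the toy keyword-counting classifier, and the functions `hash_owner`, `classify_topic`, and `route` are all hypothetical, and the thesis's actual classifier and language identifier are not reproduced.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical topic vocabulary; a real system would use a trained classifier.
TOPIC_KEYWORDS = {
    "sports": {"football", "league", "match"},
    "finance": {"stock", "market", "bank"},
}


def hash_owner(url, n_agents):
    """Conventional partitioning: the owning agent is derived from the host name."""
    host = urlparse(url).netloc
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_agents


def classify_topic(text):
    """Toy classifier: pick the topic with the most keyword hits, or None."""
    words = set(text.lower().split())
    hits = {topic: len(words & keywords) for topic, keywords in TOPIC_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def route(page_text, fetched_by, topic_to_agent):
    """Topic partitioning: keep the page if its topic belongs to the fetching agent."""
    topic = classify_topic(page_text)
    # Assumption: pages with no recognizable topic stay with the agent that fetched them.
    owner = topic_to_agent.get(topic, fetched_by)
    action = "keep" if owner == fetched_by else "transfer"
    return action, owner
```

For example, with `topic_to_agent = {"sports": 0, "finance": 1}`, a page about a football match fetched by agent 0 is kept, while a page about the stock market fetched by the same agent is transferred to agent 1. Under `hash_owner`, by contrast, ownership depends only on the host name, regardless of what the page is about.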
17

Engenhos de Busca Distribuídos: Uma abordagem visando escalabilidade para Crawling e Indexação / Distributed Search Engines: An Approach Aiming at Scalability for Crawling and Indexing

Fernandes, Marcelo Rômulo January 2001 (has links)
The Internet is one of the main sources of information used to support problem solving. In parallel, search engines have emerged as one of the most widely used means of finding information in this environment. The extraordinary size, exponential growth, and high rate of change of the World Wide Web (WWW) require new approaches to the problems of indexing and searching for information within the structure of search engines. This work presents a distributed solution for operating search engines, aiming at scalability and freshness. Distributed architectures for search engines are discussed. The work presents Radix, a distributed search engine for indexing and searching information on the WWW, based on Web views. A prototype is developed, focusing on the implementation of crawling and indexing in the distributed Radix, in order to validate the proposed environment. A comparative performance case study of centralized and distributed search engines is presented, encouraging the use of distribution techniques to raise the coverage and freshness of these systems.
18

M-crawler: Crawling Rich Internet Applications Using Menu Meta-model

Choudhary, Suryakant January 2012 (has links)
Web applications have come a long way, both in their adoption as a means of providing information and services and in the technologies used to develop them. With the emergence of richer and more advanced technologies such as Ajax, web applications have become more interactive, responsive, and user-friendly. These applications, often called Rich Internet Applications (RIAs), changed traditional web applications in two primary ways: dynamic manipulation of client-side state and asynchronous communication with the server. At the same time, such techniques introduce new challenges. An important one is the difficulty of automatically crawling these new applications. Crawling is not only important for indexing content but also critical to web application assessment, such as testing for security vulnerabilities or accessibility. Traditional crawlers are no longer sufficient for these newer technologies, and crawling of RIAs is either nonexistent or far from perfect. There is a need for an efficient crawler for web applications developed using these new technologies. Further, as more and more enterprises use these technologies to provide their services, the need for a better crawler becomes pressing. This thesis studies the problems associated with crawling RIAs. Crawling RIAs is fundamentally more difficult than crawling traditional multi-page web applications. The thesis also presents an efficient RIA crawling strategy and compares it with existing methods.
19

Skrapa Facebook : En kartläggning över hur data kan samlas in från Facebook / Scraping Facebook : A survey of how data could be collected from Facebook

Holm, Andreas, Ahlm, Oscar January 2021 (has links)
A vast amount of data is shared daily on social media platforms. If this data can be collected and sorted, it can prove valuable as a basis for research, especially in countries where social media is the only place for citizens to make their voices heard. Facebook is one of the most frequently used social media platforms and is therefore a potentially rich source of data. In recent years, however, Facebook has become more restrictive about who gets access to the data on its platform. This has created interest in ways to access the data shared on Facebook's platform without explicit approval from Facebook. At the same time, it raises questions about the ethics and legality of doing so. This work therefore set out to investigate different aspects (technical, ethical, and legal) of collecting data from Facebook's platform by performing a literature review and experiments. The literature review showed that it was difficult to find material on the technical measures Facebook takes to prevent web scraping. The experiments identified some of these measures, among them that the structure of the HTML changes and that the ids of HTML elements change when certain events occur, which makes web scraping more difficult. The literature review also showed that it is hard to know which data is legal to scrape from Facebook and which is not. This is partly because different countries have different laws that apply to web scraping, and partly because it can be difficult to know what counts as personal data and is thus protected by GDPR, among other laws.
20

[en] TEXT MINING AT THE INTELLIGENT WEB CRAWLING PROCESS / [pt] MINERAÇÃO DE TEXTOS NA COLETA INTELIGENTE DE DADOS NA WEB

FABIO DE AZEVEDO SOARES 31 March 2009 (has links)
This dissertation presents a study of the application of Text Mining as part of the intelligent Web crawling process. The most common way of gathering data on the Web is to use web crawlers. Web crawlers are programs that, once provided with an initial set of URLs (seeds), methodically visit a site, store it on disk, and extract from it the hyperlinks that will be used for the next visits. Searching for content in this way, however, is an exhausting and expensive task. An intelligent web crawling process, rather than collecting and storing every accessible web document, analyzes the available crawling options to find links that will probably provide content highly relevant to a topic defined a priori. In the approach proposed in this work, topics are defined not by keywords but by text documents used as examples. Next, pre-processing techniques used in Text Mining, including the use of a thesaurus, semantically analyze the document submitted as an example. Based on this analysis, the resulting web crawler is guided toward its objective: retrieving information relevant to the document. Starting from seeds or from an automatic query to available search engines, the crawler analyzes every document retrieved from the Web in the same way as in the previous step. Each retrieved document is then compared with the example document. Once the similarity level between them is obtained, the retrieved document's hyperlinks are analyzed, queued, and later dequeued according to their probable degree of importance. At the end of the data gathering process, another Text Mining technique, Document Clustering, is applied to select the most representative documents in the collection. Implementing a tool that incorporates the researched heuristics yielded practical results, making it possible to evaluate the performance of the developed techniques and to compare the results with other ways of retrieving data from the Web. This work shows that the use of Text Mining is a path worth exploring in the process of retrieving relevant information from the Web.
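The crawl loop this abstract describes (start from seeds, compare each retrieved document with an example document, then queue its hyperlinks by probable importance) might look roughly like the sketch below. It rests on several assumptions that are not in the dissertation: plain bag-of-words cosine similarity stands in for the thesaurus-based semantic analysis, each link simply inherits the similarity score of the page it was found on, and the web is simulated by a small in-memory dictionary, `FAKE_WEB`, so that the example runs without network access. All function and variable names are hypothetical.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical in-memory "web": url -> (page text, outgoing links).
FAKE_WEB = {
    "s1": ("text mining for web crawler", ["deep_mining"]),
    "s2": ("pasta and pizza cooking", ["recipes"]),
    "deep_mining": ("document clustering and text mining", []),
    "recipes": ("pasta recipes", []),
}


def bag_of_words(text):
    """Lowercase word counts; a stand-in for real Text Mining pre-processing."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def focused_crawl(seeds, fetch, example_text, max_pages=10):
    """Visit pages in order of estimated relevance to the example document."""
    example = bag_of_words(example_text)
    # heapq is a min-heap, so priorities are stored negated; seeds go first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    results = []
    while frontier and len(results) < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        if page is None:
            continue  # unreachable or unknown URL
        text, links = page
        score = cosine(bag_of_words(text), example)
        results.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link is as promising as the page it came from.
                heapq.heappush(frontier, (-score, link))
    return results
```

Calling `focused_crawl(["s1", "s2"], FAKE_WEB.get, "web crawler text mining", max_pages=3)` visits both seeds and then follows the link found on the more relevant seed, so the off-topic `recipes` page is never fetched within the page budget. A priority queue is used here because it makes "dequeue by probable importance" a one-line operation; the dissertation's own data structure may differ.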
