1 |
Autonomous Consolidation of Heterogeneous Record-Structured HTML Data in Chameleon
Chouvarine, Philippe (07 May 2005)
While progress has been made in querying digital information contained in XML and HTML documents, success in retrieving information from the so-called "hidden Web" (data behind Web forms) has been modest. There is a nascent trend of developing autonomous tools for extracting information from the hidden Web: automatic tools for ontology generation, wrapper generation, Web form querying, response gathering, etc., have been reported in recent research. This thesis presents a system called Chameleon for automatically querying the hidden Web and gathering its responses. The approach to response gathering is based on automatic table structure identification: most information repositories of the hidden Web are structured databases, so the information returned in response to a query exhibits regularities. Information is extracted from the identified record structures using domain knowledge corresponding to the domain specified in the query. So-called "domain plug-ins" make the dynamically generated wrappers domain-specific rather than document-specific, as is conventional.
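The idea that query responses "have regularities" can be illustrated with a minimal sketch. It is not Chameleon's algorithm; it only assumes that a record list shows up as an element with many direct children sharing one tag name. The names RecordFinder and find_record_container and the min_records threshold are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag; they are counted as children
# but never pushed on the stack of open elements.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class RecordFinder(HTMLParser):
    """Collects, for every element, the tag names of its direct children."""

    def __init__(self):
        super().__init__()
        self.stack = []      # ids of currently open elements
        self.tags = {}       # element id -> tag name
        self.children = {}   # element id -> list of child tag names
        self._next_id = 0

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.children[self.stack[-1]].append(tag)
        if tag in VOID_TAGS:
            return
        node = self._next_id
        self._next_id += 1
        self.tags[node] = tag
        self.children[node] = []
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.tags[self.stack[i]] == tag:
                del self.stack[i:]
                break


def find_record_container(html, min_records=3):
    """Return (container_tag, record_tag, count) for the element with the
    most same-tag direct children, or None if none reaches min_records."""
    parser = RecordFinder()
    parser.feed(html)
    best = None
    for node, kids in parser.children.items():
        if not kids:
            continue
        tag, count = Counter(kids).most_common(1)[0]
        if count >= min_records and (best is None or count > best[2]):
            best = (parser.tags[node], tag, count)
    return best
```

A real system would compare whole subtrees rather than tag names alone, and would then apply domain knowledge to label the fields inside each record.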
|
2 |
What is the Hidden Web? / Was ist das Hidden Web? Die Entstehung, Eigenschaften und gesellschaftliche Bedeutung von anonymer Kommunikation im Hidden Web.
Papsdorf, Christian (27 April 2016)
More than two-and-a-half million people currently use the Tor network to communicate anonymously via the Internet and to gain access to online media that are not accessible using standard Internet technology. This sphere of communication can be described as the hidden web. In part because the phenomenon is very recent, it has scarcely been studied in the social sciences. This paper therefore answers four fundamental questions: What is the hidden web? What characterises the communication sphere of the hidden web in contrast to the “normal” Internet? Which reasons explain the development of the hidden web as a new communication sphere? And, finally, what is the social significance of the hidden web?
|
3 |
Exploring the Hidden Web
Papsdorf, Christian (14 June 2017)
The research project “Exploring the Hidden Web. Use, features and specific character of anonymous communication on the Internet”, part of the VolkswagenStiftung funding initiative “Off the beaten track”, started from four central questions. The first was what is communicated about on the Hidden Web, and the second which media are used for this communication. The third was how, under conditions of anonymity, the trust necessary for interaction is established. For each of these three aspects, the fourth question was which differences, commonalities and interfaces exist with the freely accessible media commonly referred to as the Internet (“Clearnet”). These questions were investigated using an explorative, qualitative approach.
|
4 |
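Σχεδίαση και υλοποίηση συστήματος αξιολόγησης της δομής και του περιεχομένου ιστότοπων για κινητές συσκευές [Design and implementation of a system for evaluating the structure and content of web sites for mobile devices]
Στεφανής, Βασίλειος (12 February 2008)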
In recent years, access to the Web has no longer been limited to desktop PCs; it now extends to mobile phones, PDAs and mobile devices of every kind. In developing countries, the number of users who browse the Web from mobile devices is in fact larger than the number who browse it from desktop PCs. Creating web content has also become much easier, thanks to a large number of tools that promise fast and easy content creation without demanding special knowledge from their users. The question is which characteristics web sites and their content should have in order to offer the best browsing experience to users of mobile devices.
The World Wide Web Consortium (W3C) has compiled the practices that should be followed for delivering Web content to mobile devices (Mobile Web Best Practices). Compliance with these practices is necessary mainly because of the limitations of mobile devices. The main limitations are the small screen size, the method of entering data, the available memory, the low computational power, the data transmission speed and the limited battery life.
W3C itself has mapped these practices to a set of tests that can be run against the structure and content of a web page. The tests aim to ensure that the page can offer an acceptable browsing experience to users of mobile devices. Some of the practices define tests that are machine-verifiable, while others define tests that also require human judgement.
This thesis first presents and analyses the W3C Mobile Web Best Practices. A system for evaluating the structure and content of web sites aimed at mobile devices was then designed and implemented. The system analyses a web site, crawls its web pages and checks every page against the W3C tests. Its final goal is to produce a report and a rating for the web site as a whole. Particular emphasis was placed on crawling and evaluating pages and content of the web site that are part of the "hidden web". Finally, users of the system can assign an importance weight to each W3C test.
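How per-test importance weights might turn page-level test results into a single site rating can be sketched as follows. The abstract does not give the thesis's actual formula, so the weighted pass ratio below is an assumption, and the function name rate_site and the test names in the usage example are hypothetical.

```python
def rate_site(page_results, weights, default_weight=1.0):
    """Return a site rating in [0, 100].

    page_results: one dict per crawled page, mapping test name -> bool
                  (True means the page passed that test).
    weights:      mapping test name -> non-negative importance weight;
                  tests without an entry get default_weight.
    The rating is the weight of all passed checks divided by the weight
    of all checks performed (an assumed formula, not the thesis's own).
    """
    total = 0.0
    earned = 0.0
    for results in page_results:
        for test, passed in results.items():
            weight = weights.get(test, default_weight)
            total += weight
            if passed:
                earned += weight
    return 100.0 * earned / total if total else 0.0
```

For example, with two pages checked against the hypothetical tests "small_page" (weight 3) and "no_frames" (weight 1), a single failed "no_frames" check costs one eighth of the total weight.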
|
5 |
Descoberta de ruído em páginas da web oculta através de uma abordagem de aprendizagem supervisionada / A supervised learning approach for noise discovery in web pages found in the hidden web
Lutz, João Adolfo Froede (January 2013)
One of the problems in extracting data from web pages is removing the noise they contain. This task aims at identifying non-informative elements in pages, such as headers, menus, or advertisements. The presence of noise may seriously hinder the performance of search engines and web mining tasks. This work tackles the problem of discovering noise in web pages found in the hidden web, i.e., the part of the web that is only accessible by filling in web forms. In hidden web processing, data extraction is usually preceded by a form-filling step, in which the query forms that give access to the hidden web pages are filled automatically or semi-automatically. During form filling, relevant data about the queried domain are collected, such as field labels and field values. Our proposal combines this type of data with syntactic information about the nodes that compose the page. We show empirically that this combination achieves better results than an approach based solely on syntactic information.
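The combination of form-filling data with syntactic information can be pictured as one feature vector per page node. The sketch below is illustrative only: the abstract does not list the features used, so node_features and every feature name in it are assumptions.

```python
def node_features(text, link_chars, tag, form_values):
    """Build a feature dict for one page node.

    text:        visible text of the node
    link_chars:  number of characters of that text inside <a> elements
    tag:         tag name of the node
    form_values: labels/values collected while filling in the query form
    All feature names are hypothetical.
    """
    lowered = text.lower()
    n_chars = len(text)
    return {
        # Syntactic features: derived from the page alone.
        "n_words": len(text.split()),
        "link_density": link_chars / n_chars if n_chars else 0.0,
        "is_list_or_nav": tag in {"ul", "ol", "nav"},
        # Domain feature: how many submitted form values reappear in the node.
        "form_value_hits": sum(1 for v in form_values if v.lower() in lowered),
    }
```

Feature dicts of this kind, labelled as noise or content, could then be passed to any supervised classifier. A node that echoes the submitted form values is more likely to be a result record than a menu or banner.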
|
6 |
Preenchimento automático de formulários na web oculta / Automatically filling in hidden web forms
Kantorski, Gustavo Zanini (January 2014)
A large portion of the information on the Web is stored inside online databases and is accessible only after a user submits a query through a search interface. The portion of the Web in which that information is located is called the Hidden Web or Deep Web, and it is generally inaccessible to the crawlers of traditional search engines. Since the only way to access Hidden Web pages is through query submission, many works have focused on how to fill in form fields automatically, aiming to increase the amount of distinct information retrieved from behind Web forms. This thesis presents an automatic solution to value selection for fields in Web forms. The solution combines heuristics and machine learning techniques to improve the selection of values. The thesis also describes a categorization of form-filling techniques and a comparative analysis of state-of-the-art works. Experiments were conducted on real Web sites, and the results indicate that the approach significantly outperforms a baseline method in terms of coverage without additional computational cost.
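Coverage, the evaluation measure named above, can be made concrete with a small sketch. This is a plain greedy set-cover heuristic, not the method of the thesis, and it assumes the records returned for each candidate value are already known (for instance from a sample run). The name select_values is hypothetical.

```python
def select_values(candidates, max_queries):
    """Greedily pick field values that add the most unseen records.

    candidates:  mapping value -> set of record ids returned when that
                 value is submitted (assumed known in advance)
    max_queries: upper bound on the number of form submissions
    Returns (chosen values in order, set of covered record ids).
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_queries:
        # Value whose result set contains the most records not yet covered.
        value = max(remaining, key=lambda v: len(remaining[v] - covered))
        gain = remaining.pop(value) - covered
        if not gain:
            break  # no remaining value contributes anything new
        covered |= gain
        chosen.append(value)
    return chosen, covered
```

The sketch stops early once no remaining value returns new records, so redundant submissions are never issued.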
|