11 |
[en] IS SECRECY STILL THE SOUL OF BUSINESS?: A DISCOURSE ANALYSIS OF NATIONAL INTELLIGENCE SERVICES' HOME PAGES / [pt] O SEGREDO AINDA É A ALMA DO NEGÓCIO?: UMA ANÁLISE DO DISCURSO DE HOME PAGES DE SERVIÇOS NACIONAIS DE INTELIGÊNCIA. SANDRA MARA SANTA BARBA MIRANDA 01 March 2007 (has links)
[pt] Os serviços nacionais de inteligência são instituições que, tradicionalmente, têm se fechado ao escrutínio público e cujas atividades são cercadas por certo mistério. Este trabalho investiga a página inicial (home page) na internet de três serviços nacionais de inteligência: o turco, o italiano e o australiano. A análise das páginas discute o discurso institucional dos serviços em duas esferas: a verbal e a visual. A análise verbal focaliza a declaração de missão/lema dessas instituições e a análise visual contempla as imagens e o layout das páginas como um todo. Como suporte teórico para a análise verbal, utiliza-se a gramática funcional de Halliday (1994). Já a análise visual se fundamenta na teoria da multimodalidade de Kress e van Leeuwen (1996). O estudo sugere que, embora a presença dos serviços nacionais de inteligência na internet possa parecer, à primeira vista, uma mudança de postura de relacionamento para com o público em geral, muito pouco é dito acerca de princípios, objetivos específicos ou métodos de atuação pelos quais se pautam essas instituições. / [en] National intelligence services are institutions which have traditionally avoided public scrutiny and whose activities have been clothed in some mystery. This study investigates the home pages of three national intelligence services: the Turkish, the Italian and the Australian. The analysis examines two aspects of the institutional discourse of the services: the verbal and the visual. The verbal analysis centers on the institutional mission statement/motto, and the visual analysis focuses on the images and layout of the pages as a whole. The verbal analysis is based on Halliday's functional grammar (1994) and critical discourse analysis, while the visual analysis draws on Kress and van Leeuwen's theory of multimodality (1996). The results suggest that, although the presence of national intelligence services on the internet may seem to be a step toward a more transparent relationship with the general public, very little is actually said about the specific principles, objectives and modus operandi by which these institutions guide themselves.
|
12 |
Nové metody segmentace webových stránek / New Web Page Segmentation Methods. Malaník, Michal January 2016 (has links)
The aim of this work is to introduce a new vision-based web page segmentation method. The method is based on the popular VIPS segmentation algorithm, which aims to represent the segmented web document in the same way a user perceives it in a web browser. Compared to the VIPS algorithm, our method adds optimizations for modern websites, especially for documents written in HTML5. We also deal with the implementation of the proposed method using the FITLayout framework.
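As a rough illustration of the kind of DOM-driven heuristic such a segmenter might start from (not the thesis's VIPS-derived algorithm), the sketch below treats top-level HTML5 semantic containers as initial candidate blocks; the tag list and the BeautifulSoup parsing step are assumptions for illustration only.

```python
# Minimal sketch: collect outermost HTML5 semantic containers as candidate
# blocks, before any visual refinement. Illustrative assumption, not VIPS.
from bs4 import BeautifulSoup

SEMANTIC_TAGS = ["header", "nav", "main", "article", "section", "aside", "footer"]

def candidate_blocks(html: str):
    """Return (tag name, text snippet) pairs for top-level semantic containers."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for tag in soup.find_all(SEMANTIC_TAGS):
        # Keep only the outermost semantic elements; skip nested ones.
        if tag.find_parent(SEMANTIC_TAGS) is None:
            text = " ".join(tag.get_text(" ", strip=True).split())
            blocks.append((tag.name, text[:80]))
    return blocks

if __name__ == "__main__":
    html = "<main><article><h1>News</h1><p>Body text.</p></article></main><footer>Contact</footer>"
    print(candidate_blocks(html))   # [('main', 'News Body text.'), ('footer', 'Contact')]
```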
|
13 |
Segmentation de pages web, évaluation et applications / Web page segmentation, evaluation and applications. Sanoja Vargas, Andrés 22 January 2015 (has links)
Les pages web sont devenues plus complexes que jamais, principalement parce qu'elles sont générées par des systèmes de gestion de contenu (CMS). Il est donc difficile de les analyser, c'est-à-dire d'identifier et classifier automatiquement les différents éléments qui les composent. La segmentation de pages web est une des solutions à ce problème. Elle consiste à décomposer une page web en segments, visuellement et sémantiquement cohérents, appelés blocs. La qualité d'une segmentation est mesurée par sa correction et sa généricité, c'est-à-dire sa capacité à traiter des pages web de différents types. Notre recherche se concentre sur l'amélioration de la segmentation et sur une mesure fiable et équitable de la qualité des segmenteurs. Nous proposons un modèle pour la segmentation ainsi que notre segmenteur Block-o-Matic (BoM). Nous définissons un modèle d'évaluation qui prend en compte le contenu ainsi que la géométrie des blocs pour mesurer la correction d'un segmenteur par rapport à une vérité de terrain. Ce modèle est générique, il permet de tester tout algorithme de segmentation et observer ses performances sur différents types de page. Nous l'avons testé sur quatre segmenteurs et quatre types de pages. Les résultats montrent que BoM surpasse ses concurrents en général et que la performance relative d'un segmenteur dépend du type de page. Enfin, nous présentons deux applications développées au-dessus de BoM. Pagelyzer compare deux versions de pages web et décide si elles sont similaires ou pas. C'est la principale contribution de notre équipe au projet européen Scape (FP7-IP). Nous avons aussi développé un outil de migration de pages HTML4 vers le nouveau format HTML5. / Web pages are becoming more complex than ever, as they are generated by Content Management Systems (CMS). Thus, analyzing them, i.e. automatically identifying and classifying different elements from Web pages, such as main content and menus, becomes difficult. A solution to this issue is provided by Web page segmentation, which refers to the process of dividing a Web page into visually and semantically coherent segments called blocks. The quality of a Web page segmenter is measured by its correctness and its genericity, i.e. the variety of Web page types it is able to segment. Our research focuses on enhancing this quality and measuring it in a fair and accurate way. We first propose a conceptual model for segmentation, as well as Block-o-Matic (BoM), our Web page segmenter. We propose an evaluation model that takes the content as well as the geometry of blocks into account in order to measure the correctness of a segmentation algorithm according to a predefined ground truth. The quality of four state-of-the-art algorithms is experimentally tested on four types of pages. Our evaluation framework allows testing any segmenter, i.e. measuring its quality. The results show that BoM presents the best performance among the four segmentation algorithms tested, and also that the performance of segmenters depends on the type of page to segment. We present two applications of BoM. Pagelyzer uses BoM for comparing two Web page versions and deciding whether they are similar or not. It is the main contribution of our team to the European project Scape (FP7-IP). We also developed a migration tool of Web pages from HTML4 format to HTML5 format in the context of Web archives.
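To make the geometric side of such an evaluation concrete, here is a small illustrative sketch (not BoM or the thesis's published metric) that scores a predicted block against a ground-truth block by the overlap of their rectangles; the evaluation model described above also weighs block content, which is not shown here.

```python
# Illustrative only: compare a predicted block with a ground-truth block using
# the intersection-over-union of their bounding boxes. Geometry only; an
# assumption standing in for the full content+geometry evaluation model.
from dataclasses import dataclass

@dataclass
class Block:
    x: float   # left edge
    y: float   # top edge
    w: float   # width
    h: float   # height

def iou(a: Block, b: Block) -> float:
    """Intersection-over-union of two axis-aligned block rectangles."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0

print(iou(Block(0, 0, 100, 50), Block(50, 0, 100, 50)))  # 0.333... (partial overlap)
```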
|
14 |
Automated retrieval and extraction of training course information from unstructured web pages. Xhemali, Daniela January 2010 (has links)
Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages, and there is much need for systems that will locate, extract and integrate the acquired knowledge into organisations' practices. There are some commercial, automated web extraction software packages; however, their success comes from heavily involving their users in the process of finding the relevant web pages, preparing the system to recognise items of interest on these pages, and manually dealing with the evaluation and storage of the extracted results. This research has explored WIE, specifically with regard to the automation of the extraction and validation of online training information. The work also includes research and development in the area of automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Different technologies were considered; after much consideration, Naïve Bayes networks were chosen as the most suitable for the development of the classification system. The extraction part of the system used Genetic Programming (GP) to generate web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web, such as course names, prices, dates and locations. The experimental results indicate that all three aspects of this research perform very well, with the Web Crawler outperforming existing crawling systems, the Web Classifier performing with an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieving an accuracy of over 94% for the extraction of course titles and an accuracy of just under 67% for the extraction of other course attributes such as dates, prices and locations. Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance.
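As a rough illustration of how a candidate regular expression might be scored against labelled examples inside such a GP loop (the fitness function, the sample pages and the pattern below are assumptions for illustration, not the thesis's implementation):

```python
# Illustrative sketch of regex fitness scoring, as a GP loop might use when
# evolving extraction rules for course attributes such as prices.
import re

def regex_fitness(pattern: str, pages: list[tuple[str, str]]) -> float:
    """Fraction of labelled pages where the candidate regex extracts the expected value."""
    try:
        rx = re.compile(pattern)
    except re.error:
        return 0.0  # malformed individuals score zero
    hits = 0
    for text, expected in pages:
        m = rx.search(text)
        if m and m.group(m.lastindex or 0).strip() == expected:
            hits += 1
    return hits / len(pages)

labelled = [
    ("Course fee: £495 (incl. VAT)", "£495"),
    ("Price £1,250 per delegate", "£1,250"),
]
candidate = r"£[\d,]+"
print(regex_fitness(candidate, labelled))  # 1.0 on this toy data
```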
|
15 |
Grid-Enabled Automatic Web Page Classification. Metikurke, Seema Sreenivasamurthy 12 June 2006 (has links)
Much research has been conducted on the retrieval and classification of web-based information. A major challenge is performance, especially for a classification algorithm that must return results over the large data sets typical of the Web. This thesis describes a grid-enabled approach to automatic web page classification. The basic approach, which uses a vector space model (VSM), is described first. An enhancement of the approach through the use of a genetic algorithm (GA) is then described. The enhanced approach can efficiently process candidate web pages from a number of web sites and classify them. A prototype is implemented and empirical studies are conducted. The contributions of this thesis are: 1) the application of grid computing to improve the performance of both the VSM-based and the GA-enhanced VSM-based web page classification; 2) the improvement of the VSM classification algorithm by applying a GA that discovers a set of training web pages while also generating a near-optimal set of parameter values for the VSM.
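A minimal sketch of the vector-space step such a classifier relies on, assuming TF-IDF weighting, category centroids and cosine similarity (scikit-learn is used here only for illustration; the grid distribution and GA tuning described above are not reproduced):

```python
# Minimal VSM classification sketch: represent pages as TF-IDF vectors and
# assign each candidate page to the category whose centroid is most similar.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

training = {
    "sports":  ["football match results and league scores", "tennis open final"],
    "finance": ["stock market closes higher", "central bank raises interest rates"],
}

docs = [d for texts in training.values() for d in texts]
labels = [c for c, texts in training.items() for _ in texts]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# One centroid vector per category.
centroids = {c: np.asarray(X[[i for i, l in enumerate(labels) if l == c]].mean(axis=0))
             for c in training}

def classify(page_text: str) -> str:
    q = vec.transform([page_text]).toarray()
    return max(centroids, key=lambda c: cosine_similarity(q, centroids[c])[0, 0])

print(classify("quarterly earnings beat market expectations"))  # finance
```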
|
16 |
Hypertextual Ultrastructures: Movement and Containment in Texts and Hypertexts. Coste, Rosemarie L. 14 January 2010 (has links)
The surface-level experience of hypertextuality as formless and unbounded, blurring boundaries among texts and between readers and writers, is created by a deep structure which is not normally presented to readers and which, like the ultrastructure of living cells, defines and controls texts' nature and functions. Most readers, restricted to surface-level interaction with texts, have little access to the deep structure of any hypertext. In this dissertation, I argue that digital hypertexts differ essentially from paper texts in that hypertexts are constructed in multiple layers, with surface-level appearance and behavior controlled by sub-surface ultrastructure, and that these multiple layers of structure enable and necessitate new methods of textual study designed for digital texts.
Using participant-observation from within my own practice as a webmaster, I closely examine the sub-surface structural layers that create several kinds of Web-based digital hypertexts: blogs, forums, static Web pages, and dynamic Web pages. With these hypertexts as the primary models, along with their enabling software and additional digital texts (wikis, news aggregators, word processing documents, digital photographs, electronic mail, electronic forms) available to me as a reader/author rather than a webmaster, I demonstrate methods of investigating and describing the development of digital texts. These methods, like methods already established within textual studies to trace the development of printed texts, can answer questions about accidental and intentional textual change, the roles of collaborators, and the ways texts are shaped by production processes and mediating technologies. As a step toward a formalist criticism of hypertext, I propose concrete ways of categorizing, describing, and comparing hypertexts and their components. I also demonstrate techniques for visualizing the structures, histories, and interrelationships of hypertexts and explore methods of using self-descriptive surface elements in paper-like texts as partial substitutes for the sub-surface self-description available in software-like texts. By identifying digitization as a gateway to cooperation between human and artificial intelligences rather than an end in itself, I suggest natural areas of expansion for the humanities computing collaboration as well as new methodologies by which originally-printed texts can be studied in their digital forms alongside originally-digital texts.
|
17 |
Web Usage Mining And Recommendation With Semantic Information. Salin, Suleyman 01 March 2009 (has links) (PDF)
Web usage mining has become popular in various business areas related to Web site development. In Web usage mining, commonly visited navigational paths are extracted, in terms of Web page addresses, from Web server visit logs, and the patterns are used in various applications. The semantic information of the Web page contents is generally not included in Web usage mining. In this thesis, a framework for integrating semantic information with Web usage mining is implemented. The frequent navigational patterns are extracted in the form of ontology instances instead of Web page addresses, and the result is used for making page recommendations to the visitor. Moreover, an evaluation mechanism is implemented to measure the success of the recommendations. Test results showed that stronger and more accurate recommendations are obtained by including semantic information in Web usage mining instead of relying only on visited Web page addresses.
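To illustrate the general idea of lifting a raw clickstream to semantic concepts before pattern mining (the URL-to-concept table and the simple pair counting below are assumptions for illustration, not the thesis's framework):

```python
# Illustrative sketch: map visited page addresses to ontology concepts and
# count consecutive concept pairs across sessions, as a crude stand-in for
# mining frequent navigational patterns over ontology instances.
from collections import Counter

URL_TO_CONCEPT = {            # assumed mapping, normally derived from an ontology
    "/movies/matrix.html": "SciFiMovie",
    "/movies/alien.html":  "SciFiMovie",
    "/actors/reeves.html": "Actor",
    "/reviews/matrix":     "Review",
}

sessions = [
    ["/movies/matrix.html", "/actors/reeves.html", "/reviews/matrix"],
    ["/movies/alien.html", "/reviews/matrix"],
]

def frequent_concept_pairs(sessions, mapping):
    """Count consecutive concept pairs observed across all sessions."""
    pairs = Counter()
    for s in sessions:
        concepts = [mapping[u] for u in s if u in mapping]
        pairs.update(zip(concepts, concepts[1:]))
    return pairs

print(frequent_concept_pairs(sessions, URL_TO_CONCEPT).most_common(3))
```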
|
18 |
The Study of Applying Category Management on Adaptive Customer-centered Online Catalogs. Liu, Chiang-Luan 26 June 2001 (has links)
The Internet, with its growing electronic commerce, is regarded as a new selling channel for retailers. Online catalog organization has become an important issue for e-tailing business development. While most online retailing web sites provide assistance for searchers who know exactly what they are seeking, little has been done to aid browsers who take a more open-minded and exploratory approach to navigation. Good design of online catalogs is essential for browsers to shop over the web. In this paper, we propose a two-phase approach to the design of online catalogs. In the first phase, the idea of category management, which analyzes customers' purchasing behaviors, is employed to construct a customer-centered online catalog. Cluster-based market segmentation helps determine the web hierarchy, with clusters of products placed at higher levels indicating greater interest to customers. The second phase is to dynamically adjust the hierarchy when customers' preferences, as indicated by browsing patterns, change. 'Relative access', which reflects the popularity of web pages, is used as a basis for online catalog adaptation. Finally, we apply this approach to real-world data collected at Galleze.com by Blue Martini Software. It shows that our approach can result in meaningful online catalog organization for customers to navigate. Our study therefore provides a good direction for researchers in designing online catalogs. Furthermore, e-tailing practitioners can apply our approach easily and gain benefits from such a design.
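As a rough sketch of how a 'relative access' score might be computed and used to reorder one level of a catalog hierarchy (the access counts and the simple share-of-sibling-traffic definition below are illustrative assumptions, not the paper's exact procedure):

```python
# Illustrative sketch: compute each category page's access count relative to
# its siblings and sort the catalog level by that share.
def relative_access(access_counts: dict[str, int]) -> dict[str, float]:
    """Share of accesses each sibling page receives within one catalog level."""
    total = sum(access_counts.values())
    return {page: n / total for page, n in access_counts.items()} if total else {}

level = {"electronics": 420, "apparel": 260, "garden": 120}
scores = relative_access(level)
reordered = sorted(level, key=scores.get, reverse=True)
print(scores)      # {'electronics': 0.525, 'apparel': 0.325, 'garden': 0.15}
print(reordered)   # ['electronics', 'apparel', 'garden']
```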
|
19 |
Design and implementation of hypermedia learning environments that facilitate the construction of knowledge about analytical geometry. Pavaputanon, Lha January 2007 (has links)
This study aimed to develop a teaching and learning model, based on principles derived from the fields of constructivist theory, schema theory, critical literacy theory, and design theory, to inform the development of hypermedia-mediated learning environments that facilitate the construction of mathematical knowledge by secondary school students in Thailand. In this study, the participants were a group of three secondary school students from the Demonstration school attached to the Faculty of Education at Khon Kaen University (Thailand). In order to ascertain how mathematical learning could be facilitated by the process of designing a web page that could be used to introduce other students to analytic geometry, all three participants were asked to work collaboratively to design an analytic geometry web page. The process of designing the web page was informed by a theoretical model derived from an analysis and synthesis from the research literature on constructivist theory, schema theory, critical literacy theory, and design theory. Findings from the study indicated that the creation of a web page facilitated and enhanced the Thai students' learning about analytic geometry. The major outcomes from the study are a revised theoretical framework to inform the integration of the design of mathematical web pages into Thai mathematics classrooms and a conceptual map framework to assess qualitative and quantitative changes to students' repertoires of knowledge about analytic geometry that emerge during the process of designing a webpage.
|
20 |
OPIS : um método para identificação e busca de páginas-objeto / OPIS : a method for object page identifying and searching. Colpo, Miriam Pizzatto January 2014 (has links)
Páginas-objeto são páginas que representam exatamente um objeto inerente do mundo real na web, considerando um domínio específico, e a busca por essas páginas é chamada de busca-objeto. Os motores de busca convencionais (do Inglês, General Search Engine - GSE) conseguem responder, de forma satisfatória, à maioria das consultas realizadas na web atualmente, porém, isso dificilmente ocorre no caso de buscas-objeto, uma vez que, em geral, a quantidade de páginas-objeto recuperadas é bastante limitada. Essa dissertação propõe um novo método para a identificação e a busca de páginas-objeto, denominado OPIS (acrônimo para Object Page Identifying and Searching). O cerne do OPIS está na adoção de técnicas de realimentação de relevância e aprendizagem de máquina na tarefa de classificação, baseada em conteúdo, de páginas-objeto. O OPIS não descarta o uso de GSEs e, ao invés disso, em sua etapa de busca, propõe a integração de um classificador a um GSE, adicionando uma etapa de filtragem ao processo de busca tradicional. Essa abordagem permite que somente páginas identificadas como páginas-objeto sejam recuperadas pelas consultas dos usuários, melhorando, assim, os resultados de buscas-objeto. Experimentos, considerando conjuntos de dados reais, mostram que o OPIS supera o baseline com ganho médio de 47% de precisão média. / Object pages are pages that represent exactly one inherent real-world object on the web, regarding a specific domain, and the search for these pages is called object search. General Search Engines (GSEs) can satisfactorily answer most of the searches performed on the web nowadays; however, this hardly occurs with object search, since, in general, the number of retrieved object pages is quite limited. This work proposes a method for both identifying and searching object pages, named OPIS (an acronym for Object Page Identifying and Searching). The core of OPIS is the adoption of relevance feedback and machine learning techniques in the task of content-based classification of object pages. OPIS does not discard the use of GSEs; instead, in its search step, it integrates a classifier with a GSE, adding a filtering step to the traditional search process. This simple approach allows only pages identified as object pages to be retrieved by user queries, improving the results of object search. Experiments with real datasets show that OPIS outperforms the baseline with an average gain of 47% in average precision.
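A minimal sketch of the filtering step OPIS adds on top of a general search engine, assuming an already-trained object-page classifier and a generic search client (both names below are placeholders, not the thesis's code; the relevance-feedback training of the classifier is not shown):

```python
# Minimal sketch of an OPIS-style search step: run a query through a general
# search engine, then keep only results the content-based classifier labels as
# object pages.
from typing import Callable, Iterable

def object_search(query: str,
                  search_engine: Callable[[str], Iterable[dict]],
                  classifier: Callable[[str], bool]) -> list[dict]:
    """Filter GSE results, returning only pages classified as object pages."""
    results = []
    for hit in search_engine(query):       # hit: {"url": ..., "content": ...}
        if classifier(hit["content"]):     # content-based object-page test
            results.append(hit)
    return results

# Toy stand-ins to show the flow end to end.
def fake_gse(query):
    return [
        {"url": "http://cars.example/fiat-punto", "content": "Fiat Punto 1.4, 2009, 85 hp"},
        {"url": "http://cars.example/news", "content": "Latest automotive industry news"},
    ]

looks_like_object_page = lambda text: any(ch.isdigit() for ch in text)  # crude heuristic
print(object_search("fiat punto", fake_gse, looks_like_object_page))
```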
|