131 |
Websites are capable of reflecting a particular human temperament: fact or fad? / Theron, Annatjie. January 2008 (has links)
Thesis (MIT(Informatics))--University of Pretoria, 2008. / Abstract in English and Afrikaans. Includes bibliographical references.
|
132 |
Integrated web-based system for the management and organization of structured electronic publications [Ολοκληρωμένο διαδικτυακό σύστημα διαχείρισης και οργάνωσης δομημένων ηλεκτρονικών δημοσιευμάτων] / Τσαρούχης, Αθανάσιος. 16 June 2011 (has links)
In recent years, the volume and variety of tasks carried out over the Web, and of the resources available on it, have grown steadily. Particular importance is attached to satisfying the needs of user communities, whose numbers are growing explosively. One of the most important kinds of resources serving the needs of Web communities is digital libraries. Digital libraries are collections of structured electronic publications that offer the user community they address specialized services concerning the content of the documents they contain, guaranteeing a defined level of quality according to codified policies.
As part of this thesis, a web-based tool was designed and implemented for the formal storage, presentation and management of the structured electronic publications of a digital library. In designing the services provided, the integration of natural language processing techniques was studied, with the aim of assessing the contribution and performance of such techniques in the storage and retrieval of structured electronic publications written in Modern Greek, a language with distinctive morphology. / Over the past years, both the number of tasks carried out on the web and the volume of web data have grown significantly. The major challenge in exploiting this data is satisfying users' needs as they carry out those tasks. One significant source of web data is scientific digital libraries, which contain structured collections of research documents and offer end users specialized search services that ensure a good level of retrieval quality.
In the course of the present study we designed and implemented a web-based data retrieval service that incorporates novel indexing and information-access modules specialized for Modern Greek. The novelty of our service is that it relies on advanced NLP components to improve retrieval performance. The experimental evaluation of our Modern Greek service shows that NLP significantly improves retrieval performance compared to baseline information retrieval systems.
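To make the role of morphological normalization concrete, the sketch below (not part of the thesis; a minimal Python illustration with an assumed toy stemmer in place of a real Greek lemmatizer) builds a tiny inverted index with a pluggable normalizer and shows that an inflected query form only matches related documents once terms are normalized:

```python
from collections import defaultdict

def toy_stem(token: str) -> str:
    """Toy normalizer that strips a few common Greek endings.
    A stand-in for a real Greek lemmatizer; illustration only."""
    for suffix in ("ων", "ου", "ης", "ες", "α", "ο", "η"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def build_index(docs, normalize=lambda t: t):
    """Map each (normalized) term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[normalize(token)].add(doc_id)
    return index

def search(index, query, normalize=lambda t: t):
    """Return documents containing every (normalized) query term."""
    hits = [index.get(normalize(t), set()) for t in query.lower().split()]
    return set.intersection(*hits) if hits else set()

docs = {
    "d1": "ψηφιακές βιβλιοθήκες",   # "digital libraries" (plural form)
    "d2": "ψηφιακή βιβλιοθήκη",     # "digital library" (singular form)
}
query = "βιβλιοθήκες"               # the query uses an inflected form

plain = build_index(docs)
normalized = build_index(docs, toy_stem)
print(search(plain, query))                 # {'d1'}: only the exact surface form matches
print(search(normalized, query, toy_stem))  # {'d1', 'd2'}: both match after normalization
```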
|
133 |
Search engine poisoning and its prevalence in modern search engines / Blaauw, Pieter. January 2013 (has links)
This thesis investigates the prevalence of Search Engine Poisoning in trending topics and popular search terms on the web. Search Engine Poisoning is the act of manipulating search engines so that they display search results pointing to websites infected with malware. Research conducted between February and August 2012, using both manual and automated techniques, shows how easily the criminal element manages to insert malicious content into web pages related to popular search terms. To give the reader a clear overview of the motives and methods of the operators of Search Engine Poisoning campaigns, an in-depth review of automated and semi-automated web exploit kits is presented, together with an examination of the motives for running these campaigns. Three high-profile case studies are examined, and the various Search Engine Poisoning campaigns associated with them are discussed in detail. From February to August 2012, data was collected from the top trending topics on Google's search engine, along with the top-listed sites related to these topics, and passed through various automated tools to discover whether these results had been infiltrated by the operators of Search Engine Poisoning campaigns; the results of these automated scans are discussed in detail. During the research period, manual searches for Search Engine Poisoning campaigns were also carried out, using high-profile news events and popular search terms. These results are analysed in detail to determine the methods of attack, the purpose of the attack and the parties behind it.
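As an illustration of the kind of automated check such a study relies on, the sketch below flags two crude poisoning signals in a page: hidden keyword-stuffed blocks and meta-refresh redirects to a foreign domain. The heuristics, the sample HTML and the use of the BeautifulSoup library are assumptions for illustration, not the tooling used in the thesis:

```python
from urllib.parse import urlparse
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

def poisoning_signals(html, page_domain, trending_terms):
    """Return crude warning signs of search engine poisoning in a page.
    The heuristics are illustrative only, not the thesis's detection tooling."""
    soup = BeautifulSoup(html, "html.parser")
    signals = []

    # Hidden blocks stuffed with trending keywords (a classic cloaking trick).
    for tag in soup.find_all(style=lambda s: s and "display:none" in s.replace(" ", "")):
        text = tag.get_text(" ", strip=True).lower()
        hits = sum(term.lower() in text for term in trending_terms)
        if hits >= 3:
            signals.append(f"hidden block stuffed with {hits} trending terms")

    # A meta-refresh redirect pointing at a different domain.
    meta = soup.find("meta", attrs={"http-equiv": lambda v: v and v.lower() == "refresh"})
    if meta and "url=" in meta.get("content", "").lower():
        target = meta["content"].lower().split("url=", 1)[1]
        if urlparse(target).netloc and page_domain not in urlparse(target).netloc:
            signals.append(f"meta refresh to foreign domain: {target}")
    return signals

sample = """<html><head>
<meta http-equiv="refresh" content="0;url=http://malicious.example/kit"></head>
<body><div style="display: none">olympics 2012 tickets free download serial keygen</div>
</body></html>"""
print(poisoning_signals(sample, "news.example.org",
                        ["olympics", "tickets", "free download", "keygen"]))
```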
|
134 |
Search engine strategies: a model to improve website visibility for SMME websites / Chambers, Rickard. January 2005 (has links)
Thesis submitted in fulfilment of the requirements for the degree Magister Technologiae in Information Technology, Faculty of Business Informatics, Cape Peninsula University of Technology, 2005. / The Internet has become the fastest-growing technology the world has ever seen. It also has the ability to permanently change the face of business, including e-business. The Internet has become an important tool for gaining potential competitiveness in the global information environment. Companies could improve their levels of functionality and customer satisfaction by adopting e-commerce, which could ultimately improve their long-term profitability.
Those companies that do adopt the Internet often fail to gain the advantage of providing a visible website. Research has also shown that even though the web provides numerous opportunities, the majority of SMMEs (small, medium and micro enterprises) are often ill-equipped to exploit the web's commercial potential. This research project determined, through the analysis of 300 websites, that only 6.3% of SMMEs in the Western Cape Province of South Africa appear within the top 30 results of six search engines when searching for services/products.
This inability to produce a visible website is believed to be due to the lack of education and training, financial support and available time prevalent in SMMEs. For this reason a model was developed to facilitate the improvement of SMME website visibility.
To develop the visibility model, this research project identified potential elements which could increase website visibility. A criteria list of these elements was used to evaluate a sample of websites, to determine to what extent they made use of these potential elements.
An evaluation was then conducted on 144 SMME websites by searching for nine individual keywords within four search engines (Google, MSN, Yahoo, Ananzi) and using the first four results for every keyword from every search engine for analysis (nine keywords × four engines × four results = 144 sites). Elements gathered from the academic literature were then ranked according to their usage in the top-ranking websites for the predetermined keywords. Further qualitative research was conducted to triangulate the data gathered from the literature and the quantitative research.
The evaluative results provided the researcher with possible elements and design techniques with which to formulate a model for developing a visible website that is supported not only by existing research but also by real, current applications. The research concluded that, as time progresses and technology improves, new ways to improve website visibility will evolve. Furthermore, there is no quick method for businesses to produce a visible website, as there are many aspects that should be considered when developing "visible" websites.
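The abstract does not reproduce the criteria list itself; the sketch below assumes a handful of generic on-page elements (title tag, meta description, headings, alt text, body text) and scores a page against them, using the BeautifulSoup library, purely to illustrate how such a checklist evaluation could be automated:

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

def visibility_checklist(html):
    """Check a page against a small, assumed list of on-page visibility
    elements; illustrative only, not the criteria list from the study."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    description = soup.find("meta", attrs={"name": "description"})
    images = soup.find_all("img")
    return {
        "has_title": bool(title and title.get_text(strip=True)),
        "has_meta_description": bool(description and description.get("content")),
        "has_h1_heading": soup.find("h1") is not None,
        "all_images_have_alt": bool(images) and all(img.get("alt") for img in images),
        "has_substantial_text": len(soup.get_text(" ", strip=True)) > 200,
    }

def visibility_score(html):
    """Fraction of checklist elements the page satisfies."""
    checks = visibility_checklist(html)
    return sum(checks.values()) / len(checks)

page = ("<html><head><title>Cape Town plumbing services</title></head>"
        "<body><h1>Plumbers in Cape Town</h1><p>" + "Reliable plumbing. " * 20 + "</p></body></html>")
print(visibility_checklist(page))
print(f"visibility score: {visibility_score(page):.2f}")
```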
|
135 |
Knowledge organization and representation: theoretical-methodological foundations for information search and retrieval in virtual environments [Organização e representação do conhecimento: fundamentos teórico-metodológicos na busca e recuperação da informação em ambientes virtuais] / Miranda, Marcos Luiz Cavalcanti de. 30 March 2005 (has links)
The present work approaches information retrieval in virtual environments, analysing search engine performance using natural language in the context of knowledge organization. The research takes as its starting point the theories and methodologies of knowledge organization for information retrieval found in the Library Science, Information Science and Documentation fields. Different forms and levels of information retrieval on the Web were characterized by analysing available search engines. The search engines were also analysed in order to identify their potential performance in information retrieval and to verify the information retrieval rate on the Web when natural language is used. The analyses performed on the selected search engines (Altavista, Google and Yahoo) considered as variables: documents, terms, occurrences, subject scope and ranking. The results obtained, interpreted on the basis of theoretical and methodological approaches to knowledge organization for information retrieval in current environments, revealed that this body of knowledge is a valid basis for supporting knowledge organization in virtual environments. The results also indicate that the information retrieval rate may be improved if knowledge organization for information retrieval on the web takes into account information processing based mainly on conceptual relations, approximating human cognitive structures. / This study addresses information retrieval in virtual environments by analysing the performance of search engines that use natural language in the context of knowledge organization. The research took as its starting point the theories and methodologies of Knowledge Organization for information retrieval found in Library Science, Information Science and Documentation. The different forms and levels of information retrieval on the Web were characterized by analysing some of the available search engines. The search engines were also analysed so as to identify their potential performance in information retrieval and to verify the information recoverability rate on the Web when natural language is used. The analyses of the search engines (Altavista, Google and Yahoo) considered variables such as: documents, terms, occurrences, subject coverage and ranking. The results obtained, interpreted against the theoretical-methodological body of knowledge of knowledge organization for information retrieval in current environments, revealed that this body of knowledge provides a valid foundation for supporting knowledge organization in virtual environments. The results also indicated that the information recoverability rate on the Web may increase if knowledge organization for information retrieval takes information processing into account, mainly on the basis of conceptual relations, in a manner similar to the human cognitive structure.
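A minimal sketch of how such engine comparisons can be quantified is given below: top-k overlap between two engines' result lists and a simple recoverability rate against a set of known-relevant documents. The result lists and relevance judgements are placeholders, not data from the study:

```python
def top_k_overlap(results_a, results_b, k=10):
    """Jaccard overlap of two engines' top-k result URLs for the same query."""
    a, b = set(results_a[:k]), set(results_b[:k])
    return len(a & b) / len(a | b) if (a | b) else 0.0

def recoverability(results, relevant, k=10):
    """Share of known-relevant documents an engine returns within its top k
    (one simple reading of an information recoverability rate)."""
    return len(set(results[:k]) & relevant) / len(relevant) if relevant else 0.0

# Placeholder result lists for one natural-language query; not data from the study.
engine_a = ["u1", "u2", "u3", "u4", "u5"]
engine_b = ["u2", "u6", "u1", "u7", "u8"]
relevant_docs = {"u1", "u2", "u9"}

print(f"top-5 overlap: {top_k_overlap(engine_a, engine_b, k=5):.2f}")                    # 0.25
print(f"recoverability, engine A: {recoverability(engine_a, relevant_docs, k=5):.2f}")   # 0.67
print(f"recoverability, engine B: {recoverability(engine_b, relevant_docs, k=5):.2f}")   # 0.67
```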
|
136 |
Classification into Readability Levels: Implementation and Evaluation / Larsson, Patrik. January 2006 (has links)
A readability classification model is mainly intended for use as an integrated part of an information retrieval system. By matching the user's readability requirements to documents with the corresponding readability, the classification model can further improve the results of, for example, a search engine. This thesis presents a new solution for classification into readability levels for Swedish. The results of the thesis are a number of classification models. The models were induced by training a Support Vector Machines classifier on features that previous research has established as good measurements of readability. The features were extracted from a corpus annotated with three readability levels. Natural Language Processing tools for tagging and parsing were used to analyse the corpus and enable the extraction of the features. Empirical testing of different feature combinations was performed to optimize the classification model. The classification models deliver good and stable classification. The best model obtained a precision of 90.21% and a recall of 89.56% on the test set, which corresponds to an F-score of 89.88. / The thesis describes the development of a model for classifying Swedish texts according to their readability. The main area of application for a readability classification model is within information retrieval systems. The model can increase the precision of the documents that a search engine considers relevant by matching the user's readability requirements with the readability of the indexed documents. The result of the thesis is a number of models for classifying text by readability. The models were produced by training a Support Vector Machines classifier on a number of features that previous research has established as good measures of readability. The features were extracted from a corpus annotated with three readability levels. Language technology tools for tagging and parsing were used to make the extraction of the features possible. The features were evaluated empirically in different feature combinations in order to optimize the models. The models were tested and evaluated with good results. The best model had a precision of 90.21 and a recall of 89.56, which gives an F-score of 89.88. The thesis presents suggestions for further development as well as potential areas of application.
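A compressed sketch of the pipeline described (readability features, a Support Vector Machines classifier, precision/recall/F evaluation) is shown below using scikit-learn. The corpus and the surface features are simplified assumptions; the thesis extracted richer features with tagging and parsing tools. The final lines also confirm the reported arithmetic: with precision 90.21 and recall 89.56, F = 2PR/(P+R) ≈ 89.88:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_recall_fscore_support

def surface_features(text):
    """Crude surface readability cues: average sentence length, average word
    length and share of long words. The thesis used richer features derived
    from tagging and parsing; these are simplified stand-ins."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    words = text.split()
    return [
        len(words) / max(len(sentences), 1),
        sum(len(w) for w in words) / max(len(words), 1),
        sum(len(w) > 6 for w in words) / max(len(words), 1),
    ]

# Made-up miniature corpus with three readability levels (0 = easy, 2 = hard).
texts = [
    "Jag ser en katt. Den är glad.",
    "Vi går ut. Solen skiner idag.",
    "Myndigheten fattade beslutet efter en längre utredning av frågan.",
    "Styrelsen diskuterade verksamhetens ekonomiska förutsättningar i detalj.",
    "Implementeringen förutsätter omfattande organisatoriska förändringsprocesser inom förvaltningsstrukturen.",
    "Konsekvensanalysen problematiserar interdependensen mellan regelverkets tillämpningsområden.",
]
labels = [0, 0, 1, 1, 2, 2]

X = np.array([surface_features(t) for t in texts])
model = LinearSVC(C=1.0).fit(X, labels)
pred = model.predict(X)
precision, recall, f1, _ = precision_recall_fscore_support(labels, pred, average="macro", zero_division=0)
print(f"precision={precision:.2f} recall={recall:.2f} F={f1:.2f}")

# The reported best model: precision 90.21, recall 89.56 -> F = 2PR / (P + R) ≈ 89.88.
p, r = 90.21, 89.56
print(f"reported F-score: {2 * p * r / (p + r):.2f}")
```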
|
137 |
[en] NCE: AN ALGORITHM FOR CONTENT EXTRACTION IN NEWS PAGES / [pt] NCE: UM ALGORITMO PARA EXTRAÇÃO DE CONTEÚDO DE PÁGINAS DE NOTÍCIAS / EVELIN CARVALHO FREIRE DE AMORIM. 15 September 2017 (has links)
[pt] Entity extraction from web pages is commonly used to improve the quality of many tasks performed by search engines, such as duplicate-page detection and ranking. The task becomes even more relevant given the growing volume of information on the internet that search engines must handle. Several content-detection algorithms exist in the literature, some site-oriented and others that take a more local approach and are called page-oriented algorithms. Site-oriented algorithms use several pages from the same site to build a model that detects the relevant content of a page. Page-oriented algorithms detect content by evaluating the characteristics of each page, without comparing it to other pages. In this work we present a page-oriented algorithm, called NCE (News Content Extractor), whose purpose is to perform entity extraction on news pages. It uses attributes of a DOM tree to locate certain entities of a news page, more specifically the title and the body of the story. Some metrics are presented and used to assess the quality of NCE. When compared with another page-based method that uses visual attributes, NCE proved superior both in extraction quality and in execution time. / [en] Entity extraction from web pages is commonly used to enhance the quality of tasks performed by search engines, such as duplicate-page detection and ranking. The relevance of entity extraction is growing because search engines have to deal with a fast-growing volume of information on the web. There are many entity-detection algorithms in the literature, some using a site-level strategy and others using a page-level strategy. The site-level strategy uses many pages from the same site to create a model that extracts templates. The page-level strategy creates a model that extracts templates according to the features of each page. Here we present an algorithm, called NCE (News Content Extractor), that uses a page-level strategy and whose objective is to perform entity extraction on news pages. It uses features from a DOM tree to search for certain entities, namely the news title and news body. Some measures are presented and used to evaluate how good NCE is. When we compare NCE to a page-level algorithm that uses visual features, NCE shows better execution time and extraction quality.
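The sketch below is not the NCE algorithm itself, which scores DOM-tree attributes; it is a simpler page-level heuristic in the same spirit, assuming the BeautifulSoup library: take the first <h1> (or <title>) as the story title and the container whose paragraphs hold the most text as the body:

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

def extract_news(html):
    """Heuristic title/body extraction from a news page via the DOM.
    NCE itself scores DOM-tree attributes; these fixed rules only
    illustrate the general page-level idea."""
    soup = BeautifulSoup(html, "html.parser")

    # Title: prefer the first <h1>, fall back to the <title> element.
    heading = soup.find("h1") or soup.find("title")
    title = heading.get_text(strip=True) if heading else ""

    # Body: the container whose direct <p> children hold the most text.
    best_text, best_len = "", 0
    for node in soup.find_all(["article", "div", "section"]):
        paragraphs = node.find_all("p", recursive=False)
        text = " ".join(p.get_text(" ", strip=True) for p in paragraphs)
        if len(text) > best_len:
            best_text, best_len = text, len(text)
    return {"title": title, "body": best_text}

page = """<html><head><title>Site | Storm hits the coast</title></head><body>
<div id="nav"><p>Home</p><p>Sports</p></div>
<article><h1>Storm hits the coast</h1>
<p>A severe storm reached the coastline early on Monday.</p>
<p>Authorities asked residents to stay indoors until further notice.</p></article>
</body></html>"""
print(extract_news(page))
```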
|
138 |
An exploratory study of techniques in passive network telescope data analysis / Cowie, Bradley. January 2013 (has links)
Careful examination of the composition and concentration of malicious traffic in transit on the channels of the Internet provides network administrators with a means of understanding and predicting damaging attacks directed towards their networks. This allows action to be taken to mitigate the effect that these attacks have on the performance of their networks, and on the Internet as a whole, by readying network defences and providing early warning to Internet users. One approach to malicious traffic monitoring that has garnered some success in recent times, as exhibited by the study of fast-spreading Internet worms, involves analysing data obtained from network telescopes. While some research has considered using measures derived from network telescope datasets to study large-scale network incidents such as Code-Red, SQL Slammer and Conficker, there is very little documented discussion of the merits and weaknesses of approaches to analysing network telescope data. This thesis is an introductory study in network telescope analysis and aims to consider the variables associated with the data received by network telescopes and how these variables may be analysed. The core research of this thesis considers both novel and previously explored analysis techniques from the fields of security metrics, baseline analysis, statistical analysis and technical analysis, as applied to network telescope datasets. These techniques were evaluated as approaches to recognising unusual behaviour by observing their ability to identify notable incidents in network telescope datasets.
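As a small illustration of the baseline-analysis idea mentioned above, the sketch below flags hours whose packet counts deviate far from a historical mean. The per-hour counts are synthetic stand-ins for a real telescope capture:

```python
import random
from statistics import mean, stdev

random.seed(7)

# Synthetic packets-per-hour counts for one destination port on a telescope;
# stands in for a real darknet capture.
baseline_hours = [random.randint(80, 120) for _ in range(24 * 14)]                     # two quiet weeks
observed_hours = [random.randint(80, 120) for _ in range(20)] + [950, 1020, 870, 990]  # worm-like surge

def flag_anomalies(history, observed, n_sigma=3.0):
    """Indices of observed hours whose packet counts exceed the historical
    mean by more than n_sigma standard deviations (a simple baseline test)."""
    mu, sigma = mean(history), stdev(history)
    threshold = mu + n_sigma * sigma
    return [i for i, count in enumerate(observed) if count > threshold]

print("anomalous hours:", flag_anomalies(baseline_hours, observed_hours))  # the final four hours
```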
|
139 |
A multi-agent collaborative personalized web mining system model / Oosthuizen, Ockmer Louren. 02 June 2008 (has links)
The Internet and World Wide Web (WWW) have in recent years grown exponentially, both in size and in the volume of information available on them. In order to deal effectively with the huge amount of information on the web, so-called web search engines have been developed to retrieve useful and relevant information for their users. Unfortunately, these web search engines have not kept pace with the explosive growth and commercialization of the web. The main goal of this dissertation is the development of a model for a collaborative personalized meta-search agent (COPEMSA) system for the WWW. This model enables the personalization of web search for users. Furthermore, the model aims to leverage current search engines on the web and to enable collaboration between users of the search system for the purpose of sharing useful resources. The model also employs multiple intelligent agents and web content mining techniques. This enables the model to autonomously retrieve useful information for its user(s) and present this information in an effective manner. To achieve the above, the COPEMSA model employs multiple intelligent agents. COPEMSA consists of five core components: a user agent, a query agent, a community agent, a content mining agent and a directed web spider. The user agent learns about the user in order to introduce personal preferences into user queries. The query agent is a scaled-down meta-search engine tasked with submitting the personalized queries it receives from the user agent to multiple search services on the WWW. The community agent enables the search system to communicate with, and leverage the search experiences of, a community of searchers. The content mining agent is responsible for analysis of the results retrieved from the WWW and for the presentation of these results to the system user. Finally, a directed web spider is used by the content mining agent to retrieve from the WWW the actual web pages it analyses. In this dissertation an additional model is also presented to deal with a specific problem all web spidering software must deal with, namely content and link encapsulation. / Prof. E.M. Ehlers
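A condensed sketch of the user-agent/query-agent interplay is given below: the query is biased with profile terms, fanned out to several search services, and the merged results are re-ranked by rank-weighted votes. The two engines are hypothetical stubs, since the real model submits queries to live services on the WWW:

```python
from collections import Counter

# Stub search services standing in for the live engines a real deployment would query.
def engine_a(query):
    topic = query.split()[0]
    return [f"http://docs.example/{topic}", "http://a.example/1", "http://a.example/2"]

def engine_b(query):
    topic = query.split()[0]
    return ["http://b.example/1", f"http://docs.example/{topic}", "http://b.example/2"]

def personalize(query, profile_terms):
    """User agent: bias the raw query with learned preference terms."""
    return " ".join([query] + profile_terms)

def meta_search(query, engines):
    """Query agent: fan the personalized query out to every engine, then rank
    the merged results by rank-weighted votes across engines."""
    votes = Counter()
    for engine in engines:
        for rank, url in enumerate(engine(query)):
            votes[url] += 1.0 / (rank + 1)   # earlier hits weigh more
    return [url for url, _ in votes.most_common()]

personalized = personalize("web mining", ["personalization", "agents"])
print(meta_search(personalized, [engine_a, engine_b]))
# The result both engines agree on (docs.example/web) surfaces first.
```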
|
140 |
The news agenda and journalistic practice in the context of keyword-based search and indexing tools: perspectives on newsworthiness inside and outside the immersive digital environment [A pauta e o fazer jornalístico no contexto dos dispositivos de busca e indexação baseados em palavras-chave: perspectivas de noticiabilidade dentro e fora do ambiente imersivo digital] / Antunes, Mariana do Amaral. 21 May 2014 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This research investigates the influence of search engines and other indexing technologies of telematic networks on the production and selection of content within the immersive digital environment, more precisely on the news values that cause a given topic to become news on the internet and, later, on TV. To this end, it empirically analyses the words and key terms (keywords) most searched on Google and most commented on Twitter against news items collected from the G1 portal and from Jornal Nacional, seeking to reflect on the possible reasons why the topics commonly searched and discussed on these platforms do or do not serve as story assignments for online and traditional outlets, according to their relevance and their ludic/social/political/ideological appeal. In this way, the study explores the impact of indexing-based digital technologies on journalism, in the face of a medium that is in constant evolution. / This research set out to investigate the influence of search engines and other indexing technologies of such networks on the production and selection of digital content within the immersive environment, more specifically the news values that make a certain issue become news on the internet and later on TV. It analyses, empirically, the words and key terms (keywords) most searched on Google and most commented on Twitter, together with news collected from the G1 news portal and Jornal Nacional (Globo TV), in order to reflect on the possible causes that lead the issues commonly sought and discussed on these platforms to serve, or not, as story assignments for online and traditional outlets, according to their relevance and recreational/social/political/ideological appeal. The study explores the impact of indexing-based digital technologies on journalism, facing a medium that is constantly evolving.
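A small sketch of the kind of comparison described, matching trending terms against collected headlines by accent-insensitive word overlap, is given below; the terms and headlines are made up, not taken from the study's Google/Twitter/G1/Jornal Nacional data:

```python
import unicodedata

def tokens(text):
    """Lower-case, strip accents and split into a set of words."""
    stripped = "".join(c for c in unicodedata.normalize("NFD", text.lower())
                       if unicodedata.category(c) != "Mn")
    return set(stripped.split())

def coverage(trending_terms, headlines):
    """Map each trending term to the headlines that contain all of its words."""
    return {term: [h for h in headlines if tokens(term) <= tokens(h)]
            for term in trending_terms}

# Made-up sample data; the study used terms from Google and Twitter and news
# items from the G1 portal and Jornal Nacional.
trending = ["copa do mundo", "eleições", "novo smartphone"]
headlines = [
    "Seleção se prepara para a Copa do Mundo",
    "Candidatos debatem propostas para as eleições",
    "Chuvas causam alagamentos na capital",
]
for term, matched in coverage(trending, headlines).items():
    print(f"{term!r} covered by {len(matched)} headline(s)")
```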
|