41

Self-adapting parallel metric-space search engine for variable query loads

Al Ruqeishi, Khalil January 2016 (has links)
This research focuses on automatically adapting a search engine's size in response to fluctuations in query workload. Deploying a search engine in an Infrastructure as a Service (IaaS) cloud facilitates allocating or deallocating computer resources to or from the engine. Our solution is to contribute an adaptive search engine that will repeatedly re-evaluate its load and, when appropriate, switch over to a different number of active processors. We focus on three aspects and break them out into three sub-problems: Continually determining the Number of Processors (CNP), the New Grouping Problem (NGP) and the Regrouping Order Problem (ROP). CNP is the problem of determining, in the light of changes in the query workload, the ideal number of processors p to keep active at any given time. NGP arises once a change in the number of processors has been decided: it must then be determined how the groups of search data are distributed across the processors. ROP is the problem of redistributing this data onto processors while keeping the engine responsive and minimising both the switchover time and the incurred network load. We propose solutions for these sub-problems. For NGP we propose an algorithm for incrementally adjusting the index to fit the varying number of virtual machines. For ROP we present an efficient method for redistributing data among processors while keeping the search engine responsive. For CNP we propose an algorithm that determines the new size of the search engine by re-evaluating its load. We tested the solution's performance using a custom-built prototype search engine deployed in the Amazon EC2 cloud. Our experiments show that, compared with computing the index from scratch, the incremental algorithm speeds up index computation 2–10 times while maintaining similar search performance. The chosen redistribution method is 25% to 50% faster than other methods and reduces the network load by around 30%. For CNP we present a deterministic algorithm that shows a good ability to determine the new size of the search engine. Combined, these algorithms yield an adaptive algorithm able to adjust the search engine size under a variable workload.
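The abstract does not give the CNP algorithm's details, but a minimal sketch of the kind of controller it describes might look as follows, assuming an invented per-VM query capacity and a hysteresis band to avoid resizing on every small fluctuation:

```python
import math

def target_processors(query_rate: float,
                      capacity_per_proc: float,
                      current: int,
                      hysteresis: float = 0.2) -> int:
    """Processor count for the observed query rate; resize only when the
    required count moves outside a hysteresis band around the current size,
    so small load fluctuations don't trigger a costly switchover."""
    needed = max(1, math.ceil(query_rate / capacity_per_proc))
    if abs(needed - current) / max(current, 1) > hysteresis:
        return needed
    return current

# Example: load jumps to 1200 queries/s, each VM handles ~100 queries/s.
print(target_processors(1200, 100, current=8))  # -> 12: grow the engine
```

Once the new size is chosen, the NGP and ROP steps would decide how the index groups map onto the new set of machines and in what order the data moves.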
42

On the derivation of value from geospatial linked data

Black, Jennifer January 2013 (has links)
Linked Data (LD) is a set of best practices for publishing and connecting structured data on the web. LD and Linked Open Data (LOD) are often conflated, to the point where there is an expectation that LD will be free and unrestricted. The current research looks at deriving commercial value from LD. When both free and paid-for data are available, the issue arises of how users will react to a situation where two or more options are provided. The current research examines the factors that affect the choices users make; prototypes were subsequently created for users to interact with, in order to understand how consumers reacted to each of the different options. Our examination of commercial providers of LD uses Ordnance Survey (OS) (the UK national mapping agency) as a case study, studying their requirements for and experiences of publishing LD; we further extrapolate from this by comparing the OS to other potential commercial publishers of LD. Our research looks at the business case for LD and introduces the concepts of LOD and Linked Closed Data (LCD). We also determine that there are two types of LD users (non-commercial and commercial) and, correspondingly, two types of LD use: LD as a raw commodity and LD as an application. Our experiments aim to identify the issues users encounter when LD is accessed via an application. Our first investigation brought together technical users and users of Geographic Information (GI). With the idea of LOD and LCD, we asked users what factors would affect their view of data quality. We found three different types of buying behaviour on the web. We also found that context actively affected the user's decision: users were willing to pay when the data informed a professional decision, but not for leisure use. To observe the behaviour of consumers using data online, we built a working prototype of an LD application that enabled potential users to experience the data and give us feedback about how they would behave in an LD environment. This was then extended into a second LD application to find out whether the same principles held true when actual money was involved and users had to make a conscious decision about payment. With this in mind we proposed a potential architecture for the consumption of LD on the web. We identified potential issues surrounding quality factors that affect a consumer's willingness to pay for data. This supported our hypothesis that context affects a consumer's willingness to pay and that willingness to pay is related to a requirement to reduce search times. We also found that a consumer's perception of value and the criticality of purpose affected their willingness to pay. Finally, we outlined an architecture enabling users to consume LD in different scenarios that may carry payment restrictions. This work is our contribution to the business case for LD on the web and is a starting point.
43

A study of Digital Library Systems and their interconnection with Content Management Systems. Development of a prototype application

Παπαδημητρόπουλος, Πέτρος 21 November 2007 (has links)
Digitisation of content is by now a well-established process worldwide, aimed at preserving and promoting cultural heritage. Nevertheless, the management of digitised content and its accompanying metadata has not matured to the point where a single, commonly accepted standard applies. This fact, combined with the continually growing volume of data, makes systems for organising and managing it a pressing need. Digital Libraries are software systems that store and organise information so as to provide advanced search and browsing services to their users; they constitute a rapidly growing field of research and development that draws on several scientific disciplines. In this dissertation we study open-source Digital Library systems and the standards proposed for the interoperability of their content. We also note the weaknesses of current Digital Library systems in meeting the needs of a cultural heritage organisation and, selecting the standards most actively used by the international community, we build a system that interconnects Digital Libraries with Content Management systems. The interconnection is achieved using XML and is applied to two representative systems, one from each category.
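The abstract specifies only that the interconnection is achieved using XML. As an illustration of what such a bridge could look like, the sketch below maps a Dublin Core-style record, of the kind a Digital Library might export, into a simplified structure a CMS might import; the element names on the CMS side are invented for the example.

```python
# Hypothetical illustration of an XML-based DL-to-CMS bridge. The CMS field
# names ("headline", "author", "published") are assumptions, not the
# dissertation's actual schemas.
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

record = ET.fromstring(
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '  <dc:title>Byzantine manuscripts collection</dc:title>'
    '  <dc:creator>University of Patras</dc:creator>'
    '  <dc:date>2007</dc:date>'
    '</record>'
)

cms_item = ET.Element("item")  # hypothetical CMS import format
for dc_field, cms_field in [("title", "headline"),
                            ("creator", "author"),
                            ("date", "published")]:
    src = record.find(f"{DC}{dc_field}")
    if src is not None:
        ET.SubElement(cms_item, cms_field).text = src.text

print(ET.tostring(cms_item, encoding="unicode"))
```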
44

The theory of extended topic and its application in information retrieval

Yin, Ling January 2012 (has links)
This thesis analyses the structure of natural language queries to document repositories, with the aim of finding better methods for information retrieval. The exponential increase of information on the Web and in other large document repositories during recent decades motivates research on facilitating the process of finding relevant information to meet end users' information needs. A shared problem among several related research areas, such as information retrieval, text summarisation and question answering, is to derive concise textual expressions that describe what a document is about, to function as the bridge between queries and the document content. In current approaches, such textual expressions are typically generated from shallow features, for example by simply selecting a few of the most frequently occurring keywords. However, such approaches are inadequate to generate expressions that truly resemble user queries. The study of what a document is about is closely related to the widely discussed notion of topic, which is defined in many different ways in theoretical linguistics as well as in practical natural language processing research. We compare these different definitions and analyse how they differ from user queries. The main function of a query is that it defines which facts are relevant in some underlying knowledge base. We show that, to serve this purpose, queries are typically formulated by first (a) specifying a focused entity and then (b) defining a perspective from which the entity is approached. For example, in the query 'history of Britain', 'Britain' is the focused entity and 'history' is the perspective. Existing theories of topic often focus on (a) and leave out (b). We develop a theory of extended topic to formalise this distinction. We demonstrate the distinction in experiments with real-life topic expressions, such as WH-questions and phrases describing the plans of academic papers. The theory of extended topic could be applied in various areas, including knowledge organisation and title generation. We focus on applying the theory to the problem of information retrieval from a document repository. Typical current information retrieval systems retrieve documents relevant to a query by counting keyword matches between a document and the query. This approach is better suited to retrieving the focused entities than the perspectives. We aim to improve the performance of information retrieval by providing better support for perspectives. To do so, we further subdivide the perspectives into different types and present different approaches to addressing each type. We illustrate our approaches with three example perspectives: 'cause', 'procedure' and 'biography'. Experiments on retrieving causal, procedural and biographical questions achieve better results than the traditional keyword-matching-based approach.
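A toy sketch of the entity/perspective split the abstract describes might look as follows; the cue-word list and the "X of Y" pattern are illustrative assumptions, not the thesis's actual algorithm:

```python
# Separate a query into (focused entity, perspective), per the extended-topic
# idea. PERSPECTIVE_CUES is an invented list for demonstration purposes.
PERSPECTIVE_CUES = {"history", "cause", "causes", "procedure", "biography",
                    "structure", "effects"}

def split_extended_topic(query: str):
    """Return (entity, perspective) for queries shaped like 'history of Britain'."""
    words = query.lower().split()
    if len(words) >= 3 and words[0] in PERSPECTIVE_CUES and words[1] == "of":
        return " ".join(words[2:]), words[0]
    return query, None  # no recognisable perspective: treat whole query as entity

print(split_extended_topic("history of Britain"))    # ('britain', 'history')
print(split_extended_topic("causes of World War I")) # ('world war i', 'causes')
```

A retrieval system built on this split could then match the entity against document content and route the perspective to a type-specific ranking strategy, as the abstract's 'cause', 'procedure' and 'biography' examples suggest.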
45

A semi-automated FAQ retrieval system for HIV/AIDS

Thuma, Edwin January 2015 (has links)
This thesis describes a semi-automated FAQ retrieval system that can be queried by users through short text messages on low-end mobile phones to provide answers to HIV/AIDS-related queries. First, we address the issue of result presentation on low-end mobile phones by proposing an iterative interaction retrieval strategy in which the user engages with the FAQ retrieval system in the question answering process. At each iteration, the system returns only one question-answer pair to the user, and the iterative process terminates once the user's information need has been satisfied. Since the proposed system is iterative, this thesis attempts to reduce the number of iterations (search length) between the users and the system so that users do not abandon the search process before their information need has been satisfied. We conducted a user study to determine the number of iterations that users are willing to tolerate before abandoning the iterative search process, and subsequently used the bad abandonment statistics from this study to develop an evaluation measure for estimating the probability that any random user will be satisfied when using our FAQ retrieval system. In addition, we used a query log and its click-through data to address three main FAQ document collection deficiency problems, in order to improve the retrieval performance and the probability that any random user will be satisfied. Conclusions are derived concerning whether we can reduce the rate at which users abandon their search before their information need has been satisfied by using information from previous searches to: address the term mismatch problem between the users' SMS queries and the relevant FAQ documents in the collection; selectively rank the FAQ documents according to how often they have previously been identified as relevant by users for a particular query term; and identify those queries that do not have a relevant FAQ document in the collection. In particular, we propose a novel template-based approach that uses queries from a query log for which the true relevant FAQ documents are known to enrich the FAQ documents with additional terms, in order to alleviate the term mismatch problem. These terms are added as a separate field in a field-based model using two proposed enrichment strategies, namely the Term Frequency and the Term Occurrence strategies. This thesis thoroughly investigates the effectiveness of these FAQ document enrichment strategies using three different field-based models. Our findings suggest that we can improve the overall recall and the probability that any random user will be satisfied by enriching the FAQ documents with additional terms from queries in our query log. Moreover, our investigation suggests that it is important to use an FAQ document enrichment strategy that takes into consideration the number of times a term occurs in the query. We subsequently show that our proposed enrichment approach for alleviating the term mismatch problem generalises well to other datasets. Through the evaluation of our proposed approach for selectively ranking the FAQ documents, we show that we can improve the retrieval performance and the probability that any random user will be satisfied by incorporating the click popularity score of a query term t on an FAQ document d into the scoring and ranking process. Our results generalised well to a new dataset.
However, when we deployed the click popularity score of a query term t on an FAQ document d on an enriched FAQ document collection, we saw a decrease in the retrieval performance and in the probability that any random user will be satisfied. Furthermore, we used our query log to build a binary classifier for detecting those queries that do not have a relevant FAQ document in the collection (Missing Content Queries, MCQs). Before building this classifier, we empirically evaluated several feature sets to determine the combination of features that yields the best classification accuracy in separating MCQs from non-MCQs. Using a different dataset, we show that we can improve the overall retrieval performance and the probability that any random user will be satisfied by deploying an MCQ detection subsystem in our FAQ retrieval system to filter out the MCQs. Finally, this thesis demonstrates that correcting spelling errors can help improve the retrieval performance and the probability that any random user will be satisfied. We tested our FAQ retrieval system with two different test sets, one containing the original SMS queries and the other containing the SMS queries manually corrected for spelling errors. Our results show a significant improvement in the retrieval performance and in the probability that any random user will be satisfied when using our FAQ retrieval system.
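The abstract names the two enrichment strategies without spelling them out; the sketch below shows one plausible reading, with invented data structures: for each query whose relevant FAQ document is known from click-through data, the query's terms are added to a separate "enriched" field of that document.

```python
# Term Frequency counts every occurrence of a term in a relevant query;
# Term Occurrence counts a term at most once per query. This is my reading
# of the strategy names, not code from the thesis.
from collections import Counter

def enrich(faq_fields: dict, relevant_queries: list[str], strategy: str) -> None:
    field = faq_fields.setdefault("enriched", Counter())
    for query in relevant_queries:
        terms = query.lower().split()
        if strategy == "term_frequency":
            field.update(terms)       # every occurrence counts
        elif strategy == "term_occurrence":
            field.update(set(terms))  # at most once per query

doc = {"question": "What is HIV?", "answer": "..."}
enrich(doc, ["what is hiv", "hiv hiv meaning"], strategy="term_frequency")
print(doc["enriched"])  # Counter({'hiv': 3, 'what': 1, 'is': 1, 'meaning': 1})
```

A field-based ranking model would then weight matches in this enriched field separately from matches in the original question and answer fields.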
46

Implicit feedback for interactive information retrieval

White, Ryen William January 2004 (has links)
Searchers can find the construction of query statements for submission to Information Retrieval (IR) systems a problematic activity. These problems are compounded by uncertainty about the information they are searching for, or by unfamiliarity with the retrieval system being used or the collection being searched. On the World Wide Web these problems are potentially more acute, as searchers receive little or no training in how to search effectively. Relevance feedback (RF) techniques allow searchers to directly communicate what information is relevant and help them construct improved query statements. However, the techniques require explicit relevance assessments that intrude on searchers' primary lines of activity, and as such searchers may be unwilling to provide this feedback. Implicit feedback systems are unobtrusive and infer what is relevant from searcher interaction. They gather information to better represent searcher needs whilst minimising the burden of explicitly reformulating queries or directly providing relevance information. In this thesis I investigate implicit feedback techniques for interactive information retrieval. The techniques proposed aim to increase the quality and quantity of searcher interaction and to use this interaction to infer searcher interests. I develop search interfaces that use representations of the top-ranked retrieved documents, such as sentences and summaries, to encourage a deeper examination of search results and drive the information seeking process. Implicit feedback frameworks based on heuristic and probabilistic approaches are described. These frameworks use interaction to identify needs and estimate changes in these needs during a search. The evidence gathered is used to modify search queries and make new search decisions, such as re-searching the document collection or restructuring already retrieved information. The term selection models from the frameworks and elsewhere are evaluated using a simulation-based evaluation methodology that allows different search scenarios to be modelled. Findings show that the probabilistic term selection model generated the most effective search queries and learned what was relevant in the shortest time. Different versions of an interface that implements the probabilistic framework are evaluated to test it with human subjects and to investigate how much control they want over its decisions. The experiment involved 48 subjects with different skill levels and search experience. The results show that searchers are happy to delegate responsibility to RF systems for relevance assessment (through implicit feedback), but not for more severe search decisions such as formulating queries or selecting retrieval strategies. Systems that help searchers make these decisions are preferred to those that act directly on their behalf or await searcher action.
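A minimal sketch of the implicit-feedback loop the abstract outlines (my construction, not the thesis's exact model): terms from the result representations a searcher chooses to examine accumulate evidence, and the best-scoring terms expand the next query.

```python
from collections import Counter

class ImplicitFeedback:
    def __init__(self):
        self.views = Counter()  # term -> times seen in examined summaries
        self.shown = Counter()  # term -> times shown in any summary

    def record(self, summary_terms: list[str], examined: bool) -> None:
        self.shown.update(summary_terms)
        if examined:
            self.views.update(summary_terms)

    def expansion_terms(self, k: int = 3) -> list[str]:
        # Score each term by the fraction of its appearances that were
        # examined -- a crude stand-in for a probabilistic relevance estimate.
        scores = {t: self.views[t] / self.shown[t] for t in self.views}
        return sorted(scores, key=scores.get, reverse=True)[:k]

fb = ImplicitFeedback()
fb.record(["implicit", "feedback", "search"], examined=True)
fb.record(["search", "advertising"], examined=False)
print(fb.expansion_terms())  # 'implicit' and 'feedback' rank above 'search'
```

The thesis's probabilistic framework goes further, estimating how the underlying need changes during the search; this sketch only shows the unobtrusive evidence-gathering idea.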
47

The Semantic Web

Κολλάρας, Νίκος 11 September 2008 (has links)
The World Wide Web has changed the way people communicate with one another, as well as the way businesses are run. The Semantic Web has emerged as an extension of the World Wide Web: specifically, a web of information that can be read by machines and whose meaning is well defined by standards. The aim of this thesis is to extend and deepen fundamental concepts of the Semantic Web and to connect it with ICT in education. The following chapters examine the notion of the Semantic Web in detail, along with its basic markup languages: 1. The markup language XML, which allows the writing of structured web documents with a user-defined vocabulary. 2. The Resource Description Framework (RDF), a framework for representing information on the World Wide Web, whose basic concepts we analyse, such as the graph data model, the URI-based vocabulary, datatypes, etc. 3. The RDF vocabulary definition language (RDF Schema), which defines the classes and properties that can be used to describe classes, properties and other resources. In a subsequent chapter we introduce the reader to the term ontology. We analyse what an ontology is and the main functions of ontologies, and an outline for constructing ontologies is proposed. Separate mention is made of the Ontology Inference Layer (OIL), a proposal for a web-based representation and inference layer for ontologies; we also describe OIL's basic design idea, its onion model, and then analyse how an ontology is written in the OIL language. Building on the above, DAML+OIL is presented, a semantic language for web resources. It builds on W3C standards such as RDF and RDF Schema and extends these languages with richer modelling primitives. We then examine at length the Web Ontology Language (OWL), a semantic markup language for publishing and sharing ontologies on the World Wide Web. The connection of the Semantic Web with education is then discussed under the title Educational Semantic Web, with extensive reference to the emerging challenges and prospects; this connection is made through adaptive WBES. We also examine several application scenarios for the educational Semantic Web, such as semantics-based search for educational content, knowledge browsing or "personal portals", semantics-based courses, and educational Semantic Web services. Finally, we briefly mention ontology-based educational applications such as the CIPHER project, the Connexions project, and others.
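As a concrete taste of the RDF and RDF Schema constructs the thesis surveys, here is a small illustrative sketch using the Python rdflib library (a tooling choice of this example, not of the thesis); the names in the ex: namespace are invented.

```python
# A tiny RDF graph with an RDFS class hierarchy. Requires the rdflib package.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/edu#")
g = Graph()
g.bind("ex", EX)

# RDF Schema: 'Course' is a class, 'SemanticWebCourse' a subclass of it.
g.add((EX.Course, RDF.type, RDFS.Class))
g.add((EX.SemanticWebCourse, RDFS.subClassOf, EX.Course))

# An RDF instance with a literal property.
g.add((EX.sw101, RDF.type, EX.SemanticWebCourse))
g.add((EX.sw101, RDFS.label, Literal("Introduction to the Semantic Web")))

print(g.serialize(format="turtle"))
```

OWL then layers richer modelling primitives (cardinality restrictions, class unions, property characteristics) on top of exactly this kind of RDF/RDFS foundation.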
48

Semantic web and personalization in searching and crawling

Καϊτανίδης, Χρήστος 01 October 2008 (has links)
This master's thesis examines the interaction between two parallel tasks aiming at better utilisation of the World Wide Web: (a) the task of transforming the World Wide Web into the Semantic Web, and (b) the task of improving the results of crawling and searching methods on the Web. In the early days of the World Wide Web, the most disconcerting problem for users searching for information was the lack of sufficient and useful sources. Gradually, though at a very fast pace, the World Wide Web turned into one of the largest stores of information that humans use, as more and more people contribute new data about every aspect of life, activity, work and interest. Users searching for information thus came to face the opposite problem: extracting the useful information from an enormous volume of data in the minimum amount of browsing time. Terms and techniques such as Data Mining, Information Retrieval and Knowledge Management were extended to apply to the newly emerged medium as well. Moreover, in striving for better-quality results returned to the user, the exploitation of the particular interests that can be extracted for each user has played an important role, both at the crawling stage, where pages on a specific subject are gathered (topic-focused crawling), and at the searching stage, where pages are ranked according to each user's needs (personalization). At the same time, as the World Wide Web gradually evolves into the Semantic Web, new models and standards (XML, RDF, OWL) are being developed to advance this process; expressing, transmitting and searching information using these standards opens new horizons in the use of the Web. The principal aim of this thesis is to exploit the models and standards of the Semantic Web in combination with ideas and algorithms already tested and applied on the plain World Wide Web, so as to achieve faster and more accurate retrieval and processing of information. Particular effort was also devoted to techniques that exploit the individual preferences of each user, and to exploring how the new models and standards of the Semantic Web can advance this process.
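The pairing of topic-focused crawling with personalization can be made concrete with a schematic frontier loop. This is my sketch, not the thesis's system; fetch() and score() stand in for a real HTTP client and a real topic classifier.

```python
# Topic-focused crawling: fetch pages in order of topical relevance, letting
# links inherit the score of the page they were found on. The 0.5 threshold
# is an illustrative assumption.
import heapq

def crawl(seeds: list[str], fetch, score, limit: int = 100) -> list[str]:
    frontier = [(-1.0, url) for url in seeds]  # max-heap via negated scores
    heapq.heapify(frontier)
    seen, collected = set(seeds), []
    while frontier and len(collected) < limit:
        _neg, url = heapq.heappop(frontier)
        text, links = fetch(url)               # -> (page text, outgoing links)
        page_score = score(text)               # -> topicality in [0, 1]
        if page_score > 0.5:
            collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Priority inherited from the linking page's topical score.
                heapq.heappush(frontier, (-page_score, link))
    return collected
```

Personalization slots in naturally: a score() trained on an individual user's interests turns the same loop into a per-user crawler, and the same scores can re-rank results at search time.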
49

An analysis of the barriers to UK small business web infrastructure development

Boyes, James Alfred January 2006 (has links)
This thesis analyses the Web infrastructure development process experienced by UK Small Businesses and considers the nature and impact of the barriers and problems that affect it. In doing so the thesis combines three previously disparate streams of research: research that considers the infrastructure development process, research that considers the benefits that become available via the use of an infrastructure, and research that considers the barriers to benefit realisation. Analysis reveals that while the organisational advantages and benefits are well documented, Small Businesses routinely encounter problems in realising them. Likewise, current development methodologies appear ill-suited for use by Small Businesses. This thesis addresses those gaps in current knowledge and understanding. The study utilises a multiple case study research strategy. The research design uses multiple data collection methods to triangulate the study data, thereby corroborating the accuracy, veracity and parsimony of the study findings. The findings reveal that the development process encompasses three stages: initial development, corrective development and long-term development. They also reveal that as the sophistication of an infrastructure is enhanced, increasingly sophisticated benefits become available. At the same time, however, barriers to development will be encountered; each can curtail benefit realisation or block ongoing development entirely. Within the development process, the business's owner/manager is the driving force behind development and is motivated to undertake it because of the benefits it will bring to their organisation. The thesis makes a demonstrable contribution to knowledge because its combined analysis of three previously disparate streams of research is novel, as is its depiction of a three-stage Web infrastructure development process. Future work can build upon this study's findings by testing the theories developed within this thesis so that they can be generalised more widely.
50

Selective web information retrieval

Plachouras, Vasileios January 2006 (has links)
This thesis proposes selective Web information retrieval, a framework formulated in terms of statistical decision theory, with the aim of applying an appropriate retrieval approach on a per-query basis. The main component of the framework is a decision mechanism that selects the retrieval approach for each query. The selection of a particular retrieval approach is based on the outcome of an experiment, which is performed before the final ranking of the retrieved documents. The experiment is a process that extracts features from a sample of the set of retrieved documents. This thesis investigates three broad types of experiment. The first counts the occurrences of query terms in the retrieved documents, indicating the extent to which the query topic is covered in the document collection. The second considers information from the distribution of retrieved documents in larger aggregates of related Web documents, such as whole Web sites or directories within Web sites. The third estimates the usefulness of the hyperlink structure among a sample of the set of retrieved Web documents. The proposed experiments are evaluated in the context of both informational and navigational search tasks with an optimal Bayesian decision mechanism, where it is assumed that relevance information exists. This thesis further investigates the implications of applying selective Web information retrieval in an operational setting, where the tuning of a decision mechanism is based on limited existing relevance information and the information retrieval system's input is a stream of queries related to mixed informational and navigational search tasks. First, the experiments are evaluated using different training and testing query sets, as well as a mixture of different types of queries. Second, query sampling is introduced, in order to approximate the queries that a retrieval system receives, and to tune an ad-hoc decision mechanism with a broad set of automatically sampled queries.
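A sketch in the spirit of the abstract's Bayesian decision mechanism: given a feature extracted from a sample of the retrieved documents, choose the retrieval approach with the higher posterior probability of being appropriate. The priors and likelihoods below are invented training estimates, not values from the thesis.

```python
def choose_approach(feature: str, priors: dict, likelihoods: dict) -> str:
    """Bayes rule: argmax over approaches a of P(a) * P(feature | a)."""
    posteriors = {a: priors[a] * likelihoods[a].get(feature, 1e-6)
                  for a in priors}
    return max(posteriors, key=posteriors.get)

# Hypothetical estimates: navigational queries tend to benefit from
# link-structure evidence; informational ones from content-only ranking.
priors = {"content_only": 0.6, "content_plus_links": 0.4}
likelihoods = {
    "content_only":       {"high_term_coverage": 0.7, "dense_link_sample": 0.2},
    "content_plus_links": {"high_term_coverage": 0.3, "dense_link_sample": 0.8},
}
print(choose_approach("dense_link_sample", priors, likelihoods))
# -> 'content_plus_links' (0.4 * 0.8 = 0.32 beats 0.6 * 0.2 = 0.12)
```

In the operational setting the abstract describes, the likelihood tables would be tuned from limited relevance information or from automatically sampled queries rather than assumed known.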
