171 |
Resource Discovery and Fair Intelligent Admission Control over Scalable Internet
January 2004 (has links)
The Internet currently supports a best-effort connectivity service. There has been increasing demand for the Internet to support Quality of Service (QoS), both to satisfy the stringent service requirements of many emerging networking applications and to utilise network resources efficiently. However, it has been found that even with an augmented QoS architecture the Internet cannot achieve the desired QoS, and there are also concerns about the scalability of the available QoS solutions. If the network is not adequately provisioned, the Internet cannot handle congestion: it is unaware of its internal QoS state, and so cannot provide QoS when the network state changes dynamically. This thesis addresses the following question: is it possible to deliver applications over the Internet with QoS, fairly and efficiently, while preserving scalability? This dissertation answers the question affirmatively by proposing an innovative service architecture: Resource Discovery (RD) and Fair Intelligent Admission Control (FIAC) over the scalable Internet. The main contributions of this dissertation are as follows:
1. To detect the network QoS state, we propose the Resource Discovery (RD) framework, which provides the network QoS state dynamically. RD adopts a feedback-loop mechanism to collect the network QoS state and report it to the Fair Intelligent Admission Control module, so that FIAC can exercise resource control efficiently and fairly.
2. To facilitate network resource management and flow admission control, two scalable Fair Intelligent Admission Control architectures are designed and analysed on two levels: per-class and per-flow. Per-class FIAC handles aggregate admission control for certain pre-defined aggregates; per-flow FIAC handles flow admission control in terms of fairness within the class.
3. To further improve scalability, Edge-Aware Resource Discovery and Fair Intelligent Admission Control is proposed, which does not require the involvement of core routers.
We devise and analyse implementations of the proposed solutions and demonstrate the effectiveness of the approach. For Resource Discovery, two closed-loop feedback solutions are designed and investigated. The first is a core-aware solution based on direct QoS state information. To further improve scalability, an edge-aware solution is designed in which only the edge routers (not the core) are involved in the feedback QoS state estimation. For admission control, the FIAC module bridges the gap between 'external' traffic requirements and 'internal' network capability. Using the QoS state information from RD, FIAC intelligently allocates resources via per-class admission control and per-flow fairness control. We study the performance and robustness of RD-FIAC through extensive simulations. Our results show that RD can obtain the internal network QoS state and that FIAC can adjust resource allocation efficiently and fairly.
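The two-level decision can be illustrated with a small sketch. The Python fragment below first checks a request against a per-class capacity budget (of the kind RD would feed back) and then against an equal-share fairness rule within the class; the class name, capacity figures and the equal-split rule are assumptions made for illustration, not the design developed in the thesis.

```python
# Hypothetical sketch of two-level admission control: a per-class check
# against discovered capacity, then a per-flow fairness check in the class.
class AdmissionControl:
    def __init__(self, class_capacity):
        # class_capacity: QoS class -> bandwidth budget (e.g. Mbps) as
        # reported by a resource-discovery feedback loop.
        self.capacity = dict(class_capacity)
        self.flows = {c: [] for c in class_capacity}

    def update_capacity(self, qos_class, capacity):
        # Invoked when the feedback loop reports a new QoS state.
        self.capacity[qos_class] = capacity

    def admit(self, qos_class, rate):
        used = sum(self.flows[qos_class])
        cap = self.capacity[qos_class]
        if used + rate > cap:                     # per-class level
            return False
        fair_share = cap / (len(self.flows[qos_class]) + 1)
        if rate > fair_share:                     # per-flow fairness level
            return False
        self.flows[qos_class].append(rate)
        return True

ac = AdmissionControl({"gold": 100.0})
print(ac.admit("gold", 30.0))   # True: within capacity and fair share
print(ac.admit("gold", 60.0))   # False: exceeds the equal-share bound
```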
172 |
Explorations In Searching Compressed Nucleic Acid And Protein Sequence Databases And Their Cooperatively-Compressed Indices
Gardner-Stephen, Paul Mark, paul.gardner-stephen@flinders.edu.au
January 2008 (has links)
Nucleic acid and protein databases such as GenBank are growing at a rate that perhaps eclipses even Moore's Law of increase in computational power. This poses a problem for the biological sciences, which have become increasingly dependent on searching and manipulating these databases. It was once reasonably practical to perform exhaustive searches of these databases, for example using the algorithm described by Smith and Waterman; however, it has been many years since this was the case. This has led to the development of a series of search algorithms, such as FASTA, BLAST and BLAT, each successively faster, but at similarly successive costs in thoroughness.
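The exhaustive method referred to here is the classic Smith-Waterman local alignment, whose dynamic-programming table costs O(nm) per query; a minimal Python sketch (match, mismatch and gap scores picked arbitrarily for illustration) shows why it stops being practical as databases grow:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    # h[i][j] is the best score of a local alignment ending at
    # a[i-1], b[j-1]; negative prefixes are reset to zero.
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

print(smith_waterman("ACACACTA", "AGCACACA"))  # best local alignment score
```

Scoring every database sequence this way makes the total work grow linearly with database size, which motivates the faster heuristic and indexed approaches discussed next.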
Attempts have been made to remedy this problem by devising search algorithms that are both fast and thorough. An example is CAFE, which seeks to construct a search system with a sub-linear relationship between search time and database size, and argues that this property must be present for any search system to be successful in the long term.
This dissertation explores this notion by seeking to construct a search system that takes advantage of the growing redundancy in databases such as GenBank in order to reduce both the search time and the space required to store the databases and their indices, while preserving or increasing the thoroughness of the search.
The result is the creation and implementation of new genomic sequence search and alignment, database compression, and index compression algorithms and systems that make progress towards reducing search time and space requirements while improving sensitivity. However, success is tempered by the need for databases with adequate local redundancy, and by the computational cost of these algorithms when servicing un-batched queries.
173 |
Effective retrieval techniques for Arabic text
Nwesri, Abdusalam F Ahmad, nwesri@yahoo.com
January 2008 (has links)
Arabic is a major international language, spoken in more than 23 countries, and the lingua franca of the Islamic world. The number of Arabic-speaking Internet users in the Middle East grew more than nine-fold between 2000 and 2007, yet research in Arabic Information Retrieval (AIR) has not advanced as it has for other languages such as English. In this thesis, we explore techniques that improve the performance of AIR systems.

Stemming is considered one of the most important factors in improving the retrieval effectiveness of AIR systems. Most current stemmers remove affixes without checking whether the removed letters are actually affixes. We propose lexicon-based improvements to light stemming that distinguish core letters from proper Arabic affixes. We devise rules to stem most affixes and show their effect on retrieval effectiveness. Using the TREC 2001 test collection, we show that applying relevance feedback with our rules produces significantly better results than light stemming.

Techniques for Arabic information retrieval have been studied in depth on clean collections of newswire dispatches. However, the effectiveness of such techniques is not known for noisier collections in which the text is generated by automatic speech recognition (ASR) systems and the queries by machine translation (MT). Using noisy collections, we show that normalisation, stopping and light stemming improve results as they do on normal text collections, but that n-grams and root stemming decrease performance.

Most recent AIR research has been undertaken using collections that are far smaller than those used for English text retrieval; consequently, the significance of some published results is debatable. Using the LDC Arabic GigaWord collection, which contains more than 1,500,000 documents, we create a test collection of 90 topics with their relevance judgements. Using this test collection, we show empirically that root stemming is not competitive on a large collection. Of the approaches we have studied, lexicon-based stemming approaches perform better than light stemming alone.

Arabic text commonly includes foreign words transliterated into Arabic characters. Several transliterated forms may be in common use for a single foreign word, but users rarely use more than one variant during search tasks. We test the effectiveness of lexicons, Arabic patterns, and n-grams in distinguishing foreign words from native Arabic words. We introduce rules that help filter out foreign words and improve the n-gram approach used in language identification. Our combined n-gram and lexicon approach successfully identifies 80% of all foreign words with a precision of 93%. To find variants of a specific foreign word, we apply phonetic and string-similarity techniques and introduce novel algorithms to normalise them in Arabic text. We modify phonetic techniques used for English to suit the Arabic language, and compare several techniques to determine their effectiveness in finding foreign word variants. We show that our algorithms significantly improve recall. We also show that expanding queries using variants identified by our Soutex4 phonetic algorithm results in a significant improvement in precision and recall.

Together, the approaches described in this thesis represent an important step towards realising highly effective retrieval of Arabic text.
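The lexicon check that distinguishes core letters from true affixes can be sketched as follows; the affix lists and lexicon here are a tiny invented sample, not the rule set developed in the thesis:

```python
# Illustrative lexicon-checked light stemming: an affix is removed only
# if the remaining string is a known core word, so core letters that
# merely resemble affixes are preserved. Data below is a toy sample.
PREFIXES = ["وال", "بال", "ال", "و"]   # e.g. wa+al-, bi+al-, al-, wa-
SUFFIXES = ["ات", "ون", "ها", "ة"]

def light_stem(word, lexicon):
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and word[len(p):] in lexicon:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and word[:-len(s)] in lexicon:
            word = word[:-len(s)]
            break
    return word

lexicon = {"كتاب", "مكتب"}             # "book", "office"
print(light_stem("الكتاب", lexicon))    # "ال" stripped: remainder is a word
print(light_stem("والي", lexicon))      # unchanged: no remainder in lexicon
```

A blind light stemmer would strip the apparent prefix in both cases; the lexicon check is what blocks the second removal.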
174 |
Document management and retrieval for specialised domains: an evolutionary user-based approach
Kim, Mihye, Computer Science & Engineering, Faculty of Engineering, UNSW
January 2003 (has links)
Browsing marked-up documents by traversing hyperlinks has become probably the most important means by which documents are accessed, both via the World Wide Web (WWW) and organisational Intranets. However, there is a pressing demand for document management and retrieval systems that deal appropriately with the massive number of documents available. There are two classes of solution: general search engines, whether for the WWW or an Intranet, which make little use of specific domain knowledge; and hand-crafted specialised systems, which are costly to build and maintain.

The aim of this thesis was to develop a document management and retrieval system suitable for small communities, as well as individuals, in specialised domains on the Web. The aim was to allow users to easily create and maintain their own organisation of documents while ensuring continual improvement in the retrieval performance of the system as it evolves. The system developed is based on the free annotation of documents by users and is browsed using the concept lattice of Formal Concept Analysis (FCA). A number of annotation support tools were developed to aid the annotation process so that a suitable system evolved.

Experiments were conducted in using the system to assist in finding staff and student home pages at the School of Computer Science and Engineering, University of New South Wales. Results indicated that the annotation tools provided a good level of assistance, so that documents were easily organised, and that a lattice-based browsing structure evolving in an ad hoc fashion provided good retrieval performance. An interesting result suggested that although an established external taxonomy can be useful in proposing annotation terms, users appear to be very selective in their use of the terms proposed. Results also supported the hypothesis that the concept lattice of FCA helps take users beyond a narrow search to find other useful documents. In general, lattice-based browsing was considered a more helpful method than Boolean queries or hierarchical browsing for searching a specialised domain.

We conclude that the concept lattice of Formal Concept Analysis, supported by annotation techniques, is a useful way of supporting the flexible, open management of documents required by individuals, small communities and specialised domains. It seems likely that this approach can be readily integrated with other developments, such as further improvements in search engines and the use of semantically marked-up documents, and provide a unique advantage in supporting autonomous management of documents by individuals and groups, in a way that is closely aligned with the autonomy of the WWW.
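The browsing structure at the heart of this approach can be made concrete with a toy example: a formal concept pairs a set of documents (the extent) with exactly the annotation terms they share (the intent), and the set of all such pairs forms the lattice that is navigated. The brute-force enumeration below is only a sketch with invented annotations; practical FCA systems use much more efficient algorithms.

```python
from itertools import combinations

# Toy document-annotation context; terms are invented for illustration.
context = {
    "doc1": {"lecturer", "machine-learning"},
    "doc2": {"lecturer", "databases"},
    "doc3": {"student", "machine-learning"},
}

def formal_concepts(ctx):
    # Candidate intents are intersections of the documents' term sets;
    # a candidate is kept when it is closed, i.e. it equals exactly the
    # terms shared by every document containing it.
    all_terms = set().union(*ctx.values())
    candidates = {frozenset(all_terms)}
    for r in range(1, len(ctx) + 1):
        for docs in combinations(ctx, r):
            candidates.add(frozenset.intersection(*(frozenset(ctx[d]) for d in docs)))
    concepts = []
    for intent in candidates:
        extent = {d for d, terms in ctx.items() if intent <= terms}
        shared = set.intersection(*(ctx[d] for d in extent)) if extent else all_terms
        if shared == set(intent):
            concepts.append((frozenset(extent), intent))
    return concepts

for extent, intent in sorted(formal_concepts(context), key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Moving from a concept to a neighbouring one in the lattice corresponds to narrowing or broadening the search by one shared term, which is how the lattice takes a user beyond the documents matched by an initial query.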
175 |
Change management and synchronization of local and shared versions of a controlled vocabulary
Oliver, Diane Elizabeth.
January 1900 (has links)
Thesis (Ph.D)--Stanford University, 2000. / Title from pdf t.p. (viewed April 3, 2002). "August 2000." "Adminitrivia V1/Prg/20000831"--Metadata.
176 |
Multimedia Data Mining and Retrieval for Multimedia Databases Using Associations and Correlations
Lin, Lin
23 June 2010 (has links)
With the explosion in the complexity and amount of pervasive multimedia data, there is high demand for multimedia services and applications that let people in many areas easily access and distribute multimedia data. Faced with abundant multimedia resources but inefficient and rather old-fashioned keyword-based information retrieval approaches, a content-based multimedia information retrieval (CBMIR) system is required to (i) reduce the dimension space, saving storage and computation; (ii) advance multimedia learning methods to accurately identify target semantics, bridging the gap between low-level/mid-level features and high-level semantics; and (iii) effectively search media content for dynamic media delivery and enable extensive media-type-driven applications.

This research focuses on a multimedia data mining and retrieval system for multimedia databases, addressing key challenges such as data imbalance, data quality, the semantic gap, user subjectivity and searching issues. A novel CBMIR system is therefore proposed in this dissertation. The proposed system utilises both the association rule mining (ARM) technique and the multiple correspondence analysis (MCA) technique, taking into account both pattern discovery and statistical analysis.

First, media content is represented by global and local low-level and mid-level features and stored in the multimedia database. Second, a data filtering component is proposed to improve data quality and reduce data imbalance; specifically, the proposed filtering step can vertically select features and horizontally prune instances in multimedia databases. Third, a new learning and classification method that mines weighted association rules is proposed in the retrieval system: the MCA-based correlation is used to generate and select the weighted N-feature-value-pair rules, where N varies from one to many. Fourth, a ranking method independent of classifiers is proposed to sort the retrieved results and put the most interesting ones at the top of the browsing list. Finally, a user interface is implemented in the CBMIR system that allows the user to choose a concept of interest, searches media based on the target concept, ranks the retrieved segments using the proposed ranking algorithm, and displays the top-ranked segments to the user.

The system is evaluated on various high-level semantics from the TRECVID benchmark data sets. TRECVID sound and vision data is a large data set that includes various types of videos and has very rich semantics. Overall, the proposed system achieves promising results in comparison with other well-known methods. Moreover, experiments comparing each component with other well-known algorithms are conducted. The experimental results show that all proposed components improve the functionality of the CBMIR system, and that the proposed system achieves effectiveness, robustness and efficiency on a high-dimensional multimedia database.
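The weighted-rule classification step can be caricatured in a few lines. In the sketch below a plain difference of conditional frequencies stands in for the MCA-derived correlation weight, so this illustrates only the shape of the rule-scoring idea, not the thesis method; the features and values are invented.

```python
from collections import defaultdict

def train(instances, labels):
    # instances: dicts of {feature: discrete value}; labels: 0 or 1.
    pos = labels.count(1) or 1
    neg = labels.count(0) or 1
    counts = defaultdict(lambda: [0, 0])       # (feature, value) -> [neg, pos]
    for inst, y in zip(instances, labels):
        for fv in inst.items():
            counts[fv][y] += 1
    # Weight a 1-feature-value rule by how much more often it appears in
    # positive instances than in negative ones (a crude stand-in for the
    # MCA-based correlation used in the actual system).
    return {fv: c[1] / pos - c[0] / neg for fv, c in counts.items()}

def score(weights, instance):
    # Sum the weights of every rule the instance matches.
    return sum(weights.get(fv, 0.0) for fv in instance.items())

X = [{"edge": "high", "audio": "speech"},
     {"edge": "high", "audio": "music"},
     {"edge": "low", "audio": "music"}]
y = [1, 1, 0]
w = train(X, y)
print(score(w, {"edge": "high", "audio": "speech"}))  # leans positive
print(score(w, {"edge": "low", "audio": "music"}))    # leans negative
```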
177 |
Top-k aggregation of ranked inputs
Cheng, Kit-hung.
January 2005 (has links)
Thesis (M. Phil.)--University of Hong Kong, 2005. / Title proper from title frame. Also available in printed format.
178 |
Novelty and Diversity in Retrieval Evaluation
Kolla, Maheedhar
21 December 2012 (has links)
Queries submitted to search engines rarely provide a complete and precise description of a user's information need. Most queries are ambiguous to some extent, having multiple interpretations. For example, the seemingly unambiguous query "tennis lessons" might be submitted by a user interested in attending classes in her neighborhood, seeking lessons for her child, looking for online video lessons, or planning to start a business teaching tennis. Search engines face the challenging task of satisfying different groups of users having diverse information needs associated with a given query.

One solution is to optimize ranking functions to satisfy diverse sets of information needs. Unfortunately, existing evaluation frameworks do not support such optimization; instead, ranking functions are rewarded for satisfying the most likely intent associated with a given query. In this thesis, we propose a framework and associated evaluation metrics that are capable of optimizing ranking functions to satisfy diverse information needs. Our proposed measures explicitly reward ranking functions capable of presenting the user with information that is novel with respect to previously viewed documents. Our measures reflect the quality of a ranking function by taking into account its ability to satisfy the diverse users submitting a query.

Moreover, the task of identifying and establishing test frameworks to compare ranking functions on a web scale can be tedious. One reason is the dynamic nature of the web, where documents are constantly added and updated, making it necessary for search engine developers to seek additional human assessments. Along with the issues of novelty and diversity, we explore an approximate approach to comparing different ranking functions that overcomes the problem of incomplete human assessments. We demonstrate that our approach is capable of accurately sorting ranking functions based on their capability of satisfying diverse users, even in the face of incomplete human assessments.
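One well-known instance of such a measure is the alpha-nDCG family of novelty-aware gains, in which a document's reward for covering an intent shrinks for every higher-ranked document that already covered the same intent. The sketch below uses invented intent judgments and an arbitrary alpha:

```python
import math

def alpha_dcg(ranking, judgments, alpha=0.5):
    # judgments: doc -> set of query intents the document satisfies.
    seen = {}                                  # intent -> times covered
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        gain = 0.0
        for intent in judgments.get(doc, set()):
            gain += (1 - alpha) ** seen.get(intent, 0)   # novelty discount
            seen[intent] = seen.get(intent, 0) + 1
        total += gain / math.log2(rank + 1)              # rank discount
    return total

judgments = {"d1": {"classes"}, "d2": {"classes"}, "d3": {"videos"}}
# Covering a second intent early beats repeating an already-covered one:
print(alpha_dcg(["d1", "d3", "d2"], judgments))  # diverse order, higher
print(alpha_dcg(["d1", "d2", "d3"], judgments))  # redundant order, lower
```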
179 |
A Geometric Approach to Pattern Matching in Polyphonic Music
Tanur, Luke
January 2005 (has links)
The music pattern matching problem involves finding matches of a small fragment of music called the "pattern" into a larger body of music called the "score". We represent music as a series of horizontal line segments in the plane, and reformulate the problem as finding the best translation of a small set of horizontal line segments into a larger set of horizontal line segments. We present an efficient algorithm that can handle general weight models that measure the musical quality of a match of the pattern into the score, allowing for approximate pattern matching.
We give an algorithm with running time O(nm(d + log m)), where n is the size of the score, m is the size of the pattern, and d is the size of the discrete set of musical pitches used. Our algorithm compares favourably to previous approaches to the music pattern matching problem. We also demonstrate that this geometric formulation of the music pattern matching problem is unlikely to have a significantly faster algorithm, since it is at least as hard as 3SUM, a basic problem that is conjectured to have no subquadratic algorithm. Lastly, we present experiments to show how our algorithm can find musically sensible variations of a theme, as well as polyphonic musical patterns in a polyphonic score.
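A brute-force version of this geometric formulation is easy to state, which makes the value of the efficient algorithm clear. In the sketch below each note is a (pitch, onset, offset) segment and the weight of a match is simply total temporal overlap on equal pitches, a much cruder weight model than the general ones the thesis supports:

```python
def overlap(a, b):
    # Length of temporal overlap between two segments (pitch, start, end).
    return max(0.0, min(a[2], b[2]) - max(a[1], b[1]))

def best_translation(pattern, score):
    # Try every translation that aligns some pattern note onto some score
    # note, in both time (dt) and pitch (dp); keep the heaviest match.
    best_w, best_t = -1.0, None
    for p in pattern:
        for s in score:
            dt, dp = s[1] - p[1], s[0] - p[0]
            shifted = [(pp + dp, st + dt, en + dt) for pp, st, en in pattern]
            w = sum(overlap(seg, sc) for seg in shifted
                    for sc in score if sc[0] == seg[0])
            if w > best_w:
                best_w, best_t = w, (dt, dp)
    return best_t, best_w

pattern = [(60, 0.0, 1.0), (62, 1.0, 2.0)]                 # a two-note motif
score = [(65, 4.0, 5.0), (67, 5.0, 6.0), (64, 6.0, 7.0)]   # a short score
print(best_translation(pattern, score))  # motif found up 5 semitones, 4 beats later
```

This enumeration already examines O(nm) candidate translations and can spend O(nm) work per candidate; the algorithm in the thesis organises the same search far more efficiently.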
180 |
Learning Automatic Question Answering from Community Data
Wang, Di
21 August 2012 (has links)
Although traditional search engines can retrieve thousands or millions of web links related to input keywords, users still need to manually locate answers to their information needs across multiple returned documents or initiate further searches. Question Answering (QA) is an effective paradigm for addressing this problem: it automatically finds one or more accurate and concise answers to natural language questions. Existing QA systems often rely on off-the-shelf Natural Language Processing (NLP) resources and tools that are not optimized for the QA task. Additionally, they tend to require hand-crafted rules to extract properties from input questions, which makes building comprehensive QA systems costly in time and manpower. In this thesis, we study the potential of using Community Question Answering (cQA) archives as a central building block of QA systems. To that end, this thesis proposes two cQA-based query expansion and structured query generation approaches, one employed in text-based QA and the other in ontology-based QA. In addition, based on the above structured query generation method, an end-to-end open-domain ontology-based QA system is developed and evaluated on a standard factoid QA benchmark.
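The cQA-based expansion idea can be sketched as follows: find archived community questions similar to the input question and borrow salient terms from their answers. The word-overlap similarity and raw frequency selection below are deliberately naive stand-ins for the methods developed in the thesis, and the stopword list and archive are invented:

```python
from collections import Counter

STOP = {"the", "a", "an", "in", "on", "of", "to", "is", "what", "how", "i", "do"}

def tokens(text):
    # Lowercased, stopword-free bag of words.
    return [w for w in text.lower().split() if w not in STOP]

def expand(question, archive, k=3):
    q = set(tokens(question))
    # Keep archived (question, answer) pairs sharing words with the
    # query, most-overlapping first.
    ranked = [qa for qa in archive if q & set(tokens(qa[0]))]
    ranked.sort(key=lambda qa: -len(q & set(tokens(qa[0]))))
    counts = Counter()
    for _, answer in ranked[:2]:               # answers of top questions
        counts.update(t for t in tokens(answer) if t not in q)
    return [t for t, _ in counts.most_common(k)]

archive = [
    ("what is the tallest mountain in the world",
     "mount everest in the nepal himalayas"),
    ("how do i learn tennis", "take lessons at a local club"),
]
print(expand("tallest mountain on earth", archive))
# ['mount', 'everest', 'nepal'] become candidate expansion terms
```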