81 |
Distributed NetFlow Processing Using the Map-Reduce Model
Morken, Jan Tore, January 2010
<p>We study the viability of using the map-reduce model and frameworks for NetFlow data processing. The map-reduce model is an approach to distributed processing that simplifies implementation work, and it can also help in adding fault tolerance to large processing jobs. We design and implement two prototypes of a NetFlow processing tool. One prototype is based on a design where we freely choose the approach that we consider optimal with regard to performance. This prototype functions as a reference design. The other prototype is based on, and makes use of, the supporting features of a map-reduce framework. Both prototypes are benchmarked, and we evaluate the performance of the framework-based prototype against the reference design. Based on the benchmarks, we analyse and comment on the differences in performance, and draw a conclusion about the suitability of the map-reduce model and frameworks for the problem at hand.</p>
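To illustrate the map-reduce model the abstract refers to, the following is a minimal single-process sketch that sums bytes per source address. The record layout, the sample data, and the function names (`map_flow`, `reduce_bytes`, `run_job`) are assumptions made for illustration; they are not taken from the thesis or from any particular framework.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, simplified flow records: (source IP, destination IP, bytes).
FLOWS = [
    ("10.0.0.1", "10.0.0.9", 1200),
    ("10.0.0.2", "10.0.0.9", 300),
    ("10.0.0.1", "10.0.0.7", 800),
]


def map_flow(flow):
    """Map step: emit one (key, value) pair per flow record."""
    src, _dst, nbytes = flow
    yield src, nbytes


def reduce_bytes(key, values):
    """Reduce step: sum all byte counts emitted for one key."""
    return key, sum(values)


def run_job(records, mapper, reducer):
    """Run map, shuffle (sort and group by key), and reduce in one process."""
    pairs = [pair for record in records for pair in mapper(record)]
    pairs.sort(key=itemgetter(0))
    return dict(
        reducer(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    )


bytes_per_source = run_job(FLOWS, map_flow, reduce_bytes)
print(bytes_per_source)  # {'10.0.0.1': 2000, '10.0.0.2': 300}
```

In a real framework the map and reduce calls would run on different machines and the shuffle would move data between them; only the two small functions would need to be written by the user, which is the simplification the abstract mentions.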
|
82 |
Digital Home : An architecture for easy administration and updates of services
Bjerkhaug, Andreas Wigmostad, Ellingbø, Øystein, January 2006
<p>In recent years digital home solutions have made their entry and are now becoming mainstream. From a service provider's point of view, this creates an interesting opportunity. Today, if a service provider wishes to change the services offered, the flash memory of the customers' Set-Top Boxes must be updated, or, for larger updates, the Set-Top Boxes must perhaps be replaced. Replacing the Set-Top Box is both costly and time consuming. As digital home solutions become more common, it is possible to use such a system to offer services. This way, adding, removing or updating a service can be done in software only. Our proposed architecture does this with minimal involvement of the customers. The architecture offers a generic interface that allows for third-party development. As the world becomes more and more digitalized, the expectation that everything should be available from everywhere becomes more common. Our architecture lets a service provider offer their customers remote access to, and control of, their digital home. We have based our system on Microsoft Media Center Edition. This was chosen after first studying the concept of an ideal digital home and then researching which existing digital home solution would bring us closest to this ideal.</p>
|
83 |
Derby: Replication and Availability
Sørensen, Egil, January 2007
<p>This paper describes the work done to add hot standby replication functionality to the Apache Derby Database Management System. Apache Derby is a relational database implemented entirely in Java. Its key advantages are that it has a small footprint and that it is based on the Java JDBC and SQL standards. It is also easy to install, deploy and use, and it can be embedded in almost any light-weight Java application. Implementing a hot standby scheme in Apache Derby adds several features. The contents of the database are replicated at run time to another site, providing an online runtime backup. Because the hot standby takes over on faults, availability is improved: a client can connect to the hot standby after a crash, so the crash is masked from the clients. In addition, online upgrades of software and hardware can be done by taking down one database at a time. When the upgrade is completed, the upgraded server is synchronized and brought back online with no downtime. A fully functional prototype of the Apache Derby hot standby scheme has been created in this project using logical logs, fail-fast takeovers and logical catchups after an internal up-to-crash recovery and reconnection. This project builds on the ideas presented in Derby: Write to Neighbor Mode.</p>
|
84 |
Apache Derby SMP scalability : Investigating limitations and opportunities for improvement
Morken, Anders, Pahr, Per Ottar Ribe, January 2007
<p>This report investigates the B-Tree access method of Apache Derby, an open source Java database system. The detailed focus of the report is on performance aspects of the Derby page latch implementation. Our focal point is the interaction between the B-Tree access method and page latching, and the impact of these components on the ability of Derby to scale on multiprocessor systems. Derby uses simple exclusive-only page latches, which are inexpensive in the single-threaded case. We investigate the impact of this design on scalability, and contrast it with a version of Derby modified to support both shared read-only and exclusive page access for lookups in index structures. This evaluation is made for single-threaded as well as multi-threaded scenarios on multiprocessing systems. Based on analyses of benchmark results and profiler traces, we then suggest how Derby may be able to utilize modern Java locking primitives to improve multiprocessor scalability.</p>
|
85 |
Temporal Text Mining : The TTM Testbench
Fivelstad, Ole Kristian, January 2007
<p>This master thesis presents the Temporal Text Mining (TTM) Testbench, an application for discovering association rules in temporal document collections. It is a continuation of work done in two projects, one in the fall of 2005 and one in the fall of 2006, which laid the foundation for this thesis. The focus of the work is on identifying and extracting meaningful terms from textual documents to improve the meaningfulness of the mined association rules. Much work has been done to compile the theoretical foundation of this project. This foundation has been used for assessing different approaches for finding meaningful and descriptive terms. The old TTM Testbench has been extended to include usage of WordNet, and operations for finding collocations, performing word sense disambiguation, and extracting higher-level concepts and categories from the individual documents. A method for rating association rules based on the semantic similarity of the terms present in the rules has also been implemented. This was done in an attempt to narrow down the result set and filter out rules which are not likely to be interesting. Experiments performed with the improved application show that the usage of WordNet and the new operations can help increase the meaningfulness of the rules. One factor which plays a big part in this is that synonyms of words are added to make the terms more understandable. However, the experiments showed that it was difficult to decide whether a rule was interesting or not, which made it impossible to draw any conclusions regarding the suitability of semantic similarity for finding interesting rules. All work on the TTM Testbench so far has focused on finding association rules in web newspapers. It may, however, be useful to perform experiments in a more limited domain, for example medicine, where the interestingness of a rule may be more easily decided.</p>
|
86 |
Combining Audio Fingerprints
Larsen, Vegard Andreas, January 2008
<p>Large music collections are now more common than ever before. Yet, search technology for music is still in its infancy. Audio fingerprinting is one method that allows searching for music. In this thesis several audio fingerprinting solutions are combined into a single solution to determine whether such a combination can yield better results than any of the solutions can separately. The solution is used to find duplicate music files in a personal collection. The results show that applying the weighted root-mean-square (WRMS) to the problem ranked the results most effectively. It was notably better than the other approaches tried. The WRMS produced 61% more correct matches than the original FDMF solution, and 49% more correct matches than libFooID.</p>
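As a rough illustration of how several per-fingerprint scores could be merged with a weighted root-mean-square, the sketch below uses the textbook definition sqrt(sum(w_i * s_i^2) / sum(w_i)). The abstract does not state the exact formula, weights, or score scale used in the thesis, so the function name `weighted_rms` and the sample numbers are assumptions.

```python
import math


def weighted_rms(scores, weights):
    """Combine scores as sqrt(sum(w * s^2) / sum(w)).

    Assumption: one score per fingerprinting method for a pair of files,
    and one non-negative weight per method.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_squares = sum(w * s * s for s, w in zip(scores, weights))
    return math.sqrt(weighted_squares / total_weight)


# Hypothetical scores from two fingerprinting methods for one pair of files.
combined = weighted_rms([3.0, 4.0], [1.0, 1.0])
```

Candidate duplicate pairs could then be ranked by the combined value, with the weights tuned so that the more reliable fingerprinting method contributes more.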
|
87 |
Storing and Querying RDF in Mars
Bang, Ole Petter, Fjeldskår, Tormod, January 2009
<p>As part of the Semantic Web movement, the Resource Description Framework (RDF) is gaining momentum as a format for storing data, particularly metadata. The SPARQL Protocol and RDF Query Language is a SQL-like query language, recommended by W3C for querying RDF data. FAST is exploring the possibilities of supporting storage and querying of RDF data in their Mars search engine. To facilitate this, a SPARQL parser has been created for the Microsoft .NET Framework, using the MPLex and MPPG tools from Microsoft's Managed Babel package. This thesis proposes a solution for efficiently storing and retrieving RDF data in Mars, based on decomposition and B+ Tree indexing. Further, a method for transforming SPARQL queries into Mars operator graphs is described. Finally, the implementation of a prototype is discussed. The prototype has been developed in collaboration with FAST and has required customized indexing in Mars. Some deviations from the proposed solution were made in order to create a working prototype within the available time frame. The focus has been on exploring possibilities, and performance has thus not been a priority, neither in indexing nor in evaluation.</p>
|
88 |
Feature Selection for Text Categorisation
Garnes, Øystein Løhre, January 2009
<p>Text categorization is the task of discovering the category or class that text documents belong to, or in other words spotting the correct topic for text documents. While many machine learning schemes exist today for building automatic classifiers, these are typically resource demanding and do not always achieve the best results when given the whole contents of the documents. A popular solution to these problems is called feature selection. The features (e.g. terms) in a document collection are given weights based on a simple scheme, and then ranked by these weights. Next, each document is represented using only the top-ranked features, typically only a few percent of the features. The classifier is then built in considerably less time, and accuracy might even improve. In situations where the documents can belong to one of a series of categories, one can either build a multi-class classifier and use one feature set for all categories, or one can split the problem into a series of binary categorization tasks (deciding whether documents belong to a category or not) and create one ranked feature subset for each category/classifier. Many feature selection metrics have been suggested over the last decades, including supervised methods that make use of a manually pre-categorized set of training documents, and unsupervised methods that need only training documents of the same type or collection that is to be categorized. While many of these look promising, there has been a lack of large-scale comparison experiments. Also, several methods have been proposed in the last two years. Moreover, most evaluations are conducted on a set of binary tasks instead of a multi-class task, as this often gives better results, although multi-class categorization with a joint feature set is often used in operational environments. In this report, we present results from the comparison of 16 feature selection methods (in addition to random selection) using various feature set sizes. 
Of these, 5 were unsupervised and 11 were supervised. All methods are tested on both a Naive Bayes (NB) classifier and a Support Vector Machine (SVM) classifier. We conducted multi-class experiments using a collection with 20 non-overlapping categories, and each feature selection method produced feature sets common for all the categories. We also combined feature selection methods and evaluated their joint efforts. We found that the classical supervised methods had the best performance, including Chi Square, Information Gain and Mutual Information. The Chi Square variant GSS coefficient was also among the top performers. Odds Ratio showed excellent performance for NB, but not for SVM. The three unsupervised methods Collection Frequency, Collection Frequency Inverse Document Frequency and Term Frequency Document Frequency all showed performances close to the best group. The Bi-Normal Separation (BNS) metric produced excellent results for the smallest feature subsets. The weirdness factor performed several times better than random selection, but was not among the top performing group. Some combination experiments achieved better results than each method alone, but the majority did not. The top performers Chi Square and GSS coefficient classified more documents when used together than alone. Four of the five combinations that showed an increase in performance included the BNS metric.</p>
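The weight, rank, and keep-the-top-k procedure described above can be sketched with the Chi Square metric on a toy collection. This is a minimal illustration, not the setup used in the report: the documents are invented, documents are reduced to sets of terms, and taking the maximum score over categories is just one common way (assumed here) to obtain a single feature set shared by all categories.

```python
def chi_square(a, b, c, d):
    """Chi Square score from a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


def select_features(docs, k):
    """Return the k top-ranked terms for a list of (terms, category) pairs."""
    categories = {category for _, category in docs}
    vocabulary = set().union(*(terms for terms, _ in docs))
    scores = {}
    for term in vocabulary:
        best = 0.0
        for category in categories:
            a = sum(1 for terms, cat in docs if term in terms and cat == category)
            b = sum(1 for terms, cat in docs if term in terms and cat != category)
            c = sum(1 for terms, cat in docs if term not in terms and cat == category)
            d = sum(1 for terms, cat in docs if term not in terms and cat != category)
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    # Rank by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocabulary, key=lambda term: (-scores[term], term))
    return ranked[:k]


# Invented toy collection with two categories.
DOCS = [
    ({"goal", "match", "team"}, "sport"),
    ({"goal", "team", "coach"}, "sport"),
    ({"vote", "party", "team"}, "politics"),
    ({"vote", "election", "party"}, "politics"),
]

top_terms = select_features(DOCS, 3)
print(top_terms)  # ['goal', 'party', 'vote']
```

Terms such as "team", which occur in both categories, receive a lower score and are dropped first, which is the effect that lets a classifier be trained on only a small share of the features.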
|
89 |
Semantic Cache Investment : Adaption of Cache Investment for DASCOSA
Beiske, Konrad Giæver, Bjørndalen, Jan, January 2009
<p>Semantic cache and distribution introduce new obstacles to how we use cache in query processing in databases. We have adapted a caching strategy called cache investment to work in a peer-to-peer database with semantic cache. Cache investment is a technique that influences the query optimizer without changing it. It suggests cache candidates based on knowledge about queries executed in the past. These queries are not limited to the local site; the technique also detects locality by looking at queries processed on remote sites. Our implementation of semantic cache investment for distributed databases shows a great performance improvement, especially when multiple queries are active at the same time. To utilize cache investment, we have looked into how a distributed query optimizer can be extended to use cache content in planning. This allows the query optimizer to detect and include beneficial cache content on remote sites that it would otherwise have ignored. Our implementation of a cache-aware optimizer shows an improvement in performance, but its most important task is to evaluate cache candidates provided through cache investment.</p>
|
90 |
Prototyping a location aware application for UBiT. : A map-based application, designed, implemented and evaluated.
Olsen, Bjarne Sletten, January 2009
<p>Through the research performed in this thesis, it has been shown how location awareness and maps can be exploited to facilitate the use of library resources, such as information on documents and objects. A prototype has been developed to demonstrate the feasibility of integrating several different information sources for this use. The prototype allows users located within the city centre of Trondheim to search for documents and to locate the library departments holding them. The user is shown a map and given information on how to travel to the nearest bus stop, as well as bus schedules for getting to the selected library department. Several information sources for the prototype have been identified and evaluated. The prototype communicates with BIBSYS for document information retrieval, Google Maps for map generation, team-trafikk.no for bus schedule queries, and Amazon.com and LibraryThing.com for book cover image downloading. To ensure data consistency, some local data sources are also maintained, such as a list of all the UBiT (NTNU library) departments in Trondheim. The prototype was implemented to satisfy a set of requirements. These requirements were created by applying the technique of use cases. Each requirement has been discussed and prioritised based on requests from UBiT. The most important requirements have been incorporated into the design of the prototype. The design focuses on modularity, and it has been discussed how the external sources can best be integrated with the prototype. The prototype is implemented using a combination of programming languages. The differences between these languages have posed a challenge, and solutions to how these can be avoided are presented. The prototype has been tested according to an extensive test plan, and the results of these tests have been documented and evaluated. 
Each of the design decisions has been evaluated and discussed, and suggestions on how these could have been improved are given. Finally, suggestions on how the functionality of the prototype can be extended are presented. The prototype created in this thesis allows users, familiar or unfamiliar with the city and its transportation network, to locate a document and travel to the library holding it. It demonstrates how emerging technologies such as location awareness can contribute to increased use of library services.</p>
|