About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

The Implementation and Applications of Multi-pattern Matching Algorithm over General Purpose GPU

Cheng, Yan-Hui 08 July 2011 (has links)
As technology advances, we increasingly rely on a variety of computer equipment, in both research and daily work, to process the data we use most often. The types and volume of such data keep growing: satellite imaging data, genetic engineering, global climate forecasting data, complex event processing, and more. Certain types of data demand both accuracy and timeliness; that is, we want to find specific data in ever shorter time. According to an August 2010 report in MIT Technology Review, complex event processing has become a new research area, and data search is part of it. Data search usually means data comparison: given the keywords or key information we are looking for, we design a pattern matching algorithm to find the results within a short time, or even in real time. The purpose of our research is to use a general-purpose GPU, the NVIDIA Tesla C2050, with its parallel computing architecture to parallelize pattern matching. Finally, we construct a service to handle a large amount of real-time data. We also run performance tests and compare the results with the well-known software "Apache Solr" to identify the differences and possible future applications.
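The decomposition that makes pattern matching amenable to a GPU can be illustrated with a small sketch. The following is a minimal CPU reference of the data-parallel formulation commonly used in GPU multi-pattern matching, where each (text position, pattern) pair is an independent unit of work; it illustrates the general technique, not the thesis's actual CUDA kernel.

```python
# A minimal CPU reference for the data-parallel decomposition commonly used in
# GPU multi-pattern matching: every (text position, pattern) pair is an
# independent unit of work, so on a GPU each pair can map to one thread.
# Illustrative sketch only, not the thesis's actual CUDA kernel.

def match_all(text: str, patterns: list[str]) -> list[tuple[int, str]]:
    """Return (position, pattern) for every occurrence of every pattern."""
    hits = []
    # On a GPU this double loop would be flattened into a thread grid:
    # thread k handles position k // len(patterns), pattern k % len(patterns).
    for pos in range(len(text)):
        for pat in patterns:
            if text.startswith(pat, pos):
                hits.append((pos, pat))
    return hits

if __name__ == "__main__":
    print(match_all("abracadabra", ["abra", "cad", "ra"]))
    # [(0, 'abra'), (2, 'ra'), (4, 'cad'), (7, 'abra'), (9, 'ra')]
```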
2

A Visualization Dashboard for Muslim Social Movements

January 2012 (has links)
abstract: Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is ineffective in containing radicalism as a whole. There is a need to understand the origin, ideologies, and behavior of radical and counter-radical organizations and how they develop over time. Recognizing and supporting counter-radical organizations is one of the most important steps toward impeding radical organizations. Much research has already been done to categorize and recognize organizations, to understand their behavior, their interactions with other organizations, their target demographics, and their areas of influence, producing a large body of information. This thesis provides a powerful and interactive way to navigate through all this information using a visualization dashboard. The dashboard makes it easier for social scientists, policy analysts, military personnel, and others to visualize an organization's propensity toward violence and radicalism. It also tracks peaking religious, political, and socio-economic markers, along with target demographics and locations. A powerful search interface with parametric search helps in narrowing down to specific scenarios and viewing the corresponding information about the organizations. This tool helps identify moderate counter-radical organizations and also has the potential to predict the orientation of various organizations based on current information. / Dissertation/Thesis / M.S. Computer Science 2012
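As a rough illustration of what such a parametric search over organization data might look like, the sketch below filters organization records by a few attributes. The field names (violence_score, region, markers) are invented for illustration and are not the dashboard's actual schema.

```python
# A hedged sketch of parametric filtering over organization records.
# All field names here are hypothetical, not the thesis's actual schema.

def parametric_search(orgs, min_violence=0.0, region=None, marker=None):
    """Narrow a list of organization records down to a specific scenario."""
    results = []
    for org in orgs:
        if org["violence_score"] < min_violence:
            continue
        if region is not None and org["region"] != region:
            continue
        if marker is not None and marker not in org["markers"]:
            continue
        results.append(org)
    return results

orgs = [
    {"name": "Org A", "violence_score": 0.8, "region": "MENA", "markers": ["political"]},
    {"name": "Org B", "violence_score": 0.1, "region": "MENA", "markers": ["socio-economic"]},
]
print(parametric_search(orgs, min_violence=0.5, region="MENA"))  # -> [Org A]
```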
3

Improve and optimize search engine : To provide better and relevant content for the customer

Ramsell, Daniel January 2019 (has links)
This report compares a number of open source search engines. The study contains two evaluation processes. In the first, each open source search engine found on today's market is evaluated and given between one and five points depending on how well it meets the requirements; the engine with the highest score is then chosen for implementation. The first evaluation resulted in Elasticsearch being selected and carried forward to the implementation phase. The second evaluation measures system performance and the relevance of the SERP (Search Engine Results Pages). System performance is evaluated by timing how long the search engines take to deliver the SERP. The relevance of the search results is judged by a group of CSN employees, who give between one and five points depending on the relevance of the SERP. Elasticsearch is compared against the search engine CSN currently uses on its website (www.csn.se). This phase showed Elasticsearch performing better in the timing measurements but not in the relevance of the SERP. On closer analysis, most points were lost because of the first search result Elasticsearch delivered; with that result removed, Elasticsearch could deliver results as good as the old search engine's. The study concludes that Elasticsearch is recommended for CSN, provided certain problem areas are corrected before implementation into their systems.
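A minimal sketch of the timing measurement described, wall-clock time for an engine to deliver its SERP, might look as follows against Elasticsearch's standard HTTP search endpoint. The host, index name, and field are hypothetical; the thesis's exact measurement setup is not shown.

```python
# Sketch of a SERP latency measurement against Elasticsearch's _search
# endpoint. Host, index, and field names are hypothetical.
import time
import requests

def time_search(query: str, runs: int = 10) -> float:
    """Mean wall-clock seconds for Elasticsearch to deliver a results page."""
    url = "http://localhost:9200/documents/_search"  # hypothetical host/index
    body = {"query": {"match": {"content": query}}, "size": 10}
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        resp = requests.post(url, json=body)
        resp.raise_for_status()
        elapsed.append(time.perf_counter() - start)
    # Elasticsearch also reports server-side time in the response's "took"
    # field (milliseconds), which excludes network overhead.
    return sum(elapsed) / len(elapsed)

print(f"mean SERP latency: {time_search('student loan'):.4f} s")
```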
4

Development of enhanced multiport network analyzer calibrations using non-ideal standards

Daniel, John Edward 01 June 2005 (has links)
An Improved Short-Open-Load-Reciprocal (SOLR) Vector Network Analyzer (VNA) calibration is developed and validated. Through the use of a more complex load model, the usable frequency range of the SOLR calibration algorithm is expanded. Comparisons are made between this new calibration and existing calibration techniques that are known to be accurate at high frequencies. The Anritsu 37xxx Lightning series 65 GHz VNA is used as the principal measurement tool for calibration comparison and verification. This work builds on previous work done at USF showing that the accuracy of Short-Open-Load-Thru (SOLT) calibrations improves with the implementation of more complex load and thru models. One of the most significant advantages of the SOLR calibration algorithm is that it does not require an ideal, well-behaved thru standard. This is extremely useful in multiport probing environments, where speed and space constraints often make it necessary to use loopback thrus or other non-ideal transmission structures during calibration. Multiport test equipment and measurement techniques are highlighted and discussed. A general n-port expansion of a two-port calibration algorithm is presented and used to adapt the improved two-port SOLR algorithm to a four-port calibration. In doing so, a theoretical development is presented that addresses error model treatment and switch term corrections, and that includes an improved set of the redundancy equations enabling the multiport SOLR algorithm. The algorithm uses a four-port SOL calibration at each port and then determines the remaining error terms by measuring a minimal set of reciprocal passive standards. The four-port SOLR algorithm developed is illustrated through the use of a four-port test set consisting of a two-port VNA input multiplexed to four ports through an RF switch array. The four-port SOLR calibration is verified by comparing it to available four-port calibration techniques using available on-wafer test structures. As another promising advance of this work, the possibility of using a multiport reciprocal standard is shown to have potential for reducing the number of standard connections needed to accomplish multiport SOLR calibration. Differential measurements are facilitated through mixed-mode calculations of single-ended S-parameter measurements made with the four-port SOLR calibrations improved in this work.
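For context, SOL-type calibrations rest on the standard three-term (one-port) error model; the sketch below gives its textbook form and the reciprocity condition that the "R" in SOLR exploits. This is the conventional formulation, not necessarily the thesis's exact notation.

```latex
% Textbook three-term (one-port) error model solved by SOL-type calibrations.
% \Gamma_m is the raw measured reflection, \Gamma_a the actual reflection of
% the standard; e_{00}, e_{11}, and the product e_{10}e_{01} are directivity,
% source match, and reflection tracking.
\Gamma_m = e_{00} + \frac{e_{10}\,e_{01}\,\Gamma_a}{1 - e_{11}\,\Gamma_a}
% Measuring three known standards (short, open, load) at each port determines
% these terms. SOLR then replaces the ideal thru: any reciprocal two-port with
% S_{21} = S_{12} suffices to determine the remaining transmission term up to
% a sign ambiguity, resolved from an approximate estimate of the standard's
% electrical delay.
```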
5

BioEve: User Interface Framework Bridging IE and IR

January 2010 (has links)
abstract: Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex biological processes and accurate life science models. The ability to rapidly and effectively survey the literature is necessary for the creation of large scale models of the relationships among biomedical entities as well as hypothesis generation to guide biomedical research. To reduce the effort and time spent in performing these activities, an intelligent search system is required. Even though many systems aid in navigating through this wide collection of documents, the vastness and depth of this information overload can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also facilitate discovery of the unknown information implicitly conveyed in the texts. This thesis presents the different approaches used for large scale biomedical named entity recognition, and the challenges faced in each. It also proposes BioEve: an integrative framework to fuse a faceted search with information extraction to provide a search service that addresses the user's desire for "completeness" of the query results, not just the top-ranked ones. This information extraction system enables discovery of important semantic relationships between entities such as genes, diseases, drugs, and cell lines and events from biomedical text on MEDLINE, which is the largest publicly available database of the world's biomedical journal literature. It is an innovative search and discovery service that makes it easier to search/navigate and discover knowledge hidden in life sciences literature. To demonstrate the utility of this system, this thesis also details a prototype enterprise quality search and discovery service that helps researchers with a guided step-by-step query refinement, by suggesting concepts enriched in intermediate results, and thereby facilitating the "discover more as you search" paradigm. / Dissertation/Thesis / M.S. Computer Science 2010
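As a hedged illustration of the kind of faceted query such a search service might issue, the sketch below uses Solr's standard HTTP select handler, since the abstract describes fusing faceted search with extracted entities; the core name and facet fields are hypothetical, not BioEve's actual schema.

```python
# Sketch of a faceted query against Solr's standard HTTP API. The core name
# and field names ("gene", "disease") are hypothetical, not BioEve's schema.
import requests

params = {
    "q": "apoptosis",
    "facet": "true",
    "facet.field": ["gene", "disease"],  # entity facets suggest refinements
    "rows": 10,
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/medline/select", params=params)
data = resp.json()
print(data["response"]["numFound"], "documents")
# facet_counts -> facet_fields holds entity/count pairs that could drive the
# "discover more as you search" refinement suggestions.
print(data["facet_counts"]["facet_fields"]["gene"][:10])
```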
6

Using clickthrough data to optimize search result ranking : An evaluation of clickthrough data in terms of relevancy and efficiency / Användning av clickthrough data för att optimera rankning av sökresultat : En utvärdering av clickthrough data gällande relevans och effektivitet

Paulsson, Anton January 2017 (has links)
Search engines are in constant need of improvement, as the rapid growth of information affects their ability to return documents with high relevance. Search results get lost between pages, and search algorithms are exploited to gain higher rankings for documents. This study attempts to mitigate those two issues, and to increase the relevance of search results, by using clickthrough data as an additional layer of weighting for the search results. Results from the evaluation indicate that clickthrough data can indeed be used to obtain more relevant search results.
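A minimal sketch of the idea, clickthrough data as an additional weighting layer over a base relevance score, might look like this; the smoothing constants and blend factor are illustrative assumptions, not values from the study.

```python
# Clickthrough data as an extra weighting layer over a base relevance score.
# Smoothing constants and blend factor are illustrative assumptions.

def click_score(clicks: int, impressions: int,
                alpha: float = 1.0, beta: float = 10.0) -> float:
    """Smoothed click-through rate, so rarely shown docs aren't over-rewarded."""
    return (clicks + alpha) / (impressions + beta)

def rerank(results, click_log, blend: float = 0.3):
    """results: [(doc_id, base_score)]; click_log: doc_id -> (clicks, impressions)."""
    rescored = []
    for doc_id, base in results:
        clicks, impressions = click_log.get(doc_id, (0, 0))
        score = (1 - blend) * base + blend * click_score(clicks, impressions)
        rescored.append((doc_id, score))
    return sorted(rescored, key=lambda x: x[1], reverse=True)

results = [("d1", 0.82), ("d2", 0.80), ("d3", 0.55)]
click_log = {"d2": (120, 300), "d3": (5, 400)}
print(rerank(results, click_log))  # d2 overtakes d1 thanks to its clicks
```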
7

Intelligent Retrieval and Clustering of Inventions

Andrabi, Liaqat Hussain January 2015 (has links)
Ericsson’s Region IPR & Licensing (RIPL) receives about 3,000 Invention Disclosures (IvDs) every year, submitted by researchers as a result of their R&D activities. To decide whether an IvD has good business value and a patent application should be filed, a rigorous evaluation process is carried out by a selected Patent Attorney (PA). One of the most important elements of the evaluation process is to find similar prior art, including similar IvDs that have been evaluated before. These documents are not public and therefore cannot be searched using available search tools. At present the process of finding prior art is done manually, without the help of any search tools, and takes up a significant amount of time. The aim of this Master’s thesis is to develop and test an information retrieval search engine, as a proof of concept, to find similar Invention Disclosure documents and related patent applications. For this purpose, a Solr database server is set up with up to seven thousand five hundred (7,500) IvDs indexed. A similarity algorithm is implemented that is customized to weight different fields. Lucene is then used to query the server, and the relevant documents are displayed in a web application.
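Per-field weighting of the kind described is commonly expressed in Solr through the eDisMax query parser's qf boosts; the sketch below shows that standard mechanism. The field names, boosts, and core name are hypothetical, not Ericsson's actual configuration.

```python
# Field-weighted querying via Solr's eDisMax "qf" boosts. Field names,
# boost values, and core name are hypothetical.
import requests

params = {
    "q": "adaptive beamforming antenna array",
    "defType": "edismax",
    # title matches count three times as much as abstract matches,
    # abstract twice as much as full description text:
    "qf": "title^3 abstract^2 description^1",
    "rows": 5,
    "fl": "id,title,score",
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/ivd/select", params=params)
for doc in resp.json()["response"]["docs"]:
    print(doc["score"], doc.get("title"))
```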
8

Changing a user’s search experience by incorporating preferences of metadata / Ändra en användares sökupplevelse genom att inkorporera metadatapreferenser

Ali, Miran January 2014 (has links)
Implicit feedback is usually data that comes from users’ clicks, search queries, and text highlights. It exists in abundance, but it is riddled with noise and requires advanced algorithms to be put to good use. Several findings suggest that factors such as click-through data and reading time could be used to create user behaviour models in order to predict users’ information needs. This Master’s thesis aims to use click-through data and search queries together with heuristics to create a model that prioritises metadata fields of documents in order to predict the information need of a user. Simply put, implicit feedback is used to improve the precision of a search engine. The Master’s thesis was carried out at Findwise AB, a search engine consultancy firm. Documents from the benchmark dataset INEX were indexed into a search engine. Two different heuristics were proposed that increment the priority of different metadata fields based on the users’ search queries and clicks. It was assumed that the heuristics would be able to change the listing order of the search results. Evaluations were carried out for the two heuristics, with the unmodified search engine as the baseline for the experiment. The evaluations were based on simulating a user who searches queries and clicks on documents. The queries and documents used in the evaluation, with manually tagged relevance, came from a dataset provided by INEX. It was expected that the listing order would change in a way favourable for the user: the top-ranking results would be documents that truly were in the interest of the user. The evaluations revealed that the heuristics and the baseline behave erratically, and the metrics never converged to any specific mean relevance. A statistical test revealed that there is no difference in accuracy between the heuristics and the baseline. These results mean that the proposed heuristics do not improve the precision of the search engine; several factors, such as the indexing of overly redundant metadata, could have been responsible for this outcome.
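One plausible shape for a heuristic of the type evaluated here: when a user clicks a result, increase the priority of every metadata field in which the query terms actually occurred. The sketch below is an illustration of that general idea, not the thesis's exact heuristics.

```python
# Illustrative click heuristic: increment the priority of metadata fields in
# which the query terms occurred in a clicked document. Not the thesis's
# exact heuristics; field names and increment are hypothetical.
from collections import defaultdict

field_priority = defaultdict(lambda: 1.0)  # starting weight per metadata field

def record_click(query: str, clicked_doc: dict, increment: float = 0.1):
    """clicked_doc maps metadata-field name -> field text."""
    terms = query.lower().split()
    for field, text in clicked_doc.items():
        if any(t in text.lower() for t in terms):
            field_priority[field] += increment

record_click("information retrieval",
             {"title": "Intelligent Information Retrieval",
              "keywords": "search, ranking",
              "author": "Doe, J."})
print(dict(field_priority))  # {'title': 1.1} -- the title field gains priority
```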
9

Improving Solr search with Natural Language Processing : An NLP implementation for information retrieval in Solr / Att förbättra Solr med Natural Language Processing

Lager, Adam January 2021 (has links)
The field of AI is advancing fast, with institutions and companies pushing the limits of what is possible. Natural Language Processing is a branch of AI whose goal is to understand human speech and/or text. Here this technology is used to improve an inverted index, the full-text search engine Solr. Solr is open source and has integrated OpenNLP, making it a suitable choice for these kinds of operations. NLP-enabled Solr showed strong results compared to the Solr currently running on the systems: while NLP-Solr was slightly worse in terms of precision, it excelled at recall and at returning the correct documents.
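Precision and recall, the two metrics behind this comparison, can be computed per query from the returned results and a set of judged-relevant documents, as in the short sketch below; the document IDs are illustrative.

```python
# Precision and recall for a single query, computed from a returned result
# list and a set of judged-relevant documents. Document IDs are illustrative.

def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["d1", "d4", "d5", "d7"]   # what the engine returned
relevant = {"d1", "d2", "d5"}          # what the judges marked relevant
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```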
10

Investigations of Free Text Indexing Using NLP : Comparisons of Search Algorithms and Models in Apache Solr / Undersöka hur fritextindexering kan förbättras genom NLP

Sundstedt, Alfred January 2023 (has links)
As Natural Language Processing progresses and applications like those from OpenAI gain considerable popularity in society, businesses are encouraged to integrate NLP into their systems, both to improve the user experience and to provide users with the information they request. For case management systems, a complicated task is to provide the user with relevant documents, since customers often have large databases containing much similar information, which presumes that the user formulates a query matching the requested topic perfectly. Imagine if there were a way to search by context via established NLP models like BERT, instead of formulating the perfect prompt. Imagine if the system understood its content. This thesis aims to investigate, from a user perspective, how a free text index can be improved using NLP, and to implement such an improvement. Using AI to assist a free text index, in this case Apache Solr, can make it easier for users to find the specific content they are looking for. It is interesting to see how search can be improved with the help of NLP models to present more relevant results for the user. NLP can improve user prompts, known as queries, and assist in indexing the information. The task is to conduct a practical investigation by configuring the free text database Apache Solr with and without NLP support. This is investigated by having the search models learn the content, letting them provide their relevant search results for a set of user queries, and evaluating the results. The investigated search models were a string-based model, an OpenNLP model, and BERT models segmented at paragraph level and sentence level. A hybrid search model combining OpenNLP and BERT at paragraph level was the best solution overall.
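A hedged sketch of the paragraph-level BERT retrieval the abstract describes: embed paragraphs with a sentence-embedding model and rank by cosine similarity. The sentence-transformers library and model name are assumptions, and the thesis's actual Solr integration is not shown; a hybrid model would blend this semantic score with a keyword-based one.

```python
# Paragraph-level semantic search with a BERT-style embedding model.
# The sentence-transformers library and model name are assumptions; the
# thesis's actual Solr integration is not shown.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

paragraphs = [
    "The case is closed after the customer confirmed the refund.",
    "Attach the signed contract before submitting the application.",
    "Network outages are escalated to the on-call engineer.",
]
corpus_emb = model.encode(paragraphs, convert_to_tensor=True)

query = "who handles internet downtime?"
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity lets the query match by meaning rather than exact terms;
# a hybrid model would blend this with a keyword (string/OpenNLP) score.
scores = util.cos_sim(query_emb, corpus_emb)[0]
best = scores.argmax().item()
print(paragraphs[best], float(scores[best]))
```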
