11

The smoothed Dirichlet distribution: Understanding cross-entropy ranking in information retrieval

Nallapati, Ramesh 01 January 2006 (has links)
Unigram language modeling is a successful probabilistic framework for Information Retrieval (IR) that uses the multinomial distribution to model documents and queries. An important feature of this approach is the use of the empirically successful cross-entropy function between the query model and document models as a document ranking function. However, this function does not follow directly from the underlying models, and no justification for its use has been available to date. Another related and interesting observation is that the naïve Bayes model for text classification uses the same multinomial distribution to model documents but, in contrast, employs the document log-likelihood, which follows directly from the model, as a scoring function. Curiously, the document log-likelihood closely corresponds to cross entropy, but to an asymmetric counterpart of the function used in language modeling. It has been empirically demonstrated that the version of cross entropy used in IR outperforms the document log-likelihood, but this interesting phenomenon remains largely unexplained. One of the main objectives of this work is to develop a theoretical understanding of the reasons for the success of the version of the cross-entropy function used for ranking in IR. We also aim to construct a likelihood-based generative model that directly corresponds to this cross-entropy function. Such a model, if successful, would allow us to view IR essentially as a machine learning problem. A secondary objective is to bridge the gap between the generative approaches used in IR and text classification through a unified model. In this work we show that the cross-entropy ranking function corresponds to the log-likelihood of documents w.r.t. the approximate Smoothed-Dirichlet (SD) distribution, a novel variant of the Dirichlet distribution.
We also empirically demonstrate that this new distribution captures term occurrence patterns in documents much better than the multinomial, thus offering a reason for the superior performance of the cross-entropy ranking function compared to the multinomial document-likelihood. Our experiments in text classification show that a classifier based on the Smoothed Dirichlet performs significantly better than the multinomial-based naïve Bayes model and on par with Support Vector Machines (SVMs), confirming our reasoning. In addition, this classifier is as quick to train as naïve Bayes and several times faster than SVMs owing to its closed-form maximum likelihood solution, making it ideal for many practical IR applications. We also construct a well-motivated generative classifier for IR based on the SD distribution that uses the EM algorithm to learn from pseudo-feedback, and show that its performance is equivalent to the Relevance Model (RM), a state-of-the-art model for IR in the language modeling framework that uses the same cross-entropy as its ranking function. In addition, the SD-based classifier provides more flexibility than RM in modeling documents owing to a consistent generative framework. We demonstrate that this flexibility translates into superior performance compared to RM on the task of topic tracking, an online classification task.
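The abstract does not include code, but the ranking function it analyzes is standard in language-modeling IR. The sketch below is an illustrative (not the author's) implementation of cross-entropy ranking with a Dirichlet-smoothed document model; all function names and the `mu` parameter are assumptions for the example.

```python
import math
from collections import Counter

def smoothed_doc_model(doc_tokens, collection_probs, mu=2000):
    """Dirichlet-smoothed unigram document model p(w|d)."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    def p(w):
        # Smoothing mixes document counts with collection statistics.
        return (counts[w] + mu * collection_probs.get(w, 1e-9)) / (n + mu)
    return p

def cross_entropy_score(query_tokens, doc_model):
    """Rank documents by -H(q, d) = sum_w p(w|q) * log p(w|d).

    Higher is better; this is the ranking function the abstract
    identifies with the SD log-likelihood.
    """
    q_counts = Counter(query_tokens)
    q_len = len(query_tokens)
    return sum((c / q_len) * math.log(doc_model(w))
               for w, c in q_counts.items())
```

Note the asymmetry the abstract highlights: here the query distribution weights the log of the document model, whereas the multinomial document log-likelihood would weight the log of the query-side model by document counts.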
12

An execution context optimization framework for disk energy

Hom, Jerry Yin. January 2008 (has links)
Thesis (Ph. D.)--Rutgers University, 2008. / "Graduate Program in Computer Science." Includes bibliographical references (p. 89-95).
13

Answering why-not questions on spatial keyword top-k queries

Chen, Lei 13 December 2016 (has links)
With the continued proliferation of location-based services, a growing number of web-accessible data objects are geo-tagged and have text descriptions. Spatial keyword top-k queries retrieve the k such objects with the best scores according to a ranking function that takes into account a query location and query keywords. However, it is in some cases difficult for users to specify appropriate query parameters. After a user issues an initial query and gets back the result, the user may find that some expected objects are missing and may wonder why. Answering the resulting why-not questions can aid users in retrieving better results and thus improve the overall utility of the query functionality. While spatial keyword querying has been studied intensively, no proposals exist for how to offer users explanations of why such expected objects are missing from results. In this dissertation, we take a first step toward studying why-not questions on spatial keyword top-k queries. We provide techniques that allow different revisions of spatial keyword queries such that their results include one or more desired, but missing, objects. Detailed problem analysis and extensive experimental studies consistently demonstrate the effectiveness and robustness of our proposed techniques in a broad range of settings.
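To make the query class concrete: a common form of spatial keyword top-k ranking linearly combines spatial proximity with keyword relevance. The sketch below is a minimal toy version under assumed conventions (a weight `alpha`, a normalizing `max_dist`, set-overlap text scoring); the dissertation's actual ranking function may differ.

```python
import heapq
import math

def spatial_keyword_topk(objects, q_loc, q_terms, k=1, alpha=0.5, max_dist=10.0):
    """Toy spatial keyword top-k query.

    score = alpha * spatial proximity + (1 - alpha) * keyword overlap.
    `objects` is a list of (id, (x, y), set_of_terms) tuples.
    """
    scored = []
    for obj_id, (x, y), terms in objects:
        dist = math.hypot(x - q_loc[0], y - q_loc[1])
        spatial = max(0.0, 1.0 - dist / max_dist)          # closer is better
        textual = len(terms & q_terms) / len(q_terms) if q_terms else 0.0
        scored.append((alpha * spatial + (1 - alpha) * textual, obj_id))
    return heapq.nlargest(k, scored)                       # k best by score
```

A why-not question arises when an object the user expected (say, a known café) falls outside the top k; answering it means explaining which revision of `q_loc`, `q_terms`, `k`, or the weighting would bring that object into the result.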
14

Retrieval of passages for information reduction

Daniels, Jody J 01 January 1997 (has links)
Information Retrieval (IR) typically retrieves entire documents in response to a user's information need. However, many times a user would prefer to examine smaller portions of a document. One example of this is when building a frame-based representation of a text. The user would like to read all and only those portions of the text that are about predefined important features. This research addresses the problem of automatically locating text about these features, where the important features are those defined for use by a case-based reasoning (CBR) system in the form of features and values or slots and fillers. To locate important text pieces we gathered a small set of "excerpts", textual segments, when creating the original case-base representations. Each segment contains the local context for a particular feature within a document. We used these excerpts to generate queries that retrieve relevant passages. By locating passages for display to the user, we winnow a text down to sets of several sentences, greatly reducing the time and effort expended searching through each text for important features.
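The abstract describes scoring passages against queries generated from feature excerpts. As a hedged illustration only, the sketch below scores fixed-size windows of consecutive sentences by term overlap with an excerpt-derived query; the window size, scoring, and function names are assumptions, not the dissertation's method.

```python
import re

def best_passages(sentences, excerpt_terms, window=2, top_n=1):
    """Score each window of consecutive sentences by overlap with the
    excerpt-derived query terms; return top_n (score, start_index) pairs."""
    scored = []
    for i in range(max(1, len(sentences) - window + 1)):
        terms = {t for s in sentences[i:i + window]
                 for t in re.findall(r"\w+", s.lower())}
        scored.append((len(terms & excerpt_terms), i))
    # Highest overlap first; earlier passages break ties.
    scored.sort(key=lambda p: (-p[0], p[1]))
    return scored[:top_n]
```

Displaying only the winning windows is the "information reduction" of the title: the user reads a few sentences per feature instead of the whole document.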
15

Executing applications stored in the non-addressable memory of a smart card

Cogniaux, Geoffroy 13 December 2012 (has links) (PDF)
The latest generation of smart cards allows applications to be downloaded after the cards are in circulation. Beyond the problems this raises, that extensibility is still constrained today by a limited amount of addressable storage. The thesis defended in this dissertation is that applications stored in the non-addressable memory of smart cards, which is available in larger quantities, can be executed efficiently despite that memory's very long latencies, which are a priori unfavorable to code execution. Our work first studies the strengths and weaknesses of the main answer proposed in the state of the art: a cache. In our context, however, a cache can only be implemented in software, which adds latency of its own; moreover, it must respect the memory constraints of smart cards and therefore have a small memory footprint. We show how and why these two constraints sharply reduce the performance of a cache, which then becomes an insufficient answer to our challenge. We apply this demonstration to caches of native code, and then of Java and JavaCard 2 code and metadata. Building on these observations, we propose and validate a solution based on code pre-interpretation, whose goal is both to detect missing cache entries early so they can be loaded ahead of time and in parallel, and to group cache accesses, thereby reducing the impact of the cache's software latency, shown to be its main cost. Together this yields an efficient solution that scales to smart cards.
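The core idea, detecting upcoming misses early and grouping slow-memory accesses, can be sketched in a few lines. This is a toy model only: the class name, the dict-backed "slow storage", and the batched-fetch counter are all illustrative assumptions, not the dissertation's implementation.

```python
class PrefetchingCache:
    """Software cache over slow, non-addressable storage.

    A pre-scan of the upcoming access trace (standing in for code
    pre-interpretation) detects misses early and loads them all in
    one grouped fetch, paying the long latency once instead of per miss.
    """

    def __init__(self, backing):
        self.backing = backing   # dict-like slow storage
        self.cache = {}
        self.fetches = 0         # round trips to slow memory = latency hits

    def _batch_fetch(self, keys):
        self.fetches += 1        # one grouped access, one latency hit
        for k in keys:
            self.cache[k] = self.backing[k]

    def run(self, trace):
        # Pre-scan the trace: collect the distinct keys not yet cached.
        missing = [k for k in dict.fromkeys(trace) if k not in self.cache]
        if missing:
            self._batch_fetch(missing)
        return [self.cache[k] for k in trace]
```

With a naive demand-fetch cache the trace `[0, 1, 0, 2]` would pay three separate latency hits; the pre-scan collapses them into one.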
16

Robust decentralized authentication for public keys and geographic location

Pathak, Vivek. January 2009 (has links)
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computer Science." Includes bibliographical references (p. 121-128).
17

Safety through security

Simpson, Andrew C. January 1996 (has links)
In this thesis, we investigate the applicability of the process algebraic formal method Communicating Sequential Processes (CSP) [Hoa85] to the development and analysis of safety-critical systems. We also investigate how these tasks might be aided by mechanical verification, which is provided in the form of the proof tool Failures-Divergences Refinement (FDR) [Ros94]. Initially, we build upon the work of [RWW94, Ros95], in which CSP treatments of the security property of non-interference are described. We use one such formulation to define a property called protection, which unifies our views of safety and security. As well as applying protection to the analysis of safety-critical systems, we develop a proof system for this property, which, in conjunction with the opportunity for automated analysis provided by FDR, enables us to apply the approach to problems of sizable complexity. We then describe how FDR can be applied to the analysis of mutual exclusion, which is a specific form of non-interference. We investigate a number of well-known solutions to the problem, and illustrate how such mutual exclusion algorithms can be interpreted as CSP processes and verified with FDR. Furthermore, we develop a means of verifying the fault-tolerance of such algorithms in terms of protection. In turn, mutual exclusion is used to describe safety properties of geographic data associated with Solid State Interlocking (SSI) railway signalling systems. We show how FDR can be used to describe these properties and model interlocking databases. The CSP approach to compositionality allows us to decompose such models, thus reducing the complexity of analysing safety invariants of SSI geographic data. As such, we describe how the mechanical verification of Solid State Interlocking geographic data, which was previously considered to be an intractable problem for the current generation of mechanical verification tools, is computationally feasible using FDR.
Thus, the goals of this thesis are twofold. The first goal is to establish a formal encapsulation of a theory of safety-critical systems based upon the relationship which exists between safety and security. The second goal is to establish that CSP, together with FDR, can be applied to the modelling of Solid State Interlocking geographic databases. Furthermore, we shall attempt to demonstrate that such modelling can scale up to large-scale systems.
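FDR verifies properties such as mutual exclusion by exhaustively exploring the states of a process model. As a loose, hedged analogy only (plain Python, not CSP or FDR), the sketch below explores every interleaving of two lock-guarded processes and checks the mutual-exclusion safety property over all reachable states; the state encoding is invented for the example.

```python
def step(state, pid):
    """One atomic step of process `pid`: pc 0 = trying, 1 = critical, 2 = done."""
    pcs, lock = list(state[0]), state[1]
    if pcs[pid] == 0 and lock is None:   # acquire succeeds
        lock, pcs[pid] = pid, 1
    elif pcs[pid] == 1:                  # release the lock
        lock, pcs[pid] = None, 2
    return (tuple(pcs), lock)            # blocked or done: state unchanged

def reachable_states():
    """Exhaustively explore every interleaving of the two processes."""
    init = ((0, 0), None)
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        for pid in (0, 1):
            t = step(s, pid)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

def mutual_exclusion_holds(states):
    """Safety property: no reachable state has both processes at pc == 1."""
    return all(not (pcs[0] == 1 and pcs[1] == 1) for pcs, _ in states)
```

A tool like FDR does this at scale, and via refinement checking rather than a hand-written property, which is what makes the SSI geographic-data verification described above feasible.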
18

Binary image compression using run length encoding and multiple scanning techniques

Merkl, Frank J. January 1988 (has links)
Thesis (M.S.)--Rochester Institute of Technology, 1988. / Includes bibliographical references.
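No abstract accompanies this entry, but the technique named in the title is standard. As a minimal illustrative sketch (not drawn from the thesis itself), run-length encoding replaces each row of a binary image with (value, run length) pairs, which is compact whenever rows contain long runs of identical bits:

```python
def rle_encode_row(row):
    """Encode one binary row as (value, run_length) pairs."""
    runs = []
    for bit in row:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([bit, 1])   # start a new run
    return [tuple(r) for r in runs]

def rle_decode_row(runs):
    """Invert the encoding: expand each run back into bits."""
    return [v for v, n in runs for _ in range(n)]
```

The "multiple scanning techniques" of the title presumably refer to traversal orders other than row-by-row (e.g. column-wise or zigzag scans), chosen so that the scan produces longer runs for a given image.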
19

Channel management, message representation and event handling of a protocol implementation framework for Linux using generative programming

Zhang, Song, January 1900 (has links)
Thesis (M.Sc.) - Carleton University, 2002. / Includes bibliographical references (p. 92-94). Also available in electronic format on the Internet.
20

Adaptive frame selection for enhanced face recognition in low-resolution videos

Jillela, Raghavender Reddy. January 1900 (has links)
Thesis (M.S.)--West Virginia University, 2008. / Title from document title page. Document formatted into pages; contains viii, 67 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 62-67).
