Global ETD Search

61	Learning deep embeddings by learning to rank He, Kun 05 February 2019 (has links) We study the problem of embedding high-dimensional visual data into low-dimensional vector representations. This is an important component in many computer vision applications involving nearest neighbor retrieval, as embedding techniques not only perform dimensionality reduction, but can also capture task-specific semantic similarities. In this thesis, we use deep neural networks to learn vector embeddings, and develop a gradient-based optimization framework that is capable of optimizing ranking-based retrieval performance metrics, such as the widely used Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). Our framework is applied in three applications. First, we study Supervised Hashing, which is concerned with learning compact binary vector embeddings for fast retrieval, and propose two novel solutions. The first solution optimizes Mutual Information as a surrogate ranking objective, while the other directly optimizes AP and NDCG, based on the discovery of their closed-form expressions for discrete Hamming distances. These optimization problems are NP-hard, therefore we derive their continuous relaxations to enable gradient-based optimization with neural networks. Our solutions establish the state-of-the-art on several image retrieval benchmarks. Next, we learn deep neural networks to extract Local Feature Descriptors from image patches. Local features are used universally in low-level computer vision tasks that involve sparse feature matching, such as image registration and 3D reconstruction, and their matching is a nearest neighbor retrieval problem. We leverage our AP optimization technique to learn both binary and real-valued descriptors for local image patches. Compared to competing approaches, our solution eliminates complex heuristics, and performs more accurately in the tasks of patch verification, patch retrieval, and image matching. Lastly, we tackle Deep Metric Learning, the general problem of learning real-valued vector embeddings using deep neural networks. We propose a learning to rank solution through optimizing a novel quantization-based approximation of AP. For downstream tasks such as retrieval and clustering, we demonstrate promising results on standard benchmarks, especially in the few-shot learning scenario, where the number of labeled examples per class is limited. Computer science Average precision Computer vision Deep learning Learning to rank Nearest neighbor retrieval Vector embedding
62	Automatic text categorization for information filtering. January 1998 (has links) Ho Chao Yang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 157-163). / Abstract also in Chinese. / Abstract --- p.i / Acknowledgment --- p.iii / List of Figures --- p.viii / List of Tables --- p.xiv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic Document Categorization --- p.1 / Chapter 1.2 --- Information Filtering --- p.3 / Chapter 1.3 --- Contributions --- p.6 / Chapter 1.4 --- Organization of the Thesis --- p.7 / Chapter 2 --- Related Work --- p.9 / Chapter 2.1 --- Existing Automatic Document Categorization Approaches --- p.9 / Chapter 2.1.1 --- Rule-Based Approach --- p.10 / Chapter 2.1.2 --- Similarity-Based Approach --- p.13 / Chapter 2.2 --- Existing Information Filtering Approaches --- p.19 / Chapter 2.2.1 --- Information Filtering Systems --- p.19 / Chapter 2.2.2 --- Filtering in TREC --- p.21 / Chapter 3 --- Document Pre-Processing --- p.23 / Chapter 3.1 --- Document Representation --- p.23 / Chapter 3.2 --- Classification Scheme Learning Strategy --- p.26 / Chapter 4 --- A New Approach - IBRI --- p.31 / Chapter 4.1 --- Overview of Our New IBRI Approach --- p.31 / Chapter 4.2 --- The IBRI Representation and Definitions --- p.34 / Chapter 4.3 --- The IBRI Learning Algorithm --- p.37 / Chapter 5 --- IBRI Experiments --- p.43 / Chapter 5.1 --- Experimental Setup --- p.43 / Chapter 5.2 --- Evaluation Metric --- p.45 / Chapter 5.3 --- Results --- p.46 / Chapter 6 --- A New Approach - GIS --- p.50 / Chapter 6.1 --- Motivation of GIS --- p.50 / Chapter 6.2 --- Similarity-Based Learning --- p.51 / Chapter 6.3 --- The Generalized Instance Set Algorithm (GIS) --- p.58 / Chapter 6.4 --- Using GIS Classifiers for Classification --- p.63 / Chapter 6.5 --- Time Complexity --- p.64 / Chapter 7 --- GIS Experiments --- p.68 / Chapter 7.1 --- Experimental Setup --- p.68 / Chapter 7.2 --- Results --- p.73 / Chapter 8 --- A New Information Filtering Approach Based on GIS --- p.87 / Chapter 8.1 --- Information Filtering Systems --- p.87 / Chapter 8.2 --- GIS-Based Information Filtering --- p.90 / Chapter 9 --- Experiments on GIS-based Information Filtering --- p.95 / Chapter 9.1 --- Experimental Setup --- p.95 / Chapter 9.2 --- Results --- p.100 / Chapter 10 --- Conclusions and Future Work --- p.108 / Chapter 10.1 --- Conclusions --- p.108 / Chapter 10.2 --- Future Work --- p.110 / Chapter A --- Sample Documents in the corpora --- p.111 / Chapter B --- Details of Experimental Results of GIS --- p.120 / Chapter C --- Computational Time of Reuters-21578 Experiments --- p.141 Text processing (Computer science) Nearest neighbor analysis (Statistics) Information retrieval
63	Superseding neighbor search on uncertain data. / 在不確定的空間數據庫中尋找最高取代性的最近鄰 / Zai bu que ding de kong jian shu ju ku zhong xun zhao zui gao qu dai xing de zui jin lin January 2009 (has links) Yuen, Sze Man. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves [44]-46). / Abstract also in Chinese. / Thesis Committee --- p.i / Abstract --- p.ii / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 2 --- Related Work --- p.6 / Chapter 2.1 --- Nearest Neighbor Search on Precise Data --- p.6 / Chapter 2.2 --- NN Search on Uncertain Data --- p.8 / Chapter 3 --- Problem Definitions and Basic Characteristics --- p.11 / Chapter 4 --- The Full-Graph Approach --- p.16 / Chapter 5 --- The Pipeline Approach --- p.19 / Chapter 5.1 --- The Algorithm --- p.20 / Chapter 5.2 --- Edge Phase --- p.24 / Chapter 5.3 --- Pruning Phase --- p.27 / Chapter 5.4 --- Validating Phase --- p.28 / Chapter 5.5 --- Discussion --- p.29 / Chapter 6 --- Extension --- p.31 / Chapter 7 --- Experiment --- p.34 / Chapter 7.1 --- Properties of the SNN-core --- p.34 / Chapter 7.2 --- Efficiency of Our Algorithms --- p.38 / Chapter 8 --- Conclusions and Future Work --- p.42 / Chapter A --- List of Publications --- p.43 / Bibliography --- p.44 Database management Nearest neighbor analysis (Statistics) Data structures (Computer science) Uncertainty (Information theory)
64	Evaluation of decentralized email architecture and social network analysis based on email attachment sharing Tsipenyuk, Gregory January 2018 (has links) Present day email is provided by centralized services running in the cloud. The services transparently connect users behind middleboxes and provide backup, redundancy, and high availability at the expense of user privacy. In present day mobile environments, users can access and modify email from multiple devices with updates reconciled on the central server. Prioritizing updates is difficult and may be undesirable. Moreover, legacy email protocols do not provide optimal email synchronization and access. Recent phenomena of the Internet of Things (IoT) will see the number of interconnected devices grow to 27 billion by 2021. In the first part of my dissertation I am proposing a decentralized email architecture which takes advantage of user's a IoT devices to maintain a complete email history. This addresses the email reconciliation issue and places data under user control. I replace legacy email protocols with a synchronization protocol to achieve eventual consistency of email and optimize bandwidth and energy usage. The architecture is evaluated on a Raspberry Pi computer. There is an extensive body of research on Social Network Analysis (SNA) based on email archives. Typically, the analyzed network reflects either communication between users or a relationship between the email and the information found in the email's header and the body. This approach discards either all or some email attachments that cannot be converted to text; for instance, images. Yet attachments may use up to 90% of an email archive size. In the second part of my dissertation I suggest extracting the network from email attachments shared between users. I hypothesize that the network extracted from shared email attachments might provide more insight into the social structure of the email archive. I evaluate communication and shared email attachments networks by analyzing common centrality measures and classication and clustering algorithms. I further demonstrate how the analysis of the shared attachments network can be used to optimize the proposed decentralized email architecture.
65	Modern k-nearest neighbour methods in entropy estimation, independence testing and classification Berrett, Thomas Benjamin January 2017 (has links) Nearest neighbour methods are a classical approach in nonparametric statistics. The k-nearest neighbour classifier can be traced back to the seminal work of Fix and Hodges (1951) and they also enjoy popularity in many other problems including density estimation and regression. In this thesis we study their use in three different situations, providing new theoretical results on the performance of commonly-used nearest neighbour methods and proposing new procedures that are shown to outperform these existing methods in certain settings. The first problem we discuss is that of entropy estimation. Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this chapter, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko (1987), based on the k-nearest neighbour distances of a sample. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness, while the original unweighted estimator is typically only efficient in up to three dimensions. A related topic of study is the estimation of the mutual information between two random vectors, and its application to testing for independence. We propose tests for the two different situations of the marginal distributions being known or unknown and analyse their performance. Finally, we study the classical k-nearest neighbour classifier of Fix and Hodges (1951) and provide a new asymptotic expansion for its excess risk. We also show that, in certain situations, a new modification of the classifier that allows k to vary with the location of the test point can provide improvements. This has applications to the field of semi-supervised learning, where, in addition to labelled training data, we also have access to a large sample of unlabelled data. 519.5
66	Barn som närstående : Betydelsen av att vara delaktig och besöka närstående som vårdas på sjukhus Evaldsson, Caroline, Sörensen, Mia January 2010 (has links) Inom sjukvården brister det i bemötandet av barn som närstående där de sällan synliggörs, blir delaktiga eller informerade om det som berör den sjukes/skadades hälsotillstånd. Oberoende av sin familj har barn och unga egna rättigheter i samhället vilket barnkonventionen stödjer och arbetar för. 1 januari 2010 omarbetades innehållet i Hälso- och sjukvårdslagen (HSL) och Lagen om yrkesverksamhet på hälso- och sjukvårdens område (LYHS). Där klargörs barns rättigheter till information, råd och stöd samt hälso- och sjukvårdens ansvar att uppmärksamma detta, om barnets förälder lider av sjukdom/skada. Syftet är att beskriva barns upplevelser av att ha en närstående som vårdas på sjukhus samt ge en beskrivning av faktorer som kan påverka barnens upplevelser. Den här uppsatsen baseras på en litteraturöversikt beskriven av Friberg (2006) där granskning och analys av vårdvetenskapliga artiklars innehåll gjorts. Litteraturstudien bygger på kvantitativa och kvalitativa artiklar samt ”literature review”. Resultatet av studien är strukturerat i faktorer och upplevelser som redovisas i fyra huvudteman och tio subteman. Subtemana beskriver orsaker till restriktioner kring barns besök som närstående på sjukhus och betydelsen av att tillåta barn besöka. Barns behov av delaktighet och information skildras samt deras oro och bemästring av krissituationen. Studien visar på vikten av att ett handlingsprogram för barn som närstående utformas och att detta tillvaratar barns rättigheter och konstrueras på ett pedagogiskt sätt där ställning tas till barnets ålder och erfarenheter. Det kan vara fördelaktigt för barnet, familjen och den sjuke/skadade men också för barns framtida välbefinnande om sjukvårdspersonalen bemöter och tar hand om barn som närstående i den rådande stunden. Om barn som närstående till en sjuk/skadad patient uppmärksammas, blir sedda och bekräftade genom besök, tilldelas information, råd eller stöd, skonas de från onödigt lidande samt att de känner sig delaktiga i situationen tillsammans med de övriga i familjen. / Program: Sjuksköterskeutbildning child experience nearest visiting involvement information family Medical and Health Sciences Medicin och hälsovetenskap
67	Evaluating the use of neighborhoods for query dependent estimation of survival prognosis for oropharyngeal cancer patients Shay, Keegan P. 01 May 2019 (has links) Oropharyngeal Cancer diagnoses make up three percent of all cancer diagnoses in the United States per year. Recently, there has been an increase in the incidence of HPV-associated oropharyngeal cancer, necessitating updates to prior survival estimation techniques, in order to properly account for this shift in demographic. Clinicians depend on accurate survival prognosis estimates in order to create successful treatment plans that aim to maximize patient life while minimizing adverse treatment side effects. Additionally, recent advances in data analysis have resulted in richer and more complex data, motivating the use of more advanced data analysis techniques. Incorporation of sophisticated survival analysis techniques can leverage complex data, from a variety of sources, resulting in improved personalized prediction. Current survival prognosis prediction methods often rely on summary statistics and underlying assumptions regarding distribution or overall risk. We propose a k-nearest neighbor influenced approach for predicting oropharyngeal survival outcomes. We evaluate our approach for overall survival (OS), recurrence-free survival (RFS), and recurrence-free overall survival (RF+OS). We define two distance functions, not subject to the curse of dimensionality, in order to reconcile heterogeneous features with patient-to-patient similarity scores to produce a meaningful overall measure of distance. Using these distance functions, we obtain the k-nearest neighbors for each patient, forming neighborhoods of similar patients. We leverage these neighborhoods for prediction in two novel ensemble methods. The first ensemble method uses the nearest neighbors for each patient to combine globally trained predictions, weighted by their accuracies within a selected neighborhood. The second ensemble method combines Kaplan-Meier predictions from a variety of neighborhoods. Both proposed methods outperform an ensemble of standard global survival predictive models, with statistically significant calibration. k nearest neighbors local prediction Machine learning oropharyngeal QED Electrical and Computer Engineering
68	Ensembles for Distributed Data Shoemaker, Larry 21 October 2005 (has links) Many simulation data sets are so massive that they must be distributed among disk farms attached to different computing nodes. The data is partitioned into spatially disjoint sets that are not easily transferable among nodes due to bandwidth limitations. Conventional machine learning methods are not designed for this type of data distribution. Experts mark a training data set with different levels of saliency emphasizing speed rather than accuracy due to the size of the task. The challenge is to develop machine learning methods that learn how the expert has marked the training data so that similar test data sets can be marked more efficiently. Ensembles of machine learning classifiers are typically more accurate than individual classifiers. An ensemble of machine learning classifiers requires substantially less memory than the corresponding partition of the data set. This allows the transfer of ensembles among partitions. If all the ensembles are sent to each partition, they can vote for a level of saliency for each example in the partition. Different partitions of the data set may not have any salient points, especially if the data set has a time step dimension. This means the learned classifier for such partitions can not vote for saliency since they have not been trained to recognize it. In this work, we investigate the performance of different ensembles of classifiers on spatially partitioned data sets. Success is measured by the correct recognition of unknown and salient regions of data points. Random forests Nearest centroid Exodus ParaView Region labeling American Studies Arts and Humanities
69	Learning From Spatially Disjoint Data Bhadoria, Divya 02 April 2004 (has links) Committees of classifiers, also called mixtures or ensembles of classifiers, have become popular because they have the potential to improve on the performance of a single classifier constructed from the same set of training data. Bagging and boosting are some of the better known methods of constructing a committee of classifiers. Committees of classifiers are also important because they have the potential to provide a computationally scalable approach to handling massive datasets. When the emphasis is on computationally scalable approaches to handling massive datasets, the individual classifiers are often constructed from a small faction of the total data. In this context, the ability to improve on the accuracy of a hypothetical single classifier created from all of the training data may be sacrificed. The design of a committee of classifiers typically assumes that all of the training data is equally available to be assigned to subsets as desired, and that each subset is used to train a classifier in the committee. However, there are some important application contexts in which this assumption is not valid. In many real life situations, massive data sets are created on a distributed computer, recording the simulation of important physical processes. Currently, experts visually browse such datasets to search for interesting events in the simulation. This sort of manual search for interesting events in massive datasets is time consuming. Therefore, one would like to construct a classifier that could automatically label the "interesting" events. The problem is that the dataset is distributed across a large number of processors in chunks that are spatially homogenous with respect to the underlying physical context in the simulation. Here, a potential solution to this problem using ensembles is explored. data mining decision tree nearest neighbor distributed learning classification American Studies Arts and Humanities
70	Brain Tumor Target Volume Determination for Radiation Therapy Treatment Planning Through the Use of Automated MRI Segmentation Mazzara, Gloria Patrika 27 February 2004 (has links) Radiation therapy seeks to effectively irradiate the tumor cells while minimizing the dose to adjacent normal cells. Prior research found that the low success rates for treating brain tumors would be improved with higher radiation doses to the tumor area. This is feasible only if the target volume can be precisely identified. However, the definition of tumor volume is still based on time-intensive, highly subjective manual outlining by radiation oncologists. In this study the effectiveness of two automated Magnetic Resonance Imaging (MRI) segmentation methods, k-Nearest Neighbors (kNN) and Knowledge-Guided (KG), in determining the Gross Tumor Volume (GTV) of brain tumors for use in radiation therapy was assessed. Three criteria were applied: accuracy of the contours; quality of the resulting treatment plan in terms of dose to the tumor; and a novel treatment plan evaluation technique based on post-treatment images. The kNN method was able to segment all cases while the KG method was limited to enhancing tumors and gliomas with clear enhancing edges. Various software applications were developed to create a closed smooth contour that encompassed the tumor pixels from the segmentations and to integrate these results into the treatment planning software. A novel, probabilistic measurement of accuracy was introduced to compare the agreement of the segmentation methods with the weighted average physician volume. Both computer methods under-segment the tumor volume when compared with the physicians but performed within the variability of manual contouring (28% plus/minus12% for inter-operator variability). Computer segmentations were modified vertically to compensate for their under-segmentation. When comparing radiation treatment plans designed from physician-defined tumor volumes with treatment plans developed from the modified segmentation results, the reference target volume was irradiated within the same level of conformity. Analysis of the plans based on post- treatment MRI showed that the segmentation plans provided similar dose coverage to areas being treated by the original treatment plans. This research demonstrates that computer segmentations provide a feasible route to automatic target volume definition. Because of the lower variability and greater efficiency of the automated techniques, their use could lead to more precise plans and better prognosis for brain tumor patients. automatic segmentation glioma magnetic resonance knowledge guided k-nearest neighbor American Studies Arts and Humanities

Search results