  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

DISTRIBUTED NEAREST NEIGHBOR CLASSIFICATION WITH APPLICATIONS TO CROWDSOURCING

Jiexin Duan (11181162) 26 July 2021
This dissertation systematically studies two problems in distributed nearest neighbor classification (DiNN). The first compares two DiNN classifiers based on different schemes: majority voting and weighted voting. The second extends the DiNN method to crowdsourcing, where each worker's data may differ in size and carry noisy labels due to low worker quality. Both statistical guarantees and numerical comparisons are studied in depth.

The first part of the dissertation focuses on distributed nearest neighbor classification in big data. The sheer volume and spatial/temporal disparity of big data may prohibit processing and storing the data centrally. This poses a considerable hurdle for nearest neighbor prediction, since the entire training data must be memorized. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of regret, up to a multiplicative constant that depends solely on the data dimension. The multiplicative difference can be eliminated by replacing majority voting with the weighted voting scheme. In addition, we provide sharp theoretical upper bounds on the number of subsamples for which the distributed nearest neighbor classifier reaches the optimal convergence rate. Notably, the weighted voting scheme allows a larger number of subsamples than the majority voting one.

The second part of the dissertation extends the DiNN methods to crowdsourcing. Noisy labels and differing worker data sizes degrade the performance of DiNN methods. We propose an enhanced nearest neighbor classifier (ENN) to overcome this issue. Our proposed method achieves the same regret as its oracle version on expert data of the same size. We also propose two algorithms to estimate worker quality when it is unknown in practice. One constructs estimators of worker quality from denoised worker labels obtained by applying a kNN classifier to expert data. Unlike previous worker quality estimation methods, which have no statistical guarantee, it achieves the same regret as the ENN with observed worker quality. The other estimates worker quality iteratively based on ENN, and it works well without the expert data required by most previous methods.
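
To make the two voting schemes named in this abstract concrete, here is a minimal Python sketch of combining local kNN results from several subsamples. It is an illustration under stated assumptions, not the dissertation's implementation: the function names (`knn_positive_fraction`, `dinn_predict`, `split_into_subsamples`), the binary 0/1 labels, the Euclidean distance, and especially the "weighted" branch (an equal-weight average of local positive fractions) are all hypothetical choices, and the weighting actually analyzed in the dissertation may differ.

```python
import random
from math import dist


def knn_positive_fraction(train, query, k):
    """Fraction of the k nearest training points (x, label) that have label 1."""
    nearest = sorted(train, key=lambda point: dist(point[0], query))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def dinn_predict(subsamples, query, k, scheme="majority"):
    """Combine local kNN results from each subsample into one binary label."""
    fractions = [knn_positive_fraction(s, query, k) for s in subsamples]
    if scheme == "majority":
        # Each subsample casts a hard 0/1 vote; the more common vote wins.
        votes = [1 if f > 0.5 else 0 for f in fractions]
        return 1 if sum(votes) > len(votes) / 2 else 0
    # Assumed "weighted" scheme: average the local fractions with equal
    # weights and threshold once, so confident subsamples count for more.
    return 1 if sum(fractions) / len(fractions) > 0.5 else 0


def split_into_subsamples(data, s, seed=0):
    """Randomly partition the training data into s roughly equal subsamples."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::s] for i in range(s)]
```

In this sketch the two schemes can disagree: if one subsample is unanimous for class 1 while two others lean only slightly toward class 0, the hard majority vote returns 0 but the averaged fractions can still exceed one half.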
2

Time-Series Classification: Technique Development and Empirical Evaluation

Yang, Ching-Ting 31 July 2002
Many interesting applications involve decision prediction based on a time-series sequence or a set of time-series sequences; these are referred to as time-series classification problems. Past classification analysis research predominantly focused on constructing a classification model from training instances whose attributes are atomic and independent. Applying traditional classification analysis techniques directly to time-series classification problems requires transforming the time-series data into non-time-series attributes through statistical operations (e.g., average, sum, etc.). However, such statistical transformation often results in information loss. In this thesis, we propose the Time-Series Classification (TSC) technique, based on the nearest neighbor classification approach. Empirical evaluation showed that the proposed time-series classification technique performed better than the statistical-transformation-based approach.
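
The information loss mentioned in this abstract can be illustrated with a small Python sketch that compares a nearest neighbor classifier on raw sequences with one on summary statistics. This is not the thesis's TSC technique, whose distance measure is not given here: the Euclidean distance, the equal-length series, and the names `nn_classify` and `summary_features` are assumptions made only for illustration.

```python
from math import dist
from statistics import mean


def summary_features(series):
    """Statistical transformation: collapse a series to (average, sum)."""
    return (mean(series), sum(series))


def nn_classify(train, query, transform=tuple):
    """1-NN: return the label of the training series closest to the query.

    `train` is a list of (series, label) pairs with equal-length series.
    `transform` maps a series to the point that is actually compared;
    the default keeps the raw sequence.
    """
    q = transform(query)
    _, label = min(train, key=lambda item: dist(transform(item[0]), q))
    return label


# Two classes with identical average and sum but opposite shapes.
train = [((1, 2, 3, 4), "rising"), ((4, 3, 2, 1), "falling")]
query = (3.9, 3.1, 2.0, 1.2)

raw_label = nn_classify(train, query)                        # uses the shape
summary_label = nn_classify(train, query, summary_features)  # shape is lost
```

Here both training series map to the same summary point, so the summary-based classifier cannot tell them apart and simply returns the first training label, while the raw-sequence classifier matches the query to the falling series.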
3

Pattern Synthesis Techniques And Compact Data Representation Schemes For Efficient Nearest Neighbor Classification

Pulabaigari, Viswanath 01 1900
No description available.
