Spelling suggestions: "subject:"[een] DISTRIBUTED LEARNING"" "subject:"[enn] DISTRIBUTED LEARNING""
11 |
Trends in Participation Rates of Home Educating in B.C., 1993 to 2013Gardner, Nicole 21 August 2015 (has links)
When a family in British Columbia (B.C.) chooses to educate their child at home, they have two legal options: enrollment in a Distributed Learning (DL) program or registration under Section 12 (S12) of the School Act as a homeschooler. To date, there has been very little published on trends in participation rates and growth rates with regards to home educating options in B.C. The current study employs a quantitative archival design to document trends in DL and S12 across age, gender and location. Home educating is on the rise in B.C. over the past twenty years, largely due to an increase in enrollment in DL programs while registration under S12 has declined. Distinct patterns in age, gender and location between S12 and DL are apparent in the data. Growth rates among age categories in DL mirror declines in S12. While there are slightly more males than females in the total school-aged population in B.C., within DL programs there are more females than males at the secondary level. In 1993/1994 rural children were more likely to be educated at home than urban children in B.C.; today the opposite is true. Further research is needed to ascertain why these trends persist. / Graduate / 0525 / 0529 / ngardner@uvic.ca
|
12 |
Distributed Statistical Learning under Communication ConstraintsEl Gamal, Mostafa 21 June 2017 (has links)
"In this thesis, we study distributed statistical learning, in which multiple terminals, connected by links with limited capacity, cooperate to perform a learning task. As the links connecting the terminals have limited capacity, the messages exchanged between the terminals have to be compressed. The goal of this thesis is to investigate how to compress the data observations at multiple terminals and how to use the compressed data for inference. We first focus on the distributed parameter estimation problem, in which terminals send messages related to their local observations using limited rates to a fusion center that will obtain an estimate of a parameter related to the observations of all terminals. It is well known that if the transmission rates are in the Slepian-Wolf region, the fusion center can fully recover all observations and hence can construct an estimator having the same performance as that of the centralized case. One natural question is whether Slepian-Wolf rates are necessary to achieve the same estimation performance as that of the centralized case. In this thesis, we show that the answer to this question is negative. We then examine the optimality of data dimensionality reduction via sufficient statistics compression in distributed parameter estimation problems. The data dimensionality reduction step is often needed especially if the data has a very high dimension and the communication rate is not as high as the one characterized above. We show that reducing the dimensionality by extracting sufficient statistics of the parameter to be estimated does not degrade the overall estimation performance in the presence of communication constraints. We further analyze the optimal estimation performance in the presence of communication constraints and we verify the derived bound using simulations. Finally, we study distributed optimization problems, for which we examine the randomized distributed coordinate descent algorithm with quantized updates. In the literature, the iteration complexity of the randomized distributed coordinate descent algorithm has been characterized under the assumption that machines can exchange updates with an infinite precision. We consider a practical scenario in which the messages exchange occurs over channels with finite capacity, and hence the updates have to be quantized. We derive sufficient conditions on the quantization error such that the algorithm with quantized update still converge."
|
13 |
Learning From Spatially Disjoint DataBhadoria, Divya 02 April 2004 (has links)
Committees of classifiers, also called mixtures or ensembles of classifiers, have become popular because they have the potential to improve on the performance of a single classifier constructed from the same set of training data. Bagging and boosting are some of the better known methods of constructing a committee of classifiers. Committees of classifiers are also important because they have the potential to provide a computationally scalable approach to handling massive datasets. When the emphasis is on computationally scalable approaches to handling massive datasets, the individual classifiers are often constructed from a small faction of the total data. In this context, the ability to improve on the accuracy of a hypothetical single classifier created from all of the training data may be sacrificed.
The design of a committee of classifiers typically assumes that all of the training data is equally available to be assigned to subsets as desired, and that each subset is used to train a classifier in the committee. However, there are some important application contexts in which this assumption is not valid. In many real life situations, massive data sets are created on a distributed computer, recording the simulation of important physical processes.
Currently, experts visually browse such datasets to search for interesting events in the simulation. This sort of manual search for interesting events in massive datasets is time consuming. Therefore, one would like to construct a classifier that could automatically label the "interesting" events. The problem is that the dataset is distributed across a large number of processors in chunks that are spatially homogenous with respect to the underlying physical context in the simulation. Here, a potential solution to this problem using ensembles is explored.
|
14 |
A Classification Framework for Imbalanced DataPhoungphol, Piyaphol 18 December 2013 (has links)
As information technology advances, the demands for developing a reliable and highly accurate predictive model from many domains are increasing. Traditional classification algorithms can be limited in their performance on highly imbalanced data sets. In this dissertation, we study two common problems when training data is imbalanced, and propose effective algorithms to solve them.
Firstly, we investigate the problem in building a multi-class classification model from imbalanced class distribution. We develop an effective technique to improve the performance of the model by formulating the problem as a multi-class SVM with an objective to maximize G-mean value. A ramp loss function is used to simplify and solve the problem. Experimental results on multiple real-world datasets confirm that our new method can effectively solve the multi-class classification problem when the datasets are highly imbalanced.
Secondly, we explore the problem in learning a global classification model from distributed data sources with privacy constraints. In this problem, not only data sources have different class distributions but combining data into one central data is also prohibited. We propose a privacy-preserving framework for building a global SVM from distributed data sources. Our new framework avoid constructing a global kernel matrix by mapping non-linear inputs to a linear feature space and then solve a distributed linear SVM from these virtual points. Our method can solve both imbalance and privacy problems while achieving the same level of accuracy as regular SVM.
Finally, we extend our framework to handle high-dimensional data by utilizing Generalized Multiple Kernel Learning to select a sparse combination of features and kernels. This new model produces a smaller set of features, but yields much higher accuracy.
|
15 |
Automated Discovery and Analysis of Social Networks from Threaded DiscussionsGruzd, Anatoliy A, Haythornthwaite, Caroline January 2008 (has links)
To gain greater insight into the operation of online social networks, we applied Natural Language Processing (NLP) techniques to text-based communication to identify and describe underlying social structures in online communities. This paper presents our approach and preliminary evaluation for content-based, automated discovery of social networks. Our research question is: What syntactic and semantic features of postings in a threaded discussions help uncover explicit and implicit ties between network members, and which provide a reliable estimate of the strengths of interpersonal ties among the network members? To evaluate our automated procedures, we compare the results from the NLP processes with social networks built from basic who-to-whom data, and a sample of hand-coded data derived from a close reading of the text.
For our test case, and as part of ongoing research on networked learning, we used the archive of threaded discussions collected over eight iterations of an online graduate class. We first associate personal names and nicknames mentioned in the postings with class participants. Next we analyze the context in which each name occurs in the postings to determine whether or not there is an interpersonal tie between a sender of the posting and a person mentioned in it. Because information exchange is a key factor in the operation and success of a learning community, we estimate and assign weights to the ties by measuring the amount of information exchanged between each pair of the nodes; information in this case is operationalized as counts of important concept terms in the postings as derived through the NLP analyses. Finally, we compare the resulting network(s) against those derived from other means, including basic who-to-whom data derived from posting sequences (e.g., whose postings follow whose). In this comparison we evaluate what is gained in understanding network processes by our more elaborate analyses.
|
16 |
The experience of teachers in distributed learning environments : implications for teaching practiceLemieux, Kimberly 09 August 2012 (has links)
This qualitative study used a narrative inquiry approach to conduct in-depth interviews of eight
distributed learning educators who designed and offered online English courses in British
Columbia during the 2011/12 school year. There were three research questions: (1) How do
teachers describe their professional experiences of teaching in a full time online environment?
(2) What are the enablers and inhibitors for online teacher development? (3) Do teachers feel
their teaching practice has changed over their career as online educators?
Findings were examined through the lens of Korthagen’s (2004) Onion Model. Six
themes that comprised this model, provided a framework for data analysis and insight into the
process by which teachers made sense of their lived experience. The findings revealed that
online educators valued their online experience because it removed the constraints of a regular
classroom. They expressed frustration with some aspects of the current model of online
education in BC because it prevented them from engaging in synchronous, highly connective
learning projects with their students. Recognition of the fact that online educators work in a
different milieu with a different set of environmental pressures is necessary to ensure the success
of distributed learning in BC.
|
17 |
DISTRIBUTED NEAREST NEIGHBOR CLASSIFICATION WITH APPLICATIONS TO CROWDSOURCINGJiexin Duan (11181162) 26 July 2021 (has links)
The aim of this dissertation is to study two problems of distributed nearest neighbor classification (DiNN) systematically. The first one compares two DiNN classifiers based on different schemes: majority voting and weighted voting. The second one is an extension of the DiNN method to the crowdsourcing application, which allows each worker data has a different size and noisy labels due to low worker quality. Both statistical guarantees and numerical comparisons are studied in depth.<br><div><br></div><div><div>The first part of the dissertation focuses on the distributed nearest neighbor classification in big data. The sheer volume and spatial/temporal disparity of big data may prohibit centrally processing and storing the data. This has imposed a considerable hurdle for nearest neighbor predictions since the entire training data must be memorized. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of the regret, up to a multiplicative constant that depends solely on the data dimension. The multiplicative difference can be eliminated by replacing majority voting with the weighted voting scheme. In addition, we provide sharp theoretical upper bounds of the number of subsamples in order for the distributed nearest neighbor classifier to reach the optimal convergence rate. It is interesting to note that the weighted voting scheme allows a larger number of subsamples than the majority voting one.</div></div><div><br></div><div>The second part of the dissertation extends the DiNN methods to the application in crowdsourcing. The noisy labels in crowdsourcing data and different sizes of worker data will deteriorate the performance of DiNN methods. We propose an enhanced nearest neighbor classifier (ENN) to overcome this issue. Our proposed method achieves the same regret as its oracle version on the expert data with the same size. We also propose two algorithms to estimate the worker quality if it is unknown in practice. One method constructs the estimators for worker quality based on the denoised worker labels through applying kNN classifier on expert data. Unlike previous worker quality estimation methods, which have no statistical guarantee, it achieves the same regret as the ENN with observed worker quality. The other method estimates the worker quality iteratively based on ENN, and it works well without expert data required by most previous methods.<br></div>
|
18 |
Distributed Bootstrap for Massive DataYang Yu (12466911) 27 April 2022 (has links)
<p>Modern massive data, with enormous sample size and tremendous dimensionality, are usually stored and processed using a cluster of nodes in a master-worker architecture. A shortcoming of this architecture is that inter-node communication can be over a thousand times slower than intra-node computation, which makes communication efficiency a desirable feature when developing distributed learning algorithms. In this dissertation, we tackle this challenge and propose communication-efficient bootstrap methods for simultaneous inference in the distributed computational framework.</p>
<p> </p>
<p>First, we propose two generic distributed bootstrap methods, \texttt{k-grad} and \texttt{n+k-1-grad}, which apply multiplier bootstrap at the master node on the gradients communicated across nodes. Based on them, we develop a communication-efficient method of producing an $\ell_\infty$-norm confidence region using distributed data with dimensionality not exceeding the local sample size. Our theory establishes the communication efficiency by providing a lower bound on the number of communication rounds $\tau_{\min}$ that warrants the statistical accuracy and efficiency and showing that $\tau_{\min}$ only increases logarithmically with the number of workers and the dimensionality. Our simulation studies validate our theory.</p>
<p> </p>
<p>Then, we extend \texttt{k-grad} and \texttt{n+k-1-grad} to the high-dimensional regime and propose a distributed bootstrap method for simultaneous inference on high-dimensional distributed data. The method produces an $\ell_\infty$-norm confidence region based on a communication-efficient de-biased lasso, and we propose an efficient cross-validation approach to tune the method at every iteration. We theoretically prove a lower bound on the number of communication rounds $\tau_{\min}$ that warrants the statistical accuracy and efficiency. Furthermore, $\tau_{\min}$ only increases logarithmically with the number of workers and the intrinsic dimensionality, while nearly invariant to the nominal dimensionality. We test our theory by extensive simulation studies and a variable screening task on a semi-synthetic dataset based on the US Airline On-Time Performance dataset.</p>
|
19 |
Non-convex Stochastic Optimization With Biased Gradient EstimatorsSokolov, Igor 03 1900 (has links)
Non-convex optimization problems appear in various applications of machine learning. Because of their practical importance, these problems gained a lot of attention in recent years, leading to the rapid development of new efficient stochastic gradient-type methods. In the quest to improve the generalization performance of modern deep learning models, practitioners are resorting to using larger and larger datasets in the training process, naturally distributed across a number of edge devices. However, with the increase of trainable data, the computational costs of gradient-type methods increase significantly. In addition, distributed methods almost invariably suffer from the so-called communication bottleneck: the cost of communication of the information necessary for the workers to jointly solve the problem is often very high, and it can be orders of magnitude higher than the cost of computation. This thesis provides a study of first-order stochastic methods addressing these issues. In particular, we structure this study by considering certain classes of methods. That allowed us to understand current theoretical gaps, which we successfully filled by providing new efficient algorithms.
|
20 |
Learning to Learn Multi-party Learning : FROM Both Distributed and Decentralized PerspectivesJi, Jinlong 07 September 2020 (has links)
No description available.
|
Page generated in 0.044 seconds