Spelling suggestions: "subject:"nearest neighborhood"" "subject:"dearest neighborhood""
41 |
Evaluation of decentralized email architecture and social network analysis based on email attachment sharingTsipenyuk, Gregory January 2018 (has links)
Present day email is provided by centralized services running in the cloud. The services transparently connect users behind middleboxes and provide backup, redundancy, and high availability at the expense of user privacy. In present day mobile environments, users can access and modify email from multiple devices with updates reconciled on the central server. Prioritizing updates is difficult and may be undesirable. Moreover, legacy email protocols do not provide optimal email synchronization and access. Recent phenomena of the Internet of Things (IoT) will see the number of interconnected devices grow to 27 billion by 2021. In the first part of my dissertation I am proposing a decentralized email architecture which takes advantage of user's a IoT devices to maintain a complete email history. This addresses the email reconciliation issue and places data under user control. I replace legacy email protocols with a synchronization protocol to achieve eventual consistency of email and optimize bandwidth and energy usage. The architecture is evaluated on a Raspberry Pi computer. There is an extensive body of research on Social Network Analysis (SNA) based on email archives. Typically, the analyzed network reflects either communication between users or a relationship between the email and the information found in the email's header and the body. This approach discards either all or some email attachments that cannot be converted to text; for instance, images. Yet attachments may use up to 90% of an email archive size. In the second part of my dissertation I suggest extracting the network from email attachments shared between users. I hypothesize that the network extracted from shared email attachments might provide more insight into the social structure of the email archive. I evaluate communication and shared email attachments networks by analyzing common centrality measures and classication and clustering algorithms. I further demonstrate how the analysis of the shared attachments network can be used to optimize the proposed decentralized email architecture.
|
42 |
Learning From Spatially Disjoint DataBhadoria, Divya 02 April 2004 (has links)
Committees of classifiers, also called mixtures or ensembles of classifiers, have become popular because they have the potential to improve on the performance of a single classifier constructed from the same set of training data. Bagging and boosting are some of the better known methods of constructing a committee of classifiers. Committees of classifiers are also important because they have the potential to provide a computationally scalable approach to handling massive datasets. When the emphasis is on computationally scalable approaches to handling massive datasets, the individual classifiers are often constructed from a small faction of the total data. In this context, the ability to improve on the accuracy of a hypothetical single classifier created from all of the training data may be sacrificed.
The design of a committee of classifiers typically assumes that all of the training data is equally available to be assigned to subsets as desired, and that each subset is used to train a classifier in the committee. However, there are some important application contexts in which this assumption is not valid. In many real life situations, massive data sets are created on a distributed computer, recording the simulation of important physical processes.
Currently, experts visually browse such datasets to search for interesting events in the simulation. This sort of manual search for interesting events in massive datasets is time consuming. Therefore, one would like to construct a classifier that could automatically label the "interesting" events. The problem is that the dataset is distributed across a large number of processors in chunks that are spatially homogenous with respect to the underlying physical context in the simulation. Here, a potential solution to this problem using ensembles is explored.
|
43 |
Brain Tumor Target Volume Determination for Radiation Therapy Treatment Planning Through the Use of Automated MRI SegmentationMazzara, Gloria Patrika 27 February 2004 (has links)
Radiation therapy seeks to effectively irradiate the tumor cells while minimizing the dose to adjacent normal cells. Prior research found that the low success rates for treating brain tumors would be improved with higher radiation doses to the tumor area. This is feasible only if the target volume can be precisely identified. However, the definition of tumor volume is still based on time-intensive, highly subjective manual outlining by radiation oncologists. In this study the effectiveness of two automated Magnetic Resonance Imaging (MRI) segmentation methods, k-Nearest Neighbors (kNN) and Knowledge-Guided (KG), in determining the Gross Tumor Volume (GTV) of brain tumors for use in radiation therapy was assessed. Three criteria were applied: accuracy of the contours; quality of the resulting treatment plan in terms of dose to the tumor; and a novel treatment plan evaluation technique based on post-treatment images.
The kNN method was able to segment all cases while the KG method was limited to enhancing tumors and gliomas with clear enhancing edges. Various software applications were developed to create a closed smooth contour that encompassed the tumor pixels from the segmentations and to integrate these results into the treatment planning software. A novel, probabilistic measurement of accuracy was introduced to compare the agreement of the segmentation methods with the weighted average physician volume. Both computer methods under-segment the tumor volume when compared with the physicians but performed within the variability of manual contouring (28% plus/minus12% for inter-operator variability).
Computer segmentations were modified vertically to compensate for their under-segmentation. When comparing radiation treatment plans designed from physician-defined tumor volumes with treatment plans developed from the modified segmentation results, the reference target volume was irradiated within the same level of conformity. Analysis of the plans based on post- treatment MRI showed that the segmentation plans provided similar dose coverage to areas being treated by the original treatment plans.
This research demonstrates that computer segmentations provide a feasible route to automatic target volume definition. Because of the lower variability and greater efficiency of the automated techniques, their use could lead to more precise plans and better prognosis for brain tumor patients.
|
44 |
Efficient Adjacency Queries and Dynamic Refinement for Meshfree Methods with Applications to Explicit Fracture ModelingOlliff, James 22 June 2018 (has links)
Meshfree methods provide a more practical approach to solving problems involving large deformation and modeling fracture compared to the Finite Element Method (FEM). However meshfree methods are more computationally intensive compared to FEM, which can limit their practicality in engineering. Meshfree methods also lack a clear boundary definition, restricting available visualization techniques. Determining particle locations and attributes such that a consistent approximation is ensured can be challenging in meshfree methods, especially when employing h-refinement. The primary objective of this work is to address the limitations associated with computational efficiency, meshfree domain discretization, and h-refinement, including both placement of particles as well as determination of particle attributes. To demonstrate the efficacy of these algorithms, a model predicting the failure of laminated composite structures using a meshfree method will be presented.
|
45 |
Defining activity areas in the Early Neolithic site at Foeni-Salaş (southwest Romania): A spatial analytic approach with geographical information systems in archaeologyLawson, Kathryn Sahara 20 September 2007 (has links)
Through the years, there has been a great deal of archaeological research focused on the earliest farming cultures of Europe (i.e. Early Neolithic). However, little effort has been expended to uncover the type and nature of daily activities performed within Early Neolithic dwellings, particularly in the Balkans.
This thesis conducts a spatial analysis of the Early Neolithic pit house levels of the Foeni-Salaş site in southeast Romania, in the northern half of the Balkans, to determine the kinds and locations of activities that occurred in these pit houses. Characteristic Early Neolithic dwellings in the northern Balkans are pit houses. The data are analyzed using Geographic Information Systems (GIS) technology in an attempt to identify non-random patterns that will indicate how the pit house inhabitants used their space. Both visual and statistical (Nearest Neighbor) techniques are used to identify spatial patterns. Spreadsheet data are incorporated into the map database in order to compare and contrast the results from the two techniques of analysis. Map data provides precise artefact locations, while spreadsheet data yield more generalized quad centroid information. Unlike the mapped data, the spreadsheet data also included artefacts recovered in sieves. Utilizing both data types gave a more complexand fuller understanding of how space was used at Foeni-Salaş.
The results show that different types of activity areas are present within each of the pit houses. Comparison of interior to exterior artifact distributions demonstrates that most activities take place within pit house. Some of the activities present include weaving, food preparation, butchering, hide processing, pottery making, ritual, and other activities related to the running of households. It was found that these activities are placed in specific locations relative to features within the pit house and the physical structure of the pit house itself. This research adds to the growing body of archaeological research that implements GIS to answer questions and solve problems related to the spatial dimension of human behaviour. / February 2008
|
46 |
Clusters Identification: Asymmetrical CaseMao, Qian January 2013 (has links)
Cluster analysis is one of the typical tasks in Data Mining, and it groups data objects based only on information found in the data that describes the objects and their relationships. The purpose of this thesis is to verify a modified K-means algorithm in asymmetrical cases, which can be regarded as an extension to the research of Vladislav Valkovsky and Mikael Karlsson in Department of Informatics and Media. In this thesis an experiment is designed and implemented to identify clusters with the modified algorithm in asymmetrical cases. In the experiment the developed Java application is based on knowledge established from previous research. The development procedures are also described and input parameters are mentioned along with the analysis. This experiment consists of several test suites, each of which simulates the situation existing in real world, and test results are displayed graphically. The findings mainly emphasize the limitations of the algorithm, and future work for digging more essences of the algorithm is also suggested.
|
47 |
Efficient Kernel Methods for Statistical DetectionSu, Wanhua 20 March 2008 (has links)
This research is motivated by a drug discovery problem -- the AIDS anti-viral database from the National Cancer Institute. The objective of the study is to develop effective statistical methods to model the relationship between the chemical structure of a compound and its activity against the HIV-1 virus. And as a result, the structure-activity model can be used to predict the activity of
new compounds and thus helps identify those active chemical compounds that can be used as drug candidates. Since active compounds are generally rare in a compound library, we recognize the drug discovery problem as an application of the so-called statistical detection problem. In a typical statistical detection problem, we have data {Xi,Yi}, where Xi is the predictor vector of the ith observation and Yi={0,1} is its class label. The objective of a statistical detection problem is to identify class-1 observations, which are extremely rare. Besides drug discovery problem, other applications of statistical detection include direct marketing and fraud detection.
We propose a computationally efficient detection method called LAGO, which stands for "locally adjusted GO estimator". The original idea is inspired by an ancient game known today as "GO". The construction of LAGO consists of two steps. In the first step, we estimate the density of class 1 with an adaptive bandwidth kernel density estimator. The kernel functions are located at and only at the class-1 observations. The bandwidth of the kernel function centered at a certain class-1 observation is calculated as the average distance between this class-1 observation and its K-nearest class-0 neighbors. In the second step, we adjust the density estimated in the first step locally according to the density of class 0. It can be shown that the amount of adjustment in the second step is approximately inversely proportional to the bandwidth calculated in the first step.
Application to the NCI data demonstrates that LAGO is superior to methods such as K nearest neighbors and support vector machines.
One drawback of the existing LAGO is that it only
provides a point estimate of a test point's possibility of being class 1, ignoring the uncertainty of the model. In the second part of this thesis, we present a Bayesian framework for LAGO, referred to as BLAGO. This Bayesian approach enables quantification of uncertainty. Non-informative priors are adopted. The posterior distribution is calculated over a grid of (K, alpha) pairs by integrating out beta0 and beta1 using the Laplace approximation, where K and alpha are two parameters to construct the LAGO score. The parameters beta0, beta1 are the coefficients of the logistic transformation that converts the LAGO score to the probability scale. BLAGO
provides proper probabilistic predictions that have support on (0,1) and captures uncertainty of the predictions as well. By avoiding Markov chain Monte Carlo algorithms and using the Laplace approximation, BLAGO is computationally very efficient. Without the need of cross-validation, BLAGO is even more computationally efficient than LAGO.
|
48 |
Efficient Kernel Methods for Statistical DetectionSu, Wanhua 20 March 2008 (has links)
This research is motivated by a drug discovery problem -- the AIDS anti-viral database from the National Cancer Institute. The objective of the study is to develop effective statistical methods to model the relationship between the chemical structure of a compound and its activity against the HIV-1 virus. And as a result, the structure-activity model can be used to predict the activity of
new compounds and thus helps identify those active chemical compounds that can be used as drug candidates. Since active compounds are generally rare in a compound library, we recognize the drug discovery problem as an application of the so-called statistical detection problem. In a typical statistical detection problem, we have data {Xi,Yi}, where Xi is the predictor vector of the ith observation and Yi={0,1} is its class label. The objective of a statistical detection problem is to identify class-1 observations, which are extremely rare. Besides drug discovery problem, other applications of statistical detection include direct marketing and fraud detection.
We propose a computationally efficient detection method called LAGO, which stands for "locally adjusted GO estimator". The original idea is inspired by an ancient game known today as "GO". The construction of LAGO consists of two steps. In the first step, we estimate the density of class 1 with an adaptive bandwidth kernel density estimator. The kernel functions are located at and only at the class-1 observations. The bandwidth of the kernel function centered at a certain class-1 observation is calculated as the average distance between this class-1 observation and its K-nearest class-0 neighbors. In the second step, we adjust the density estimated in the first step locally according to the density of class 0. It can be shown that the amount of adjustment in the second step is approximately inversely proportional to the bandwidth calculated in the first step.
Application to the NCI data demonstrates that LAGO is superior to methods such as K nearest neighbors and support vector machines.
One drawback of the existing LAGO is that it only
provides a point estimate of a test point's possibility of being class 1, ignoring the uncertainty of the model. In the second part of this thesis, we present a Bayesian framework for LAGO, referred to as BLAGO. This Bayesian approach enables quantification of uncertainty. Non-informative priors are adopted. The posterior distribution is calculated over a grid of (K, alpha) pairs by integrating out beta0 and beta1 using the Laplace approximation, where K and alpha are two parameters to construct the LAGO score. The parameters beta0, beta1 are the coefficients of the logistic transformation that converts the LAGO score to the probability scale. BLAGO
provides proper probabilistic predictions that have support on (0,1) and captures uncertainty of the predictions as well. By avoiding Markov chain Monte Carlo algorithms and using the Laplace approximation, BLAGO is computationally very efficient. Without the need of cross-validation, BLAGO is even more computationally efficient than LAGO.
|
49 |
Improving WiFi positioning through the use of successive in-sequence signal strength samplesHallström, Per, Dellrup, Per January 2006 (has links)
As portable computers and wireless networks are becoming ubiquitous, it is natural to consider the user’s position as yet another aspect to take into account when providing services that are tailored to meet the needs of the consumers. Location aware systems could guide persons through buildings, to a particular bookshelf in a library or assist in a vast variety of other applications that can benefit from knowing the user’s position. In indoor positioning systems, the most commonly used method for determining the location is to collect samples of the strength of the received signal from each base station that is audible at the client’s position and then pass the signal strength data on to a positioning server that has been previously fed with example signal strength data from a set of reference points where the position is known. From this set of reference points, the positioning server can interpolate the client’s current location by comparing the signal strength data it has collected with the signal strength data associated with every reference point. Our work proposes the use of multiple successive received signal strength samples in order to capture periodic signal strength variations that are the result of effects such as multi-path propagation, reflections and other types of radio interference. We believe that, by capturing these variations, it is possible to more easily identify a particular point; this is due to the fact that the signal strength fluctuations should be rather constant at every position, since they are the result of for example reflections on the fixed surfaces of the building’s interior. For the purpose of investigating our assumptions, we conducted measurements at a site at Växjö university, where we collected signal strength samples at known points. With the data collected, we performed two different experiments: one with a neural network and one where the k-nearest-neighbor method was used for position approximation. For each of the methods, we performed the same set of tests with single signal strength samples and with multiple successive signal strength samples, to evaluate their respective performances. We concluded that the k-nearest-neighbor method does not seem to benefit from multiple successive signal strength samples, at least not in our setup, compared to when using single signal strength samples. However, the neural network performed about 17% better when multiple successive signal strength samples were used.
|
50 |
A Hilbert Curve-Based Algorithm for Order-Sensitive Moving KNN QueriesFeng, Fei-Chung 11 July 2012 (has links)
¡@¡@Due to wireless communication technologies, positioning technologies, and mobile computing develop quickly, mobile services are becoming practical and important on big spatiotemporal databases management. Mobile service users move only inside a spatial space, e:g: a country. They often issue the K Nearest Neighbor (kNN) query to obtain data objects reachable through the spatial database. The challenge problem of mobile services is how to efficiently answer the data objects which users interest to the corresponding mobile users. One type of kNN query problems is the order-sensitive moving kNN (order-sensitive MkNN) query problem. In the order-sensitive MkNN query problem, the query point is dynamic and unpredictable, the kNN answers should be responded in real time and sorted by the distance in the ascending order. Therefore, how to respond the kNN answers effectively, incrementally and correctly is an important issue. Nutanong et al: have proposed the V*-kNN algorithm to process the order-sensitive MkNN query. The V*-kNN algorithm uses their the V*-diagram algorithm to generate the safe region. It also uses the Incremental Rank Updates algorithm (IRU) to handle the events while the query point passing the bisectors or the boundary of the safe region. However, the V*-kNN algorithm uses the BF-kNN algorithm to retrieve NNs, which is non-incremental. This makes the search time increase while the density of the object increases. Moreover, they do not consider the situation that there are multiple objects at the same order, and the situation that there are multiple events happen in a single step. These situations may cause that the kNN answers are incorrect. Therefore, in this thesis, we propose the Hilbert curve-based kNN algorithm (HC-kNN) algorithm to process the ordersensitive MkNN query. The HC-kNN algorithm can handle the situation that there are multiple events happen in a single step. We also propose new data structure of the kNN answers. Next, we propose the Intersection of Perpendicular Bisectors algorithm (IPB) in order to handle order update events of the kNN answers. The IPB algorithm handles the situation which there are multiple objects at the same order. Finally, based on the Hilbert curve index, we propose the ONHC-kNN algorithm to get NNs incrementally and to generate the safe region. The safe region will not be affected while the density of the object increases. The safe region of our algorithm is larger than that of the V*-kNN algorithm. From our simulation result, we show that the HC-kNN algorithm provides better performance than the V*-kNN algorithm.
|
Page generated in 0.0622 seconds