11
Geometric Computing over Uncertain Data. Zhang, Wuzhou. January 2015.
Entering the era of big data, human beings are faced with an unprecedented amount of geometric data. Many computational challenges arise in processing this new deluge of geometric data. A critical one is data uncertainty: the data is inherently noisy and inaccurate, and often incomplete. The past few decades have witnessed the influence of geometric algorithms in various fields including GIS, spatial databases, and computer vision, yet most existing geometric algorithms are built on the assumption that the data is precise, and they are incapable of properly handling data in the presence of uncertainty. This thesis explores a few algorithmic challenges in what we call geometric computing over uncertain data.

We study the nearest-neighbor searching problem, which returns the nearest neighbor of a query point in a set of points, in a probabilistic framework. This thesis investigates two different nearest-neighbor formulations: expected nearest neighbor (ENN), where we consider the expected distance between each input point and a query point, and probabilistic nearest neighbor (PNN), where we estimate the probability of each input point being the nearest neighbor of a query point.

For the ENN problem, we consider a probabilistic framework in which the location of each input point and/or query point is specified as a probability density function, and the goal is to return the point that minimizes the expected distance. We present methods for computing an exact ENN or an ε-approximate ENN, for a given error parameter 0 < ε < 1, under different distance functions. These methods build an index of near-linear size and answer ENN queries in polylogarithmic or sublinear time, depending on the underlying function. As far as we know, these are the first nontrivial methods for answering exact or ε-approximate ENN queries with provable performance guarantees. Moreover, we extend our results to answer exact or ε-approximate k-ENN queries. Notably, when only the query points are uncertain, we obtain state-of-the-art results for top-k aggregate (group) nearest-neighbor queries in the L1 metric using the weighted SUM operator.

For the PNN problem, we consider a probabilistic framework in which the location of each input point is specified as a probability distribution function. We present efficient algorithms for (i) computing all points that are nearest neighbors of a query point with nonzero probability; (ii) estimating, within a specified additive error, the probability of a point being the nearest neighbor of a query point; (iii) using these estimates to return the point that maximizes the probability of being the nearest neighbor, or all points whose probability of being the nearest neighbor exceeds a given threshold. We also present experimental results to demonstrate the effectiveness of our approach.
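As a concrete illustration of the ENN objective above, here is a minimal brute-force sketch for the discrete case, where each uncertain point is given as a finite list of (location, probability) pairs. It only evaluates the expected-distance definition directly; the function names and example data are our own, and it does not reflect the near-linear-size index with polylogarithmic query time developed in the thesis.

```python
# Minimal brute-force illustration of the ENN objective for the discrete case:
# each uncertain point is a finite list of (location, probability) pairs.
# This is an O(n * m) scan per query; the thesis instead builds an index of
# near-linear size that answers such queries in polylogarithmic/sublinear time.
import math

def expected_distance(query, point):
    """Expected Euclidean distance from a fixed query to one uncertain point."""
    return sum(prob * math.dist(query, loc) for loc, prob in point)

def enn(query, points):
    """Index of the uncertain point that minimizes the expected distance."""
    return min(range(len(points)), key=lambda i: expected_distance(query, points[i]))

if __name__ == "__main__":
    pts = [
        [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],  # a spread-out distribution
        [((1.0, 1.0), 1.0)],                     # a precisely known point
    ]
    print(enn((0.5, 0.5), pts))  # -> 1: the precise point wins in expectation
```

Even this toy version makes the modelling choice explicit: uncertainty enters through the expectation of the distance, not through the distance to an expected location.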
We study the convex-hull problem, which asks for the smallest convex set that contains a given point set, in a probabilistic setting. In our framework, the uncertainty of each input point is described by a probability distribution over a finite number of possible locations, including a null location to account for the non-existence of the point. Our results include exact and approximation algorithms for computing the probability of a query point lying inside the convex hull of the input, time-space tradeoffs for membership queries, a connection between Tukey depth and membership queries, as well as a new notion of β-hull that may be a useful representation of uncertain hulls.

We study contour trees of terrains, which encode the topological changes of the level set of the height value ℓ as we raise ℓ from −∞ to +∞ on the terrain, in a probabilistic setting. We consider a terrain defined by linearly interpolating each triangle of a triangulation. In our framework, the uncertainty lies in the height of each vertex of the triangulation, and we assume it is described by a probability distribution. We first show that the probability of a vertex being a critical point, and the expected number of nodes (resp. edges) of the contour tree, can be computed exactly and efficiently. Then we present efficient sampling-based methods for estimating, with high probability, (i) the probability that two points lie on an edge of the contour tree, within additive error; (ii) the expected distance between two points p, q, and the probability that the distance between p and q on the contour tree is at least ℓ, within additive and/or relative error, where the distance between p and q on a contour tree is defined as the difference between the maximum height and the minimum height on the unique path from p to q in the contour tree.

Dissertation
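As an illustration of the convex-hull membership queries described in this abstract, here is a Monte Carlo sketch for the planar case: each uncertain point is a discrete distribution over locations, with None standing in for the null location, and the membership probability of a query point is estimated by sampling realizations. The helper names are hypothetical, and the thesis's exact and approximation algorithms are far more efficient than this sampling baseline.

```python
import random

def cross(o, a, b):
    """2D cross product of vectors (o -> a) and (o -> b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside(q, hull):
    """True if q lies in (or on) the CCW hull; degenerate hulls count as outside."""
    if len(hull) < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % len(hull)], q) >= 0
               for i in range(len(hull)))

def membership_probability(q, uncertain_points, trials=10_000):
    """Estimate Pr[q inside the hull of a random realization] by Monte Carlo.
    Each uncertain point is a list of (location-or-None, probability); None is
    the null location, meaning the point does not appear in that realization."""
    hits = 0
    for _ in range(trials):
        realization = []
        for dist in uncertain_points:
            r, acc = random.random(), 0.0
            for loc, p in dist:
                acc += p
                if r <= acc:
                    if loc is not None:
                        realization.append(loc)
                    break
        hits += inside(q, convex_hull(realization))
    return hits / trials

if __name__ == "__main__":
    tri = [
        [((0.0, 0.0), 0.9), (None, 0.1)],
        [((4.0, 0.0), 0.9), (None, 0.1)],
        [((0.0, 4.0), 0.9), (None, 0.1)],
    ]
    print(membership_probability((1.0, 1.0), tri))  # roughly 0.9**3 = 0.729
```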
12
Distance to the Border in Spatial Point Patterns. Joyner, Michele; Ross, Chelsea; Seier, Edith. 01 November 2013.
The analysis of spatial point patterns is commonly focused on the distances to the nearest neighbor. The distance of organisms to the edge of the enclosure is also of interest in some biological studies performed in the laboratory. We define the B (border) function and derive its shape assuming complete spatial randomness (CSR) for square, rectangular, circular, and some three-dimensional arenas. The idea is then extended outside the laboratory setting to work with maps and points located in geographical regions. Commands in R (R Core Team, 2012) to calculate and plot the empirical B̂ function are included. The B function, based on distances to the nearest edge, in addition to the G function, based on distances to the nearest neighbor, contributes to the understanding of the spatial distribution of the points.
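The commands referenced above are in R; purely as an illustration (not the authors' code), the following Python sketch computes border distances and the empirical B̂ function for points in a square arena, and compares them with the CSR form B(d) = 1 − (1 − 2d/a)² for a side-a square, which follows from the area of the inner square lying at least distance d from every edge.

```python
import numpy as np

def border_distance(x, y, a):
    """Distance from each point to the nearest edge of the square [0, a] x [0, a]."""
    return np.minimum.reduce([x, y, a - x, a - y])

def empirical_B(d, distances):
    """Empirical B-hat: fraction of points within distance d of the border."""
    return np.mean(distances[:, None] <= np.asarray(d)[None, :], axis=0)

def csr_B(d, a):
    """B under complete spatial randomness for a side-a square (0 <= d <= a/2)."""
    d = np.clip(np.asarray(d, dtype=float), 0.0, a / 2)
    return 1.0 - (1.0 - 2.0 * d / a) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a, n = 10.0, 500
    x, y = rng.uniform(0, a, n), rng.uniform(0, a, n)   # a CSR pattern
    dist = border_distance(x, y, a)
    grid = np.linspace(0, a / 2, 11)
    print(np.round(empirical_B(grid, dist), 2))
    print(np.round(csr_B(grid, a), 2))                  # should be close
```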
14
A GENE ONTOLOGY BASED COMPUTATIONAL APPROACH FOR THE PREDICTION OF PROTEIN FUNCTIONS. Kharsikar, Saket. 13 September 2007.
No description available.
15
APPROXIMATE N-NEAREST NEIGHBOR CLUSTERING ON DISTRIBUTED DATABASES USING ITERATIVE REFINEMENT. Calender, Christopher R. 06 October 2004.
No description available.
16
Density Based Clustering using Mutual K-Nearest Neighbors. Dixit, Siddharth. January 2015.
No description available.
17
Karst Database Implementation in Minnesota: Analysis of Sinkhole Distribution. Gao, Y.; Alexander, E. C.; Barnes, R. J. 01 May 2005.
This paper presents the overall sinkhole distributions and conducts hypothesis tests of sinkhole distributions and sinkhole formation using data stored in the Karst Feature Database (KFD) of Minnesota. Nearest neighbor analysis (NNA) was extended to include different orders of NNA, different scales of concentrated zones of sinkholes, and directions to the nearest sinkholes. The statistical results, along with the sinkhole density distribution, indicate that sinkholes tend to form in highly concentrated zones rather than as scattered individuals. The pattern changes from clustered to random to regular as the scale of the analysis decreases from 10-100 km² to 5-30 km² to 2-10 km². Hypotheses that may explain this phenomenon are: (1) areas in the highly concentrated zones of sinkholes have similar geologic and topographical settings that favor sinkhole formation; (2) existing sinkholes change the hydraulic gradient in the surrounding area and increase the solution and erosional processes that eventually form more new sinkholes.
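As a sketch of first-order nearest-neighbor analysis of the kind described above (illustrative only, not the authors' code, and ignoring edge corrections), the Clark-Evans ratio compares the observed mean nearest-neighbor distance with the CSR expectation 1/(2√λ), where λ is the point density; values well below 1 suggest clustering, values near 1 randomness, and values above 1 regularity.

```python
import numpy as np

def clark_evans_ratio(points: np.ndarray, area: float) -> float:
    """points: (n, 2) array of sinkhole coordinates; area: size of the study area.
    Returns R = mean observed NN distance / expected NN distance under CSR."""
    n = len(points)
    # pairwise distances with the diagonal masked out
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    observed = d.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)        # CSR expectation, intensity n / area
    return observed / expected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # a clustered toy pattern: tight clumps around a few centres
    centres = rng.uniform(0, 100, size=(5, 2))
    pts = np.concatenate([c + rng.normal(0, 1.0, size=(40, 2)) for c in centres])
    print(round(clark_evans_ratio(pts, area=100 * 100), 2))   # well below 1
```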
18
Advanced query processing on spatial networks. Yiu, Man-lung (姚文龍). January 2006.
Published or final version / abstract / Computer Science / Doctoral / Doctor of Philosophy
19
On the Neutralome of Great Apes and Nearest Neighbor Search in Metric Spaces. Woerner, August Eric. January 2016.
Problems of population genetics are magnified by problems of big data. My dissertation spans the disciplines of computer science and population genetics, leveraging computational approaches to biological problems to address issues in genomics research. In this dissertation I develop more efficient metric search algorithms. I also show that the vast majority of the genomes of great apes are impacted by the forces of natural selection. Finally, I introduce a heuristic to identify neutralomes, regions that are evolving with minimal selective pressure, and use these neutralomes for inferences on effective population size in great apes.

We begin with a formal and far-reaching problem that impacts a broad array of disciplines, including biology and computer science: the 𝑘-nearest neighbors problem in generalized metric spaces. The 𝑘-nearest neighbors (𝑘-NN) problem is deceptively simple. The problem is as follows: given a query q and a dataset D of size 𝑛, find the 𝑘 closest points to q. This problem can easily be solved by algorithms that compute 𝑘th order statistics in O(𝑛) time and space. It follows that if D can be ordered, then it is perhaps possible to solve 𝑘-NN queries in sublinear time. While this is not possible for an arbitrary distance function on the points in D, I show that if the points are constrained by the triangle inequality (as in metric spaces), then the dataset can be properly organized into a dispersion tree (Appendix A). Dispersion trees are a hierarchical data structure built around a large dispersed set of points. Dispersion trees have sub-quadratic construction times (O(𝑛¹·⁵ log 𝑛)), use O(𝑛) space, and employ a provably optimal search strategy that minimizes the number of times the distance function is invoked. While all metric data structures have worst-case O(𝑛) search times, dispersion trees have average-case search times that are substantially faster than those of a large sampling of comparable data structures in the vast majority of spaces sampled. Exceptions include extremely high-dimensional spaces (d > 20), which devolve into near-linear scans of the dataset, and unstructured low-dimensional (d < 6) Euclidean spaces. Dispersion trees have empirical search times that appear to scale as O(𝑛ᶜ) for 0 < c < 1. As solutions to the 𝑘-NN problem are in general too slow to be used effectively in the arena of big data in genomics, it is my hope that dispersion trees may help lift this barrier. With source code that is freely available for academic use, dispersion trees may be useful for nearest-neighbor classification problems in machine learning, for fast read-mapping against a reference genome, and as a general computational tool for problems such as clustering.

Next, I turn to problems in population genomics. Genomic patterns of diversity are a complex function of the interplay between demographics, natural selection, and mechanistic forces. A central tenet of population genetics is the neutral theory of molecular evolution, which states that the vast majority of changes at the molecular level are (relatively) selectively neutral; that is, they do not affect fitness. A corollary of the neutral theory is that the frequencies of most alleles in populations are dictated by neutral processes and not selective processes. The forces of natural selection impact not just the site of selection but linked neutral sites as well. I propose an empirical assessment of the extent of linked selection in the human genome (Appendix B). Recombination decouples sites of selection from the genomic background, and thus it serves to mitigate the effects of linked selection. I use two metrics on recombination, the minimum genetic distance to genes and the local rate of recombination, to parse the effects of linked selection into selection from genic and nongenic sources in the human genome. My empirical assessment shows profound linked selective effects from nongenic sources, with these effects being greater than those of genic sources on the autosomes, as well as generally greater effects on the X chromosome than on the autosomes. I quantify these trends using multiple linear regression, and then I model the effects of linked selection from conserved elements across the whole of the genome. Regions predicted to be neutral by my model, unlike the vast majority of the genome, do not show these linked selective effects. This demonstrates that linkage to these regulatory elements, and not some other mechanistic force, accounts for our findings. Further, neutrally evolving regions are extremely rare (~1%) in the genome, and despite generally larger linked selective effects on the X chromosome, the size of this “neutralome” is proportionally larger on the X chromosome than on the autosomes. To account for this, and to extend my findings to other great apes, I improve on my procedure to find neutralomes and apply it to the genomes of humans, Nigerian chimpanzees, bonobos, and western lowland gorillas (Appendix C). In doing so I show that, like humans, these other apes are also enormously impacted by linked selection, with their neutralomes being substantially smaller than those of humans. I then use my genomic predictions of neutrality to examine how the landscape of linked selection changes across the X chromosome and the autosomes in regions close to, and far from, genes. While I had previously demonstrated that the linked selective forces near genes are stronger on the X chromosome than on the autosomes in these taxa, I show that regions far from genes exhibit the opposite pattern: they show more selection from noncoding targets on the autosomes than on the X chromosome. This finding is replicated across our great ape samples. Further, inferences on the relative effective population size of the X chromosome and the autosomes, both near and far from genes, can be biased as a result.
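To make the triangle-inequality idea behind metric search structures such as the dispersion tree concrete, here is a toy single-pivot k-NN filter (illustrative only; it is not the dispersion tree of Appendix A, and all names are ours). Precomputing d(x, pivot) for every x lets a query skip any x whose lower bound |d(x, pivot) - d(q, pivot)| already meets or exceeds the current k-th best distance, since the triangle inequality guarantees d(q, x) is at least that large.

```python
import heapq
import math

class PivotFilteredKNN:
    """k-NN over a metric space using one pivot and triangle-inequality pruning.
    Every skipped distance call is provably safe: |d(x,p) - d(q,p)| <= d(x,q)."""

    def __init__(self, points, dist):
        self.points = points
        self.dist = dist
        self.pivot = points[0]                       # arbitrary pivot choice
        self.pivot_d = [dist(self.pivot, x) for x in points]

    def query(self, q, k):
        dq_pivot = self.dist(q, self.pivot)
        heap = []                                    # max-heap of (-d, point)
        for x, dxp in zip(self.points, self.pivot_d):
            lower_bound = abs(dxp - dq_pivot)
            if len(heap) == k and lower_bound >= -heap[0][0]:
                continue                             # pruned: cannot beat k-th best
            d = self.dist(q, x)
            if len(heap) < k:
                heapq.heappush(heap, (-d, x))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, x))
        return sorted((-nd, x) for nd, x in heap)

if __name__ == "__main__":
    pts = [(i * 0.37 % 10, i * 0.91 % 10) for i in range(1000)]
    index = PivotFilteredKNN(pts, math.dist)
    print(index.query((5.0, 5.0), k=3))
```

A single pivot already prunes many candidates in low-dimensional spaces; tree structures like the dispersion tree extend the same inequality hierarchically to get the average-case sublinear behavior described above.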
20
Comparing node-sorting algorithms for multi-goal pathfinding with obstacles. Åleskog, Christoffer; Ljungberg Fayyazuddin, Salomon. January 2019.
Background. Pathfinding plays a big role in both digital games and robotics, and is used in many different ways. One of them is multi-goal pathfinding (MGPF), which is used to calculate paths from a start position to a destination with the condition that the resulting path goes through a series of goals on the way to the destination. Research on this topic is sparse, and when the complexity is increased by introducing obstacles into the scenario, only a few articles in the field relate to the problem.

Objectives. The objective of this thesis is to conduct an experiment comparing four algorithms for solving the MGPF problem on six different maps with obstacles, and then to analyze and draw conclusions about which of the algorithms is best suited for the MGPF problem. The first is the traditional Nearest Neighbor algorithm, the second is a variation of the Greedy Search algorithm, and the third and fourth are variations of the Nearest Neighbor algorithm.

Methods. To meet the objectives, each of the four algorithms is tested fifty times on six maps of varying size and obstacle layout.

Results. The data from the experiment is compiled in graphs for all the maps, with the time to calculate a path and the path length as the metrics. The averages of all metrics are put in tables to visualize the differences between the results for the four algorithms.

Conclusions. The dynamic version of the Nearest Neighbor algorithm gives the best results when both metrics are taken into account. Otherwise, the common Nearest Neighbor algorithm gives the best results with respect to the time taken to calculate the paths, and the Greedy Search algorithm produces the shortest paths of all the tested algorithms.
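As a sketch of the ordering step behind the Nearest Neighbor approach compared above (illustrative only; it uses straight-line distances and ignores obstacles, whereas the thesis evaluates the algorithms on maps with obstacles), goals are visited greedily by always moving to the closest unvisited goal before heading to the destination.

```python
import math

def nearest_neighbor_order(start, goals, destination):
    """Greedy multi-goal ordering: from the current position, always visit the
    closest unvisited goal next, then finish at the destination.
    Uses Euclidean distance; a real MGPF solver would use path costs that
    account for obstacles (e.g. A* distances on the map)."""
    order, current, remaining = [], start, list(goals)
    total = 0.0
    while remaining:
        nxt = min(remaining, key=lambda g: math.dist(current, g))
        total += math.dist(current, nxt)
        order.append(nxt)
        remaining.remove(nxt)
        current = nxt
    total += math.dist(current, destination)
    return order, total

if __name__ == "__main__":
    order, length = nearest_neighbor_order(
        start=(0, 0),
        goals=[(5, 1), (1, 4), (6, 6)],
        destination=(8, 0),
    )
    print(order, round(length, 2))
```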