51 |
Efficient case-based reasoning through feature weighting, and its application in protein crystallography / Gopal, Kreshna, 02 June 2009
Data preprocessing is critical for machine learning, data mining, and pattern
recognition. In particular, selecting relevant and non-redundant features in high-dimensional
data is important to efficiently construct models that accurately describe the
data. In this work, I present SLIDER, an algorithm that weights features to reflect
relevance in determining similarity between instances. Accurate weighting of features
improves the similarity measure, which is useful in learning algorithms like nearest
neighbor and case-based reasoning. SLIDER performs a greedy search for optimum
weights in an exponentially large space of weight vectors. Exhaustive search being
intractable, the algorithm reduces the search space by focusing on pivotal weights at
which representative instances are equidistant to truly similar and different instances in
Euclidean space. SLIDER then evaluates those weights heuristically, based on
effectiveness in properly ranking pre-determined matches of a set of cases, relative to
mismatches.
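As a rough illustration of the evaluation idea described above, the following Python sketch scores a weight vector by the mean rank of known matches under a weighted Euclidean distance. It is not the SLIDER implementation: the function names and the toy data are assumptions made for illustration, and the search over pivotal weights is not shown.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def mean_match_rank(query, cases, is_match, weights):
    """Mean 1-based rank of the known matches when cases are sorted by distance."""
    order = sorted(range(len(cases)),
                   key=lambda i: weighted_distance(query, cases[i], weights))
    ranks = [pos + 1 for pos, i in enumerate(order) if is_match[i]]
    return sum(ranks) / len(ranks)

# Hypothetical toy data: feature 0 is informative, feature 1 is noise.
query = [0.0, 0.0]
cases = [[0.1, 5.0], [0.2, 4.0], [3.0, 0.1], [4.0, 0.5]]
is_match = [True, True, False, False]

uniform_rank = mean_match_rank(query, cases, is_match, [1.0, 1.0])
tuned_rank = mean_match_rank(query, cases, is_match, [1.0, 0.0])
print(uniform_rank, tuned_rank)  # a lower mean rank means matches are retrieved earlier
```

In this toy case, dropping the weight of the noisy feature moves both matches to the top of the ranking, which is the effect a good weight vector is meant to have.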
I analytically show that by choosing feature weights that minimize the mean rank of
matches relative to mismatches, the separation between the distributions of Euclidean
distances for matches and mismatches is increased. This leads to a better distance metric,
and consequently increases the probability of retrieving true matches from a database. I
also discuss how SLIDER is used to improve the efficiency and effectiveness of case
retrieval in a case-based reasoning system that automatically interprets electron density
maps to determine the three-dimensional structures of proteins. Electron density patterns
for regions in a protein are represented by numerical features, which are used in a distance metric to efficiently retrieve matching patterns by searching a large database.
These pre-selected cases are then evaluated by more expensive methods to identify truly
good matches – this strategy speeds up the retrieval of matching density regions, thereby
enabling fast and accurate protein model-building. This two-phase case retrieval
approach is potentially useful in many case-based reasoning systems, especially those
with computationally expensive case matching and large case libraries.
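The two-phase retrieval strategy can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the system described in the abstract: the case layout (a dict with "features" and "detail"), the function names, and the stand-in expensive scorer are all hypothetical.

```python
import heapq
import math

def cheap_distance(a, b, weights):
    """Inexpensive weighted Euclidean distance on numerical features."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def two_phase_retrieve(query, library, weights, expensive_score, shortlist_size, top_n):
    """Phase 1 shortlists cases by cheap distance; phase 2 re-ranks only the shortlist."""
    shortlist = heapq.nsmallest(
        shortlist_size, library,
        key=lambda case: cheap_distance(query["features"], case["features"], weights))
    return sorted(shortlist, key=lambda case: expensive_score(query, case))[:top_n]

# Hypothetical toy library; "detail" stands in for data that is costly to compare.
library = [
    {"name": "A", "features": [0.0, 1.0], "detail": 9.0},
    {"name": "B", "features": [0.1, 1.1], "detail": 2.0},
    {"name": "C", "features": [0.2, 0.9], "detail": 2.5},
    {"name": "D", "features": [5.0, 5.0], "detail": 2.1},
]
query = {"features": [0.0, 1.0], "detail": 2.0}

expensive_calls = []
def expensive_score(q, case):
    expensive_calls.append(case["name"])  # record how often the costly step runs
    return abs(q["detail"] - case["detail"])

result = two_phase_retrieve(query, library, [1.0, 1.0], expensive_score, 3, 2)
print([case["name"] for case in result], len(expensive_calls))
```

The expensive scorer runs only on the shortlist, so its cost depends on the shortlist size and not on the library size. The trade-off is that a case filtered out in phase 1 (case D here) is never considered in phase 2, which is why the quality of the cheap distance matters.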
|
52 |
The Incremental Benefits of the Nearest Neighbor Forecast of U.S. Energy Commodity Prices / Kudoyan, Olga, December 2010
This thesis compares the simple Autoregressive (AR) model against the k-Nearest
Neighbor (k-NN) model to make a point forecast of five energy commodity
prices. Those commodities are natural gas, heating oil, gasoline, ethanol, and crude oil.
The data for the commodities are monthly and, for each commodity, two-thirds of the
data are used for an in-sample forecast, and the remaining one-third of the data are used
to perform an out-of-sample forecast. Mean Absolute Error (MAE) and Root Mean
Squared Error (RMSE) are used to compare the two forecasts. The results showed that
one method is superior by one measure but inferior by another. Although the differences
between the two models are minimal, the choice of model is left to the decision maker.
The Diebold-Mariano (DM) test was performed to test the relative accuracy of
the models. For all five commodities, the test failed to reject the null hypothesis
that both models are equally accurate.
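To make the comparison concrete, here is a small Python sketch of a one-step k-NN forecast for a univariate series, together with the two error measures. The abstract does not give the thesis's exact k-NN specification, so the window length `m`, the plain averaging of successors, and all function names are assumptions for illustration.

```python
import math

def knn_forecast(history, k, m):
    """Forecast the next value by averaging the successors of the k past
    windows of length m that are closest to the most recent window."""
    target = history[-m:]
    candidates = []
    for start in range(len(history) - m):
        window = history[start:start + m]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, target)))
        candidates.append((dist, history[start + m]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical periodic series, not real price data.
series = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
next_value = knn_forecast(series, k=2, m=2)
print(next_value, mae([1, 2, 3], [1, 2, 6]), rmse([1, 2, 3], [1, 2, 6]))
```

Because RMSE penalizes large errors more heavily than MAE, the two measures can rank the same pair of forecasts differently, which is consistent with the mixed result reported above.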
|
53 |
Ammunition Transfer System Optimization Problem / Gunsel, H. Sinem, 01 March 2012
The Ammunition Transfer System (ATS) is the electro-mechanical system of the Ammunition Resupply Vehicle (ARV), which will be used to meet the ammunition demand of T-155 mm Firtina howitzers under tactical requirements for a higher firing rate with off-road mobility and survivability. The transfer of ammunition from the ARV to the Firtina is to be optimized to improve the firing rate effectively.
In this thesis, the transfer order of the carried ammunition is optimized to minimize the total transfer time. The transfer problem is modeled as a modification of the Travelling Salesman Problem (TSP): the given locations of the ammunition are treated as the cities to be visited, and the gripper of the ATS is treated as the travelling salesman. Using GAMS, small-size problems are solved to optimality, but large-size ones reach only local optima. A heuristic algorithm that uses the nearest neighbor heuristic as the construction method and the 2-opt exchange heuristic as the improvement method is developed to obtain solutions equal to or better than those obtained by GAMS in less computational time.
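The construction-plus-improvement scheme named above can be sketched in Python on a plain Euclidean TSP. This is only an illustration of the two generic heuristics: the closed tour, the Euclidean cost, the sample coordinates, and the function names are assumptions, and the thesis's modified TSP model of the ATS is not reproduced here.

```python
import math

def tour_length(points, tour):
    """Length of the closed tour that visits points in the given order."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def nearest_neighbor_tour(points, start=0):
    """Construction: repeatedly move to the closest unvisited location."""
    unvisited = [i for i in range(len(points)) if i != start]
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(points, tour):
    """Improvement: reverse segments while doing so shortens the tour."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best

# Hypothetical ammunition locations (index 0 is the assumed gripper start).
points = [(0, 0), (4, 1), (1, 3), (5, 4), (2, 0), (3, 5), (0, 4)]
initial = nearest_neighbor_tour(points)
improved_tour = two_opt(points, initial)
print(tour_length(points, initial), tour_length(points, improved_tour))
```

2-opt only accepts strictly shorter tours, so the improved tour is never longer than the constructed one, but it is still a local optimum and not guaranteed to be the global one.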
|
54 |
A Local Expansion Approach for Continuous Nearest Neighbor Queries / Liu, Ta-Wei, 16 June 2008
Queries on spatial data commonly concern a certain range or area, for example, queries related to intersections, containment, and nearest neighbors. The Continuous Nearest Neighbor (CNN) query is one kind of nearest neighbor query. For example, people may want to know where the gas stations are along a highway from a starting position to an ending position. Because there is no total ordering of spatial proximity among spatial objects, the space-filling curve (SFC) approach has been proposed to preserve spatial locality. Chen and Chang have proposed efficient SFC-based algorithms to answer nearest neighbor queries, so a CNN query in a centralized system can be answered by performing a sequence of individual nearest neighbor queries with one of Chen and Chang's algorithms. However, the searched ranges of these nearest neighbor queries may overlap, and the queries may access the same disk pages several times, resulting in many redundant disk accesses. On the other hand, Zheng et al. have proposed a two-phase algorithm based on the Hilbert curve for the CNN query in the wireless broadcast environment. In the first phase, Zheng et al.'s algorithm designs a searched range to find candidate objects. In the second phase, it uses heuristics to filter the candidate objects for the final answer. However, Zheng et al.'s algorithm may check some data blocks twice or check useless data blocks, resulting in redundant disk accesses. Therefore, to avoid these disadvantages in the first phase of Zheng et al.'s algorithm, this thesis proposes a local expansion approach based on the Peano curve for the CNN query in the centralized system. In the first phase, we determine the searched range to obtain all candidate objects. We first calculate the route between the starting point and the ending point.
Then, we move forward one block at a time from the starting point to the ending point, locally expanding the searched range to find the candidate objects. In the second phase, we use the heuristics from Zheng et al.'s algorithm to filter the candidate objects for the final answer. Based on this approach, we propose two algorithms: the forward moving (FM) algorithm and the forward moving* (FM*) algorithm. The FM algorithm assumes that each object is at the center of a block, and the FM* algorithm assumes that each object may be anywhere in a block. Our local expansion approach avoids the duplicated checks in Zheng et al.'s algorithm and determines a searched range with higher accuracy than that of Zheng et al.'s algorithm. Our simulation results show that the FM and FM* algorithms outperform Zheng et al.'s algorithm in terms of accuracy and processing time.
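To show what "preserving spatial locality" with a space-filling curve means in practice, the Python sketch below maps grid cells to positions along a Hilbert curve, using the widely known iterative conversion. It is an illustration only: the thesis's own approach is based on the Peano curve, and neither its FM/FM* algorithms nor Zheng et al.'s algorithm is reproduced here. The function name and grid size are assumptions.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n-by-n grid,
    where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Order the cells of a 4-by-4 grid by their curve position.
n = 4
cells = sorted(((x, y) for x in range(n) for y in range(n)),
               key=lambda c: hilbert_index(n, c[0], c[1]))
print(cells)
```

Consecutive cells in this ordering are always grid neighbors, so objects stored in curve order tend to share disk pages with spatially nearby objects. The converse does not hold in general, which is why a searched range still has to be expanded around the query route.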
|
55 |
AKDB-Tree: An Adjustable KDB-tree for Efficiently Supporting Nearest Neighbor Queries in P2P Systems / Liu, Hung-ze, 06 July 2008
In the future, more data-intensive applications, such as P2P auction networks, P2P job-search networks, and P2P multi-player games, will require the capability to respond to more complex queries, such as nearest neighbor queries involving numerous data types. For answering nearest neighbor queries (NN queries) on spatial region data in the P2P environment, a quadtree-based structure is probably a good choice. However, the quadtree stores the data in the leaf nodes, resulting in load imbalance and a high cost for any query. The MX-CIF quadtree can solve this problem. It has three properties: efficiently controlling the height of the tree, reducing load imbalance, and reducing the NN query scope by controlling the value of the radius. Although the P2P MX-CIF quadtree can perform the NN query efficiently, it still has the following problems: low accuracy of the nearest neighbor query, high tree construction cost, high search cost of the NN query, and load imbalance. In fact, index structures for region data can also work for point data, which can be considered a degenerate case of region data. Therefore, the KDB-tree, a well-known index structure for point data, can be used to reduce load imbalance, but it has the same problem as the quadtree: the data is stored only in the leaf nodes. In this thesis, we propose an Adjustable KDB-tree (AKDB-tree) to improve this situation for the P2P system. The AKDB-tree has five properties: reducing load imbalance, low tree construction cost, storing data in both internal nodes and leaf nodes, high accuracy, and low search cost of the NN query. The Chord system is a well-known structured P2P system in which data search is performed by a hash function, instead of the flooding used in most unstructured P2P systems. Since the Chord system is a hash-based approach, it handles peers joining and exiting easily.
In addition, to combine the AKDB-tree with the Chord system, we design the IDs of the nodes in the AKDB-tree. Each node is hashed to the Chord system by its ID. The IDs can be used to determine whether an edge of a node in the AKDB-tree is a vertical edge or a horizontal edge, as well as the relative position of two nodes in the 2D space. We can also calculate the related edge of a region in the 2D space from the ID of the region. We make use of these properties of the IDs to reduce the search cost of the NN query by a wide margin. In our simulation study, we compare our method with the P2P MX-CIF quadtree using five performance measures under four different situations of the P2P MX-CIF quadtree. Our simulation results show that, for the NN query, the AKDB-tree provides higher accuracy and lower search cost than the P2P MX-CIF quadtree. In terms of load, the AKDB-tree is more balanced than the P2P MX-CIF quadtree. In terms of tree construction time, the AKDB-tree needs less time than the P2P MX-CIF quadtree.
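The step of hashing a tree node to the Chord system by its ID can be sketched in Python as below. Chord places both keys and peers on an identifier ring and assigns each key to its successor peer; the sketch shows only that placement rule with a centralized lookup. The ring size, the peer IDs, and the node ID string are hypothetical, the abstract does not describe the AKDB-tree's actual ID scheme, and Chord's distributed finger-table routing is not modeled.

```python
import bisect
import hashlib

RING_BITS = 16  # assumption: a small ring for readability

def ring_id(name, bits=RING_BITS):
    """Hash an arbitrary name onto the identifier ring with SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def successor(peer_ids, key):
    """First peer at or clockwise after the key; peer_ids must be sorted."""
    idx = bisect.bisect_left(peer_ids, key)
    return peer_ids[idx % len(peer_ids)]  # wrap around past the largest ID

# Hypothetical peers and a hypothetical tree node ID.
peers = sorted([10, 200, 4000, 52000])
node_key = ring_id("akdb-node-0-1")
print(node_key, successor(peers, node_key))
```

When a peer joins or leaves, only the keys between it and its predecessor on the ring change owner, which is the property that makes joins and exits inexpensive in a hash-based system.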
|
56 |
Improving WiFi positioning through the use of successive in-sequence signal strength samples / Hallström, Per; Dellrup, Per, January 2006
As portable computers and wireless networks are becoming ubiquitous, it is natural to consider the user’s position as yet another aspect to take into account when providing services that are tailored to meet the needs of the consumers. Location aware systems could guide persons through buildings, to a particular bookshelf in a library or assist in a vast variety of other applications that can benefit from knowing the user’s position.
In indoor positioning systems, the most commonly used method for determining the location is to collect samples of the strength of the received signal from each base station that is audible at the client’s position and then pass the signal strength data on to a positioning server that has been previously fed with example signal strength data from a set of reference points where the position is known. From this set of reference points, the positioning server can interpolate the client’s current location by comparing the signal strength data it has collected with the signal strength data associated with every reference point.
Our work proposes the use of multiple successive received signal strength samples in order to capture periodic signal strength variations that are the result of effects such as multi-path propagation, reflections and other types of radio interference. We believe that, by capturing these variations, it is possible to more easily identify a particular point; this is due to the fact that the signal strength fluctuations should be rather constant at every position, since they are the result of for example reflections on the fixed surfaces of the building’s interior.
For the purpose of investigating our assumptions, we conducted measurements at a site at Växjö university, where we collected signal strength samples at known points. With the data collected, we performed two different experiments: one with a neural network and one where the k-nearest-neighbor method was used for position approximation. For each of the methods, we performed the same set of tests with single signal strength samples and with multiple successive signal strength samples, to evaluate their respective performances.
We concluded that the k-nearest-neighbor method does not seem to benefit from multiple successive signal strength samples, at least not in our setup, compared to when using single signal strength samples. However, the neural network performed about 17% better when multiple successive signal strength samples were used.
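The k-nearest-neighbor positioning step can be sketched in Python as follows. This is a minimal sketch, not the authors' setup: the unweighted averaging of the k closest reference positions, the concatenation of successive samples into one vector, the function names, and the signal strength values are all assumptions for illustration.

```python
import math

def stack_samples(samples):
    """Concatenate successive signal strength samples into one feature vector."""
    return [value for sample in samples for value in sample]

def knn_position(fingerprints, observed, k):
    """Estimate a position as the mean of the k reference points whose stored
    signal vectors are closest (Euclidean) to the observed vector."""
    nearest = sorted(fingerprints, key=lambda fp: math.dist(fp[0], observed))[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return (x, y)

# Hypothetical reference points: (signal strengths in dBm per base station, position in metres).
fingerprints = [
    ([-40, -70], (0.0, 0.0)),
    ([-50, -60], (5.0, 0.0)),
    ([-70, -40], (10.0, 0.0)),
]
estimate = knn_position(fingerprints, [-48, -62], k=2)
print(estimate)

# With successive samples, both the stored and the observed vectors are stacked first.
stacked = stack_samples([[-40, -70], [-42, -69]])
print(stacked)
```

Stacking lengthens every vector, so the reference database must be built from the same number of successive samples as the observation.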
|
57 |
Time series discrimination, signal comparison testing, and model selection in the state-space framework / Bengtsson, Thomas, January 2000
Thesis (Ph. D.)--University of Missouri-Columbia, 2000. / Typescript. Vita. Includes bibliographical references (leaf 104). Also available on the Internet.
|
58 |
Time series discrimination, signal comparison testing, and model selection in the state-space framework / Bengtsson, Thomas, January 2000
Thesis (Ph. D.)--University of Missouri-Columbia, 2000. / Typescript. Vita. Includes bibliographical references (leaf 104). Also available on the Internet.
|
59 |
Nearest neighbor queries in spatial and spatio-temporal databases / Zhang, Jun, January 2003
Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 125-131). Also available in electronic version. Access restricted to campus users.
|
60 |
Small Scale Distribution of the Sand Dollars Mellita tenuis and Encope spp. (Echinodermata) / Swigart, James P., 01 January 2006
Small scale distributions of Mellita tenuis and Encope spp. were quantified at Fort De Soto Park on Mullet Key, off Egmont Key, and off Captiva Island, Florida, during 2005. Off Captiva Island, Encope spp. were aggregated in 33.3% of plots in March. Off Egmont Key, M. tenuis were aggregated in 100% of plots in March but in no plots in September. At Fort De Soto Park, M. tenuis were aggregated in 37.5% of plots in May, 12.5% in July, and 50.0% in September. Sand dollars in 6.3% of the plots in September at Fort De Soto had a uniform distribution. Individuals in all other plots at all sites had random distributions. At Fort De Soto, each plot was revisited a few hours after the initial observation; 37.5% of plots had a different distribution at the second observation.
Percent organic content of the smallest sediment grains (<105 μm) was not correlated with sand dollar distribution, except off Egmont Key, where there was a significant negative correlation between nearest neighbor index and percent organic content. Mellita tenuis do aggregate on occasion. The cause of aggregation is not known. If localized differences in percent organic content of the sediment influence distribution, then homogeneity in the percent organic content of the sediment, as found in the majority of plots, would suggest random distribution of sand dollars.
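For readers unfamiliar with a nearest neighbor index, the Python sketch below computes one common form, the Clark-Evans ratio of the observed mean nearest neighbor distance to the distance expected under complete spatial randomness. The abstract does not state which formula or edge correction the thesis used, so the choice of this index, the absence of edge correction, the function name, and the coordinates are assumptions.

```python
import math

def nearest_neighbor_index(points, area):
    """Clark-Evans ratio R: observed mean nearest neighbor distance divided by
    the expected distance 1 / (2 * sqrt(density)) for a random pattern."""
    n = len(points)
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Hypothetical 2 m x 2 m plots (area 4): evenly spaced versus tightly clumped individuals.
uniform_plot = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
clumped_plot = [(1.0, 1.0), (1.01, 1.0), (1.0, 1.01), (1.01, 1.01)]
r_uniform = nearest_neighbor_index(uniform_plot, 4.0)
r_clumped = nearest_neighbor_index(clumped_plot, 4.0)
print(r_uniform, r_clumped)
```

Values of R well below 1 indicate aggregation, values near 1 a random pattern, and values above 1 a uniform pattern. Deciding whether a given R differs significantly from 1 requires a separate test that is not shown here.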
|