Global ETD Search

71	Clusters Identification: Asymmetrical Case Mao, Qian January 2013 Cluster analysis is one of the typical tasks in data mining; it groups data objects based only on information found in the data that describes the objects and their relationships. The purpose of this thesis is to verify a modified K-means algorithm in asymmetrical cases, which can be regarded as an extension of the research of Vladislav Valkovsky and Mikael Karlsson in the Department of Informatics and Media. In this thesis an experiment is designed and implemented to identify clusters with the modified algorithm in asymmetrical cases. The Java application developed for the experiment is based on knowledge established in previous research. The development procedures are also described, and the input parameters are discussed along with the analysis. The experiment consists of several test suites, each of which simulates a situation that exists in the real world, and the test results are displayed graphically. The findings mainly emphasize the limitations of the algorithm, and future work to explore the algorithm further is also suggested. Modified K-means algorithm Nearest neighbor clustering Kolmogorov-Smirnov-test Hypothesis testing
72	Efficient Kernel Methods for Statistical Detection Su, Wanhua 20 March 2008 This research is motivated by a drug discovery problem -- the AIDS anti-viral database from the National Cancer Institute. The objective of the study is to develop effective statistical methods to model the relationship between the chemical structure of a compound and its activity against the HIV-1 virus. The resulting structure-activity model can be used to predict the activity of new compounds and thus help identify active chemical compounds that can be used as drug candidates. Since active compounds are generally rare in a compound library, we recognize the drug discovery problem as an application of the so-called statistical detection problem. In a typical statistical detection problem, we have data {Xi,Yi}, where Xi is the predictor vector of the ith observation and Yi ∈ {0,1} is its class label. The objective of a statistical detection problem is to identify class-1 observations, which are extremely rare. Besides drug discovery, other applications of statistical detection include direct marketing and fraud detection. We propose a computationally efficient detection method called LAGO, which stands for "locally adjusted GO estimator". The original idea is inspired by an ancient game known today as "GO". The construction of LAGO consists of two steps. In the first step, we estimate the density of class 1 with an adaptive bandwidth kernel density estimator. The kernel functions are located at and only at the class-1 observations. The bandwidth of the kernel function centered at a certain class-1 observation is calculated as the average distance between this class-1 observation and its K nearest class-0 neighbors. In the second step, we adjust the density estimated in the first step locally according to the density of class 0.
It can be shown that the amount of adjustment in the second step is approximately inversely proportional to the bandwidth calculated in the first step. Application to the NCI data demonstrates that LAGO is superior to methods such as K nearest neighbors and support vector machines. One drawback of the existing LAGO is that it only provides a point estimate of a test point's probability of being class 1, ignoring the uncertainty of the model. In the second part of this thesis, we present a Bayesian framework for LAGO, referred to as BLAGO. This Bayesian approach enables quantification of uncertainty. Non-informative priors are adopted. The posterior distribution is calculated over a grid of (K, alpha) pairs by integrating out beta0 and beta1 using the Laplace approximation, where K and alpha are the two parameters used to construct the LAGO score. The parameters beta0 and beta1 are the coefficients of the logistic transformation that converts the LAGO score to the probability scale. BLAGO provides proper probabilistic predictions that have support on (0,1) and captures the uncertainty of the predictions as well. By avoiding Markov chain Monte Carlo algorithms and using the Laplace approximation, BLAGO is computationally very efficient. Because it does not need cross-validation, BLAGO is even more computationally efficient than LAGO. statistical detection Bayesian inference LAGO Laplace approximation support vector machines k-nearest neighbor Statistics (Biostatistics)
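The first LAGO step described in this abstract can be illustrated with a minimal sketch. It covers only the adaptive-bandwidth density estimate for class 1; the second (local adjustment) step and the alpha parameter are not reproduced. The Gaussian kernel, the Euclidean distance, and all function names are assumptions made for illustration, not details taken from the thesis.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adaptive_bandwidths(class1, class0, k):
    """One bandwidth per class-1 point: the average distance from that
    point to its k nearest class-0 neighbors."""
    bandwidths = []
    for p in class1:
        dists = sorted(euclidean(p, q) for q in class0)
        bandwidths.append(sum(dists[:k]) / k)
    return bandwidths

def class1_density(x, class1, bandwidths):
    """Kernel density estimate at x with kernels placed only on class-1
    points. The isotropic Gaussian kernel is an assumption."""
    d = len(x)
    total = 0.0
    for p, h in zip(class1, bandwidths):
        z = euclidean(x, p) / h
        total += math.exp(-0.5 * z * z) / ((math.sqrt(2 * math.pi) * h) ** d)
    return total / len(class1)

# Hypothetical toy data: one rare class-1 point among class-0 points.
class1 = [(0.0, 0.0)]
class0 = [(1.0, 0.0), (0.0, 3.0), (10.0, 10.0)]
h = adaptive_bandwidths(class1, class0, k=2)  # mean of distances 1 and 3
```

A class-1 point surrounded closely by class-0 points gets a small bandwidth and therefore a narrow kernel, while an isolated class-1 point spreads its density more widely.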
74	Improving WiFi positioning through the use of successive in-sequence signal strength samples Hallström, Per, Dellrup, Per January 2006 As portable computers and wireless networks become ubiquitous, it is natural to consider the user’s position as yet another aspect to take into account when providing services tailored to the needs of consumers. Location-aware systems could guide people through buildings or to a particular bookshelf in a library, or assist in a wide variety of other applications that can benefit from knowing the user’s position. In indoor positioning systems, the most commonly used method for determining the location is to collect samples of the strength of the received signal from each base station that is audible at the client’s position and then pass the signal strength data on to a positioning server that has previously been fed with example signal strength data from a set of reference points whose positions are known. From this set of reference points, the positioning server can interpolate the client’s current location by comparing the signal strength data it has collected with the signal strength data associated with every reference point. Our work proposes the use of multiple successive received signal strength samples in order to capture periodic signal strength variations that result from effects such as multi-path propagation, reflections and other types of radio interference. We believe that, by capturing these variations, it is possible to identify a particular point more easily, because the signal strength fluctuations should be rather constant at every position, since they result from, for example, reflections on the fixed surfaces of the building’s interior. To investigate our assumptions, we conducted measurements at a site at Växjö University, where we collected signal strength samples at known points.
With the data collected, we performed two different experiments: one with a neural network and one where the k-nearest-neighbor method was used for position approximation. For each of the methods, we performed the same set of tests with single signal strength samples and with multiple successive signal strength samples, to evaluate their respective performances. We concluded that the k-nearest-neighbor method does not seem to benefit from multiple successive signal strength samples, at least not in our setup, compared to when using single signal strength samples. However, the neural network performed about 17% better when multiple successive signal strength samples were used. k-nearest-neighbor neural network positioning signal strength WiFi wireless networks
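The reference-point comparison described in this abstract can be sketched as a k-nearest-neighbor lookup in signal-strength space. This is a minimal illustration, not the authors' implementation: averaging the positions of the k closest reference points, the Euclidean distance, and the sample values are all assumptions.

```python
import math

def knn_position(sample, fingerprints, k=3):
    """Estimate a 2-D position from one signal-strength sample.

    sample       -- tuple of signal strengths, one per base station
    fingerprints -- list of (signal_strengths, (x, y)) reference points
    Returns the mean position of the k reference points whose recorded
    signal strengths are closest to the sample (assumed estimator).
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(sample, fp[0]))
    nearest = ranked[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return (x, y)

# Hypothetical reference points: (dBm per base station, known position).
reference = [
    ((-40.0, -70.0), (0.0, 0.0)),
    ((-45.0, -65.0), (2.0, 0.0)),
    ((-80.0, -30.0), (10.0, 10.0)),
]
estimate = knn_position((-42.0, -68.0), reference, k=2)
```

One way to feed multiple successive samples to the same function, under the same assumptions, would be to concatenate them into a single longer vector for both the sample and the reference points.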
75	The Study of Sino-American Relations in Northeast Asia : Conflict and Cooperation Huang, Shu-fen 29 November 2011 This study applies balance of power theory to explain how Sino-American relations in Northeast Asia involve both conflict and cooperation. The study analyzes two major aspects: regional security and the regional economy. Since the end of the Cold War, the power structure in Northeast Asia has been adjusting. China's power has risen and has profoundly influenced the economy and security of the region. The United States, out of consideration for its national interests, adopts "balancing" strategies to confront any possible threat. In general, conflicts may break out between the two countries, but there are also possibilities for the two sides to cooperate; for example, the denuclearization of the Korean Peninsula and the development of clean energy encourage China and the U.S. to collaborate with each other. The results of this study provide information for the Taiwanese Government to further develop national security and economic strategies. Denuclearization Energy development good-neighbor policy Balance of power Free Trade Zone
76	A Hilbert Curve-Based Algorithm for Order-Sensitive Moving KNN Queries Feng, Fei-Chung 11 July 2012 Because wireless communication technologies, positioning technologies, and mobile computing are developing quickly, mobile services are becoming practical and important in the management of big spatiotemporal databases. Mobile service users move only inside a spatial space, e.g., a country. They often issue the K Nearest Neighbor (kNN) query to obtain data objects reachable through the spatial database. The challenge for mobile services is how to efficiently return the data objects of interest to the corresponding mobile users. One type of kNN query problem is the order-sensitive moving kNN (order-sensitive MkNN) query problem. In the order-sensitive MkNN query problem, the query point is dynamic and unpredictable, and the kNN answers should be returned in real time and sorted by distance in ascending order. Therefore, how to return the kNN answers effectively, incrementally, and correctly is an important issue. Nutanong et al. have proposed the V-kNN algorithm to process the order-sensitive MkNN query. The V-kNN algorithm uses their V-diagram algorithm to generate the safe region. It also uses the Incremental Rank Updates (IRU) algorithm to handle the events that occur when the query point passes the bisectors or the boundary of the safe region. However, the V-kNN algorithm uses the BF-kNN algorithm to retrieve NNs, which is non-incremental. This makes the search time increase as the density of the objects increases. Moreover, they do not consider the situation in which multiple objects are at the same order, or the situation in which multiple events happen in a single step. These situations may cause the kNN answers to be incorrect. Therefore, in this thesis, we propose the Hilbert curve-based kNN (HC-kNN) algorithm to process the order-sensitive MkNN query.
The HC-kNN algorithm can handle the situation in which multiple events happen in a single step. We also propose a new data structure for the kNN answers. Next, we propose the Intersection of Perpendicular Bisectors (IPB) algorithm to handle order update events of the kNN answers. The IPB algorithm handles the situation in which multiple objects are at the same order. Finally, based on the Hilbert curve index, we propose the ONHC-kNN algorithm to get NNs incrementally and to generate the safe region. The safe region is not affected as the density of the objects increases. The safe region of our algorithm is larger than that of the V-kNN algorithm. Our simulation results show that the HC-kNN algorithm provides better performance than the V-kNN algorithm. Spatial database Real-Time Systems Hilbert curve K nearest neighbor Mobile service
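The Hilbert curve index mentioned in this abstract maps each grid cell to a position along a space-filling curve so that cells close on the curve are also close in space. The sketch below is the widely used iterative cell-to-index conversion, shown only to illustrate the index itself; it is not the HC-kNN or ONHC-kNN algorithm, and the function name and grid size are assumptions.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve, in the range 0 .. n*n - 1."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Order the cells of a small grid along the curve.
n = 8
cells = sorted((hilbert_index(n, x, y), (x, y))
               for x in range(n) for y in range(n))
```

Because consecutive index values always belong to neighboring cells, scanning objects in index order visits spatially nearby objects together, which is the locality property a Hilbert-curve-based kNN search relies on.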
77	Efficient case-based reasoning through feature weighting, and its application in protein crystallography Gopal, Kreshna 02 June 2009 Data preprocessing is critical for machine learning, data mining, and pattern recognition. In particular, selecting relevant and non-redundant features in high-dimensional data is important to efficiently construct models that accurately describe the data. In this work, I present SLIDER, an algorithm that weights features to reflect relevance in determining similarity between instances. Accurate weighting of features improves the similarity measure, which is useful in learning algorithms like nearest neighbor and case-based reasoning. SLIDER performs a greedy search for optimum weights in an exponentially large space of weight vectors. Exhaustive search being intractable, the algorithm reduces the search space by focusing on pivotal weights at which representative instances are equidistant to truly similar and different instances in Euclidean space. SLIDER then evaluates those weights heuristically, based on effectiveness in properly ranking pre-determined matches of a set of cases, relative to mismatches. I analytically show that by choosing feature weights that minimize the mean rank of matches relative to mismatches, the separation between the distributions of Euclidean distances for matches and mismatches is increased. This leads to a better distance metric, and consequently increases the probability of retrieving true matches from a database. I also discuss how SLIDER is used to improve the efficiency and effectiveness of case retrieval in a case-based reasoning system that automatically interprets electron density maps to determine the three-dimensional structures of proteins. Electron density patterns for regions in a protein are represented by numerical features, which are used in a distance metric to efficiently retrieve matching patterns by searching a large database.
These pre-selected cases are then evaluated by more expensive methods to identify truly good matches – this strategy speeds up the retrieval of matching density regions, thereby enabling fast and accurate protein model-building. This two-phase case retrieval approach is potentially useful in many case-based reasoning systems, especially those with computationally expensive case matching and large case libraries. Case-Based Reasoning Nearest Neighbor Learning Feature Selection Feature Weighting Protein Crystallography
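The evaluation criterion in this abstract, the mean rank of known matches under a weighted Euclidean distance, can be sketched as follows. This does not reproduce SLIDER's greedy search over pivotal weights; the function names, the weighting form, and the toy cases are assumptions for illustration only.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def mean_match_rank(query, cases, match_ids, weights):
    """Mean rank (1 = closest) of the known matches when all cases are
    sorted by weighted distance to the query. Lower is better."""
    ranked = sorted(cases,
                    key=lambda cid: weighted_distance(query, cases[cid], weights))
    ranks = [ranked.index(cid) + 1 for cid in match_ids]
    return sum(ranks) / len(ranks)

# Hypothetical case library: the second feature is noise for the true match.
cases = {"match": (0.1, 5.0), "miss1": (1.0, 0.0), "miss2": (2.0, 0.5)}
query = (0.0, 0.0)
rank_equal = mean_match_rank(query, cases, ["match"], weights=(1.0, 1.0))
rank_tuned = mean_match_rank(query, cases, ["match"], weights=(1.0, 0.0))
```

With equal weights the noisy feature pushes the true match to the bottom of the ranking; removing its weight brings the match to the top, which is the kind of improvement a weight search would look for.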
78	The Incremental Benefits of the Nearest Neighbor Forecast of U.S. Energy Commodity Prices Kudoyan, Olga 2010 December 1900 This thesis compares the simple Autoregressive (AR) model against the k-Nearest Neighbor (k-NN) model to make a point forecast of five energy commodity prices. Those commodities are natural gas, heating oil, gasoline, ethanol, and crude oil. The data for the commodities are monthly and, for each commodity, two-thirds of the data are used for an in-sample forecast, and the remaining one-third of the data are used to perform an out-of-sample forecast. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used to compare the two forecasts. The results showed that one method is superior by one measure but inferior by another. Although the differences between the two models are minimal, it is up to a decision maker as to which model to choose. The Diebold-Mariano (DM) test was performed to test the relative accuracy of the models. For all five commodities, the results failed to reject the null hypothesis, indicating that both models are equally accurate. Forecast k-Nearest Neighbor Regression Autoregression one-step-ahead forecast two-step-ahead forecast
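The two error measures named in this abstract can disagree about which forecast is better, because RMSE penalizes large individual errors more heavily than MAE. The sketch below uses invented numbers, not the thesis data, to show how one model can win on MAE and lose on RMSE.

```python
import math

def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Hypothetical prices and forecasts, chosen only to illustrate the effect.
actual = [10.0, 10.0, 10.0, 10.0]
model_a = [11.0, 9.0, 11.0, 9.0]    # four errors of size 1
model_b = [10.0, 10.0, 10.0, 13.0]  # one error of size 3

print(mae(actual, model_a), rmse(actual, model_a))  # 1.0 1.0
print(mae(actual, model_b), rmse(actual, model_b))  # 0.75 1.5
```

Here the second model has the lower MAE but the higher RMSE, which mirrors the situation where the choice of model depends on the measure the decision maker cares about.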
79	Ammunition Transfer System Optimization Problem Gunsel, H. Sinem 01 March 2012 The Ammunition Transfer System (ATS) is the electro-mechanical system of the Ammunition Resupply Vehicle (ARV), which will be used to meet the T-155 mm Firtina howitzers' ammunition demand for the tactical requirements of a higher firing rate, with off-road mobility and survivability. The transfer of ammunition from the ARV to the Firtina is to be optimized for an effective improvement of the firing rate. In this thesis the transfer order of the carried ammunition is optimized to minimize the total ammunition transfer time. This transfer problem is modeled as a modification of the Travelling Salesman Problem (TSP). The given locations of the ammunition are treated as cities to be visited and the gripper of the ATS is treated as the traveling salesman. With GAMS, small-size problems are solved optimally, but large-size ones reach only a local optimum. A heuristic algorithm that uses the nearest neighbor heuristic as the construction method and the 2-opt exchange heuristic as the improvement method is developed to obtain the same or better solutions than those obtained by GAMS in less computational time. Q Special Topics 172
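The heuristic named in this abstract combines a nearest neighbor construction with a 2-opt improvement pass. The sketch below shows that combination on a plain closed-tour TSP with Euclidean distances; the thesis's modifications for the gripper and ammunition layout are not reproduced, and the function names and sample points are assumptions.

```python
import math

def tour_length(tour, pts):
    """Length of the closed tour that visits pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def nearest_neighbor_tour(pts, start=0):
    """Construction step: always move to the closest unvisited point."""
    unvisited = set(range(len(pts))) - {start}
    tour = [start]
    while unvisited:
        last = pts[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, pts[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(tour, pts):
    """Improvement step: reverse segments while that shortens the tour."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, pts) < tour_length(best, pts) - 1e-12:
                    best = candidate
                    improved = True
    return best

# Hypothetical item locations.
pts = [(0, 0), (5, 1), (1, 4), (6, 5), (2, 2), (4, 3)]
start_tour = nearest_neighbor_tour(pts)
final_tour = two_opt(start_tour, pts)
```

The construction step is fast but greedy, so it can leave crossing edges; 2-opt removes such crossings and never returns a longer tour than the one it started from, although the result is still only a local optimum.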
80	A Local Expansion Approach for Continuous Nearest Neighbor Queries Liu, Ta-Wei 16 June 2008 Queries on spatial data commonly concern a certain range or area, for example, queries related to intersections, containment and nearest neighbors. The Continuous Nearest Neighbor (CNN) query is one kind of nearest neighbor query. For example, people may want to know where the gas stations are along the highway from a starting position to an ending position. Because there is no total ordering of spatial proximity among spatial objects, the space filling curve (SFC) approach has been proposed to preserve spatial locality. Chen and Chang have proposed efficient algorithms based on SFC to answer nearest neighbor queries, so we may perform a sequence of individual nearest neighbor queries to answer such a CNN query in the centralized system with one of Chen and Chang's algorithms. However, the searched ranges of these nearest neighbor queries may overlap, and the queries may access several of the same pages on the disk, resulting in many redundant disk accesses. On the other hand, Zheng et al. have proposed a two-phase algorithm based on the Hilbert curve for the CNN query in the wireless broadcast environment. In the first phase, Zheng et al.'s algorithm designs a searched range to find candidate objects. In the second phase, it uses some heuristics to filter the candidate objects for the final answer. However, Zheng et al.'s algorithm may check some data blocks twice or check some useless data blocks, resulting in redundant disk accesses. Therefore, in this thesis, to avoid these disadvantages in the first phase of Zheng et al.'s algorithm, we propose a local expansion approach based on the Peano curve for the CNN query in the centralized system. In the first phase, we determine the searched range to obtain all candidate objects. Basically, we first calculate the route between the starting point and the ending point.
Then, we move forward one block at a time from the starting point to the ending point, and locally expand the searched range to find the candidate objects. In the second phase, we use the heuristics from Zheng et al.'s algorithm to filter the candidate objects for the final answer. Based on this approach, we propose two algorithms: the forward moving (FM) algorithm and the forward moving* (FM*) algorithm. The FM algorithm assumes that each object is in the center of a block, and the FM* algorithm assumes that each object could be anywhere in a block. Our local expansion approach can avoid the duplicated checks in Zheng et al.'s algorithm, and determines a searched range with higher accuracy than that of Zheng et al.'s algorithm. Our simulation results show that the performance of the FM or FM* algorithm is better than that of Zheng et al.'s algorithm in terms of accuracy and processing time. Continuous Nearest Neighbor Space Filling Curve Spatial Database Spatial Locality Point Data
