
Identifying Calcium-Binding Sites and Predicting Disulfide Connectivity

Deng, Hai 06 August 2007
Most questions in proteomics require complex answers, yet graph theory, supervised learning, and statistical models can decompose complex questions into simple ones with simple answers. Experts in the field of protein study often address tasks that demand answers as complex as the questions; such answers may consist of multiple factors that must be weighed against each other to arrive at a globally satisfactory and consistent solution. To predict calcium binding in proteins, we construct a global oxygen contact graph of a protein, apply a graph algorithm to find oxygen clusters of fixed size four, and finally employ a geometric algorithm to judge whether each oxygen cluster is a calcium-binding site; we can also predict the locations of those sites. Furthermore, we construct a global oxygen contact graph that includes oxygen-bonded carbon atoms, apply a graph algorithm to find locally largest oxygen clusters, and design another geometric filter to exclude non-calcium-binding oxygen clusters. In addition, we apply observed chemical properties as a chemical filter to reject further non-calcium-binding oxygen clusters. To explore the characteristics of calcium-binding sites in proteins, we conduct a statistical survey of the geometric parameters and chemical properties of calcium-binding sites on four datasets compiled between 1994 and 2005. For the prediction of disulfide bond connectivity, we analyze protein sequences to predict the folding of proteins relative to their cystines using nearest neighbor methods. We extend a new pattern-wise method to all available template proteins and find a global pattern of cysteine pairing with a new descriptor, the cysteine separation profile on protein secondary structure.
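As a rough illustration of the clique-based search step, the sketch below builds an oxygen contact graph and enumerates clusters of exactly four mutually close oxygen atoms. The 6.0 Å cutoff, the function name, and the use of networkx are illustrative assumptions, not the thesis's actual parameters or implementation.

```python
# Build an oxygen contact graph, then enumerate 4-cliques: four oxygen
# atoms that are all pairwise within the contact cutoff. A geometric
# filter would then test each candidate cluster.
from itertools import combinations, takewhile
import math
import networkx as nx

def find_candidate_sites(oxygens, cutoff=6.0):
    """oxygens: list of (x, y, z) oxygen coordinates. cutoff is an
    assumed contact distance in angstroms, not the thesis's value."""
    g = nx.Graph()
    g.add_nodes_from(range(len(oxygens)))
    for i, j in combinations(range(len(oxygens)), 2):
        if math.dist(oxygens[i], oxygens[j]) <= cutoff:
            g.add_edge(i, j)
    # enumerate_all_cliques yields cliques in nondecreasing size, so we
    # can stop as soon as cliques grow past four atoms.
    return [c for c in takewhile(lambda c: len(c) <= 4,
                                 nx.enumerate_all_cliques(g))
            if len(c) == 4]
```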

Defining activity areas in the Early Neolithic site at Foeni-Salaş (southwest Romania): A spatial analytic approach with geographical information systems in archaeology

Lawson, Kathryn Sahara 20 September 2007
Over the years, a great deal of archaeological research has focused on the earliest farming cultures of Europe (i.e., the Early Neolithic). However, little effort has been expended to uncover the type and nature of daily activities performed within Early Neolithic dwellings, particularly in the Balkans. This thesis conducts a spatial analysis of the Early Neolithic pit house levels of the Foeni-Salaş site in southwest Romania, in the northern half of the Balkans, to determine the kinds and locations of activities that occurred in these pit houses, the characteristic Early Neolithic dwellings of the northern Balkans. The data are analyzed using Geographic Information Systems (GIS) technology in an attempt to identify non-random patterns that indicate how the pit house inhabitants used their space. Both visual and statistical (nearest neighbour) techniques are used to identify spatial patterns; an example of the latter appears after this paragraph. Spreadsheet data are incorporated into the map database in order to compare and contrast the results from the two techniques of analysis. Map data provide precise artefact locations, while spreadsheet data yield more generalized quad-centroid information. Unlike the mapped data, the spreadsheet data also include artefacts recovered in sieves. Utilizing both data types gives a more complex and fuller understanding of how space was used at Foeni-Salaş. The results show that different types of activity areas are present within each of the pit houses. Comparison of interior and exterior artefact distributions demonstrates that most activities took place within the pit houses. The activities identified include weaving, food preparation, butchering, hide processing, pottery making, ritual, and other activities related to the running of households. These activities were placed in specific locations relative to features within each pit house and to the physical structure of the pit house itself. This research adds to the growing body of archaeological research that implements GIS to answer questions and solve problems related to the spatial dimension of human behaviour.
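As a sketch of the nearest-neighbour technique mentioned above, the snippet below computes one standard formulation, the Clark-Evans index, which compares the mean observed nearest-neighbour distance with the distance expected under complete spatial randomness (R well below 1 suggests clustering). The function name and the use of SciPy are assumptions; the abstract does not specify the exact formulation used.

```python
# Clark-Evans nearest-neighbour index for a set of artefact locations:
# R = mean observed NN distance / expected NN distance under randomness.
import numpy as np
from scipy.spatial import cKDTree

def clark_evans(points, area):
    """points: (n, 2) array of artefact x/y coordinates;
    area: area of the study region in the same units squared."""
    pts = np.asarray(points, dtype=float)
    tree = cKDTree(pts)
    # k=2 returns each point's distance to itself and to its nearest
    # neighbour; column 1 is the true nearest-neighbour distance.
    dists, _ = tree.query(pts, k=2)
    observed = dists[:, 1].mean()
    expected = 0.5 / np.sqrt(len(pts) / area)
    return observed / expected
```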

Clusters Identification: Asymmetrical Case

Mao, Qian January 2013
Cluster analysis is one of the typical tasks in data mining: it groups data objects based only on information found in the data that describes the objects and their relationships. The purpose of this thesis is to verify a modified K-means algorithm in asymmetrical cases, which can be regarded as an extension of the research of Vladislav Valkovsky and Mikael Karlsson in the Department of Informatics and Media. An experiment is designed and implemented to identify clusters with the modified algorithm in asymmetrical cases. The Java application developed for the experiment is based on knowledge established in previous research; the development procedure is described and the input parameters are covered in the analysis. The experiment consists of several test suites, each of which simulates a real-world situation, and the test results are displayed graphically. The findings mainly emphasize the limitations of the algorithm, and future work for exploring the algorithm further is also suggested.
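For context, a minimal sketch of the standard K-means loop that such a modification would build on is shown below; the abstract does not specify the asymmetric modification itself, so a plain Euclidean assignment step is used here.

```python
# Lloyd's algorithm: alternate between assigning points to their nearest
# center and recomputing each center as its cluster mean.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # Recompute centers; keep the old center if its cluster emptied.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```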

Efficient Kernel Methods for Statistical Detection

Su, Wanhua 20 March 2008
This research is motivated by a drug discovery problem: the AIDS antiviral database from the National Cancer Institute. The objective of the study is to develop effective statistical methods to model the relationship between the chemical structure of a compound and its activity against the HIV-1 virus. The resulting structure-activity model can then be used to predict the activity of new compounds, helping to identify active chemical compounds that can serve as drug candidates. Since active compounds are generally rare in a compound library, we treat drug discovery as an application of the so-called statistical detection problem. In a typical statistical detection problem, we have data {Xi, Yi}, where Xi is the predictor vector of the ith observation and Yi ∈ {0, 1} is its class label. The objective is to identify class-1 observations, which are extremely rare. Besides drug discovery, other applications of statistical detection include direct marketing and fraud detection. We propose a computationally efficient detection method called LAGO, which stands for "locally adjusted GO estimator"; the original idea is inspired by the ancient game known today as GO. The construction of LAGO consists of two steps. In the first step, we estimate the density of class 1 with an adaptive-bandwidth kernel density estimator. The kernel functions are located at, and only at, the class-1 observations; the bandwidth of the kernel centered at a given class-1 observation is the average distance between that observation and its K nearest class-0 neighbors. In the second step, we adjust the density estimated in the first step locally according to the density of class 0. It can be shown that the amount of adjustment in the second step is approximately inversely proportional to the bandwidth calculated in the first step. Application to the NCI data demonstrates that LAGO is superior to methods such as K nearest neighbors and support vector machines. One drawback of the existing LAGO is that it provides only a point estimate of a test point's probability of being class 1, ignoring the uncertainty of the model. In the second part of this thesis, we present a Bayesian framework for LAGO, referred to as BLAGO, which enables quantification of that uncertainty. Non-informative priors are adopted. The posterior distribution is calculated over a grid of (K, alpha) pairs by integrating out beta0 and beta1 using the Laplace approximation, where K and alpha are the two parameters used to construct the LAGO score, and beta0 and beta1 are the coefficients of the logistic transformation that converts the LAGO score to the probability scale. BLAGO provides proper probabilistic predictions with support on (0, 1) and captures the uncertainty of the predictions. By avoiding Markov chain Monte Carlo algorithms and using the Laplace approximation, BLAGO is computationally very efficient; without the need for cross-validation, it is even more efficient than LAGO.
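A hedged sketch of LAGO's first step, as described in the abstract, might look as follows. The Gaussian kernel and the way the step-2 adjustment is folded in are my assumptions, not the thesis's exact formulation.

```python
# Adaptive-bandwidth kernel scoring: kernels sit only at class-1 points,
# and each kernel's bandwidth is that point's average distance to its K
# nearest class-0 neighbors.
import numpy as np
from scipy.spatial import cKDTree

def lago_scores(X1, X0, X_test, K=5):
    """X1: class-1 points, X0: class-0 points, X_test: points to score.
    Higher scores mean more class-1-like."""
    tree0 = cKDTree(X0)
    dist0, _ = tree0.query(X1, k=K)      # K nearest class-0 distances per class-1 point
    bw = dist0.mean(axis=1)              # adaptive bandwidth per class-1 point
    d = np.linalg.norm(X_test[:, None] - X1[None], axis=2)
    # Reading the abstract as "step 2 scales each term by its bandwidth",
    # the 1/bw kernel normalization cancels, leaving plain Gaussian bumps.
    return np.exp(-0.5 * (d / bw) ** 2).sum(axis=1)
```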

Improving WiFi positioning through the use of successive in-sequence signal strength samples

Hallström, Per, Dellrup, Per January 2006
As portable computers and wireless networks become ubiquitous, it is natural to consider the user's position as yet another aspect to take into account when providing services tailored to the needs of consumers. Location-aware systems could guide people through buildings, to a particular bookshelf in a library, or assist in a vast variety of other applications that benefit from knowing the user's position. In indoor positioning systems, the most common method for determining location is to collect samples of the received signal strength from each base station audible at the client's position, then pass the signal strength data on to a positioning server that has previously been fed example signal strength data from a set of reference points whose positions are known. From this set of reference points, the positioning server can interpolate the client's current location by comparing the collected signal strength data with the signal strength data associated with every reference point. Our work proposes the use of multiple successive received signal strength samples in order to capture periodic signal strength variations that result from effects such as multi-path propagation, reflections, and other types of radio interference. We believe that capturing these variations makes it easier to identify a particular point, because the signal strength fluctuations should be fairly constant at each position, being the result of, for example, reflections off the fixed surfaces of the building's interior. To investigate our assumptions, we conducted measurements at a site at Växjö University, where we collected signal strength samples at known points. With the collected data, we performed two experiments: one with a neural network and one using the k-nearest-neighbor method for position approximation. For each method, we ran the same set of tests with single signal strength samples and with multiple successive signal strength samples, to evaluate their respective performance. We concluded that the k-nearest-neighbor method does not seem to benefit from multiple successive signal strength samples, at least not in our setup, compared to single samples. The neural network, however, performed about 17% better when multiple successive signal strength samples were used.
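A minimal sketch of the k-nearest-neighbor fingerprinting step described above: the client's position is estimated as the centroid of the k reference points whose stored signal strength vectors are closest in signal space. The names and the Euclidean signal-space metric are assumptions.

```python
# Fingerprinting positioning: compare the observed RSSI vector against
# stored reference fingerprints and average the k best matches' positions.
import numpy as np

def knn_position(rssi, fingerprints, positions, k=3):
    """rssi: observed signal-strength vector (one entry per base station);
    fingerprints: (n, m) stored RSSI vectors at n reference points;
    positions: (n, 2) known coordinates of those reference points."""
    d = np.linalg.norm(np.asarray(fingerprints) - rssi, axis=1)
    nearest = np.argsort(d)[:k]          # indices of the k closest fingerprints
    return np.asarray(positions)[nearest].mean(axis=0)
```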

A Hilbert Curve-Based Algorithm for Order-Sensitive Moving KNN Queries

Feng, Fei-Chung 11 July 2012
As wireless communication, positioning, and mobile computing technologies develop rapidly, mobile services over large spatiotemporal databases are becoming practical and important. Mobile service users move within a bounded spatial region, e.g. a country, and often issue K Nearest Neighbor (kNN) queries to obtain data objects from the spatial database. The central challenge for mobile services is how to efficiently deliver the data objects of interest to the corresponding mobile users. One variant of the kNN query problem is the order-sensitive moving kNN (order-sensitive MkNN) query problem, in which the query point is dynamic and unpredictable, and the kNN answers must be returned in real time, sorted by distance in ascending order. How to compute the kNN answers efficiently, incrementally, and correctly is therefore an important issue. Nutanong et al. proposed the V*-kNN algorithm to process the order-sensitive MkNN query; it uses their V*-diagram algorithm to generate a safe region, and the Incremental Rank Updates (IRU) algorithm to handle events when the query point crosses bisectors or the boundary of the safe region. However, V*-kNN retrieves NNs with the non-incremental BF-kNN algorithm, so its search time grows as object density increases. Moreover, it does not consider the situation where multiple objects share the same rank, or where multiple events occur in a single step; these situations can make the kNN answers incorrect. In this thesis, we therefore propose the Hilbert curve-based kNN algorithm (HC-kNN) to process the order-sensitive MkNN query. HC-kNN can handle multiple events occurring in a single step, and we propose a new data structure for the kNN answers. Next, we propose the Intersection of Perpendicular Bisectors (IPB) algorithm to handle order-update events of the kNN answers, including the situation where multiple objects share the same rank. Finally, based on the Hilbert curve index, we propose the ONHC-kNN algorithm to retrieve NNs incrementally and to generate the safe region; this safe region is unaffected by increasing object density and is larger than that of the V*-kNN algorithm. Our simulation results show that HC-kNN outperforms V*-kNN.
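For illustration, the sketch below shows the standard mapping from grid coordinates to a Hilbert index and a naive way of drawing nearest-neighbor candidates outward along the curve. It mirrors the general idea of Hilbert-curve indexing only; the function names and grid size are assumptions, and this is not the thesis's ONHC-kNN algorithm.

```python
# Objects sorted by Hilbert key tend to preserve spatial locality, so
# neighbors along the curve are good nearest-neighbor candidates.
import bisect

def xy2d(n, x, y):
    """Map (x, y) on an n x n grid (n a power of two, coordinates in
    [0, n)) to the point's Hilbert-curve index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                       # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def knn_candidates(objects, query, k, n=1024):
    """Return ~2k objects whose Hilbert keys bracket the query's key."""
    keyed = sorted((xy2d(n, x, y), (x, y)) for x, y in objects)
    i = bisect.bisect_left(keyed, (xy2d(n, *query), tuple(query)))
    lo, hi = max(0, i - k), min(len(keyed), i + k)
    return [pt for _, pt in keyed[lo:hi]]
```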

Efficient case-based reasoning through feature weighting, and its application in protein crystallography

Gopal, Kreshna 02 June 2009
Data preprocessing is critical for machine learning, data mining, and pattern recognition. In particular, selecting relevant and non-redundant features in high-dimensional data is important for efficiently constructing models that accurately describe the data. In this work, I present SLIDER, an algorithm that weights features to reflect their relevance in determining similarity between instances. Accurate weighting of features improves the similarity measure, which is useful in learning algorithms such as nearest neighbor and case-based reasoning. SLIDER performs a greedy search for optimal weights in an exponentially large space of weight vectors. Since exhaustive search is intractable, the algorithm reduces the search space by focusing on pivotal weights at which representative instances are equidistant, in Euclidean space, to truly similar and truly different instances. SLIDER then evaluates those weights heuristically, based on their effectiveness in properly ranking pre-determined matches of a set of cases relative to mismatches. I show analytically that choosing feature weights that minimize the mean rank of matches relative to mismatches increases the separation between the distributions of Euclidean distances for matches and mismatches. This leads to a better distance metric and consequently increases the probability of retrieving true matches from a database. I also discuss how SLIDER is used to improve the efficiency and effectiveness of case retrieval in a case-based reasoning system that automatically interprets electron density maps to determine the three-dimensional structures of proteins. Electron density patterns for regions in a protein are represented by numerical features, which are used in a distance metric to efficiently retrieve matching patterns from a large database. These pre-selected cases are then evaluated by more expensive methods to identify truly good matches; this strategy speeds up the retrieval of matching density regions, enabling fast and accurate protein model building. This two-phase case retrieval approach is potentially useful in many case-based reasoning systems, especially those with computationally expensive case matching and large case libraries.
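A small sketch of the weighted distance at the heart of such retrieval is given below: features judged more relevant receive larger weights, sharpening the ranking of stored cases against a query. The function name is an assumption and the weights are placeholders, not SLIDER's learned weights.

```python
# Rank stored cases by weighted Euclidean distance to a query instance;
# the weight vector emphasizes features deemed relevant to similarity.
import numpy as np

def weighted_rank(query, cases, weights):
    """query: feature vector; cases: (n, m) matrix of stored cases;
    weights: length-m vector of non-negative feature weights."""
    w = np.asarray(weights, dtype=float)
    d = np.sqrt((w * (np.asarray(cases) - query) ** 2).sum(axis=1))
    return np.argsort(d)                 # case indices, best match first
```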

The Incremental Benefits of the Nearest Neighbor Forecast of U.S. Energy Commodity Prices

Kudoyan, Olga December 2010
This thesis compares the simple autoregressive (AR) model against the k-Nearest Neighbor (k-NN) model for making point forecasts of five energy commodity prices: natural gas, heating oil, gasoline, ethanol, and crude oil. The data are monthly and, for each commodity, two-thirds of the observations are used for in-sample estimation while the remaining one-third is used for an out-of-sample forecast. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used to compare the two forecasts. The results show that each method is superior by one measure but inferior by the other; since the differences between the two models are minimal, the choice between them is left to the decision maker. The Diebold-Mariano (DM) test was performed to test the relative accuracy of the models; for all five commodities, the test failed to reject the null hypothesis, indicating that the two models are equally accurate.
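As a sketch of how a k-NN point forecast of a univariate price series can work, the snippet below finds the k historical windows most similar to the most recent one and averages the values that followed them. The window length and k are illustrative assumptions, not the thesis's settings.

```python
# k-NN analog forecasting: match the latest window of prices against all
# historical windows and average the observations that followed the k
# closest matches.
import numpy as np

def knn_forecast(series, window=6, k=5):
    x = np.asarray(series, dtype=float)
    target = x[-window:]
    # Every complete historical window that has a known next value.
    dists = [(np.linalg.norm(x[s:s + window] - target), x[s + window])
             for s in range(len(x) - window)]
    dists.sort(key=lambda t: t[0])
    return np.mean([nxt for _, nxt in dists[:k]])
```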

Ammunition Transfer System Optimization Problem

Gunsel, H. Sinem 01 March 2012
The Ammunition Transfer System (ATS) is the electro-mechanical system of the Ammunition Resupply Vehicle (ARV) that will be used to meet the ammunition demand of T-155 mm Firtina howitzers, supporting the tactical requirement for a higher firing rate together with off-road mobility and survivability. The transfer of ammunition from the ARV to the Firtina must be optimized to effectively improve the firing rate. In this thesis, the order in which carried ammunition is transferred is optimized to minimize the total transfer time. The transfer problem is modeled as a modification of the Travelling Salesman Problem (TSP): the given ammunition locations are treated as cities to be visited and the ATS gripper as the traveling salesman. With GAMS, small instances are solved optimally, but large instances reach only local optima. A heuristic algorithm that combines a nearest neighbor construction method with a 2-opt exchange improvement method is developed to obtain solutions as good as or better than those from GAMS in less computational time.
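A compact sketch of the two-phase heuristic described above, under the assumption of a symmetric travel-time matrix between ammunition slots: a nearest neighbor tour construction followed by 2-opt exchange improvement of the open path (the gripper's start position is kept fixed).

```python
# Phase 1: greedy nearest-neighbor construction of a visiting order.
def nearest_neighbor_tour(dist, start=0):
    """dist: square matrix of travel times; returns a visiting order."""
    n = len(dist)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Phase 2: 2-opt improvement; reverse any segment whose reversal
# shortens the path, and repeat until no improving move remains.
def two_opt(tour, dist):
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 2):
            for j in range(i + 1, len(tour) - 1):
                before = dist[tour[i-1]][tour[i]] + dist[tour[j]][tour[j+1]]
                after = dist[tour[i-1]][tour[j]] + dist[tour[i]][tour[j+1]]
                if after < before:
                    tour[i:j+1] = tour[i:j+1][::-1]
                    improved = True
    return tour
```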
