Global ETD Search

91	The development of an intelligent, cloud-based remote monitoring management system Cheng, Wen-Hao 25 October 2012 In this thesis, a data collection application based on MapReduce programming is described. This application aims to continuously collect a temperature data stream from a specified set of sensors. Instead of collecting the temperature information of all the sensors on one machine, the sensors are divided into several subsets, each of which is handled as a Map task. In each Map task, the temperature data stream of the assigned sensors is collected continuously and stored in a predefined database. All the Map tasks can run simultaneously on several machines. This method can reduce the delay time and improve the efficiency of the data collection service, especially when a huge number of sensors are monitored remotely by a data center through the Internet. The values of the remote sensors can be used to predict their next values with methods such as linear regression and K-means, and these predictions can in turn be used to predict system alarms. Experimental results show that the proposed method is effective in temperature data collection and in carbon reduction. K-means Hadoop Data collection MapReduce Distributed programming Database Linear regression
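The partitioning idea in entry 91 can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's Hadoop implementation: a thread pool stands in for the cluster of Map workers, `read_sensor` is a hypothetical placeholder for a real remote read, and `predict_next` fits an ordinary least-squares line to equally spaced readings.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(sensor_ids, n_tasks):
    """Split the sensor list round-robin into n_tasks subsets, one per map task."""
    return [sensor_ids[i::n_tasks] for i in range(n_tasks)]


def read_sensor(sensor_id):
    # Hypothetical stand-in for a remote temperature read (assumption, not from the thesis).
    return 20.0 + (sensor_id % 5)


def map_task(subset):
    """Collect one reading from every sensor assigned to this task."""
    return {sid: read_sensor(sid) for sid in subset}


def collect(sensor_ids, n_tasks=4):
    """Run the map tasks concurrently and merge their partial results."""
    readings = {}
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        for partial in pool.map(map_task, partition(sensor_ids, n_tasks)):
            readings.update(partial)
    return readings


def predict_next(values):
    """Fit a least-squares line through (0, v0), (1, v1), ... and evaluate it at t = len(values).

    Needs at least two readings.
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    slope = sxy / sxx
    return mean_v + slope * (n - mean_t)
```

A caller might run `collect(list(range(100)), n_tasks=8)` on each polling cycle and pass the recent history of one sensor to `predict_next` to flag a reading that is trending toward an alarm threshold.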
92	Coevolution Based Prediction Of Protein-protein Interactions With Reduced Training Data Pamuk, Bahar 01 February 2009 Protein-protein interactions are important for the prediction of protein functions since two interacting proteins usually have similar functions in a cell. Available protein interaction networks are incomplete, but they can be used to predict new interactions in a supervised learning framework. However, when the known protein network includes a large number of protein pairs, the training time of the machine learning algorithm becomes quite long. In this thesis work, our aim is to predict protein-protein interactions with a known portion of the interaction network. We used Support Vector Machines (SVM) as the machine learning algorithm and used the already known protein pairs in the network. We chose to use phylogenetic profiles of proteins to form the feature vectors required for the learner since the similarity of two proteins in evolution gives a reasonable rating about whether the two proteins interact or not. For large data sets, the training time of SVM becomes quite long; therefore, we reduced the data size in a sensible way while keeping approximately the same prediction accuracy. We applied a number of clustering techniques to extract the most representative data and features in a two categorical framework. Knowing that the training data set is a two dimensional matrix, we applied data reduction methods in both dimensions, i.e., both in data size and in feature vector size. We observed that the data clustered by the k-means clustering technique gave superior results in prediction accuracies compared to another data clustering algorithm which was also developed for reducing data size for SVM training. Still, the true positive and false positive rates (TPR-FPR) of the training data sets constructed by the two clustering methods did not give satisfying results about which method outperforms the other. 
On the other hand, we applied feature selection methods on the feature vectors of training data by selecting the most representative features in biological and in statistical meaning. We used the phylogenetic tree of organisms to identify the organisms which are evolutionarily significant. Additionally, we applied Fisher's test method to select the features which are most representative statistically. The accuracy and TPR-FPR values obtained by the feature selection methods did not allow a definite conclusion about their relative performance. However, it can be mentioned that the phylogenetic tree method resulted in acceptable prediction values when compared to Fisher's test. QA Computer Software 76.75-76.765
93	Image Annotation With Semi-supervised Clustering Sayar, Ahmet 01 December 2009 Image annotation is defined as generating a set of textual words for a given image, learning from the available training data consisting of visual image content and annotation words. Methods developed for image annotation usually make use of region clustering algorithms to quantize the visual information. Visual codebooks are generated from the region clusters of low level visual features. These codebooks are then matched with the words of the text document related to the image, in various ways. In this thesis, we propose a new image annotation technique, which improves the representation and quantization of the visual information by employing the available but unused information, called side information, which is hidden in the system. This side information is used to semi-supervise the clustering process which creates the visterms. The selection of side information depends on the visual image content, the annotation words and the relationship between them. Although there may be many different ways of defining and selecting side information, in this thesis, three types of side information are proposed. The first one is the hidden topic probability information obtained automatically from the text document associated with the image. The second one is the orientation and the third one is the color information around interest points that correspond to critical locations in the image. The side information provides a set of constraints in a semi-supervised K-means region clustering algorithm. Consequently, in generating the visual terms (visterms) from the regions, not only are low level features clustered, but side information is also used to complement the visual information. This complementary information is expected to close the semantic gap between the low level features extracted from each region and the high level textual information. 
Therefore, a better match between visual codebook and the annotation words is obtained. Moreover, a speedup is obtained in the modified K-means algorithm because of the constraints brought by the side information. The proposed algorithm is implemented in a high performance parallel computation environment. QA General 15707
94	Optimal Location for a Mobile Base Station in a Complex Network Moazzami, Farzad, Dean, Richard, Astatke, Yacob 10 1900 ITC/USA 2013 Conference Proceedings / The Forty-Ninth Annual International Telemetering Conference and Technical Exhibition / October 21-24, 2013 / Bally's Hotel & Convention Center, Las Vegas, NV / The focus of this work is the development of a complete network architecture to enhance telemetry performance using a mobile base station (MBS). The present study proposes a means of enabling both the mobile ad-hoc network (MANET) and a cellular network to operate simultaneously within the same spectrum. In this paper, the application of a modified k-means clustering algorithm to organize several hundred TAs in a complex network environment is presented. A mobile base station is added to the network to locate the congested area and support the network by positioning itself in the mixed network environment. A scenario with two base stations (one mobile and one stationary) is simulated and results are presented. It is observed that the use of an additional mobile base station could greatly increase the quality of communication by providing uniform distribution of node traffic and interference across the clusters in a complex telemetry environment with several hundred TAs. Ad-hoc networks K-means clustering Mixed Networks Spectrum Efficiency QoS
95	Telemetry Network Intrusion Detection System Maharjan, Nadim, Moazzemi, Paria 10 1900 ITC/USA 2012 Conference Proceedings / The Forty-Eighth Annual International Telemetering Conference and Technical Exhibition / October 22-25, 2012 / Town and Country Resort & Convention Center, San Diego, California / Telemetry systems are migrating from links to networks. Security solutions that simply encrypt radio links no longer protect the network of Test Articles or the networks that support them. The use of network telemetry is dramatically expanding and new risks and vulnerabilities are challenging issues for telemetry networks. Most of these vulnerabilities are silent in nature and cannot be detected with simple tools such as traffic monitoring. The Intrusion Detection System (IDS) is a security mechanism suited to telemetry networks that can help detect abnormal behavior in the network. Our previous research in Network Intrusion Detection Systems focused on "Password" attacks and "Syn" attacks. This paper presents a generalized method that can detect both "Password" and "Syn" attacks. In this paper, a K-means Clustering algorithm is used for vector quantization of network traffic. This reduces the scope of the problem by reducing the entropy of the network data. In addition, a Hidden Markov Model (HMM) is employed to further characterize and analyze the behavior of the network into states that can be labeled as normal, attack, or anomaly. Our experiments show that IDS can discover and expose telemetry network vulnerabilities using Vector Quantization and the Hidden Markov Model, providing a more secure telemetry environment. Our paper shows how these can be generalized into a Network Intrusion system that can be deployed on telemetry networks. Intrusion Detection System Vector Quantization K-means Clustering Hidden Markov Model Security iNET
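The vector quantization step in entry 95 can be illustrated with a minimal k-means sketch. It assumes each traffic sample has already been turned into a numeric feature vector; the feature choice, the value of `k`, and all function names here are illustrative assumptions rather than the paper's design. Each vector is replaced by the index of its nearest centroid, which yields the kind of discrete symbol sequence an HMM could consume.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point (lowest index wins ties)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign points to centroids, then recompute the means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def quantize(points, centroids):
    """Map each feature vector to a discrete symbol: the index of its nearest centroid."""
    return [nearest(p, centroids) for p in points]
```

Quantizing shrinks a continuous feature space to `k` symbols, which is one way to read the abstract's remark about reducing the entropy of the network data before the HMM stage.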
96	Examination of Initialization Techniques for Nonnegative Matrix Factorization Frederic, John 21 November 2008 While much research has been done regarding different Nonnegative Matrix Factorization (NMF) algorithms, less time has been spent looking at initialization techniques. In this thesis, four different initializations are considered. After a brief discussion of NMF, the four initializations are described and each one is independently examined, followed by a comparison of the techniques. Next, each initialization's performance is investigated with respect to the changes in the size of the data set. Finally, a method by which smaller data sets may be used to determine how to treat larger data sets is examined. Nonnegative matrix factorization Initialization Spherical K-means Compression ratio Percent error Random Acol Random C Mathematics
97	Stability Selection of the Number of Clusters Reizer, Gabriella v 18 April 2011 Selecting the number of clusters is one of the greatest challenges in clustering analysis. In this thesis, we propose a variety of stability selection criteria based on cross validation for determining the number of clusters. Clustering stability measures the agreement of clusterings obtained by applying the same clustering algorithm on multiple independent and identically distributed samples. We propose to measure the clustering stability by the correlation between two clustering functions. These criteria are motivated by the concept of clustering instability proposed by Wang (2010), which is based on a form of clustering distance. In addition, the effectiveness and robustness of the proposed methods are numerically demonstrated on a variety of simulated and real world samples. Consistency Cross validation Hierarchical clustering Instability k-means clustering Spectral clustering Stability Mathematics
98	Text Clustering with String Kernels in R Karatzoglou, Alexandros, Feinerer, Ingo January 2006 We present a package which provides a general framework, including tools and algorithms, for text mining in R using the S4 class system. Using this package and the kernlab R package, we explore the use of kernel methods for clustering (e.g., kernel k-means and spectral clustering) on a set of text documents, using string kernels. We compare these methods to a more traditional clustering technique, k-means on a bag-of-words representation of the text, and evaluate the viability of kernel-based methods as a text clustering technique. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
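To make the notion of a string kernel in entry 98 concrete, here is a small sketch of one common variant, the spectrum (n-gram) kernel. Choosing this particular kernel and writing it in Python are assumptions made for illustration; the report itself works in R with the kernlab package, and its abstract does not say which string kernels were used.

```python
from collections import Counter
from math import sqrt


def spectrum(text, n=3):
    """Count every contiguous substring of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a, b, n=3):
    """Inner product of the two n-gram count vectors."""
    sa, sb = spectrum(a, n), spectrum(b, n)
    return sum(count * sb[gram] for gram, count in sa.items())


def normalized_kernel(a, b, n=3):
    """Kernel value scaled so that any string with at least one n-gram scores 1.0 against itself."""
    denom = sqrt(spectrum_kernel(a, a, n) * spectrum_kernel(b, b, n))
    return spectrum_kernel(a, b, n) / denom if denom else 0.0
```

A matrix of such pairwise values over a document collection is the kind of input that kernel k-means or spectral clustering operates on, in place of explicit bag-of-words vectors.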
99	Towards Efficient Certificate Revocation Status Validation in Vehicular Ad Hoc Networks with Data Mining Zhang, Qingwei 26 November 2012 Vehicular Ad hoc Networks (VANETs) are emerging as a promising approach to improving traffic safety and providing a wide range of wireless applications for drivers and passengers. To perform reliable and trusted vehicular communications, one prerequisite is to ensure a peer vehicle’s credibility by validating the digital certificates attached to messages that are sent out by other vehicles. However, in vehicular communication systems, certificate validation is more time consuming than in traditional networks, because each vehicle receives a large number of messages in a short period of time. Another issue that needs to be addressed is the unsuccessful delivery of information between vehicles and other entities on the road as a result of their high mobility rate. For these reasons, we need new solutions to accelerate the process of certificate validation. In this thesis, we propose a certificate revocation status validation scheme using the concept of clustering, based on data mining practices, which can meet the aforementioned requirements. We employ the technique of k-means clustering to boost the efficiency of certificate validation, thereby enhancing the security of a vehicular ad hoc network. Additionally, a comprehensive analysis of the security of the proposed scheme is presented. The analytical results demonstrate that this scheme can effectively improve the validation of certificates and thus secure the vehicular communication in vehicular networks. VANETs digital certificates Certificate Revocation List k-Means authentication certificate validation
100	Color Range Determination and Alpha Matting for Color Images Luo, Zhenyi 02 November 2011 This thesis proposes a new chroma keying method that can automatically detect background, foreground, and unknown regions. For background color detection, we use K-means clustering in color space to calculate the limited number of clusters of background colors. We use spatial information to clean the background regions and minimize the unknown regions. Our method needs only minimal input from the user. For unknown regions, we implement the alpha matte based on Wang's robust matting algorithm, which is considered one of the best algorithms in the literature, if not the best. Wang's algorithm is based on a modified random walk. We propose a better color selection method, which improves matting results in the experiments. In the thesis, a detailed implementation of robust matting is provided. The experimental results demonstrate that our proposed method can handle images with one background color, images with gridded background, and images with difficult regions such as complex hair stripes and semi-transparent clothes. Chroma keying Alpha matting Foreground extraction Modified random walk K-means

Search results