Global ETD Search

101	Automatic Detection of Abnormal Behavior in Computing Systems Roberts, James Frank 01 January 2013 I present RAACD, a software suite that detects misbehaving computers in large computing systems and presents information about those machines to the system administrator. I build this system using preexisting anomaly detection techniques. I evaluate my methods using simple synthesized data, real data containing coerced abnormal behavior, and real data containing naturally occurring abnormal behavior. I find that the system adequately detects abnormal behavior and significantly reduces the amount of uninteresting computer health data presented to a system administrator. Anomaly detection computer system health monitoring Symbolic Aggregate Approximation SAX Digital Communications and Networking OS and Networks Theory and Algorithms
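The keywords of entry 101 name Symbolic Aggregate Approximation (SAX). The following is a minimal sketch of textbook SAX, not RAACD's own code: it z-normalizes a series, averages it into equal-width segments (PAA), and maps each average to a letter using equiprobable Gaussian breakpoints. The function name `sax`, the four-letter alphabet, and the assumptions that the series has non-zero variance and a length divisible by the segment count are all illustrative.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (illustrative sketch)."""
    # Step 1: z-normalize (assumes the series is not constant).
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma for v in series]

    # Step 2: piecewise aggregate approximation over equal-width segments
    # (assumes len(series) is divisible by `segments`).
    size = len(z) // segments
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(segments)]

    # Step 3: breakpoints that split the standard normal into
    # len(alphabet) equiprobable regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # Step 4: map each segment mean to the letter of its region.
    return "".join(alphabet[bisect_right(cuts, v)] for v in paa)
```

A series that sits low and then jumps high, such as `[0, 0, 0, 0, 10, 10, 10, 10]` with two segments, maps to the word `"ad"`; an anomaly detector could then compare such words across machines or time windows.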
102	Optimized Forecasting of Dominant U.S. Stock Market Equities Using Univariate and Multivariate Time Series Analysis Methods Schwartz, Michael 01 May 2017 This dissertation documents an investigation into forecasting U.S. stock market equities via two very different time series analysis techniques: 1) autoregressive integrated moving average (ARIMA), and 2) singular spectrum analysis (SSA). Approximately 40% of the S&P 500 stocks are analyzed. Forecasts are generated for one and five days ahead using daily closing prices. Univariate and multivariate structures are applied and results are compared. One objective is to explore the hypothesis that a multivariate model produces superior performance over a univariate configuration. Another objective is to compare the forecasting performance of ARIMA to SSA, as SSA is a relatively recent development and has shown much potential. Stochastic characteristics of stock market data are analyzed and found to be definitely not Gaussian, but instead better fit to a generalized t-distribution. Probability distribution models are validated with goodness-of-fit tests. For analysis, stock data is segmented into non-overlapping time “windows” to support unconditional statistical evaluation. Univariate and multivariate ARIMA and SSA time series models are evaluated for independence. ARIMA models are found to be independent, but SSA models are not able to reach independence. Statistics for out-of-sample forecasts are computed for every stock in every window, and multivariate-univariate confidence interval shrinkages are examined. Results are compared for univariate, bivariate, and trivariate combinations of highly-correlated stocks. Effects are found to be mixed. Bivariate modeling and forecasting with three different covariates are investigated. 
Examination of results with covariates of trading volume, principal component analysis (PCA), and volatility reveal that PCA exhibits the best overall forecasting accuracy in the entire field of investigated elements, including univariate models. Bivariate-PCA structures are applied in a back-testing environment to evaluate economic significance and robustness of the methods. Initial results of back-testing yielded similar results to those from earlier independent testing. Inconsistent performance across test intervals inspired the development of a second technique that yields improved results and positive economic significance. Robustness is validated through back-testing across multiple market trends. Forecasting univariate multivariate time series analysis Other Computer Sciences Other Economics Theory and Algorithms
103	Spatial Data Mining Analytical Environment for Large Scale Geospatial Data Yang, Zhao 16 December 2016 Nowadays, many applications are continuously generating large-scale geospatial data. Vehicle GPS tracking data, aerial surveillance drones, LiDAR (Light Detection and Ranging), world-wide spatial networks, and high resolution optical or Synthetic Aperture Radar imagery data all generate a huge amount of geospatial data. However, as data collection increases, our ability to process this large-scale geospatial data in a flexible fashion remains limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a “Big Data” infrastructure. Existing Big Data solutions do not include a specific mechanism to analyze large-scale geospatial data. In this work, we extend HBase with a spatial index (R-tree) and HDFS to support geospatial data and demonstrate its analytical use with some common geospatial data types and data mining technology provided by the R language. The resulting framework has a robust capability to analyze large-scale geospatial data using spatial data mining and to make its outputs available to end users. Databases and Information Systems Systems Architecture Theory and Algorithms
104	A Performance Analysis of Distributed Algorithms in JavaSpaces, CORBA Services and Web Services Sunku, Suresh 01 January 2003 Implementing distributed parallel algorithms on networked computers was very difficult until the introduction of service-oriented architectures (SOAs) such as JavaSpaces service, CORBA services and Web Services. Algorithms that follow the Master/Worker pattern are implemented with relative ease using these SOAs. This project analyzes the performance of such algorithms on three contemporary SOAs, namely JavaSpaces service, CORBA services and Web Services. These architectures make the implementations of distributed algorithms reasonably fault tolerant and highly and dynamically scalable. Also, the systems built on these architectures are generally loosely coupled and operate asynchronously. In this project we measure and analyze the latency, speed-up and efficiency metrics of an insertion sort of O(n^2) complexity on all three SOAs. We then draw conclusions about overall performance and scalability on all three architectures. UNF University of North Florida Computer Sciences Systems Architecture Theory and Algorithms
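The speed-up and efficiency metrics named in entry 104 are conventionally defined as S = T_serial / T_parallel and E = S / p for p workers. The sketch below assumes those standard definitions; the abstract does not state which formulas the project used, and the function and parameter names are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speed-up S: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Efficiency E = S / p: the fraction of ideal linear speed-up achieved."""
    return speedup(t_serial, t_parallel) / workers
```

For a hypothetical run that takes 10 s serially and 2.5 s on 8 workers, the speed-up is 4 and the efficiency is 0.5, meaning half of the ideal linear scaling is lost to communication and coordination overhead.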
105	A Generalized Adaptive Mathematical Morphological Filter for LIDAR Data Cui, Zheng 14 November 2013 Airborne Light Detection and Ranging (LIDAR) technology has become the primary method to derive high-resolution Digital Terrain Models (DTMs), which are essential for studying Earth’s surface processes, such as flooding and landslides. The critical step in generating a DTM is to separate ground and non-ground measurements in a voluminous point LIDAR dataset, using a filter, because the DTM is created by interpolating ground points. As one of the widely used filtering methods, the progressive morphological (PM) filter has the advantages of classifying the LIDAR data at the point level, linear computational complexity, and preservation of the geometric shapes of terrain features. The filter works well in an urban setting with a gentle slope and a mixture of vegetation and buildings. However, the PM filter often removes ground measurements incorrectly in topographically high areas, along with large non-ground objects, because it uses a constant threshold slope, resulting in “cut-off” errors. A novel cluster analysis method was developed in this study and incorporated into the PM filter to prevent the removal of the ground measurements at topographic highs. Furthermore, to obtain the optimal filtering results for an area with undulating terrain, a trend analysis method was developed to adaptively estimate the slope-related thresholds of the PM filter based on changes of topographic slopes and the characteristics of non-terrain objects. The comparison of the PM and generalized adaptive PM (GAPM) filters for selected study areas indicates that the GAPM filter preserves most of the “cut-off” points removed incorrectly by the PM filter. The application of the GAPM filter to seven ISPRS benchmark datasets shows that the GAPM filter reduces the filtering error by 20% on average, compared with the method used by the popular commercial software TerraScan. 
The combination of the cluster method, adaptive trend analysis, and the PM filter allows users without much experience in processing LIDAR data to effectively and efficiently identify ground measurements for the complex terrains in a large LIDAR data set. The GAPM filter is highly automatic and requires little human input. Therefore, it can significantly reduce the effort of manually processing voluminous LIDAR measurements. LIDAR Digital terrain model (DTM) data filtering geospatial data analytics Geotechnical Engineering Other Computer Sciences Other Earth Sciences Theory and Algorithms
106	The window least mean square error algorithm Degtyarena, Anna Semenovna 01 January 2003 To improve the performance of the LMS (least mean square) algorithm by reducing the amount of computation, this research proposes performing an update at each step only for those elements of the input data set that fall within a small window W near the separating hyperplane. This work aims to describe in detail the results that can be achieved by using the proposed LMS-with-window learning algorithm in information systems that employ neural network methodology for classification. Least squares Computer algorithms Expert systems (Computer science) Perceptrons Theory and Algorithms
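The windowed update in entry 106 can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a single linear unit with a bias, labels in {-1, +1}, and a window defined as |w·x + b| ≤ W, and the function name, learning rate, and epoch count are hypothetical.

```python
def window_lms(samples, labels, window, lr=0.05, epochs=50):
    """LMS (Widrow-Hoff) training that updates only on samples whose
    current output lies within `window` of the separating hyperplane."""
    w = [0.0] * len(samples[0])
    b = 0.0
    updates = 0  # counts how many samples actually triggered an update
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            out = sum(wi * xi for wi, xi in zip(w, x)) + b
            # The hyperplane is where out == 0; skip samples far from it.
            if abs(out) > window:
                continue
            err = y - out  # standard LMS error against the target label
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
            updates += 1
    return w, b, updates
```

Because samples that are already far from the hyperplane are skipped, the number of updates can be much smaller than `epochs * len(samples)`, which is the saving in computation the abstract describes.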
107	DiSH: Democracy in State Houses Russo, Nicholas A 01 February 2019 In our current political climate, state level legislators have become increasingly important. Due to cuts in funding and growing focus at the national level, public oversight for these legislators has drastically decreased. This makes it difficult for citizens and activists to understand the relationships and commonalities between legislators. This thesis provides three contributions to address this issue. First, we created a data set containing over 1200 features focused on a legislator’s activity on bills. Second, we created embeddings that represented a legislator’s level of activity and engagement for a given bill using a custom model called Democracy2Vec. Third, we provided a case study focused on the 2015-2016 California State Legislature and had our results verified by a political expert. Our results show that our embeddings can explain relationships between legislators and how they will likely act during the legislative process. Digital Democracy Machine Learning Embeddings Neural Networks Clusterings American Politics Artificial Intelligence and Robotics Databases and Information Systems Software Engineering Theory and Algorithms
108	RISK Gameplay Analysis Using Stochastic Beam Search Gillenwater, Jacob 01 May 2022 Hasbro’s RISK, first published in 1959, is a complex multiplayer strategy game that has received little attention from the scientific community. Training artificial intelligence (AI) agents using stochastic beam search gives insight into effective strategy when playing RISK. A comprehensive analysis of the systems of play challenges preconceptions about good strategy in some areas of the game while reinforcing those preconceptions in others. This study applies stochastic beam search to discover optimal strategies in RISK. Results of the search show both support for and challenges to traditionally held positions about RISK gameplay. While stochastic beam search competently investigates gameplay on a turn-by-turn basis, the search cannot create contingencies that allow for effective strategy across multiple turns. Future work would investigate additional algorithms that eliminate this limitation to provide further insights into optimal gameplay strategies. Stochastic Beam Search Genetic Algorithm Artificial Intelligence RISK Strategy Games Artificial Intelligence and Robotics Software Engineering Theory and Algorithms
109	Designing Superior Evolutionary Algorithms via Insights From Black-Box Complexity Theory / Conception de meilleurs algorithmes évolutionnaires grâce à la théorie de la complexité boîte noire Yang, Jing 04 September 2018 It has been observed that the runtime of randomized search heuristics depends on one or more parameters. A number of results show an advantage of dynamic parameter settings, that is, the parameters of the algorithm are changed during its execution. 
In this work, we prove that the unary unbiased black-box complexity of the OneMax benchmark function class is $n \ln(n) - cn \pm o(n)$ for a constant $c$ which is between $0.2539$ and $0.2665$. This runtime can be achieved with a simple (1+1)-type algorithm using a fitness-dependent mutation strength. When translated into the fixed-budget perspective, our algorithm finds solutions which are roughly 13% closer to the optimum than those of the best previously known algorithms. Based on the analyzed optimal mutation strength for OneMax, we show that a self-adjusting choice of the number of bits to be flipped attains the same runtime (apart from $o(n)$ lower-order terms) and the same (asymptotic) 13% fitness-distance improvement over RLS. The adjusting mechanism is to adaptively learn the currently optimal mutation strength from previous iterations. This aims both at exploiting that generally different problems may need different mutation strengths and that for a fixed problem different strengths may become optimal in different stages of the optimization process. We then extend our self-adjusting strategy to population-based evolutionary algorithms in discrete search spaces. Roughly speaking, it consists of creating half the offspring with a mutation rate that is twice the current mutation rate and the other half with half the current rate. The mutation rate is then updated to the rate used in that subpopulation which contains the best offspring. We analyze how the $(1+\lambda)$ evolutionary algorithm with this self-adjusting mutation rate optimizes the OneMax test function. We prove that this dynamic version of the $(1+\lambda)$ EA finds the optimum in an expected optimization time (number of fitness evaluations) of $O(n\lambda/\log\lambda + n\log n)$. This time is asymptotically smaller than the optimization time of the classic $(1+\lambda)$ EA. 
Previous work shows that this performance is best-possible among all $\lambda$-parallel mutation-based unbiased black-box algorithms. We also propose and analyze a self-adaptive version of the $(1,\lambda)$ evolutionary algorithm in which the current mutation rate is part of the individual and thus also subject to mutation. A rigorous runtime analysis on the OneMax benchmark function reveals that a simple local mutation scheme for the rate leads to an expected optimization time of the best possible $O(n\lambda/\log\lambda + n\log n)$. Our result shows that self-adaptation in evolutionary computation can find complex optimal parameter settings on the fly. At the same time, it proves that a relatively complicated self-adjusting scheme for the mutation rate can be replaced by our simple endogenous scheme. Theory of algorithms Evolutionary algorithms Complexity theory Query complexity 006.382 3
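The self-adjusting rule described in entry 109 (half the offspring at twice the current rate, half at half the rate, then adopt the rate of the subpopulation holding the best offspring) can be sketched on OneMax as below. This is a simplified illustration rather than the analyzed algorithm: the initial rate 2/n, the clamping of the rate to [1/n, 1/4], first-found tie-breaking, and accepting offspring that are at least as good as the parent are all assumptions made here, and the function names are hypothetical.

```python
import random


def onemax(bits):
    """OneMax fitness: the number of one-bits."""
    return sum(bits)


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ (rng.random() < rate) for b in bits]


def self_adjusting_ea(n, lam, seed=0, max_gens=100_000):
    """(1+lambda) EA with a two-rate self-adjusting mutation rate (sketch)."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    rate = 2.0 / n  # assumed starting rate
    evals = 0
    for _ in range(max_gens):
        if onemax(parent) == n:
            break
        best, best_rate = None, rate
        for i in range(lam):
            # First half of the offspring uses 2*rate, second half rate/2.
            r = rate * 2 if i < lam // 2 else rate / 2
            child = mutate(parent, r, rng)
            evals += 1
            if best is None or onemax(child) > onemax(best):
                best, best_rate = child, r
        # Plus-selection: keep the parent unless the best child is as good.
        if onemax(best) >= onemax(parent):
            parent = best
        # Adopt the winning subpopulation's rate, clamped (assumed bounds).
        rate = min(max(best_rate, 1.0 / n), 0.25)
    return parent, evals
```

Calling `self_adjusting_ea(30, 8)` returns the final bit string together with the number of fitness evaluations used, which is the cost measure the abstract's runtime bound refers to.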
110	Advances in Piecewise Smooth Image Reconstruction Juengling, Ralf 17 March 2014 Advances and new insights into algorithms for piecewise smooth image reconstruction are presented. Such algorithms fit a piecewise smooth function to image data without prior knowledge of the number of regions or the location of region boundaries in the best fitting function. This is a difficult model selection problem since the number of parameters of possible solutions varies widely. The approach followed in this work was proposed by Yvan Leclerc. It uses the Minimum Description Length principle to make the reconstruction problem well-posed: the best fitting function yields the shortest encoding of the image data. In order to derive a code length formula, the class of functions is restricted to piecewise polynomial. The resulting optimization problem may have many local minima, and a good initial approximation is required in order to find acceptable solutions. Good initial approximations may be generated at the cost of solving a sequence of related optimization problems, as prescribed by a continuation method. Several problems with this approach are identified and addressed. First, success or failure of the continuation method is found to be sensitive to the choice of objective function parameters. Second, the optimization method used in prior work may fail to converge, and, third, it converges too slowly to be useful in many vision applications. I address the first problem in three different ways. First, a revised continuation method is less sensitive to parameter choice. Second, I show how to move control over success or failure from the objective function parameters to the continuation method. Third, a new objective function is derived which includes one parameter instead of the two parameters used in prior work. Experimental results show that all measures improve robustness with respect to parameter choice. 
To address the optimization-related problems, I use a quasi-Newton line-search method. This method is guaranteed to converge and may converge at a faster rate than the relaxation method used in prior work. To realize a faster convergence rate, I introduce a new parameter whose role is to improve variable scaling and problem conditioning. Further runtime improvements result from using extrapolation in the continuation method. Experimental results show overall runtime improvements of an order of magnitude and more. My reconstruction algorithm outperforms the well-known Canny edge detector on the Berkeley boundary detection task. This is a novel result that demonstrates the merits of image reconstruction as a means for extracting information from an image. Image processing -- Digital techniques Computer Sciences Graphics and Human Computer Interfaces Theory and Algorithms
