Global ETD Search

1	Using random projections for dimensionality reduction in identifying rogue applications Atkison, Travis Levestis 08 August 2009 (has links) In general, the consumer must depend on others to provide their software solutions. However, this outsourcing of software development has caused it to become more and more abstract as to where the software is actually being developed and by whom, and it poses a potentially large security problem for the consumer as it opens up the possibility for rogue functionality to be injected into an application without the consumer’s knowledge or consent. This begs the question of ‘How do we know that the software we use can be trusted?’ or ‘How can we have assurance that the software we use is doing only the tasks that we ask it to do?’ Traditional methods for thwarting such activities, such as virus detection engines, are far too antiquated for today’s adversary. More sophisticated research needs to be conducted in this area to combat these more technically advanced enemies. To combat the ever increasing problem of rogue applications, this dissertation has successfully applied and extended the information retrieval techniques of n-gram analysis and document similarity and the data mining techniques of dimensionality reduction and attribute extraction. This combination of techniques has generated a more effective Trojan horse, rogue application detection capability tool suite that can detect not only standalone rogue applications but also those that are embedded within other applications. This research provides several major contributions to the field including a unique combination of techniques that have provided a new tool for the administrator’s multi-pronged defense to combat the infestation of rogue applications. Another contribution involves a unique method of slicing the potential rogue applications that has proven to provide a more robust rogue application classifier. Through experimental research this effort has shown that a viable and worthy rogue application detection tool suite can be developed. Experimental results have shown that in some cases as much as a 28% increase in overall accuracy can be achieved when comparing the accepted feature selection practice of mutual information with the feature extraction method presented in this effort called randomized projection. rogue application detection information retrieval data mining n-gram analysis clustering dimensionality reduction randomized projection
2	EXTRACTION AND PREDICTION OF SYSTEM PROPERTIES USING VARIABLE-N-GRAM MODELING AND COMPRESSIVE HASHING Muthukumarasamy, Muthulakshmi 01 January 2010 (has links) In modern computer systems, memory accesses and power management are the two major performance limiting factors. Accesses to main memory are very slow when compared to operations within a processor chip. Hardware write buffers, caches, out-of-order execution, and prefetch logic, are commonly used to reduce the time spent waiting for main memory accesses. Compiler loop interchange and data layout transformations also can help. Unfortunately, large data structures often have access patterns for which none of the standard approaches are useful. Using smaller data structures can significantly improve performance by allowing the data to reside in higher levels of the memory hierarchy. This dissertation proposes using lossy data compression technology called ’Compressive Hashing’ to create “surrogates”, that can augment original large data structures to yield faster typical data access. One way to optimize system performance for power consumption is to provide a predictive control of system-level energy use. This dissertation creates a novel instruction-level cost model called the variable-n-gram model, which is closely related to N-Gram analysis commonly used in computational linguistics. This model does not require direct knowledge of complex architectural details, and is capable of determining performance relationships between instructions from an execution trace. Experimental measurements are used to derive a context-sensitive model for performance of each type of instruction in the context of an N-instruction sequence. Dynamic runtime power prediction mechanisms often suffer from high overhead costs. To reduce the overhead, this dissertation encodes the static instruction-level predictions into a data structure and uses compressive hashing to provide on-demand runtime access to those predictions. Genetic programming is used to evolve compressive hash functions and performance analysis of applications shows that, runtime access overhead can be reduced by a factor of ~3x-9x. Electrical and Computer Engineering Engineering
3	N-gramy v mluveném projevu českých a rodilých mluvčích angličtiny / N-grams in the speech of Czech and native speakers of English Zvěřinová, Simona January 2016 (has links) The diploma thesis is concerned with the analysis of recurrent word-combinations in the speech of advanced Czech speakers of English and native speakers of English. The data used for the analysis is extracted from two corpora, learner corpus LINDSEI and native speaker corpus LOCNEC. The aim of the thesis is to compare the two groups of speakers, determine differences in their use of recurrent word-combinations and compare the findings to previous studies involving speakers of different languages. The quantitative analysis is performed on a sample of 50 speakers from each corpus and the frequency data is used to compare the two groups as to the number of types of word-combinations they use and how frequently they do so. The qualitative analysis is performed on a sample of 15 speakers from each corpus to determine functional differences. Four categories of word-combinations are determined in the analysis. In the conclusion, the quantitative and qualitative findings are compared to previous research involving speakers of different languages. Keywords: spoken language, learner language, n-grams, n-gram analysis, recurrent word- combinations, lexical bundles, learner corpus

1

Page generated in 0.0422 seconds