1
Information Visualization and Machine Learning Applied on Static Code Analysis. Kacan, Denis; Sidlauskas, Darius. January 2008.
Software engineers will possibly never see perfect source code in their lifetime, but they are seeing much better analysis tools for finding defects in software. The approaches used in static code analysis have evolved from simple code crawling to statistical and probabilistic frameworks. This work presents a new technique that incorporates machine learning and information visualization into static code analysis. The technique learns patterns in a program's source code using the normalized compression distance and applies them to classify code fragments as faulty or correct. Since the classification is frequently imperfect, the training process plays an essential role. A visualization element is used in the hope that it lets the user better understand the inner state of the classifier, making the learning process transparent. An experimental evaluation is carried out to assess the efficacy of an implementation of the technique, the Code Distance Visualizer. The outcome of the evaluation indicates that the proposed technique is reasonably effective in learning to differentiate between faulty and correct code fragments. The visualization element enables the user to discern when the tool's output is correct and when it is not, and to take corrective action (further training or retraining) interactively until the desired level of performance is reached.
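A minimal sketch of the core idea behind this kind of tool: the normalized compression distance compares a new code fragment against user-labeled examples, and the nearest labeled example supplies the classification. This is not the Code Distance Visualizer's actual implementation; the training snippets and helper names are illustrative, and bzip2 stands in for whatever compressor the tool uses.

```python
# Illustrative NCD-based classification of code fragments (not the tool's real code).
import bz2

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: smaller means more similar."""
    cx, cy = len(bz2.compress(x)), len(bz2.compress(y))
    cxy = len(bz2.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(fragment: str, labeled) -> str:
    """Label a fragment by its nearest labeled example under NCD (1-NN)."""
    nearest = min(labeled, key=lambda ex: ncd(fragment.encode(), ex[0].encode()))
    return nearest[1]

# Hypothetical training set, built up interactively by the user during training.
training = [
    ("if (p != NULL) { free(p); p = NULL; }", "correct"),
    ("free(p); free(p);", "faulty"),
]
print(classify("free(q); free(q);", training))  # expected: "faulty"
```

The interactive loop described in the abstract would wrap this: when the visualization shows a misclassification, the user adds the offending fragment with the right label and the distances are recomputed.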
2
Evaluation of Idempotency & Block Size of Data on the Performance of Normalized Compression Distance Algorithm. Mandhapati, Venkata Srikanth; Bajwa, Kamran Ali. January 2012.
Normalized compression distance (NCD) is a similarity distance metric used here to analyze the type of file fragments. The performance of NCD depends on the underlying compression algorithm. We studied three compressors: bzip2, gzip, and ppmd. The compression ratio of ppmd is better than that of bzip2, and the compression ratio of bzip2 is better than that of gzip; which of the three is preferable from the viewpoint of idempotency, however, is what we evaluated. We then applied NCD, with k-nearest neighbour as the classification algorithm, to a randomly selected public corpus using different block sizes (512, 1024, 1536, and 2048 bytes). The two compressors bzip2 and gzip were also compared for NCD from the perspective of idempotency.
Objectives: We investigated the combined effect of two parameters, compression ratio versus idempotency and varying block size of the data, on the performance of NCD. The objective was to determine whether a compressor for NCD should be selected on the basis of better compression ratio or better idempotency. The purpose of using different block sizes was to evaluate whether the performance of NCD improves when the block size used to build the datasets is varied.
Methods: Experiments were performed to test the hypotheses and to evaluate the effect of compression ratio versus idempotency and of block size on the performance of NCD.
Results: The null hypotheses of the main experiment were retained: there is no statistically significant difference in the performance of NCD when the block size is varied, and no statistically significant difference when the compressor is selected on the basis of better compression ratio rather than better idempotency.
Conclusions: Since the experiments were unable to reject the null hypotheses of the main experiment, no effect of the independent variables on the dependent variable could be shown; that is, neither compression ratio versus idempotency nor varying block size of the data has a statistically significant effect on the performance of NCD.
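A small sketch of the two compressor properties being compared and of the block-size preparation. It is illustrative only: the synthetic sample data stands in for the public corpus, and only bzip2 and gzip are shown because both are available in Python's standard library (ppmd is not). Idempotency is measured here as how little C(xx) grows over C(x), one common reading of the property; the study's exact metric may differ.

```python
# Compression ratio versus idempotency for the two standard-library compressors,
# plus block-size splitting as used to build the datasets. Illustrative sketch.
import bz2
import gzip

COMPRESSORS = {"bzip2": bz2.compress, "gzip": gzip.compress}
BLOCK_SIZES = (512, 1024, 1536, 2048)  # bytes, as in the experiments

def compression_ratio(compress, x: bytes) -> float:
    """Compressed size over original size; smaller is better."""
    return len(compress(x)) / len(x)

def idempotency(compress, x: bytes) -> float:
    """len(C(xx)) / len(C(x)); closer to 1.0 means better idempotency."""
    return len(compress(x + x)) / len(compress(x))

def blocks(data: bytes, size: int):
    """Split a file into fixed-size fragments for the dataset."""
    return [data[i:i + size] for i in range(0, len(data), size)]

sample = b"example corpus data with some repetition " * 200  # stand-in for corpus files
for name, compress in COMPRESSORS.items():
    print(name,
          "ratio:", round(compression_ratio(compress, sample), 3),
          "idempotency:", round(idempotency(compress, sample), 3))
print("fragments at 512 bytes:", len(blocks(sample, BLOCK_SIZES[0])))
```

The NCD and k-nearest-neighbour classification step then runs over these fragments for each compressor and block size, which is the comparison the hypotheses address.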
3
Fast Low Memory T-Transform: string complexity in linear time and space with applications to Android app store security. Rebenich, Niko. 27 April 2012.
This thesis presents flott, the Fast Low Memory T-Transform, currently the fastest and most memory-efficient linear-time and linear-space algorithm available to compute the string complexity measure T-complexity. The flott algorithm uses 64.3% less memory and, in our experiments, runs asymptotically 20% faster than its predecessor. A full C implementation is provided and published under the Apache License 2.0. From the flott algorithm two deterministic information measures are derived and applied to Android app store security: the normalized T-complexity distance and the instantaneous T-complexity rate, which are used to detect, locate, and visualize unusual information changes in Android applications. The information measures introduced present a novel, scalable approach to assist with the detection of malware in app stores.
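A hedged sketch of how the two derived measures could be used, assuming the normalized distance follows the usual NCD-style normalization with a complexity function in place of compressed length, and the instantaneous rate is complexity per byte over a sliding window. T-complexity itself is not computed here; an ordinary compressor is a placeholder for flott, and the thesis's exact formulas may differ. The APK file names are hypothetical.

```python
# Sketch of a normalized complexity distance and an instantaneous complexity rate,
# with bz2 standing in for flott's T-complexity (placeholder, not the real measure).
import bz2

def complexity(x: bytes) -> float:
    # Placeholder complexity function: compressed length instead of T-complexity.
    return float(len(bz2.compress(x)))

def normalized_distance(x: bytes, y: bytes) -> float:
    """NCD-style normalized distance between two byte strings (assumed form)."""
    cx, cy, cxy = complexity(x), complexity(y), complexity(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def instantaneous_rate(x: bytes, window: int = 256):
    """Complexity per byte over a sliding window, for locating change points."""
    return [complexity(x[i:i + window]) / window
            for i in range(0, max(len(x) - window, 1), window)]

# Hypothetical use: compare two versions of an app and scan the newer one
# for regions whose information content changes abruptly.
# a = open("app_v1.apk", "rb").read()
# b = open("app_v2.apk", "rb").read()
# print(normalized_distance(a, b))
# print(instantaneous_rate(b)[:5])
```

Plotting the instantaneous rate along the file is what gives the visualization of where the unusual information changes occur.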