251

Assessing malware detection using hardware performance counters

Gupta, Anmol Brijesh 02 November 2017 (has links)
Despite the use of modern anti-virus (AV) software, malware remains a prevalent threat to today's computing systems. AV software cannot cope with the growing number of evasive malware samples, calling for more robust malware detection techniques. Among the many proposed methods for malware detection, researchers have suggested microarchitecture-based mechanisms for detecting malicious software in a system. For example, Intel embeds a shadow stack in its modern architectures that maintains the integrity between function calls and their returns by tracking each function's return address. Any malicious program that exploits an application to overwrite return addresses can be restrained by the shadow stack. Researchers have also proposed the use of Hardware Performance Counters (HPCs). HPCs are counters embedded in modern computing architectures that count the occurrences of architectural events, such as cache hits, clock cycles, and integer instructions. Malware detectors that leverage HPCs create a profile of an application by reading the counter values periodically. Subsequently, researchers use supervised machine learning (ML) classification techniques to differentiate malicious profiles from benign ones. It is important to note that HPCs count the occurrences of microarchitectural events during the execution of a program, whereas whether a program is malicious or benign is a property of its high-level behavior. Since HPCs do not surveil the high-level behavior of an application, we hypothesize that the counters may fail to capture the difference in behavioral semantics between malicious and benign software. To investigate whether HPCs capture the behavioral semantics of a program, we recreate the experimental setup of previously proposed systems. To this end, we leverage HPCs to profile applications such as MS-Office and Chrome as benign applications, and known malware binaries as malicious applications. Standard ML classifiers assume a normally distributed dataset in which the variance is independent of the mean of the data points. To transform the profiles into a more normal-like distribution and to avoid over-fitting the machine learning models, we apply a power transform to the application profiles. Moreover, because HPCs can monitor a broad range of hardware events, we use Principal Component Analysis (PCA) to select the performance events that capture the maximum variation across all profiled applications in the fewest features. Finally, we train twelve supervised machine learning classifiers, such as Support Vector Machines (SVMs) and Multilayer Perceptrons (MLPs), on the application profiles. We model each classifier as a binary classifier whose two classes are 'Benignware' and 'Malware.' Our results show that for the 'Malware' class, the average recall and F2-score across the twelve classifiers are 0.22 and 0.70, respectively. The low recall score shows that the ML classifiers frequently tag malware as benignware. Even though we exercise a statistical approach to feature selection, the classifiers are unable to distinguish malware from benignware based on the hardware events monitored by the HPCs. The inability of HPC profiles to capture the behavioral characteristics of an application forces us to question the use of HPCs for malware detection.
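A minimal sketch of the classification pipeline described above (power transform, then PCA, then a binary classifier scored on recall and F2), using scikit-learn. The synthetic counter data, event count, and labels are placeholders rather than the thesis's dataset, and the thesis uses PCA to select events rather than merely to project them:

```python
# Hypothetical sketch of the HPC-profile pipeline described above:
# power transform -> PCA -> binary classifier, scored on recall and F2.
# The random Poisson data stands in for real HPC profiles.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, fbeta_score

rng = np.random.default_rng(0)
X = rng.poisson(lam=100, size=(500, 40)).astype(float)  # 40 HPC events per sample
y = rng.integers(0, 2, size=500)                        # 1 = malware, 0 = benignware

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = make_pipeline(
    PowerTransformer(),    # make counter distributions more normal-like
    PCA(n_components=10),  # keep the components carrying the most variance
    SVC(),                 # any one of the twelve classifiers could go here
)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("recall:", recall_score(y_te, pred))
print("F2:", fbeta_score(y_te, pred, beta=2))
```

Swapping `SVC()` for other estimators reproduces the one-pipeline-per-classifier structure across the twelve models.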
252

A Unique Method of Using Information Entropy to Evaluate the Reliability of Deep Neural Network Predictions on Intracranial Electroencephalogram

Dharmaraj Gireesh, Elakkat 15 August 2023 (has links) (PDF)
Deep neural networks (DNNs) are fundamentally information-processing machines that synthesize the complex patterns in their inputs to arrive at solutions, with applications in various fields. One major question when working with a DNN is which features in the input lead to a specific decision. One common method of addressing this question involves the generation of heatmaps. Another pertinent question is how effectively the DNN has captured the information presented in the input, which can potentially be addressed with complexity measures of the inputs. For patients with intractable epilepsy, appropriate clinical decision making depends on the interpretation of brain signals recorded as an electroencephalogram (EEG), in most cases through intracranial monitoring (iEEG). In current clinical settings, the iEEG is visually inspected by clinicians to decide the location of the epileptogenic zones, which informs surgical planning. Visual inspection and decision making are tedious and potentially error-prone, given the massive amount of data that must be evaluated in a limited amount of time. We developed a DNN model that evaluates iEEG to classify signals arising from epileptic and non-epileptic zones. One challenge of incorporating deep neural network tools into medical decision making is their black-box nature. To analyze the underlying reasons for the DNN's decisions on iEEG, we used heatmapping and signal processing tools to better understand its decision-making process. We demonstrated that the energy-rich regions, as captured by analytic signals, are identified by the DNN as potentially epileptogenic when it arrives at decisions. We then explored the DNN's ability to capture the details of the signal with information-theoretic approaches. We introduced a measure of confidence in DNN predictions, named the certainty index, which is calculated from the overall outputs in the penultimate layer of the network. Employing the method of Sample Entropy (SampEn), we demonstrated that the DNN's prediction certainty is related to how strongly the heatmap correlates with the SampEn of the entire signal. We explored the parameter space of the SampEn calculation and showed that the relationship between SampEn and the certainty of DNN predictions holds even when the estimation parameters change. Further, we demonstrated that the rate of change of the relationship between the DNN output and the activation map, as a function of the sequential DNN layers, is related to the SampEn of the signal. This observation suggests that the speed at which the DNN captures the result is directly proportional to the information content of the signal.
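As an illustration of the SampEn measure discussed above, here is a minimal NumPy sketch of a common textbook formulation (templates of length m and m+1, Chebyshev distance, tolerance r = 0.2·std); it is a generic implementation, not the thesis's code:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r) of a 1-D signal (a common textbook variant).

    B counts pairs of length-m templates within Chebyshev distance r,
    A counts pairs of length-(m+1) templates within r; self-matches are
    excluded because only pairs i < j are counted. SampEn = -ln(A / B).
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)  # conventional tolerance

    def count_pairs(templates):
        total = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance between template i and all later templates
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            total += int(np.sum(d <= r))
        return total

    # use the same number (N - m) of templates for both lengths
    tm = np.lib.stride_tricks.sliding_window_view(x, m)[:-1]
    tm1 = np.lib.stride_tricks.sliding_window_view(x, m + 1)
    A, B = count_pairs(tm1), count_pairs(tm)
    return -np.log(A / B) if A > 0 and B > 0 else float("inf")

rng = np.random.default_rng(0)
print(sample_entropy(rng.standard_normal(1000)))                 # irregular -> high
print(sample_entropy(np.sin(np.linspace(0, 40 * np.pi, 1000))))  # regular -> low
```

Varying m and r here corresponds to the exploration of the SampEn parameter space that the abstract describes.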
253

On designing hardware accelerator-based systems: interfaces, taxes and benefits

Azad, Zahra 30 August 2023 (has links)
Complementary Metal-Oxide-Semiconductor (CMOS) technology scaling has slowed down. One promising approach to sustaining the historic performance improvement of computing systems is to utilize hardware accelerators. Today, many commercial computing systems integrate one or more accelerators, each optimized to execute specific tasks efficiently. Over the years, there has been a substantial amount of research on designing hardware accelerators for machine learning (ML) training and inference tasks. Hardware accelerators are also widely employed to accelerate data privacy and security algorithms; in particular, there is growing interest in using hardware accelerators for homomorphic encryption (HE) based privacy-preserving computing. While the use of hardware accelerators is promising, a realistic end-to-end evaluation of an accelerator integrated into the full system often reveals that its benefits are not as large as expected. Assessing only the performance of the accelerated portion of an application, such as the inference kernel in ML applications, can be misleading. When designing an accelerator-based system, it is critical to evaluate the system as a whole and account for all the accelerator taxes. In the first part of our research, we highlight the need for a holistic, end-to-end analysis of workloads using ML and HE applications. Our evaluation of an ML application for a database management system (DBMS) shows that the benefit of offloading ML inference to accelerators depends on several factors, including the backend hardware, model complexity, data size, and the level of integration between the ML inference pipeline and the DBMS. We also found that the end-to-end performance improvement is bottlenecked by data retrieval and pre-processing as well as by inference. Additionally, our evaluation of an HE video encryption application shows that while HE client-side operations, i.e., message-to-ciphertext and ciphertext-to-message conversions, are bottlenecked by number theoretic transform (NTT) operations, accelerating the NTT in hardware alone is not sufficient to adequately improve application throughput (frames per second); all bottlenecks in the message-to-ciphertext and ciphertext-to-message conversion pipelines, such as error sampling, encryption, and decryption, must be addressed. In the second part of our research, we address the lack of a scalable evaluation infrastructure for building and evaluating accelerator-based systems. To solve this problem, we propose a robust and scalable software-hardware framework for accelerator evaluation built on an open-source RISC-V based System-on-Chip (SoC) design called BlackParrot. Accelerator designers and system architects can use this framework to perform an end-to-end performance analysis of coherent and non-coherent accelerators while carefully accounting for the interaction between the accelerator and the rest of the system. In the third part of our research, we present RISE, a full RISC-V SoC designed to perform message-to-ciphertext and ciphertext-to-message conversion operations efficiently. RISE comprises a BlackParrot core and an efficient custom-designed accelerator tailored to accelerate these conversion operations end to end. Our RTL-based evaluation demonstrates that RISE improves the throughput of the video encryption application by 10x-27x across frame resolutions.
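Since the NTT is named as the client-side bottleneck above, a textbook radix-2 iterative NTT over Z_p may help make the operation concrete; this is a generic software version (using the common NTT-friendly prime 998244353), not RISE's hardware design:

```python
# Textbook iterative radix-2 number theoretic transform (NTT) over Z_P.
# This generic sketch illustrates the HE client-side bottleneck operation;
# it is not the RISE accelerator's implementation.
P = 998244353  # NTT-friendly prime: P - 1 = 119 * 2^23
G = 3          # primitive root modulo P

def ntt(a, invert=False):
    """In-place Cooley-Tukey NTT of a list whose length is a power of two."""
    n = len(a)
    # bit-reversal permutation
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(G, (P - 1) // length, P)
        if invert:
            w = pow(w, P - 2, P)  # modular inverse of the root of unity
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * wn % P
                a[k], a[k + length // 2] = (u + v) % P, (u - v) % P
                wn = wn * w % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        for i in range(n):
            a[i] = a[i] * n_inv % P
    return a

coeffs = [5, 3, 2, 1, 0, 0, 0, 0]
assert ntt(ntt(coeffs[:]), invert=True) == coeffs  # forward/inverse round trip
```

HE schemes rely on (negacyclic variants of) this transform to multiply ring polynomials in O(n log n) rather than O(n^2), which is why the NTT dominates message-to-ciphertext conversion time.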
254

ACiS: smart switches with application-level acceleration

Haghi, Pouya 30 August 2023 (has links)
Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance has depended, first, on ever faster and denser CPUs, and then on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally been single-function (e.g., a network that transports data) to take on additional functionality (e.g., also operating on that data). More generally, integration enables compute-everywhere: not just in the CPU and accelerator, but also in the network and, more specifically, in the communication switches. In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose various flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline, introducing composable plugins that successively add ACiS capabilities. In the first work, we propose an inline accelerator in communication switches for user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we seek to remove this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that scales to very large systems. In the second work, we propose a look-aside accelerator for communication switches that processes packets at line rate and handles functions requiring loops and state. The proposed in-switch accelerator is based on a RISC-V-compatible Coarse-Grained Reconfigurable Array (CGRA). To facilitate usability, we have developed a framework that compiles user-provided C/C++ code into the back-end instructions that configure the accelerator. In the third work, we extend ACiS to support fused collectives and the combining of collectives with map operations, observing an opportunity to fuse communication (the collectives) with computation; since the computation varies across applications, this ACiS support is programmable. In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry information, monitor application behavior, and then use this information for control, decision-making, and coordination of nodes. We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, and considering a Graph Convolutional Network (GCN) application as a case study, we achieve an average speedup of 3.4x across five real-world datasets on 24 nodes, compared to a CPU cluster without ACiS capabilities.
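To make the fused-collective idea concrete, here is a software emulation with mpi4py in which rank 0 plays the role of the ACiS switch, combining the data and applying an elementwise map before broadcasting. The tanh map and the rank-0 stand-in are illustrative assumptions; in the real design the reduction runs inside switch hardware:

```python
# Hypothetical emulation of fusing a collective with a map operation.
# Rank 0 stands in for the in-switch accelerator; only the semantics of
# the fusion are illustrated, not the hardware mechanism.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.full(4, rank + 1, dtype=np.float64)

# Conventional path: allreduce on the hosts, then every host applies the map.
conventional = np.empty_like(local)
comm.Allreduce(local, conventional, op=MPI.SUM)
conventional = np.tanh(conventional)  # e.g., an activation in a GCN layer

# Fused path: the "switch" (rank 0) combines the data as it passes through,
# applies the map once, and broadcasts, so hosts never see the raw sum.
reduced = np.empty_like(local)
comm.Reduce(local, reduced, op=MPI.SUM, root=0)
fused = np.tanh(reduced) if rank == 0 else np.empty_like(local)
comm.Bcast(fused, root=0)

assert np.allclose(conventional, fused)
```

Run with, e.g., `mpirun -n 4 python fused_sketch.py`; the payoff in the real system is that hosts never exchange or post-process the intermediate sum.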
255

Performance-aware site-wide data center power management

Wilson, Daniel C. 30 August 2023 (has links)
Top high performance computing (HPC) data centers recently entered the era of exascale computing, requiring up to tens of megawatts for a single computing facility to meet its users’ computing needs. This massive power capacity at a single site brings challenges in power management. Poorly managed power may result in unnecessarily high demand for costly energy or may cause the system to under-perform. An HPC data center may serve many types of users and workloads with non-trivial power requirements, making it difficult to select a one-size-fits-all policy. But the high power capacity also offers data centers the opportunity to be key players in enabling greater adoption of renewable energy across a power grid: they can adjust their demand through software power management policies to help smart grids balance against nature’s time-varying green energy supply. This thesis claims that multi-tiered power management methods are essential for data centers to implement site-wide power management policies that accurately respond to changing power constraints at a higher, cluster-level tier while reacting to application-specific performance impacts at a lower, job-level tier. Through investigations of site, cluster, job, and server characteristics, we demonstrate that a feedback-driven, multi-tiered power management approach meets power management objectives more effectively than siloed solutions. We design a cluster power management policy that distributes power across jobs using knowledge of job power-performance properties, demonstrating up to a 7% reduction in system time dedicated to jobs and up to 11% energy savings compared to a policy without job awareness. We provide a power management framework that enables accurate, dynamic cluster power control while tolerating incomplete or inaccurate prior knowledge of job power and performance properties. We add a site-wide power model to a cluster power management policy that offers regulation services in a smart grid, showing 1.3x cost savings compared to a policy that is unaware of site-wide power consumption. We introduce a job power management policy that integrates job performance awareness with knowledge of hardware power-performance trade-offs, demonstrating up to 40% energy reduction and 17% execution-time reduction on an imbalanced, compute-bound benchmark compared to a policy without frequency throttling.
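A minimal sketch of the cluster-level tier's job-aware power distribution, written as a greedy water-filling over assumed power-performance slopes; the job list, power bounds, and linear marginal-performance model are hypothetical stand-ins for the thesis's measured properties and actual policy:

```python
# Hypothetical sketch: distribute a cluster power cap across jobs using
# assumed per-job power-performance knowledge. All numbers are illustrative.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    p_min: float  # watts needed to run at all
    p_max: float  # watts beyond which performance stops improving
    slope: float  # marginal performance per extra watt (assumed constant)

def allocate(jobs, cap, step=10.0):
    """Greedy water-filling: give every job its minimum, then hand out the
    remaining budget step by step to the job with the best marginal return."""
    alloc = {j.name: j.p_min for j in jobs}
    budget = cap - sum(alloc.values())
    assert budget >= 0, "cap cannot even cover the per-job minimums"
    while budget >= step:
        # jobs that still have headroom under their individual power ceiling
        candidates = [j for j in jobs if alloc[j.name] + step <= j.p_max]
        if not candidates:
            break
        best = max(candidates, key=lambda j: j.slope)
        alloc[best.name] += step
        budget -= step
    return alloc

jobs = [
    Job("cfd_solver",  p_min=200, p_max=500, slope=0.9),  # compute-bound
    Job("genome_sort", p_min=150, p_max=400, slope=0.3),  # memory-bound
    Job("ml_training", p_min=250, p_max=600, slope=0.7),
]
print(allocate(jobs, cap=1100.0))
```

A feedback-driven version, in the spirit of the thesis, would re-run the allocation as measured job behavior updates the slopes rather than trusting prior knowledge.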
256

A Scalable and Efficient Outlier Detection Strategy for Categorical Data

Ortiz, Enrique 01 January 2007 (has links)
Outlier detection has received significant attention in many applications, such as credit card fraud detection and network intrusion detection. Most existing research efforts focus on numerical datasets and cannot be directly applied to categorical data, where ordering the data and computing distances between data points makes little sense. Furthermore, a number of current outlier detection methods require quadratic time with respect to the dataset size and usually need multiple scans of the data; these features are undesirable when the datasets are large and scattered over multiple geographically distributed sites. In this paper, we experimentally evaluate a few representative outlier detection approaches (one based on entropy and two based on frequent itemsets) that are geared towards categorical datasets. In addition, we introduce a simple, scalable, and efficient outlier detection algorithm that discovers outliers in categorical datasets with a single scan of the dataset. We compare this new algorithm with the existing strategies mentioned above. The conclusion from this comparison is that our simple algorithm is more efficient (faster) than the existing strategies, and equally effective (accurate) at discovering outliers.
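A sketch in the spirit of the frequency-based, single-scan scoring the abstract describes: a record whose attribute values are rare overall receives a low score and is flagged as an outlier. This is a generic attribute-value-frequency scheme and may differ in details from the algorithm the thesis introduces:

```python
# Generic attribute-value-frequency outlier scoring for categorical data.
# Illustrative only; not necessarily the exact algorithm of the thesis.
from collections import Counter

def avf_scores(records):
    """Score each categorical record by the mean frequency of its values."""
    # pass over the data: count how often each (attribute, value) pair occurs
    counts = [Counter() for _ in records[0]]
    for rec in records:
        for i, v in enumerate(rec):
            counts[i][v] += 1
    # a record's score is the average frequency of its values;
    # the lower the score, the more outlying the record
    return [sum(counts[i][v] for i, v in enumerate(rec)) / len(rec)
            for rec in records]

data = [
    ("red",   "small", "metal"),
    ("red",   "small", "metal"),
    ("red",   "large", "metal"),
    ("green", "tiny",  "wood"),  # every value is rare -> lowest score
]
scores = avf_scores(data)
outlier = data[min(range(len(data)), key=scores.__getitem__)]
print(scores, outlier)
```

Because the counting pass touches each record once and scoring reuses the counts, the method avoids the quadratic-time, multi-scan behavior criticized above.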
257

Computing with Memory for Energy-Efficient Robust Systems

Paul, Somnath 12 July 2011 (has links)
No description available.
258

Fine-Grained Width-Aware Dynamic Supply Gating for Active Power Reduction

Wang, Lei 27 August 2012 (has links)
No description available.
259

Coordinating Data Center Network and Servers for Power Savings

Zheng, Kuangyu 24 March 2014 (has links)
No description available.
260

BYZANTINE FAULT TOLERANCE FOR DISTRIBUTED SYSTEMS

Zhang, Honglei 11 June 2014 (has links)
No description available.
