1. Physical Modeling of Graphene Nanoribbon Field Effect Transistor Using Non-Equilibrium Green Function Approach for Integrated Circuit Design
Mohammadi Banadaki, Yaser (21 April 2016)
The driving engine for the exponential growth of digital information processing systems is the scaling down of transistor dimensions, which for decades has enhanced device performance and density. However, the International Technology Roadmap for Semiconductors (ITRS) predicts the end of Moore's Law within the next decade due to the scaling challenges of silicon-based CMOS electronics, such as extremely high power density. Forward-looking solutions include the use of emerging materials and devices for integrated circuits. This Ph.D. dissertation focuses on graphene, a one-atom-thick sheet of carbon experimentally discovered in 2004. Since the fabrication technology of emerging materials is still in its early stages, transistor modeling plays an important role in evaluating futuristic graphene-based devices and circuits.
The GNRFET has been simulated with a numerical quantum transport model based on the self-consistent solution of the 3D Poisson equation and the 1D Schrödinger equation within the non-equilibrium Green's function (NEGF) formalism. The quantum transport model fully treats short-channel electrostatic effects and quantum tunneling, enabling a technology exploration of graphene nanoribbon field-effect transistors (GNRFETs) for the future. A comprehensive study of the static metrics and switching attributes of the GNRFET is presented, including the dependence of device characteristics on the GNR width and the scaling of the channel length down to 2.5 nanometers.
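As an illustration of such a self-consistent loop, the sketch below couples a toy one-dimensional tight-binding chain (standing in for the GNR channel) to semi-infinite lead self-energies and iterates a crude diagonal potential update in place of the 3D Poisson solver. All numerical values (hopping energy, site count, gate barrier, mixing factor) are assumptions for illustration, not the dissertation's model.

```python
import numpy as np

t_hop, n_sites = 2.7, 20              # hopping energy (eV) and channel sites
energies = np.linspace(-1.0, 1.0, 201)
eta = 1e-6                            # broadening for retarded quantities

def lead_self_energy(E):
    # Analytic retarded surface self-energy of a semi-infinite 1D chain lead.
    z = (E + 1j * eta) / (2 * t_hop)
    return t_hop * (z - 1j * np.sqrt(1 - z * z + 0j))

def charge_density(U):
    """Integrated local density of states below E_F = 0 (zero temperature)."""
    H = np.diag(U) - t_hop * (np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))
    rho = np.zeros(n_sites)
    dE = energies[1] - energies[0]
    for E in energies[energies < 0.0]:         # occupied states only
        Sigma = np.zeros((n_sites, n_sites), complex)
        Sigma[0, 0] = Sigma[-1, -1] = lead_self_energy(E)
        G = np.linalg.inv((E + 1j * eta) * np.eye(n_sites) - H - Sigma)
        rho -= G.diagonal().imag / np.pi * dE  # LDOS(E) = -Im G_ii / pi
    return rho

U_gate = np.zeros(n_sites)
U_gate[n_sites // 3: 2 * n_sites // 3] = 0.3   # gate-induced barrier (toy value)
rho_neutral = charge_density(np.zeros(n_sites))

U = U_gate.copy()
for it in range(100):                          # Poisson <-> NEGF outer loop
    rho = charge_density(U)
    U_scf = U_gate + 0.1 * (rho - rho_neutral) # toy diagonal "Poisson" feedback
    if np.max(np.abs(U_scf - U)) < 1e-4:
        break
    U = 0.5 * U + 0.5 * U_scf                  # damped mixing for stability
print(f"self-consistency reached after {it + 1} iterations")
```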
It is found that increasing the GNR width deteriorates the off-state performance of the GNRFET: narrower armchair GNRs improve the device's robustness to short-channel effects, leading to better off-state performance in terms of smaller off-current, larger ION/IOFF ratio, smaller subthreshold swing, and smaller drain-induced barrier lowering. Wider armchair GNRs allow the scaling of channel length and supply voltage, resulting in better on-state performance such as higher drive current, smaller intrinsic gate-delay time, and smaller power-delay product. In addition, the width-dependent characteristics of GNRFETs are investigated for the two semiconducting GNR families, (3p,0) and (3p+1,0). The (3p+1,0) GNRs demonstrate superior off-state performance, while the (3p,0) GNRs show superior on-state performance. Thus, (3p+1,0) GNRs are promising for low-power design, while (3p,0) GNRs are preferable for high-frequency applications.
2. Increasing Off-Chip Bandwidth and Mitigating Dark Silicon via Switchable Pins
Chen, Shaoming (2 August 2016)
Off-chip memory bandwidth is considered one of the major limiting factors to processor performance, especially for multi-cores and many-cores. Conventional processor designs allocate a large portion of off-chip pins to deliver power, leaving a small number of pins for signal communication. We observe that during memory-intensive phases the processor can require much less power than the package is able to supply. In this work, we propose a dynamic pin-switch technique to alleviate the bandwidth limitation: surplus power-delivery pins are dynamically reclaimed during memory-intensive phases and used to provide extra bandwidth for program execution, significantly boosting performance. We also explore the technique's benefit in the era of phase-change memory (PCM) and show that it can be applied beyond DRAM-based memory systems.
On the other hand, the end of Dennard scaling has led to a large number of inactive or significantly under-clocked transistors on modern chip multiprocessors, needed to comply with the power budget and prevent the processors from overheating. This so-called dark silicon is one of the most critical constraints on continued scaling with Moore's Law. While advanced cooling techniques, such as liquid cooling, can effectively decrease chip temperature and alleviate the power constraints, the peak performance, determined by the maximum number of transistors allowed to switch simultaneously, is still confined by the number of power pins on the chip package. We propose a novel mechanism that powers up dark silicon by dynamically switching a portion of I/O pins to power pins when off-chip communication is infrequent. By enabling extra cores or increasing processor frequency, the proposed strategy significantly boosts performance compared with traditional designs.
Switchable pins can also increase inter-socket bandwidth, another performance bottleneck. Multi-socket systems are popular in workstations and servers, but they suffer from relatively low inter-socket bandwidth, especially for massively parallel workloads that generate many inter-socket requests for synchronization and remote memory accesses. This traffic puts heavy pressure on the underlying network fully connecting all processors, whose bandwidth is confined by pin resources. Given this constraint, we propose to dynamically increase inter-socket bandwidth, trading off off-chip memory bandwidth, when the system exhibits heavy inter-socket communication but few off-chip memory accesses. The design increases the physical bandwidth of inter-socket communication by switching the function of pins from off-chip memory access to inter-socket communication.
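As a rough illustration of the control decision involved, the sketch below shows a per-epoch policy that lends switchable pins to the memory interface only when the running phase is memory-bound and the measured power draw leaves enough headroom. All names, thresholds, and pin counts here are invented for illustration and are not the dissertation's mechanism.

```python
from dataclasses import dataclass

@dataclass
class EpochStats:
    mem_requests_per_kilo_insts: float   # memory intensity (MPKI-like metric)
    power_draw_watts: float

POWER_HEADROOM_W = 15.0      # spare power budget needed before lending pins
MEM_INTENSE_MPKI = 10.0      # above this, the phase counts as memory-bound
SWITCHABLE_PINS = 128        # pins that can serve either role (assumption)

def pins_for_signals(stats: EpochStats, max_power_watts: float) -> int:
    """How many switchable pins to devote to memory signaling this epoch."""
    memory_bound = stats.mem_requests_per_kilo_insts >= MEM_INTENSE_MPKI
    power_slack = max_power_watts - stats.power_draw_watts >= POWER_HEADROOM_W
    # Lend pins to the memory interface only when both conditions hold;
    # otherwise keep them on power delivery (the safe default).
    return SWITCHABLE_PINS if (memory_bound and power_slack) else 0

# Example: a memory-intensive, low-power phase borrows all switchable pins.
phase = EpochStats(mem_requests_per_kilo_insts=24.0, power_draw_watts=80.0)
print(pins_for_signals(phase, max_power_watts=130.0))   # -> 128
```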
3. A Study of FPGA Resource Utilization for Pipelined Windowed Image Computations
Vijaya Varma, Aswin (2 August 2016)
In image processing, each output pixel is often computed independently from the values of the pixels in its neighborhood. Such operations are called windowed image computations (or neighborhood operations).
In this thesis, we examine the implementation of a windowed-computation pipeline in an FPGA-based environment. Typically, the image is generated outside the FPGA (for example, by a camera) and the result of the windowed computation is consumed outside the FPGA (for example, by a display or by an engine for higher-level analysis). The image is typically large (over a million pixels for a 1000×1000 image), while the FPGA input-output (I/O) infrastructure is quite modest in comparison (typically a few hundred pins). Consequently, the image is brought into the chip a small piece (tile) at a time.
We define a handshaking scheme that allows us to construct an FPGA architecture without strong assumptions about component speeds and synchronization. We define a pipeline architecture for windowed computations, including details of a stage that accommodates FPGA pin limitations and bounded storage, and a design suited to FPGAs that ensures a smoother (stall-resistant) flow of computation through the pipeline. Based on the proposed architecture, we analytically predict resource usage in the FPGA. In particular, we show that for an N×N image processed as n×n tiles on a z-stage windowed computation with window parameter w, Θ(n² + log N + log z) pins and Θ(n²z) memory are used. We ran simulations that validated these predictions on two FPGAs (Artix-7 and Kintex-7) with different resources. As predicted, pins and distributed memory turned out to be the most heavily used resources. Our simulations also show that the operating clock speed of the design is relatively independent of the number of pipeline stages, in line with what is expected from a handshaking scheme that isolates the timing of communicating modules.
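The bounds above can be turned into a quick sizing estimate. The snippet below evaluates the two growth rates for sample parameters; the constant factors are placeholders, since only the asymptotic forms come from the thesis.

```python
import math

def predicted_pins(n: int, N: int, z: int, c: float = 1.0) -> float:
    # Theta(n^2 + log N + log z) pins, up to the unknown constant c.
    return c * (n * n + math.log2(N) + math.log2(z))

def predicted_memory(n: int, z: int, c: float = 1.0) -> float:
    # Theta(n^2 * z) memory, up to the unknown constant c.
    return c * (n * n * z)

# Example: a 1024x1024 image, 8x8 tiles, 5 pipeline stages (invented values).
print(predicted_pins(n=8, N=1024, z=5))    # ~ 64 + 10 + 2.3, up to a constant
print(predicted_memory(n=8, z=5))          # ~ 320 memory units, up to a constant
```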
Our work, although aimed at FPGAs, could also apply to other I/O pin-limited devices and memory-limited environments.
4. Relay Selection Strategies for Multi-hop Cooperative Networks
Sun, Hui (9 June 2016)
In this dissertation we consider several relay selection strategies for multi-hop cooperative networks. The strategies we propose do not require a central controller (CC); instead, relay selection proceeds on a hop-by-hop basis, so these strategies can be implemented in a distributed manner. Consequently, increasing the number of hops in the network does not increase the complexity or the time consumed by the relay selection procedure at each hop.
We first investigate the performance of a hop-by-hop relay selection strategy for multi-hop decode-and-forward (DF) cooperative networks. In each relay cluster, the relays that successfully receive and decode the message from the previous hop form a decoding set, and the relay in that set with the highest signal-to-noise ratio (SNR) link to the next hop is selected for retransmission. We analyze the performance of this method in terms of end-to-end outage probability, and we derive approximations for its ergodic capacity and effective ergodic capacity.
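To make the selection rule concrete, the following Monte-Carlo sketch estimates the end-to-end outage of a simplified version of this strategy. Rayleigh fading (unit-mean exponential received SNR), the decoding threshold, and the stage structure are illustrative assumptions, not the dissertation's analytical model.

```python
import random

DECODE_SNR = 1.0  # SNR needed to decode a transmission (illustrative)

def trial(n_clusters: int = 3, cluster_size: int = 4) -> bool:
    """One Monte-Carlo run: source -> n_clusters relay stages -> destination."""
    snr = lambda: random.expovariate(1.0)   # Rayleigh fading: exponential SNR
    incoming = [snr() for _ in range(cluster_size)]   # source's links, stage 1
    for stage in range(n_clusters):
        decoding_set = [i for i, s in enumerate(incoming) if s >= DECODE_SNR]
        if not decoding_set:
            return False                    # outage: empty decoding set
        if stage == n_clusters - 1:
            # Last stage: each decoding relay has one link to the destination.
            return max(snr() for _ in decoding_set) >= DECODE_SNR
        # Otherwise each decoding relay sees cluster_size onward links; the
        # relay whose best onward link is strongest retransmits.
        onward = {i: [snr() for _ in range(cluster_size)] for i in decoding_set}
        winner = max(onward, key=lambda i: max(onward[i]))
        incoming = onward[winner]           # the next cluster hears the winner
    return True

trials = 20_000
successes = sum(trial() for _ in range(trials))
print("empirical end-to-end outage:", 1 - successes / trials)
```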
Next we propose a novel hop-by-hop relay selection strategy in which the relay in the decoding set with the largest number of "good" channels to the next stage is selected for retransmission. We analyze the performance of this method in terms of end-to-end outage probability under both perfect and imperfect channel state information (CSI).
We also investigate relay selection strategies in underlay spectrum-sharing cognitive relay networks. We consider a two-hop DF cognitive relay network with a constraint on the interference to the primary user, and analyze the outage probability of the secondary user and the interference probability at the primary user under an imperfect-CSI scenario.
Finally, we introduce a hop-by-hop relay selection strategy for underlay spectrum-sharing multi-hop relay networks, in which relay selection at each stage is based only on the CSI of that hop. It is shown that, in terms of outage probability, the performance of this method is nearly optimal.
5. Nano Cost Nano Patterned Template for Surface Enhanced Raman Scattering (SERS) for In-Vitro and In-Vivo Applications
Hou, Hsuan-Chao (30 May 2016)
Raman scattering is a well-known technique for detecting and identifying complex molecular-level samples. The weak Raman signal is enormously enhanced in the presence of a nano-patterned metallic surface next to the specimen. This dissertation describes a technique to fabricate a novel, low-cost, highly sensitive, disposable, and reproducible metallic nanostructure on a transparent substrate for surface-enhanced Raman scattering (SERS), so that Raman signals can be obtained even from the surfaces of opaque specimens. Most importantly, the metallic nanostructure can be bonded to the end of a probe or needle, with the other end coupled to a distant spectrometer. This opens up Raman spectroscopy for use in a clinical environment, with the patient simply sitting or lying near the spectrometer.
This SERS system, a molecular-level early-diagnosis technology, can be divided into four parts: SERS nanostructure substrates, the reflection Raman signal (in vitro), the transmission Raman signal (in vivo), and a probe or needle with a gradient-index (GRIN) lens in an articulated-arm system. In this work, aluminum was employed not only as a base substrate for a sputtered Au nanostructure (the conventional, reflection view) but also as a sacrificial layer for an Au nanostructure on a transparent substrate (the transmission view). The Raman enhancement of both the reflection and transmission SERS substrates depended on the aluminum etching method, the Au deposition angle, and the Au deposition thickness. Rhodamine 6G (R6G) solutions on both sides of the SERS substrates were used for analysis and characterization. Moreover, preliminary Raman spectra of R6G and a chicken specimen were obtained through a remote SERS probe head and an articulated-arm system, with the diameter of the invasive probe head shrunk to 0.5 mm. The implication is that this system can be applied in medical settings.
6. Effective 3D Geometric Matching for Data Restoration and Its Forensic Application
Zhang, Kang (31 May 2016)
3D geometric matching is the technique of detecting similar patterns among multiple objects. It is an important and fundamental problem that facilitates many tasks in computer graphics and vision, including shape comparison and retrieval, data fusion, scene understanding and object recognition, and data restoration. For example, 3D scans of an object taken from different angles are matched and stitched together to form the complete geometry; in medical image analysis, the motion of deforming organs is modeled and predicted by matching a series of CT images. The problem is challenging and remains unsolved, especially when the similar patterns are (1) small and lack geometric saliency, or (2) incomplete due to occlusion during scanning or damage to the data. We study reliable matching algorithms that can tackle these difficulties, and their application to data restoration.
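As background for the scan-stitching example, rigid alignment of corresponding 3D points is commonly solved with the Kabsch method; the sketch below is a generic illustration of that step and is not the matching algorithm developed in this dissertation.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (rows = points)."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)               # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

# Example: recover a known rotation about z plus a translation.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
theta = 0.5
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = kabsch(P, Q)
print(np.allclose(R, R_true), np.allclose(t, [1.0, -2.0, 0.5]))   # True True
```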
Data restoration is the problem of restoring a fragmented or damaged model to its original complete state. It is a new area with direct applications in scientific fields such as forensics and archaeology. In this dissertation, we study novel, effective geometric matching algorithms, including curve matching, surface matching, pairwise matching, multi-piece matching, and template matching. We demonstrate their application in an integrated digital pipeline of skull reassembly, skull completion, and facial reconstruction, developed to support the state-of-the-art forensic skull/facial reconstruction workflow used in law enforcement.
7. An Architecture for Configuring an Efficient Scan Path for a Subset of Elements
Ashrafi, Arash (5 May 2016)
Field Programmable Gate Arrays (FPGAs) have many modern applications. A feature of FPGAs is that they can be reconfigured to suit the computation. One such form of reconfiguration, called partial reconfiguration (PR), allows part of the chip to be altered. The smallest part that can be reconfigured is called a frame. To reconfigure a frame, a fixed number of configuration bits are input (typically from outside) to the frame.
Thus PR involves (a) selecting a subset C ⊆ S of k out of n frames to configure and (b) inputting the configuration bits for these k frames. The recently proposed MU-Decoder has made it possible to select the subset C quickly. This thesis develops mechanisms to input the configuration bits to the selected frames.
Specifically, we propose a class of architectures that, for any subset C ⊆ S of frames, constructs a path connecting only the k frames of C, through which the configuration bits can be scanned in. We introduce a Basic Network that runs in Θ(k log n) time, where k is the number of frames selected out of the n available frames; we assume the number of configuration bits per frame is constant. The Basic Network does not exploit locality or other structure in the selected subset of frames. We show that for certain structures (such as frames that are relatively close to each other) the speed of reconfiguration can be improved: a first enhancement to the Basic Network selects the fastest clock speed that can be employed for a given set of frames, decreasing configuration time to O(k log k) in certain cases, and a second enhancement, called shortcuts, reduces the time in certain cases to an optimal O(k). All the proposed architectures require an optimal Θ(n) number of gates.
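The snippet below simply tabulates these three bounds for sample values of k and n to show how the enhancements pay off as selections grow; constant factors are arbitrary placeholders, and only the growth rates come from the thesis.

```python
import math

def basic(k: int, n: int, c: float = 1.0) -> float:
    return c * k * math.log2(n)            # Basic Network: Theta(k log n)

def clock_adapted(k: int, c: float = 1.0) -> float:
    return c * k * math.log2(max(k, 2))    # O(k log k) for clustered frames

def with_shortcuts(k: int, c: float = 1.0) -> float:
    return c * k                           # optimal O(k) in favorable cases

n = 1 << 16                                # 65,536 frames (an invented size)
print("k      basic  clock-adapted  shortcuts")
for k in (16, 256, 4096):
    print(f"{k:<6} {basic(k, n):>6.0f} {clock_adapted(k):>13.0f} "
          f"{with_shortcuts(k):>9.0f}")
```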
We implement our networks using CAD tools and show that the theoretical predictions are a good reflection of the networks' performance. Our work, although directed at FPGAs, may also apply to other settings, for example hardware testing and novel memory accesses.
8. Powers and Compensation in Three-Phase Systems with Nonsinusoidal and Asymmetrical Voltages and Currents
Bhattarai, Prashanna Dev (22 April 2016)
A contribution to the development of power theory for three-phase, three-wire systems with asymmetrical and nonsinusoidal supply voltages is presented in this dissertation.
It includes:
- a contribution to the explanation of power-related phenomena
- a contribution to methods of compensation
The power equation of unbalanced linear time-invariant (LTI) loads at sinusoidal but asymmetrical voltage is presented first. The current components of such a load and the physical phenomena associated with them are described, and the load current decomposition is used to design reactive balancing compensators for power factor improvement. Next, the current of LTI loads operating at nonsinusoidal asymmetrical voltage is decomposed, and the power equation of such a load is developed. Methods for designing reactive compensators that completely compensate the reactive and unbalanced current components, as well as optimized compensators that minimize these currents, are also presented.
Next, the power equation of harmonics-generating loads (HGLs) connected to a nonsinusoidal asymmetrical voltage is developed. The voltage and current harmonics are divided into two subsets: the harmonic orders originating in the supply and the harmonic orders originating in the load. The load current is decomposed based on the Currents' Physical Components (CPC) power theory, which is also used to generate the reference signals for controlling the switching compensators used for power factor improvement. Results of simulations in MATLAB Simulink are presented as well.
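For orientation, a commonly cited form of the CPC decomposition for LTI loads at nonsinusoidal voltage, and the power equation that follows from the mutual orthogonality of the components, is sketched below (after Czarnecki; the dissertation's exact equation set for asymmetrical supplies may differ):

```latex
% Load current as a sum of mutually orthogonal physical components:
% active, scattered, reactive, and unbalanced.
\boldsymbol{i} = \boldsymbol{i}_{\mathrm{a}} + \boldsymbol{i}_{\mathrm{s}}
               + \boldsymbol{i}_{\mathrm{r}} + \boldsymbol{i}_{\mathrm{u}},
\qquad
\|\boldsymbol{i}\|^{2} = \|\boldsymbol{i}_{\mathrm{a}}\|^{2}
                       + \|\boldsymbol{i}_{\mathrm{s}}\|^{2}
                       + \|\boldsymbol{i}_{\mathrm{r}}\|^{2}
                       + \|\boldsymbol{i}_{\mathrm{u}}\|^{2}
% Scaling by the squared voltage norm gives the power equation:
% S^{2} = P^{2} + D_{\mathrm{s}}^{2} + Q^{2} + D_{\mathrm{u}}^{2}
```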
9. Spectrum Allocation in Networks with Finite Sources and Data-Driven Characterization of Users' Stochastic Dynamics
Ali, Ahsan-Abbas (25 May 2015)
During emergencies, public safety communication systems (PSCSs) become overloaded with high traffic loads. These PSCSs are finite-source networks, and the goal of our study is to propose techniques for efficient spectrum allocation in finite-source networks that help alleviate this overloading. A PSCS has two system segments, one for system-access control and the other for communications, each with dedicated frequency channels.

The first part of our research, consisting of three projects, models and analyzes finite-source systems for optimal spectrum allocation, for both access control and communications. In the first project (Chapter 2), we study spectrum allocation based on the concept of cognitive radio systems. In the second project (Chapter 3), we study optimal communication-channel allocation through call admission and preemption control. In the third project (Chapter 4), we study the optimal joint allocation of frequency channels for access control and communications.

The aforementioned spectrum allocation techniques require knowledge of the call traffic parameters and the priority levels of the users in the system. In practical systems, this information is extracted from call-record metadata. A key fact to consider when analyzing call records is that the call arrival traffic and the users' priority levels change with events on the ground, because such events affect the communication behavior of the users, which in turn affects the call arrival traffic and priority levels. Thus, the first and foremost step in analyzing the call records of a given user is to segment the data into time intervals of homogeneous (stationary) communication behavior. Such a segmentation of the data of a practical PSCS is the goal of our fourth project (Chapter 5), which constitutes the second part of our study.
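As a minimal illustration of why the finite-source assumption matters, the snippet below computes the classical Engset blocking probability for a finite population of sources sharing a fixed pool of channels; the dissertation's models (with admission and preemption control) are considerably richer, and the numbers here are invented.

```python
from math import comb

def engset_blocking(n_sources: int, channels: int, rho: float) -> float:
    """Blocking seen by an arriving call; rho = offered load per idle source."""
    # The Engset formula evaluates the state distribution over N-1 sources,
    # since the arriving source cannot itself occupy a channel yet.
    num = comb(n_sources - 1, channels) * rho ** channels
    den = sum(comb(n_sources - 1, j) * rho ** j for j in range(channels + 1))
    return num / den

# Example: 30 first responders sharing 5 traffic channels at per-source load 0.2.
print(f"Engset blocking: {engset_blocking(30, 5, 0.2):.4f}")
```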
10. Scalable Techniques for Failure Recovery and Localization
Cho, Sangman (January 2011)
Failure localization and recovery is one of the most important issues in network management for providing continuous connectivity to users. In this dissertation, we develop several algorithms for network failure localization and recovery.

First, to achieve resilient multipath routing we introduce the concept of Independent Directed Acyclic Graphs (IDAGs). Link-independent (node-independent) DAGs satisfy the property that any path from a source to the root on one DAG is link-disjoint (node-disjoint) with any path from the source to the root on the other DAG. Given a network, we develop polynomial-time algorithms to compute link-independent and node-independent DAGs. The algorithm developed in this dissertation (1) provides multipath routing, (2) utilizes all possible edges, (3) guarantees recovery from single-link failure, and (4) achieves all this with at most one bit per packet of overhead when routing is based on destination address and incoming edge. We show the effectiveness of the proposed IDAGs approach by comparing key performance indices to those of the independent-trees and multiple-pairs-of-independent-trees techniques through extensive simulations.

Second, we introduce the concept of monitoring tours (m-tours) to uniquely localize all possible failures of up to k links in arbitrary all-optical networks. We establish paths and cycles that can traverse the same link at most twice (backward and forward) and call them m-tours. An m-tour differs from existing schemes such as m-cycles and m-trails, which traverse a link at most once. Closed (open) m-tours start and terminate at the same (distinct) monitor location(s). Each tour is constructed such that any shared risk link group (SRLG) failure results in the failure of a unique combination of closed and open m-tours. We prove that k-edge connectivity is a sufficient condition to localize all SRLG failures of up to k links when only one monitoring station is employed. We introduce an integer linear program (ILP) and a greedy scheme to find the placement of monitoring locations that uniquely localizes any SRLG failure of up to k links, and we provide a heuristic scheme to compute m-tours for a given network. Simulation results show that our m-tour approach significantly reduces the number of required monitoring locations, thereby reducing monitoring cost and network-management complexity.

Finally, this dissertation studies the problem of uniquely localizing single network-element (link/node) failures using monitoring cycles, paths, and tours. A monitoring cycle starts and ends at the same monitoring node; a monitoring path starts and ends at distinct monitoring nodes; a monitoring tour starts and ends at a monitoring station but may traverse a link twice, once in each direction. The failure of any link/node results in the failure of a unique combination of cycles/paths/tours. We develop the necessary theory for monitoring single-element (link/node) failures using only one monitoring station with cycles and with tours, respectively, and show that the scheme employing monitoring tours requires fewer monitors than the scheme employing monitoring cycles and paths. With this efficient tour-based approach, we also consider the problem of localizing up to k element (link/node) failures using only a single monitor. Through simulations, we verify the effectiveness of our monitoring algorithms.
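The localization principle itself can be illustrated in a few lines: if every link lies on a distinct subset of tours, the set of tours that go dark identifies the failed link. The tiny tour assignment below is invented for illustration; computing real m-tours and monitor placements requires the ILP and greedy/heuristic schemes described above.

```python
tours = {                       # tour name -> set of links it traverses
    "T1": {"a", "b", "c"},
    "T2": {"b", "d"},
    "T3": {"a", "d", "e"},
}

# Signature of a link = the set of tours that traverse it.
signature = {}
for name, links in tours.items():
    for link in links:
        signature.setdefault(link, set()).add(name)

# Single-link failures are uniquely localizable iff all signatures differ.
assert len({frozenset(s) for s in signature.values()}) == len(signature)

def localize(failed_tours: set) -> str:
    """Map the observed set of failed tours back to the failed link."""
    for link, tours_hit in signature.items():
        if tours_hit == failed_tours:
            return link
    return "observation matches no single-link failure"

print(localize({"T1", "T3"}))   # -> 'a': the only link on exactly T1 and T3
```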