  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Belief Propagation and Algorithms for Mean-Field Combinatorial Optimisations

Khandwawala, Mustafa January 2014 (has links) (PDF)
We study combinatorial optimization problems on graphs in the mean-field model, which assigns independent and identically distributed random weights to the edges of the graph. Specifically, we focus on two generalizations of minimum weight matching on graphs. The first problem, minimum cost edge cover, finds application in a computational linguistics problem of semantic projection. The second problem, minimum cost many-to-one matching, appears as an intermediate optimization step in the restriction scaffold problem applied to shotgun sequencing of DNA. For the minimum cost edge cover on a complete graph on n vertices, where the edge weights are independent exponentially distributed random variables, we show that the expectation of the minimum cost converges to a constant as n → ∞. For the minimum cost many-to-one matching on an n × m complete bipartite graph, scaling m as [n/α] for some fixed α > 1, we find the limit of the expected minimum cost as a function of α. For both problems, we show that a belief propagation algorithm converges asymptotically to the optimal solution. The belief propagation algorithm yields a near-optimal solution with lower complexity than the best known algorithms designed for optimality in worst-case settings. Our proofs use the machinery of the objective method and local weak convergence, ideas developed by Aldous for proving the ζ(2) limit for minimum cost bipartite matching. We use belief propagation as a constructive proof technique to supplement the objective method. Recursive distributional equations (RDEs) arise naturally in the objective method approach. In a class of RDEs that arise as extensions of the minimum weight matching and travelling salesman problems, we prove existence and uniqueness of a fixed point distribution, and characterize its domain of attraction.
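As an illustration of the algorithmic idea, a min-sum belief propagation sketch for minimum-cost perfect matching on a complete bipartite graph (a generic textbook-style variant in the spirit of Bayati, Shah and Sharma, not the edge-cover or many-to-one algorithms analysed in the thesis) can be written as:

```python
import itertools
import random

def bp_min_matching(w, iters=2000):
    """Min-sum belief propagation for minimum-cost perfect matching on a
    complete bipartite graph with n x n weight matrix w.  The estimated
    matching converges to the optimum when the optimum is unique (almost
    surely the case for continuous random weights)."""
    n = len(w)
    A = [[0.0] * n for _ in range(n)]  # A[i][j]: message from left i to right j
    B = [[0.0] * n for _ in range(n)]  # B[j][i]: message from right j to left i
    for _ in range(iters):
        newA = [[w[i][j] - min(B[k][i] for k in range(n) if k != j)
                 for j in range(n)] for i in range(n)]
        newB = [[w[i][j] - min(A[k][j] for k in range(n) if k != i)
                 for i in range(n)] for j in range(n)]
        A, B = newA, newB
    # each left vertex matches the right vertex with the smallest incoming message
    return [min(range(n), key=lambda j: B[j][i]) for i in range(n)]

def brute_force(w):
    """Exact minimum-cost perfect matching by enumerating all permutations."""
    n = len(w)
    return list(min(itertools.permutations(range(n)),
                    key=lambda p: sum(w[i][p[i]] for i in range(n))))

w = [[1, 9, 9], [9, 1, 9], [9, 9, 1]]
print(bp_min_matching(w))  # [0, 1, 2]: the cheap diagonal matching
```

On small random instances the BP estimate agrees with brute-force enumeration; the point of the mean-field analysis is that this local message-passing scheme remains near-optimal as n grows.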
72

Designing Energy-Aware Optimization Techniques through Program Behaviour Analysis

Kommaraju, Ananda Varadhan January 2014 (has links) (PDF)
Green computing techniques aim to reduce the power footprint of modern embedded devices, with particular emphasis on processors, the power hot-spots of these devices. In this thesis we propose compiler-driven and profile-driven optimizations that reduce power consumption in a modern embedded processor. We show that these optimizations reduce power consumption in functional units and memory subsystems with very low performance loss. We present three new techniques to reduce power consumption in processors, namely transition-aware scheduling, leakage reduction in data caches using criticality analysis, and dynamic power reduction in data caches using locality analysis of data regions. A novel instruction scheduling technique to address leakage power consumption in functional units is proposed. This scheduling technique, transition-aware scheduling, is motivated by the idle periods that arise in the utilization of functional units during program execution. A sufficiently long idle period in a functional unit can be exploited to place the unit in a low power state. This scheduling algorithm increases the duration of idle periods without hampering performance, and drives power gating in these periods. A power model defined with idle cycles as a parameter shows that this technique saves up to 25% of leakage power with very low performance impact. In modern embedded programs, data regions can be classified as critical and non-critical. Critical data regions significantly impact performance. A new technique to identify such data regions through profiling is proposed. This technique, along with a new criticality-based cache policy, is used to control the power state of the data cache. The scheme allocates non-critical data regions to low-power cache regions, thereby reducing leakage power consumption by up to 40% without compromising performance. This profiling technique is extended to identify data regions that have low locality.
Some data regions, by contrast, have high data reuse. A locality-based cache policy, driven by cache parameters such as size and associativity, is proposed. This scheme reduces dynamic as well as static power consumption in the cache subsystem, cutting the total power consumption in the data caches by 25% without hampering execution time. In this thesis, the problem of power consumption of a program is decoupled from the number of processor cores. The underlying architecture model is simplified to abstract away a variety of processor scenarios. This simplified model can be scaled up for implementation in various multi-core architectures such as Chip Multi-Processors, Simultaneous Multi-Threaded processors and Chip Multi-Threaded processors, to name a few. The three techniques proposed in this thesis leverage underlying hardware features such as low power functional units, drowsy caches and split data caches. They reduce the power consumption of a wide range of benchmarks with low performance loss.
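The idle-period model behind transition-aware scheduling can be illustrated with a toy calculation (the thresholds and numbers below are purely hypothetical, not the thesis's calibrated power model):

```python
def leakage_saved(idle_periods, breakeven=10, wake_overhead=2):
    """Estimate the fraction of idle-time leakage energy saved by power-gating
    a functional unit during idle periods longer than a break-even threshold.

    idle_periods  : lengths of the unit's idle periods, in cycles
    breakeven     : minimum idle length (cycles) for gating to pay off
    wake_overhead : leakage-equivalent energy cost (cycles) of waking up
    All numbers here are illustrative, not taken from the thesis.
    """
    total_idle = sum(idle_periods)
    saved = sum(p - wake_overhead for p in idle_periods if p >= breakeven)
    return saved / total_idle if total_idle else 0.0

# A schedule that merges many short idle gaps into fewer long windows
# (the goal of transition-aware scheduling) saves more leakage energy.
fragmented = [4] * 25          # 100 idle cycles in short bursts: nothing gated
consolidated = [50, 50]        # the same 100 idle cycles in two long windows
print(leakage_saved(fragmented))    # 0.0
print(leakage_saved(consolidated))  # 0.96
```

The example shows why the scheduler lengthens idle periods rather than merely counting them: only periods above the break-even point contribute any savings.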
73

A Coarse Grained Reconfigurable Architecture Framework Supporting Macro-Dataflow Execution

Varadarajan, Keshavan 12 1900 (has links) (PDF)
A Coarse-Grained Reconfigurable Architecture (CGRA) is a processing platform consisting of an interconnection of coarse-grained computation units (viz. Function Units (FUs) or Arithmetic Logic Units (ALUs)). These units communicate directly through send-receive-like primitives, as opposed to the shared-memory based communication used in multi-core processors. CGRAs are a well-researched topic and the design space of a CGRA is quite large. The design space can be represented as a 7-tuple (C, N, T, P, O, M, H), where the terms have the following meaning: C - choice of computation unit, N - choice of interconnection network, T - choice of number of context frames (single or multiple), P - presence of partial reconfiguration, O - choice of orchestration mechanism, M - design of memory hierarchy, and H - host-CGRA coupling. In this thesis, we develop an architectural framework for a macro-dataflow based CGRA where we make the following choice for each of these parameters: C - ALU, N - Network-on-Chip (NoC), T - multiple contexts, P - support for partial reconfiguration, O - macro-dataflow based orchestration, M - data memory banks placed at the periphery of the reconfigurable fabric (the reconfigurable fabric being the interconnection of computation units), H - loose coupling between host processor and CGRA, enabling our CGRA to execute an application without the host processor's intervention. The motivations for developing such a CGRA are threefold. First, to execute applications efficiently through reduction in reconfiguration time (i.e. the time needed to transfer instructions and data to the reconfigurable fabric) and reduction in execution time through better exploitation of all forms of parallelism: Instruction Level Parallelism (ILP), Data Level Parallelism (DLP) and Thread/Task Level Parallelism (TLP). We choose a macro-dataflow based orchestration framework in combination with partial reconfiguration so as to ease the exploitation of TLP and DLP.
Macro-dataflow serves as a lightweight synchronization mechanism. We experiment with two variants of the macro-dataflow orchestration unit, namely a hardware-controlled orchestration unit and a compiler-controlled orchestration unit. We employ a NoC as it helps reduce the reconfiguration overhead. Second, to permit customization of the CGRA for a particular domain through the use of domain-specific custom Intellectual Property (IP) blocks, which improves both application performance and energy efficiency. Third, to develop a CGRA which is completely programmable and accepts any program written using the C89 standard. The compiler and the architecture were co-developed to ensure that every feature of the architecture could be automatically targeted by the compiler. In this CGRA framework, the orchestration mechanism (O) and the host-CGRA coupling (H) are kept fixed, and we permit design space exploration of the other terms in the 7-tuple design space. The mode of compilation and execution remains invariant under these changes, hence the term framework. We now elucidate the compilation and execution flow for this CGRA framework. An application written in the C language is compiled and transformed into a set of temporal partitions, referred to as HyperOps in this thesis. The macro-dataflow orchestration unit selects a HyperOp for execution when all its inputs are available. The instructions and operands for a ready HyperOp are transferred to the reconfigurable fabric for execution. Each ALU (in the computation unit) is capable of waiting for the availability of its input data prior to issuing instructions. We permit the launch and execution of a temporal partition to progress in parallel, which reduces the reconfiguration overhead. We further cut launch delays by keeping loops persistent on the fabric, thus eliminating the need to relaunch their instructions. The CGRA framework has been implemented using Bluespec System Verilog.
We evaluate the performance of two of these CGRA instances: one for cryptographic applications and another for linear algebra kernels. We also run other general purpose integer and floating point applications to demonstrate the generic nature of these optimizations. We explore various microarchitectural optimizations, viz. pipeline optimizations (i.e. changing the value of T), different forms of macro-dataflow orchestration (the hardware-controlled and compiler-controlled orchestration units), and different execution modes including resident loops, pipeline parallelism, changes to the router, etc. As a result of these optimizations we observe a 2.5x improvement in performance as compared to the base version. The reconfiguration overhead was hidden by overlapping the launch of instructions with execution. The perceived reconfiguration overhead is reduced drastically, to about 9-11 cycles for each HyperOp, independent of the size of the HyperOp. This can be mainly attributed to the data-dependent instruction execution and the use of the NoC. The overhead of the macro-dataflow execution unit was reduced to a minimum with the compiler-controlled orchestration unit. To benchmark the performance of these CGRA instances, we compare them with an Intel Core 2 Quad running at 2.66 GHz. On the cryptographic CGRA instance, running at 700 MHz, we observe one to two orders of magnitude improvement in performance for cryptographic applications, and up to one order of magnitude performance degradation for the linear algebra CGRA instance. The relatively poor performance on linear algebra kernels can be attributed to the inability to exploit ILP across computation units interconnected by the NoC, the long latency in accessing data memory placed at the periphery of the reconfigurable fabric, and the unavailability of pipelined floating point units (which are critical to the performance of linear algebra kernels).
The superior performance on the cryptographic kernels can be attributed to a higher computation-to-load-instruction ratio, careful choice of custom IP block, the ability to construct large HyperOps, which allows a greater portion of the communication to be performed directly (as opposed to communication through a register file in a general purpose processor), and the use of the resident loops execution mode. The power consumption of a computation unit employed in the cryptography CGRA instance, along with its router, is about 76 mW, as estimated by Synopsys Design Vision using the Faraday 90nm technology library for an activity factor of 0.5. The power of other instances would depend on the specific instantiation of the domain-specific units. This implies that for a reconfigurable fabric of size 5 × 6 the total power consumption is about 2.3 W. The area and power (about 84 mW) dissipated by the macro-dataflow orchestration unit, which is common to both instances, are comparable to those of a single computation unit, making it an effective and low overhead technique to exploit TLP.
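The firing rule of the macro-dataflow orchestration unit (a HyperOp is selected for launch once all of its inputs are available) can be sketched as a toy scheduler; the graph, names and structure below are illustrative, not the Bluespec implementation described in the thesis:

```python
from collections import defaultdict, deque

def run_hyperops(deps, initial_ready):
    """Toy macro-dataflow orchestrator: 'launch' each HyperOp once all of the
    HyperOps it consumes outputs from have completed.  deps maps a HyperOp
    name to the set of HyperOps it depends on."""
    waiting = {h: set(d) for h, d in deps.items() if d}
    consumers = defaultdict(set)
    for h, d in deps.items():
        for producer in d:
            consumers[producer].add(h)
    ready = deque(initial_ready)
    launch_order = []
    while ready:
        h = ready.popleft()
        launch_order.append(h)             # transfer instructions/operands to the fabric
        for c in sorted(consumers[h]):     # completion may make consumers ready
            waiting[c].discard(h)
            if not waiting[c]:
                ready.append(c)
    return launch_order

# A diamond-shaped HyperOp graph: A feeds B and C, which both feed D.
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(run_hyperops(deps, ["A"]))  # ['A', 'B', 'C', 'D']
```

D is launched only after both B and C complete, which is the lightweight synchronization the abstract attributes to macro-dataflow.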
74

Návrh sanace stokové sítě vybrané části urbanizovaného celku / Design of sewer network rehabilitation of choice parts of urbanized areas

Horák, Ondřej January 2013 (has links)
The thesis addresses the design of sewer network rehabilitation for an urbanized area named Kamenná čtvrť. The thesis is divided into several parts. The first part contains the accompanying report, which describes the characteristics of the area as a whole. The second part describes each part of the network, classifies defects according to ČSN EN 13508-2 and the forthcoming TNV 75 6905, and includes photographs from the camera survey of the network. The third part contains a chart of all defects located on the network, together with an evaluation of each part of the network according to the draft TNV 75 6905. The fourth part evaluates the network as a whole. In the fifth part, possible rehabilitation options for the network are designed. The sixth part covers the economic aspects of the possible alternatives and their comparison. The last part contains the hydrotechnical calculation, i.e. a calculation of the current situation and of the possible alternatives, with an individual evaluation of each calculation and, where necessary, optimization of the network.
75

Efektivní metoda čtení adresářových položek v souborovém systému Ext4 / An Efficient Way to Allocate and Read Directory Entries in the Ext4 File System

Pazdera, Radek January 2013 (has links)
The aim of this thesis is to improve the performance of sequential directory traversal in the ext4 file system. The HTree data structure, currently used to implement directories in ext4, handles random accesses to a directory very well, but it is not optimized for sequential traversal. This thesis provides an analysis of the problem. It first studies the implementation of the ext4 file system and the related subsystems of the Linux kernel. A set of tests was created to evaluate the performance of the current directory index implementation. Based on the results of these tests, a solution was designed and subsequently implemented in the Linux kernel. The thesis concludes with an evaluation of the benefits of the new implementation and a comparison of its performance with other Linux file systems.
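For context, the mismatch this thesis targets (HTree returns directory entries in hash order, so processing them in readdir order touches the inode table quasi-randomly) is often mitigated in user space by sorting entries by inode number before stat-ing them. A minimal sketch of that generic workaround (not the kernel-side solution developed in the thesis):

```python
import os
import tempfile

def stat_dir_sequentially(path):
    """List a directory and stat each entry in ascending inode order rather
    than the hash order returned by readdir, turning scattered inode-table
    reads into a mostly sequential access pattern on ext4."""
    entries = sorted(os.scandir(path), key=lambda e: e.inode())
    results = []
    for e in entries:
        st = os.stat(e.path, follow_symlinks=False)
        results.append((e.name, st.st_size))
    return results

# demo on a throwaway directory
d = tempfile.mkdtemp()
for name in ("a", "b", "c"):
    open(os.path.join(d, name), "w").close()
print(sorted(n for n, _ in stat_dir_sequentially(d)))  # ['a', 'b', 'c']
```

Tools such as backup utilities use the same trick; a kernel-side fix, as pursued in the thesis, removes the need for every application to do this itself.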
76

Analyses and Scalable Algorithms for Byzantine-Resilient Distributed Optimization

Kananart Kuwaranancharoen (16480956) 03 July 2023 (has links)
<p>The advent of advanced communication technologies has given rise to large-scale networks comprised of numerous interconnected agents, which need to cooperate to accomplish various tasks, such as distributed message routing, formation control, robust statistical inference, and spectrum access coordination. These tasks can be formulated as distributed optimization problems, which require agents to agree on a parameter minimizing the average of their local cost functions by communicating only with their neighbors. However, distributed optimization algorithms are typically susceptible to malicious (or "Byzantine") agents that do not follow the algorithm. This thesis offers analysis and algorithms for such scenarios. As the malicious agent's function can be modeled as an unknown function with some fundamental properties, we begin in the first two parts by analyzing the region containing the potential minimizers of a sum of functions. Specifically, we explicitly characterize the boundary of this region for the sum of two unknown functions with certain properties. In the third part, we develop resilient algorithms that allow correctly functioning agents to converge to a region containing the true minimizer under the assumption of convex functions of each regular agent. Finally, we present a general algorithmic framework that includes most state-of-the-art resilient algorithms. Under the strongly convex assumption, we derive a geometric rate of convergence of all regular agents to a ball around the optimal solution (whose size we characterize) for some algorithms within the framework.</p>
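A common primitive in such resilient algorithmic frameworks is a trimmed-mean filter that discards extreme neighbor values before averaging; the sketch below illustrates this generic idea only, not the specific algorithms or guarantees derived in the thesis:

```python
def trimmed_mean(values, f):
    """Discard the f largest and f smallest values, then average the rest.
    With at most f Byzantine neighbors, the result is guaranteed to lie
    within the range of the correct agents' values."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values to tolerate f faults")
    s = sorted(values)
    kept = s[f:len(s) - f]
    return sum(kept) / len(kept)

# Four honest agents report values 1..4; one Byzantine agent reports 1e6.
reports = [1.0, 2.0, 3.0, 4.0, 1e6]
print(trimmed_mean(reports, f=1))  # 3.0, unaffected by the outlier
```

A plain average of the same reports would be dragged to roughly 2e5 by the single faulty value, which is why resilient algorithms replace the average with a filtered aggregate of this kind.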
77

Locality Optimizations for Regular and Irregular Applications

Rajbhandari, Samyam 28 December 2016 (has links)
No description available.
78

Adapting the polytope model for dynamic and speculative parallelization

Jimborean, Alexandra 14 September 2012 (has links) (PDF)
In this thesis, we present a Thread-Level Speculation (TLS) framework whose main feature is to speculatively parallelize a sequential loop nest in various ways, to maximize performance. We perform code transformations by applying the polyhedral model that we adapted for speculative and runtime code parallelization. For this purpose, we designed a parallel code pattern which is patched by our runtime system according to the profiling information collected on some execution samples. We show on several benchmarks that our framework yields good performance on codes which could not be handled efficiently by previously proposed TLS systems.
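A core runtime test behind such speculative polyhedral parallelization is whether profiled memory addresses behave as affine functions of the loop index; a minimal sketch of that check (illustrative only, since the thesis's profiling and code-patching infrastructure is far more involved):

```python
def is_affine(samples):
    """Check whether sampled (iteration, address) pairs fit an affine
    function addr = base + stride * i, as required for polyhedral
    (polytope-model) reasoning about a speculatively parallelized loop."""
    if len(samples) < 2:
        return True
    (i0, a0), (i1, a1) = samples[0], samples[1]
    if i1 == i0:
        return False
    stride = (a1 - a0) / (i1 - i0)
    base = a0 - stride * i0
    return all(abs(base + stride * i - a) < 1e-9 for i, a in samples)

# A strided access pattern (addr = 0x1000 + 8*i) passes; an irregular
# pointer-chasing pattern fails, so speculation with rollback is required.
regular = [(i, 0x1000 + 8 * i) for i in range(10)]
irregular = [(0, 0x1000), (1, 0x1040), (2, 0x1008), (3, 0x2000)]
print(is_affine(regular), is_affine(irregular))  # True False
```

When the sampled accesses pass such a test, the runtime can instantiate a parallel code pattern under the speculated affine model and fall back to sequential execution if a later access violates it.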
79

The Organic Permeable Base Transistor:

Kaschura, Felix 23 October 2017 (has links) (PDF)
Organic transistors are a core component for basically all relevant types of fully organic circuits and consumer electronics. The Organic Permeable Base Transistor (OPBT) is a transistor with a sandwich geometry like that of Organic Light Emitting Diodes (OLEDs) and a vertical current transport. It therefore combines simple fabrication with high performance due to its short transit paths, and has a fairly good chance of being used in new organic electronics applications that until now have had to fall back on silicon transistors. A detailed understanding of the operation mechanism, allowing targeted engineering without trial and error, is required, as are universal optimization techniques that demand as little effort as possible. Several mechanisms explaining certain aspects of the operation have been proposed in the literature, but a comprehensive study covering all transistor regimes in detail is lacking. High performance has been reported for organic transistors, but usually only for certain materials: n-type C60 OPBTs with excellent performance have been presented, for example, while an adequate p-type OPBT is missing. In this thesis, the OPBT is investigated under two aspects. Firstly, drift-diffusion simulations of the OPBT are evaluated. By comparing the results obtained for different geometry parameters, conclusions about the detailed operation mechanism can be drawn. It is discussed where charge carriers flow in the device and which parameters affect the performance. In particular, the charge carrier transmission through the permeable base layer relies on small openings. Contrary to an intuitive view, however, the size of these openings does not limit the device performance. Secondly, p-type OPBTs using pentacene as the organic semiconductor are fabricated and characterized, with the aim of catching up with the performance of the n-type OPBTs.
It is shown how an additional seed layer can improve the performance by changing the morphology, how leakage currents can be suppressed, and how parameters such as the layer thickness should be chosen. With the combination of all presented optimization strategies, pentacene OPBTs are built that show a current density above 1000 mA/cm^2 and a current gain of 100. This makes the OPBT useful for a variety of applications, and complementary logic circuits are now possible as well. The discussed optimization strategies can be extended and used as a starting point for further enhancements. Together with the deep understanding obtained from the simulations, purposeful modifications with great potential can be studied.
80

Rekonstrukce blízkého pole antén / Reconstruction of the Antenna Near-Field

Puskely, Jan January 2011 (has links)
The aim of the doctoral thesis is to design an efficiently working algorithm that, from phaseless (amplitude-only) measurements in the antenna near field, can reconstruct the complex near field of the antenna and, consequently, its far-field radiation pattern. To this end, the properties of the minimization algorithm were investigated: the minimization approach, the optimization method and, last but not least, the objective functional were analyzed and suitably chosen. Initial estimates were considered to speed up the whole minimization process, and finally the idea of representing the sought electric field by a small number of coefficients was incorporated into the minimization algorithm. Based on these analyses, a phaseless method for characterizing the radiation properties of antennas was designed. The method combines global optimization with an image-compression method and a local method, in conjunction with conventional amplitude measurements over two surfaces. Here, the global optimization is used to find the global minimum of the minimized functional, the compression method reduces the number of unknown variables on the antenna aperture, and the local method refines the location of the minimum. The proposed method is very robust and much faster than other available minimization algorithms. Further research focused on the possibility of using amplitudes measured on a single surface only to reconstruct the radiation characteristics of antennas, and on a new algorithm for phase reconstruction on a cylindrical geometry.
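The two-surface amplitude-only setting can be illustrated with a classic alternating-projection (Gerchberg-Saxton style) toy in one dimension, where a unitary DFT stands in for the field propagation between the two measurement surfaces; this is a generic illustration, not the hybrid global/compression/local method proposed in the thesis:

```python
import cmath
import random

def dft(x, inverse=False):
    """Unitary discrete Fourier transform (toy stand-in for the propagation
    between the two measurement surfaces)."""
    n = len(x)
    s = 1 if inverse else -1
    return [sum(xk * cmath.exp(s * 2j * cmath.pi * j * k / n)
                for k, xk in enumerate(x)) / n ** 0.5 for j in range(n)]

def gerchberg_saxton(mag1, mag2, iters=200, seed=0):
    """Alternating projections: enforce the measured amplitudes mag1 in the
    first domain and mag2 in the second (Fourier) domain, keeping phases."""
    rng = random.Random(seed)
    x = [m * cmath.exp(2j * cmath.pi * rng.random()) for m in mag1]
    errors = []  # squared amplitude mismatch in the Fourier domain
    for _ in range(iters):
        X = dft(x)
        errors.append(sum((abs(v) - m) ** 2 for v, m in zip(X, mag2)))
        X = [m * v / abs(v) if abs(v) > 1e-12 else complex(m)
             for m, v in zip(mag2, X)]
        x = dft(X, inverse=True)
        x = [m * v / abs(v) if abs(v) > 1e-12 else complex(m)
             for m, v in zip(mag1, x)]
    return x, errors

# Synthetic ground truth and its two amplitude-only "measurements"
rng = random.Random(42)
truth = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
mag1 = [abs(v) for v in truth]
mag2 = [abs(v) for v in dft(truth)]
x, errors = gerchberg_saxton(mag1, mag2)
print(errors[0] >= errors[-1])  # True: the amplitude mismatch never increases
```

This local scheme is exactly the kind of iteration that can stagnate in a poor minimum, which motivates the thesis's combination of a global optimizer (to escape local minima) with a compression step (to shrink the number of unknowns) before the local refinement.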
