471 |
Binary Redundancy Elimination
Fernández Gómez, Manuel, 13 April 2005 (has links)
Two of the most important performance limiters in today's processors stem from memory operations and from control dependencies. To address these issues, cache memories and branch predictors are well-known hardware proposals that exploit, among other things, temporal memory reuse and branch correlation. In other words, they try to exploit the dynamic redundancy existing in programs. This redundancy comes partly from the way programmers write source code, but also from limitations in the traditional compilation model, which introduces unnecessary memory and conditional branch instructions. We believe that today's optimizing compilers should be very aggressive in optimizing programs, and should therefore be expected to remove a significant part of this redundancy.

On the other hand, optimizations performed at link time or applied directly to final program executables have received increased attention in recent years, due to limitations in the traditional compilation model. First, even when performing sophisticated interprocedural analyses and transformations, a traditional compiler does not have the opportunity to optimize the program as a whole. A similar problem arises when applying profile-directed compilation techniques: large projects are forced to rebuild every source file to take advantage of profile information. By contrast, it would be more convenient to build the full application, instrument it to obtain profile data, and then re-optimize the final binary without recompiling a single source file.

In this thesis we present new profile-guided compiler optimizations for eliminating the redundancy encountered in executable programs at the binary level (i.e., binary redundancy), even though these programs have been compiled with full optimizations using a state-of-the-art commercial compiler. In particular, our Binary Redundancy Elimination (BRE) techniques are targeted at eliminating both redundant memory operations and redundant conditional branches, which are the most important ones for addressing the performance issues mentioned above. These proposals are mainly based on Partial Redundancy Elimination (PRE) techniques for eliminating partial redundancies in a path-sensitive fashion. Our results show that, by applying our optimizations, we achieve a 14% execution time reduction on our benchmark suite.

In this work we also review the problem of alias analysis at the executable-program level, identifying why memory disambiguation is one of the weak points of object code modification. We then propose several alias analyses to be applied in the context of link-time or executable-code optimizers. First, we present a must-alias analysis to recognize memory dependencies in a path-sensitive fashion, which is used in our optimization for eliminating redundant memory operations. Next, we propose two speculative may-alias data-flow algorithms to recognize memory independencies. These may-alias analyses are based on introducing unsafe speculation at analysis time, which increases alias precision on important portions of code while keeping the analysis reasonably cost-efficient. Our results show that our analyses prove to be very useful for increasing the memory disambiguation accuracy of binary code, which translates into opportunities for applying optimizations.

All our algorithms, both for the analyses and the optimizations, have been implemented within a binary optimizer, which overcomes most of the existing limitations of traditional source-code compilers. Therefore, our work also points out the most relevant issues of applying our algorithms at the executable-code level, since most of the high-level information available in traditional compilers is lost.
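To make the notion of redundant memory operations at the binary level concrete, the sketch below removes a fully redundant load from a straight-line micro-op trace using a deliberately conservative alias test (a tracked location is trusted only when base register and offset match exactly, and stores to a different offset off the same base are treated as non-aliasing). This is a minimal illustration of the idea, not the path-sensitive PRE algorithm or the must-alias analysis developed in the thesis; the micro-op format is invented for the example.

```python
# Minimal sketch of redundant-load elimination on a straight-line micro-op
# trace with a conservative alias test.  The micro-op encoding is invented
# for this example and is not the thesis's intermediate representation.

def eliminate_redundant_loads(trace):
    """Micro-op formats:
        ("load",  dst, base, off)   dst <- MEM[base + off]
        ("store", base, off, src)   MEM[base + off] <- src
        ("alu",   dst, src1, src2)  dst <- src1 op src2
    Returns a rewritten trace in which provably redundant loads become moves."""
    available = {}              # (base, off) -> register known to hold MEM[base + off]
    out = []

    def kill_register(reg):
        # reg is being overwritten: forget every fact that mentions it
        return {k: v for k, v in available.items() if v != reg and k[0] != reg}

    for ins in trace:
        if ins[0] == "load":
            _, dst, base, off = ins
            prev = available.get((base, off))
            if prev is not None and prev != dst:
                out.append(("move", dst, prev))          # redundant load eliminated
            else:
                out.append(ins)
            available = kill_register(dst)
            if base != dst:
                available[(base, off)] = prev if prev not in (None, dst) else dst
        elif ins[0] == "store":
            _, base, off, src = ins
            # must-alias: identical (base, off); no-alias: same base, other offset;
            # any other tracked location may alias, so forget it conservatively
            available = {k: v for k, v in available.items()
                         if k[0] == base and k[1] != off}
            available[(base, off)] = src
            out.append(ins)
        else:                                            # ("alu", dst, src1, src2)
            available = kill_register(ins[1])
            out.append(ins)
    return out

if __name__ == "__main__":
    trace = [("load",  "r1", "sp", 8),
             ("alu",   "r2", "r1", "r1"),
             ("store", "sp", 16, "r2"),
             ("load",  "r3", "sp", 8)]   # same address, no intervening aliasing store
    print(eliminate_redundant_loads(trace))   # last load becomes ("move", "r3", "r1")
```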
|
472 |
HW/SW mechanisms for instruction fusion, issue and commit in modern u-processors
Deb, Abhishek, 03 May 2012 (has links)
In this thesis we have explored the co-designed paradigm to show alternative processor design points. Specifically, we have provided HW/SW mechanisms for instruction fusion, issue and commit in modern processors. We have implemented a co-designed virtual machine monitor that binary-translates x86 instructions into RISC-like micro-ops. The translations are stored as superblocks, each of which is a trace of basic blocks. These superblocks are further optimized using speculative and non-speculative optimizations, and hardware mechanisms take corrective action in case of misspeculation. During the course of this PhD we have made the following contributions.

Firstly, we have provided a novel Programmable Functional Unit (PFU) in order to speed up general-purpose applications. The PFU consists of a grid of functional units, similar to the CCA, and a distributed internal register file. The inputs of the macro-op are brought from the physical register file to the internal register file using a set of moves and a set of loads. A macro-op fusion algorithm fuses micro-ops at runtime; it is based on a scheduling step that indicates whether the current fused instruction is beneficial or not. The micro-ops corresponding to a macro-op are stored as control signals in a configuration, and each macro-op carries a configuration ID that helps locate its configuration. A small configuration cache inside the PFU holds these configurations; on a miss, configurations are loaded from the I-cache. Moreover, in order to support bulk commit of atomic superblocks that are larger than the ROB, we have proposed a speculative commit mechanism. To this end we have proposed a speculative-commit register map table that holds the mappings of the speculatively committed instructions; when all the instructions of the superblock have committed, the speculative state is copied to the backend register rename table.

Secondly, we have proposed a co-designed in-order processor with two kinds of FU-based accelerators, each of which runs a pair of fused instructions. We have considered two kinds of instruction fusion. First, we fuse a pair of independent loads into a vector load and execute it on a vector load unit. Second, we fuse a pair of dependent simple ALU instructions and execute them on an Interlock Collapsing ALU (ICALU). Moreover, we have evaluated the performance of various code optimizations such as list scheduling, load-store telescoping and load hoisting, among others. We have compared our co-designed processor with out-of-order processors that have small instruction windows.

Thirdly, we have proposed a co-designed out-of-order processor in which we reduce complexity in two areas. First, we have co-designed the commit mechanism to enable bulk commit of atomic superblocks. In this solution we get rid of the conventional ROB and instead introduce the Superblock Ordering Buffer (SOB). The SOB ensures that program order is maintained at the granularity of the superblock by committing the program state in bulk. The program state consists of the register state and the memory state: the register state is held in a per-superblock register map table, whereas the memory state is held in a gated store buffer and updated in bulk. Furthermore, we have tackled the complexity of the out-of-order issue logic by using FIFOs. We have proposed an enhanced steering heuristic that fixes the inefficiencies of the existing dependence-based heuristic, together with a mechanism to release FIFO entries earlier, which further improves the performance of the steering heuristic.
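As a toy illustration of the dependent-pair fusion used with the ICALU accelerators described above, the sketch below pairs a simple ALU micro-op with the immediately following micro-op when the latter is also a simple ALU op that consumes the first one's result. It is only a sketch under that simple adjacency rule; the thesis's runtime fusion algorithm and its scheduling-based profitability test are not reproduced here, and the micro-op representation is invented for the example.

```python
# Illustrative sketch: greedily fuse a dependent pair of simple ALU micro-ops
# into one macro-op, in the spirit of an interlock-collapsing ALU (ICALU).
# The micro-op format and the pairing rule are invented for this example.

SIMPLE_ALU = {"add", "sub", "and", "or", "xor"}

def fuse_dependent_pairs(uops):
    """uops: list of dicts like {"op": "add", "dst": "r3", "srcs": ["r1", "r2"]}.
    Fuses a micro-op with the immediately following one when the second is a
    simple ALU op that consumes the first one's result (a head/tail macro-op)."""
    fused, i = [], 0
    while i < len(uops):
        head = uops[i]
        if (i + 1 < len(uops)
                and head["op"] in SIMPLE_ALU
                and uops[i + 1]["op"] in SIMPLE_ALU
                and head["dst"] in uops[i + 1]["srcs"]):
            tail = uops[i + 1]
            fused.append({"op": "macro", "head": head, "tail": tail,
                          "dst": tail["dst"],
                          # external inputs of the macro-op; the head's result stays internal
                          "srcs": head["srcs"] + [s for s in tail["srcs"] if s != head["dst"]]})
            i += 2
        else:
            fused.append(head)
            i += 1
    return fused

if __name__ == "__main__":
    uops = [{"op": "add", "dst": "r3", "srcs": ["r1", "r2"]},
            {"op": "xor", "dst": "r4", "srcs": ["r3", "r5"]},   # depends on the add -> fused
            {"op": "load", "dst": "r6", "srcs": ["r4"]}]
    for m in fuse_dependent_pairs(uops):
        print(m)
```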
|
473 |
Counting Bases
Webb, Kerri, January 2004 (has links)
A theorem of Edmonds characterizes when a pair of matroids has a common basis. Enumerating the common bases of a pair of matroids is a much harder problem, and includes the #P-complete problem of counting the number of perfect matchings in a bipartite graph. We focus on the problem of counting the common bases in pairs of regular matroids, and describe a class called Pfaffian matroid pairs for which this enumeration problem can be solved. We prove that when a pair of regular matroids is non-Pfaffian, there is a set of common bases which certifies this, and that the number of bases in the certificate is linear in the size of the ground set of the matroids. When both matroids in a pair are series-parallel, we prove that determining if the pair is Pfaffian is equivalent to finding an edge signing in an associated graph, and in the case that the pair is non-Pfaffian, we obtain a characterization of this associated graph. Pfaffian bipartite graphs are a class of graphs for which the number of perfect matchings can be determined; we show that the class of series-parallel Pfaffian matroid pairs is an extension of the class of Pfaffian bipartite graphs.
Edmonds proved that the polytope generated by the common bases of a pair of matroids is equal to the intersection of the polytopes generated by the bases for each matroid in the pair. We consider when a similar property holds for the binary space, and give an excluded minor characterization of when the binary space generated by the common bases of two matroids cannot be determined from the binary spaces for the individual matroids. As a step towards a description of the lattice of common bases for a pair of matroids, we show that the lattices for the individual matroids determine when all common bases of a pair of matroids intersect a subset of the ground set with fixed cardinality.
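To make the enumeration problem concrete, here is a brute-force counter of common bases for two linear matroids given by matrices over the same ground set: a column subset of size r is a common basis when it is a basis of both. The exponential enumeration below is exactly what one wants to avoid, which is the point of the Pfaffian machinery; the matrix representation and the toy matroids are illustrative choices, not taken from the thesis.

```python
# Illustrative brute force: count the common bases of two linear matroids
# represented by matrices with the same column (ground) set.  Exponential in
# the ground-set size; shown only to make the enumeration problem concrete.
from itertools import combinations
import numpy as np

def is_basis(matrix, cols):
    # a column subset is a basis when it is independent and spans (full row rank)
    return np.linalg.matrix_rank(matrix[:, cols]) == matrix.shape[0]

def count_common_bases(m1, m2):
    assert m1.shape[1] == m2.shape[1], "the matroids must share a ground set"
    r, n = m1.shape[0], m1.shape[1]
    return sum(1 for cols in combinations(range(n), r)
               if is_basis(m1, cols) and is_basis(m2, cols))

if __name__ == "__main__":
    # Two small rank-2 matroids on a ground set of four elements (toy example).
    m1 = np.array([[1, 0, 1, 1],
                   [0, 1, 1, 0]], dtype=float)
    m2 = np.array([[1, 0, 0, 1],
                   [0, 1, 1, 1]], dtype=float)
    print(count_common_bases(m1, m2))   # prints 4 for this toy pair
```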
|
474 |
Key Randomization Countermeasures to Power Analysis Attacks on Elliptic Curve Cryptosystems
Ebeid, Nevine Maurice, 04 1900 (has links)
It is essential to secure the implementation of cryptosystems in embedded devices against side-channel attacks. In particular, in order to resist differential power analysis (DPA) attacks, randomization techniques should be employed to decorrelate the data processed by the device from the secret key parts that determine the value of this data. Among the countermeasures that have appeared in the literature are those that result in a random representation of the key known as the binary signed digit (BSD) representation. We have discovered some interesting properties related to the number of possible BSD representations of an integer and we have proposed a different randomization algorithm. We have also extended our study to the τ-adic representation of integers, which is employed in elliptic curve cryptosystems (ECCs) using Koblitz curves. We have then dealt with another randomization countermeasure, which is based on randomly splitting the key, and have investigated the secure employment of this countermeasure in the context of ECCs.
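To make the BSD countermeasure concrete, the sketch below produces a random binary signed-digit recoding of a scalar: whenever the remaining value is odd, the digit is drawn at random from {+1, -1}, so successive executions of a scalar multiplication would walk through different digit strings for the same key. This is a generic illustration of the idea under that simple rule, not the randomization algorithm proposed in the thesis, and the toy scalar is obviously not a real key.

```python
import secrets

def random_bsd(k):
    """Return a random signed-digit representation of the non-negative integer k,
    least-significant digit first, digits in {-1, 0, 1}, such that
    sum(d * 2**i for i, d in enumerate(digits)) == k."""
    digits = []
    while k > 0:
        if k % 2 == 0:
            d = 0
        elif k == 1:
            d = 1                                    # force termination on the final 1
        else:
            d = 1 if secrets.randbits(1) else -1     # random choice hides the key bits
        digits.append(d)
        k = (k - d) // 2
    return digits

def bsd_value(digits):
    return sum(d << i for i, d in enumerate(digits))

if __name__ == "__main__":
    key = 0b1011011                      # toy scalar, not a real key
    rep = random_bsd(key)
    assert bsd_value(rep) == key         # every random recoding evaluates back to k
    print(rep)
```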
|
475 |
Efficient Reasoning Techniques for Large Scale Feature Models
Mendonca, Marcilio, January 2009 (has links)
In Software Product Lines (SPLs), a feature model can be used to represent the similarities and differences within a family of software systems. This allows describing the systems derived from the product line as a unique combination of the features in the model. What makes feature models particularly appealing is the fact that the constraints in the model prevent incompatible features from being part of the same product.

Despite the benefits of feature models, constructing and maintaining these models can be a laborious task, especially in product lines with a large number of features and constraints. As a result, the study of automated techniques to reason on feature models has become an important research topic in the SPL community in recent years. Two techniques, in particular, have significant appeal for researchers: SAT solvers and Binary Decision Diagrams (BDDs). Each technique has been applied successfully for over four decades to tackle many practical combinatorial problems in various domains. Currently, several approaches have proposed the compilation of feature models to specific logic representations to enable the use of SAT solvers and BDDs.

In this thesis, we argue that several critical issues related to the use of SAT solvers and BDDs have been consistently neglected. For instance, satisfiability is a well-known NP-complete problem, which means that, in theory, a SAT solver might be unable to check the satisfiability of a feature model in a feasible amount of time. Similarly, it is widely known that the size of BDDs can become intractable for large models. At the same time, we currently do not know precisely whether these are real issues when feature models, especially large ones, are compiled to SAT and BDD representations.

Therefore, in our research we provide a significant step forward in the state of the art by examining in depth many relevant properties of the feature modeling domain, the mechanics of SAT solvers and BDDs, and the sensitive issues related to these techniques when applied in that domain. Specifically, we provide more accurate explanations for the space and/or time (in)tractability of these techniques in the feature modeling domain, and enhance the algorithmic performance of these techniques for reasoning on feature models. The contributions of our work include the proposal of novel heuristics to reduce the size of BDDs compiled from feature models, several insights on the construction of efficient domain-specific reasoning algorithms for feature models, and empirical studies to evaluate the efficiency of SAT solvers in handling very large feature models.
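To ground the idea of compiling a feature model to a logic representation, the sketch below encodes a tiny made-up feature model (a mandatory root, a mandatory child, two optional children, and one excludes constraint) as propositional clauses and counts its valid products by brute-force enumeration, which is what a SAT solver or a BDD would do far more efficiently and at far larger scale. The model, the clause encoding, and the feature names are invented for the example.

```python
from itertools import product

# Toy feature model (invented for illustration):
#   root is always selected; "engine" is a mandatory child of root;
#   "gui" and "cli" are optional children; "gui" excludes "cli".
FEATURES = ["root", "engine", "gui", "cli"]

# CNF clauses over the features: each clause is a list of (feature, polarity) literals.
CLAUSES = [
    [("root", True)],                       # the root feature is always selected
    [("root", False), ("engine", True)],    # root -> engine   (mandatory child)
    [("engine", False), ("root", True)],    # engine -> root   (child implies parent)
    [("gui", False), ("root", True)],       # gui -> root
    [("cli", False), ("root", True)],       # cli -> root
    [("gui", False), ("cli", False)],       # not (gui and cli): the excludes constraint
]

def satisfies(assignment, clauses):
    return all(any(assignment[f] == pol for f, pol in clause) for clause in clauses)

def count_products(features, clauses):
    """Brute-force model counting: what a SAT solver or BDD would do efficiently."""
    return sum(1 for values in product([False, True], repeat=len(features))
               if satisfies(dict(zip(features, values)), clauses))

if __name__ == "__main__":
    print(count_products(FEATURES, CLAUSES))   # 3 products: {root, engine} plus gui, cli, or neither
```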
|
476 |
A Tale of Two Telescopes: Taking a Closer Look at the Multiplicity Properties of Massive Stars in Cygnus
Caballero, Saida M, 13 August 2012 (has links)
Massive stars profoundly influence the evolution of the Universe, from Galactic dynamics and structure to star formation. They are often found with bound companions. However, our knowledge of O-type multiple systems with periods in the range from years to thousands of years is incomplete due to their great distances. We present results from a high angular resolution survey to find angularly resolved companions using the Fine Guidance Sensor (FGS) on the Hubble Space Telescope and ground-based adaptive optics at Gemini North. We observed 75 O- and early B-type stars in Cyg OB2 and determined that 42% of the sample have at least one companion that meets a statistical criterion for gravitationally bound status.
As a case study, we present an examination of high-resolution ultraviolet spectroscopy from the Hubble Space Telescope of the photospheric spectrum of the O-supergiant in the massive X-ray binary HDE 226868 = Cyg X-1. We analyzed the ultraviolet and ground-based optical spectra to determine the effective temperature and gravity of the O9.7 Iab supergiant. Using non-LTE, line-blanketed, plane-parallel models from the TLUSTY grid, we obtain Teff = 28.0 +/- 2.5 kK and log g > 3.00 +/- 0.25, both lower than found in previous studies. The optical spectrum is best fit with models that have enriched He and N abundances. We fit the model spectral energy distribution for this temperature and gravity to the UV, optical, and IR fluxes to determine the angular size of, and extinction towards, the binary. By assuming that the supergiant rotates synchronously with the orbit, we can use the radius-distance relation to find mass estimates for both components as a function of the distance and the ratio of stellar to Roche radius. Our results indicate masses of 23 (+8/-6) solar masses for the supergiant and 11 (+5/-3) solar masses for the black hole. These results agree with the subsequent mass estimates of Orosz et al. (2011) based on the trigonometric parallax distance measurement of Reid et al. (2011).
The results of this survey provide fundamental information on the impact of environment on massive binaries and also the role multiplicity has on massive star formation and evolution.
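The radius and mass estimates described in the case study above rest on two textbook relations: synchronous rotation ties the stellar radius to the rotation speed and the orbital period (R = v_rot P / 2π), and the spectroscopic gravity then gives the mass (M = g R^2 / G). The sketch below simply evaluates both relations; the numeric inputs are placeholder values chosen only to exercise the functions and are not the thesis's adopted parameters for HDE 226868 / Cyg X-1.

```python
import math

G = 6.674e-8          # gravitational constant, cm^3 g^-1 s^-2
R_SUN = 6.957e10      # solar radius, cm
M_SUN = 1.989e33      # solar mass, g

def radius_from_synchronous_rotation(v_rot_km_s, period_days):
    """Stellar radius (in solar radii) if the star rotates synchronously with
    the orbit: v_rot = 2*pi*R / P_orb."""
    period_s = period_days * 86400.0
    return v_rot_km_s * 1e5 * period_s / (2.0 * math.pi) / R_SUN

def spectroscopic_mass(log_g_cgs, radius_rsun):
    """Mass (in solar masses) from surface gravity and radius: M = g R^2 / G."""
    g = 10.0 ** log_g_cgs
    r = radius_rsun * R_SUN
    return g * r * r / G / M_SUN

if __name__ == "__main__":
    # Placeholder inputs for illustration only; NOT the thesis's adopted values.
    r = radius_from_synchronous_rotation(v_rot_km_s=200.0, period_days=5.6)
    print(f"radius ~ {r:.1f} R_sun")
    print(f"mass   ~ {spectroscopic_mass(3.1, r):.1f} M_sun")
```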
|
477 |
En studie i röj
Lindgren, Petter, January 2000 (has links)
This paper presents a study of the computer game "Minesweeper". The aim of the game is to search through a rectangular area of mined squares without hitting any mines. By using a strategy based on making every operation as safe as possible, series of games have been simulated. The playing field is four by four squares. The simulations indicate how often the game will succeed and which square is the best one to start at. The strategy demands advanced mathematical calculations, and the account of these is the major part of my work. My investigation shows that if there are three hidden mines, the game will succeed about two times out of three. It also shows that the best starting point is a corner.
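The simulation setup can be sketched as follows: place three mines on a four-by-four board, reveal squares one at a time with the usual flood-opening of squares that have no adjacent mines, and record whether all safe squares are cleared. The sketch below uses a naive uniformly random guessing policy rather than the "safest operation" strategy analysed in the paper, so the success rate it reports is only a baseline; the board size, mine count, and policy function are illustrative knobs.

```python
import random
from itertools import product

SIZE, MINES = 4, 3

def neighbours(r, c):
    for dr, dc in product((-1, 0, 1), repeat=2):
        if (dr, dc) != (0, 0) and 0 <= r + dr < SIZE and 0 <= c + dc < SIZE:
            yield r + dr, c + dc

def play_one_game(pick_move):
    squares = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    mines = set(random.sample(squares, MINES))
    revealed = set()
    while len(revealed) < SIZE * SIZE - MINES:
        move = pick_move(revealed)
        if move in mines:
            return False                      # hit a mine
        stack = [move]
        while stack:                          # flood-open squares with zero adjacent mines
            sq = stack.pop()
            if sq in revealed or sq in mines:
                continue
            revealed.add(sq)
            if sum(n in mines for n in neighbours(*sq)) == 0:
                stack.extend(neighbours(*sq))
    return True                               # all safe squares revealed

def random_policy(revealed):
    hidden = [(r, c) for r in range(SIZE) for c in range(SIZE) if (r, c) not in revealed]
    return random.choice(hidden)

if __name__ == "__main__":
    games = 20000
    wins = sum(play_one_game(random_policy) for _ in range(games))
    print(f"naive random guessing wins {wins / games:.1%} of games")
```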
|