Global ETD Search

151	Roteamento global de circuitos VLSI / Global routing for VLSI circuits Reimann, Tiago Jose January 2013
This work describes the implementation of a global router for integrated circuits that can handle current routing problems. It is evaluated on the benchmark circuits released for the global routing contests held at the ACM International Symposium on Physical Design (ISPD) in 2007 and 2008. The router's main technique is rip-up and reroute, combined with monotonic routing and maze routing, both of which have a long history of use in the academic tools that this work also describes. The tool also introduces a new method for ordering nets during the rip-up and reroute stage. Two versions of the tool were defined to generate the results, and each was analyzed with two routing-tree construction techniques, for a total of four configurations. As a design decision, the main version used in the development and in the discussion of results is the one that prioritizes routing quality, using MSTs for tree construction. The results show that the router produces good-quality solutions without techniques for identifying congested areas, without post-routing optimizations, and without any tuning for the individual benchmark circuits, although its run time is still above that of other academic tools. Development focused on the newer circuits, but the tool also obtained excellent results on the circuits released at ISPD 1998, generating solutions of similar or better quality than those reported in the literature. For the 3D circuits released at ISPD 2008, the quality-oriented version differs from the best results of the global routers with available source code by, on average, 2.53% in wirelength when the cost of vias is ignored and 18.34% when each via is counted as one unit of wirelength (the ISPD 2008 metric). For the version that seeks the fastest possible convergence, the differences are 3.82% and 17.03%, respectively. The largest differences occur in the circuits for which a violation-free solution is hardest to generate. This shows how congested-region identification techniques can contribute both to faster convergence and to avoiding unnecessary wire detours during the negotiation phase. In the metric that counts each via as one unit of wirelength, the results show on average 22.5% greater wirelength than the best results in the literature. The router was also unable to find a violation-free solution for two circuits that are known to have one. The Portuguese-language abstract of the same record gives lower figures for these comparisons: 1.78% and 15.56% for the quality-oriented version, 3.39% and 16.32% for the fast-convergence version, and 18.67% against the best results in the literature.
Microelectronics; VLSI; Routing: integrated circuits; Global routing; Physical synthesis; CAD; VLSI
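The abstract mentions MST-based routing-tree construction and a metric in which each via counts as one unit of wirelength. As a minimal sketch, and not the thesis's implementation, the Python below decomposes a multi-pin net into two-pin connections with Prim's algorithm under Manhattan distance and computes a via-weighted score. The names `mst_two_pin_nets`, `tree_length`, and `routing_score` are hypothetical, and treating pins as plain 2D grid coordinates is an assumption.

```python
from typing import List, Tuple

Pin = Tuple[int, int]  # (x, y) grid coordinate; an assumption for this sketch


def manhattan(a: Pin, b: Pin) -> int:
    """Rectilinear distance between two pins."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_two_pin_nets(pins: List[Pin]) -> List[Tuple[Pin, Pin]]:
    """Decompose a multi-pin net into two-pin connections (Prim's algorithm)."""
    if len(pins) < 2:
        return []
    in_tree = [pins[0]]
    remaining = list(pins[1:])
    edges = []
    while remaining:
        # Cheapest connection from the partial tree to a pin not yet in it.
        _, u, v = min(
            (manhattan(u, v), u, v) for u in in_tree for v in remaining
        )
        edges.append((u, v))
        in_tree.append(v)
        remaining.remove(v)
    return edges


def tree_length(edges: List[Tuple[Pin, Pin]]) -> int:
    """Lower bound on the wirelength of the tree (no detours)."""
    return sum(manhattan(u, v) for u, v in edges)


def routing_score(wirelength: int, vias: int, via_cost: int = 1) -> int:
    """Wirelength plus via cost; via_cost=1 matches 'one unit per via'."""
    return wirelength + via_cost * vias


net = [(0, 0), (0, 3), (4, 0), (4, 3)]
edges = mst_two_pin_nets(net)
print(edges, tree_length(edges))
```

Each two-pin connection produced this way would then be handed to a monotonic or maze router; the quadratic search per step keeps the sketch short and is adequate only for nets with few pins.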
152	Efficient VLSI Implementation of Arithmetic Units and Logic Circuits Katreepalli, Raghava 01 December 2017
Arithmetic units and logic circuits are critical components of any VLSI system, so efficient implementations are required for a high-performance data path unit and, in turn, for a microprocessor or digital signal processor (DSP). Adders are basic building blocks of any processor or data path application, and high-performance processing units require high-speed adders with low power consumption. The carry-select adder (CSA) is known to be one of the fastest adders used in data processing applications. The first contribution of the dissertation is a new CSA architecture that uses a Manchester carry chain (MCC) in multi-output domino CMOS logic, employing novel MCC blocks in a hierarchical design. The proposed MCC block is also extended to design a power-delay- and area-efficient Vedic multiplier based on "Urdhva-Tiryakbhyam". Simulation results show that the proposed architecture achieves a twofold advantage, in power-delay product (PDP) and in hardware overhead. Apart from adders and multipliers, counters also play a major role in a data path unit. Counters are basic building blocks in many VLSI applications such as timers, memories, ADCs/DACs, and frequency dividers. Counter designs incur a power overhead because clock signal distribution consumes considerable power and the clock causes undesired flip-flop activity. The second contribution of the dissertation is a power-efficient synchronous counter design that reduces the power consumed by distributing the clock to the flip-flops and offers high reliability. Simulation results show that the proposed counter has a lower power requirement and power-area product than existing counter architectures. Pipelines can be used to achieve high circuit operating speeds. However, as the operating frequency increases, the number of pipeline stages, and with it the number of memory elements, increases linearly. The third contribution of the dissertation is a dynamic memory-less pipeline design based on a sinusoidal three-phase clocking scheme, which reduces the power required by the clock and offers high circuit operating frequencies. Finally, the dissertation presents a novel tool for realizing Boolean functions with a minimum number of transistors in series. The tool applies a new functional decomposition algorithm to decompose the initial Boolean function into a network of smaller sub-functions and then generates the final circuit. The effectiveness of the proposed technique is estimated using circuit-level simulations as well as an automated tool. On average, the proposed technique requires 70% fewer levels than existing techniques.
Circuits Energy Efficient Circuits High Speed Low Power VLSI Reliability VLSI
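To make the carry-select idea concrete, here is a behavioral Python sketch. It models only the generic principle: each block computes its sum for both possible carry-in values, and the real carry then selects one. It does not model the Manchester carry chain or the domino CMOS circuits proposed in the dissertation. The function name `carry_select_add` and the default 16-bit width with 4-bit blocks are assumptions chosen for illustration.

```python
from typing import Tuple


def carry_select_add(a: int, b: int, width: int = 16, block: int = 4) -> Tuple[int, int]:
    """Add two unsigned integers block by block; return (sum, carry_out)."""
    if width % block != 0:
        raise ValueError("width must be a multiple of block")
    mask = (1 << block) - 1
    carry = 0
    result = 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # In hardware both candidates are computed in parallel,
        # before the incoming carry is known.
        sum_if_carry0 = a_blk + b_blk
        sum_if_carry1 = a_blk + b_blk + 1
        # The incoming carry acts as the select line of a multiplexer.
        chosen = sum_if_carry1 if carry else sum_if_carry0
        result |= (chosen & mask) << shift
        carry = chosen >> block
    return result, carry


print(carry_select_add(1234, 4321))
print(carry_select_add(0xFFFF, 1))
```

The speed advantage comes from the parallel precomputation: the critical path through each block shrinks to a multiplexer delay once the block's carry-in arrives, at the cost of duplicated adder hardware.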
153	Lógica quaternária de alto desempenho e baixo consumo para circuitos VLSI / Low-power high-performance quaternary for VLSI circuits Silva, Ricardo Cunha Gonçalves da January 2007
Since the 1960s, improvements in the manufacturing techniques for integrated circuits that use binary logic have led to an exponential increase in device density, better performance, lower energy consumption, and lower manufacturing costs for state-of-the-art integrated circuits. This progress has historically been achieved through device miniaturization, which, now at the nanometer scale, is beginning to meet physical limits. To continue this technological progress, many studies have proposed compacting information through non-binary logic as an alternative way to improve the performance of state-of-the-art circuits. Several such studies have been carried out in technologies ranging from bipolar circuits to quantum devices. So far, however, no technology has simultaneously met the performance, power consumption, area, and reliability requirements for use in very-large-scale integration. This work presents a new family of quaternary logic circuits in CMOS technology with high performance, low power consumption, and small area. The circuits use three power supplies and up to eight different transistors with different threshold voltages to implement quaternary logic. Elementary circuits such as inverters and literal circuits are presented and used to build multiplexers and arithmetic circuits. The circuits are simulated with SPICE using TSMC 0.18 μm technology, and the results are compared with equivalent binary logic circuits. For example, compared with the equivalent binary circuit, a four-bit quaternary full adder shows a 55% improvement in speed and a 63% improvement in power consumption, while using a little more than twice the number of transistors. This work also proposes the use of quaternary logic in FPGAs and develops quaternary configurable logic blocks. Logic mapping results for arithmetic circuits in configurable logic blocks show a large reduction in area and power consumption for the quaternary implementation compared with the binary equivalents. In some quaternary circuits, power consumption and transistor count are reduced to 3% of those of the equivalent binary circuits, while the critical delay is twice the binary critical delay.
Microelectronics; VLSI; Performance: integrated circuits; Multiple valued logic; Quaternary logic; VLSI circuits; FPGA
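As an illustration of the arithmetic a quaternary full adder performs, the Python sketch below adds radix-4 digits with a carry. It is a purely behavioral model with hypothetical names (`quaternary_full_adder`, `to_quaternary`, `add_quaternary`) and says nothing about the multi-threshold CMOS circuits in the thesis. It assumes each quaternary digit takes a value from 0 to 3, which carries the same information as two binary bits.

```python
from typing import List, Tuple


def quaternary_full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Add two radix-4 digits and a carry; return (sum_digit, carry_out)."""
    total = a + b + carry_in  # at most 3 + 3 + 1 = 7
    return total % 4, total // 4


def to_quaternary(value: int, digits: int) -> List[int]:
    """Least-significant-digit-first radix-4 representation."""
    return [(value >> (2 * i)) & 0b11 for i in range(digits)]


def from_quaternary(digits: List[int]) -> int:
    """Inverse of to_quaternary."""
    return sum(d << (2 * i) for i, d in enumerate(digits))


def add_quaternary(x: List[int], y: List[int]) -> Tuple[List[int], int]:
    """Ripple-carry addition over equal-length digit lists."""
    carry = 0
    out = []
    for a, b in zip(x, y):
        s, carry = quaternary_full_adder(a, b, carry)
        out.append(s)
    return out, carry


x = to_quaternary(200, 4)
y = to_quaternary(100, 4)
print(x, y, add_quaternary(x, y))
```

Four quaternary digits cover the same range as eight bits, so the carry chain here has four stages where a binary ripple adder would need eight.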
154	Modelagem de hardware para codificação de vídeo e arquitetura de compensação de movimento segundo o padrão H.264/AVC / Hardware modeling for video coding and motion compensation architecture for the H.264/AVC standard Zatt, Bruno January 2008
This thesis has two main parts. The first presents the development of a motion compensation hardware architecture for video decoders compliant with the H.264/AVC standard. The second presents the modeling of a hardware architecture for a video encoder compliant with the same standard. The basics of digital video encoding and decoding in H.264/AVC are also reviewed. The motion compensation architecture, named HP422-MoCHA (High Profile 4:2:2 Motion Compensation Hardware Architecture) (ZATT, 2008), is based on the MoCHA (Motion Compensator Hardware Architecture) (AZEVEDO, 2007) architecture and supports the motion compensation toolset of the H.264/AVC High 4:2:2 profile. The architecture is divided into three main modules: Motion Vector Predictor, Memory Access, and Sample Processor. These modules operate as a pipeline, with buffers between them to store intermediate results. The architecture was described in VHDL and reaches the throughput required to decode HDTV 1920x1080 video at 30 frames per second in real time. No other detailed motion compensation solution for the H.264/AVC High 4:2:2 profile was found in the current literature. A new filtering organization for the motion compensation sample interpolator was proposed; its Main profile version uses 17% fewer gates than the smallest solution found in the literature, with no performance degradation. The second part of the thesis details the modeling of a hardware architecture for an H.264/AVC video encoder. The model was described in SystemC and took approximately 15,000 lines of source code. It was designed for real-time Main profile H.264/AVC encoding of 1920x1080 video at 30 frames per second. The modeling met its main objective of producing a functional encoder implementation, although under several encoding restrictions, and it allows the timing and communication behavior of the encoder to be characterized. The model is therefore a powerful tool for developing the hardware encoder, from initial design through final verification. No other work describing a high-level hardware model of an H.264/AVC video encoder, or even decoder, was found in the literature to date.
Microelectronics; VLSI; Coding: digital video; Digital systems; H.264/AVC; Video codin; VLSI architectures; Modeling in systemC
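For readers unfamiliar with the sample interpolation step of H.264/AVC motion compensation, the Python sketch below applies the standard's six-tap filter (1, -5, 20, 20, -5, 1) for half-sample luma positions, with rounding and clipping. It shows only the arithmetic defined by the standard, not the filter organization proposed in the thesis. The names `half_sample` and `interpolate_row`, the 8-bit sample range, and the edge handling by repeating border samples are assumptions made for this example.

```python
from typing import List, Sequence

TAPS = (1, -5, 20, 20, -5, 1)  # H.264/AVC half-sample luma filter taps


def half_sample(window: Sequence[int]) -> int:
    """Half-sample value between window[2] and window[3] (8-bit samples)."""
    if len(window) != 6:
        raise ValueError("the filter needs exactly six neighbouring samples")
    acc = sum(t * s for t, s in zip(TAPS, window))
    # Taps sum to 32, so add 16 for rounding and shift right by 5.
    return max(0, min(255, (acc + 16) >> 5))


def interpolate_row(row: List[int]) -> List[int]:
    """Horizontal half-sample values between each pair of neighbours."""
    last = len(row) - 1
    out = []
    for i in range(last):
        # Clamp indices so border samples are repeated at the row edges.
        window = [row[min(max(j, 0), last)] for j in range(i - 2, i + 4)]
        out.append(half_sample(window))
    return out


print(half_sample([0, 10, 20, 30, 40, 50]))
print(interpolate_row([50] * 8))
```

At 1920x1080 and 30 frames per second a decoder handles 62,208,000 luma samples per second, which is why the gate count and organization of this filter matter in a hardware design.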
157	Modelagem de hardware para codificação de vídeo e arquitetura de compensação de movimento segundo o padrão H.264/AVC / Hardware modeling for video coding and motion compensation architecture for the H.264/AVC standard Zatt, Bruno January 2008 This thesis has two main parts. The first presents the development of a motion compensation hardware architecture for video decoders compliant with the H.264/AVC standard. The second presents the modeling of a hardware architecture for a video encoder compliant with the same standard. The basics of digital video coding and decoding in H.264/AVC are also reviewed. The motion compensation architecture, named HP422-MoCHA (High Profile 4:2:2 Motion Compensation Hardware Architecture) (ZATT, 2008), is based on the MoCHA (Motion Compensator Hardware Architecture) (AZEVEDO, 2007) architecture and supports the motion compensation toolset of the H.264/AVC High 4:2:2 profile. It is divided into three main modules: Motion Vector Predictor, Memory Access, and Sample Processor. These modules operate as a pipeline, with buffers between them to store intermediate results. The architecture was described in VHDL and reaches the throughput required to decode HDTV 1920x1080 video at 30 frames per second in real time. No other detailed motion compensation solution for the H.264/AVC High 4:2:2 profile was found in the literature. A new structure for sample interpolation in motion compensation was proposed; its Main profile version is 17% more compact, in gate count, than the most compact solution found in the literature, with no performance degradation.
The second part of the thesis details the modeling of a video encoding architecture for H.264/AVC. The model was written in SystemC and took approximately 15,000 lines of code. It was designed to encode Main profile H.264/AVC video at 1920x1080 in real time, at 30 frames per second. The modeling met its main objective of reaching a functional encoder implementation, albeit under several encoding restrictions, and allows the timing and communication behavior of the encoder to be characterized. The model is thus a powerful tool for developing the hardware encoding system, from the design stage to final verification. At the time of writing, no work describing high-level modeling of hardware for an H.264/AVC video encoder, or even decoder, was found in the literature.
Microelectronics; VLSI; Coding: Digital video; Digital systems; H.264/AVC; Video coding; VLSI architectures; Modeling in SystemC
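The real-time target of 1920x1080 at 30 frames per second can be turned into a macroblock budget. The sketch below assumes the standard 16x16 H.264/AVC macroblock; the 100 MHz clock is a hypothetical figure chosen only for illustration and does not come from the thesis.

```python
import math

MB_SIZE = 16  # H.264/AVC macroblocks cover 16x16 luma samples


def macroblocks_per_frame(width, height):
    """Count macroblocks, rounding partial rows and columns up to whole ones."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def macroblocks_per_second(width, height, fps):
    """Macroblock throughput required for real-time operation."""
    return macroblocks_per_frame(width, height) * fps


def cycles_per_macroblock(clock_hz, width, height, fps):
    """Clock cycles available per macroblock at a given (assumed) clock rate."""
    return clock_hz / macroblocks_per_second(width, height, fps)


if __name__ == "__main__":
    print(macroblocks_per_frame(1920, 1080))        # 8160 (1080 rows pad to 1088)
    print(macroblocks_per_second(1920, 1080, 30))   # 244800
    # Hypothetical 100 MHz clock, not a figure reported in the abstract.
    print(round(cycles_per_macroblock(100e6, 1920, 1080, 30)))  # 408
```

A budget of this kind is what each pipeline stage has to meet on average for the whole architecture to sustain real-time operation.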
158	Robust Signaling Techniques for Through Silicon Via Bundles Chillara, Krishna Chaitanya 01 January 2011 3D circuit integration is becoming increasingly important as one of the remaining techniques for staying on the trajectory of Moore’s law. 3D Integrated Circuits (ICs) can be realized using the Through Silicon Via (TSV) approach. To extract the full benefits of 3D integration and to improve yield, it has been suggested that TSVs be arranged as bundles rather than as parallel TSVs. TSVs are required to route signals through the different dies in a multi-tier 3D IC. They are excellent but scarce electrical conductors, so it is important to use them efficiently. In high performance 3D ICs, signaling techniques play a crucial role in determining overall system performance. This work considers 3x3 and 4x4 TSV bundles, extracts their electrical parasitics using Ansoft Q3D Extractor, and analyzes various techniques for signaling over them. Performance, energy, and robustness are the crucial aspects in analyzing a signaling technique. For the performance analysis, the maximum data rate of each signaling technique is obtained and the dominant factors that determine it are identified. 3D integration is a fairly new field and lacks common standards: research groups across the globe, both academic and industrial, use different manufacturing technologies to suit their needs. This work therefore obtains the electrical parasitics of TSV bundles, and the corresponding maximum data rates, for TSV radii ranging from 1 µm to 15 µm, a range that covers most 3D integration technologies. Studying this range helps in choosing a better signaling technique for a particular TSV radius, depending on the design goals.
Energy per bit for each signaling technique is obtained for a common 10 Gbps Pseudo Random Bit Sequence (PRBS) input. For the robustness analysis, the impact of process, voltage, and temperature variations between driver and receiver circuits is analyzed. Ansoft Q3D Extractor, the NCSU 45nm PDK, and the HSPICE simulation tool are used. The simulation results show that differential signaling achieves the highest maximum data rate for smaller radii, while for radii above 7 µm single ended current mode signaling gives a better data rate. Low swing single ended signaling techniques consume less energy but are slightly more affected by process variations than full swing voltage mode signaling. Differential signaling is more robust to supply noise than single ended techniques. An increase in temperature reduces the data rates of both single ended and differential signaling techniques. Hence, an optimum signaling technique can be chosen based on the TSV radius of the target technology and on the expected process and environment variations.
Through Silicon Via; 3D Integration; VLSI; Signaling techniques; Robustness
159	Design, Analysis, and Simulation of a Jitter Reduction Circuit (JRC) System at 1GHz Yu, Run Bin 01 December 2016 The clock signal is considered the “heartbeat” of a digital system, yet jitter, a variation in the arrival time of the clock edge, can undermine overall performance or even cause system failures. Deterministic jitter can be reduced during the design process, but random jitter during operation is less controllable and cannot be avoided. Being able to remove jitter from the clock would therefore play a vital role in improving system performance. This thesis implements a 1GHz fully feedforward jitter reduction circuit (JRC) that can be used as an on-chip IP core at clock tree terminals to provide a low jitter clock signal to a local clock network, or at the clock insertion point to reduce jitter from an off chip signal. It can also be used stand-alone in PCB designs to reduce jitter on the high-frequency clock signal used on the board. Implemented in the IBM CMHV7SF 180nm MOSFET process, the jitter attenuation circuit demonstrates a jitter reduction of at least 8dB at 1GHz with 33ps rms Gaussian random jitter (for a 200ps peak-to-peak randomly changing rising edge input signal).
Jitter reduction; VLSI; Clock; Electrical and Electronics
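As a rough reading of the 8dB figure, the sketch below converts a decibel reduction into residual rms jitter. It assumes the amplitude convention (20·log10 of the jitter ratio) and that the 33ps rms figure is the input jitter. The abstract does not state which decibel convention is used, so the result is illustrative only, and the function name is hypothetical.

```python
def attenuated_jitter(input_jitter_ps, reduction_db):
    """Residual rms jitter after a reduction given in dB.

    Assumes the amplitude convention: reduction_db = 20 * log10(in / out).
    """
    return input_jitter_ps / (10 ** (reduction_db / 20))


if __name__ == "__main__":
    # 33 ps rms at the input, reduced by 8 dB under the assumption above.
    print(f"{attenuated_jitter(33.0, 8.0):.2f} ps rms")  # 13.14 ps rms
```

If the thesis instead uses a power-style convention (10·log10), the same 8dB would correspond to a larger ratio, so the number above should not be quoted as a reported result.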
160	Energy Efficient Computing Using Scalable General Purpose Analog Processors De Guzman, Ethan Paul Palisoc 01 June 2021 Due to fundamental physical limitations, conventional digital circuits have not been able to scale at the pace expected from Moore’s law. In addition, computationally intensive applications such as neural networks and computer vision demand large amounts of energy from digital circuits. As a result, energy efficient alternatives are needed to provide continued performance scaling. Analog circuits have many well known benefits: they can store more information on a single wire and can efficiently perform mathematical operations such as addition, subtraction, and differential equation solving. However, analog computing also has drawbacks, such as sensitivity to process variation and noise, limited scalability, programming difficulty, and poor compatibility with digital circuits and design tools. We propose to leverage the strengths of analog circuits and avoid their weaknesses by using digital circuits and time-encoded computation. Time-encoded circuits also operate on continuous data but are implemented using digital circuits. We propose a novel scalable general purpose analog processor using time-encoded circuits that is well suited to emerging applications requiring high numeric precision. The processor’s datapath, including the time-domain register file and function units, is described. We evaluate the proposed approach using an implementation simulated with a 0.18µm TSMC process and demonstrate that it improves the performance of a scientific benchmark by 4x compared against conventional analog implementations and improves energy consumption by 146x compared against digital implementations.
Energy Efficient Computing; VLSI; Computer Architecture; Computer and Systems Architecture
