Global ETD Search

Return to search

Text Preprocessing in Programmable Logic

There is a tremendous amount of information being generated and stored every year, and its growth rate is exponential. From 2008 to 2009, the growth rate was estimated to be 62%. In 2010, the amount of generated information is expected to grow by 50% to 1.2 Zettabytes, and by 2020 this rate is expected to grow to 35 Zettabytes. By preprocessing text in programmable logic, high data processing rates could be achieved
with greater power efficiency than with an equivalent software solution, leading to a smaller carbon footprint.

This thesis presents an overview of the fields of Information Retrieval and Natural Language Processing, and the design and implementation of four text preprocessing modules in programmable logic: UTF–8 decoding, stop–word filtering, and stemming with both Lovins’ and Porter’s techniques. These extensively pipelined circuits were implemented in a high performance FPGA and found to sustain maximum operational frequencies of 704 MHz, data throughputs in excess of 5 Gbps and efficiencies in the range of 4.332 – 6.765 mW/Gbps and 34.66 – 108.2 uW/MHz. These circuits can be incorporated into larger systems, such as document classifiers and information extraction engines.

http://hdl.handle.net/10012/5366

Programmable Logic

Text Processing

Electrical and Computer Engineering

Identifer	oai:union.ndltd.org:WATERLOO/oai:uwspace.uwaterloo.ca:10012/5366
Date	03 August 2010
Creators	Skiba, Michal
Source Sets	University of Waterloo Electronic Theses Repository
Language	English
Detected Language	English
Type	Thesis or Dissertation

Page generated in 0.002 seconds

Text Preprocessing in Programmable Logic

Description

Links & Downloads

Tags

Additional Fields