Global ETD Search

11	Experiments in Compressing Wikipedia Wotschka, Marco January 2013 (has links) No description available. Computer Science BWT Burrows-Wheeler PPM compression Wikipedia
12	Lossless and nearly-lossless image compression based on combinatorial transforms / Compression d'images sans perte ou quasi sans perte basée sur des transformées combinatoires Syahrul, Elfitrin 29 June 2011 (has links) Classical image compression methods are commonly based on frequency transforms such as the Discrete Cosine Transform (DCT) or the discrete wavelet transform. In this document we present an original method based on a combinatorial transform, the Burrows-Wheeler Transform (BWT). This transform rearranges the data of the file that serves as input to the coder proper. Applying it to the original image increases the probability that identical characters initially far apart end up side by side. The technique is used for text compression, for example in the BZIP2 format, which currently offers one of the best compression ratios. The original compression chain based on the Burrows-Wheeler transform consists of three stages. The first stage is the Burrows-Wheeler transform itself, which reorganizes the data so as to group samples of identical value. Burrows and Wheeler recommend following it with Move-To-Front (MTF) coding, which maximizes the number of identical characters and thus enables entropy coding (EC), mainly Huffman or arithmetic coding. These two codings are the last two stages of the compression chain. We studied the state of the art and carried out empirical studies of BWT-based compression chains for lossless image compression. The empirical data and in-depth analyses concern several variants of MTF and EC. In addition, unlike in text compression, and because of the two-dimensional nature of images, the order in which the data are read proves important. A preprocessing step is therefore applied when reading the data, and it improves the compression ratio. We compared our results with standard compression methods, in particular JPEG 2000 and JPEG-LS. On average, the compression ratio obtained with the proposed method is higher than that obtained with the JPEG 2000 or JPEG-LS standards. / Common image compression standards are usually based on frequency transforms such as the Discrete Cosine Transform or wavelets. We present a different approach to lossless image compression, based on a combinatorial transform. The main transform is the Burrows-Wheeler Transform (BWT), which tends to reorder symbols according to their following context. It becomes a promising compression approach based on context modelling. BWT was initially applied in text compression software such as BZIP2; nevertheless, it has recently been applied to the image compression field. Compression schemes based on the Burrows-Wheeler Transform are usually lossless; we therefore implement this algorithm in medical imaging in order to reconstruct every bit. Many variants of the three stages that form the original BWT-based compression scheme can be found in the literature. We propose an analysis of the more recent methods and the impact of their association. Then, we present several compression schemes based on this transform which significantly improve on current standards such as JPEG2000 and JPEG-LS. In the final part, we present some open problems, which are also directions for further research. Transformé de Burrows-Wheeler Compression sans perte et quasi sans Burrows-Wheeler Transform (BWT) Lossless (nearly lossless) image 006.6 005.7
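The first two stages of the chain described in this abstract (BWT, then Move-To-Front) can be illustrated with a small sketch. This is not the thesis's implementation: it uses the textbook sorted-rotations construction, which is far too slow for real images, and the function names and the `sentinel` parameter are assumptions made for the example. The entropy-coding stage is omitted.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Burrows-Wheeler transform via sorted rotations (illustration only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sort all cyclic rotations and keep the last character of each one.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


def move_to_front(data: str) -> list[int]:
    """Move-To-Front coding: runs of equal symbols become runs of zeros."""
    alphabet = sorted(set(data))
    out = []
    for ch in data:
        idx = alphabet.index(ch)
        out.append(idx)
        # Move the symbol just seen to the front of the list.
        alphabet.insert(0, alphabet.pop(idx))
    return out


if __name__ == "__main__":
    transformed = bwt("banana", sentinel="$")
    print(transformed)                      # annb$aa
    print(move_to_front(transformed))       # [1, 3, 0, 3, 3, 3, 0]
    print(inverse_bwt(transformed, "$"))    # banana
```

In the transformed string the two `n` characters and the two trailing `a` characters sit next to each other, which is the clustering effect the abstract relies on. An entropy coder would then be applied to the MTF output.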
13	Akcelerace Burrows-Wheelerovy transformace s využitím GPU / Acceleration of Burrows-Wheeler Transform Using GPU Zahradníček, Tomáš January 2019 (has links) This thesis deals with the Burrows-Wheeler transform (BWT) and the possibilities for accelerating it on a graphics processing unit (GPU). Compression methods based on BWT are introduced, as well as the CUDA and OpenCL software libraries for writing GPU programs. Parallel variants of BWT, together with the subsequent steps necessary for compression, are implemented using CUDA. The compression achieved by the approaches used is tested, and the parallel versions are compared to their sequential counterparts.
14	[en] THE BURROWS-WHEELER TRANSFORM AND ITS APPLICATIONS TO COMPRESSION / [pt] A TRANSFORMADA DE BURROWS-WHEELER E SUA APLICAÇÃO À COMPRESSÃO JULIO CESAR DUARTE 23 July 2003 (has links) [en] The Burrows-Wheeler Transform, based on the sorting of contexts, transforms a sequence of characters into a new sequence that is easier to compress by an algorithm that exploits long runs of repeated characters. Combined with Move-To-Front recoding and followed by a coding of the generated integers, they form a new family of compressors that achieve excellent compression rates with good compression and decompression times. This work examines this transform in detail, along with its variations and some alternatives for the algorithms used together with it. As a final result, we present a combination of strategies that produces compression rates for text data better than those offered by the implementations available so far.
[pt] TEORIA DA INFORMACAO [en] INFORMATION THEORY [pt] COMPRESSAO DE DADOS [en] DATA COMPRESSION [pt] TRANSFORMADA DE BURROWS-WHEELER [en] BURROWS-WHEELER TRANSFORM [pt] MOVERPARAFRENTE [en] MOVETOFRONT [pt] CODIFICACAO DE INTEIROS [en] INTEGER CODING
15	Coastal habitat mapping and monitoring utilising remote sensing Jones, Gwawr Angharad January 2017 (has links) Coastal habitats are highly sensitive to change and highly diverse. Degrading environmental conditions have led to a global decline in biodiversity through loss, modification and fragmentation of habitats, triggering an increased effort to conserve these ecosystems. Remote sensing is an important tool for filling critical information gaps in habitat monitoring, yet significant barriers exist to its operational use within the ecological and conservation communities. Reporting on both the extent and the condition of habitats is critical to fulfilling policy requirements, specifically the EC's Habitats Directive. This study focuses on the use of Very High Resolution (VHR) optical imagery for retrieving parameters to identify associations that can separate habitat boundaries for extent mapping, down to species level for indicators of condition, with a focus on operational use. The Earth Observation Data for Habitat Monitoring (EODHaM) system was implemented using Worldview-2 data from two periods (July and September), in situ data and local ecological knowledge for two sites in Wales, Kenfig Burrows SAC and Castlemartin SSSI. The system utilises the Food and Agriculture Organization's (FAO) Land Cover Classification System (LCCS), but translations between land cover and habitat schemes are not straightforward and need special consideration that is likely to be site specific. Limitations within the rule-based method of the EODHaM system were identified, and the method was therefore augmented with machine-learning-based classification algorithms, creating a hybrid method of classification that generates accurate (>80% overall accuracy) baseline maps in a more automated and repeatable way. Quantitative methods of validation traditionally used within the remote sensing community do not consider spatial aspects of maps.
Therefore, qualitative assessments carried out in the field were used in addition to error matrices, overall accuracy and the kappa coefficient. This required input from ecologists and site specialists, enhancing communication and understanding between the different communities. Generating baseline maps required a significant amount of training data, so updating baselines through change detection methods is recommended for monitoring. An automated, novel map-to-image change detection method was therefore implemented. Natural and anthropogenic changes were successfully detected from Worldview-2 and Sentinel-2 data at Kenfig Burrows. An innovative component of this research was the development of methods that were demonstrated to be transferable between both sites and that increased understanding between remote sensing scientists and ecologists. Through this approach, a more operational method for monitoring site-specific habitats through satellite data is proposed, with direct benefits for conservation, environment and policy.
16	Lossless and nearly-lossless image compression based on combinatorial transforms Syahrul, Elfitrin 29 June 2011 (has links) (PDF) Common image compression standards are usually based on frequency transforms such as the Discrete Cosine Transform or wavelets. We present a different approach to lossless image compression, based on a combinatorial transform. The main transform is the Burrows-Wheeler Transform (BWT), which tends to reorder symbols according to their following context. It becomes a promising compression approach based on context modelling. BWT was initially applied in text compression software such as BZIP2; nevertheless, it has recently been applied to the image compression field. Compression schemes based on the Burrows-Wheeler Transform are usually lossless; we therefore implement this algorithm in medical imaging in order to reconstruct every bit. Many variants of the three stages that form the original BWT-based compression scheme can be found in the literature. We propose an analysis of the more recent methods and the impact of their association. Then, we present several compression schemes based on this transform which significantly improve on current standards such as JPEG2000 and JPEG-LS. In the final part, we present some open problems, which are also directions for further research. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Burrows-Wheeler Transform (BWT) Lossless (nearly lossless) image
17	The analysis of enumerative source codes and their use in Burrows‑Wheeler compression algorithms McDonald, Andre Martin 10 September 2010 (has links) In the late 20th century the reliable and efficient transmission, reception and storage of information proved to be central to the most successful economies all over the world. The Internet, once a classified project accessible to a selected few, is now part of the everyday lives of a large part of the human population, and as such the efficient storage of information is an important part of the information economy. The improvement of the information storage density of optical and electronic media has been remarkable, but the elimination of redundancy in stored data and the reliable reconstruction of the original data is still a desired goal. The field of source coding is concerned with the compression of redundant data and its reliable decompression. The arithmetic source code, which was independently proposed by J. J. Rissanen and R. Pasco in 1976, revolutionized the field of source coding. Compression algorithms that use an arithmetic code to encode redundant data are typically more effective and computationally more efficient than compression algorithms that use earlier source codes such as extended Huffman codes. The arithmetic source code is also more flexible than earlier source codes, and is frequently used in adaptive compression algorithms. The arithmetic code remains the source code of choice, despite having been introduced more than 30 years ago. The problem of effectively encoding data from sources with known statistics (i.e. where the probability distribution of the source data is known) was solved with the introduction of the arithmetic code. The probability distribution of practical data is seldom available to the source encoder, however. The source coding of data from sources with unknown statistics is a more challenging problem, and remains an active research topic.
Enumerative source codes were introduced by T. J. Lynch and L. D. Davisson in the 1960s. These lossless source codes have the remarkable property that they may be used to effectively encode source sequences from certain sources without requiring any prior knowledge of the source statistics. One drawback of these source codes is the computationally complex nature of their implementations. Several years after the introduction of enumerative source codes, J. G. Cleary and I. H. Witten proved that approximate enumerative source codes may be realized by using an arithmetic code. Approximate enumerative source codes are significantly less complex than the original enumerative source codes, but are less effective than the original codes. Researchers have become more interested in arithmetic source codes than enumerative source codes since the publication of the work by Cleary and Witten. This thesis concerns the original enumerative source codes and their use in Burrows–Wheeler compression algorithms. A novel implementation of the original enumerative source code is proposed. This implementation has a significantly lower computational complexity than the direct implementation of the original enumerative source code. Several novel enumerative source codes are introduced in this thesis. These codes include optimal fixed–to–fixed length source codes with manageable computational complexity. A generalization of the original enumerative source code, which includes more complex data sources, is proposed in this thesis. The generalized source code uses the Burrows–Wheeler transform, which is a low–complexity algorithm for converting the redundancy of sequences from complex data sources to a more accessible form. The generalized source code effectively encodes the transformed sequences using the original enumerative source code. It is demonstrated and proved mathematically that this source code is universal (i.e. 
the code has an asymptotic normalized average redundancy of zero bits). Copyright / Dissertation (MEng)--University of Pretoria, 2010. / Electrical, Electronic and Computer Engineering / unrestricted Source coding Burrows wheeler Enumerative Source code Compression Fixed length code Universal code UCTD
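As a rough illustration of what an enumerative source code does, the sketch below encodes a binary sequence as its length, its number of ones, and its lexicographic rank among all sequences with that length and weight. This is the classic binomial-ranking scheme usually associated with enumerative coding, not the novel low-complexity implementation proposed in the thesis; the function names are assumptions made for the example.

```python
from math import comb


def enum_encode(bits: list[int]) -> tuple[int, int, int]:
    """Return (length, weight, rank) for a binary sequence.

    The rank is the position of the sequence in lexicographic order
    (0 < 1) among all sequences of the same length and weight.
    """
    n = len(bits)
    ones_left = sum(bits)
    weight = ones_left
    rank = 0
    for pos, b in enumerate(bits):
        if b == 1:
            # Skip every sequence that has a 0 here and places the
            # remaining ones in the n - pos - 1 later positions.
            rank += comb(n - pos - 1, ones_left)
            ones_left -= 1
    return n, weight, rank


def enum_decode(n: int, weight: int, rank: int) -> list[int]:
    """Rebuild the sequence from (length, weight, rank)."""
    bits = []
    ones_left = weight
    for pos in range(n):
        skipped = comb(n - pos - 1, ones_left)
        if ones_left > 0 and rank >= skipped:
            bits.append(1)
            rank -= skipped
            ones_left -= 1
        else:
            bits.append(0)
    return bits


if __name__ == "__main__":
    code = enum_encode([0, 1, 1, 0])
    print(code)                 # (4, 2, 2)
    print(enum_decode(*code))   # [0, 1, 1, 0]
```

The encoder needs no probability model: the rank takes about log2 C(n, k) bits, which is why such codes can work without prior knowledge of the source statistics. The binomial coefficients are also where the computational cost mentioned in the abstract comes from.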
18	Comparisons of Estimators of Small Proportion under Group Testing Wei, Xing 02 July 2015 (has links) Binomial group testing has long been recognized as an efficient method of estimating the proportion of subjects with a specific characteristic. The method is superior to the classic maximum likelihood estimator (MLE), particularly when the proportion is small. Under the group testing model, we assume the testing is conducted without error. In the present research, a new Bayes estimator is proposed that utilizes an additional piece of information: the proportion to be estimated is small and lies within a given range. It is observed that, with an appropriate choice of the hyper-parameter, our new Bayes estimator has a smaller mean squared error (MSE) than the classic MLE, the Burrows estimator, and the existing Bayes estimator. Furthermore, on the basis of extensive Monte Carlo simulation, we have determined the best hyper-parameters in the sense that the corresponding new Bayes estimator has the smallest MSE. A table of these best hyper-parameters is provided for proportions within the considered range. Group Testing MLE Burrows' Estimator Bayes Estimator Relative Bias Relative Efficiency Statistical Models
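For orientation, the standard MLE under the group testing model can be written in a few lines. The sketch assumes equal-sized groups and error-free tests (the latter as stated in the abstract); the function and parameter names are hypothetical, and the Burrows and Bayes estimators compared in the thesis are not reproduced here.

```python
def group_testing_mle(positive_groups: int, num_groups: int, group_size: int) -> float:
    """MLE of the individual-level proportion p under binomial group testing.

    A group tests negative only if all of its members are negative, which
    happens with probability (1 - p) ** group_size. Equating that to the
    observed fraction of negative groups and solving for p gives the MLE.
    """
    if num_groups <= 0 or group_size <= 0:
        raise ValueError("num_groups and group_size must be positive")
    if not 0 <= positive_groups <= num_groups:
        raise ValueError("positive_groups must lie between 0 and num_groups")
    frac_negative = 1 - positive_groups / num_groups
    return 1 - frac_negative ** (1 / group_size)


if __name__ == "__main__":
    # 19 positive groups out of 100, two subjects per group.
    print(round(group_testing_mle(19, 100, 2), 6))  # 0.1
```

With a group size of 1 the formula reduces to the ordinary sample proportion.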
19	Habitat Heterogeneity Affects the Thermal Ecology of the Federally Endangered Blunt-Nosed Leopard Lizard Gaudenti, Nicole 01 June 2021 (has links) (PDF) Global climate change is already contributing to the extirpation of numerous species worldwide, and sensitive species will continue to face challenges associated with rising temperatures throughout this century and beyond. It is especially important to evaluate the thermal ecology of endangered ectotherm species now so that mitigation measures can be taken as early as possible. A recent study of the thermal ecology of the federally endangered Blunt-Nosed Leopard Lizard (Gambelia sila) suggested that they face major activity restrictions due to thermal constraints in their desert habitat, but that large shade-providing shrubs act as thermal buffers to allow them to maintain surface activity without overheating. We replicated this study but added a population of G. sila with no access to large shrubs to facilitate comparison of the thermal ecology of G. sila in shrubless and shrubbed populations. We found that G. sila without access to shrubs spent more time sheltering inside rodent burrows than lizards with access to shrubs, especially during the hot summer months. Lizards from a shrubbed population had higher midday body temperatures and therefore poorer thermoregulatory accuracy than G. sila from a shrubless population, suggesting that greater surface activity may represent a thermoregulatory tradeoff for G. sila. Lizards at both sites are currently constrained from using open, sunny microhabitats for much of the day during their short active seasons, and our projections suggest that climate change will exacerbate these restrictions and force G. sila to use rodent burrows for shelter even more than they do now, especially at sites without access to shrubs. The continued management of shrubs and of burrowing rodents at G. sila sites is therefore essential to the survival of this endangered species. 
Thermoregulation Shrubs Shade Burrows Lizard Activity Restriction Behavior and Ethology Comparative and Evolutionary Physiology Desert Ecology Population Biology
20	Inexact Mapping of Short Biological Sequences in High Performance Computational Environments Salavert Torres, José 30 October 2014 (has links) Bioinformatics is the application of computer science to the management and analysis of biological data. From 2005 onwards, the arrival of new-generation DNA sequencers gave rise to what is known as Next Generation Sequencing (NGS). A single biological experiment run on an NGS sequencing machine can easily produce hundreds of gigabytes or even terabytes of data. Depending on the technique chosen, this process can be completed in a few hours or days. The availability of affordable local resources, such as multicore processors or the new graphics cards designed for general-purpose computation, GPGPU (General Purpose Graphic Processing Unit), is a great opportunity to tackle these problems. A frequently addressed topic today is the alignment of DNA sequences. In bioinformatics, alignment makes it possible to compare two or more DNA or RNA sequences, or primary protein structures, highlighting their regions of similarity. Such similarities may indicate functional or evolutionary relationships between the genes or proteins queried. Moreover, similarities between the sequences of a patient and those of another individual with a detected genetic disease could be used effectively in diagnostic medicine. The problem at the core of this doctoral thesis is locating short sequence fragments within DNA, known as sequence mapping. The mapping must tolerate errors, so that sequences can be mapped even in the presence of genetic variability or read errors. There are several techniques for mapping, but since the advent of NGS, searching over prefixes indexed and grouped by means of the Burrows-Wheeler transform [28] (BWT hereafter) stands out. This transform was originally used in data compression techniques, as in the bzip2 algorithm. Its use as a tool for indexing and subsequently searching information is more recent [22]. Its advantage is that its computational complexity depends only on the length of the sequence to be mapped. On the other hand, many alignment techniques are based on dynamic programming algorithms, whether Smith-Waterman or hidden Markov models (HMM). These provide greater sensitivity, allowing more errors, but their computational cost is higher and depends on the size of the sequence multiplied by that of the reference string. Many tools combine a first phase that uses the BWT to search for candidate alignment regions with a second, local alignment phase in which strings are mapped with Smith-Waterman or HMM. When mapping with few errors allowed, a second phase with a dynamic programming algorithm is too costly, so an inexact search based on the BWT can be more efficient. The main motivation of the doctoral thesis is the implementation of an inexact search algorithm based solely on the BWT, adapted to modern parallel architectures, both CPU and GPGPU. The algorithm will constitute a new branch-and-prune method adapted to genomic information. During the research stay, hidden Markov models will be studied and implemented on the GTA (Aggregate ∘ Test ∘ Generate) functional computation model, along with the shared- and distributed-memory parallelization of that functional programming platform. / Salavert Torres, J. (2014). Inexact Mapping of Short Biological Sequences in High Performance Computational Environments [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/43721 Inexact mapping Backward search BWT Burrows-Wheeler Transform Suffix Array GPGPU GPU
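The claim that BWT-based search cost depends only on the length of the query can be seen in a minimal backward-search sketch. It covers exact matching only, and so stops short of the inexact branch-and-prune search the thesis develops. The index is built naively by sorting suffixes, and all names (`build_fm_index`, `backward_search`, the `$` sentinel) are assumptions made for the example.

```python
def build_fm_index(text: str, sentinel: str = "$"):
    """Build a suffix array, the C table and occurrence counts (naive construction)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] is the sentinel when i == 0
    alphabet = sorted(set(s))
    # C[c]: number of characters in s that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(s) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern: str, sa, C, occ) -> list[int]:
    """Return the start positions of exact matches of pattern in the text."""
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        # One step per pattern character, independent of the text length.
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, C, occ = build_fm_index("banana")
    print(backward_search("ana", sa, C, occ))  # [1, 3]
    print(backward_search("nab", sa, C, occ))  # []
```

The loop in `backward_search` runs once per pattern character. An inexact search would additionally branch over substitutions, insertions and deletions at each step and prune branches that exceed the error budget.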
