1 |
Matching in MySQL: A comparison between REGEXP and LIKE
Carlsson, Emil, January 2012
When searching for data across multiple datasets, there is a risk that not all datasets are of the same type. Some might be in XML format; others might reside in a relational database. This can discourage developers from searching two separate datasets, because crafting a different search method for each dataset can be time-consuming. One option that is greatly overlooked is the use of regular expressions. Once a search expression has been created, it can be used in a majority of database engines as part of a "WHERE" clause and also in other kinds of data sources such as XML. This option is, however, poorly documented at best, and few tests have been made of how it performs against traditional database search methods such as "LIKE". Multiple experiments comparing "LIKE" and "REGEXP" in MySQL have been performed for this paper. The results of these experiments show that the possible overhead of using regular expressions can be justified when considering the gain of using a single search phrase across several data sources.
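The idea of reusing one search expression across a relational source and an XML source can be sketched as follows. This is only an illustration under stated assumptions, not the setup used in the thesis: it uses SQLite with a user-defined REGEXP function as a stand-in for MySQL's built-in REGEXP operator, and the table name `people`, the element name `name`, and the sample data are all hypothetical. Regular expression dialects differ between engines, so only simple patterns such as the one below are likely to be portable.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical search expression: names that start with "An".
PATTERN = r"^An"


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def search_sql(conn, pattern):
    # The same pattern is passed to the database as a WHERE condition.
    rows = conn.execute(
        "SELECT name FROM people WHERE name REGEXP ? ORDER BY name", (pattern,)
    )
    return [name for (name,) in rows]


def search_xml(xml_text, pattern):
    # The same pattern is applied to the text of every <name> element.
    root = ET.fromstring(xml_text)
    return sorted(
        el.text for el in root.iter("name") if el.text and re.search(pattern, el.text)
    )


# In-memory SQLite database standing in for a MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)", [("Anna",), ("Bertil",), ("Anders",)]
)

# Hypothetical XML data source.
XML_DATA = "<people><name>Anton</name><name>Cecilia</name></people>"

sql_hits = search_sql(conn, PATTERN)
xml_hits = search_xml(XML_DATA, PATTERN)

# The traditional LIKE equivalent works only on the SQL side.
like_hits = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM people WHERE name LIKE 'An%' ORDER BY name"
    )
]

print(sql_hits, xml_hits, like_hits)
```

The LIKE query has to be written separately and cannot be applied to the XML data, whereas the single regular expression serves both sources; whether that convenience outweighs the performance cost is the question the experiments address.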
|
2 |
Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data
Le, Hai-Son Phuoc, 01 May 2013
Advances in genomics allow researchers to measure the complete set of transcripts in cells. These transcripts include messenger RNAs (which encode proteins) and microRNAs, short RNAs that play an important regulatory role in cellular networks. While this data is a great resource for reconstructing the activity of networks in cells, it also presents several computational challenges. These challenges include handling the data collection stage, which often results in incomplete and noisy measurements; developing methods to integrate several experiments within and across species; and designing methods that can use this data to map the interactions and networks that are activated in specific conditions. Novel and efficient algorithms are required to successfully address these challenges.
In this thesis, we present probabilistic models to address the set of challenges associated with expression data. First, we present a novel probabilistic error correction method for RNA-Seq reads. RNA-Seq generates large and comprehensive datasets that have revolutionized our ability to accurately recover the set of transcripts in cells. However, sequencing reads inevitably contain errors, which affect all downstream analyses. To address these problems, we develop an efficient hidden Markov model-based error correction method for RNA-Seq data. Second, for the analysis of expression data across species, we develop clustering and distance function learning methods for querying large expression databases. The methods use a Dirichlet Process Mixture Model with latent matchings and infer soft assignments between genes in two species to allow comparison and clustering across species. Third, we introduce new probabilistic models to integrate expression and interaction data in order to predict targets and networks regulated by microRNAs.
Combined, the methods developed in this thesis provide a solution to the pipeline of expression analysis used by experimentalists when performing expression experiments.
|