1 |
MCAT: Motif Combining and Association Tool
Yang, Yanshen 02 July 2018
De novo motif discovery in biological sequences is an important and computationally challenging problem. A myriad of algorithms have been developed to solve this problem with varying success, but it can be difficult for even a small number of these tools to reach a consensus. Because individual tools can be better suited to specific scenarios, an ensemble tool that combines the results of many algorithms can yield a more confident and complete result. We present MCAT (Motif Combining and Association Tool), a novel and fast tool for de novo motif discovery that combines six state-of-the-art motif discovery tools (MEME, BioProspector, DECOD, XXmotif, Weeder, and CMF). We apply MCAT to data sets of DNA sequences from various species and compare our results with two well-established ensemble motif-finding tools, EMD and DynaMIT. The experimental results show that MCAT identifies exact-match motifs in DNA sequences efficiently and performs better in practice. / Master of Science / Finding hidden motifs in DNA or protein sequences is an important and computationally challenging problem. A motif is a short patterned DNA or protein sequence that has a biological function. Motifs regulate gene expression, the fundamental biological process in which DNA is transcribed into RNA, which is then translated into protein. In the past 20 years, a myriad of algorithms have been developed to solve the motif-finding problem with varying success, but it can be difficult for even a small number of these tools to reach a consensus. Because individual tools can be better suited to specific scenarios, an ensemble tool that combines the results of many algorithms can yield a more confident and complete result. I present MCAT (Motif Combining and Association Tool), a novel and fast tool for motif discovery that combines six state-of-the-art motif discovery tools (MEME, BioProspector, DECOD, XXmotif, Weeder, and CMF). I apply MCAT to data sets of DNA sequences from various species and compare my results with two well-established ensemble motif-finding tools, EMD and DynaMIT. The experimental results show that MCAT identifies exact-match motifs in DNA sequences efficiently and offers improved performance in practice.
|
2 |
Mobiles
Whitworth, Clifford K. (Clifford Kirk) 08 1900
Mobiles is a composition for an ensemble of 12 instruments. The piece, in one movement, incorporates intuition, chance, and twelve-tone techniques and reflects the relationship between motion and rest, or tension and release. The structure is modeled on principles of growth and decay: it starts slowly, builds, and then dies away. Much of the material is inspired by mental images evoked by modern theories of chaos. The character of Mobiles stems from the principal use of two motives, the chaos motif and the echo motif. Primarily, the chaos motif represents a state of motion, while the echo motif represents a state of rest. Mobile architecture is usually characterized by symmetry, balance, and proportion, but because of uncertainty in a natural environment, this proportion often falls short of the perfect symmetrical balance found in a crystal or a fractal design. It is this kind of architecture that Mobiles portrays in its form and developmental process.
|
3 |
The Q motif is involved in DNA binding that affects ATP hydrolysis and unwinding in ChlR1 helicase
2016 February 1900
Helicases are molecular motors that couple the energy of nucleoside triphosphate (NTP) hydrolysis to the unwinding and remodeling of structured DNA or RNA. The conversion of energy derived from NTP hydrolysis into unwinding of double-stranded nucleic acids is coordinated by seven sequence motifs (I, Ia, II, III, IV, V, and VI). The Q motif, consisting of an invariant glutamine (Q) residue, has been identified in some, but not all, helicases. Compared with the seven well-recognized conserved helicase motifs, the role of the Q motif is not well known. Mutations in the human ChlR1 (DDX11) gene are associated with Warsaw Breakage Syndrome, which is characterized by cellular defects in genome maintenance. ChlR1 is known to play essential roles in preserving genomic stability, particularly in sister chromatid cohesion. To examine the roles of the Q motif in the ChlR1 helicase, we performed site-directed mutagenesis of glutamine to alanine at residue 23 in the Q motif of ChlR1. ChlR1 recombinant wild type (WT) and mutant (Q23A) proteins were overexpressed in and purified from HEK293T cells. The Q23A mutation abolished the helicase activity of ChlR1 and reduced its DNA binding ability. The mutant showed impaired ATPase activity but displayed normal ATP binding. The Q motif in FANCJ helicase, a ChlR1 homolog, regulates FANCJ's dimerization, while our size exclusion chromatography (SEC) indicated that the ChlR1 protein functions as a monomer. A thermal shift assay revealed that ChlR1-Q23A has a melting point similar to that of ChlR1-WT. Partial proteolysis mapping demonstrated that ChlR1-WT and Q23A have similar globular structures, although there are some subtle conformational differences between the two proteins. Taken together, our results suggest that the Q motif in ChlR1 helicase is involved in DNA binding but not in ATP binding.
|
4 |
Combinatorial motif analysis in yeast gene promoters: the benefits of a biological consideration of motifs
Childs, Kevin 17 February 2005
There are three main categories of algorithms for identifying small transcription regulatory sequences in the promoters of genes: phylogenetic comparison, expectation maximization, and combinatorial. For convenience, the combinatorial methods typically define motifs in terms of a canonical sequence and a set of sequences that have a small number of differences compared to the canonical sequence. Such motifs are referred to as (l, d)-motifs, where l is the length of the motif and d indicates how many mismatches are allowed between an instance of the motif and the canonical motif sequence. There are limits to the complexity of the patterns of motifs that can be found by combinatorial methods. For some values of l and d, there will exist many sets of random words in a cluster of gene promoters that appear to form an (l, d)-motif. For these motifs, it will be impossible to distinguish biological motifs from randomly generated motifs. A better formalization of motifs is the (l, f, d)-motif, which is derived from a biological consideration of motifs. The motivation for (l, f, d)-motifs comes from an examination of known transcription factor binding sites, where typically a few positions in the motif are invariant. It is shown that there exist (l, f, d)-motifs that can be found in the promoters of gene clusters that would not be distinguishable from random sequences if they were described as (l, d)-motifs. The inclusion of the f-value in the definition of motifs suggests that the sequence space occupied by a motif will consist of several clusters of closely related sequences. An algorithm, CM, has been developed that identifies small sets of overabundant sequences in the promoters from a cluster of genes and then combines these simple sets of sequences to form complex (l, f, d)-motif models. A dataset from a yeast gene expression experiment is analyzed with CM. Known biological motifs and novel motifs are identified by CM. The performance of CM is compared to that of a popular expectation maximization algorithm, AlignACE, and to that of a simple combinatorial motif-finding program.
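The (l, d)-motif definition in the abstract above can be illustrated with a minimal sketch. It is not the CM algorithm; the function names, the example promoter string, and the canonical word are hypothetical and chosen only for illustration. A window of length l counts as an instance when it differs from the canonical sequence in at most d positions.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length words."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ld_instances(sequence: str, canonical: str, d: int) -> list[tuple[int, str]]:
    """Return (offset, word) for each window within d mismatches of canonical.

    The motif length l is taken to be len(canonical).
    """
    l = len(canonical)
    hits = []
    for i in range(len(sequence) - l + 1):
        word = sequence[i:i + l]
        if hamming(word, canonical) <= d:
            hits.append((i, word))
    return hits


# Hypothetical promoter fragment and canonical word, for illustration only.
promoter = "TTGACGTCATTGACTTCA"
print(ld_instances(promoter, "TGACGTCA", 1))
```

With d = 1 the scan reports the exact copy at offset 1 and the one-mismatch variant at offset 10. Raising d admits more windows, which is the effect the abstract describes: for permissive (l, d) values, random words begin to look like motif instances.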
|
6 |
Problematic shores: The literature of islands
Loxley, D. January 1986
No description available.
|
7 |
Motif extraction from complex data: case of protein classification / Extraction de motifs des données complexes : cas de la classification des protéines
Saidi, Rabie 03 October 2012
The classification of biological data is one of the significant challenges in bioinformatics, for protein as well as for nucleic data. The presence of these data in huge masses, their ambiguity, and especially the high costs of in vitro analysis in terms of time and resources make the use of data mining a necessity rather than a rational choice. However, data mining techniques, which often process data in relational format, are confronted with the inappropriate format of biological data. Hence, an inevitable pre-processing step must be established.
This thesis deals with protein data pre-processing as a preparation step before classification. We present motif extraction as a reliable way to address that task. The extracted motifs are used as descriptors to encode proteins into feature vectors. This enables the use of known data mining classifiers, which require this format. However, designing a suitable feature space for a set of proteins is not a trivial task.
We deal with two kinds of protein data, i.e., sequences and three-dimensional structures. In the first axis, i.e., protein sequences, we propose a novel encoding method that uses amino-acid substitution matrices to define similarity between motifs during the extraction step. We demonstrate the efficiency of this approach by comparing it with several encoding methods, using some classifiers. We also propose new metrics to study the robustness of some of these methods when the input data are perturbed. These metrics measure the ability of a method to reveal any change occurring in the input data and also its ability to target the interesting motifs. The second axis is dedicated to 3D protein structures, which have recently been viewed as graphs of amino acids. We briefly survey the most used graph-based representations and propose a naïve method to help build protein graphs. We show that some existing and widespread methods present remarkable weaknesses and do not really reflect the real protein conformation. Besides, we are interested in discovering recurrent sub-structures in proteins, which can give important functional and structural insights. We propose a novel algorithm to find spatial motifs in proteins. The extracted motifs match a well-defined shape, which is proposed on a biological basis. We compare them with sequential motifs and spatial motifs from recent related works. For all our contributions, the outcomes of the experiments confirm the efficiency of our proposed methods for representing both protein sequences and protein 3D structures in classification tasks. Software programs developed during this research work are available on my home page http://fc.isima.fr/~saidi.
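The idea of using extracted motifs as descriptors can be sketched as follows. This is a simplified illustration, not the encoding method of the thesis: it assumes exact substring matching, whereas the abstract describes similarity defined through amino-acid substitution matrices. The function name, protein names, sequences, and motifs are all hypothetical.

```python
def encode(proteins: dict[str, str], motifs: list[str]) -> dict[str, list[int]]:
    """Map each protein to a binary vector with one attribute per motif.

    An attribute is 1 when the motif occurs in the sequence as an exact
    substring (a simplifying assumption), and 0 otherwise.
    """
    return {
        name: [int(motif in seq) for motif in motifs]
        for name, seq in proteins.items()
    }


# Hypothetical toy data, for illustration only.
proteins = {"P1": "MKVLAAGIVK", "P2": "MAAGLLKVLA"}
motifs = ["KVL", "AAG", "GIV"]
vectors = encode(proteins, motifs)
print(vectors)
```

Each protein becomes a fixed-length vector, which is the relational format that standard classifiers expect. Here P1 contains all three motifs and P2 lacks only the last one.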
|
8 |
Approche plurielle à l'étude de la structure tertiaire de l'ARN chez les virus
Permal, Emmanuelle January 2007
Thesis digitized by the Direction des bibliothèques of the Université de Montréal.
|
9 |
An analysis of emotion-exchange motifs in multiplex networks during emergency events
Kusen, Ema, Strembeck, Mark January 2019
In this paper, we present an analysis of the emotion-exchange patterns that arise from Twitter messages sent during emergency events. To this end, we performed a systematic structural analysis of the multiplex communication network that we derived from a data set of more than 1.9 million tweets sent during five recent shootings and terror events. In order to study the local communication structures that emerge as Twitter users directly exchange emotional messages, we propose the concept of emotion-exchange motifs. Our findings suggest that emotion-exchange motifs which contain reciprocal edges (indicating online conversations) only emerge when users exchange messages that convey anger or fear, either in isolation or in any combination with another emotion. In contrast, the expression of sadness, disgust, surprise, or any positive emotion is rather characteristic of emotion-exchange motifs representing one-way communication patterns (instead of online conversations). Among other things, we also found that a higher structural similarity exists between pairs of network layers consisting of one high-arousal emotion and one low-arousal emotion than between pairs of network layers belonging to the same arousal dimension.
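The distinction between reciprocal edges and one-way communication can be made concrete with a small sketch. It is not the authors' motif-detection procedure; it only assumes that one emotion layer of the network is given as a set of directed (sender, receiver) pairs, and the function name, user names, and edges are hypothetical.

```python
def split_edges(edges: set[tuple[str, str]]):
    """Separate a directed edge set into reciprocal pairs and one-way edges.

    A pair of users is reciprocal when messages flow in both directions;
    each such pair is reported once, as a sorted tuple. Self-loops are ignored.
    """
    reciprocal = {
        tuple(sorted((u, v)))
        for u, v in edges
        if u != v and (v, u) in edges
    }
    one_way = {(u, v) for u, v in edges if u != v and (v, u) not in edges}
    return reciprocal, one_way


# Hypothetical edges in a single emotion layer, for illustration only.
anger_layer = {("ann", "bob"), ("bob", "ann"), ("cat", "ann")}
reciprocal, one_way = split_edges(anger_layer)
print(reciprocal, one_way)
```

In this toy layer, ann and bob form a reciprocal pair (a conversation), while the edge from cat to ann is one-way.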
|
10 |
Motif Selection: Identification of Gene Regulatory Elements using Sequence Coverage Based Models and Evolutionary Algorithms
Al-Ouran, Rami January 2015
No description available.
|