1 |
Aspects of scientific methodology with special reference to evolutionary biology. Anderson, Michael Laurence, 16 September 2014
A critical examination of Popper's falsificationism as a methodological criterion of demarcation led to the development of a supplementary means of distinguishing science from pseudo-science. The discipline is made the unit of appraisal, and its pattern of historical development is used as the indicator of demarcation. Results of a test of this indicator against astrology and physical optics accord with our basic judgments of these disciplines. The indicator effectively reveals that scientific creationism is pseudo-science, and that evolutionary biology is genuine science.
Three fundamental approaches to scientific investigation, viz. verificationism, falsificationism and multi-cornered testing (MCT), are contrasted. MCT is distinguished by competition between hypotheses, which makes it more informative than at least the naive versions of the other two approaches. While competition does not produce immediate victors, it does make demands on theories, and these demands can be augmented by prescribing a series of independent tests. The comparative method implies the existence of two types of evidence. Common evidence is that which is predicted or explained by two or more rival hypotheses. Discriminatory evidence favours one rival over the others.
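The distinction between common and discriminatory evidence can be made concrete with a small sketch. It assumes that each rival hypothesis can be summarised by the set of observations it predicts or explains, and it simplifies "favours one rival" to "predicted by exactly one rival"; the hypothesis and observation names are hypothetical placeholders.

```python
from typing import Dict, List, Set


def classify_evidence(predictions: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each observation as common or discriminatory evidence.

    Simplifying assumption: "favours one rival" is read as "predicted by
    exactly one rival hypothesis".
    """
    # Map each observation to the rival hypotheses that predict or explain it.
    supporters: Dict[str, List[str]] = {}
    for hypothesis, observations in predictions.items():
        for obs in observations:
            supporters.setdefault(obs, []).append(hypothesis)

    labels: Dict[str, str] = {}
    for obs, hyps in supporters.items():
        if len(hyps) >= 2:
            labels[obs] = "common"  # cannot adjudicate between these rivals
        else:
            labels[obs] = f"discriminatory (favours {hyps[0]})"
    return labels


# Hypothetical rivals and observations, for illustration only.
rivals = {"H1": {"obs_a", "obs_b"}, "H2": {"obs_a", "obs_c"}}
labels = classify_evidence(rivals)
```

Here `obs_a` is common evidence, while `obs_b` and `obs_c` each discriminate. With three or more rivals, an observation shared by some but not all of them would still count against the rest; this sketch ignores that finer case.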
It is argued that in both the fields of species biology and speciation there have been instances of over-reliance on common evidence, of indistinctly defining alternative hypotheses, of not following their logical consequences, and of not using existing discriminatory evidence to adjudicate between these hypotheses. Species concepts and definitions of modes of speciation are evaluated. Normative principles are suggested for defining species and other important terms in evolutionary biology, and for testing species concepts and modes of speciation. The advantages and limitations of a historical indicator of demarcation, and the merits and principles of the comparative approach to method, are discussed and illustrated using the analogy of a mathematical game.
Scientific creationism is shown to have a coating of scientific method, but to have systematically violated fundamental methodological principles. Darwin's method, in contrast, had a comparative structure and distinguished between common and discriminatory evidence. While there are methodological problems in evolutionary biology, these are shown to be minor in comparison to those found in scientific creationism.
|
2 |
A computational framework for protein-DNA binding discovery. January 2010
Wong, Ka Chun. Thesis (M.Phil.), Chinese University of Hong Kong, 2010. Includes bibliographical references (leaves 109-121). Abstracts in English and Chinese.

Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Motivation
  1.2 Objective
  1.3 Methodology
  1.4 Bioinformatics
  1.5 Computational Methods
    1.5.1 Evolutionary Algorithms
    1.5.2 Data Mining for TF-TFBS bindings
Chapter 2. Background
  2.1 Gene Transcription
    2.1.1 Protein-DNA Binding
    2.1.2 Existing Methods
    2.1.3 Related Databases
      2.1.3.1 TRANSFAC: Experimentally Determined Database
      2.1.3.2 cisRED: Computationally Determined Database
      2.1.3.3 ORegAnno: Community-Driven Database
  2.2 Evolutionary Algorithms
    2.2.1 Representation
    2.2.2 Parent Selection
    2.2.3 Crossover Operators
    2.2.4 Mutation Operators
    2.2.5 Survival Selection
    2.2.6 Termination Condition
    2.2.7 Discussion
    2.2.8 Examples
      2.2.8.1 Genetic Algorithm
      2.2.8.2 Genetic Programming
      2.2.8.3 Differential Evolution
      2.2.8.4 Evolution Strategy
      2.2.8.5 Swarm Intelligence
  2.3 Association Rule Mining
    2.3.1 Objective
    2.3.2 Apriori Algorithm
    2.3.3 Partition Algorithm
    2.3.4 DHP
    2.3.5 Sampling
    2.3.6 Frequent Pattern Tree
Chapter 3. Discovering Protein-DNA Binding Sequence Patterns Using Association Rule Mining
  3.1 Materials and Methods
    3.1.1 Association Rule Mining and Apriori Algorithm
    3.1.2 Discovering associated TF-TFBS sequence patterns
    3.1.3 Data Preparation
  3.2 Results and Analysis
    3.2.1 Rules Discovered
    3.2.2 Quantitative Analysis
    3.2.3 Annotation Analysis
    3.2.4 Empirical Analysis
    3.2.5 Experimental Analysis
  3.3 Verifications
    3.3.1 Verification by PDB
    3.3.2 Verification by Homology Modeling
    3.3.3 Verification by Random Analysis
  3.4 Discussion
Chapter 4. Designing Evolutionary Algorithms for Multimodal Optimization
  4.1 Introduction
  4.2 Problem Definition
    4.2.1 Minimization
    4.2.2 Maximization
  4.3 An Evolutionary Algorithm with Species-specific Explosion for Multimodal Optimization
    4.3.1 Background
      4.3.1.1 Species Conserving Genetic Algorithm
    4.3.2 Evolutionary Algorithm with Species-specific Explosion
      4.3.2.1 Species Identification
      4.3.2.2 Species Seed Delta Evaluation
      4.3.2.3 Stage Switching Condition
      4.3.2.4 Species-specific Explosion
      4.3.2.5 Calculate Explosion Weights
    4.3.3 Experiments
      4.3.3.1 Performance measurement
      4.3.3.2 Parameter settings
      4.3.3.3 Results
    4.3.4 Conclusion
  4.4 A Crowding Genetic Algorithm with Spatial Locality for Multimodal Optimization
    4.4.1 Background
      4.4.1.1 Crowding Genetic Algorithm
      4.4.1.2 Locality of Reference
    4.4.2 Crowding Genetic Algorithm with Spatial Locality
      4.4.2.1 Motivation
      4.4.2.2 Offspring generation with spatial locality
    4.4.3 Experiments
      4.4.3.1 Performance measurements
      4.4.3.2 Parameter setting
      4.4.3.3 Results
    4.4.4 Conclusion
Chapter 5. Generalizing Protein-DNA Binding Sequence Representations and Learning using an Evolutionary Algorithm for Multimodal Optimization
  5.1 Introduction and Background
  5.2 Problem Definition
  5.3 Crowding Genetic Algorithm with Spatial Locality
    5.3.1 Representation
    5.3.2 Crossover Operators
    5.3.3 Mutation Operators
    5.3.4 Fitness Function
    5.3.5 Distance Metric
  5.4 Experiments
    5.4.1 Parameter Setting
    5.4.2 Search Space Estimation
    5.4.3 Experimental Procedure
    5.4.4 Results and Analysis
      5.4.4.1 Generalization Analysis
      5.4.4.2 Verification by PDB
  5.5 Conclusion
Chapter 6. Predicting Protein Structures on a Lattice Model using an Evolutionary Algorithm for Multimodal Optimization
  6.1 Introduction
  6.2 Problem Definition
  6.3 Representation
  6.4 Related Works
  6.5 Crowding Genetic Algorithm with Spatial Locality
    6.5.1 Motivation
    6.5.2 Customization
      6.5.2.1 Distance metrics
      6.5.2.2 Handling infeasible conformations
  6.6 Experiments
    6.6.1 Performance Metrics
    6.6.2 Parameter Settings
    6.6.3 Results
  6.7 Conclusion
Chapter 7. Conclusion and Future Work
  7.1 Thesis Contribution
  7.2 Future Work
Appendix A
  A.1 Problem Definition in Chapter 3
Bibliography
Author's Publications
|
3 |
Computational development of regulatory gene set networks for systems biology applications. Suphavilai, Chayaporn, January 2014
Indiana University-Purdue University Indianapolis (IUPUI) / In systems biology, biological networks are used to gain insights into biological systems. While traditional approaches to studying biological networks are based on identifying interactions among genes or on ranking gene sets according to lists of differentially expressed genes, little is known about interactions between higher-order biological systems, i.e. networks of gene sets. Several types of gene set networks have been proposed, including co-membership, linkage, and co-enrichment human gene set networks. However, to our knowledge, none of them contains directionality information. Therefore, in this study we proposed a method to construct a regulatory gene set network, a directed network that reveals novel relationships among gene sets. The regulatory gene set network was constructed using publicly available gene regulation data. A directed edge in a regulatory gene set network represents a regulatory relationship from one gene set to another. The regulatory gene set network was compared with another type of gene set network to show that it provides additional information. To show that a regulatory gene set network is useful for understanding the underlying mechanisms of a disease, an Alzheimer's disease (AD) regulatory gene set network was constructed.
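How gene-level regulations might be lifted to directed gene-set edges can be sketched as follows. The abstract does not state the exact edge criterion, so this sketch assumes a hypothetical rule: add an edge A → B whenever some gene in set A regulates some gene in set B, weight it by the number of supporting regulations, and drop self-loops. All gene and set names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def regulatory_gene_set_network(
    gene_sets: Dict[str, Set[str]],
    regulations: List[Tuple[str, str]],
) -> Dict[Tuple[str, str], int]:
    """Build directed gene-set edges from (regulator, target) gene pairs.

    Assumed rule (hypothetical): edge A -> B if a gene in A regulates a
    gene in B; the weight counts the supporting regulations.
    """
    # Index each gene to the names of the gene sets that contain it.
    membership: Dict[str, List[str]] = defaultdict(list)
    for name, genes in gene_sets.items():
        for gene in genes:
            membership[gene].append(name)

    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for regulator, target in regulations:
        for source_set in membership.get(regulator, []):
            for target_set in membership.get(target, []):
                if source_set != target_set:  # assumption: ignore self-loops
                    edges[(source_set, target_set)] += 1
    return dict(edges)


# Hypothetical toy input.
gene_sets = {"SetA": {"TF1", "G2"}, "SetB": {"G3", "G4"}, "SetC": {"G5"}}
regulations = [("TF1", "G3"), ("TF1", "G4"), ("G3", "G5")]
edges = regulatory_gene_set_network(gene_sets, regulations)
```

Because each edge is keyed by an ordered (source, target) pair, the result is directed: SetA → SetB exists with weight 2, but SetB → SetA does not.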
In addition, we developed the Pathway and Annotated Gene-set Electronic Repository (PAGER), an online systems biology tool for constructing and visualizing gene and gene set networks from multiple gene set collections. PAGER is available at http://discern.uits.iu.edu:8340/PAGER/. Global regulatory and global co-membership gene set networks were pre-computed. PAGER contains 166,489 gene sets, 92,108,741 co-membership edges, 697,221,810 regulatory edges, 44,188 genes, 651,586 unique gene regulations, and 650,160 unique gene interactions. PAGER provides several unique features, including constructing regulatory gene set networks, generating expanded gene set networks, and constructing gene networks within a gene set.
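For contrast with the directed regulatory network, an undirected co-membership network can be sketched by counting genes shared between gene sets. The criterion here (an edge when at least `min_shared` genes are shared, weighted by the shared count) is an assumption, not PAGER's documented rule. The sketch uses a gene-to-set inverted index rather than comparing every pair of sets, since a naive all-pairs comparison over 166,489 gene sets would examine roughly 1.4 × 10^10 pairs.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def co_membership_network(
    gene_sets: Dict[str, Set[str]], min_shared: int = 1
) -> Dict[Tuple[str, str], int]:
    """Undirected edges between gene sets that share genes.

    Assumed rule (hypothetical): edge {A, B} if A and B share at least
    `min_shared` genes; each edge is keyed by the alphabetically sorted pair.
    """
    # Inverted index: gene -> names of sets containing it, in sorted order.
    sets_by_gene: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(gene_sets):
        for gene in gene_sets[name]:
            sets_by_gene[gene].append(name)

    # Only sets that actually share a gene are ever paired.
    shared: Dict[Tuple[str, str], int] = defaultdict(int)
    for names in sets_by_gene.values():
        for a, b in combinations(names, 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Hypothetical toy input.
toy_sets = {"SetA": {"TF1", "G2"}, "SetB": {"G2", "G3"}, "SetC": {"G3", "G5"}, "SetD": {"G9"}}
co_edges = co_membership_network(toy_sets)
```

Unlike the regulatory edges, these pairs carry no direction: the edge between SetA and SetB only records that they share gene G2.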
However, tissue-specific or disease-specific information was not considered when constructing the disease-specific network, so the network may not accurately represent the high-level relationships among gene sets in the disease context. Our framework could therefore be improved by collecting higher-resolution data, such as tissue-specific and disease-specific gene regulations and gene sets. In addition, experimental gene expression data could be used to add more information to the gene set network. In the current version of PAGER, gene and gene set networks are limited to 100 nodes due to browser memory constraints. Our future plan is to integrate gene and protein interactions within pathways in order to support future systems biology studies.
|
4 |
Développement de potentiels statistiques pour l'étude in silico de protéines et analyse de structurations alternatives / Development of statistical potentials for the in silico study of proteins and analysis of alternative structuring. Dehouck, Yves, 20 May 2005
The work presented in this thesis concerns the computational study of the relationships between the sequence of a protein and the three-dimensional structure(s) it adopts. Unravelling these relationships has many applications in different domains and is probably one of the most fascinating issues in molecular biology.

The first part of our work is devoted to the development of statistical potentials derived from databases of known protein structures. These potentials make it possible to define a limited number of energy functions that embody the complex ensemble of interactions governing protein folding and stability (including some entropic contributions), and they can easily be adapted to simplified representations of protein structures. However, their physical meaning remains unclear, since their derivation requires several hypotheses and approximations whose impact is far from clearly understood. We studied some of the limitations of these potentials: their dependence on the size of the proteins included in the database, the non-additivity of the different potential terms, and the often-neglected importance of the specific protein environment of each residue.
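The derivation of a residue-pair distance potential from a structure database can be sketched with the widely used inverse-Boltzmann formulation, ΔW_ij(d) = -kT ln[f_ij(d) / f(d)], where f_ij(d) is the observed frequency of residue pair (i, j) in distance bin d and f(d) is the frequency for all pairs pooled (the reference state). This is a generic textbook form, not necessarily the exact derivation used in the thesis; the `kT` value, the pseudocount smoothing, and the toy counts are assumptions, and the size corrections discussed in the thesis are not modelled.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def distance_potential(
    pair_counts: Dict[Pair, List[int]],
    kT: float = 0.593,         # ~kcal/mol at 298 K (assumed units)
    pseudocount: float = 1.0,  # assumed smoothing to avoid log(0)
) -> Dict[Pair, List[float]]:
    """Inverse-Boltzmann distance potential, one value per distance bin."""
    n_bins = len(next(iter(pair_counts.values())))
    # Reference state: counts pooled over all residue pairs in each bin.
    ref = [sum(counts[b] for counts in pair_counts.values()) for b in range(n_bins)]
    ref_total = sum(ref)

    potentials: Dict[Pair, List[float]] = {}
    for pair, counts in pair_counts.items():
        total = sum(counts)
        row = []
        for b in range(n_bins):
            f_pair = (counts[b] + pseudocount) / (total + n_bins * pseudocount)
            f_ref = (ref[b] + pseudocount) / (ref_total + n_bins * pseudocount)
            # Over-represented bins (f_pair > f_ref) get negative, favourable energies.
            row.append(-kT * math.log(f_pair / f_ref))
        potentials[pair] = row
    return potentials


# Toy counts for two hypothetical residue-pair types over three distance bins.
potentials = distance_potential({("R1", "R2"): [10, 30, 60], ("R3", "R4"): [40, 30, 30]})
```

A pair type whose distance profile matches the pooled reference receives a flat potential of zero; deviations from the reference are what the potential encodes, which is also why the composition of the database (for example, protein size) can shift the result.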
Our results show that residue-based distance potentials are affected by the size of the proteins in the database, and that this effect can be quite strong, is specific to each pair of amino acids, and seems to result mostly from the inhomogeneous partition of hydrophobic and hydrophilic residues between the surface and the core of proteins. On the basis of these observations, we defined a set of corrective functions in order to take protein size into account while deriving the potentials. In addition, we developed a general procedure for deriving potentials and coupling terms, and consequently created an energy function describing the correlations between several sequence and structure descriptors (the nature of each residue, the conformation of its main chain, its solvent accessibility, and the distances that separate it from other residues, in space and along the sequence). This energy function shows markedly improved predictive power compared with the original potentials and with other potentials described in the literature.

The second part describes the application of different programs, based on statistical potentials, to the study of proteins that adopt alternative structures. Domain swapping involves the exchange of a structural element between identical proteins and leads to the formation of an oligomeric unit. We showed that the presence of "structural weaknesses" (regions that are not optimal with respect to the folding mechanisms or to the stability of the native structure, or that show a marked preference for a non-native conformation in the absence of tertiary interactions) seems to be intimately linked with the swapping mechanisms. In addition, cation-π interactions were frequently detected at some key locations and might also play an important role. Finally, we designed a set of mutations that are likely to affect the swapping propensities of different proteins. The experimental study of these mutations should make it possible to validate, or refine, our hypotheses concerning the importance of structural weaknesses and cation-π interactions.
We also analysed another protein that undergoes large conformational changes: α1-antitrypsin. In this case, the structural modifications are necessary for the proper execution of its biological activity. However, under certain circumstances, they lead to the formation of insoluble polymers and the development of diseases. With the aim of better understanding the mechanisms responsible for this polymerisation, we tried to design mutant proteins that display a controlled polymerisation propensity. An experimental study of these mutants was conducted by the Australian group of Prof. S.P. Bottomley and confirmed our predictions remarkably well. / Doctorate in Applied Sciences / info:eu-repo/semantics/nonPublished
|
5 |
Context specific text mining for annotating protein interactions with experimental evidence. Pandit, Yogesh, 03 January 2014
Indiana University-Purdue University Indianapolis (IUPUI) / Proteins are the building blocks of biological systems. They interact with other proteins to produce unique biological phenomena, and protein-protein interactions play a valuable role in understanding the molecular mechanisms at work in any biological system. Protein interaction databases are a rich source of information on protein interactions; they gather large amounts of this information from the published literature. Most of this work is done manually by expert curators, and manual annotation is a time-consuming process, while the amount of accessible, publicly available literature is growing very rapidly. At that rate of growth, the literature cannot be handled by manual curation alone: tools are needed to process these large amounts of data and extract the essential information that can help curators work faster. When extracting protein-protein interaction evidence from the literature, a mere mention of certain proteins, found by look-up approaches, cannot validate an interaction; supporting the interaction information with experimental evidence can. In this study, we apply machine-learning-based classification techniques to classify a given protein-interaction-related document into an interaction detection method. We use biological attributes and experimental factors, different combinations of which define any particular interaction detection method. Then, using the predicted detection methods, proteins identified by named entity recognition techniques, and the part-of-speech composition of sentences, we search for sentences containing experimental evidence for a protein-protein interaction. We report an accuracy of 75.1% with an F-score of 47.6% on a dataset containing 2,035 training documents and 300 test documents.
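The final sentence-search step can be sketched as a filter that keeps sentences naming at least two proteins together with a textual cue for the predicted detection method. This is a simplified stand-in, not the study's implementation: the protein list replaces named entity recognition, the part-of-speech analysis is omitted, the detection method is assumed to be already predicted by the document classifier, and the keyword lexicon and protein names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical cue lexicon per detection method (lower-case substrings).
METHOD_KEYWORDS: Dict[str, Set[str]] = {
    "two hybrid": {"two-hybrid", "two hybrid", "y2h"},
    "coimmunoprecipitation": {"coimmunoprecipitat", "co-immunoprecipitat", "co-ip"},
}


def evidence_sentences(
    text: str,
    proteins: Iterable[str],
    method: str,
    lexicon: Dict[str, Set[str]] = METHOD_KEYWORDS,
) -> List[str]:
    """Return sentences naming >= 2 proteins plus a cue for the predicted method."""
    names = [p.lower() for p in proteins]
    cues = lexicon.get(method, set())
    hits = []
    # Naive sentence splitting on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        # Stand-in for NER: whole-word matches against the supplied protein list.
        found = {n for n in names if re.search(r"\b" + re.escape(n) + r"\b", lowered)}
        if len(found) >= 2 and any(cue in lowered for cue in cues):
            hits.append(sentence)
    return hits


text = ("ProtA was coimmunoprecipitated with ProtB from cell lysates. "
        "ProtA is highly expressed in liver. ProtC binds ProtD.")
hits = evidence_sentences(text, ["ProtA", "ProtB", "ProtC", "ProtD"], "coimmunoprecipitation")
```

Only the first sentence is kept: the second names a single protein, and the third names two proteins but offers no cue for the predicted method, which mirrors the point that a bare co-mention does not validate an interaction.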
|