41 |
Planejamento de inibidores das enzimas gliceraldeído-3-fosfato desidrogenase e diidroorotato desidrogenase de Trypanosoma cruzi / Design of inhibitors of the enzymes glyceraldehyde-3-phosphate dehydrogenase and dihydroorotate dehydrogenase from Trypanosoma cruzi
Rocha, Josmar Rodrigues da. 15 March 2010
Chagas' disease, caused by the trypanosomatid parasite Trypanosoma cruzi, is endemic and widely distributed throughout Latin America. It is among the tropical diseases most neglected by the pharmaceutical industry, and the only drugs available for its treatment were launched more than 30 years ago. These drugs are ineffective and cause severe side effects. This situation shows the need for new and more effective chemotherapeutic agents against the disease. Enzymes belonging to metabolic pathways that are essential for parasite survival, such as glycolysis and de novo pyrimidine nucleotide biosynthesis, have been proposed as attractive targets for the design of new drugs for Chagas' disease. In this work, the enzymes glyceraldehyde-3-phosphate dehydrogenase (TcGAPDH) and dihydroorotate dehydrogenase (TcDHODH) from Trypanosoma cruzi were studied as targets for the design of inhibitors with physicochemical properties and structural characteristics similar to those of lead compounds. Cheminformatics methods and tools from both ligand-based and structure-based virtual screening (LBVS and SBVS, respectively) were used to identify new hits.
To identify and select potential TcGAPDH inhibitors, a virtual collection of approximately 2.5 million compounds obtained from the ZINC database was passed through several selection filters in order to prioritize the compounds that were most interesting in terms of structure, physicochemical properties and predicted ADME/Tox properties. These filters produced a subcollection of approximately 450 thousand structures, which were then scored by molecular docking according to the complementarity of their interactions with the 3D structure of the enzyme. Based on the docking results, twelve promising compounds were selected and purchased for in vitro assays against TcGAPDH. Of the twelve compounds tested, three exhibited inhibition constants (Ki) below 80 μM. These compounds also have low molecular weights (274 to 330 g mol⁻¹) and at most 24 non-hydrogen atoms; as a result, their ligand efficiencies lie between 0.24 and 0.34 kcal mol⁻¹ atom⁻¹, which makes them good candidates for molecular optimization aimed at improving affinity for the target.
In the search for TcDHODH inhibitors, a search for cavities in the 3D structure of the target was carried out first, to identify regions distinct from the catalytic site that could be exploited in ligand design. This analysis established four new regions with suitable shape, volume and location to accommodate small molecules capable of modulating the activity of TcDHODH. One of these regions, called the S2 site, lies under the β4-αA loop and in the channel through which substrates reach the active site; it was chosen for the structure-based design studies. Approximately one hundred pyrimidine derivatives, substituted at strategically defined positions, were retrieved from databases of commercially available compounds by substructure searching and docked into this site. Based on the docking results, nine compounds were selected, purchased and assayed in vitro against the enzyme in order to validate the initial hypotheses. Of these, five showed potencies (IC50) better than that of the reaction product (values lower than 150 μM). These results validated the hypotheses generated in the first stage and were used to guide the selection of fifteen further molecules through an induced-fit molecular docking protocol. In vitro evaluation of those compounds against TcDHODH identified eleven additional active compounds, the most potent of which exhibited affinity for the enzyme at a concentration of 124 nM. This molecule has a ligand efficiency of 0.56 kcal mol⁻¹ atom⁻¹ and can be considered a fragment-like compound with excellent characteristics for future development as a therapeutic agent. The results also highlighted the importance of certain structural features of TcDHODH inhibitors for complementarity with the newly identified interaction site.
New compounds were therefore proposed for molecular optimization, with the aim of improving affinity and increasing class diversity and, in this way, broadening the range of pharmacokinetic profiles available for later cellular and in vivo assays. This work made it possible to validate both the strategies adopted in the use of the computational methods and the hypotheses built from their application. The success rate obtained was higher than 30% in the design of inhibitors for both targets, which is much better than the rates usually found in high-throughput screening assays. This study thus contributed new molecular scaffolds that can be used as lead compounds in the development of new trypanocidal agents targeting the enzymes TcGAPDH and TcDHODH.
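The ligand efficiencies quoted in this abstract can be related to the reported affinities with a short calculation. The sketch below is only an illustration: it assumes the common definition LE = -ΔG / (number of non-hydrogen atoms), with ΔG approximated as RT ln(K) at 298.15 K. The abstract does not state which definition or temperature was used, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(k_molar: float, temperature: float = 298.15) -> float:
    """Approximate binding free energy (kcal/mol) as RT ln(K), K in mol/L.

    Assumption: K behaves like a dissociation constant (e.g. Ki).
    """
    return R_KCAL * temperature * math.log(k_molar)


def ligand_efficiency(k_molar: float, heavy_atoms: int,
                      temperature: float = 298.15) -> float:
    """Ligand efficiency in kcal mol^-1 per non-hydrogen atom (assumed definition)."""
    return -binding_free_energy(k_molar, temperature) / heavy_atoms


# Limiting case taken from the abstract: Ki = 80 uM and 24 non-hydrogen atoms.
le_limit = ligand_efficiency(80e-6, 24)
print(f"LE at Ki = 80 uM, 24 heavy atoms: {le_limit:.2f} kcal/mol/atom")
```

Under these assumptions the limiting case gives roughly 0.23 kcal mol⁻¹ atom⁻¹, close to the lower end of the range reported for the TcGAPDH hits. Tighter binding or fewer heavy atoms raises the value.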
|
43 |
Modèles bio-informatiques pour les peptides non-ribosomiques et leurs synthétases / Bioinformatics models for nonribosomal peptides and their synthetases
Pupin, Maude. 03 December 2013
In this HDR (Habilitation à Diriger des Recherches) thesis, I present pioneering work in bioinformatics for nonribosomal peptides (NRPs). This research began in Lille in 2006 and led to Norine, the only bioinformatics platform for the analysis of NRPs, of which I am a founding member. Nonribosomal peptides are among the small molecules that microorganisms (bacteria and fungi) produce to colonize their environment. These peptides have the advantage of great structural variety: they can be linear, but can also contain cycles and/or branches, and they are built from more than 500 different building blocks. This variety comes from their synthesis by large enzymatic complexes, the nonribosomal peptide synthetases (NRPSs), which select amino acids and other compounds, called monomers, and assemble them by forming peptide bonds and other bonds. Nonribosomal peptides consequently show a wide range of activities, such as antibiotic, anticancer or immunosuppressive activity. Some, such as penicillin, are commonly used drugs. In the first part, I offer a different view of the synthetases by relating the particular features of the peptides to the enzymatic functions needed to produce them. I then describe the main steps in designing a tool for analysing NRPS protein sequences, pointing out the specific features of existing tools. Next, I present my contribution to exploring the potential for NRP synthesis from genomic or protein sequences, through my participation in the development of a bioinformatics analysis protocol and in the annotation of several genomes.
In the second part, I begin by describing what the Norine platform has contributed to the understanding of the diversity of nonribosomal peptides, complemented by a study of the chemistry of these molecules. I then present the few databases and tools related to these peptides that are developed elsewhere. Next, I present the Norine platform itself, setting out my contributions and proposing to modernize the data collection process and to extend the functionality for querying by peptide structure. I close with a new perspective: cheminformatics dedicated to nonribosomal peptides, with the goal of predicting one or more synthetases capable of producing a peptide with a target activity.
|
44 |
Importance des données inactives dans les modèles : application aux méthodes de criblage virtuel en santé humaine et environnementale / Importance of inactive data in models: application to virtual screening in human and environmental health
Réau, Manon. 29 October 2019
Virtual screening is widely used in the early stages of drug discovery and to build toxicity prediction models. Before a screening protocol is applied prospectively, the performance of the tools involved is usually evaluated on a benchmarking database. The composition of these benchmarking databases is a critical point: most of them oppose active molecules to molecules that are only presumed inactive, because inactivity data are rarely published. Experimentally validated inactive molecules nonetheless carry information. We therefore built the NR-DBIND, a database dedicated to nuclear receptors that contains only experimentally validated active and inactive molecules. Using the NR-DBIND, we studied the importance of inactive molecules in the evaluation of docking models and in the construction of pharmacophore models. Virtual screening protocols were then applied to elucidate potential binding modes of small molecules on FXR, NRP-1 and TNF-α.
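The benchmarking step described in this abstract, checking whether a scoring method separates known actives from known inactives, can be illustrated with a small ROC AUC calculation. This is a generic sketch and not the evaluation protocol of the thesis: the scores are made-up numbers, the function name is hypothetical, and it assumes that a lower (more negative) docking score means a better predicted binder.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float],
            inactive_scores: Sequence[float]) -> float:
    """Probability that a random active is ranked ahead of a random inactive.

    Assumption: lower score = better rank. Ties count as half a win.
    """
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a < i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Illustrative docking scores only; not data from the NR-DBIND.
actives = [-9.1, -8.4, -7.9, -6.0]
inactives = [-8.0, -6.5, -5.2, -4.8]
print(roc_auc(actives, inactives))  # 0.8125
```

An AUC near 0.5 means the scores do not distinguish the two classes. Measuring it against experimentally validated inactives, rather than presumed ones, is the kind of comparison the abstract argues for.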
|
45 |
Cellular and Computational Evaluation of the Structural Pharmacology of Delta Opioid Receptors
Yazan J Meqbil (14210360). 05 December 2022
<p>G-protein coupled receptors (GPCRs) are membrane proteins that constitute ~30% of the FDA-approved drug targets. Opioid receptors are a subtype of GPCRs with four different receptor types: the delta, kappa, mu, and nociceptin opioid receptors. Opioids such as morphine have been used for thousands of years and are deemed the most effective method for treating pain. However, opioids can have detrimental effects if used illicitly or over an extended period of time. Intriguingly, most of the clinically used opioids act on the mu opioid receptor (µOR). Hence, efforts in recent decades have focused on other opioid receptors to treat pain and other disorders. The delta opioid receptor (δOR) is expressed in the central and peripheral nervous system and has attracted much attention as a potential target for a multitude of diseases and disorders, including substance and alcohol use disorders, ischemia, migraine, and neurodegenerative diseases. However, to date, no δOR agonists, or drugs that act directly at the δOR, have been successful as clinical candidates. Nonetheless, the therapeutic potential of the δOR necessitates targeting it pharmacologically. In this dissertation, I highlight peptide-based modulation as well as the identification of novel agonists at the δOR. I report research findings in the context of biased agonism at the δOR, which is a hypothesized cellular signaling mechanism with potential therapeutic benefits. The focus of this work is the molecular determinants of biased agonism, which were investigated using a combination of cellular and computational approaches. </p>
|
46 |
Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Development
Chen, Jonathan Jun Feng. 23 May 2018
No description available.
|
47 |
CHEMICAL SPACE INVADERS: ENHANCING EXPLORATION OF MODULARLY CONSTRUCTED CHEMICAL SPACES USING CONTEXT AWARE AI AGENTS
Matthew Muhoberac (19820007). 10 October 2024
<p dir="ltr">Chemical science can be imagined as a universe of information in which the individual galaxies, solar systems, stars, and planets are compounds, reactions, biomolecules, and so on, all of which need to be discovered, researched, and documented. The problem is that the universe of chemical science is potentially vaster than the one in which we live, and we are exploring it in a relatively inefficient manner. There is a scene in one of my favorite television shows, Futurama, which paints a picture of traditional chemical exploration. In that scene, set in the 30<sup>th</sup> century, the main character Fry loses his robot friend Bender in outer space and resorts to using a giant telescope in the Himalayan mountains to randomly search through points in space to try to find him. After days of searching nonstop, he gives up, noting that the task is impossible because space is so vast and he is searching so inefficiently. While human exploration of chemistry may not be as inefficient, many of its steps are driven by trial and error and educated guesswork, which ultimately introduce major inefficiencies into scientific discovery. While we don’t live in the 30<sup>th</sup> century yet, we do have access to 21<sup>st</sup> century technology which can assist in exploring chemistry in a more directed manner. This mainly involves using machine learning, search algorithms, and exploratory AI powered by generative models as a force multiplier that assists human chemists in chemical exploration. To shamelessly borrow another space-based sci-fi reference, this would be like deploying hundreds or thousands of automated space probes to search unexplored planets, which is how the Empire found the Rebellion on Hoth in The Empire Strikes Back.</p><p dir="ltr">The journey to integrate AI with chemical exploration starts with the important concept of standardization and how to apply it to chemically relevant data. 
To easily organize, store, and access relevant aspects of small molecules, macromolecules, chemical reactions, biological assays, etc., it is imperative that data be represented in a standard format which accurately portrays the necessary chemical information. This becomes especially relevant as humans aggregate more and more chemical data. In this thesis, we tackle a subset of standardization in Chapter 2, involving benchmarking sets for comparative evaluation of docking software. One major reason why standardization is so important is that it promotes ease of access to relevant data, regardless of whether this access is attempted by human or computational means. While improving data access for humans is beneficial, computationally it is a game changer when mining training data for machine learning (ML) applications. Having standardized data readily available allows software to rapidly access and preprocess relevant data, which boosts efficiency in ML model training. In Chapter 4 of this thesis, the central database of the CIPHER closed-loop system is standardized and integrated with a REST API, allowing for rapid data acquisition via a structured URL call. Database standardization combined with a mechanism for easy data mining makes a database “ML ready” and promotes its use in ML applications.</p><p dir="ltr">Building upon data standardization and the training of ML models for chemical applications, the next step of this journey revolves around a concept known as a “chemical space” and how chemists can approximate and sample chemical spaces in a directed manner. In the context of this thesis, a chemical space can be visualized in the following manner. Start by envisioning any chemical relationship between some inputs and outputs as an unknown mathematical function. For example, if one is measuring the assay response of a specific drug at a certain concentration, the input would be the concentration, and the output would be the assay response. 
Then the bounds of this space are set by determining the range of input values, and this forms a chemical space which corresponds to the chemical problem. Chemists sample these spaces every day when they go into the lab, run experiments, and analyze their data. While the example described above is relatively simple in scope, even when the relationship is very complex, techniques such as ML can be used to approximate it. An example of this approximation is shown in Chapter 3 of this thesis, where a normalizing flow architecture is used to bias a vector space representation of molecules with chemical properties, creating a space which correlates compound and property and can be sampled to provide compounds with specific values of the trained chemical properties. Training individual models is important, but to truly emulate certain chemical processes, multiple models may need to be combined with physical instrumentation to efficiently sample and validate a chemical space. Chapter 4 of this thesis expands upon this concept by integrating a variety of ML modules with high-throughput (HT) bioassay instrumentation to create a “closed-loop” system designed around discovering, synthesizing, and validating non-addictive analgesics.</p><p dir="ltr">The final step of this journey is to integrate these systems which sample chemical spaces with AI, allowing for automated exploration of these spaces in a directed manner. There are several AI frameworks which can be used separately or combined to accomplish this task, but the framework that is the focus of this thesis is AI agents. AI agents are entities which use some form of AI to serve as a logical processing center which drives their exploration through a problem space. This can be a simple algorithm, some type of heuristic model, or an advanced form of generative AI such as an LLM. 
Additionally, these agents generally have access to certain tools which serve as a medium for interaction with physical or computational environments, such as controlling a robotic arm or searching a database. Finally, these agents generally have a notion of past actions and observations, commonly referred to as memory, which allows agents to recall important information as they explore. Chapter 5 of this thesis details a custom agentic framework which is tailored towards complex scientific applications. This framework builds agents from source documentation around a specific user-defined scope, provides them with access to literature and documentation in the form of embeddings, has custom memory for highly targeted retention, and allows agents to communicate with one another to promote collaborative problem solving. Chapter 6 of this thesis showcases an application of a simpler agentic framework to an automated lipidomic workflow which performs comparative analysis on 5xFAD vs. WT mouse brain tissue. The group of AI agents involved in this system generates mass spectrometry worklists, filters data into categories for analysis, performs comparative analysis, and allows the user to dynamically create plots which can be used to answer specific statistical questions. In addition to performing all these operational and statistical analysis functions, the system includes an agent which uses document embeddings trained on curated technical manuals and protocols to answer user questions via a chatbot-style interface. Overall, the system showcases how AI can effectively be applied to relevant chemical problems to enhance speed, bolster accuracy, and improve usability.</p>
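The three ingredients of an agent named in the abstract above (a decision component, tools, and memory) can be illustrated with a minimal Python sketch. This is not the framework from the thesis: the `Agent` class, the `rule_based_policy` function, and both tool functions are hypothetical names introduced here purely for illustration, and the "AI" is reduced to a trivial rule so that the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# One memory entry: (tool that was used, observation it returned).
MemoryEntry = Tuple[str, str]


@dataclass
class Agent:
    """Hypothetical minimal agent: a policy, a set of tools, and a memory."""
    policy: Callable[[str, List[MemoryEntry]], str]   # chooses a tool name
    tools: Dict[str, Callable[[str], str]]            # name -> callable tool
    memory: List[MemoryEntry] = field(default_factory=list)

    def step(self, goal: str) -> str:
        # 1. The decision component picks a tool from the goal and past memory.
        tool_name = self.policy(goal, self.memory)
        # 2. The tool interacts with the (here simulated) environment.
        observation = self.tools[tool_name](goal)
        # 3. The action and observation are stored for later steps.
        self.memory.append((tool_name, observation))
        return observation


def rule_based_policy(goal: str, memory: List[MemoryEntry]) -> str:
    # Stand-in for an LLM or heuristic model: search first, then summarize.
    return "search" if not memory else "summarize"


def search_database(query: str) -> str:
    # Placeholder tool; a real tool might query a database or an instrument.
    return f"records matching '{query}'"


def summarize(query: str) -> str:
    # Placeholder tool standing in for an analysis step.
    return f"summary for '{query}'"


agent = Agent(
    policy=rule_based_policy,
    tools={"search": search_database, "summarize": summarize},
)
first = agent.step("lipid class differences")
second = agent.step("lipid class differences")
```

Because the policy reads the memory, the second call behaves differently from the first, which is the role memory plays in the description above. Replacing `rule_based_policy` with a call to a language model would leave the rest of the loop unchanged.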
|
48 |
Beyond the paywall / a multi-sited ethnographic examination of the information-related behaviors of six scientists
Krueger, Stephanie 02 September 2016
In dieser Dissertation untersuche ich die Forschungswege von sechs Wissenschaftlern, die in verschiedenen Disziplinen und Institutionen in den Vereinigten Staaten und in der Tschechischen Republik arbeiten. Um dies zu tun, verwende ich sogenannte „multi-sited“ ethnographisch-methodische Strategien (d.h. Strategien, die Anthropologen verwenden, um Kulturen an zwei oder mehr geografischen Standorten zu vergleichen), mit dem Ziel, informationsbezogene Verhaltensweisen dieser Wissenschaftler im global vernetzten akademischen Umfeld zu untersuchen, englisch abgekürzt „GNAE“, ein Begriff, der sich speziell auf die komplexe Bricolage von Netzwerkinfrastrukturen, Online-Informationsressourcen und Tools bezieht, die Wissenschaftler heutzutage nutzen, d.h. die weltweite akademische e-IS, oder akademische Infrastruktur (Edwards et al. 2013). Die zentrale Forschungsfrage (RQ1), die in dieser Dissertation beantwortet wird, ist: Gibt es, gemäß der multi-sited ethnographischen Analyse der beteiligten Wissenschaftler in dieser Studie—Personen, die Forschung in verschiedenen Disziplinen und Institutionen sowie an unterschiedlichen Standorten betreiben—Hinweise darauf, dass ein signifikanter Anteil der nicht-institutionellen/informellen informationsbezogenen Forschung über Mechanismen im GNAE, die nicht von Bibliotheken unterstützt werden, betrieben wird, sowie (RQ2): Was für Muster sind vorhanden und wie beziehen sie sich auf informationswissenschaftliche und andere sozialwissenschaftliche Theorien? Und drittens (RQ3): Haben die Resultate praxisnahe Bedeutungen für die Entwicklung von Dienstleistungen in wissenschaftlichen Bibliotheken? Ethnographische Strategien sind bisher noch nicht in der Informationswissenschaft (IS) eingesetzt worden, um Fragen dieser Art zu untersuchen. Die Ergebnisse zeigen, dass eine informelle Informationsexploration nur bei zwei Wissenschaftlern, die mit offenen Daten und Tools einer verteilten Computing-Infrastruktur arbeiten, zu finden ist. 
/ In this dissertation I examine the pathways of information exploration and discovery of six scientists working in different research disciplines and affiliated with several academic institutions in the United States and in the Czech Republic. To do so, I utilize multi-sited ethnographic methodological strategies (i.e., strategies developed by anthropologists to compare cultures across two or more geographic locations) to examine the information-related behaviors of these scholars within the global networked academic environment (GNAE), a term which specifically refers to the complex bricolage of network infrastructures, online information resources, and tools scholars use to perform their research today (i.e., the worldwide academic e-IS, or academic infrastructure [Edwards et al. 2013]). The central research question (RQ1) to be answered in this dissertation is: According to the multi-sited ethnographic analysis of scientists participating in this study (individuals conducting research in various disciplines at different institutions in several geographical locations), is there evidence indicating a significant allotment of non-institutional/informal information-related exploration and discovery occurring beyond official library-supported mechanisms in the GNAE? Part two (RQ2) of the central research question is: What (if any) patterns are exhibited, and how do these patterns relate to information science (IS) and other social science theories? Both RQ1 and RQ2 are exploratory. I additionally ask (RQ3): What might all this mean in the applied sense? I address this by showing examples of services piloted during the research process in response to my observations in the field. Multi-sited ethnographic strategies have not yet been employed in IS, as of the date of publication of this thesis, to examine such questions. Results indicate informal information exploration occurring only with two scientists who use open data and tools on a distributed computing infrastructure.
|
49 |
Development of cheminformatics-based methods for computational prediction of off-target activities
Banerjee, Priyanka 17 May 2017
Die Menschheit ist vielfältigen chemischen Wirkstoffen ausgesetzt – zum Beispiel durch Kosmetika und Pharmazeutika sowie durch viele andere chemische Quellen. Es wird angenommen, dass diese stetige Exposition mit Chemikalien gesundheitliche Beeinträchtigungen bei Menschen hervorruft. Zudem haben Regulierungsbehörden aus Europa und den USA festgestellt, dass es ein Risiko gibt, welches mit der kombinierten Exposition durch mehrere Chemikalien im Zusammenhang steht. Mögliche Kombinationen von Tausenden Wirkstoffen zu testen, ist sehr zeitaufwendig und nicht praktikabel. Das Hauptanliegen dieser Arbeit ist es, die Probleme von Off-target-Effekten chemischer Strukturen zu benennen – mit den Mitteln der Chemieinformatik, der strukturellen Bioinformatik sowie unter Berücksichtigung von computerbasierten, systembiologischen Ansätzen. Diese Dissertation ist in vier Hauptprojekte eingeteilt. Im Projekt I (Kapitel 3) wurde ein neuartiger Ensemble-Ansatz basierend auf der strukturellen Ähnlichkeit von chemischen Wirkstoffen und Bestimmungen von toxischen Fragmenten implementiert, um die orale Toxizität bei Nagetieren vorherzusagen. Im Projekt II (Kapitel 4) wurden – auf der Grundlage von Daten des Tox21-Wettbewerbs – unterschiedliche Machine-Learning-Modelle entwickelt und verglichen, um vorherzusagen, welche Wirkstoffe mit Zielmolekülen in den toxikologischen Stoffwechselwegen interagieren. In Projekt III (Kapitel 5) wird ein neuartiger Ansatz beschrieben, welcher das dreigliedrige Konzept aus computerbasierter Systembiologie, Chemieinformatik und der strukturellen Bioinformatik nutzt, um Medikamente zu bestimmen, welche das metabolische Syndrom hervorrufen. In Projekt IV (Kapitel 6) wurde in silico ein Screening-Protokoll entwickelt, welches die strukturelle Ähnlichkeit, die pharmakophorischen Eigenschaften und die Überprüfung von computerbasierten Docking-Studien berücksichtigt. 
/ Exposure to various chemical agents through cosmetics, medications, preserved food, the environment, and many other sources has resulted in serious health issues in humans. Additionally, regulatory authorities from Europe and the United States of America have recognized the risk associated with combined exposure to multiple chemicals. Testing all possible combinations of these thousands of compounds is impractical and time-consuming. The main aim of the thesis is to address the problem of off-target effects of chemical structures by applying and developing cheminformatics, structural bioinformatics, and computational systems biology approaches. This dissertation is divided into four main projects representing four different computational methods to aid different levels of toxicological investigation. In project I (chapter 3), a novel ensemble approach based on the structural similarity of chemical compounds and the identification of toxic fragments was implemented to predict rodent oral toxicity. In project II (chapter 4), different machine learning models were developed and compared using Tox21 Challenge 2014 data to predict the outcomes of compounds that have the potential to interact with targets active in toxicological pathways. In project III (chapter 5), a novel approach integrating the trio of computational systems biology, cheminformatics, and structural bioinformatics to predict drug-induced metabolic syndrome is described. In project IV (chapter 6), an in silico screening protocol was established, taking into account structural similarity, pharmacophoric features, and validation using computational docking studies. This approach led to the identification of a novel binding site for acyclovir in the peptide binding groove of a specific human leukocyte antigen (HLA) allele.
|
50 |
Hydrate crystal structures, radial distribution functions, and computing solubility
Skyner, Rachael Elaine January 2017
Solubility prediction usually refers to prediction of the intrinsic aqueous solubility, which is the concentration of an unionised molecule in a saturated aqueous solution at thermodynamic equilibrium at a given temperature. Solubility is determined by structural and energetic components emanating from solid-phase structure and packing interactions, solute–solvent interactions, and structural reorganisation in solution. An overview of the most commonly used methods for solubility prediction is given in Chapter 1. In this thesis, we investigate various approaches to solubility prediction and solvation model development, based on informatics and incorporation of empirical and experimental data. These are of a knowledge-based nature, and specifically incorporate information from the Cambridge Structural Database (CSD). A common problem for solubility prediction is the computational cost associated with accurate models. This issue is usually addressed by use of machine learning and regression models, such as the General Solubility Equation (GSE). These types of models are investigated and discussed in Chapter 3, where we evaluate the reliability of the GSE for a set of structures covering a large area of chemical space. We find that molecular descriptors relating to specific atom or functional group counts in the solute molecule almost always appear in improved regression models. In accordance with the findings of Chapter 3, in Chapter 4 we investigate whether radial distribution functions (RDFs) calculated for atoms (defined according to their immediate chemical environment) with water from organic hydrate crystal structures may give a good indication of interactions applicable to the solution phase, and justify this by comparison of our own RDFs to neutron diffraction data for water and ice. We then apply our RDFs to the theory of the Reference Interaction Site Model (RISM) in Chapter 5, and produce novel models for the calculation of Hydration Free Energies (HFEs).
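For orientation, the General Solubility Equation mentioned above is commonly written, following Yalkowsky and co-workers, as log S = 0.5 - 0.01 (MP - 25) - log P, where S is the molar aqueous solubility, MP is the melting point in °C, and log P is the octanol-water partition coefficient. The Python sketch below assumes that commonly cited form; the function name and the input values are illustrative only and are not taken from the thesis, which may evaluate a different variant.

```python
def gse_log_solubility(melting_point_c: float, log_p: float) -> float:
    """Estimate log10 of molar aqueous solubility with the commonly cited GSE.

    Assumed form: log S = 0.5 - 0.01 * (MP - 25) - log P.
    For compounds that are liquid at 25 degrees C the melting-point term
    is taken as zero, which is the usual convention for this equation.
    """
    melting_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - melting_term - log_p


# Illustrative inputs only: a solid melting at 125 degrees C with log P = 2.0.
example_solid = gse_log_solubility(melting_point_c=125.0, log_p=2.0)

# A liquid (melting point below 25 degrees C) contributes no melting-point term.
example_liquid = gse_log_solubility(melting_point_c=-10.0, log_p=1.0)
```

The equation needs only two inputs per compound, which is why such regression-style models are attractive when more accurate calculations are too costly.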
|