1 |
Prediction of multiple conformational states of membrane proteins
Thorén, Tobias, January 2024
Predicting protein structures has long been an area of active research in bioinformatics. Great strides have recently been made in this area by Google's DeepMind team, which developed an AI system called AlphaFold that makes the most accurate protein structure predictions to date. With the advent of AlphaFold, some consider the problem solved. One area of protein structure prediction has lagged behind, however: multi-conformational prediction. Some proteins can take on one of several active forms in the body. Predicting these is harder than predicting single-conformation proteins because of the added complexity and a lack of data. A promising solution is to introduce noise into the input data AlphaFold uses, so that it produces a wider range of predictions. This thesis evaluates multi-conformational prediction with different methods of introducing noise: dropout, excluding templates, untargeted multiple sequence alignment (MSA) subsampling, and targeted MSA subsampling. It was concluded that introducing noise did improve the prediction of multiple conformations. Among the methods, MSA subsampling seemed to be the most effective, especially untargeted MSA subsampling. Dropout also seemed to improve the results slightly, while excluding template information had little to no effect. AlphaFold was unable to predict both structures for 6 of the 16 proteins, even with introduced noise. No clear reason for this could be determined, but the leading hypothesis is that AlphaFold was unable to extract sufficient information about both conformations from the MSA data for these proteins.
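As an illustration of the untargeted MSA subsampling idea mentioned in the abstract, the following is a minimal sketch, not the thesis's implementation and not part of AlphaFold. It assumes the MSA is a list of aligned sequence strings with the query first; the function name `subsample_msa`, the `max_seqs` parameter and the toy sequences are hypothetical.

```python
import random


def subsample_msa(msa, max_seqs, seed=None):
    """Return the query (first row) plus a random subset of the other rows.

    Hypothetical helper: `msa` is a list of aligned sequences, query first.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    rng = random.Random(seed)
    query, others = msa[0], msa[1:]
    # Number of non-query rows to keep, capped by what is available.
    k = min(max(max_seqs - 1, 0), len(others))
    # Sample indices (not rows) so the original row order is preserved.
    keep = sorted(rng.sample(range(len(others)), k))
    return [query] + [others[i] for i in keep]


# Toy alignment (made-up sequences, for illustration only).
msa = ["MKTAYIAK", "MKTAYLAK", "MRTAYIAK", "MKSAYIAK", "MKTAFIAK"]

# Each seed gives a different shallow MSA; each one would be passed to the
# structure predictor separately to obtain a spread of predictions.
subsets = [subsample_msa(msa, max_seqs=3, seed=s) for s in range(4)]
```

Keeping the query row fixed and varying only the seed means every run predicts the same target sequence while seeing a different slice of the evolutionary information.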
|
2 |
Determining Protein Conformational Ensembles by Combining Machine Learning and SAXS / Bestämning av konformationsensembler hos protein genom att kombinera maskininlärning med SAXS
Eriksson Lidbrink, Samuel, January 2023
In structural biology, immense effort has been put into discovering functionally relevant atomic-resolution protein structures. Still, most experimental, computational and machine learning-based methods struggle on their own to capture all the functionally relevant states of many proteins without very involved and system-specific techniques. In this thesis, I propose a new, broadly applicable method for determining an ensemble of functionally relevant protein structures. The method consists of (1) generating multiple protein structures from AlphaFold2 (AF2) by stochastic subsampling of the multiple sequence alignment (MSA) depth, (2) screening these structures using small-angle X-ray scattering (SAXS) data and a structure validation scoring tool, (3) simulating the screened conformers using short molecular dynamics (MD) simulations and (4) refining the ensemble of simulated structures by reweighting it against SAXS data using a Bayesian maximum entropy (BME) approach. I apply the method to the T-cell intracellular antigen-1 (TIA-1) protein and find that the generated ensemble is in good agreement with the SAXS data it is fitted to, in contrast to the original set of conformations from AF2. Additionally, the predicted radius of gyration is much more consistent with the experimental value than the one predicted from a 450 ns long MD simulation starting from a single structure. Finally, I cross-validate my findings against small-angle neutron scattering (SANS) data and find that the method-generated ensemble, although not perfectly, fits some of the SANS data much better than the ensemble from the long MD simulation. Since the method is fairly automatic, I argue that it could be used by non-experts in MD simulations and also in combination with more advanced methods for more accurate results. I also propose generalisations of the method by tuning it to different biological systems, by using other AI-based methods or by using different types of experimental data.
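To make step (4) of the method more concrete, here is a small sketch of what reweighting an ensemble against SAXS data with a maximum-entropy penalty can look like. It is not the thesis's code. It assumes one common way of writing the BME objective, L(w) = (m/2) * chi2_red(w) - theta * S_rel(w), with S_rel(w) = -sum_i w_i ln(w_i / w0_i); all function names, the value of `theta` and the toy numbers are hypothetical. The sketch only evaluates the objective for given weights; a real workflow would minimise it numerically.

```python
import math


def ensemble_average(profiles, weights):
    """Weighted average of per-conformer scattering profiles."""
    n_points = len(profiles[0])
    return [sum(w * p[k] for w, p in zip(weights, profiles))
            for k in range(n_points)]


def reduced_chi2(calc, exp, sigma):
    """Mean squared deviation from experiment in units of the error."""
    return sum(((c - e) / s) ** 2 for c, e, s in zip(calc, exp, sigma)) / len(exp)


def relative_entropy(weights, prior):
    """S_rel = -sum w ln(w / w0): zero at the prior, negative elsewhere."""
    return -sum(w * math.log(w / w0) for w, w0 in zip(weights, prior) if w > 0)


def bme_objective(weights, prior, profiles, exp, sigma, theta):
    """Assumed BME objective: fit term minus theta times relative entropy."""
    m = len(exp)
    chi2 = reduced_chi2(ensemble_average(profiles, weights), exp, sigma)
    return 0.5 * m * chi2 - theta * relative_entropy(weights, prior)


# Toy data: two conformers, two scattering points (made-up numbers).
profiles = [[1.0, 0.5], [0.5, 1.0]]
exp, sigma = [0.8, 0.7], [0.1, 0.1]
prior = [0.5, 0.5]        # uniform weights, e.g. straight from MD frames
reweighted = [0.6, 0.4]   # candidate weights that match the data better

score_prior = bme_objective(prior, prior, profiles, exp, sigma, theta=1.0)
score_new = bme_objective(reweighted, prior, profiles, exp, sigma, theta=1.0)
```

The parameter `theta` sets the trade-off: a large value keeps the weights close to the prior ensemble, while a small value lets the experimental data dominate.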
|