11 |
Using regression analyses for the determination of protein structure from FTIR spectra
Wilcox, Kieaibi, January 2014
One of the challenges in the structural biology community is processing the wealth of protein data being produced today; computational tools have therefore been adopted to speed up the determination of protein structures and, through them, the understanding of protein function. In this thesis, protein structure was investigated using Multivariate Analysis (MVA) and Fourier Transform Infrared (FTIR) spectroscopy, a form of vibrational spectroscopy. FTIR has been shown to identify the chemical bonds in a protein in solution, and it is rapid and easy to use; the spectra it produces are then analysed qualitatively and quantitatively using MVA methods, which extract non-redundant but important information from them. High-resolution techniques such as X-ray crystallography and NMR are not always applicable, and FTIR spectroscopy, a widely applicable analytical technique, has great potential to assist structure analysis for a wide range of proteins. FTIR spectral shape and band positions in the Amide I (which contains the most intense absorption region), Amide II and Amide III regions can be analysed computationally, using multivariate regression, to extract structural information. In this thesis, Partial Least Squares (PLS), a form of MVA, was used to correlate a matrix of FTIR spectra with their known secondary structure motifs in order to determine their structures (in terms of "helix", "sheet", "3₁₀-helix", "turns" and "other" contents) for a selection of 84 non-redundant proteins. Analysis of the spectral wavenumber range between 1480 and 1900 cm⁻¹ (Amide I and Amide II regions) gives high prediction accuracies, as high as R² = 0.96 for α-helix, 0.95 for β-sheet, 0.92 for 3₁₀-helix, 0.94 for turns and 0.90 for other; the Root Mean Square Error of Calibration (RMSEC) values are between 0.01 and 0.05, and the Root Mean Square Error of Prediction (RMSEP) values are between 0.02 and 0.12.
The Amide II region also gave results comparable to those of Amide I, especially for predictions of helix content. We also used Principal Component Analysis (PCA) to classify FTIR protein spectra into their natural groupings: proteins of mainly α-helical structure, proteins of mainly β-sheet structure, and proteins with mixed α-helix and β-sheet content. We were also able to differentiate between parallel and anti-parallel β-sheet. The developed methods were applied to characterize the secondary structure conformational changes of an unfolding protein as a function of pH, and also to determine the Limit of Quantitation (LoQ). Our structural analyses compare highly favourably with those in the literature that use machine learning techniques. Our work shows that FTIR spectra, in combination with multivariate analysis methods such as PCA and PLS, can accurately identify and quantify protein secondary structure. The models developed in this research are especially important in the pharmaceutical industry, where the therapeutic effect of drugs strongly depends on the stability of the physical or chemical structure of their protein targets; understanding the structure of proteins is therefore very important in the biopharmaceutical world for drug production and formulation. There is also a new class of drugs that are themselves proteins, used to treat infectious and autoimmune diseases. The use of spectroscopy and multivariate regression analysis in the medical industry to identify disease biomarkers has also brought new challenges to the bioinformatics field. These methods may be applicable in food science, and in academia in general, for the investigation and elucidation of protein structure.
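The figures of merit quoted in this abstract (R², RMSEC, RMSEP) can be illustrated with a minimal Python sketch. It assumes the plain definitions: root mean square error over the calibration set (RMSEC) or over a held-out set (RMSEP), and R² as 1 − SS_res/SS_tot. The thesis may use a degrees-of-freedom correction or a squared correlation coefficient instead, and the fractions below are hypothetical, not data from the thesis.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted fractions."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(reference, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical helix fractions (0..1); illustrative values only.
calibration_ref = [0.10, 0.35, 0.60, 0.80]
calibration_pred = [0.12, 0.33, 0.61, 0.78]
validation_ref = [0.20, 0.50, 0.70]
validation_pred = [0.25, 0.46, 0.73]

rmsec = rmse(calibration_ref, calibration_pred)  # error on samples used to fit
rmsep = rmse(validation_ref, validation_pred)    # error on held-out samples
r2_cal = r_squared(calibration_ref, calibration_pred)
```

An RMSEP noticeably larger than the RMSEC, as in these made-up numbers, is the usual sign that a model fits its calibration samples better than unseen ones.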
|
12 |
Protein secondary structure prediction using amino acid regularities
Senekal, Frederick Petrus, 23 January 2009
The protein folding problem is examined. Specifically, the problem of predicting protein secondary structure from the amino acid sequence is investigated. A literature study is presented of the protein folding process and of the techniques that currently exist to predict protein secondary structure. These techniques include expert rules, statistics, information theory and various computational intelligence techniques, such as neural networks, nearest neighbour methods, Hidden Markov Models and Support Vector Machines. A pattern recognition technique based on statistical analysis is developed to predict protein secondary structure from the amino acid sequence. The technique can be applied to any problem where an input pattern is associated with an output pattern and each element in both patterns takes its value from a set of finite cardinality. The technique is applied to discover the role that short sequences of amino acids play in the formation of protein secondary structures. With this technique, a performance score of Q8 = 59.2% is achieved, with a corresponding Q3 score of 69.7%. This compares well with state-of-the-art techniques such as OSS-HMM and PSIPRED, which achieve Q3 scores of 67.9% and 66.8% respectively when predictions are made on single sequences. / Dissertation (MEng)--University of Pretoria, 2009. / Electrical, Electronic and Computer Engineering / unrestricted
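The Q8 and Q3 scores mentioned above are per-residue accuracies over eight and three secondary structure states. The sketch below shows how the two relate, assuming one common reduction of the eight DSSP states to three (H, G, I to helix; E, B to strand; everything else to coil). The dissertation may use a different mapping, and the strings are hypothetical.

```python
# Assumed 8-to-3 state reduction; other conventions exist.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "-": "C",   # turns, bends, unassigned
}


def q_score(observed, predicted):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def reduce_to_3(states):
    """Map an 8-state string onto the 3-state alphabet H/E/C."""
    return "".join(DSSP_TO_3[s] for s in states)


# Hypothetical 10-residue example, not data from the dissertation.
observed_8 = "HHHHGGTE-S"
predicted_8 = "HHHGGGSEEB"

q8 = q_score(observed_8, predicted_8)
q3 = q_score(reduce_to_3(observed_8), reduce_to_3(predicted_8))
```

Because several eight-state labels collapse into one three-state label, a prediction that confuses, say, H with G counts as wrong for Q8 but right for Q3, which is why Q3 is never lower than Q8 for the same prediction.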
|
13 |
Applications of Deep Neural Networks in Computer-Aided Drug Design
Ahmadreza Ghanbarpour Ghouchani (10137641), 01 March 2021
Deep neural networks (DNNs) have gained tremendous attention in recent years owing to their outstanding performance on many problems across science and technology. The field is currently of interest to many researchers and is growing rapidly. The ability of DNNs to learn new concepts with minimal instruction makes it easier to apply current DNN-based methods to new problems. In this dissertation, three DNN-based methods are discussed, each tackling a different problem in computer-aided drug design.

The first method addresses the prediction of hydration properties from 3D protein structures without requiring molecular dynamics simulations. Water plays a major role in protein-ligand interactions, and identifying the (de)solvation contributions of water molecules can assist drug design. Two model architectures are presented for predicting the hydration information of proteins. Their performance is compared with that of conventional methods and with experimental data. In addition, their applications in ligand optimization and pose prediction are shown.

The design of de novo molecules has always been of interest in drug discovery. The second method is a generative model that learns to derive features from protein sequences in order to design de novo compounds. We show how the model can be used to generate molecules similar to known ones for targets the model has not seen before, and compare it with benchmark generative models.

Finally, it is demonstrated how DNNs can learn to predict secondary structure propensity values derived from NMR ensembles. Secondary structure propensities are important in identifying flexible regions in proteins. Protein flexibility plays a major role in drug-protein binding, and identifying such regions can assist in the development of methods for ligand binding prediction. The prediction performance of the method is shown for several proteins with two or more known secondary structure conformations.
|
14 |
Positive correlation between A3 subunit of glycinin and firmness of tofu made from soybeans grown in three locations over two years
Chen, Ruiqi, 10 December 2021
Producing the desired firmness is important in manufacturing tofu from soybeans. This study’s objective was to explore the environmental impact (location and year) on soybean chemical components and to identify the correlations between chemical composition and the firmness of tofu made from soybeans planted in three locations over two years. Seventeen soybean Plant Introductions (PI) from the USDA Soybean Germplasm Collection and eight check varieties were planted in Mississippi, Virginia and Missouri in 2017 and 2018. Protein subunit composition, protein secondary structure, phytic acid content, and Ca²⁺ and Mg²⁺ contents were determined. The results showed that A3 subunit content was strongly correlated with tofu firmness. Environmental factors had a significant influence on some chemical components in soybean seeds as well as on tofu texture. The study confirmed the validity of using the A3 peptide as a criterion for estimating tofu firmness in both tofu manufacturing and the food-grade soybean trade.
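A correlation of the kind reported here is typically expressed as a Pearson coefficient. The sketch below computes it from scratch, assuming a simple linear relationship is what is being measured; the abstract does not state which statistic the study used, and the values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical A3 subunit contents and tofu firmness readings (arbitrary units).
a3_content = [2.0, 3.5, 4.0, 5.5, 6.0]
firmness = [10.0, 13.0, 15.0, 17.0, 20.0]

r = pearson_r(a3_content, firmness)
```

A value of `r` close to +1 indicates that firmness rises almost linearly with A3 content in the sample; it says nothing by itself about causation or about how the relationship holds across locations and years.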
|
15 |
Bio-Nano Interactions : Synthesis, Functionalization and Characterization of Biomaterial Interfaces
Cai, Yixiao, January 2016
Current strategies for designing biomaterials involve creating materials and interfaces that interact with biomolecules, cells and tissues. This thesis aims to investigate several bioactive surfaces, such as nanocrystalline diamond (NCD), hydroxyapatite (HA) and single crystalline titanium dioxide, in terms of material synthesis, surface functionalization and characterization. Although cochlear implants (CIs) have been proven to be clinically successful, the efficiency of these implants still needs to be improved. A CI typically only has 12-20 electrodes while the ear has approximately 3400 inner hair cells. A type of micro-textured NCD surface that consists of micrometre-sized nail-head-shaped pillars was fabricated. Auditory neurons showed a strong affinity for the surface of the NCD pillars, and the technique could be used for neural guidance and to increase the number of stimulation points, leading to CIs with improved performance. Typical transparent ceramics are fabricated using pressure-assisted sintering techniques. However, the development of a simple energy-efficient production method remains a challenge. A simple approach to fabricating translucent nano-ceramics was developed by controlling the morphology of the starting ceramic particles. Translucent nano-ceramics, including HA and strontium substituted HA, could be produced via a simple filtration process followed by pressure-less sintering. Furthermore, the application of such materials as a window material was investigated. The results show that MC3T3 cells could be observed through the translucent HA ceramic for up to 7 days. The living fluorescent staining confirmed that the MC3T3 cells were visible throughout the culture period. Single crystalline rutile possesses in vitro bioactivity, and the crystalline direction affects HA formation. The HA growth on (001), (100) and (110) faces was investigated in a simulated body fluid in the presence of fibronectin (FN) via two different processes. 
The HA layers on each face were analysed using different characterization techniques, revealing that the interfacial energies could be altered by the pre-adsorbed FN, which influenced HA formation. In summary, micro-textured NCD, translucent HA, and FN-functionalized single crystalline rutile were studied, together with their interactions with cells and biomimetic HA. The results showed that controlled surface properties are important for enhancing a material’s biological performance.
|
16 |
Redes neurais residuais profundas e autômatos celulares como modelos para predição que fornecem informação sobre a formação de estruturas secundárias proteicas / Residual neural networks and cellular automata as protein secondary structure prediction models with information about folding
Pereira, José Geraldo de Carvalho, 15 March 2018
The process by which a protein structure self-organizes from its amino acid chain is known as folding. Although the three-dimensional structure of many proteins is known, for most of them we do not understand enough to describe in detail how the structure is organized from the amino acid sequence. It is well known that the formation of nuclei of local structure, known as secondary structure, plays a fundamental role in the final folding of the protein. The development of methods that not only predict the secondary structure adopted by a given residue, but also indicate how this process should unfold over time, is therefore highly relevant to several areas of structural biology. In this work, we developed two secondary structure prediction methods using models that have the potential to provide more detailed information about the prediction process. One of these models was built using cellular automata, a type of dynamic model from which spatial and temporal information can be obtained. The other model was developed using deep residual neural networks. With this model it is possible to extract spatial and probabilistic information from its multiple internal convolution layers, which seems to reflect, in some sense, the states of secondary structure formation during folding. The prediction accuracy obtained by this model was ~78% for residues that showed consensus in the structure assigned by the DSSP, STRIDE, KAKSI and PROSS methods. This accuracy, although lower than that obtained by PSIPRED, which uses PSSM matrices as input, is higher than that obtained by other methods that predict secondary structure directly from the amino acid sequence.
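The accuracy quoted above is computed only over residues on which several assignment methods agree. A minimal sketch of that kind of consensus filtering follows, assuming each method's output has already been mapped to a shared three-state alphabet (H, E, C). The strings are hypothetical stand-ins, not real DSSP, STRIDE, KAKSI or PROSS output, and the function names are illustrative.

```python
def consensus_mask(assignments):
    """Return True at each position where all assignment strings agree."""
    if len({len(a) for a in assignments}) != 1:
        raise ValueError("all assignments must have the same length")
    return [len(set(column)) == 1 for column in zip(*assignments)]


def consensus_accuracy(prediction, assignments):
    """Percentage accuracy over consensus positions only."""
    mask = consensus_mask(assignments)
    reference = assignments[0]  # any method works: they agree where mask is True
    if len(prediction) != len(reference):
        raise ValueError("prediction length must match the assignments")
    kept = [(p, r) for p, r, keep in zip(prediction, reference, mask) if keep]
    if not kept:
        raise ValueError("no consensus residues to score")
    return 100.0 * sum(p == r for p, r in kept) / len(kept)


# Hypothetical 3-state assignments from four methods for one 10-residue chain.
method_a = "HHHHCCEEEC"
method_b = "HHHHCCEEEC"
method_c = "HHHCCCEEEC"  # disagrees at position 3
method_d = "HHHHCCEECC"  # disagrees at position 8
predicted = "HHCHCCEEEC"

mask = consensus_mask([method_a, method_b, method_c, method_d])
accuracy = consensus_accuracy(predicted, [method_a, method_b, method_c, method_d])
```

Scoring only consensus residues removes positions where the reference label itself is ambiguous, so the resulting figure is not directly comparable with accuracies computed against a single assignment method over all residues.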
|
18 |
A multivariate approach to characterization of drug-like molecules, proteins and the interactions between them
Lindström, Anton, January 2008
A disease is often associated with a cascade reaction pathway involving proteins, co-factors and substrates. To treat the disease, elements of this pathway are therefore often targeted using a therapeutic agent, a drug. Designing new drug molecules for use as therapeutic agents involves methods collectively known as computer-aided molecular design (CAMD). When the three-dimensional (3D) geometry of a macromolecular target (usually a protein) is known, structure-based CAMD is undertaken, and structural information about the target guides the design of new molecules and their interactions with the binding sites of the targeted proteins. Many factors influence the interactions between the designed molecules and the binding sites of the target proteins, such as the physico-chemical properties of the molecule and the binding site, the flexibility of the protein and the ligand, and the surrounding solvent. For structure-based CAMD to be successful, two important steps that take these factors into account must be carried out: (i) 3D fitting of molecules to the binding site of the target protein (like fitting pieces of a jigsaw puzzle), and (ii) predicting the affinity of molecules for the protein binding site. The main objectives of the work underlying this thesis were: to create models for predicting the affinity between a molecule and a protein binding site; to refine the geometry of the molecule-protein complex produced by 3D fitting (also known as docking); to characterize proteins and their secondary structure; and to evaluate the effects of different generalized-Born (GB) and Poisson-Boltzmann (PB) implicit solvent models on the refinement of the molecule-protein complex geometry created in docking and on the prediction of molecule-to-protein binding site affinity. A further objective was to apply chemometric methodologies for modeling and data analysis to all of the above. To summarize, this thesis presents methodologies and results applicable to structure-based CAMD. The results show that predictive chemometric models for molecule-to-protein binding site affinity could be created, and that they perform comparably to similar, commonly used methods. In addition, chemometric models could be created to describe how the settings of docking software affect the molecule-protein complex geometry it produces. Furthermore, the use of chemometric models provided a deeper understanding of protein secondary structure descriptors. Refining the geometry of molecule-protein complexes created through docking gave similar results for all investigated implicit solvent models, and the geometry was significantly improved in only a few of the examined cases (six of 30). However, using the geometry-refined molecule-protein complexes was highly valuable for predicting molecule-to-binding site affinity. Indeed, with the PB solvent model it yielded improvements of 0.7 in correlation coefficients (R²) for binding affinity parameters of a set of Factor Xa protein drug molecules, relative to those obtained using the fitting software.
|