DTC/Wellcome Trust Postgraduate Course 2007, Dr Phillip Stansfeld, SBCB, Biochemistry


Dr Phillip Stansfeld, SBCB, Biochemistry
phillip.stansfeld@bioch.ox.ac.uk
http://sbcb.bioch.ox.ac.uk/stansfeld.php

Homology Modelling

Contents
Introduce the process of homology modelling.
Summarise the methods for predicting structure from sequence.
Describe the individual steps involved in creating and optimising a protein homology model.
Outline the methods available to evaluate the quality of homology models.
Case Study: Modelling the drug binding site of hERG.

Why Homology Model?
Solving protein structures is not trivial.
There are currently ~1.8 million known protein coding sequences, but only ~44,000 protein structures in the PDB.
Even so, many of these structures are duplicates.
For membrane proteins structural data is even more sparse: there are currently 304 membrane protein structures, of which only 142 are unique.
www.rcsb.org


Amino Acid Residues
Proteins are made up of amino acids, which are interconnected by peptide bonds.
There are 20 naturally occurring amino acids.
Amino acids may be subdivided by their individual properties.

From Sequence to Structure
Primary Structure (Amino Acid Sequence):
    DSSRRQYQEKYKQVEQYMSFHKLPADFRQKIHDYYEHRYQGKMFDEDSILGELNGPLREEIVNFNCR
    KLVASMPLFANADPNFVTAMLTKLKFEVFQPGDYIIREGTIGKKMYFIQHGVVSVLTGNKEMKLSDG
    SYFGEICLLTRGRRTASVRADTYCRLYSLSVDNFNEVLEEYPMMRRAFETYVAIDRLDRIGKKNSIL
Secondary Structure
Tertiary Structure
Quaternary Structure
What information can we get from a sequence of amino acids?

Secondary Structure Prediction
The secondary structure of proteins is defined by the DSSP algorithm.
Amino acids are classified as either α-helix (H), β-strand (S) or loop (C).
It is possible to extract structural information from an amino acid sequence.
These prediction methods were initially proposed by Chou & Fasman in 1978.
They used a statistical method based on 15 known crystal structures.
Recent developments and an increase in structural information have improved these methods, and they are currently ~80% accurate.
PSI-Pred: http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
JPred: http://www.compbio.dundee.ac.uk/~www-jpred/

Transmembrane Helix Prediction
The amino acids at the centre of transmembrane helices are generally hydrophobic in nature.
Analysis of hydropathicity can be used to predict the number of membrane-spanning helices.
The analysis for the G-protein coupled receptor shown in the hydropathy plot suggests it has 7 TM helices.
The example used the Kyte & Doolittle scale.
[Figure: Hydropathy Plot]
http://expasy.org/tools/protscale.html

BLAST
How to find an appropriate template structure for homology modelling.
Basic Local Alignment Search Tool.
Used to search protein databases, e.g.:
Non-redundant (nr) & SwissProt to find similar sequences.
Protein Data Bank (PDB) to find structures with similar sequences.
PSI- & PHI-BLAST are more advanced BLAST methods.
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

The Importance of Resolution
In X-ray crystallography it is not always possible to perfectly resolve the electron density of the protein of interest.
This results in a lower resolution structure.
The lower the resolution, the more likely the structure is to contain errors.
The resolution of the template structure is also reflected in the quality of the homology model.
[Figure: panels at 4 Å, 3 Å, 2 Å and 1 Å resolution]
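The hydropathy analysis described under Transmembrane Helix Prediction can be sketched in a few lines of Python. This is only an illustration, not the ProtScale implementation: the function names are hypothetical, the 19-residue window and the 1.6 cut-off are assumed defaults that should be tuned for real data, and the sequence is assumed to contain only the 20 standard one-letter codes.

```python
# Kyte & Doolittle hydropathy value for each of the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return (centre_residue, mean_hydropathy) for every full window.

    centre_residue is 1-based. The window size is an assumed default.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


def count_hydrophobic_segments(profile, threshold=1.6):
    """Count contiguous runs of windows whose mean exceeds the threshold.

    Each run is a candidate membrane-spanning helix; the threshold is an
    assumed cut-off, not a value taken from the slides.
    """
    segments = 0
    inside = False
    for _, mean in profile:
        if mean > threshold and not inside:
            segments += 1
        inside = mean > threshold
    return segments
```

Calling count_hydrophobic_segments(hydropathy_profile(seq)) on a receptor sequence gives a rough helix count that can be compared with the plot; plotting the profile itself remains the more informative check.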

Sequence Alignment
Aligns the sequence(s) of interest to that of the template structure(s).
EMBOSS may be used for two sequences, to generate a pairwise alignment & a percentage identity (ideally an identity of >50%): http://www.ebi.ac.uk/emboss/align/
T-Coffee, Clustal & MUSCLE are popular methods for multiple sequence alignment. All may be found at: http://www.ebi.ac.uk/
ESPript is useful for formatting and creating black & white figures: http://espript.ibcp.fr/

Automated Homology Modelling
If you are lazy there are servers that do the modelling for you!
Swiss Model: http://swissmodel.expasy.org//SWISS-MODEL.html
Robetta: http://robetta.bakerlab.org/
3D Jigsaw: http://www.bmm.icnet.uk/servers/3djigsaw/
Phyre: http://www.sbg.bio.ic.ac.uk/phyre/
EsyPred3D: http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/
CPHmodels: http://www.cbs.dtu.dk/services/CPHmodels/

Modeller
Well regarded program for Homology/Comparative Modelling.
Current Version 9v2.
http://www.salilab.org/modeller/
Requires an Input file, Sequence alignment & Template structure.

Input File (.py)
    from modeller import *
    from modeller.automodel import *

    log.verbose()                          # write detailed output to the log
    env = environ()
    env.io.atom_files_directory = './'     # where to look for the template PDB file
    a = automodel(env,
                  alnfile='herg.ali',      # sequence alignment file
                  knowns='1q5o',           # template code, as named in the alignment
                  sequence='herg')         # target code, as named in the alignment
    a.starting_model = 1
    a.ending_model = 1                     # build a single model
    a.make()

Sequence Alignment (.ali)
    >P1;1q5o
    structureX: 1q5o : 443 : A : 644 : A ::::
    DSSRRQYQEKYKQVEQYMSFHKLPADFRQKIHDYYEHRYQ-GKMFDEDSILGELNGPLRE
    EIVNFNCRKLVASMPLFANADPNFVTAMLTKLKFEVFQPGDYIIREGTIGKKMYFIQHGV
    VSVLTKGNKEMKLSDGSYFGEICLL-TRGRRTASVRADTYCRLYSLSVDNFNEVLEEYP
    MMRRAFETVAIDRLDRIGKKNSIL.*
    >P1;herg
    sequence: herg : 1 :::::::
    YSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQA
    DICLHLNRSLLQHCKPFRGATKGCLRALAMKFKTTHAPPGDTLVHAGDLLTALYFISRGS
    IEILRGDVVVAILGKNDIFGEPLNLYARPGKSNGDVRALTYCDLHKIHRDDLLEVLDMYP
    EFSDHFWSSLEITFNLRDTN-MIP.*

Template Structure (.pdb)
    ATOM      1  N   ASP A 443     -15.943  41.425  44.702  1.00 44.68
    ATOM      2  CA  ASP A 443     -15.424  42.618  45.447  1.00 43.15
    ATOM      3  C   ASP A 443     -14.310  43.306  44.686  1.00 41.81
    ATOM      4  O   ASP A 443     -14.298  44.528  44.539  1.00 42.61
    etc
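The percentage identity mentioned under Sequence Alignment can be estimated directly from two aligned sequences, such as the template and target entries of an alignment file. The sketch below is a minimal illustration with a hypothetical function name. It assumes that identity is counted over columns where neither sequence has a gap; alignment programs may use a different denominator (for example the full alignment length), so the reported figures can differ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over gap-free alignment columns.

    Both inputs must already be aligned, i.e. have the same length with
    gap characters inserted. The choice of denominator is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Usage: the first alignment block of template (1q5o) and target (herg).
template = "DSSRRQYQEKYKQVEQYMSFHKLPADFRQKIHDYYEHRYQ-GKMFDEDSILGELNGPLRE"
target = "YSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQA"
print(round(percent_identity(template, target), 1))
```

A value well below the >50% guideline suggests the alignment, and any model built from it, should be checked with particular care.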

How Does it Work?
[Figure labels: Template Structure, Amino acid Substitution, Initial Model (.ini), Energy Minimisation, Output Model(s) (.B999), Valine, Glutamine, Change in Rotamer]

Modeller: Output
.log : log output from the run.
.B : model generated in the PDB format.
.D : progress of optimisation.
.V : violation profile.
.ini : initial model that is generated.
.rsr : restraints in user format.
.sch : schedule file for the optimisation process.

An Iterative Process

Modeller Features & Restraints
Secondary Structure. Regions of the protein may be forced to be α-helical or β-strand.
Distance restraints. The distance between atoms may be restrained.
Symmetry. Protein multimers can be restrained so that all monomers are identical.
Disulphide Bridges. Two cysteine residues in the model can be forced to make a cystine bond.
Ligands. Ions, waters and small molecules may be included from the template.
Loop Refinement. Regions without secondary structure often require further refinement.

Structural Convergence
The catalytic triad of Serine, Aspartate and Histidine is found in certain protease enzymes: (a) Subtilisin, (b) Chymotrypsin.
However, the overall structure of the enzyme is often different.
This is also important when considering ligand binding sites.

Modelling Ligand Interactions
Small molecules, waters and ions can be retained from the template structure.
It is possible to search for homologues based on the ligands they bind.
Experimental data, especially mutagenesis, is very useful when modelling ligand binding sites.
Although the key residues often remain, the overall structure of the protein may vary radically.
The presence of the ligand is also likely to alter the conformation of the protein.
[Figure: ATP Binding Site in 1ATN and 1E4G]

Conformational States
The backbone structure of the model will be almost identical to that of the template.
Therefore the conformational state of the template will be retained in the resultant homology model.
This is important when considering the open or closed conformation of a channel, or the apo versus bound state of a ligand binding site.
[Figure: Closed and Open conformations]

Loop Modelling
Issues with Loop Modelling
As loops are less restrained by hydrogen bonding networks, they often have increased flexibility and are therefore less well defined.
In addition, the increased mobility makes loop regions more difficult to resolve structurally.
Proteins are often poorly conserved in loop regions.
There are usually residue insertions or deletions within loops.
Proline and Glycine residues are often found in loops; we'll come back to this when discussing model evaluation protocols.

Loop Modelling
There are two main methods for modelling loops:
Knowledge based: a PDB search for fragments that match the sequence to be modelled.
Ab initio: a first principles approach to predict the fold of the loop, followed by minimisation steps.
Many of the newer loop prediction methods use a combination of the two methods.
These approaches are being developed into methods for computationally predicting the tertiary structure of proteins, e.g. Rosetta. But this is computationally expensive.
Modeller creates an energy function to evaluate the loop's quality.
The function is then minimised by Monte Carlo (sampling), Conjugate Gradients (CG) or molecular dynamics (MD) techniques.

Predicting Sidechain Conformations
Networks of side chain contacts are important for retaining protein structure.
Sidechains may adopt a variety of different conformations, but this is dependent on the residue type.
For example, a threonine generally adopts 3 conformations, whilst a lysine may adopt up to 81.
This also depends on the backbone conformation of the residue.
The different residue conformations are known as rotamers.
Where a residue is conserved it is better to keep the side chain rotamer from the template than to predict a new one.
Rotamer prediction accuracy is high for buried residues, but much lower for surface residues:
Side chains at the surface are more flexible.
Hydrophobic packing in the core is easier to handle than the electrostatic interactions with water molecules (cytoplasmic proteins).
Most successful method is SCWRL by Dunbrack et al.: http://dunbrack.fccc.edu/SCWRL3.php

Model Evaluation
Initial Options
For every model, Modeller creates an objective function energy term, which is reported in the second line of the model PDB file (.B).
This is not an absolute measure but can be used to rank models calculated from the same alignment.
The lower the value the better.
A Cα-RMSD (Root Mean Square Deviation) between the template structure and the models can also be used to compare the final model to its template.
A good Cα-RMSD will be less than 2 Å.

Model Evaluation
More Advanced Options
Procheck, PROVE, WhatIf: stereochemical checks on bond lengths, angles and atomic contacts.
The Ramachandran plot is a major component of the evaluation.
It ensures that the backbone conformation of the model is normal.
Modeller is good on the whole, but sometimes struggles with residues found in loops.
RAMPAGE: http://mordred.bioc.cam.ac.uk/~rapper/rampage.php
[Figure: Ramachandran plot, Phi Dihedral Angle against Psi Dihedral Angle, with regions labelled α-helix, β-strand and left-handed helix]
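The Cα-RMSD check described under Model Evaluation can be sketched as follows. This is a minimal illustration with hypothetical function names, and it makes two loud assumptions: the model and template are already superposed in the same coordinate frame, and their Cα atoms appear in the same order with no insertions or deletions between them. It performs no structural superposition of its own.

```python
import math


def ca_coordinates(pdb_lines):
    """Extract (x, y, z) of every CA atom from PDB-format ATOM records."""
    coords = []
    for line in pdb_lines:
        # Columns follow the fixed-width PDB ATOM record layout.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length coordinate lists.

    Assumes the two structures are already superposed and the atoms are
    paired by position in the lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

With both files read as lists of lines, rmsd(ca_coordinates(template_lines), ca_coordinates(model_lines)) gives a value in Å that can be compared against the 2 Å guideline, provided the residue pairing assumption holds.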


Ramachandran Plot
The results of the Ramachandran plot will be very similar to those of the template.
A good template is therefore key!
Most residues are mainly found on the left-hand side of the plot.
Glycine is found more randomly within the plot (orange), due to its small sidechain (H) preventing clashes with its backbone.
Proline can only adopt a Phi angle of ~-60° (green) due to its sidechain.
This also restricts the conformational space of the pre-proline residue.
[Figure: Peptide dihedral angles]

PROCHECK
    +----------<<<  P R O C H E C K   S U M M A R Y  >>>----------+
      mgirk.pdb   2.5                                104 residues
      Ramachandran plot:   91.7% core   7.6% allow   0.3% gener   0.4% disall
      All Ramachandrans:   15 labelled residues     (backbone)
      Chi1-chi2 plots:      6 labelled residues     (sidechain)
      Main-chain params:    6 better   0 inside   0 worse
      Side-chain params:    5 better   0 inside   0 worse
      Residue properties:  Max.deviation: 16.1   Bad contacts: 10
                           Bond len/angle: 8.0   Morris et al class: 1 1 3
      G-factors            Dihedrals: 0.10   Covalent: 0.29   Overall: 0.16
      M/c bond lengths:    99.1% within limits   0.9% highlighted
      M/c bond angles:     98.1% within limits   1.9% highlighted
      Planar groups:      100.0% within limits   0.0% highlighted
    +--------------------------------------------------------------+
      + May be worth investigating further.   * Worth investigating further.
Biotech Validation Suite: http://biotech.embl-ebi.ac.uk:8400/
Procheck: www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

CASP
Critical Assessment of Structure Prediction.
A biennial competition that has run since 1994.
The next competition will be in 2008 (CASP8): http://predictioncenter.org/
Its goal is to advance the methods for predicting protein structure from sequence.
Protein structures yet to be published are used as blind targets for the prediction methods, with only sequence information released.
Competitors may use Homology Modelling, Fold Recognition or Ab Initio structural prediction methods to propose the structure of the protein.
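The Phi and Psi angles plotted in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms. The sketch below shows one common way to compute a dihedral from Cartesian coordinates; it is an illustration with hypothetical helper names, not the method used by Procheck or RAMPAGE. Phi is taken over C(i-1), N(i), CA(i), C(i) and Psi over N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Phi of residue i from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Psi of residue i from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

Computing phi and psi for every residue of a model and its template gives the paired angles needed to compare the two plots residue by residue.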

Pymol
A powerful visualisation and picture generation tool for protein and DNA.
Two windows: Graphical User Interface (GUI) and Pymol Viewer.
Both text and mouse driven.
Website: http://pymol.sourceforge.net/
More Info & Tutorials: http://www.pymolwiki.org/
Object menu buttons: A-Action, S-Show, H-Hide, L-Label, C-Colour.
Sequence Viewer.

Pymol Primary Uses
Visualisation of macromolecular structures.
High quality image generation capabilities (~1/4 of published images).
Structural alignment of two structures in three dimensional space.
Single amino acid mutagenesis.
Investigating protein-ligand interactions.
Assessing multiple-frame simulation data (not as robust as VMD).

Homology Modelling Case Study: Drug Binding Site of the hERG Potassium Channel

Practical Session
Details of the Three Exercises:

Exercise 1
(a) Online Sequence Alignment Generation.
(b) Homology Modelling a Monomer.
(c) Evaluation & Visualisation.
(d) Refinement.

Exercise 2
(a) Retrieve the Sequence of interest.
(b) Find a Suitable Template.
(c) Modeller Sequence Alignment Generation.
(d) Homology Modelling a Dimer.

Exercise 3
(a) Homology Modelling a Tetramer with Ligands.
(b) Structural Alignment of Template to Model.
(c) Visualising Ligand Binding Sites.
(d) Computational Mutagenesis.
