Coursera BioinfoMethods-II Lab03


Bioinformatic Methods II Lab 3: Structural Bioinformatics [Software needed: web access and PyMOL; see where to get it at the end of the lab]

In this lab, we will visit the online protein structure repository, the Protein Data Bank (PDB), and will obtain models for the tertiary structure of several proteins. Using the PDB visualization program PyMOL, we will examine 3D representations of these proteins and will look in detail at both a protein-protein interaction and a protein-DNA interaction. X-ray crystallography and NMR technologies have enabled the elucidation of thousands of protein tertiary structures. Despite the challenges of protein crystallization and structure elucidation, the number of solved protein structures continues to grow at an almost exponential rate. The Protein Data Bank presently holds over 97,000 protein structures and new structures are added regularly from researchers around the world under the wwPDB agreement.

Copyright 2014 by D.S. Guttman and N.J. Provart

Box 1. Determination of Protein Structure

Two main methods of determining protein structure exist, X-ray crystallography and protein NMR. Both require highly purified preparations of the protein whose structure is to be determined. A high level of purity may be achieved by chromatographic methods to isolate the protein of interest from the tissue in which it is expressed. Alternatively and more typically, the protein is over-expressed in a heterologous system such as E. coli or Pichia pastoris. Having a pure protein does not guarantee that it will form crystals; often clones for different regions of the protein are generated, and these in turn are tested for their ability to form crystals.

Protein NMR (nuclear magnetic resonance) spectroscopy can be used to determine a protein's structure. Protein NMR involves 4 main steps: sample preparation, assignment of resonances, generation of restraints, and structure calculation & validation. The isotopes of carbon and oxygen that occur naturally do not possess a net nuclear spin, the atomic attribute that NMR uses. Nitrogen-14 does possess a net spin of 1, but has a large quadrupole moment, which limits its use for NMR. Thus unlabelled protein samples are limited to analysis by proton NMR. Carbon-13 and nitrogen-15 isotopes have a nuclear spin of 1/2, which can be exploited during NMR. Proteins that are labelled with carbon-13 and/or nitrogen-15 can be fairly easily (if expensively) generated by growing the bacteria or yeast in which they're being expressed on carbon or nitrogen compounds containing these isotopes.

The subsequent steps to generate a structure involve placing the labelled or unlabelled protein in solution in a very strong magnetic field, such as a 21 tesla field (for comparison, the earth's magnetic field in Toronto is about 5 x 10^-5 T, and the strongest continuous magnetic field produced on earth is only 45 T). Protons and other nuclei resonate at a characteristic frequency in such a field; for protons it is roughly 900 MHz in a 21 T field. However, depending on their chemical environment, their resonance deviates minutely from this (in parts-per-million, or p.p.m.), and it is this shift, the chemical shift, that can be used to determine what sort of atom is nearby, based on known shifts. The explanation of how these signals are deconvoluted and converted into atomic coordinates is far beyond the scope of this box; suffice it to say, lots of computing is involved! You can imagine that the larger the protein is, the more overlapping signals could be produced, thus protein NMR is typically limited to resolving the structure of proteins less than 35 kDa in size. Protein NMR is invaluable, however, when no crystal for a protein can be generated. Protein structures determined by NMR comprise about 13.5% of the entries in the PDB.

X-ray crystallography is by far the most common method by which protein structures are determined: roughly 86,000 of the 97,000 protein structures determined to date have been generated using this method. Max Perutz and Sir John Cowdery Kendrew were awarded the Nobel Prize in Chemistry in 1962 for the first protein structure to be determined by X-ray crystallography, sperm whale myoglobin. The first hard X-ray laser was recently powered up (McNeil, Nature Photonics 2009), which will permit structure determination without the need to amplify the diffraction pattern using protein crystals. Rapid advances in our understanding of how protein structure changes over the course of a chemical reaction are likely to result.

The figure at the right shows the main steps in generating a protein structure by X-ray crystallography: crystal growth, X-ray diffraction pattern collection, electron density map generation, and finally atomic modelling. Arguably the most difficult step in X-ray crystallography is the production of the protein crystal itself. Typically this involves the testing of many different solutions to find a few in which the protein crystallizes best, or at all. The second step involves collecting many different diffraction patterns by placing the crystal in a powerful X-ray beam, generated by a synchrotron such as the 2.9 GeV Canadian Light Source in Saskatchewan. The device that rotates the crystal for different snapshots is called a goniometer. The third step involves applying Fourier transforms to the diffraction data to create electron density maps. Finally, the atomic models are fitted to the electron density maps and refinements are made to generate the best model fit. The model coordinates are then deposited into one of the member databases of the World-Wide Protein Data Bank at http://www.wwpdb.org/.
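The proton frequency quoted in Box 1 can be checked with a line of arithmetic: the resonance frequency is the field strength multiplied by the nucleus's gyromagnetic ratio (divided by 2 pi). The sketch below is only an illustration; the function names are made up for this example, and the constant is the standard proton value rounded to three decimals. It also shows why a 1 p.p.m. chemical shift corresponds to about 900 Hz on a 900 MHz instrument.

```python
# Proton gyromagnetic ratio divided by 2*pi, in MHz per tesla (standard value, rounded).
PROTON_GAMMA_BAR_MHZ_PER_T = 42.577


def larmor_frequency_mhz(field_tesla, gamma_bar=PROTON_GAMMA_BAR_MHZ_PER_T):
    """Resonance frequency (MHz) of a nucleus in a static magnetic field."""
    return gamma_bar * field_tesla


def shift_in_hz(shift_ppm, spectrometer_mhz):
    """Convert a chemical shift in p.p.m. into an absolute offset in Hz.

    One part per million of a frequency expressed in MHz is numerically
    the same number of Hz, so the conversion is a plain multiplication.
    """
    return shift_ppm * spectrometer_mhz


if __name__ == "__main__":
    print(f"Proton frequency at 21 T: {larmor_frequency_mhz(21.0):.0f} MHz")
    print(f"1 p.p.m. at 900 MHz: {shift_in_hz(1.0, 900.0):.0f} Hz")
```

At exactly 21 T the result is a little under 900 MHz; magnets are usually named by their rounded proton frequency, which is why the box speaks of 900 MHz in a 21 T field.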

Image courtesy of Thomas Splettstoesser: Creative Commons Attribution ShareAlike 2.5 licence.

1. The Protein Data Bank (PDB)

Connect to the PDB at http://www.rcsb.org/pdb/home/home.do, type BRCA2 in the search field at the top of the page, and click on the magnifying glass icon.

Aside: Recall from Lab 1 that BRCA2 is the Breast Cancer Type 2 susceptibility protein. BRCA2 interacts with RAD51 (among others) in the DNA damage and repair response pathway, where both proteins are critical to its proper function. Mutations in several of the DNA damage response pathway proteins, including BRCA1, BRCA2 and RAD51, have been linked to multiple forms of cancer including breast cancer.

Figure 1. Structure summary of 1N0W from the PDB (note that in certain browsers and at certain zoom levels the structure image shown on the right disappears off the right side of the screen).

1) Scroll down your results page and click on the 1N0W link to peruse the 1N0W structure summary page. Use the tabs along the top of the record to answer the following:
a. What experimental method was used to elucidate this structure?
b. What is the function of Polymer 3?

2) Click on the Sequence tab at the top of the record.
c. What was the length of the peptide linker in the expression construct?
d. How much of the linker came out in the crystallized structure? Hint: where does the PDB numbering start and end, as depicted by the bar under the sequence?

3) Go back to the BRCA2 search result page in your browser. Click on the View in Jmol icon below the RAD51 structure name (1PZN). Move the structure around with your mouse.

Figure 2. Structure hit report for 1PZN. Click on the View in Jmol icon to view the structure with Jmol. Note that you will need Java installed on your computer (it should be installed from last lab) and will need to permit/allow the Jmol application to run in your browser.

Lab Quiz Question 1
e. Give a brief description of the RAD51 structure.

4) Back up to the BRCA2 search result page. Download the following PDB entries by clicking on the first icon (the one with the downward arrow) beside the PDB ID for each structure:

- 1N0W: Crystal structure of the RAD51-BRCA2 BRC repeat complex
- 1PZN: Rad51 (RecA)
- 1MJE: Structure of a BRCA2-DSS1-ssDNA complex
- 1MIU: Structure of a BRCA2-DSS1 complex (note: there are 2 PDB entries with the above description, be sure to get 1MIU and not 1IYJ.)

f. What is the benefit of having these files? Hint: look at the chains info at around line 420 in the case of the 1N0W file, using Crimson or a similar text editor that shows line numbers to open the file.
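The chain information mentioned in the hint is stored in the COMPND records of a PDB-format file, so it can also be read with a few lines of code instead of scrolling in a text editor. The following is a minimal sketch: the function name is hypothetical, and the sample records are invented for illustration (they are not copied from 1N0W). It assumes the standard layout in which COMPND text starts at column 11 and fields are separated by semicolons.

```python
# Invented COMPND records in PDB layout; NOT taken from a real entry.
SAMPLE_COMPND = "\n".join([
    "COMPND    MOL_ID: 1;",
    "COMPND   2 MOLECULE: EXAMPLE PROTEIN ONE;",
    "COMPND   3 CHAIN: A;",
    "COMPND   4 MOL_ID: 2;",
    "COMPND   5 MOLECULE: EXAMPLE PEPTIDE TWO;",
    "COMPND   6 CHAIN: B, C;",
])


def chains_by_molecule(pdb_text):
    """Return {molecule name: [chain IDs]} parsed from COMPND records."""
    # Drop the record name and continuation number (columns 1-10),
    # then join the continuation lines into one string.
    compnd = " ".join(
        line[10:].strip()
        for line in pdb_text.splitlines()
        if line.startswith("COMPND")
    )
    molecules = []
    for token in compnd.split(";"):
        if ":" not in token:
            continue
        key, value = (part.strip() for part in token.split(":", 1))
        if key == "MOL_ID":
            molecules.append({})          # a new molecule block starts here
        elif molecules:
            molecules[-1][key] = value
    return {
        m.get("MOLECULE", "?"): [c.strip() for c in m.get("CHAIN", "").split(",") if c.strip()]
        for m in molecules
    }


if __name__ == "__main__":
    for name, chains in chains_by_molecule(SAMPLE_COMPND).items():
        print(name, "->", ", ".join(chains))
```

To try it on a downloaded entry, read the file's text (for example with open("1N0W.pdb").read()) and pass it to the function in place of the sample.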

2. NCBI VAST

Before we go on to look at our structures, let's briefly look at a simple application of protein structural homology comparison using the NCBI VAST tool. With the extensive number of solved structures in PDB, this relatively new tool allows us to compare similarities between proteins' tertiary structures, as opposed to (or independent of) just their primary amino acid sequences (as in a BLAST analysis, for instance). Surprisingly high levels of structural similarity can be found between proteins of disparate primary sequence.

Go to http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html and upload the 1N0W PDB file for the RAD51-BRCA2 BRC repeat complex. Click submit to see the structure breakdown (you can use the medium-redundancy set for the search).

a. Which chain do you think is the BRC repeat? (Hint: look at the PDB file for 1N0W you just downloaded, specifically starting at line 420!)

Click on Start the VAST calculation at the bottom of the page. Continue on with PyMOL below and refer back to the search from time to time to check if it has completed. Peruse the output when your job is ready.

3. PyMOL

Start the PyMOL application (see where to get it at the end of the lab). PyMOL is a sophisticated program for the visualization of protein models. Although it may look intimidating, a few minutes fiddling with it and an appropriate explanation of the basic features will reveal a straightforward and intuitive interface. Additionally, PyMOL incorporates a scripting interface that allows for sophisticated manipulation of structures. We will only make brief use of the command line portion of PyMOL in this exercise and restrict ourselves mostly to the visual aspects and GUI-controlled features of PyMOL.

1MJE - BRCA2-DSS1-ssDNA

1) File > Open 1MJE.pdb and move the structure around a bit with your mouse. Note that a right click and dragging up or down will zoom the image in and out, and a middle click will allow you to reposition the protein on the screen. If you lose track of the protein, you can right click anywhere in the visualization field and select Reset.

Figure 3. PyMOL GUI and Viewer. Use the File command to open a PDB file. The DNA bound to BRCA2 is clearly visible in the cartoon view as an orange line with the bases coloured blue.

Also note the structure menu that appears in the right pane beside the visualization window. The letters (A, S, H, L, C) stand for Actions, Show, Hide, Label, and Colour, respectively. Clicking on each letter produces a menu of options for the manipulation of the respective structure. Clicking on the structure name beside the menu will cause it to be hidden or displayed in the visualization window. Check out the menus and hide and display your protein before continuing.

2) Display the protein in the form of a cartoon with 1MJE: S > As > Cartoon. Note the presence of the single-stranded DNA. Move the protein around (hold left click and move mouse) and zoom in (right click) so you can see clearly how the ssDNA slides through a groove in the surface of BRCA2.

3) Let's look at this interaction from a more "blobby" biological perspective. If you need to, right-click in the visualization field and select Reset to get the protein back to its default position. Then do 1MJE: S > As > Surface. Of course now we can't discern the DNA strand, so click in the PyMOL menu: Display > Sequence. Then scroll in the sequence window to the /C/ chain (at the very end) and click to highlight the 6 DNA residues, here 6 oligo dTs (i.e. DT DT, etc.). Then, in the structure menus, colour the DNA strand red with (sele): C > reds > red. Here (sele) stands for selection, as in the residues you've selected. Again, manoeuvre the structure to observe the groove and the ssDNA occupying it.

4) Now let's see the interaction with the DSS1 protein in this same structure. While still in the surface view, click 1MJE: C > by chain > by chain. Notice how the DSS1 protein is deeply embedded in the BRCA2 structure. With the DNA strand still highlighted do (sele): S > As > sticks. Zoom in on the interaction groove.

5) Return to the cartoon view with 1MJE: S > As > Cartoon. Scroll to the DSS1 chain in the sequence window (chain /B/). Click on the first residue, Pro at 7, and drag along the length of the chain. Notice how the selection lights up in the visualization window.
a. What portion of the DSS1 chain is missing from the structure? Hint: where does the amino acid number start and/or end and/or skip an amino acid or three?
b. What portions of the BRCA2 protein are missing? Hint: again, just check the sequence numbers (assuming you've clicked the Display > Sequence On option); don't highlight.
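The hint for questions a and b, looking for skipped residue numbers, can also be automated. The sketch below assumes the standard PDB fixed-column layout (chain ID in column 22, residue number in columns 23 to 26) and uses invented ATOM records built by a hypothetical helper rather than real data from 1MJE. It reports only internal gaps; residues missing from the very start or end of a chain would need a comparison against the full construct sequence.

```python
def make_atom_line(serial, chain, resseq, resname="ALA"):
    """Build a made-up CA ATOM record in PDB fixed-column layout (illustration only)."""
    return (
        f"ATOM  {serial:>5} {'CA':<4} {resname:>3} {chain}{resseq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
    )


def residue_numbers(pdb_lines):
    """Map each chain ID to the sorted residue numbers found in ATOM records."""
    seen = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]              # column 22
            resseq = int(line[22:26])     # columns 23-26
            seen.setdefault(chain, set()).add(resseq)
    return {chain: sorted(nums) for chain, nums in seen.items()}


def numbering_gaps(numbers):
    """Return (first_missing, last_missing) pairs between consecutive residue numbers."""
    return [(a + 1, b - 1) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


if __name__ == "__main__":
    # Invented chain with residues 10-12 absent.
    demo = [make_atom_line(i, "B", n) for i, n in enumerate([7, 8, 9, 13, 14], start=1)]
    for chain, nums in residue_numbers(demo).items():
        print(chain, "starts at", nums[0], "ends at", nums[-1], "gaps:", numbering_gaps(nums))
```

For a real file, pass the lines of the downloaded PDB entry (for example open("1MJE.pdb").read().splitlines()) to residue_numbers.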

1MIU - BRCA2-DSS1

1) Before we look at an alternate BRCA2 structure, let's remove the 1MJE structure with 1MJE: A > Delete Object. Then open the new structure with File > Open 1MIU.pdb. Also, let's make things pretty first with 1MIU: S > As > Cartoon and 1MIU: C > by chain > by chain. Note the replication of the DSS1 interaction with BRCA2 in this co-crystallization experiment.

c. Were they able to crystallize the N-terminus of BRCA2 in this structure? Try to propose why or why not.

2) Recall from Lab 1 the protein domains of BRCA2:

Domain                            Start   End
BRCA2 repeat                      1002    1036
BRCA2 repeat                      1212    1246
BRCA2 repeat                      1421    1455
BRCA2 repeat                      1517    1551
BRCA2 repeat                      1664    1698
BRCA2 repeat                      1837    1871
BRCA2 repeat                      1971    2005
BRCA2 repeat                      2051    2085
BRCA2, helical                    2479    2667
BRCA2, oligo-binding, domain 1    2670    2800
Tower                             2831    2872
BRCA2, oligo-binding, domain 3    3052    3190

Without looking at the sequence info, where do you think the tower is? (Hint: right click and reset the view, then rotate the model 90 degrees counter-clockwise.) Colour all the domains in the above table whenever possible, using a different colour for each domain. Do this by selecting the sequence range in the sequence window and choosing (sele): C > [some colour]. Be careful NOT to select 1MIU: C > [some colour], as there's no undo button!

d. Considering the annotated domains and DNA binding results in the last structure, comment on the domain structure of BRCA2. What do you think the function of the Tower could be?
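Selecting twelve ranges by hand in the sequence window is tedious, so the colouring can alternatively be typed at the PyMOL> prompt using PyMOL's color command with a resi range selection. The sketch below only generates the command text from the domain table; the function name and the colour assignments are arbitrary choices for illustration, and it assumes the object is still called 1MIU and that the residue numbering in the file matches the table. Ranges that were not crystallized simply select nothing.

```python
# (domain name, start, end) taken from the domain table in this lab.
DOMAINS = [
    ("BRCA2 repeat", 1002, 1036),
    ("BRCA2 repeat", 1212, 1246),
    ("BRCA2 repeat", 1421, 1455),
    ("BRCA2 repeat", 1517, 1551),
    ("BRCA2 repeat", 1664, 1698),
    ("BRCA2 repeat", 1837, 1871),
    ("BRCA2 repeat", 1971, 2005),
    ("BRCA2 repeat", 2051, 2085),
    ("BRCA2, helical", 2479, 2667),
    ("BRCA2, oligo-binding, domain 1", 2670, 2800),
    ("Tower", 2831, 2872),
    ("BRCA2, oligo-binding, domain 3", 3052, 3190),
]

# Arbitrary pick of PyMOL named colours, one per domain.
COLOURS = ["red", "orange", "yellow", "green", "cyan", "blue",
           "purple", "magenta", "salmon", "lime", "teal", "wheat"]


def colour_commands(object_name, domains=DOMAINS, colours=COLOURS):
    """Return one PyMOL 'color' command line per domain."""
    return [
        f"color {colour}, {object_name} and resi {start}-{end}"
        for (_, start, end), colour in zip(domains, colours)
    ]


if __name__ == "__main__":
    # Print the lines so they can be pasted into the PyMOL> command box.
    for (name, _, _), command in zip(DOMAINS, colour_commands("1MIU")):
        print(f"{command}    # {name}")
```

Because each command restricts itself to a residue range, it avoids the pitfall noted above of recolouring the whole 1MIU object at once.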
Box 2. Models for the Function of the RAD51-BRCA2 Complex

Both the original authors of the 1N0W structure (Yang et al. 2002) and subsequent authors (e.g. Shin et al., 2003; figure shown) have proposed models of the function of BRCA2 in homologous recombinational repair based on its structure. (1) BRCA2 binds to RAD51 subunits within the ring via BRC repeat mimicry of the RAD51 polymerization motif (0 mimic, blue arrow). (2) BRC repeats disassemble the ring. (3) The RAD51:BRCA2 complex is recruited to a DNA double strand break. (4) BRCA2 helps displace a protective protein, Replication Protein A (RPA), and binds the primary single-stranded DNA substrate by its OB folds (5), and loads RAD51 onto DNA. The handoff reactions might be facilitated by attraction of DNA by the positively charged BRC repeat helical arches. The BRCA2 Helix-Turn-Helix domain (red) at the end of the Tower may bind double-stranded DNA in cis at the ssDNA/dsDNA intra-DNA junction (3) or in trans to the dsDNA that later serves as the homologous DNA template (6), and the positively charged arch may also help to attract the dsDNA template.

Image from Shin et al., EMBO Journal (2003) DOI: 10.1093/emboj/cdg429

1N0W, 1PZN - Rad51 and BRCA2-Rad51 interaction complex

Remove the 1MIU structure with 1MIU: A > delete object, then File > Open 1N0W.pdb.

1) Perform the usual 1N0W: S > As > Cartoon and 1N0W: C > by chain > by chain. The longest chain is the interaction region of a RAD51 subunit and the shorter one is a BRC repeat domain. Move around and zoom in on the BRC repeat and RAD51 interaction region.

2) Select 1N0W: S > As > sticks. Highlight the BRC repeat chain in the sequence window. Re-examine the interaction region.

3) Select 1N0W: C > by element > CHNOS (use the second one, with the green carbons). In the BRC repeat chain (chain /B/) click on the glutamate (E) at 1548. Note the 2 red oxygens in the carboxyl group of the glutamate. Zoom in by right clicking and moving the cursor. Do an XY translation by holding the Alt key (Windows) or the Option key (Mac) and moving the mouse with the left mouse button held (note: with some trackpads you don't need to hold the left button).

4) In the navigation bar at the top, select Wizard > Measurement. Click on one of the oxygen atoms in Glu 1548 and then click on a nearby atom. Note the distance in Angstroms between the two atoms. Keep clicking on pairs of atoms until you're confident you have the most probable hydrogen bonds for these atoms. Note that hydrogens are not shown in this model; the typical hydrogen bond distance is 2.2 to 2.5 Angstroms, and in the case of the actual donor-acceptor atoms, e.g. from O (red atoms) to N (blue atoms), the spacing would be around 2.5 to 3.5 Angstroms. You can right click on the atom where you've measured a distance to find out what chain and residue it belongs to. The A chain is RAD51.

Lab Quiz Question 2
e. What atoms is Glu 1548 hydrogen bonding with and what are the distances between these atoms?
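The number the Measurement wizard reports is simply the straight-line distance between the two atoms' coordinates. As a rough sketch, with made-up coordinates and hypothetical function names, the calculation and the 2.5 to 3.5 Angstrom donor-acceptor check from step 4 look like this:

```python
import math


def distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates, in Angstroms."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(atom_a, atom_b)))


def is_plausible_hbond(d, low=2.5, high=3.5):
    """True if a heavy-atom (donor to acceptor) distance falls in the range quoted in step 4."""
    return low <= d <= high


if __name__ == "__main__":
    # Invented coordinates for an oxygen and a nitrogen; not taken from 1N0W.
    oxygen = (10.0, 4.0, 7.0)
    nitrogen = (12.0, 5.0, 9.0)
    d = distance(oxygen, nitrogen)
    print(f"O to N distance: {d:.2f} Angstroms, plausible H-bond: {is_plausible_hbond(d)}")
```

A distance inside the range is only a first screen; the element types and geometry of the two atoms still need to make chemical sense for a hydrogen bond.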

In the Measurement pane to the right of the visualization field, select Delete All Measurements and then close the measurement wizard by clicking Done. Right-click on some empty space in the visualization field and click Reset to return the protein to the middle of the visualization screen.

5) Select 1N0W: S > As > cartoon and 1N0W: C > by chain > by chain. Then click 1N0W: A > rename object. When the renaming box pops up, rename the object BRC (this may not work in the PyMOL educational version).

6) Keeping the BRC interaction object open, do File > Open 1PZN.pdb, then do 1PZN: S > As > cartoon and 1PZN: C > by chain > by chain. Finally, rename the object to RAD51 with 1PZN: A > rename object (this may not work in the PyMOL educational version).
f. How many RAD51 subunits make up the full RAD51 protein? Hint: how many colours do you see?

7) Right-click in some empty space in the visualization field and choose Zoom (vis). Select RAD51 (or 1PZN if the rename didn't work): C > yellows > yellow, or some other colour that isn't the same as either of the BRC (1N0W) object chains.

8) Now finally, let's align the BRC interaction model to the RAD51 model. Click in the PyMOL> command box above the visualization field. Type: align BRC, RAD51 (or align 1N0W, 1PZN if the renaming didn't work). The first name is the object that gets moved and the second is the target it is superimposed onto. Hit enter. Now, right-click in the visualization field and select Orient (vis). Zoom in and examine the interaction region.

Lab Quiz Question 3
g. Where does the BRC repeat interact on the model? Is this the only potential interaction region? What do you think the lid-like structure on the one subunit in RAD51 does?
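Steps 5 to 8 can also be driven entirely from the command line. The sketch below assembles the equivalent PyMOL commands as text so they can be pasted into the PyMOL> box or saved as a script; the function name is hypothetical, and it assumes both PDB files sit in PyMOL's working directory. It names the objects at load time (the second argument of load), which sidesteps the rename step, and it uses align with the moving object first and the target second.

```python
def alignment_script(mobile_file="1N0W.pdb", target_file="1PZN.pdb",
                     mobile_name="BRC", target_name="RAD51"):
    """Return PyMOL command lines that load, style, and superimpose the two structures."""
    return [
        f"load {mobile_file}, {mobile_name}",    # object is named at load time
        f"load {target_file}, {target_name}",
        "hide everything",
        "show cartoon",
        f"util.cbc {mobile_name}",               # colour the BRC complex by chain
        f"color yellow, {target_name}",          # one contrasting colour for the ring
        f"align {mobile_name}, {target_name}",   # move mobile onto target
        "orient",
    ]


if __name__ == "__main__":
    print("\n".join(alignment_script()))
```

If the printed lines are saved to a text file, that file can be run from within PyMOL; treat this as a convenience alongside the GUI steps rather than a replacement for exploring the menus.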

h. Propose a model for the interaction of BRCA2 with RAD51, keeping in mind the presence of multiple BRC repeats in the N-terminal region of BRCA2 (see Box 2).

End of lab!

Where to get it: Download an executable copy of PyMOL for educational use for Windows, Mac, or Linux from http://pymol.org/edu (you'll need to register on this site for free; use Coursera BioinfoMethods II as the course number. An email with download details will be sent to you. Check your spam folder if you don't receive it within a few minutes). After you've downloaded the installer file, click on it to run the installation program. You should be able to find the PyMOL executable in your Programs folder if you've installed it correctly. Use the one named Pymol without any other additional words in the name. You can also download the PyMOL source code at http://sourceforge.net/projects/pymol/, with older legacy executable versions available at http://sourceforge.net/projects/pymol/files/Legacy/. A special thanks to Jason Vertrees of Schrödinger Software for permission to allow the use of the educational version of PyMOL in this lab.

Lab 3 Objectives

By the end of Lab 3 (comprising the labs including their boxes, and the lectures), you should:
- know the main methods for determining protein structure;
- be familiar with Protein Data Bank records and how to determine which method was used to ascertain a given protein's structure;
- be able to view the protein's structure both with the Jmol applet and PyMOL;
- be able to use PyMOL to view the structure model in different representations (ribbon, stick etc.) and colour different parts of the molecule at will;
- be able to align two structures using the command line interface of PyMOL;
- be able to highlight certain residues in the Sequence Viewer part of PyMOL and have these displayed in the Structure Viewer window;
- be able to measure inter-atom distances using the Measure tool of PyMOL.

Do not hesitate to check or post to the Coursera forums if you do not understand any of the above after reading the relevant material.

Further Reading

Chapter 14, "Analyzing Structure-Function Relationships", in Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum, Garland Science, 2008, pp. 567-595.

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000). The Protein Data Bank. Nucleic Acids Res. 28(1):235-42.
