Experiment 1 & 2 - Bioinformatics Lab


DEPARTMENT OF BIOTECHNOLOGY ENGINEERING

KULLIYYAH OF ENGINEERING

BTEN 2182: BIOTECHNOLOGY ENGINEERING LAB II


SEMESTER 1, SESSION 2022/2023

EXPERIMENT 1 & 2: BIOINFORMATICS LAB

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) &

Working with Protein Information for SARS-CoV-2

MUHAMMAD SYAHMI ADHWA’ BIN SHAHRUM 2015181


MOHAMED IRFAN BIN MOHD ISA 2016029

INSTRUCTOR:

DR. MOHD FIRDAUS ABD WAHAB


DR. FAZIA ADYANI AHMAD FUAD

31 OCTOBER 2023
ABSTRACT

This report covers web-based bioinformatics exercises focused on the Covid-19 pandemic and
the SARS-CoV-2 genome. The exercises use the Gene, GenBank, RefSeq, and PubMed resources
on the NCBI website, along with Clustal Omega for multiple sequence alignment. The
investigation includes exploring the virus's components, comparing sequences, and identifying
mutations. Sequence-level information is integrated throughout, providing an overview of the
bioinformatics methods used.

INTRODUCTION

This study serves as a gateway to the multifaceted exploration of DNA, genes, and
proteins in the context of the Covid-19 pandemic. At the forefront of this investigation is the
pivotal role of genes, and the intricate web of scientific research woven globally to address the
profound challenges posed by the SARS-CoV-2 virus. In the relentless pursuit of understanding
and combating this global health crisis, scientists worldwide have intensified their efforts,
propelling bioinformatics to the forefront of research methodologies.

Against this backdrop, the study delves into the intricacies of genes, placing particular
emphasis on the utilization of bioinformatics tools from the National Center for Biotechnology
Information (NCBI) website. Genes, the fundamental units of heredity, are not only the carriers
of genetic information but also serve as key players in unraveling the mysteries of Covid-19. As
the scientific community grapples with the complexities of the virus, potential therapeutic
candidates emerge as a beacon of hope, and genes become the focal point of exploration for their
potential roles in developing effective treatments.

The multifaceted nature of this study extends beyond the theoretical realm into practical
applications. In the face of a global pandemic, the imperative to comprehend the virus at a
genetic level has led to the deployment of bioinformatics tools. Clustal Omega and BLAST
searches, integral to this study, are standard tools for sequence analysis. They enable a
meticulous comparison between the original sequence and its mutated forms, facilitating the
identification of variations that may hold the key to understanding the virus's behavior and
devising targeted therapeutic interventions.

Moreover, the exploration extends to the practical application of bioinformatics
methodologies, encompassing the identification of proteins of interest through gene names. The
convergence of gene-centric knowledge with protein structures is crucial, as it forms the bridge
between genetic information and the actual molecular components influencing the detection and
severity of Covid-19.

As the scientific community grapples with the urgent need for insights into the virus's
genetic makeup, this study positions itself at the intersection of theory and application. By
unraveling the complexities of DNA, genes, and proteins, it strives to contribute to the
ever-expanding body of knowledge crucial for navigating the intricate landscape of the Covid-19
pandemic.

OBJECTIVE

The objectives of this study center on the genetic sequence of SARS-CoV-2 in the context of
the Covid-19 pandemic. The first goal is to study the sequence and underscore its role in
understanding the virus. The second is to compare the original (reference) sequence with
mutated instances and discern variations that may have implications for the virus's behavior.
The third is to identify specific mutation points within the sequence, contributing to a
nuanced understanding of the virus's genetic makeup.

Moreover, a key focus is directed towards developing proficiency in utilizing
bioinformatics tools available on the NCBI website. This involves navigating the Gene database,
conducting searches, and employing tools such as Clustal Omega and BLAST for sequence
analysis. Additionally, the study seeks to foster the ability to identify proteins of interest by
utilizing gene names, linking genetic information to the corresponding proteins involved in the
Covid-19 context.

Furthermore, a critical objective involves the practical application of knowledge gained
by creating pairwise sequence alignments for proteins using EMBOSS Matcher. This process
serves the dual purpose of identifying protein mutations and mapping secondary structure
predictions. Ultimately, the overarching goal is to establish a comprehensive understanding of
the link between proteins and their roles in the detection and severity risk associated with
Covid-19.

METHODOLOGY

PART A: Background on Coronavirus (SARS-CoV2)

The NCBI SARS-CoV-2 resources page was opened at https://www.ncbi.nlm.nih.gov/sars-cov-2/.
From this page, I used the dropdown list in the search box to find the "Gene" link. After going
to the "Covid-19 clinical resources" section, I scrolled down, selected "View in Gene," and
clicked "Search." I typed "Surface Glycoprotein" into the search bar and hit the "Go" button.
I then used the "results by taxonomy" column to find the entries associated with "Severe Acute
Respiratory Syndrome-Related Coronavirus" and clicked on each Gene ID whose description read
"[Severe Acute Respiratory Syndrome Coronavirus 2]". For each entry, I read the paragraphs
under the "Summary" section, answered the accompanying questions, and did some further web
research to get a more complete description.
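
The search above was done through the NCBI web pages. As a rough sketch only, the same kind of query could be expressed as an NCBI E-utilities ESearch URL. The function name, the default `retmax` value, and the exact query wording below are illustrative assumptions and were not part of the lab procedure; the snippet only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(term, organism=None, retmax=20):
    """Build an ESearch URL for the Gene database (hypothetical helper)."""
    query = term
    if organism:
        # Restrict the search to one organism with the [Organism] field tag.
        query = f'{term} AND "{organism}"[Organism]'
    params = {"db": "gene", "term": query, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_gene_search_url(
    "surface glycoprotein",
    organism="Severe acute respiratory syndrome coronavirus 2",
)
print(url)
```

Opening the printed URL in a browser should return a list of matching Gene IDs, which can then be looked up on the Gene pages as described above.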

PART B: Getting Sequence Information and Viewing Database Entries NCBI – Gene

I returned to the Surface Glycoprotein [Severe Acute Respiratory Syndrome Coronavirus 2]
"Gene" entry and clicked the blue "Download Datasets" button to access the download options.
I chose the "Gene Sequences and Protein Sequences" download option, opened the gene.faa file,
and changed its file extension to .txt. I copied the sequence, including the title line marked
by the ">" symbol, and pasted it into a word processing document. Using the "Replace" tool
under the EDIT menu, I entered "^p" in the "find" box to locate all paragraph marks, left the
"replace" field empty, and selected "Replace All." I repeated the same steps with "^w" in the
"find" box to remove white space. I then inserted a new paragraph mark (RETURN) after the title
line (which begins with ">") and before the start of the sequence, so that the title stays on
its own line. The file was saved as SpikeCoV2rna.docx in a new folder on the desktop. To save
the protein sequence in FASTA format as SpikeCoV2prot.docx in the same folder, I opened the
protein.faa file from the downloaded folder and followed the same steps.
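
The find-and-replace steps above can also be sketched in a few lines of Python. This is only an illustration of the same clean-up (title line kept on its own line, all line breaks and white space removed from the sequence); the function names and the short demo record are made up for the example.

```python
def clean_fasta(raw):
    """Return (title_line, sequence) with the sequence joined into one line."""
    lines = raw.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA title line starting with '>'")
    title = lines[0].strip()
    # Join the remaining lines, then drop every space, tab, and line break.
    sequence = "".join("".join(lines[1:]).split())
    return title, sequence


def to_fasta(title, sequence):
    """Write the record back as two lines: title, then unbroken sequence."""
    return f"{title}\n{sequence}\n"


# Toy record, not the real spike gene.
raw_record = ">demo S\nATGTTT GTT\n\nTTTCTT\n"
title, sequence = clean_fasta(raw_record)
print(to_fasta(title, sequence))
```

Unlike a global replace of white space, this keeps the spaces inside the title line intact.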

DISCUSSION AND RESULT

PART A: Background on Coronavirus (SARS-CoV2)

The enveloped, positive-sense, single-stranded RNA virus that causes coronavirus
disease 2019 (COVID-19) is called Severe Acute Respiratory Syndrome Coronavirus 2
(SARS-CoV-2).

Virus particles contain the RNA genome together with the structural proteins that the
virus needs to enter host cells.

One of the structural proteins of SARS-CoV-2 is the surface glycoprotein (S), or spike
glycoprotein. This glycoprotein mediates the attachment of the viral particle and its entry
into the host cell.

The envelope protein (E) is responsible for facilitating the assembly and release of the
virus.

The most prevalent structural protein in the viral particle is the membrane protein (M), a
transmembrane glycoprotein.

A structural protein called nucleocapsid phosphoprotein (N) attaches to the viral RNA
genome to protect it and helps package the RNA into virus particles. The spike glycoprotein,
which gives coronaviruses their crown-like appearance, is present on the outside of the
virus particle.

The S protein is a key target for antibody therapies, vaccine development, and
antigen-based diagnostic testing. The E protein has ion channel activity necessary for
pathogenesis and aids in the assembly and release of the virus. The M and N proteins have
been proposed as targets for antiviral medications.

The spike, or surface glycoprotein (S), is therefore the key target in the fight against
Covid-19.

PART B: Getting Sequence Information and Viewing Database Entries NCBI – Gene

The gene is also known as spike glycoprotein, and its Gene ID is 43740568. The
spike glycoprotein is found on the outside of the virus particle and gives coronaviruses
their crown-like appearance. The NCBI Reference Sequence (RefSeq) accession number for
the genome where this gene is contained is … between what range (base/nucleotide position).

Additionally, a specific focus is placed on missense mutations within the E-protein,
which occur in less than 0.1% of SARS-CoV-2 genomes. This information is detailed in Appendix
Experiment 2 (Alignment Comparison Variant.docx) and adds a layer of complexity to the picture
of genetic variation within the virus. Placing these practical steps in the broader context of
the genetic makeup of SARS-CoV-2 gives insight into the virus's behavior and potential
implications for therapeutic interventions.
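
A missense mutation shows up as a single changed residue when two equal-length protein sequences are compared position by position. The sketch below illustrates that comparison; the function name and the `first_position` parameter are assumptions made for the example. The demo fragments are taken from the two translations listed in the appendices, which read VLYQDVNC in the reference and VLYQGVNC in the variant, and the starting position of 610 is counted from the reference translation.

```python
def find_substitutions(reference, variant, first_position=1):
    """List single-residue changes such as 'D5G' between two aligned sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length (align them first)")
    changes = []
    for position, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=first_position):
        if ref_aa != var_aa:
            changes.append(f"{ref_aa}{position}{var_aa}")
    return changes


# Fragment of the reference and variant spike translations from the appendices.
reference_fragment = "VLYQDVNC"
variant_fragment = "VLYQGVNC"

print(find_substitutions(reference_fragment, variant_fragment))
print(find_substitutions(reference_fragment, variant_fragment, first_position=610))
```

With positions counted from the start of the fragment the change is reported as D5G; numbering from residue 610 reports the same change as D614G.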

CONCLUSION

In conclusion, these experiments in bioinformatics have provided a comprehensive
understanding of the genetic aspects of the SARS-CoV-2 virus, contributing to the ongoing
efforts to combat the Covid-19 pandemic. The exploration commenced with a tutorial on
web-based bioinformatics exercises, leveraging tools from the NCBI website such as Gene,
GenBank, RefSeq, and PubMed, along with Clustal Omega for multiple sequence alignment. The
focus on the Covid-19 pandemic allowed for a detailed investigation into the SARS-CoV-2
genome, emphasizing the importance of understanding the virus for effective response strategies.

The objectives outlined in the study were met through a systematic approach. The
exploration of DNA, comparisons between original DNA and mutations, and the identification of
mutation points have enriched our comprehension of the virus at the genetic level. Moreover, the
study facilitated proficiency in utilizing bioinformatics tools from the NCBI website, showcasing
the practical application of these tools for gene and protein analysis.

The creation of pairwise sequence alignments for proteins using EMBOSS Matcher has
further deepened our insights. This process not only identified protein mutations but also mapped
secondary structure predictions, shedding light on the intricacies of protein behavior in the
context of Covid-19 detection and severity risk.

The practical application, exemplified in the detailed methodology of obtaining sequence
information and viewing database entries, seamlessly integrated theoretical knowledge with
hands-on experience. By exploring the gene responsible for the spike glycoprotein, with Gene ID
43740568, and delving into missense mutations within the E-protein, the study provided nuanced
insights into the genetic variations present in SARS-CoV-2.

Overall, these experiments underscore the importance of bioinformatics in unraveling the
complexities of viral genomes. The knowledge gained contributes not only to our understanding
of SARS-CoV-2 but also holds potential implications for therapeutic interventions and the
ongoing global efforts to mitigate the impact of the Covid-19 pandemic. The seamless
integration of theoretical concepts, practical applications, and detailed analysis has enhanced our
grasp of the genetic intricacies underlying this unprecedented global health crisis.

REFERENCES

1. BTEN 3183 Lab Manual- SEM 1, 22/23.
https://liveiiumedu.sharepoint.com/:b:/r/sites/BTEN3183Sem120232024/Class%20Materials/BTEN%203183%20Lab%20Manual-%20SEM%201,%202023-2024.pdf?csf=1&web=1&e=Vyjksu

2. NCBI SARS-CoV-2 Resources. (n.d.). Www.ncbi.nlm.nih.gov.
https://www.ncbi.nlm.nih.gov/sars-cov-2/

3. The Sequence Manipulation Suite. (n.d.). Bioinformatics.org. Retrieved November 13,
2023, from http://bioinformatics.org/sms/

4. EMBL-EBI. (2019). Clustal Omega < Multiple Sequence Alignment < EMBL-EBI.
Ebi.ac.uk. https://www.ebi.ac.uk/Tools/msa/clustalo/

5. Uniprot. (2023). UniProt. Uniprot.org. https://www.uniprot.org/

6. PSIPRED Workbench. (n.d.). Bioinf.cs.ucl.ac.uk. http://bioinf.cs.ucl.ac.uk/psipred/

7. EMBOSS Matcher < Pairwise Sequence Alignment < EMBL-EBI. (n.d.).
Www.ebi.ac.uk. https://www.ebi.ac.uk/Tools/psa/emboss_matcher/

8. Hussain, A., Hasan, A., Nejadi Babadaei, M. M., Bloukh, S. H., Chowdhury, M. E. H.,
Sharifi, M., Haghighat, S., & Falahati, M. (2020). Targeting SARS-CoV2 Spike Protein
Receptor Binding Domain by Therapeutic Antibodies. Biomedicine & Pharmacotherapy
= Biomedecine & Pharmacotherapie, 130, 110559.
https://doi.org/10.1016/j.biopha.2020.110559

APPENDIX EXPERIMENT 1

Part A

SpikeCoV2rna.docx
>NC_045512.2:21563-25384 S [organism=Severe acute respiratory syndrome
coronavirus 2] [GeneID=43740568] [chromosome=]
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAAT
TACCCCCTGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCCTCAGT
TTTACATTCAACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTC
TCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTT
CCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGGTACTACTTTAGATTCGAAGACCCAGTCCCT
ACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGTGAATTTCAATTTTGTAATGATCCATTT
TTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCAGAGTTTATTCTAGTGCGA
ATAATTGCACTTTTGAATATGTCTCTCAGCCTTTTCTTATGGACCTTGAAGGAAAACAGGGTAATTTCAA
AAATCTTAGGGAATTTGTGTTTAAGAATATTGATGGTTATTTTAAAATATATTCTAAGCACACGCCTATT
AATTTAGTGCGTGATCTCCCTCAGGGTTTTTCGGCTTTAGAACCATTGGTAGATTTGCCAATAGGTATTA
ACATCACTAGGTTTCAAACTTTACTTGCTTTACATAGAAGTTATTTGACTCCTGGTGATTCTTCTTCAGG
TTGGACAGCTGGTGCTGCAGCTTATTATGTGGGTTATCTTCAACCTAGGACTTTTCTATTAAAATATAAT
GAAAATGGAACCATTACAGATGCTGTAGACTGTGCACTTGACCCTCTCTCAGAAACAAAGTGTACGTTGA
AATCCTTCACTGTAGAAAAAGGAATCTATCAAACTTCTAACTTTAGAGTCCAACCAACAGAATCTATTGT
TAGATTTCCTAATATTACAAACTTGTGCCCTTTTGGTGAAGTTTTTAACGCCACCAGATTTGCATCTGTT
TATGCTTGGAACAGGAAGAGAATCAGCAACTGTGTTGCTGATTATTCTGTCCTATATAATTCCGCATCAT
TTTCCACTTTTAAGTGTTATGGAGTGTCTCCTACTAAATTAAATGATCTCTGCTTTACTAATGTCTATGC
AGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATTGCTGAT
TATAATTATAAATTACCAGATGATTTTACAGGCTGCGTTATAGCTTGGAATTCTAACAATCTTGATTCTA
AGGTTGGTGGTAATTATAATTACCTGTATAGATTGTTTAGGAAGTCTAATCTCAAACCTTTTGAGAGAGA
TATTTCAACTGAAATCTATCAGGCCGGTAGCACACCTTGTAATGGTGTTGAAGGTTTTAATTGTTACTTT
CCTTTACAATCATATGGTTTCCAACCCACTAATGGTGTTGGTTACCAACCATACAGAGTAGTAGTACTTT
CTTTTGAACTTCTACATGCACCAGCAACTGTTTGTGGACCTAAAAAGTCTACTAATTTGGTTAAAAACAA
ATGTGTCAATTTCAACTTCAATGGTTTAACAGGCACAGGTGTTCTTACTGAGTCTAACAAAAAGTTTCTG
CCTTTCCAACAATTTGGCAGAGACATTGCTGACACTACTGATGCTGTCCGTGATCCACAGACACTTGAGA
TTCTTGACATTACACCATGTTCTTTTGGTGGTGTCAGTGTTATAACACCAGGAACAAATACTTCTAACCA
GGTTGCTGTTCTTTATCAGGATGTTAACTGCACAGAAGTCCCTGTTGCTATTCATGCAGATCAACTTACT
CCTACTTGGCGTGTTTATTCTACAGGTTCTAATGTTTTTCAAACACGTGCAGGCTGTTTAATAGGGGCTG
AACATGTCAACAACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCA
GACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGT
GCAGAAAATTCAGTTGCTTACTCTAATAACTCTATTGCCATACCCACAAATTTTACTATTAGTGTTACCA
CAGAAATTCTACCAGTGTCTATGACCAAGACATCAGTAGATTGTACAATGTACATTTGTGGTGATTCAAC
TGAATGCAGCAATCTTTTGTTGCAATATGGCAGTTTTTGTACACAATTAAACCGTGCTTTAACTGGAATA
GCTGTTGAACAAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCACCAA
TTAAAGATTTTGGTGGTTTTAATTTTTCACAAATATTACCAGATCCATCAAAACCAAGCAAGAGGTCATT
TATTGAAGATCTACTTTTCAACAAAGTGACACTTGCAGATGCTGGCTTCATCAAACAATATGGTGATTGC
CTTGGTGATATTGCTGCTAGAGACCTCATTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACCTT
TGCTCACAGATGAAATGATTGCTCAATACACTTCTGCACTGTTAGCGGGTACAATCACTTCTGGTTGGAC
CTTTGGTGCAGGTGCTGCATTACAAATACCATTTGCTATGCAAATGGCTTATAGGTTTAATGGTATTGGA
GTTACACAGAATGTTCTCTATGAGAACCAAAAATTGATTGCCAACCAATTTAATAGTGCTATTGGCAAAA
TTCAAGACTCACTTTCTTCCACAGCAAGTGCACTTGGAAAACTTCAAGATGTGGTCAACCAAAATGCACA
AGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTCAAGTGTTTTAAATGATATC
CTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCACAGGCAGACTTCAAAGTT
TGCAGACATATGTGACTCAACAATTAATTAGAGCTGCAGAAATCAGAGCTTCTGCTAATCTTGCTGCTAC
TAAAATGTCAGAGTGTGTACTTGGACAATCAAAAAGAGTTGATTTTTGTGGAAAGGGCTATCATCTTATG
TCCTTCCCTCAGTCAGCACCTCATGGTGTAGTCTTCTTGCATGTGACTTATGTCCCTGCACAAGAAAAGA
ACTTCACAACTGCTCCTGCCATTTGTCATGATGGAAAAGCACACTTTCCTCGTGAAGGTGTCTTTGTTTC
AAATGGCACACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACAGACAACACA
TTTGTGTCTGGTAACTGTGATGTTGTAATAGGAATTGTCAACAACACAGTTTATGATCCTTTGCAACCTG
AATTAGACTCATTCAAGGAGGAGTTAGATAAATATTTTAAGAATCATACATCACCAGATGTTGATTTAGG
TGACATCTCTGGCATTAATGCTTCAGTTGTAAACATTCAAAAAGAAATTGACCGCCTCAATGAGGTTGCC
AAGAATTTAAATGAATCTCTCATCGATCTCCAAGAACTTGGAAAGTATGAGCAGTATATAAAATGGCCAT
GGTACATTTGGCTAGGTTTTATAGCTGGCTTGATTGCCATAGTAATGGTGACAATTATGCTTTGCTGTAT
GACCAGTTGCTGTAGTTGTCTCAAGGGCTGTTGTTCTTGTGGATCCTGCTGCAAATTTGATGAAGACGAC
TCTGAGCCAGTGCTCAAAGGAGTCAAATTACATTACACATAA
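
The title line of the record above carries the genome accession and the nucleotide range of the gene. As a small sketch, the fields can be pulled out with a regular expression; the pattern and function name are illustrative assumptions that cover only headers shaped like this one.

```python
import re

# Matches headers of the form ">ACCESSION.VERSION:START-END ..."
HEADER_PATTERN = re.compile(r"^>(?P<accession>[A-Z]{2}_\d+\.\d+):(?P<start>\d+)-(?P<end>\d+)")


def parse_region(header):
    """Return (accession, start, end, length) from a FASTA title line."""
    match = HEADER_PATTERN.match(header)
    if match is None:
        raise ValueError("header does not contain an accession and range")
    start = int(match.group("start"))
    end = int(match.group("end"))
    # Both coordinates are inclusive, so the length is end - start + 1.
    return match.group("accession"), start, end, end - start + 1


header = (
    ">NC_045512.2:21563-25384 S [organism=Severe acute respiratory syndrome "
    "coronavirus 2] [GeneID=43740568] [chromosome=]"
)
accession, start, end, length = parse_region(header)
print(accession, start, end, length)
```

The computed length of 3822 nucleotides agrees with the "3822 residue sequence" reported for the translation input further below, and it is a multiple of three as expected for a coding sequence.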
SpikeCoV2.docx
>YP_009724390.1 S [organism=Severe acute respiratory syndrome coronavirus 2]
[GeneID=43740568]
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHV
SGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPF
LGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPI
NLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYN
ENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASV
YAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIAD
YNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYF
PLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL
PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLT
PTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLG
AENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGI
AVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDC
LGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIG
VTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDI
LSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLM
SFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT
FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVA
KNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDD
SEPVLKGVKLHYT

SpikeCoV2prot.docx

>Results for 3822 residue sequence starting "ATGTTTGTTT

MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFS
NVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIV
NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE
GKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT
LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETK
CTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISN
CVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIAD
YNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC
NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVN
FNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITP
GTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY
ECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI
SVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQE
VFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDC
LGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM
QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN
TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPA
ICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDP
LQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDL
QELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDD
SEPVLKGVKLHYT*
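
The translation above was produced with an online tool. For illustration, the same step can be sketched in plain Python using the standard genetic code; the function name, the `frame` parameter, and the use of "X" for unrecognized codons are choices made for this example and not a description of how the online tool works.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a nucleotide sequence codon by codon, starting at offset `frame`."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        # Codons with ambiguous letters are reported as "X".
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First six codons and last two codons of the spike gene listed above.
print(translate("ATGTTTGTTTTTCTTGTT"))
print(translate("ACATAA"))
```

The first call gives MFVFLV and the second gives T*, matching the start and end of the translation shown above, where "*" marks the stop codon.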

Part B

ClustalOmega_Homosapien.docx
CLUSTAL O(1.2.4) multiple sequence alignment

YP_009724390.1      MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFS 60
Results             MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFS 60
                    ************************************************************

YP_009724390.1      NVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIV 120
Results             NVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIV 120
                    ************************************************************

YP_009724390.1      NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE 180
Results             NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE 180
                    ************************************************************

YP_009724390.1      GKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT 240
Results             GKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT 240
                    ************************************************************

YP_009724390.1      LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETK 300
Results             LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETK 300
                    ************************************************************

YP_009724390.1      CTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISN 360
Results             CTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISN 360
                    ************************************************************

YP_009724390.1      CVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIAD 420
Results             CVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIAD 420
                    ************************************************************

YP_009724390.1      YNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC 480
Results             YNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC 480
                    ************************************************************

YP_009724390.1      NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVN 540
Results             NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVN 540
                    ************************************************************

YP_009724390.1      FNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITP 600
Results             FNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITP 600
                    ************************************************************

YP_009724390.1      GTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY 660
Results             GTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY 660
                    ************************************************************

YP_009724390.1      ECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI 720
Results             ECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI 720
                    ************************************************************

YP_009724390.1      SVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQE 780
Results             SVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQE 780
                    ************************************************************

YP_009724390.1      VFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDC 840
Results             VFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDC 840
                    ************************************************************

YP_009724390.1      LGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM 900
Results             LGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM 900
                    ************************************************************

YP_009724390.1      QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN 960
Results             QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN 960
                    ************************************************************

YP_009724390.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA 1020
Results             TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA 1020
                    ************************************************************

YP_009724390.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPA 1080
Results             SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPA 1080
                    ************************************************************

YP_009724390.1      ICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDP 1140
Results             ICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDP 1140
                    ************************************************************
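
In the alignment above, each "*" marks a column where both sequences carry the same residue. A minimal sketch of how such a marker line and a percent identity value could be derived from two already aligned sequences is given below. The function names are made up for the example, and the percent identity definition used here (identical columns divided by alignment length) is one of several in common use.

```python
def conservation_line(aligned_a, aligned_b):
    """Return '*' for identical columns and a space otherwise."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(aligned_a, aligned_b)
    )


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of the alignment length."""
    if not aligned_a:
        raise ValueError("alignment is empty")
    marks = conservation_line(aligned_a, aligned_b)
    return 100.0 * marks.count("*") / len(aligned_a)


# First ten residues of the first alignment block above.
row_refseq = "MFVFLVLLPL"
row_results = "MFVFLVLLPL"
print(conservation_line(row_refseq, row_results))
print(percent_identity(row_refseq, row_results))
```

For the fully identical rows shown in this appendix the marker line is all "*" and the identity is 100.0.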

APPENDIX EXPERIMENT 2

SARS-CoV2 variant.txt

>S [organism=Severe acute respiratory syndrome coronavirus 2]


[GeneID=43740568_variant1]

TAATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCC
TGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCCTCAGTTTTACATTCAACTCAG
GACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAAGAGGT
TTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTCTAACATAATAAGAGGCTGGAT
TTTTGGTACTACTTTAGATTCGAAGACCCAGTCCCTACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGT
GAATTTCAATTTTGTAATGATCCATTTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCA
GAGTTTATTCTAGTGCGAATAATTGCACTTTTGAATATGTCTCTCAGCCTTTTCTTATGGACCTTGAAGGAAAACAGGG
TAATTTCAAAAATCTTAGGGAATTTGTGTTTAAGAATATTGATGGTTATTTTAAAATATATTCTAAGCACACGCCTATT
AATTTAGTGCGTGATCTCCCTCAGGGTTTTTCGGCTTTAGAACCATTGGTAGATTTGCCAATAGGTATTAACATCACTA
GGTTTCAAACTTTACTTGCTTTACATAGAAGTTATTTGACTCCTGGTGATTCTTCTTCAGGTTGGACAGCTGGTGCTGC
AGCTTATTATGTGGGTTATCTTCAACCTAGGACTTTTCTATTAAAATATAATGAAAATGGAACCATTACAGATGCTGTA
GACTGTGCACTTGACCCTCTCTCAGAAACAAAGTGTACGTTGAAATCCTTCACTGTAGAAAAAGGAATCTATCAAACTT
CTAACTTTAGAGTCCAACCAACAGAATCTATTGTTAGATTTCCTAATATTACAAACTTGTGCCCTTTTGGTGAAGTTTT
TAACGCCACCAGATTTGCATCTGTTTATGCTTGGAACAGGAAGAGAATCAGCAACTGTGTTGCTGATTATTCTGTCCTA
TATAATTCCGCATCATTTTCCACTTTTAAGTGTTATGGAGTGTCTCCTACTAAATTAAATGATCTCTGCTTTACTAATG
TCTATGCAGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATTGCTGATTA
TAATTATAAATTACCAGATGATTTTACAGGCTGCGTTATAGCTTGGAATTCTAACAATCTTGATTCTAAGGTTGGTGGT
AATTATAATTACCTGTATAGATTGTTTAGGAAGTCTAATCTCAAACCTTTTGAGAGAGATATTTCAACTGAAATCTATC
AGGCCGGTAGCACACCTTGTAATGGTGTTGAAGGTTTTAATTGTTACTTTCCTTTACAATCATATGGTTTCCAACCCAC
TAATGGTGTTGGTTACCAACCATACAGAGTAGTAGTACTTTCTTTTGAACTTCTACATGCACCAGCAACTGTTTGTGGA
CCTAAAAAGTCTACTAATTTGGTTAAAAACAAATGTGTCAATTTCAACTTCAATGGTTTAACAGGCACAGGTGTTCTTA
CTGAGTCTAACAAAAAGTTTCTGCCTTTCCAACAATTTGGCAGAGACATTGCTGACACTACTGATGCTGTCCGTGATCC
ACAGACACTTGAGATTCTTGACATTACACCATGTTCTTTTGGTGGTGTCAGTGTTATAACACCAGGAACAAATACTTCT
AACCAGGTTGCTGTTCTTTATCAGGGTGTTAACTGCACAGAAGTCCCTGTTGCTATTCATGCAGATCAACTTACTCCTA
CTTGGCGTGTTTATTCTACAGGTTCTAATGTTTTTCAAACACGTGCAGGCTGTTTAATAGGGGCTGAACATGTCAACAA
CTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGGCA
CGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAATTCAGTTGCTTACTCTAATAACT
CTATTGCCATACCCACAAATTTTACTATTAGTGTTACCACAGAAATTCTACCAGTGTCTATGACCAAGACATCAGTAGA
TTGTACAATGTACATTTGTGGTGATTCAACTGAATGCAGCAATCTTTTGTTGCAATATGGCAGTTTTTGTACACAATTA
AACCGTGCTTTAACTGGAATAGCTGTTGAACAAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACA
AAACACCACCAATTAAAGATTTTGGTGGTTTTAATTTTTCACAAATATTACCAGATCCATCAAAACCAAGCAAGAGGTC
ATTTATTGAAGATCTACTTTTCAACAAAGTGACACTTGCAGATGCTGGCTTCATCAAACAATATGGTGATTGCCTTGGT
GATATTGCTGCTAGAGACCTCATTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACCTTTGCTCACAGATGAAA
TGATTGCTCAATACACTTCTGCACTGTTAGCGGGTACAATCACTTCTGGTTGGACCTTTGGTGCAGGTGCTGCATTACA
AATACCATTTGCTATGCAAATGGCTTATAGGTTTAATGGTATTGGAGTTACACAGAATGTTCTCTATGAGAACCAAAAA
TTGATTGCCAACCAATTTAATAGTGCTATTGGCAAAATTCAAGACTCACTTTCTTCCACAGCAAGTGCACTTGGAAAAC
TTCAAGATGTGGTCAACCAAAATGCACAAGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTC
AAGTGTTTTAAATGATATCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCACAGGCAGA
CTTCAAAGTTTGCAGACATATGTGACTCAACAATTAATTAGAGCTGCAGAAATCAGAGCTTCTGCTAATCTTGCTGCTA
CTAAAATGTCAGAGTGTGTACTTGGACAATCAAAAAGAGTTGATTTTTGTGGAAAGGGCTATCATCTTATGTCCTTCCC
TCAGTCAGCACCTCATGGTGTAGTCTTCTTGCATGTGACTTATGTCCCTGCACAAGAAAAGAACTTCACAACTGCTCCT
GCCATTTGTCATGATGGAAAAGCACACTTTCCTCGTGAAGGTGTCTTTGTTTCAAATGGCACACACTGGTTTGTAACAC
AAAGGAATTTTTATGAACCACAAATCATTACTACAGACAACACATTTGTGTCTGGTAACTGTGATGTTGTAATAGGAAT
TGTCAACAACACAGTTTATGATCCTTTGCAACCTGAATTAGACTCATTCAAGGAGGAGTTAGATAAATATTTTAAGAAT
CATACATCACCAGATGTTGATTTAGGTGACATCTCTGGCATTAATGCTTCAGTTGTAAACATTCAAAAAGAAATTGACC
GCCTCAATGAGGTTGCCAAGAATTTAAATGAATCTCTCATCGATCTCCAAGAACTTGGAAAGTATGAGCAGTATATAAA
ATGGCCATGGTACATTTGGCTAGGTTTTATAGCTGGCTTGATTGCCATAGTAATGGTGACAATTATGCTTTGCTGTATG
ACCAGTTGCTGTAGTTGTCTCAAGGGCTGTTGTTCTTGTGGATCCTGCTGCAAATTTGATGAAGACGACTCTGAGCCAG
TGCTCAAAGGAGTCAAATTACATTACACATAA

SARS-CoV2 variantTrans.docx
Results for 3824 residue sequence starting "TAATGTTTGT".

MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRF
DNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFR
VYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITR
FQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTS
NFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNV
YADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ
AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLT
ESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQGVNCTEVPVAIHADQLTPT
WRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNS
IAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYK
TPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEM
IAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKL
QDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAAT
KMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQ
RNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDR
LNEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPV
LKGVKLHYT*
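
The variant input is reported as 3824 residues and starts with "TAATGTTTGT", whereas the reference gene is 3822 nucleotides long and starts with "ATGTTTGTTT". This is consistent with two extra bases in front of the start codon, so the translation has to begin at the first ATG. The sketch below shows one simple way to find that offset; the function name is hypothetical, and taking the first ATG is a simplifying assumption that is not valid for every sequence.

```python
def trim_to_start_codon(dna):
    """Return (offset, coding_sequence) beginning at the first ATG."""
    offset = dna.upper().find("ATG")
    if offset == -1:
        raise ValueError("no ATG start codon found")
    return offset, dna[offset:]


# First bases of the variant sequence listed above.
variant_start = "TAATGTTTGTT"
offset, coding = trim_to_start_codon(variant_start)
print(offset, coding)
```

Here the offset is 2, so dropping the first two bases of the 3824-base variant leaves 3822 bases, the same length as the reference gene.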
Alignment Comparison Variant.docx
