Immunoinformatics Notes 2019


IMMUNOINFORMATICS

Every day, humans encounter potentially harmful disease-causing organisms, or “pathogens”, such as
bacteria and viruses. Yet most of us are still able to function properly and live life without
constantly being sick. That’s because the human body relies on a multilayered immune system to keep
it running smoothly. The two main branches of the immune system are the innate immune system and
the adaptive immune system, which provides “acquired immunity”.

Self vs. Non-self: How does the body know?

In order to be effective, the immune system needs to be able to identify which particles are foreign,
and which are a part of your body. Let’s define some terms before we jump into how this works:
• Self refers to particles, such as proteins and other molecules, that are a part of, or made by, your
body. They can be found circulating in your blood or attached to different tissues. Something that
is self should not be targeted and destroyed by the immune system. The non-reactivity of the
immune system to self particles is called tolerance.
• Non-self refers to particles that are not made by your body, and are recognized as potentially
harmful. These are sometimes called foreign bodies. Non-self particles or bodies can be bacteria,
viruses, parasites, pollen, dust, and toxic chemicals. The non-self particles and foreign bodies that
are infectious or pathogenic, like bacteria, viruses, and parasites, make proteins called antigens
that allow the human body to know that they intend to cause damage.
• Antigens are anything that causes an immune response. Antigens can be entire pathogens, like
bacteria, viruses, fungi, and parasites, or smaller proteins that pathogens express. Antigens are like
a name tag for each pathogen that announces the pathogen’s presence to your immune system. Some
antigens are general, whereas others are very specific. A general antigen would announce “I’m
dangerous”, whereas a specific antigen would announce “I’m a bacterium that will cause an infection
in your gastrointestinal tract” or “I’m the influenza virus”.
• Cytokines are molecules that are used for cell signaling, or cell-to-cell communication. Cytokines
can be used to communicate with neighboring or distant cells about initiating an immune response.
Cytokines are also used to trigger cell trafficking, or movement, to a specific area of the body.
• Chemokines are a type of cytokine released by infected cells. Infected host cells release
chemokines in order to initiate an immune response, and to warn neighboring cells of the threat.

Innate Immune System

The innate immune system is made of defenses against infection that can be activated immediately
once a pathogen attacks. The innate immune system is essentially made up of barriers that aim to
keep viruses, bacteria, parasites, and other foreign particles out of your body or limit their ability
to spread and move throughout the body. The innate immune system includes:
• Physical barriers, such as skin, the gastrointestinal tract, the respiratory tract, the
nasopharynx, cilia, eyelashes, and other body hair.
• Defense mechanisms, such as secretions, mucus, bile, gastric acid, saliva, tears, and sweat.
• General immune responses, such as inflammation, complement, and non-specific cellular responses.
The inflammatory response actively brings immune cells to the site of an infection by increasing
blood flow to the area. Complement is an immune response that marks pathogens for destruction and
makes holes in the cell membrane of the pathogen.
The innate immune system is always general, or nonspecific, meaning anything that is identified as
foreign or non-self is a target for the innate immune response. The innate immune system is
activated by the presence of antigens and their chemical properties.
The innate immune system works to fight off pathogens before they can start an active infection.
In some cases, the innate immune response is not enough, or the pathogen is able to exploit the
innate immune response for a way into the host cells. In such situations, the innate immune system
works with the adaptive immune system to reduce the severity of infection, and to fight off any
additional invaders while the adaptive immune system is busy destroying the initial infection.

Adaptive immunity

The adaptive immune system, also called acquired immunity, uses specific antigens to strategically
mount an immune response. Unlike the innate immune system, which attacks based only on the
identification of general threats, adaptive immunity is activated by exposure to pathogens, and
uses an immunological memory to learn about the threat and enhance the immune response
accordingly. The adaptive immune response is much slower to respond to threats and infections
than the innate immune response, which is primed and ready to fight at all times.

Cells of the adaptive immune system

Compared with the innate immune system, the adaptive immune system relies on fewer types of cells
to carry out its tasks: B cells and T cells.
Both B cells and T cells are lymphocytes that are derived from specific types of stem cells, called
multipotent hematopoietic stem cells, in the bone marrow. After they are made in the bone marrow,
they need to mature and become activated. Each type of cell follows a different path to its final,
mature form.

B cells

After formation and maturation in the bone marrow (hence the name “B cell”), the naive B
cells move into the lymphatic system to circulate throughout the body. In the lymphatic system,
naive B cells encounter an antigen, which starts the maturation process for the B cell. B cells each
have one of millions of distinctive surface antigen-specific receptors that are inherent to the
organism’s DNA. For example, naive B cells express antibodies on their cell surface, which can
also be called membrane-bound antibodies.
When a naive B cell encounters an antigen that fits or matches its membrane-bound antibody, it
quickly divides in order to become either a memory B cell or an effector B cell, which is also called
a plasma cell. Antibodies can bind to antigens directly.
The antigen must effectively bind with a naive B cell’s membrane-bound antibody in order to set
off differentiation, or the process of becoming one of the new forms of a B cell.
Memory B cells express the same membrane-bound antibody as the original naive B cell, or the
“parent B cell”. Plasma B cells produce the same antibody as the parent B cell, but it is not
membrane bound. Instead, plasma B cells secrete their antibodies. Secreted antibodies work to
identify free pathogens that are circulating throughout the body. When the naive B cell divides and
differentiates, both plasma cells and memory B cells are made.
B cells also express a specialized receptor, called the B cell receptor (BCR). B cell receptors assist
with antigen binding, as well as internalization and processing of the antigen. B cell receptors also
play an important role in signaling pathways. After the antigen is internalized and processed, the
B cell can initiate signaling pathways, such as cytokine release, to communicate with other cells
of the immune system.

T cells

Once formed in the bone marrow, T progenitor cells migrate to the thymus (hence the name “T
cell”) to mature and become T cells. While in the thymus, the developing T cells start to express T
cell receptors (TCRs) and other receptors called CD4 and CD8 receptors. All T cells express T cell
receptors, and either CD4 or CD8, not both. So, some T cells will express CD4, and others will
express CD8.
Unlike antibodies, which can bind to antigens directly, T cell receptors can only recognize antigens
that are bound to certain receptor molecules, called Major Histocompatibility Complex class 1
(MHCI) and class 2 (MHCII). These MHC molecules are membrane-bound surface receptors
on antigen-presenting cells, like dendritic cells and macrophages. CD4 and CD8 play a role in T
cell recognition and activation by binding to either MHCI or MHCII.
Mature T cells should recognize only foreign antigens combined with self-MHC molecules in
order to mount an appropriate immune response.
• Helper T cells express CD4, and help with the activation of T cells, B cells, and other immune
cells.
• Cytotoxic T cells express CD8, and are responsible for removing pathogens and infected host cells.
• T regulatory cells express CD4 and another receptor, called CD25. T regulatory cells help
distinguish between self and nonself molecules, and by doing so, reduce the risk of autoimmune
diseases.

Humoral vs. Cell Mediated Immunity

Immunity refers to the ability of your immune system to defend against infection and
disease. There are two types of immunity that the adaptive immune system provides, and
they are dependent on the functions of B and T cells, as described above.
• Humoral immunity is immunity from serum antibodies produced by plasma cells. More
specifically, someone who has never been exposed to a specific disease can gain humoral
immunity through administration of antibodies from someone who has been exposed to, and
survived, the same disease. “Humoral” refers to the bodily fluids where these free-floating
serum antibodies bind to antigens and assist with elimination.
• Cell-mediated immunity can be acquired through T cells from someone who is immune to
the target disease or infection. “Cell-mediated” refers to the fact that the response is carried
out by cytotoxic cells.

Immunological memory

Because the adaptive immune system can learn and remember specific pathogens, it can provide
long-lasting defense and protection against recurrent infections. When the adaptive immune system is
exposed to a new threat, the specifics of the antigen are memorized so that we are protected from
getting the disease again. Immune memory is due to the body’s ability to make antibodies against
different pathogens.
A good example of immunological memory is vaccination. A vaccine against a virus can be made
using either active but weakened (attenuated) virus, or specific parts of the virus that are
not active. Neither attenuated whole virus nor viral components can actually cause an active
infection. Instead, they mimic the presence of an active virus in order to cause an immune response,
even though no real threat is present. By getting a vaccination, you expose your body to the
antigen required to produce antibodies specific to that virus, and acquire a memory of the virus,
without experiencing illness.
Some breakdowns in the immunological memory system can lead to autoimmune diseases. Molecular
mimicry of a self-antigen by an infectious pathogen, such as a bacterium or virus, may trigger
autoimmune disease due to a cross-reactive immune response against the infection. One example of an
organism that uses molecular mimicry to hide from immunological defenses is Streptococcus.

Innate Immunity vs. Adaptive Immunity: A summary

The following chart compares and summarizes all of the important parts of each immune system:

Prediction of B-Cell Epitopes


B-cell epitope prediction aims to facilitate B-cell epitope identification with the practical purpose
of replacing the antigen for antibody production or for carrying out structure-function studies. Any
solvent-exposed region in the antigen can be subject to recognition by antibodies. Nonetheless,
B-cell epitopes can be divided into two main groups: linear and conformational. Linear B-cell
epitopes consist of sequential residues (peptides), whereas conformational B-cell epitopes consist
of patches of solvent-exposed atoms from residues that are not necessarily sequential. Therefore,
linear and conformational B-cell epitopes are also known as continuous and discontinuous B-cell
epitopes, respectively. Antibodies recognizing linear B-cell epitopes can recognize denatured
antigens, while denaturing the antigen results in loss of recognition for conformational B-cell
epitopes. Most B-cell epitopes (approximately 90%) are conformational and, in fact, only a minority
of native antigens contain linear B-cell epitopes. We will review the prediction of both linear and
conformational B-cell epitopes.

Prediction of Linear B-Cell Epitopes

Linear B-cell epitopes consist of peptides which can readily be used to replace antigens for
immunizations and antibody production. Therefore, despite being a minority, linear B-cell epitopes
have received major attention in prediction. Linear B-cell epitopes are predicted from the
primary sequence of antigens using sequence-based methods. Early computational methods for the
prediction of B-cell epitopes were based on simple amino acid propensity scales depicting
physicochemical features of B-cell epitopes. For example, Hopp and Woods applied residue
hydrophilicity calculations for B-cell epitope prediction on the assumption that hydrophilic regions
are predominantly located on the protein surface and are potentially antigenic. We know now,
however, that protein surfaces contain roughly the same number of hydrophilic and hydrophobic
residues. Other amino acid propensity scales introduced for B-cell epitope prediction are based
on flexibility, surface accessibility, and β-turn propensity. Currently available bioinformatics
tools to predict linear B-cell epitopes using propensity scales include PREDITOP and PEOPLE.
PREDITOP uses a multiparametric algorithm based on hydrophilicity, accessibility, flexibility,
and secondary structure properties of the amino acids. PEOPLE uses the same parameters and in
addition includes the assessment of β-turns. A related method to predict B-cell epitopes was
introduced by Kolaskar and Tongaonkar, consisting of a simple antigenicity scale derived from
physicochemical properties and frequencies of amino acids in experimentally determined B-cell
epitopes. This index is perhaps the most popular antigenic scale for B-cell epitope prediction, and
it is implemented in the GCG and EMBOSS packages. Comparative evaluations of
propensity scales carried out on a dataset of 85 linear B-cell epitopes showed that most propensity
scales predicted between 50 and 70% of B-cell epitopes, with the β-turn scale reaching the best
values. It has also been shown that combining the different scales does not appear to improve
predictions. Moreover, Blythe and Flower demonstrated that single amino acid propensity
scales cannot reliably predict epitope location.
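The propensity-scale approach described above can be sketched in a few lines. The example below is only an illustration: it averages a per-residue scale over a sliding window and reports the highest-scoring windows. The scale values are the Hopp-Woods hydrophilicity values as they are commonly tabulated, and the window length of 6 and the function names are assumptions made for this sketch, not part of any of the tools named above.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (assumed here for illustration).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def propensity_profile(sequence, scale=HOPP_WOODS, window=6):
    """Return a list of (start_index, mean_score) for every window in the sequence."""
    sequence = sequence.upper()
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean_score = sum(scale[aa] for aa in segment) / window
        profile.append((start, mean_score))
    return profile


def top_windows(sequence, n=3, scale=HOPP_WOODS, window=6):
    """Return the n highest-scoring windows as (start_index, peptide, mean_score)."""
    sequence = sequence.upper()
    ranked = sorted(propensity_profile(sequence, scale, window),
                    key=lambda item: item[1], reverse=True)
    return [(start, sequence[start:start + window], score)
            for start, score in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real antigen.
    for start, peptide, score in top_windows("MKKDEERLLVIFWAGSTNQ", n=3):
        print(start, peptide, round(score, 2))
```

High-scoring windows are only candidate regions. As noted above, single scales of this kind have been shown to be unreliable predictors of epitope location.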
The poor performance of amino acid scales for the prediction of linear B-cell epitopes prompted
the introduction of machine learning (ML)-based methods. These methods are developed by
training ML algorithms to distinguish experimental B-cell epitopes from non-B-cell epitopes. Prior
to training, B-cell epitopes are translated into feature vectors capturing selected properties, such
as those given by different propensity scales. Relevant examples of B-cell epitope prediction
methods based on ML include BepiPred, ABCpred, LBtope, BCPREDS, and SVMtrip. Datasets,
training features, and algorithms used for developing these methods differ. BepiPred is based on
random forests trained on B-cell epitopes obtained from 3D-structures of antigen-antibody
complexes. Both BCPREDS and SVMtrip are based on support vector machines (SVMs), but while
BCPREDS was trained using various string kernels that eliminate the need for representing the
sequence as fixed-length feature vectors, SVMtrip was trained on fixed-length tripeptide
composition vectors. The ABCpred and LBtope methods consist of artificial neural networks (ANNs)
trained on similar positive data (B-cell epitopes) but differ in the negative data (non-B-cell
epitopes). Negative data used for training ABCpred consisted of random peptides, while negative data
used for LBtope was based on experimentally validated non-B-cell epitopes from IEDB. In general,
B-cell epitope prediction methods employing ML algorithms are reported to outperform those based
on amino acid propensity scales. Nevertheless, some authors have reported that ML algorithms
show little improvement over single-scale-based methods.
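To make the idea of translating a peptide into a fixed-length feature vector more concrete, the following sketch computes a tripeptide composition vector, the kind of representation mentioned above for SVMtrip. It is a generic illustration written for these notes and does not reproduce the actual feature pipeline of any published tool; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
TRIPEPTIDE_INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(peptide):
    """Return relative frequencies of the overlapping tripeptides in a peptide."""
    peptide = peptide.upper()
    triplets = [peptide[i:i + 3] for i in range(len(peptide) - 2)]
    if not triplets:
        return {}
    counts = Counter(triplets)
    return {tri: n / len(triplets) for tri, n in counts.items()}


def to_dense_vector(composition):
    """Place the frequencies into a vector of length 8000, whatever the peptide length."""
    vector = [0.0] * len(TRIPEPTIDES)
    for tri, freq in composition.items():
        vector[TRIPEPTIDE_INDEX[tri]] = freq
    return vector


if __name__ == "__main__":
    comp = tripeptide_composition("AAAAC")  # hypothetical toy peptide
    print(comp)
    print(len(to_dense_vector(comp)))
```

Because the vector length depends only on the alphabet, peptides of different lengths map to vectors of the same size, which is what most ML algorithms require as input.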
Antibodies elicited in the course of an immune response are generally of a given isotype that
determines their biological function. A recent advance in B-cell epitope prediction is the
development of a method by Gupta et al. that allows the identification of B-cell epitopes capable
of inducing specific classes of antibodies. This method is based on SVMs trained on a dataset that
includes linear B-cell epitopes known to induce IgG, IgE, and IgA antibodies.

Prediction of Conformational B-Cell Epitopes


Most B-cell epitopes are conformational and yet, prediction of conformational B-cell epitopes has
lagged behind that of linear B-cell epitopes. There are two main practical reasons for that. First of
all, prediction of conformational B-cell epitopes generally requires knowledge of the protein
three-dimensional (3D) structure, and this information is only available for a fraction of proteins.
Secondly, isolating conformational B-cell epitopes from their protein context for selective
antibody production is a difficult task that requires suitable scaffolds for epitope grafting.
Therefore, prediction of conformational B-cell epitopes is currently of little relevance for epitope
vaccine design and antibody-based technologies. Nonetheless, prediction of conformational B-cell
epitopes is interesting for carrying out structure-function studies involving antibody-antigen
interactions.
There are several available methods to predict conformational B-cell epitopes. The first to be
introduced was CEP, which relied almost entirely on predicting patches of solvent-exposed
residues. It was followed by DiscoTope, which, in addition to solvent accessibility, considered
amino acid statistics and spatial information to predict conformational B-cell epitopes. An
independent evaluation of these two methods using a benchmark dataset of 59 conformational
epitopes revealed that they did not exceed 40% precision and 46% recall. Subsequently,
more methods were developed, like ElliPro, which aims to identify protruding regions in antigen
surfaces, and PEPITO and SEPPA, which combine single physicochemical properties of amino acids
and geometrical structure properties. The reported area under the curve (AUC) of these methods
is around 0.7, which is indicative of a poor discrimination capacity yet better than random.
However, in an independent evaluation, SEPPA reached an AUC of 0.62, while all the other mentioned
methods had an AUC around 0.5.
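For readers unfamiliar with the evaluation measures quoted above, the following sketch shows how precision, recall, and AUC can be computed for residue-level epitope predictions. The function names and the toy numbers are made up for illustration and do not come from any of the benchmarks cited.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from sets of predicted and true epitope residues."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. An AUC of 0.5 corresponds to random ranking.
    """
    positives = [s for s, label in zip(scores, labels) if label]
    negatives = [s for s, label in zip(scores, labels) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical residue positions and scores.
    print(precision_recall({1, 2, 3, 4, 5}, {4, 5, 6, 7}))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```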
The above methods for conformational B-cell epitope prediction identify generic antigenic regions
without taking the recognizing antibody into account. However, there are also methods for antibody-specific
epitope prediction. This approach was pioneered by Soga et al. who defined an antibody-specific
epitope propensity (ASEP) index after analyzing the interfaces of antigen-antibody 3D-structures.
Using this index, they developed a novel method for predicting epitope residues in individual
antibodies that worked by narrowing down candidate epitope residues predicted by conventional
methods. More recently, Krawczyk et al. developed EpiPred, a method that uses a docking-like
approach to match up antibody and antigen structures, thus identifying epitope regions on the
antigen.

T-Cell Epitope Prediction


T-cell epitope prediction aims to identify the shortest peptides within an antigen that are able to
stimulate either CD4 or CD8 T-cells. This capacity to stimulate T-cells is called immunogenicity,
and it is confirmed in assays requiring synthetic peptides derived from antigens. There are many
distinct peptides within antigens, and T-cell prediction methods aim to identify those that are
immunogenic. T-cell epitope immunogenicity is contingent on three basic steps: (i) antigen
processing, (ii) peptide binding to MHC molecules, and (iii) recognition by a cognate TCR. Of
these three events, MHC-peptide binding is the most selective one in determining T-cell epitopes.
Therefore, prediction of peptide-MHC binding is the main basis to anticipate T-cell epitopes and
we will review it next.

Prediction of Peptide-MHC Binding


MHC I and MHC II molecules have similar 3D-structures with bound peptides sitting in a groove
delineated by two α-helices overlying a floor comprised of eight antiparallel β-strands. However,
there are also key differences between the MHC I and II binding grooves that we must highlight, as
they condition peptide-binding predictions. The peptide-binding cleft of MHC I molecules is
closed as it is made by a single α chain. As a result, MHC I molecules can only bind short peptides
ranging from 9 to 11 amino acids, whose N- and C-terminal ends remain pinned to conserved
residues of the MHC I molecule through a network of hydrogen bonds. The MHC I peptide-binding
groove also contains deep binding pockets with tight physicochemical preferences that
facilitate binding predictions. There is a complication, however: peptides that have different sizes
and bind to the same MHC I molecule often use alternative binding pockets. Therefore, methods
predicting peptide-MHC I binding require a fixed peptide length. Since most MHC I
peptide ligands have 9 residues, it is generally preferable to predict peptides of that size. In
contrast, the peptide-binding groove of MHC II molecules is open, allowing the N- and C-terminal
ends of a peptide to extend beyond the binding groove. As a result, MHC II-bound peptides vary
widely in length (9–22 residues), although only a core of nine residues (the peptide-binding core)
sits in the MHC II binding groove. Therefore, peptide-MHC II binding prediction methods often
aim to identify these peptide-binding cores. MHC II molecule binding pockets are also shallower
and less demanding than those of MHC I molecules. As a consequence, peptide-binding prediction
for MHC II molecules is less accurate than that for MHC I molecules.
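The practical consequence of these groove differences can be sketched as follows. For MHC I, a protein is typically cut into overlapping peptides of a fixed length (9 residues here), while for MHC II each longer peptide is scanned for all possible 9-residue cores with their flanking residues. The function names are hypothetical and the sequences are toy strings; this is a minimal sketch of the enumeration step only, not of any binding predictor.

```python
def kmers(sequence, k=9):
    """Return all overlapping peptides of length k as (start_index, peptide)."""
    return [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def core_registers(peptide, core_length=9):
    """Return every possible binding-core register of a longer MHC II ligand.

    Each register is (n_terminal_flank, core, c_terminal_flank).
    """
    registers = []
    for start, core in kmers(peptide, core_length):
        registers.append((peptide[:start], core, peptide[start + core_length:]))
    return registers


if __name__ == "__main__":
    protein = "ACDEFGHIKLMNPQR"  # hypothetical toy sequence
    print(len(kmers(protein)), "candidate 9-mers for MHC I prediction")
    for n_flank, core, c_flank in core_registers("ACDEFGHIKLMN"):
        print(n_flank or "-", core, c_flank or "-")
```

A predictor would then score each 9-mer (MHC I) or each candidate core (MHC II); the need to choose among several registers is one reason MHC II prediction is harder.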

Given the relevance of the problem, there are numerous methods to predict peptide-MHC binding.
The most relevant ones that are freely available online are collected in the table below. They can
be divided into two main categories: data-driven and structure-based methods. Structure-based
approaches generally rely on modeling the peptide-MHC structure followed by evaluation of the
interaction through methods such as molecular dynamics simulations. Structure-based methods have
the great advantage of not needing experimental data. However, they are seldom used as they are
computationally intensive and exhibit lower predictive performance than data-driven methods.
Selected T-cell epitope prediction tools available online for free public use.

Tool         Method    MHC class  URL
EpiDOCK      SB        II         http://epidock.ddg-pharmfac.net
MotifScan    SM        I and II   https://www.hiv.lanl.gov/content/immunology/motif_scan/motif_scan
Rankpep      MM        I and II   http://imed.med.ucm.es/Tools/rankpep.html
SYFPEITHI    MM        I and II   http://www.syfpeithi.de/
MAPPP        MM        I          http://www.mpiib-berlin.mpg.de/MAPPP/
PREDIVAC     MM        II         http://predivac.biosci.uq.edu.au/
PEPVAC       MM        I          http://imed.med.ucm.es/PEPVAC/
EPISOPT      MM        I          http://bio.med.ucm.es/episopt.html
Vaxign       MM        I and II   http://www.violinet.org/vaxign/
MHCPred      QSAR      I and II   http://www.ddg-pharmfac.net/mhcpred/MHCPred/
EpiTOP       QSAR      II         http://www.pharmfac.net/EpiTOP
BIMAS        QAM       I          https://www-bimas.cit.nih.gov/molbio/hla_bind/
TEPITOPE     QAM       II         http://datamining-iip.fudan.edu.cn/service/TEPITOPEpan/TEPITOPEpan.html
Propred      QAM       II         http://www.imtech.res.in/raghava/propred/
Propred-1    QAM       I          http://www.imtech.res.in/raghava/propred1/
EpiJen       QAM       I          http://www.ddg-pharmfac.net/epijen/EpiJen/EpiJen.htm
IEDB-MHCI    Combined  I          http://tools.immuneepitope.org/mhci/
IEDB-MHCII   Combined  II         http://tools.immuneepitope.org/mhcii/
IL4pred      SVM       II         http://webs.iiitd.edu.in/raghava/il4pred/index.php
MULTIPRED2   ANN       I and II   http://cvc.dfci.harvard.edu/multipred2/index.php
MHC2PRED     SVM       II         http://www.imtech.res.in/raghava/mhc2pred/index.html
NetMHC       ANN       I          http://www.cbs.dtu.dk/services/NetMHC/
NetMHCII     ANN       II         http://www.cbs.dtu.dk/services/NetMHCII/
NetMHCpan    ANN       I          http://www.cbs.dtu.dk/services/NetMHCpan/
NetMHCIIpan  ANN       II         http://www.cbs.dtu.dk/services/NetMHCIIpan/
nHLApred     ANN       I          http://www.imtech.res.in/raghava/nhlapred/
SVMHC        SVM       I and II   http://abi.inf.uni-tuebingen.de/Services/SVMHC/
SVRMHC       SVM       I and II   http://us.accurascience.com/SVRMHCdb/
NetCTL       ANN       I          http://www.cbs.dtu.dk/services/NetCTL/
WAPP         SVM       I          https://abi.inf.uni-tuebingen.de/Services/WAPP/index_html

Method abbreviations: SB, structure-based; SM, sequence motif; MM, motif matrix; QSAR, quantitative
structure-activity relationship; QAM, quantitative affinity matrix; SVM, support vector machine;
ANN, artificial neural network.

Data-driven methods for peptide-MHC binding prediction are based on peptide sequences that are
known to bind to MHC molecules. These peptide sequences are generally available in specialized
epitope databases such as IEDB, EPIMHC, and AntiJen. Both MHC I and II binding peptides contain
frequently occurring amino acids at particular peptide positions, known as anchor residues.
Therefore, prediction of peptide-MHC binding was first approached using sequence motifs (SMs)
reflecting amino acid preferences of MHC molecules at anchor positions. However, it was soon
shown that nonanchor residues also contribute to the capacity of a peptide to bind to a given MHC
molecule. Subsequently, researchers developed motif matrices (MMs), which could evaluate the
contribution of each and every peptide position to the binding with the MHC molecule. The most
sophisticated form of motif matrices consists of profiles that are similar to those used for detecting
sequence homology. We would like to remark that motif matrices are often mistaken for
quantitative affinity matrices (QAMs) since both produce peptide scores. However, MMs are
derived without taking binding affinity values into consideration and, therefore, the resulting
peptide scores are not suited to address binding affinity. In contrast, QAMs are trained on peptides
and their corresponding binding affinities, and aim to predict binding affinity. The first method
based on QAMs was developed by Parker et al. Subsequently, various approaches were developed to
obtain QAMs from peptide affinity data and predict peptide binding to MHC I and II molecules.
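A motif matrix of the kind described above can be sketched as a position-specific scoring matrix built from aligned binders. The example below is a simplified illustration: the log-odds formulation, the uniform background of 1/20, the pseudocount, and the peptides themselves are assumptions chosen for this sketch, not the procedure of any particular tool. Note that no binding affinities enter the calculation, which is what distinguishes a motif matrix from a QAM.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_motif_matrix(binders, pseudocount=1.0):
    """Build a log-odds matrix (one dict per position) from equal-length binders."""
    length = len(binders[0])
    if any(len(p) != length for p in binders):
        raise ValueError("all binders must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for position in range(length):
        column = [p[position] for p in binders]
        scores = {}
        for aa in AMINO_ACIDS:
            frequency = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log(frequency / background)
        matrix.append(scores)
    return matrix


def score_peptide(matrix, peptide):
    """Sum the per-position scores; higher means more similar to the known binders."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length does not match the matrix")
    return sum(column[aa] for column, aa in zip(matrix, peptide))


if __name__ == "__main__":
    # Hypothetical binders sharing L at position 2 and V at position 9 (toy anchors).
    toy_binders = ["ALAKAAAAV", "GLGDGGGGV", "SLSESSSSV"]
    matrix = build_motif_matrix(toy_binders)
    print(score_peptide(matrix, "ALAKAAAAV"))
    print(score_peptide(matrix, "AWAKAAAAW"))
```

The score ranks peptides by resemblance to the training binders, but, as the text points out, it should not be read as a predicted affinity.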
Researchers have used ML for two distinct problems: the discrimination of MHC binders from
nonbinders and the prediction of binding affinity of peptides to MHC molecules.
For developing discrimination models, ML algorithms are trained on data sets consisting of
peptides that either bind or do not bind to MHC molecules. Relevant examples of ML-based
discrimination models are those based on artificial neural networks (ANNs), support vector
machines (SVMs), and decision trees (DTs). Hidden Markov models (HMMs) can also cope with nonlinear
data and have been used to discriminate peptides binding to MHC molecules. However, unlike the
other ML algorithms, HMMs are trained only on positive data.
With regard to predicting binding affinity, ML algorithms are trained on datasets consisting of
peptides with known affinity to MHC molecules. Both SVMs and ANNs have been used for this
purpose. SVMs were first applied to predict peptide-binding affinity to MHC I molecules and later
to MHC II molecules. Likewise, ANNs were also applied first to the prediction of peptide binding
to MHC I and later to MHC II molecules. Benchmarking of peptide-MHC binding prediction
methods appears to indicate that those based on ANNs are superior to those based on QAMs and
MMs.
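Algorithms such as ANNs and SVMs need numeric input, so peptides must be encoded before training. The sketch below uses one-hot ("sparse") encoding, which is one common choice and is assumed here purely for illustration; the text above does not specify how any given tool encodes its input. It also shows how a labeled data set for a discrimination model (binder = 1, nonbinder = 0) could be assembled. All peptides are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """Encode a peptide as a flat list of 20 binary values per residue."""
    vector = []
    for aa in peptide.upper():
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(aa)] = 1  # raises ValueError for non-standard residues
        vector.extend(row)
    return vector


def build_dataset(binders, nonbinders):
    """Return (X, y) with one encoded peptide per row and labels 1 or 0."""
    features = [one_hot(p) for p in binders + nonbinders]
    labels = [1] * len(binders) + [0] * len(nonbinders)
    return features, labels


if __name__ == "__main__":
    # Hypothetical 9-mers; real training sets come from databases of measured binders.
    X, y = build_dataset(["ALAKAAAAV", "GLGDGGGGV"], ["AWAKAAAAW"])
    print(len(X), len(X[0]), y)
```

For affinity prediction the same encoding can be kept, with the labels replaced by measured affinity values.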
A major complication for predicting T-cell epitopes through peptide-MHC binding models is
MHC polymorphism. In humans, MHC molecules are known as human leukocyte antigens
(HLAs), and there are hundreds of allelic variants of class I (HLA I) and class II (HLA II)
molecules. These HLA allelic variants bind distinct sets of peptides and require specific models
for predicting peptide-MHC binding. However, peptide-binding data is only available for a
minority of HLA molecules. To overcome this limitation, some researchers have developed
pan-MHC-specific methods by training ANNs on input data that combine the MHC residues contacting
the peptide with peptide-binding affinities. These methods are capable of predicting
peptide-binding affinities to uncharacterized HLA alleles.
HLA polymorphism also hampers the development of T-cell epitope-based vaccines with worldwide
coverage, as HLA variants are expressed at vastly variable frequencies in different ethnic groups.
Interestingly, different HLA molecules can also bind similar sets of peptides and researchers have
devised methods to cluster them in groups, known as HLA supertypes, consisting of HLA alleles
with similar peptide-binding specificities.
Identification of promiscuous peptide-binding to HLA supertypes enables the development of
T-cell epitope vaccines with high population coverage using a limited number of peptides. Currently,
several web-based methods allow the prediction of promiscuous peptide-binding to HLA
supertypes for epitope vaccine design, including MULTIPRED and PEPVAC.
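Once binders have been predicted for each allele of a supertype, finding promiscuous peptides is a matter of counting in how many alleles each peptide appears. The sketch below assumes the per-allele predictions are already available as lists; the allele labels and peptides are placeholders, and the function name is hypothetical rather than taken from MULTIPRED or PEPVAC.

```python
def promiscuous_peptides(binders_by_allele, min_alleles=2):
    """Return peptides predicted to bind at least min_alleles alleles.

    binders_by_allele maps an allele name to an iterable of predicted binders.
    The result maps each promiscuous peptide to the sorted list of alleles it binds.
    """
    alleles_per_peptide = {}
    for allele, peptides in binders_by_allele.items():
        for peptide in set(peptides):
            alleles_per_peptide.setdefault(peptide, set()).add(allele)
    return {
        peptide: sorted(alleles)
        for peptide, alleles in alleles_per_peptide.items()
        if len(alleles) >= min_alleles
    }


if __name__ == "__main__":
    # Placeholder allele names and hypothetical peptides.
    predictions = {
        "allele_1": ["ALAKAAAAV", "GLGDGGGGV"],
        "allele_2": ["ALAKAAAAV", "SLSESSSSV"],
        "allele_3": ["ALAKAAAAV", "GLGDGGGGV"],
    }
    print(promiscuous_peptides(predictions, min_alleles=3))
```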

Prediction of Antigen Processing and Integration with Peptide-MHC Binding Prediction

Antigen processing shapes the peptide repertoire available for MHC binding and is a limiting step
determining T-cell epitope immunogenicity. Consequently, computational modeling of the antigen
processing pathway provides a means to enhance T-cell epitope predictions. Antigen presentation
by MHC I and II molecules proceeds by two different pathways. MHC II molecules present peptide
antigens derived from endocytosed antigens that are degraded and loaded onto the MHC II molecule
in endosomal compartments. Class II antigen degradation is poorly understood, and good prediction
algorithms are still lacking. In contrast, MHC I molecules present peptides derived mainly
from antigens degraded in the cytosol by the proteasome. The resulting peptide antigens are then
transported to the endoplasmic reticulum by TAP, where they are loaded onto nascent MHC I molecules.
Prior to loading, peptides often undergo trimming by ERAAP N-terminal aminopeptidases.
Proteasomal cleavage and peptide-binding to TAP have been studied in detail and there are
computational methods that predict both processes. Proteasomal cleavage prediction models have
been derived from peptide fragments generated in vitro by human constitutive proteasomes and
from sets of MHC I-restricted ligands mapped onto their source proteins. On the other hand, TAP
binding prediction methods have been developed by training different algorithms on peptides of
known affinity to TAP. Combining proteasomal cleavage and TAP-binding predictions with
peptide-MHC binding predictions increases the T-cell epitope predictive rate in comparison to
peptide-MHC I binding predictions alone. Consequently, researchers have developed resources to
predict CD8 T-cell epitopes through multistep approaches integrating proteasomal cleavage, TAP
transport, and peptide-binding to MHC molecules.
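One simple way to integrate the three steps is a weighted sum of the individual prediction scores. The sketch below assumes that the per-peptide scores for MHC binding, proteasomal cleavage, and TAP transport have already been produced by separate predictors and are on comparable scales. The weights and the scores are arbitrary placeholders chosen for illustration, not values taken from any published method.

```python
def combined_score(mhc, cleavage, tap, w_cleavage=0.1, w_tap=0.05):
    """Weighted sum of three prediction scores (weights are illustrative placeholders)."""
    return mhc + w_cleavage * cleavage + w_tap * tap


def rank_candidates(candidates, w_cleavage=0.1, w_tap=0.05):
    """Sort candidate peptides from highest to lowest combined score.

    Each candidate is a dict with keys 'peptide', 'mhc', 'cleavage', and 'tap'.
    """
    return sorted(
        candidates,
        key=lambda c: combined_score(c["mhc"], c["cleavage"], c["tap"],
                                     w_cleavage, w_tap),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical peptides with made-up scores.
    candidates = [
        {"peptide": "ALAKAAAAV", "mhc": 0.80, "cleavage": 0.2, "tap": 0.1},
        {"peptide": "GLGDGGGGV", "mhc": 0.75, "cleavage": 0.9, "tap": 0.9},
    ]
    for c in rank_candidates(candidates):
        print(c["peptide"])
```

In this toy case the second peptide overtakes the first once processing is considered, even though its MHC binding score alone is lower.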

Population Coverage Tool


The Population Coverage Tool can be found at
http://tools.immuneepitope.org/main/html/analysis_tools.html.

T cells recognize a complex between a specific MHC type and a particular pathogen-derived
epitope and thus a given epitope will elicit a response only in individuals that express an MHC
molecule capable of binding that particular epitope. MHC molecules are extremely polymorphic
(over a thousand different variants are known in humans). Therefore, selecting multiple peptides
with different MHC binding specificities will afford increased coverage of the patient population
targeted as vaccine recipients. The issue of population coverage in relation to MHC polymorphism
is further complicated by the fact that different MHC types are expressed at dramatically different
frequencies in different ethnicities. Thus, without careful consideration, a vaccine with ethnically
biased population coverage could result. To address this issue, the actual/predicted binding
capacity of potential epitopes to as many different MHC molecules as possible (and when available,
also restriction data of T cell responses recognizing the epitope) can be used to project the
population coverage in different ethnicities of different vaccine candidates or epitope sets.
Accordingly, epitope-based vaccines or diagnostics can be designed to maximize population
coverage, while minimizing complexity (that is, the number of different epitopes included in the
diagnostic or vaccine), and also minimizing the variability of coverage obtained or projected in
different ethnic groups.
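
As a rough illustration of how coverage can be projected from allele frequencies, the Python sketch below makes a simplifying assumption: a single MHC locus in Hardy-Weinberg proportions, so that an individual carries two independently drawn alleles. The function name and all frequencies are made up for the example and do not describe the tool's actual implementation.

```python
def population_coverage(allele_freqs, covered_alleles):
    """Fraction of individuals carrying at least one covered allele.

    Assumes a single MHC locus in Hardy-Weinberg proportions: each person
    carries two alleles drawn independently, so the chance of carrying no
    covered allele is (1 - p) ** 2, where p is the summed frequency of the
    covered alleles.
    """
    p = sum(f for allele, f in allele_freqs.items() if allele in covered_alleles)
    return 1.0 - (1.0 - p) ** 2


# Made-up allele frequencies for two hypothetical populations.
population_a = {"A*02:01": 0.30, "A*01:01": 0.15, "A*24:02": 0.10}
population_b = {"A*02:01": 0.10, "A*01:01": 0.05, "A*24:02": 0.35}

epitope_set = {"A*02:01", "A*01:01"}  # alleles bound by the chosen epitopes
cov_a = population_coverage(population_a, epitope_set)
cov_b = population_coverage(population_b, epitope_set)
print(round(cov_a, 4), round(cov_b, 4))
```

With these numbers the same epitope set covers the two populations very unevenly, which is the kind of ethnic bias the text warns about.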

Epitope Conservancy Tool


The Epitope Conservancy Tool can be found at
http://tools.immuneepitope.org/main/html/analysis_tools.html.

In a diagnostic or epitope-based vaccine setting, focusing on conserved epitopes allows for
targeting responses around pathogen variability, whether it exists prior to infection, or develops in
the natural course of disease. The use of conserved epitopes would be expected to focus the
immune response on sequences crucial for retaining biological function of the pathogen proteins,
and thus with intrinsically lower variability, even under immune pressure. The epitope
conservancy analysis tool implemented here aims to address the issue of variability (or
conservation) of epitopes, and to assist in the selection of epitopes with the desired pattern of
conservation. The algorithm calculates the degree of conservancy of an epitope within a given
protein sequence set at different identity levels. The degree of
conservation is defined as the number of protein sequences that contain the epitope at a given
identity level, divided by the total number of protein sequences found in the dataset analyzed.
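
The definition above can be written out as a short Python sketch. It assumes ungapped comparison of the epitope against every window of the same length in each protein; the function names and the toy sequences are hypothetical and only illustrate the ratio being computed.

```python
def best_identity(epitope, protein):
    """Highest fraction of matching positions over all ungapped windows."""
    n = len(epitope)
    best = 0.0
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / n)
    return best


def conservancy(epitope, proteins, identity=1.0):
    """Fraction of proteins containing the epitope at >= the identity level."""
    if not proteins:
        return 0.0
    hits = sum(best_identity(epitope, p) >= identity for p in proteins)
    return hits / len(proteins)


# Toy sequence set: two exact matches, one single substitution, one unrelated.
variants = [
    "MKTAYIAKQRQISFVK",
    "MKTAYIAKQRQLSFVK",
    "MKTGGGGGGGGGSFVK",
    "MKTAYIAKQRQISFVK",
]
epitope = "AKQRQISFV"
print(conservancy(epitope, variants, identity=1.0))
print(conservancy(epitope, variants, identity=0.8))
```

Lowering the identity level from 100% to 80% counts the variant with one substitution as containing the epitope, so the conservancy rises.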

VACCINE
The word ‘vaccination’ was used for the first time by Edward Jenner in 1796 to describe the injection
of smallpox vaccine. Louis Pasteur developed the concept through his innovative work in
microbiology. Now, vaccination is the administration of antigenic agents applied to stimulate the
immune system of an individual and to develop adaptive immunity to a disease. Vaccines can
ameliorate, or often even prevent, the effects of infection. Vaccination is generally considered to
be the most effective method of preventing infectious diseases, and the efficacy of vaccination
has been extensively studied and verified. The administration of some vaccines is conducted after
the patient has already been infected by the pathogen. Vaccination conducted after exposure to
smallpox, within the first 3 days, is reported to attenuate the disease considerably, and
administration up to a week after exposure is able to provide some protection from disease, or may
ease its severity. Also, a multi-stage tuberculosis vaccine has recently been developed to confer
protection after exposure to the pathogen. There are numerous vaccine examples, including
experimental ones against AIDS, cancer and Alzheimer's disease. The core mechanism behind all
the vaccinations is the ability of the vaccine to initiate an immune response in a quicker fashion
than the pathogen itself.
The purpose of every vaccination is to present a particular antigen or set of antigens to the immune
system in order to evoke a relevant immune response. The main active component of a vaccine
may be the whole pathogen, either inactivated or attenuated (bacteria or viruses), or purified
components of the pathogen that are known to induce an immune reaction.
Types of vaccines
1. Inactivated vaccines
This type of vaccine consists of virus particles grown in cell culture and inactivated by applying
high temperature or chemicals such as formaldehyde. The viral particles are unable to replicate
because they are destroyed, but the capsid proteins of the virus have remained intact enough to be
recognized and used by the immune system in order to induce a response. If properly produced,
the vaccine is not a threat; however, if the inactivation is not performed successfully, active
infectious particles can be administered together with the vaccine. Additional booster shots are
often needed in order to secure the immune response, because the properly produced vaccine
cannot reproduce inside the host.
2. Live attenuated vaccines
The attenuated vaccines contain live virus particles with low levels of virulence. They have
retained their ability to slowly reproduce, and thus they remain a continuous source of antigen for
a certain period after the first vaccination, reducing the need of booster shots to keep the antigen
levels sufficiently high. Such vaccines are produced by passaging the virus in cell cultures, in
animals, or at suboptimal temperatures, which allows selection of less virulent strains, or by
mutagenesis or targeted deletion of genes required for virulence.
3. Subunit vaccines
Subunit vaccines use only the antigenic components that best stimulate the immune system, instead
of dealing with the entire micro-organism. The fact that the subunit vaccine content is mainly
represented by the essential antigens reduces the chances of adverse reactions to the vaccine. A
subunit vaccine introduces an antigen to the immune system without involving any viral particles.
The number of antigens in a subunit vaccine can range from 1 to 20 or more. Of course, the
identification of the most promising antigens to stimulate the immune system is often a
time-consuming process, and can be very difficult. Subunit vaccines are often known for causing weaker
antibody responses in comparison with the other vaccine classes. One of the most successful
subunit vaccines is the hepatitis B vaccine containing the surface antigen HBsAg.
4. Virus-like particles
Virus-like particle (VLP) vaccines are composed only of viral proteins that take part in the
assembly of the virus structure. These proteins can self-assemble into particles resembling the
virus from which they were derived without the presence of the viral nucleic acid, which makes
them non-pathogenic. In contrast to subunit vaccines, VLPs usually have higher
immunogenicity owing to their multivalent and highly repetitive structure. VLPs have been
produced from a broad range of viruses, including members of the Retroviridae, Flaviviridae and
Parvoviridae families. Vaccines against viruses such as human papillomavirus and hepatitis B are
VLP-based vaccines that are currently in clinical use. Additionally, a pre-clinical vaccine against
chikungunya virus was developed based on the same approach. VLPs are typically produced in a
variety of cell cultures, such as mammalian cell lines, insect cell lines, and plant and yeast cells.
5. Toxoid vaccines
The toxoid vaccines are a typical solution for bacteria that secrete harmful metabolites or toxins. It
is common to use them when the main reason for discomfort or sickness is a bacterial toxin. Such
toxoid vaccines are produced by treating the toxins with formalin, thus inactivating them, and still
retaining their structure for further recognition by the immune system. Examples of toxoid
vaccines are the vaccines against diphtheria and tetanus.
6. DNA vaccines
DNA vaccination is a very new approach for induction of humoral and cellular immune responses
to protein antigens by administering genetically engineered DNA. The majority of DNA vaccines
are still in the experimental stage, and have been tested in numerous viral, bacterial and parasitic
models of disease, and also in a few tumour models. DNA vaccines represent an innovative
approach for immunization, bringing a number of advantages over conventional vaccines and
giving the possibility of inducing a broader variety of immune response types. The risks of DNA
vaccines are limited. Several groups demonstrated that cancer vaccines can be effective for the
induction of specific immunity against cancer-associated antigens without negative side effects
like integration of plasmid DNA into the host genomes or induction of pathogenic anti-DNA
antibodies.
7. Peptide vaccines
The improved knowledge of antigen recognition at molecular level has contributed to the
development of rationally designed peptide vaccines. The general idea behind the peptide vaccines
is based on the chemical approach to synthesize the identified B-cell and T-cell epitopes that are
immunodominant and can induce specific immune responses. A B-cell epitope of a target molecule
can be conjugated with a T-cell epitope to make it immunogenic. The first epitope-based vaccine
was created in 1985 by Jacob et al., who used recombinant DNA to express epitopes
against cholera in Escherichia coli. Epitope-based vaccines can be constructed for T and B
lymphocytes. The T-cell epitopes are typically peptide fragments, whereas the B-cell epitopes can
be proteins, lipids, nucleic acids or carbohydrates. Peptides have become desirable vaccine
candidates owing to their comparatively easy production and construction, chemical stability, and
absence of infectious potential. Peptide vaccines against various cancers have been developed
and have entered phase I and phase II clinical trials, with satisfactory clinical outcomes. Peptide
vaccination is commonly being studied for application in both ameliorating and prophylactic
immunotherapy. Yet there is more to be improved in order to eliminate obstacles, such as the need
for a better adjuvant and carrier or the low immunogenicity. Nonetheless, current efforts are
showing much promise in overcoming these limitations and providing improvements for this approach.
Reverse vaccinology
Reverse vaccinology is an improvement on vaccinology that employs bioinformatics, pioneered by
Rino Rappuoli and first used against Serogroup B meningococcus. Since then, it has been used for
several other bacterial vaccines.
The basic idea behind reverse vaccinology is that an entire pathogenic genome can be screened
using bioinformatics approaches to find genes. The genes are examined for traits that may indicate
antigenicity, such as coding for proteins with extracellular localization, signal peptides, or
B-cell epitopes. Next, those genes are filtered for desirable attributes that would make good
vaccine targets, such as encoding outer membrane proteins. Once the candidates are
identified, they are produced synthetically and screened in animal models of the infection.
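
The screening and filtering steps just described can be sketched in Python as a simple filter over annotated proteins. This is only an illustration: the `Candidate` fields, the protein names, and the annotations are hypothetical, and the human-homolog flag anticipates the self-antigen screening described later in these notes. In practice each annotation would come from a separate prediction program.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Annotations for one predicted protein (all values hypothetical)."""
    name: str
    localization: str         # e.g. "outer membrane", "cytoplasm"
    has_signal_peptide: bool
    human_homolog: bool       # True if similar to a human protein


SURFACE = {"extracellular", "outer membrane"}


def select_candidates(proteins):
    """Keep surface-exposed or secreted proteins without human homologs."""
    return [
        p for p in proteins
        if (p.localization in SURFACE or p.has_signal_peptide)
        and not p.human_homolog
    ]


# Made-up annotations for four proteins of an imaginary genome.
genome = [
    Candidate("protA", "outer membrane", True, False),
    Candidate("protB", "cytoplasm", False, False),
    Candidate("protC", "extracellular", True, True),
    Candidate("protD", "periplasm", True, False),
]
selected = [p.name for p in select_candidates(genome)]
print(selected)
```

Here protB is dropped because it is neither surface-exposed nor secreted, and protC is dropped because of its similarity to a human protein.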
After Craig Venter published the genome of the first free-living organism in 1995, the genomes of
other microorganisms became more readily available throughout the end of the twentieth century.
Reverse vaccinology, designing vaccines using the pathogen’s sequenced genome, came from this
new wealth of genomic information, as well as technological advances. Reverse vaccinology is
much more efficient than traditional vaccinology, which requires growing large amounts of
specific microorganisms as well as extensive wet lab tests.
In 2000, Rino Rappuoli and the J. Craig Venter Institute developed the first vaccine using Reverse
Vaccinology, against Serogroup B meningococcus. The J. Craig Venter Institute and others then
continued work on vaccines for group A Streptococcus, group B Streptococcus, Staphylococcus
aureus, and Streptococcus pneumoniae.
Reverse Vaccinology with Meningococcus B
Attempts at reverse vaccinology first began with Meningococcus B (MenB). Meningococcus B
caused over 50% of meningococcal meningitis, and scientists had been unable to create a
successful vaccine for the pathogen because of the bacterium’s unique structure. This bacterium’s
polysaccharide shell is identical to that of a human self-antigen, but its surface proteins vary
greatly, and the lack of information about the surface proteins made developing a vaccine
extremely difficult. As a result, Rino Rappuoli and other scientists turned towards bioinformatics
to design a functional vaccine.
Rappuoli and others at the J. Craig Venter Institute first sequenced the MenB genome. Then, they
scanned the sequenced genome for potential antigens. They found over 600 possible antigens,
which were tested by expression in Escherichia coli. The most universally applicable antigens were
used in the prototype vaccines. Several proved to function successfully in mice; however, these
proteins alone did not induce an immune response in humans strong enough to confer protection.
Outer membrane vesicles, which contain lipopolysaccharides and are obtained by purifying blebs
from Gram-negative cultures, were therefore added. The addition of this adjuvant (previously
identified by using conventional vaccinology approaches) enhanced the immune response to the
level that was required. Later, the vaccine was proven to be safe and effective in adult humans.
Subsequent Reverse Vaccinology Research
During the development of the MenB vaccine, scientists adopted the same Reverse Vaccinology
methods for other bacterial pathogens. Vaccines against group A Streptococcus and group B
Streptococcus were two of the first created by Reverse Vaccinology. Because those bacteria induce
antibodies that react with human antigens, their vaccines must not contain proteins homologous to
those encoded in the human genome, in order to avoid adverse reactions; this established the need
for genome-based Reverse Vaccinology.
Later, Reverse Vaccinology was used to develop vaccines for antibiotic-resistant Staphylococcus
aureus and Streptococcus pneumoniae.
Pros and cons
The major advantage of reverse vaccinology is finding vaccine targets quickly and efficiently.
Traditional methods may take decades to unravel pathogens and antigens, diseases and immunity.
In silico approaches, however, can be very fast, allowing new vaccine candidates to be identified
for testing in only a few years. The downside is that only proteins can be targeted using this
process, whereas conventional vaccinology approaches can find other biomolecular targets such as
polysaccharides.
Available Software
Though using bioinformatic technology to develop vaccines has become typical in the past ten
years, general laboratories often do not have the advanced software that can do this. However,
there are a growing number of programs making reverse vaccinology information more accessible.
NERVE is one relatively new data-processing program. Though it must be downloaded and does
not include all epitope predictions, it does help save some time by combining the computational
steps of reverse vaccinology into one program. Vaxign, an even more comprehensive program,
was created in 2008. Vaxign is web-based and completely public-access.
Though Vaxign has been found to be extremely accurate and efficient, some scientists still utilize
the online software RANKPEP for peptide binding predictions. Both Vaxign and RANKPEP
employ PSSMs (Position Specific Scoring Matrices) when analyzing protein sequences or
sequence alignments.
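
To show what scoring with a PSSM involves, here is a minimal Python sketch. The matrix is a toy example for 3-residue peptides with made-up values, and the default penalty is an assumption; it does not reproduce the matrices or code of Vaxign or RANKPEP.

```python
# Toy PSSM for 3-residue peptides: one dict of residue scores per position.
# Real matrices cover all 20 amino acids and are built from aligned binders.
PSSM = [
    {"A": 1.0, "L": 2.0, "K": -1.0},
    {"A": 0.5, "L": -0.5, "K": 1.5},
    {"A": -1.0, "L": 3.0, "K": 0.0},
]
DEFAULT = -2.0  # assumed penalty for residues missing from the toy matrix


def pssm_score(peptide, pssm=PSSM):
    """Sum the position-specific score of each residue in the peptide."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the matrix length")
    return sum(col.get(res, DEFAULT) for col, res in zip(pssm, peptide))


def best_windows(protein, pssm=PSSM, top=2):
    """Score every window of the protein and return the top-scoring ones."""
    n = len(pssm)
    windows = [protein[i:i + n] for i in range(len(protein) - n + 1)]
    return sorted(windows, key=lambda w: pssm_score(w, pssm), reverse=True)[:top]


print(pssm_score("LKL"))
print(best_windows("AALKLA"))
```

Sliding the matrix along a protein and keeping the highest-scoring windows is the basic way a PSSM is used to propose candidate binding peptides.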
Computer-aided bioinformatics projects are becoming extremely popular, as they help guide
laboratory experiments.
Other Developments because of Reverse Vaccinology and Bioinformatics
Reverse vaccinology has caused an increased focus on pathogenic biology.
Reverse vaccinology led to the discovery of pili in Gram-positive pathogens such as group A
Streptococcus, group B Streptococcus, and pneumococcus. Previously, Gram-positive bacteria were
thought not to have pili.
Reverse vaccinology also led to the discovery of factor H binding protein in meningococcus, which
binds to complement factor H in humans. Binding to complement factor H allows
meningococcus to grow in human blood while blocking the alternative complement pathway. This
model does not fit many animal species, which do not have the same complement factor H as
humans, indicating that meningococcus interacts differently with different host species.

Antigens available
Traditional Vaccinology: 10-25 identified by biochemical or genetic tools, such as knocking out
the important genes.
Reverse Vaccinology: Virtually all antigens encoded by the genome are available and could be
sifted using computer algorithm. However, this could result in a data overload, which makes
choosing the right target more challenging.

Property of antigens
Traditional Vaccinology: The most abundant antigens, the most immunogenic during disease, only
from cultivable microorganisms.
Reverse Vaccinology: All antigens are available, even if not highly immunogenic during disease.
Antigens from noncultivable microorganisms can be identified.

Immunology of the antigens
Traditional Vaccinology: Highly immunogenic antigens, often variable in sequence, because of
immune selective pressure. Some may contain domains mimicking self-antigens and may induce
autoimmunity.
Reverse Vaccinology: The most conserved protective antigens can be identified. Usually these are
not the most immunogenic during infection. The novel antigens are screened against the human
genome, and antigens with homology to self-antigens are removed upfront, allowing to overcome
the bottleneck where stimulating the immune response could give the wrong outcome.

Polysaccharide antigens
Traditional Vaccinology: A major target of traditional bacterial vaccines.
Reverse Vaccinology: Cannot be identified by reverse vaccinology; however, operons coding for
the biosynthesis of polysaccharides can be identified. This can lead to discovery of novel
carbohydrate antigens.

T cell epitopes
Traditional Vaccinology: Known epitopes limited to the known antigens.
Reverse Vaccinology: Virtually every single T cell epitope is available. Screening of the total T
cell immunity can be done by overlapping peptides.
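
The overlapping-peptide screening mentioned above for T cell epitopes starts by tiling each protein into short peptides. The Python sketch below shows one way to do that; the function name and the default of 15-residue peptides overlapping by 10 are assumptions chosen for illustration, not values given in these notes.

```python
def overlapping_peptides(protein, length=15, overlap=10):
    """Split a protein into fixed-length peptides that overlap by `overlap`."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap
    peptides = []
    for start in range(0, max(len(protein) - length, 0) + 1, step):
        peptides.append(protein[start:start + length])
    # Add a final peptide if the tiling did not reach the C-terminus.
    last_start = len(protein) - length
    if last_start > 0 and last_start % step != 0:
        peptides.append(protein[last_start:])
    return peptides


# Toy sequences with short peptides so the tiling is easy to follow.
print(overlapping_peptides("ABCDEFGHIJ", length=4, overlap=2))
print(overlapping_peptides("ABCDEFGHIJK", length=4, overlap=1))
```

Because consecutive peptides share residues, any epitope shorter than the overlap plus one is guaranteed to appear whole in at least one peptide.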
