Loop Modeling

Comparative modeling is one of the most powerful methods for predicting
protein tertiary structures. It can accurately build most regions of a
new sequence from its homologous proteins, especially the regular
secondary structures and the conserved regions. Loops, however, which
connect regular secondary structure elements, often form the active
site and are in most cases closely tied to protein function, and for
them comparative modeling often fails to give reasonable results. This
is mainly because loops are very flexible and their conformations are
too variable to be determined reliably. Knowledge-based and ab initio
methods have been developed to deal with this problem. In our work, we
systematically analyzed the conformational distribution of protein
loops and provide a detailed loop cluster library. We also provide an
efficient loop modeling program that uses Monte Carlo simulated
annealing combined with a soft-sphere potential and a constraint
potential. (See publications 1, 2, 5, 8, 11, 12, 13, 14)

Database of classified loop structures

Loops of different sequences and structures on similar scaffolds are
common in the Protein Data Bank (PDB). To explore both their structural
and sequence diversity, we constructed a database of loops connecting
similar secondary structure fragments by searching the database of
families of structurally similar proteins (FSSP) and the PDB. A total
of 84 loop families with 2 to 13 residues were found among
well-determined structures with resolution better than 2.5 Å: 8 α-α,
20 α-β, 19 β-α and 37 β-β families. Every family contains more than 5
loop motifs. Within each family, no two loops share the same sequence
and all the frameworks superimpose well. 43 new loop classes are
distinguished in the database. The structural variability of loops in
homologous proteins is examined and shown in 44 families. The loop
conformations in each family are clustered into subfamilies using
average linkage cluster analysis. (See publication 8 and the database
in it)
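
The subfamily clustering step can be illustrated with a minimal sketch of average-linkage clustering. This is not the code behind the database: the function name, the `cutoff` stopping parameter and the toy distance matrix are all assumptions made for illustration. The distances stand in for pairwise loop deviations (for example backbone RMSD in Å after superimposing the frameworks).

```python
def average_linkage(dist, cutoff):
    """Agglomerative average-linkage clustering (illustrative sketch).

    dist   -- symmetric matrix (list of lists) of pairwise loop distances
    cutoff -- hypothetical stopping threshold: merging stops once the two
              closest clusters are farther apart than this value
    Returns a sorted list of clusters, each a sorted list of loop indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy distance matrix for five loops (made-up numbers, not database values).
toy = [
    [0.0, 0.4, 0.5, 3.0, 3.2],
    [0.4, 0.0, 0.3, 2.9, 3.1],
    [0.5, 0.3, 0.0, 3.3, 3.0],
    [3.0, 2.9, 3.3, 0.0, 0.6],
    [3.2, 3.1, 3.0, 0.6, 0.0],
]
subfamilies = average_linkage(toy, cutoff=1.0)
print(subfamilies)
```

With this toy matrix the first three loops and the last two loops end up in separate subfamilies; a larger cutoff would merge them into one.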

Loop modeling algorithm

Our lab proposed a Monte Carlo simulated annealing (MCSA) method, LPSA
(available on the FTP server), which uses a simplified force field
consisting of a soft-sphere potential and a terminal restricted
potential to model the conformation of a single loop accurately (Zhang
et al. 1997, Biopolymers, Vol. 41, pp. 61-72). In our previous study we
presented a modified program with significant improvements, LPCO
(available on the FTP server), which is capable of sampling the entire
conformational space and identifying low-energy candidates by cluster
analysis and MCSA simulation. High speed is achieved with the
simplified energy function and a revised grid-mapping method, which
accelerate the computation by an order of magnitude compared with the
earlier program. For a single simulation, it takes only four seconds
for the simplest four-residue loop and forty-three seconds for the most
complex eight-residue loop on a PII-350 Linux platform. Twenty flexible
surface loops connecting different secondary structures were selected
to test the program. The average deviations of backbone heavy atoms for
four- to eight-residue loops are 0.19, 0.27, 0.46, 0.41 and 0.87 Å,
respectively. The program MLP (available on the FTP server), which can
model multiple loops simultaneously, was also developed on the same
principle. (See publication 5)
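
To make the idea of simulated annealing with a soft-sphere term and a terminal restraint concrete, here is a toy sketch. It is not LPSA or LPCO: the 2D bead chain, the quadratic overlap penalty, the harmonic end restraint, the geometric cooling schedule and every parameter value are assumptions chosen only to show the Metropolis acceptance loop.

```python
import math
import random


def build_chain(angles, bond=1.0):
    """Place beads of a 2D chain; each angle is a turn relative to the previous bond."""
    x, y, heading = 0.0, 0.0, 0.0
    beads = [(x, y)]
    for turn in angles:
        heading += turn
        x += bond * math.cos(heading)
        y += bond * math.sin(heading)
        beads.append((x, y))
    return beads


def energy(angles, anchor, contact=0.9, k_end=10.0):
    """Assumed energy: soft-sphere overlap penalty plus harmonic end restraint."""
    beads = build_chain(angles)
    e = 0.0
    for i in range(len(beads)):
        for j in range(i + 2, len(beads)):  # skip directly bonded neighbours
            d = math.dist(beads[i], beads[j])
            if d < contact:
                e += (contact - d) ** 2  # finite ("soft") repulsion
    # Restraint pulling the last bead toward the fixed loop terminus.
    e += k_end * math.dist(beads[-1], anchor) ** 2
    return e


def anneal(n_angles, anchor, steps=4000, t_start=5.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with geometric cooling; returns (initial, best, conformation)."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    current = energy(angles, anchor)
    initial = current
    best_angles, best = list(angles), current
    cooling = (t_end / t_start) ** (1.0 / steps)
    temp = t_start
    for _ in range(steps):
        trial = list(angles)
        trial[rng.randrange(n_angles)] += rng.gauss(0.0, 0.5)  # perturb one angle
        e_trial = energy(trial, anchor)
        delta = e_trial - current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            angles, current = trial, e_trial
            if current < best:
                best_angles, best = list(angles), current
        temp *= cooling
    return initial, best, best_angles


initial_e, best_e, best_conf = anneal(n_angles=6, anchor=(3.0, 1.0))
print(f"initial energy {initial_e:.3f}, best energy {best_e:.3f}")
```

Running several such simulations from different seeds and clustering the low-energy results would mirror, in spirit, the sampling-plus-clustering strategy described above.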

Comprehensive analysis of the third-loop conformations of short-chain snake venom neurotoxins

The influence of long-range interactions on local structure is an
important issue in understanding protein folding and protein structure
stability. Using short-chain snake venom neurotoxins as a model system,
we studied the conformational properties of eight different loop III
sequences, either in the environment of one short-chain neurotoxin,
erabutoxin b (PDB ID 1nxb), or in the free state, by Monte Carlo
simulated annealing. The surrounding protein structure was found to be
crucial in stabilizing the loop conformation. Although all eight
peptides prefer a type V β-turn in solution, in the protein structure
three of them (KPGI, KPGV, KSGI) switch to a type II β-turn, while the
other five (KKGI, KKGV, KNGI, KQGI and KRGV) are confined to a more
rigid type V β-turn conformation. Screening the backbone conformational
space in the protein environment with a flexible tetraglycine peptide
also validates the results. This study shows that long-range
interactions do contribute to the stability and the type of
conformation of a surface loop in a protein, while short-range
interactions may only provide candidate conformations, which then have
to be filtered further by the long-range interactions. (See
publications 1, 13, 14)

Study on the interface between a protein receptor and its ligand

Comprehensive study of protein-protein complexes is useful for
designing new structural and functional proteins. The binding of a
protein to its ligand can be generalized into two major modes: direct
binding through surface loops, or binding associated with hinge
movement. We developed two different strategies to study the
conformational variation in the binding process. For direct loop
binding, a combinatorial conformational library for the backbone
structure of the loop was built, which includes all possible sub-stable
conformations of the loop during binding and was screened by rigid
docking. Two cases, TNF binding to its receptor and streptavidin
binding to its ligands, were chosen to validate the idea. For hinge
binding movement, a step-by-step docking method was proposed and
applied to the HIV-1 protease and inhibitor system. The proposed
methods for studying the binding process of protein and ligand will be
helpful for protein and drug design. (See publications 2, 11, 12)

a. TNF binding to its receptors

Tumor necrosis factor (TNF; PDB ID 1TNR) was selected to validate the
quasi-flexible docking method based on a conformational library of the
loop backbone. TNF-β binds to its receptor as a trimer. Each TNF
monomer binds two receptor monomers, and its d-e loop (residues 105 to
110) is closely involved in binding. First, we successfully reproduced
the crystal loop structure using our loop program: the RMSD between the
global minimum energy conformation (GMEC) and the crystal structure was
only 0.03 nm. Based on this result, a flexible polyglycine peptide was
introduced to screen out the canonical structures of the loop backbone.
With a cutoff of 1.35 Å, five main canonical conformation classes of
the TNF loop backbone were found, with conformational densities of 44%,
18%, 16%, 12% and 7%, respectively. The representative conformations
were then docked to the receptor using Vakser's docking program GRAMM
at a low resolution of 6.8 Å. The results showed that the positions of
all five canonical structures approached that in the crystal structure.
Finally, side-chain growth and binding energy evaluation were performed
to screen out suitable sequences. We focused mainly on Tyr 108, which
is closely related to the binding mode, and Ser 105, which is
essentially independent of binding. The third canonical conformation
clearly had the lowest binding energy of the five, and its conformation
was also similar to the crystal structure. We also found that at
position 108 Tyr had a lower energy than other residues, and Phe, which
is similar to Tyr, also had a low energy, while at position 105 there
were no notable differences between residues. These results are
consistent with protein mutation experiments and support the rationale
of our approach. (See publications 11, 12)

b. Streptavidin complexes

The streptavidin complex was used as a second example to explore the
possibility of building a combinatorial conformational library of the
loop backbone. Streptavidin has an exposed, flexible loop that can
adopt diverse conformations when bound to different ligands. Using the
native sequence and the unbound streptavidin structure, all possible
loop conformations were screened out and classified into 25 groups with
a cutoff of 2.23 Å. These groups were found to include all the native
states of the loop in the different complexes. The same held for the
polyglycine search, which gave 24 conformational classes with a cutoff
of 2.23 Å. The results demonstrate that the loop conformations are
limited by the restraints of the protein environment, which will be
helpful for protein design. (See publication 2)

c. HIV-1 protease

For hinge binding movement, a step-by-step docking method was proposed
and applied to the HIV-1 protease and inhibitor system. HIV-1 protease
binds its inhibitor as a dimer. The docking process has three steps:
two HIV-1 protease monomers bind to form the dimer, the dimer
associates with its inhibitor, and the two flaps close over the active
site to form the complete complex. The procedure successfully simulated
the open and closed conformational states of the HIV-1 protease dimer,
which had similar energies and could easily interconvert. It also
provided an accurate binding position and orientation for the
inhibitor. In addition, the conformations of the two flaps were modeled
close to the crystal structure. The step-by-step docking method can
therefore simulate the binding process of a protein receptor with its
ligand precisely. (See publication 2)

Rebuilding protein side chains and protein sequence design

During protein folding, side-chain packing is crucial to the stability
of the protein. It governs the formation of the hydrophobic core and
the active site, and is directly related to structural stability and
protein function. In recent years, achievements in de novo design of
new sequences for known protein structures, in protein modeling and in
successful mutation of protein sequences have validated side-chain
rebuilding methods and call for more powerful tools. Side-chain
rebuilding methods fall into two categories. The first is the
theoretical search approach, which gives reasonable results but is
computationally demanding and can only be applied to small systems. The
second is the rotamer-library-based approach. Analysis of side chains
in protein structures has shown that most amino acids prefer a few
conformations, defined by their side-chain torsion angles, which
reflect stereochemical and energetic constraints. The library-based
approach can also give good results, but these depend heavily on the
accuracy and completeness of the library. (See publications 3, 4, 6, 7,
9, 10, 11, 12)

Rotamer library

Rotamer libraries have been a powerful tool for handling side-chain
conformations in protein structure prediction since the pioneering work
of Janin et al. and Bhat et al. The work was continued by J. W. Ponder,
F. M. Richards, P. Tuffery, M. De Maeyer and H. Kono et al., who
explored more protein crystal structures. Based on the libraries
produced by these groups, we built a new one, which consists mainly of
330 rotamers from M. De Maeyer's work and also combines the results of
P. Tuffery and H. Kono. The cutoff deviations of the side-chain torsion
angles χ1, χ2, χ3 and χ4 are 10°, 20°, 10° and 10°, respectively, and
the total deviation of the four torsion angles is 20°. Glycine and
alanine are treated as having only one conformer; proline has two
conformers, the up- and down-type prolines minimized with the AMBER
force field. This gave a 525-rotamer library. Because the side chains
of a few residues, such as His, Trp, Asn and Gln, have asymmetric
planar structures, the library was enlarged to 549 rotamers.
(See publication 4 and the rotamer library in the supplementary
materials)
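
One way to read the torsion-angle cutoffs is as a test for whether a rotamer from one source library duplicates a rotamer already present in the combined library. The sketch below shows that reading only; the text does not spell out exactly how the cutoffs were applied, so the merging rule, the function names and the example angles are assumptions.

```python
# Per-angle cutoffs for chi1..chi4 and the total cutoff, in degrees (from the text).
PER_ANGLE_CUTOFF = (10.0, 20.0, 10.0, 10.0)
TOTAL_CUTOFF = 20.0


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def same_rotamer(chi_a, chi_b):
    """Assumed rule: same rotamer if every chi is within its own cutoff
    and the summed deviation is within the total cutoff."""
    diffs = [angle_diff(a, b) for a, b in zip(chi_a, chi_b)]
    if any(d > c for d, c in zip(diffs, PER_ANGLE_CUTOFF)):
        return False
    return sum(diffs) <= TOTAL_CUTOFF


def merge_libraries(base, extra):
    """Add rotamers from `extra` that do not duplicate any rotamer already kept.

    Each rotamer is a (residue_name, chi_tuple) pair.
    """
    merged = list(base)
    for name, chis in extra:
        duplicate = any(
            name == kept_name
            and len(chis) == len(kept_chis)
            and same_rotamer(chis, kept_chis)
            for kept_name, kept_chis in merged
        )
        if not duplicate:
            merged.append((name, chis))
    return merged


# Made-up example rotamers, not entries from the actual library.
base = [("LYS", (-60.0, 180.0, 180.0, 180.0))]
extra = [
    ("LYS", (-65.0, 175.0, 180.0, 180.0)),  # within cutoffs: treated as duplicate
    ("LYS", (60.0, 180.0, 180.0, 180.0)),   # different chi1 well: kept
    ("SER", (-60.0,)),                      # different residue type: kept
]
combined = merge_libraries(base, extra)
print(len(combined))
```

The wrap-around handling in `angle_diff` matters for trans rotamers, where 175° and -175° differ by only 10°.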

Disturbing Genetic Algorithm

To overcome the inaccuracy and incompleteness of rotamer libraries, we
proposed the Disturbing Genetic Algorithm (DGA), which incorporates a
Disturbing Mutation process (see Figure 1) into the genetic algorithm
(GA) flow, for rebuilding protein side chains. The program SURMG
(available on the FTP server) also includes the Growing Generation
Amount (GGA) method (see Figure 2), which inherits the characteristics
of the natural evolution process. When side chains were repacked within
proteins and at protein-protein interfaces using a pseudo energy
function based on root mean square deviation (RMSD), the DGA method was
found to combine the advantages of the library-based GA method and the
ab initio modeling method, and to be superior to the general GA. A real
energy function, which consists mainly of conformational energy
(including van der Waals and electrostatic energy), and its parameters
have been developed for side-chain repacking in proteins and at
protein-protein interfaces. Reasonable results were obtained when the
program was applied to 31 cases, and the average accuracy of the χ1
torsion angle is 80.37% for buried residues. (See publications 4, 9,
10)
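
The following toy sketch shows one way a disturbing mutation could sit inside a genetic algorithm. It is not SURMG: Figures 1 and 2 are not reproduced here, so the operator (a small Gaussian perturbation of a library angle), the three-rotamer χ1 library, the torsion-angle RMSD used as pseudo energy, the elitism step and all parameter values are assumptions, and the GGA method is not modeled.

```python
import math
import random

LIBRARY = (-60.0, 60.0, 180.0)  # toy chi1 rotamers in degrees (illustrative)


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def pseudo_energy(chis, reference):
    """Stand-in pseudo energy: RMS deviation of chi1 angles from a reference."""
    total = sum(angle_diff(c, r) ** 2 for c, r in zip(chis, reference))
    return math.sqrt(total / len(chis))


def evolve(reference, pop_size=30, generations=60, p_mut=0.1,
           p_disturb=0.3, sigma=10.0, seed=1):
    """GA over per-residue chi1 angles with an extra 'disturbing' mutation."""
    rng = random.Random(seed)
    n = len(reference)
    pop = [[rng.choice(LIBRARY) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda ind: pseudo_energy(ind, reference))
        history.append(pseudo_energy(pop[0], reference))
        parents = pop[: pop_size // 2]
        children = [list(pop[0])]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            for i in range(n):
                if rng.random() < p_mut:
                    child[i] = rng.choice(LIBRARY)  # ordinary library mutation
                if rng.random() < p_disturb:
                    # Disturbing mutation: move off the library value so that
                    # conformations missing from the library become reachable.
                    child[i] += rng.gauss(0.0, sigma)
            children.append(child)
        pop = children
    pop.sort(key=lambda ind: pseudo_energy(ind, reference))
    history.append(pseudo_energy(pop[0], reference))
    return pop[0], history


# Made-up reference chi1 angles that deliberately sit off the library values.
reference = (-68.0, 172.0, 55.0, -75.0, 190.0, 63.0, -58.0, 178.0)
best, history = evolve(reference)
print(f"pseudo energy: {history[0]:.2f} -> {history[-1]:.2f}")
```

Setting `p_disturb=0.0` reduces this to a plain library-based GA, which can never get closer to the reference than the nearest library angles allow.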

Rationality analysis of new sequences designed on known folding motifs

We revised our Disturbing Genetic Algorithm for protein side-chain
rebuilding by adding an extra term to the energy function that
describes the propensity of each amino acid for each kind of secondary
structure. Then 100 new sequences were designed for each of four kinds
of folding motifs, α, β, α/β and α+β. Their compositions were the same
as that of the representative crystal sequence, their homologies to it
were distributed between 0 and 100%, and their energies were
calculated. The results show that, in general, the energies decrease
dramatically as the homology of the designed sequences and the
percentage of conserved residues increase. Compared with the designed
sequences, most sequences from known crystal structures had relatively
lower energies. The energy calculated with our method was thus found to
be suitable for judging how well a sequence fits a particular folding
motif. Therefore, by retaining the key residues and mutating the rest,
we may be able to design new stable proteins that have relatively lower
energy than known sequences and are better suited to particular folding
motifs. (See publication 3)

Redesigning the active sites of the hypervariable regions of antibodies, protein receptors and protein ligands

The results on protein loops and on the interface between a protein
receptor and its ligand encourage us to look deeper into protein
complexes. Based on the loop modeling method of Monte Carlo simulated
annealing (MCSA) and the side-chain method of the Disturbing Genetic
Algorithm (DGA), the hypervariable regions of antibodies and the two
partners of protein complexes will be studied. We are trying to
redesign some active sites to obtain new functional sequences,
especially new functional enzymes and protein antagonists. This work is
now in progress. (See publications 6, 7, 11, 12)