My research lies in the broad field of protein studies, including protein-peptide interaction and protein structure prediction. My current research focuses on protein-peptide interaction. A self-consistent procedure was developed to accurately estimate the binding affinity between a protein and its ligand. Based on this potential, a flexible protein-peptide docking method was designed that considers the binding flexibility of both the ligand and the protein. These methods are helpful in drug design. Protein modeling is a hot topic in computational science; a related worldwide competition, the Critical Assessment of Techniques for Protein Structure Prediction (CASP), has been held four times. My work has mainly focused on protein tertiary structure prediction and functional protein design, which can be divided into two areas: local protein structure prediction (loop modeling) and rebuilding protein side chains.
 
Loop Modeling
   
Comparative modeling is one of the most powerful methods for predicting protein tertiary structures. It can accurately provide the structure of most regions of a new sequence from its homologous proteins, especially the regular secondary structures and the conserved regions. Loops, however, are a different matter. A loop connects two regular secondary structure elements, often forms the active site of the protein, and in most cases is closely tied to protein function, yet comparative modeling often cannot give reasonable results for it. This is mainly because loops are very flexible and their conformations are too variable to be determined easily. Knowledge-based methods and ab initio methods have been developed to deal with this problem. In our work, we systematically analyzed the conformational distribution of protein loops and provide a detailed loop cluster library. We also provide an efficient loop modeling program that uses Monte Carlo simulated annealing combined with a soft-sphere potential and a constraint potential. (See publications 1, 2, 5, 8, 11, 12, 13, 14)
 
Database of classified loop structures
     
Loops of different sequences and structures on similar scaffolds are common in the Protein Data Bank (PDB). To explore both their structural and sequence diversity, a database of loops connecting similar secondary structure fragments was constructed by searching the database of families of structurally similar proteins (FSSP) and the PDB. A total of 84 loop families with 2 to 13 residues were found among well-determined structures with resolution better than 2.5 Å: 8 α-α, 20 α-β, 19 β-α and 37 β-β families. Every family contains more than 5 loop motifs. Within each family, no two loops share the same sequence, and all the frameworks superimpose well. 43 new loop classes are distinguished in the database. The structural variability of loops in homologous proteins was examined and is shown for 44 families. The loop conformations in each family were clustered into subfamilies using the average linkage cluster analysis method. (See publication 8 and the database therein)
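As an illustration of how average linkage clustering can group loop conformations into subfamilies, here is a minimal sketch in plain Python. It is not the code used to build the database. The function names, the toy coordinates and the 1.0 Å cutoff are assumptions made for the example, and the RMSD helper assumes the loops have already been superimposed on their frameworks.

```python
import math

def rmsd(a, b):
    """RMSD between two equally sized lists of (x, y, z) points.
    Assumes the two loops are already superimposed (no fitting is done)."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))

def average_linkage(items, distance, cutoff):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    average pairwise distance is smallest, until that distance exceeds
    the cutoff. Returns clusters as sorted lists of item indices."""
    n = len(items)
    dist = [[distance(items[i], items[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > cutoff:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Toy data (hypothetical): four two-atom "loops", two near y=0 and two near y=5.
loops = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0)],
    [(0.0, 5.2, 0.0), (1.0, 5.2, 0.0)],
]
clusters = average_linkage(loops, rmsd, cutoff=1.0)
print(clusters)
```

The cutoff plays the role of the subfamily threshold: a smaller value yields more, tighter subfamilies.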
 
Loop modeling algorithm
     
Our lab proposed a Monte Carlo Simulated Annealing (MCSA) method, LPSA (available on the FTP server), which is based on a simplified force field consisting of a soft-sphere potential and a terminal-restricted potential, to model the conformation of a single loop accurately (Zhang et al. 1997, Biopolymers, Vol. 41, pp. 61-72). In our previous study, a modified program with significant improvements, LPCO (available on the FTP server), was presented. It is capable of sampling the entire conformational space and identifying low-energy candidates by cluster analysis and MCSA simulation. High computational speed is achieved with the simplified energy function and a revised grid-mapping method, which made the computation an order of magnitude faster than before. For a single simulation, it takes only four seconds for the simplest four-residue loop and forty-three seconds for the most complex eight-residue loop on a PII-350 Linux platform. Twenty flexible surface loops connecting different secondary structures were selected to test the efficiency of the program. The average deviations of backbone heavy atoms for four- to eight-residue loops are 0.19, 0.27, 0.46, 0.41 and 0.87 Å, respectively. The program MLP (available on the FTP server), which can model multiple loops simultaneously, was also developed on the same principle. (See publication 5)
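The general idea of Monte Carlo simulated annealing with a soft-sphere term and an end restraint can be sketched on a toy model. The code below is not LPSA, LPCO or MLP. It uses a hypothetical 2D bead chain, a quadratic overlap penalty standing in for the soft-sphere potential, a harmonic pull of the last bead toward an anchor standing in for the terminal restriction, and a geometric cooling schedule. All function names and parameter values are assumptions chosen for illustration.

```python
import math
import random

def build_chain(angles, bond=1.0):
    """Place beads in 2D from absolute bond directions; first bead at the origin."""
    pts = [(0.0, 0.0)]
    for a in angles:
        x, y = pts[-1]
        pts.append((x + bond * math.cos(a), y + bond * math.sin(a)))
    return pts

def energy(angles, anchor, radius=0.9, k_end=10.0):
    """Toy energy: soft-sphere overlap penalty plus harmonic end restraint."""
    pts = build_chain(angles)
    e = 0.0
    # Soft-sphere term: finite quadratic penalty for overlapping non-bonded beads.
    for i in range(len(pts)):
        for j in range(i + 2, len(pts)):
            d = math.dist(pts[i], pts[j])
            if d < radius:
                e += (radius - d) ** 2
    # Terminal restraint: the last bead should close onto the anchor point.
    e += k_end * math.dist(pts[-1], anchor) ** 2
    return e

def anneal(n_bonds, anchor, t_start=5.0, t_end=0.01, cooling=0.95,
           sweeps=200, seed=0):
    """Metropolis Monte Carlo with geometric cooling; returns the best
    angles visited and their energy."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_bonds)]
    current = energy(angles, anchor)
    best, best_e = list(angles), current
    t = t_start
    while t > t_end:
        for _ in range(sweeps):
            i = rng.randrange(n_bonds)
            old = angles[i]
            angles[i] = old + rng.gauss(0.0, 0.5)  # perturb one angle
            trial = energy(angles, anchor)
            # Metropolis criterion: always accept downhill moves,
            # accept uphill moves with Boltzmann probability.
            if trial <= current or rng.random() < math.exp((current - trial) / t):
                current = trial
                if current < best_e:
                    best, best_e = list(angles), current
            else:
                angles[i] = old  # reject and restore
        t *= cooling
    return best, best_e

best_angles, best_energy = anneal(5, (3.0, 0.0))
print(round(best_energy, 4))
```

Running several independent simulations with different seeds and clustering the low-energy results would correspond to the sampling-plus-clustering strategy described above.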
 
Comprehensive analysis on the third loop conformations of short chain snake venom neurotoxins
     
The influence of long-range interactions on local structures is an important issue in understanding the protein folding process and protein structural stability. Using short-chain snake venom neurotoxins as a model system, we studied the conformational properties of eight different loop III sequences, either in the environment of one of the short-chain neurotoxins, erabutoxin b (PDB ID 1nxb), or in the free state, by the Monte Carlo simulated annealing method. The surrounding protein structure was found to be crucial in stabilizing the loop conformation. Although all eight peptides prefer the type V β-turn in solution, three of them (KPGI, KPGV, KSGI) switch to the type II β-turn in the protein structure, while the other five (KKGI, KKGV, KNGI, KQGI and KRGV) are confined to a more rigid type V β-turn conformation. Screening the backbone conformational space in the protein environment with a flexible tetraglycine peptide also validates these results. This study shows that long-range interactions do contribute to the stability and the conformation type of a surface loop in a protein, while short-range interactions may only provide candidate conformations, which are then filtered further by the long-range interactions. (See publications 1, 13, 14)
 
Study on interface of protein receptor and its ligand
     
A comprehensive study of protein-protein complexes is useful for designing new structural and functional proteins. The binding of a protein to its ligand can be generalized into two major modes: direct binding through surface loops, or binding associated with hinge movement. We developed two different strategies to study the conformational variation in the binding process. For direct loop binding, a combinatorial conformational library of the loop backbone was built, which includes all possible sub-stable conformations of the loop during the binding procedure and was then screened by rigid docking. Two cases, TNF binding to its receptor and streptavidin binding to its ligand, were chosen to validate the idea. For hinge binding movement, a step-by-step docking method was proposed and applied to the HIV-1 protease and inhibitor system. The proposed methods for studying the binding process of a protein and its ligand will be helpful for protein and drug design. (See publications 2, 11, 12)
   
a
TNF binding to its receptors
     
Tumor necrosis factor (TNF; PDB ID 1TNR) was selected to validate the quasi-flexible docking method based on a conformational library of the loop backbone. TNF-β binds to its receptor as a trimer. Each TNF monomer binds two receptor monomers, and its d-e loop (residues 105 to 110) is closely involved in the association. First, we successfully reproduced the crystal loop structure using our loop program: the RMSD between the global minimum energy conformation (GMEC) and the crystal structure was only 0.03 nm. Based on this result, a flexible polyglycine peptide was introduced to screen out the canonical structures of the loop backbone. With a cutoff of 1.35 Å, five main canonical conformation classes of the TNF loop backbone were found, with conformational densities of 44%, 18%, 16%, 12% and 7%, respectively. The representative conformations were then docked to the receptor using Vakser's docking program GRAMM at a low resolution of 6.8 Å. The results showed that the positions of all five canonical structures approached that in the crystal structure. Finally, side-chain growth and evaluation of the binding energy were performed to screen out the proper sequences. We mainly focused on Tyr 108, which is closely related to the binding mode, and Ser 105, which is basically independent of binding. The third canonical conformation clearly had the lowest binding energy of the five, and its conformation was also similar to the crystal structure. We also found that at position 108 Tyr had a lower energy than the other residues, and Phe, which is similar to Tyr, also had a low energy, while at position 105 there were no notable differences between residues. The results are consistent with protein mutation experiments and support the rationality of our approach. (See publications 11, 12)
   
b
Streptavidin complexes
     
The streptavidin complex was used as a second example to explore the possibility of building a combinatorial conformational library of the loop backbone. Streptavidin has an exposed, flexible loop that can adopt diverse conformations when bound to different ligands. Using the native sequence and unliganded streptavidin, all possible loop conformations were screened and classified into 25 groups at 2.23 Å; these groups were found to include all the native states of the loop in the different complexes. The polyglycine search gave similar results, with 24 conformational classes at 2.23 Å. The results demonstrate that the loop conformations are limited by the restraints of the protein environment, which will be helpful for protein design. (See publication 2)
   
c
HIV-1 protease
       
For hinge binding movement, a step-by-step docking method was proposed and applied to HIV-1 protease and its inhibitor. HIV-1 protease binds its inhibitor as a dimer. The docking proceeds in three steps: the two HIV-1 protease monomers bind to form the dimer, the dimer associates with its inhibitor, and the two flaps close over the active site to form the complete complex. The procedure successfully simulated the open and closed conformational states of the HIV-1 protease dimer, which have similar energies and convert easily from one to the other. It also gave the accurate binding position and orientation of the inhibitor. In addition, the conformations of the two flaps were modeled close to the crystal structure. The step-by-step docking method can therefore simulate the binding process of a protein receptor with its ligand precisely. (See publication 2)
Rebuilding protein side chains and protein sequence design
   
During the protein folding process, side-chain packing is crucial to the stability of the protein. It governs the formation of the hydrophobic core and the active site, and is directly related to structural stability and protein function. In recent years, achievements in the de novo design of new sequences for known protein structures, in protein modeling, and in successful mutation of protein sequences have validated side-chain rebuilding methods and created a demand for more powerful tools. Side-chain rebuilding methods can be classified into two categories. The first is the theoretical search-based modeling method, which gives reasonable results but is computationally demanding and can only be applied to small systems. The second is the rotamer-library-based method. Analysis of side chains in protein structures has shown that most amino acids prefer a few conformations, determined by their side-chain torsion angles, which are correlated with stereochemical and energetic constraints. This method can also give good results, but the results depend heavily on the accuracy and completeness of the library. (See publications 3, 4, 6, 7, 9, 10, 11, 12)
 
Rotamer library
     
The rotamer library has been a powerful tool for handling side-chain conformations in protein structure prediction since Janin et al. and Bhat et al. did the pioneering work in this area. Further work was carried out by Ponder J.W., Richards F.M., Tuffery P., M.D.Maeyer and H.Kono et al., who explored more protein crystal structures. Based on the libraries produced by these groups, we built a new one, which consists mainly of 330 rotamers from M.D.Maeyer's work and also incorporates the results of P.Tuffery and H.Kono. The cutoff deviations of the side-chain torsion angles χ1, χ2, χ3 and χ4 are 10°, 20°, 10° and 10°, respectively, and the cutoff for the total deviation of the four torsion angles is 20°. Glycine and alanine are treated as having only one conformer; proline has two conformers, the up- and down-type prolines, minimized with the AMBER force field. This gave a 525-rotamer library. Because the side chains of a few residues, such as His, Trp, Asn and Gln, have asymmetric planar structures, the rotamer library was enlarged to 549 rotamers. (See publication 4 and the rotamer library in the supplementary materials)
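One plausible reading of the cutoff deviations above is that, when libraries are combined, two rotamers count as the same conformer if every torsion angle agrees within its own cutoff and the summed deviation stays within the total cutoff. The sketch below implements that reading. It is an assumption about how the cutoffs are applied, not the published procedure, and the function names and sample angles are hypothetical.

```python
# Per-angle cutoffs for the first to fourth side-chain torsion angles and the
# total cutoff, in degrees, as quoted in the text.
CHI_CUTOFFS = (10.0, 20.0, 10.0, 10.0)
TOTAL_CUTOFF = 20.0

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees,
    taking the 360-degree periodicity into account."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def same_rotamer(r1, r2):
    """Assumed rule: each torsion deviation is within its own cutoff and
    the sum of the deviations is within the total cutoff."""
    if len(r1) != len(r2):
        return False
    diffs = [angle_diff(a, b) for a, b in zip(r1, r2)]
    if any(d > c for d, c in zip(diffs, CHI_CUTOFFS)):
        return False
    return sum(diffs) <= TOTAL_CUTOFF

def merge_libraries(base, extra):
    """Add to the base library every rotamer from `extra` that does not
    match a rotamer already present."""
    merged = list(base)
    for r in extra:
        if not any(same_rotamer(r, m) for m in merged):
            merged.append(r)
    return merged

# Hypothetical two-torsion rotamers: the first extra entry duplicates the
# base rotamer within tolerance, the second is new.
base = [(-60.0, 180.0)]
extra = [(-65.0, -175.0), (60.0, 60.0)]
print(merge_libraries(base, extra))
```

The periodic difference matters for angles near ±180°, where 180° and -175° are only 5° apart.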
 
Disturbing Genetic Algorithm
     
To overcome the inaccuracy and incompleteness of the rotamer library, the Disturbing Genetic Algorithm (DGA), which incorporates a Disturbing Mutation process (see Figure 1) into the genetic algorithm (GA) flow, was proposed for rebuilding protein side chains. The program SURMG (available on the FTP server) also includes the Growing Generation Amount (GGA) method (see Figure 2), which inherits characteristics of the natural evolution process. In tests that repacked side chains within proteins and at protein-protein interfaces using the Root Mean Square Deviation (RMSD) as a pseudo energy function, the DGA method was found to combine the advantages of the library-based GA method and the ab initio modeling method, and to be superior to the general GA. A real energy function, which consists mainly of conformational energy (including van der Waals energy and electrostatic energy), and its parameters have been developed for side-chain repacking in proteins and at protein-protein interfaces. Reasonable results were obtained when the program was applied to 31 cases, and the average accuracy of the χ1 torsion angles is 80.37% for buried residues. (See publications 4, 9, 10)
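The idea of adding a disturbing mutation to a library-based genetic algorithm can be illustrated with a toy model. The sketch below is not SURMG and does not implement the Growing Generation Amount method. It assumes one torsion angle per residue, a hypothetical three-rotamer library, an RMS angular deviation from a known reference as the pseudo energy, and a disturbing mutation modeled as small Gaussian noise that lets angles move off the library values. All names and parameters are assumptions.

```python
import math
import random

LIBRARY = (-60.0, 60.0, 180.0)  # hypothetical toy rotamer library (degrees)

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def pseudo_energy(ind, native):
    """RMS angular deviation from the reference angles (pseudo energy)."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(ind, native))
                     / len(native))

def evolve(native, pop_size=30, generations=60, p_lib=0.1,
           p_disturb=0.3, sigma=5.0, seed=0):
    """Toy GA with library mutation plus a disturbing mutation.
    Returns the best individual and the best energy per generation."""
    rng = random.Random(seed)
    n = len(native)
    pop = [[rng.choice(LIBRARY) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda ind: pseudo_energy(ind, native))
        history.append(pseudo_energy(pop[0], native))
        parents = pop[: pop_size // 2]
        children = [list(pop[0])]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            for i in range(n):
                if rng.random() < p_lib:
                    child[i] = rng.choice(LIBRARY)    # ordinary library mutation
                if rng.random() < p_disturb:
                    child[i] += rng.gauss(0.0, sigma)  # disturbing mutation
            children.append(child)
        pop = children
    pop.sort(key=lambda ind: pseudo_energy(ind, native))
    history.append(pseudo_energy(pop[0], native))
    return pop[0], history

# Hypothetical reference angles that lie off the library values.
native = [-72.0, 65.0, 171.0]
best, history = evolve(native)
print(round(history[0], 2), round(history[-1], 2))
```

Without the disturbing step the search can only return combinations of library values, so its best possible pseudo energy is bounded by how far the reference lies from the nearest rotamers. The disturbance is what allows the search to move below that bound.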
 
Rationality analysis of newly designed sequences on known folding motifs
     
We revised our Disturbing Genetic Algorithm for protein side-chain rebuilding by adding an energy term to the energy function that describes the propensity of each amino acid for the various types of secondary structure. We then designed 100 new sequences for each of four kinds of folding motifs (α, β, α/β and α+β). Their amino acid compositions were the same as that of the representative crystal sequence, while their homologies ranged from 0 to 100%, and their energies were calculated. The results show that, in general, the energies decrease dramatically as the homology of the designed sequences and the percentage of conserved residues increase. In contrast to the designed sequences, most of the sequences from known crystal structures had relatively lower energies. The energy calculated with our method was found to be suitable for judging how well a sequence fits a particular structural folding motif. Therefore, by retaining the key residues and mutating the remaining residues, we may design new stable proteins that have lower energies than the known sequences and are better suited to particular folding motifs. (See publication 3)
 
Redesign the active sites of hypervariable region of antibody, protein receptor and protein ligand
     
The results on protein loops and on the interface between a protein receptor and its ligand encourage us to go deeper into protein complexes. Based on the loop modeling method of Monte Carlo Simulated Annealing (MCSA) and the side-chain method of the Disturbing Genetic Algorithm (DGA), the hypervariable region of antibodies and the two partners of a protein complex will be studied. We are trying to redesign some active sites to obtain new functional sequences, especially new functional enzymes and protein antagonists. This work is now in progress… (See publications 6, 7, 11, 12)
 
This homepage is supported by Liu Zhijie, E-mail: lzj@paradox.harvard.edu
Post Address: Harvard University, Chemistry Laboratories, 12 Oxford St. Mbox 123, Cambridge MA, 02138, USA
Resident Address: Harvard University, Cambridge MA, 02138, USA
Tel: 1-617-496-4368 (Lab)
 