Using the PELE Web Server

You have to sign up with the server to be able to submit jobs and to have records of all your submitted jobs in one place. One advantage of this way is that you may gain access to more resources (more CPUs per jobs, longer wall-clocks, or even additional compute clusters) later by contacting the server administrator. For submitting in this way, you must first register with the server, then you will be able to login. Upon login, you will be automatically directed to your jobs page. From there (and also from submit page), you can submit new jobs. But all the jobs submitted while you were logined, are only accessible by going to your jobs page and not with bookmarks.

As input, the only mandatory file that must be provided by the user is the structure(s) in PDB format (with all hydrogen atoms added). Then a control file can be made based on the selected ready-made script or uploaded by the user. You can find more information about the format of inputs and different running parameters below.

Requirements (input files)

The user needs to have a PDB file for the structure(s), including all hydrogens. The server does not modify any protonation state or add any ligand atoms (i.e. the user should know the system and decide the best protonation state).

Additionally, in order to recognize the right protonation state, the user should change the name of some residues in the PDB file:

HIS/HIE/HIP (for delta, epsilon or both protonated histidines)

LYN/ARN (neutral lysine or argenine)

ASH/GLH (protonated aspartic or glutamic)

CYT/CYX (deprotonated negative/neutral cysteine)

Water residues (if present) should be named SPC or HOH and defined as HETATM. For example:

HETATM 4957  OW  HOH A 605      -1.397   5.896   2.606  1.00 0.00            O
HETATM 4958  HW1 HOH A 606      -1.036   6.564   1.955  1.00 0.00            H
HETATM 4959  HW2 HOH A 607      -2.357   6.099   2.795  1.00 0.00            H

If a ligand is present (checking the ligand box), the server will automatically create a template for the ligand, assigning bond order and charges (OPLS-2005 charges).

In near future, all these assignment will be performed interactively with a protein viewer.


Control files and ready-made scripts

There are several optimized scripts to perform routine tasks. By selecting any of them, only few parameters should be entered. More information about each script can be found in the submit page. Hovering pointer over each option shows a descriptive text about its purpose.

The last two scripts are actually special cases:

  • Use an uploaded control file. This option allows you to upload a ready-made PELE's control file as a script and use it to run the job (for advanced users).
  • Set all script parameters manually. This option allows you to set all the parameters manually by going through a long list of all possible parameters.


For manual fine tuning of the job, it is also possible to edit the generated control file by clicking on Create and edit button. In this case you should have a good understanding of PELE's control file syntax. You may consult information below to have some insights about those commands.


Short tutorial on editing control file

Briefly, the pele control file uses several control lines. The first lines define the data directories. Then there is a section were we can define energy parameters, solvent models, ionic strength, etc. For example, 

energy params solvent vdgbnp
energy params ionic 0.3

where we selected a variable dielectric SGB model and an ionic strength of 0.3 (M). 

Next we specify the input file to be used.

load pdb filename ions yes waters no het yes

The server automatically assigns (or changes in case you upload a control file) filename to the job ID. For instance, if the job ID is 597, filename becomes 597.pdb.

After providing an input file, you can specify geometrical constraints. Here are some options:

constraints atom A:1:_CA_ atom _:3:_FE_ 100 2.4
constraints atom A:1:_CA_ xyz 1.0 2.0 3.0 100 2.4
constraints atom A:1:_CA_ current 100 2.4

constraints angle A:1:_CA_ A:1:_CB_ C:1:_H1_ 10.0 1.5708
constraints torsion A:1:_CA_ A:1:_CB_ A:1:_CG_ A:1:_CD_ 10.0 0
constraints atom A:76:_CA_ AFM 87.59 78.72 112.2 0.01 7.0 0.0

The first constraint setups a harmonic constraint between two atoms: the Cα of the residue number 1 in the A chain and the FE atom of residue number 3 in the chain with no name (_),  with a 100 kcal/mol and a distance of 2.4 A. Similarly the next lines constraint to a point or to its current position. Angle and torsions constraint units are in radians.

Last constraint example (AFM) is used to pull atoms. Following the example, an harmonic constraint is generated between A:76:_CA_ and a virtual point at the beginnig in the same position of the atom. Then, virtual bead is moved at a 0.01 A/step in the direction between the position introduced  87.59 78.72 112.2 and the atom selected. The harmonic constant value is 7.0 and the equilibrium length is 0.0.


Notice: constraints, for example, are important in the case that you have an ion (Ca2+, Mg2+..) which is coordinated to the protein and/or waters. Not adding the constraints will most likely result in the ion flying away or changing its coordination ligands. 

Additionally, a native input file can be provided. The native file can be used to compute ligand or overall rmsd. Optionally,  the rmsd to the native file can then be used to follow or to bias the simulation (see below the sapwn commands).

load native 1iep_CUT.pdb het yes

Next the main PELE line, defining the run tasks and parameters is defined. Within the PELE line the order of parameters does not matter. We usually start, however to define the task, then the global parameters, the the side chain parameters, then the minimization and finally the ANM parameters. The task section defines the flow of the program: how to end, what to do, using mpi to interchange trajectories, etc.. For example:

task &
  show atom 1 A:154:FE__ & 
  show bind_ene 1 &
  show rmsd_lig 1 heavy 0 &
  show adist A:3:_CA_ A:55:_CA_ &
  exit steps gt 1000 &
end_task &

Where we ask PELE to perform 1000 Monte Carlo steps (for each processor!), showing us at each step the distance of the center of mass of ligand 1 ((the ligand we are perturbing--at present time always 1) to the iron atom in the residue 154 of chain A. similarly it shows the binding energy of the ligand and its rmsd to the native structure. The rmsd includes all heavy atoms and no ligands around (0Å around --i.e. only the ligand). The following line

show rmsd_lig 1 heavy 5 &

would compute the rmsd to the native file of the ligand and everything around 5 Å. If no native file is entered or if rmsd_ini_lig is used, the rmsd is computed to the initial structure. The task also asks PELE to print the distance between two protein atoms, the Cα's between residues 3 and 55 from the chain A.

A more complex task might include changing some variables depending on the current step number. For example, we added here a change in the translation of the ligand. Every fifth step it will do 2.75Å, otherwise only 1.0Å. We also change the number of tries to perturb the ligand in each step. Each try, PELE does a full ligand perturbation, choosing at the end the best of them (based on the potential energy after each try). Thus, increasing the tries we might guide better the ligand towards "preferred" electrostatic areas...

task &
  if modulos 5 = 0 then &
     tra_r 2.75 &
     tries 10 & 
   else &
     tra_r 1.0 &
     tries 1 &
  endif &
  exit steps gt 1000 &
end_task &

Even more complex, including an instruction to abandon the current coordinates and to obtain new ones from another processor which is more advanced in a given coordinate:
task &
  spawn atom 1 A:3:_CA_ gt 2 &
  exit steps gt 1000 &

end_task &

Where if in processor A the distance from ligand 1 (the ligand we are perturbing--at present time always 1) to the Cα of residue 3 in chain A is 2Å smaller than in processor B, then processor A will forget about its coordinates and get a new set from processor B. Thus the leading processor exports its coordinates to those behind in the desired reaction coordinate. Of course if the gap (here 2Å) is set to narrow we would be biasing too much the simulation. On the other site if it is too large (or no spawning is defined) we will have each trajectory independent of the others. Other examples of spawning, guiding the ligand to/from a xyz point, optimizing binding energy, etc... are:
spawn point 1 15.8  53.7  13.4 lt 6 &
spawn bind_ene 1 lt 10 &
spawn atom 1 A:3:_CA_ within 10.0 &
spawn rmsd_sys calpha lt 2.0 &

The last line bias the simulation towards the native structure using the Cα rmsd.

Program Output

The job will typically provide two main output files: the log file and the trajectory file (or files if having more than one processor). The log file has lots of information about the input parameters, evolution of the job/tasks/variables, timing of different sections, etc. If running in parallel, there is an overall log file, describing the overall evolution of all processors and individual log files reporting details for the runs in each processor.  
When adding a job, you have several options to control the output. In the additional output section, you might define several variables to be printed in the log file. Furthermore, once the job starts, the server provides a plot with those variables generated in the log file. By default, the energy versus the steps is plotted, but different variables can be chosen interactively. This plot allows for monitoring the job as it is evolving, which might help deciding if it performing correctly.
Output variables include:
Steps (always shown by default): PELE's total Monte Carlo steps, accepted and non-accepted, for a given processor.
Acc Steps (always shown by default): PELE's Monte Carlo steps that have been accepted (that passed the Metropolis test). 
Energy (always shown by default): PELE's energy is provided in kcal/mol. It is based on the 2005 OPLS-AA force field. A variable dielectric SGBA implicit solvent models is added by default with an ionic force constant of 100mM.
SASA:  Solvent accessible area for the ligand. This variable will allow to monitor how buried/exposed the ligand is.
Tip: SASA can be used in setting up other parameters. For example, in one task we can add
           if SASA 1 lt 0.2 then tra_r 0.5 else tra_r 4.0 endif 

which would use a lower translation inside the protein and larger outside for ligand #1.

Binding Energy: This value plots the interaction energy between the protein and the ligand (AB-A-B). The energy is provided in kcal/mol.
Native RMSD: RMSD to a native structure (you should load a native pdb!). You can decide the atom selection between alpha carbons, heavy atoms, side chains, backbone heavy atoms and all atoms. The algorithm will first align the two structures using your atoms selection and then perform the RMSD.
Ligand RMSD: RMSD of the ligand to a native structure (you should load a native pdb!). You can decide the atom selection between heavy or all atoms. Additionally, you might include a sphere of atoms around the ligand (again heavy or all atoms).
Point Distance: Monitors the distance between the center of mass of the ligand and a given XYZ point in the space. In angstroms

Atom Distance: Monitors the distance between the center of mass of the ligand and a given atom (for example, A:12:_CA_,  alpha carbon of residue 12 in chain A). In angstroms
Two Atom Distance: Monitors the distance between two atoms along the simulation. In angstroms
Two Groups Distances: Monitors the distance between the geometrical center between two groups of atoms. In angstroms

Global Control Variables

The following variables indicate how often a given procedure is performed within the Metropolis cycles. A value of 1 means to perform the procedure in each cycle.

  • LIG freq - How often a ligand perturbation is produced.
  • ANM freq - How often a backbone ANM perturbation is produced.
  • MIN freq - How often the final minimization is produced.
  • SP freq - How often the side chain sampling is produced.
  • WR freq - How often to write an accepted structure (pdb).
  • Spawn WR freq - How often to write a structure that advances in the spawning criteria (if a spawning criteria is defined).

Advanced Global Control Variables

  • Metropolis temperature - Metropolis temperature. Default is 1000. We should not confuse this temperature with the molecular dynamics temperature.
  • Init Min - Whether to perform an initial minimization or not before running PELE.
  • Rem Bulk - Level of removing overall translation and rotation motion. Default 3, meaning to remove overall translation and rotation at the end of each Monte Carlo step.

Ligand Control Variables

  • Ligand Chain - Chain name of the ligand to be perturbed.
  • Ligand Res - Residue number of the ligand to be perturbed.
  • Tra range - Ligand translation range in the perturbation. In Angstroms.
  • Rot range - Ligand rotational range in the perturbation. In radians.

Advanced Ligand Control Variables

  • Waitfor - Number of steps to keep the same ligand translation perturbation direction. Increasing the waitfor we can further force the escape from cavities, etc.
  • lcom con - Force constant to constrain the ligand after translation and rotation. Thus, we can constrain the position of the ligand after the initial perturbation. In this way we will minimize the ligand motion in the following minimizations.
  • tries - Number of independent ligand perturbation in each cycle. We can perform several independent perturbations and the program will choose the one with best ligand-protein interaction energy.

ANM Control Variables

  • ANM eig freq - How often, within the Monte Carlo cycles, to compute and diagonalize the ANM matrix.
  • ANM eig num - Number of ANM modes to include in the calculation.
  • ANM displacement - Maximum Cα displacement in the ANM perturbation. The maximum displacement will correspond to the largest eigenvector coefficient. In Angstroms.
  • ANM type - Whether to perform a calpha ANM or all atom ANM. Keep in mind that all atom diagonalization might require several minutes.

Advanced ANM Control Variables

  • Ualig - Whether to include the ligand atoms in the ANM matrix. The program automatically assigns some ligand atoms depending on its size.
  • Mix mod - A float number indicating the level of mixing the modes. For example If 03 is selected then the chosen mode will contribute 0.3 and the rest of modes will be the remaining 0.7.
  • Belly - A float number indicating the radius around the ligand to include in the ANM matrix. By default all the protein is included in the ANM but for quicker induced fit studies we can select only a smaller region around the ligand.
  • rmsg - Root mean square gradient for the ANM minimization.
  • Alphaup - Whether to update the solvent Born alpha during the ANM minimization. Since the ANM minimization is only an approximate way to move the backbone usually we prefer not to update the alpha (makes it much faster). If no final minimization is present, however, it should be updated.

Side Chain Control Variables

  • SP radius - A float number indicating the radius around the ligand to include in the side chain prediction.
  • Top Side - The number of excited side chains to include in the prediction. PELE computes the energy for each side chain before and after the ANM and can include the residues more excited as a result of the backbone displacement in the side chain prediction step.
  • Randomize - Whether to randomize the side chain chain conformation before the prediction.
  • inccur - Include the current side chain dihedrals in the rotamer list.
  • SP RES - Side chain rotamer library resolution.
  • SP Algorithm - Side Chain prediction algorithm [Heuristic/Monte Carlo].

Minimization Control Variables

  • Rmsg - Root mean square gradient for the minimization.

Advanced Minimization Control Variables

  • caconst - To keep the conformational sampling provided by the ANM perturbation, the final minimization can have constraints in the Cα positions.
  • alphaup - Whether to update the solvent Born alpha during the minimization.
  • iterations - Max number of iterations.
  • MIN radius - If we are doing ligand perturbation, we can also define a region to minimize only around the perturbed ligand.