This is Info file parallel.doc, produced by Makeinfo-1.61 from the input file parallel.texi.
Parallel Implementation of CHARMM
CHARMM has been modified to allow computationally intensive simulations
to be run on multiprocessor machines using a replicated data model. This
version, though employing a full communication scheme, uses an efficient
divide-and-conquer algorithm for global sums and broadcasts.
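The divide-and-conquer idea behind the cube topology can be illustrated
with a small sequential simulation. This is a minimal sketch, not
CHARMM's own Fortran routines: it assumes a power-of-two node count and
models each node's partial sum as one list element. After log2(P)
pairwise exchanges every node holds the full sum.

```python
def cube_global_sum(values):
    """Simulate a hypercube (recursive-doubling) global sum.

    values[i] is the partial sum held by node i. The node count must be
    a power of two; after log2(P) exchange steps every node holds the
    complete sum.
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("cube topology needs a power-of-two node count")
    sums = list(values)
    step = 1
    while step < p:
        # Along this cube dimension node i pairs with node i XOR step;
        # both send their current partial sum and add what they receive.
        sums = [sums[i] + sums[i ^ step] for i in range(p)]
        step <<= 1
    return sums


if __name__ == "__main__":
    print(cube_global_sum([1, 2, 3, 4]))  # every node ends with 10
```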
Currently the following hardware platforms are supported:
1. Cray T3D/T3E
2. Cray C90, J90
3. SGI Power Challenge
4. Convex SPP-1000 Exemplar
5. Intel iPSC/860 gamma
6. Intel Delta machine
7. Intel Paragon machine
8. Thinking Machines CM-5
9. IBM SP1/SP2 machines
10. Parallel Virtual Machine (PVM)
11. Workstation clusters (SOCKET)
12. Alpha Servers (SMP machines, PVMC)
13. TERRA 2000
14. HP SMP machines
15. Convex SPP-2000
16. SGI Origin
17. LoBoS
* Menu:
* Installation:: Installing CHARMM on parallel systems
* Running:: Running CHARMM on parallel systems
* Status:: Parallel Code Status (as of October 1993)
* Using PVM:: Parallel Code implemented with PVM
To support many parallel communication libraries, the CMPI keyword
was added. To use the old communication routines, always specify
CMPI; otherwise MPI is the default choice (see the recommended
keyword combination for each specific platform). On some platforms the
recommended preflx directives produce much faster communication code;
e.g., on a 128-node T3E, CMPI is 4 times faster than MPI.
This is a complete list of the supported combinations of message-passing
libraries implemented in parallel CHARMM.
Combinations of pref.dat keywords for the MPI library (can be specified on
any platform that supports MPI):
1. < no extra keywords > (Calls to MPI collective routines)
2. CMPI MPI (non-blocking cube topology using send/receive from MPI)
3. CMPI MPI GENCOMM (non-blocking ring topology, MPI send/receive)
4. CMPI MPI SYNCHRON (blocking cube topology, MPI send/receive)
5. CMPI MPI GENCOMM SYNCHRON (blocking ring topology, MPI send/receive)
Native library options
6. CMPI DELTA (for Intel Paragon)
7. CMPI IBMSP (for IBM SP2)
8. TERRA (for TERRA 2000)
9. CMPI CM5 (for CM5)
10. CSPP (Convex version of MPI)
Workstation clusters using SOCKET
11. CMPI SOCKET SYNCHRON (blocking cube topology)
12. CMPI SOCKET SYNCHRON GENCOMM (blocking ring topology)
PVM library
13. CMPI PVMC SYNCHRON (blocking cube, PVM send/receive)
14. CMPI PVMC GENCOMM SYNCHRON (blocking ring, PVM send/receive)
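The combination list above can be treated as a lookup table. The sketch
below is hypothetical tooling, not part of the CHARMM distribution: it
assumes you pass only the communication keywords (base directives such
as UNIX, PARALLEL and PARAFULL are left out), treats combination 1 as
the empty set, and spells the blocking keyword SYNCHRON as in the
pref.dat keyword list. Frozensets make the lookup independent of
keyword order.

```python
# Communication-keyword combinations from the list above (hypothetical
# helper; descriptions are copied from the documentation).
COMBINATIONS = {
    frozenset(): "MPI collective routines",
    frozenset({"CMPI", "MPI"}): "non-blocking cube topology, MPI send/receive",
    frozenset({"CMPI", "MPI", "GENCOMM"}): "non-blocking ring topology, MPI send/receive",
    frozenset({"CMPI", "MPI", "SYNCHRON"}): "blocking cube topology, MPI send/receive",
    frozenset({"CMPI", "MPI", "GENCOMM", "SYNCHRON"}): "blocking ring topology, MPI send/receive",
    frozenset({"CMPI", "DELTA"}): "native library, Intel Paragon",
    frozenset({"CMPI", "IBMSP"}): "native library, IBM SP2",
    frozenset({"TERRA"}): "native library, TERRA 2000",
    frozenset({"CMPI", "CM5"}): "native library, CM5",
    frozenset({"CSPP"}): "Convex version of MPI",
    frozenset({"CMPI", "SOCKET", "SYNCHRON"}): "blocking cube topology, sockets",
    frozenset({"CMPI", "SOCKET", "SYNCHRON", "GENCOMM"}): "blocking ring topology, sockets",
    frozenset({"CMPI", "PVMC", "SYNCHRON"}): "blocking cube topology, PVM send/receive",
    frozenset({"CMPI", "PVMC", "GENCOMM", "SYNCHRON"}): "blocking ring topology, PVM send/receive",
}


def describe(keywords):
    """Return the description of a keyword combination, case-insensitively."""
    key = frozenset(k.upper() for k in keywords)
    return COMBINATIONS.get(key, "not a supported combination")


if __name__ == "__main__":
    print(describe(["MPI", "CMPI", "gencomm"]))
```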
Combinations 1, 8, and 10 are currently implemented entirely in
machdep/paral1.src, so they do not need the paral2.src and paral3.src
files, which will eventually become unnecessary. The efficiency of the
different topologies also varies with the number of nodes.
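One way to see why topology efficiency depends on node count: a ring
exchange (the topology GENCOMM selects) works for any number of nodes
but needs P-1 steps, whereas a cube needs only log2(P) steps and a
power-of-two node count. The sketch below is a hypothetical sequential
simulation of a simple ring global sum, not CHARMM's implementation.

```python
def ring_global_sum(values):
    """Simulate a ring global sum for any number of nodes.

    At each of the P-1 steps every node forwards the value it most
    recently received to its right-hand neighbour and adds the value
    arriving from its left-hand neighbour to its running total.
    """
    p = len(values)
    totals = list(values)
    in_flight = list(values)  # value each node will forward next
    for _ in range(p - 1):
        # Node i receives from its left neighbour (i - 1) mod p.
        in_flight = [in_flight[(i - 1) % p] for i in range(p)]
        totals = [totals[i] + in_flight[i] for i in range(p)]
    return totals


if __name__ == "__main__":
    print(ring_global_sum([1, 2, 3]))  # 3 nodes: not possible with a cube
```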
On some platforms the EXPAND keyword is also recommended, in combination
with the fastest FAST option in the CHARMM input script; e.g., for IBMSP:
EXPAND (fast parvect)
The installation script now installs a default configuration for any
parallel platform. If any of the options X, G, P, M, 1, 2, 64, Q, or S
is specified, the size keyword must be specified too. Run install.com
without parameters to see the current set of options.
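The size rule above can be checked before calling install.com. This is
a minimal sketch with hypothetical names: the set of valid size keywords
is not listed in this document, so the caller supplies it (the value
"large" below is only an illustrative assumption).

```python
# Options that, per the rule above, require a size keyword as well.
NEEDS_SIZE = {"X", "G", "P", "M", "1", "2", "64", "Q", "S"}


def missing_size(args, size_keywords):
    """Return True if args use an option from NEEDS_SIZE but no size.

    size_keywords is a caller-supplied (hypothetical) set of valid size
    names, since this documentation does not enumerate them.
    """
    uses_option = any(a in NEEDS_SIZE for a in args)
    has_size = any(a in size_keywords for a in args)
    return uses_option and not has_size


if __name__ == "__main__":
    print(missing_size(["ibmsp", "Q"], {"large"}))  # True: size missing
```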
Installation command for parallel machines with relevant options:
1. Cray T3E
install.com t3e [size] [Q] [P] or [M]
2. Cray T3D
install.com t3d [size] [Q] [P] or [M]
3. Cray C90, J90
install.com t3d [size]
4. SGI Power Challenge
install.com sgi size P 64 [Q] [X]
uname -a : IRIX64 icpsg1 6.2 03131016 IP25
5. Convex SPP-1000 Exemplar
install.com sgi size P or M [Q]
6. Intel Paragon machine
install.com intel
uname -a : Paragon OSF/1 timewarp 1.0.4 R1_4 paragon
7. IBM SP1/SP2 machines
install.com ibmsp size [Q]
uname -a: AIX f1n3 1 4 000104697000
8. Generic Parallel Virtual Machine (PVM)
install.com machine size P
9. TERRA 2000
install.com terra
10. Workstation clusters
install.com machine size S [Q] [X]
11. Alpha Servers (SMP)
install.com alphamp
The following keywords in pref.dat are used for parallel CHARMM:
Machine independent keywords:
PARALLEL Needed for parallel version
SOCKET If using TCP/IP sockets
PVM If using PVM library
PVMC If using PVM library on some platforms (see below).
PARAFULL Currently the only one which works
(must be specified)
PARASCAL For force decomposition scheme
(not ready for general use yet.)
SYNCHRON Blocking communication; most machines cannot
receive and send at the same time
GENCOMM Different communication architecture (ring).
Can run any number of nodes
MPI If using MPI parallel library.
(point-to-point routines only)
CMPI CHARMM implementation of the MPI library.
Enables all the old functionality plus some
combinations of libraries on the same platform.
Machine specific keywords:
TERRA
CM5
CSPP
DELTA
INTEL
PARAGON
SHMEM
CSPPMPI
T3D
T3E
IBMSP
ALPHAMP
SGIMP
Running CHARMM on parallel systems
1. Cray T3D (Cray-PVM)
~charmm/exec/t3d/charmm24 -npes 256 < input_file > output_file &
The same command may be used in a batch script but without `&'.
Example using batch:
#QSUB -lM 16Mw
#QSUB -lT 600:00
#QSUB -mb -me
#QSUB -l mpp_p=32
#QSUB -l mpp_t=600:00
#QSUB -q mpp
setenv MPP_NPES 32
~charmm/exec/t3d/charmm24 < input_file > output_file
Preflx directives required: T3D UNIX PARALLEL PARAFULL
Additional preflx directives recommended: PVM or MPI
2. Cray T3E (Cray-PVM)
CHARMM can be run on either a single processor or in parallel on the T3E.
Single processor runs are useful for small analysis jobs and other tasks
that are not amenable to parallel processing. The syntax for a single
pe run is:
charmm24 < filename.inp >& filename.out [&]
Large CHARMM jobs should be run in parallel using the queue system.
The syntax for a parallel run is:
mpprun -n <n> charmm24 < filename.inp >& filename.out [&]
(here <n> is the desired number of PEs)
The same command may be used in a batch script but without `&'.
Example using batch:
#QSUB -lM 16Mw
#QSUB -lT 600:00
#QSUB -mb -me
#QSUB -l mpp_p=32
#QSUB -q mpp
mpprun -n 32 charmm24 < input_file > output_file
Preflx directives required: T3E UNIX PARALLEL PARAFULL
Additional preflx directives recommended: EXPAND(fast off)
and either PVM or MPI
Optimization Notes:
T3E users should use the PBOUND command for simulations of periodic
systems. The pbound command optimizes non-bonded list-generation and
computations on parallel machines such as the T3E, giving significantly
better performance for parallel applications using simple periodic
boundary conditions. Note that the pbound command is currently
implemented only for scalar architectures such as the T3D and T3E.
3. Cray C90, J90 (Cray-PVM)
No info yet
4. SGI Power Challenge (PVM)
pvm
quit
setenv NTPVM 16 (or NTPVM=16 ; export NTPVM)
~charmm/exe/sgi/charmm24 <input_file >output_file &
Preflx directives required: SGI UNIX PARALLEL PARAFULL CMPI PVMC SGIMP
Additional preflx directives recommended: EXPAND(fast off)
Alternative, but not tested yet: SGI UNIX PARALLEL PARAFULL
5. Convex SPP-1000 Exemplar
With PVM
(see below for information on setting up a PVM hostfile)
mpa -sc <name_of_subcomplex> /bin/csh
setenv PVM_ROOT /usr/convex/pvm
/usr/lib/pvm/pvm
quit
~/pvm3/bin/CSPP/charmm24 -n 16 <input_file >output_file &
~charmm/exe/cspp/charmm24 <input_file >output_file &
Use the scm utility to check which subcomplexes are available.
(For information on how to set up a PVM hostfile see *note 1: Using PVM.)
Preflx directives required: CSPP UNIX PARALLEL PARAFULL PVM HPUX
SYNCHRON (GENCOMM)
Note: The first time that you build CHARMM with PVM specify the P option
with install.com. You will be asked for the location of the PVM include
files and libraries. If these do not change and you do not reconstruct the
Makefiles, you do not have to specify this option each time you run
install.com.
With MPI
mpa -DATA -STACK -sc <name_of_subcomplex> \
~charmm/exe/cspp/charmm24 -np <n> <input_file >output_file &
Where <n> is the number of processors to use.
There are two environment variables that can be set:
setenv MPI_GLOBMEMSIZE <m>
where <m> is the size of the shared memory region on each hypernode
in bytes. The default is 16MB.
And:
setenv MPI_TOPOLOGY <i>,<j>,<k>,<l>,...
where <i>, <j>, <k>, <l>, ... are the number of tasks on each hypernode.
The sum must equal the number of processors specified with -np on the
command line. This setting is optional; the default behavior is generally
what you want. If you are using a sub-complex with more than one
hypernode, you may want to include '-node 0' after mpa to keep the 0th
process on the 0th hypernode of the sub-complex.
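Since the MPI_TOPOLOGY counts must add up to the -np value, the string
can be built and checked before launching. A minimal sketch, assuming
you want tasks spread as evenly as possible over the hypernodes; the
helper names are hypothetical and not part of the Convex tools.

```python
def mpi_topology(tasks_per_hypernode, np):
    """Build an MPI_TOPOLOGY value and check that it sums to -np."""
    if any(n < 0 for n in tasks_per_hypernode):
        raise ValueError("task counts must be non-negative")
    total = sum(tasks_per_hypernode)
    if total != np:
        raise ValueError(f"topology sums to {total}, but -np is {np}")
    return ",".join(str(n) for n in tasks_per_hypernode)


def even_topology(np, hypernodes):
    """Spread np tasks as evenly as possible over the hypernodes."""
    base, extra = divmod(np, hypernodes)
    counts = [base + (1 if i < extra else 0) for i in range(hypernodes)]
    return mpi_topology(counts, np)


if __name__ == "__main__":
    print(f"setenv MPI_TOPOLOGY {even_topology(16, 2)}")
```

Running the usage line prints `setenv MPI_TOPOLOGY 8,8`, matching a
16-processor run on a two-hypernode sub-complex.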
Preflx directives required: CSPP UNIX PARALLEL PARAFULL HPUX
MPI CSPPMPI
The CSPPMPI directive specifies the use of extensions in the Convex
MPI implementation. This directive is optional. Use of the MPI
directive alone will result in a fully MPI Standard compliant program,
albeit with a loss of performance.
Note: The first time that you build CHARMM with MPI specify the M option
with install.com. You will be asked for the location of the MPI include
files and libraries. If these do not change and you do not reconstruct the
Makefiles, you do not have to specify this option each time you run
install.com.
6. Intel gamma
Because the Fortran compiler on the Intel gamma cannot rewind a
redirected input file, the program reads its input from the file
charmm.inp in the current working directory. The script for running
CHARMM should look like the following example:
cp input_file.inp charmm.inp
getcube -t128 > output_file
load ~charmm/exec/intel/charmm24
waitcube
Preflx directives required: INTEL UNIX PARALLEL PARAFULL
7. Intel Delta
mexec "-t(32,16)" ~charmm/exec/intel/charmm23<input_file>output_file&
Preflx directives required: INTEL UNIX DELTA PARALLEL PARAFULL
8. Intel Paragon
~charmm/exec/intel/charmm23 -sz 64 <input_file >output_file &
Preflx directives required: INTEL UNIX PARAGON PARALLEL PARAFULL
9. CM-5
~charmm/exec/cm5/charmm23 <input_file >output_file &
Preflx directives required: CM5 UNIX PARALLEL PARAFULL
10. IBM SP2 or SP1
setenv MP_RESD yes
setenv MP_PULSE 0
setenv MP_RMPOOL 1
setenv MP_EUILIB us
setenv MP_INFOLEVEL 0
poe ~charmm/exec/ibmsp/charmm24 -hfile nodes -procs 64 <input >output
See `man poe' for details.
Preflx directives required: IBMSP UNIX PARALLEL PARAFULL
Additional preflx directives recommended: EXPAND(fast parvect)
11. PVM
pvm
add host host1
add host host2
quit
setenv NTPVM 3
~/pvm3/bin/SGI5/charmm24 <input_file >output_file&
Preflx directives required: machine_type UNIX PARALLEL CMPI PVM
PARAFULL SYNCHRON
12. PARALLEL VERSION OF CHARMM23 ON WORKSTATION CLUSTERS
Preflx directives required: machine_type UNIX PARALLEL CMPI SOCKET
PARAFULL SYNCHRON
Currently the code runs on HP, DEC Alpha, and IBM RS/6000
machines; these have been tested. Other UNIX systems should also work
without changes, provided the following assumptions hold:
Assumptions for cluster environment:
Before you run CHARMM you have to define some environment
variables. If you define nothing then CHARMM will run in a scalar
mode, i.e. default is one node run. (We could adopt PARALLEL
keyword in pref.dat as default.)
PWD
The program supports two shells: ksh (Korn shell) and tcsh, which
is available from anonymous ftp. The only csh difference that matters
to CHARMM is the definition of the PWD variable, which points to the
current working directory. ksh and tcsh define PWD correctly by
default, while csh users have to define it themselves. If a different
directory is wanted, PWD may be changed accordingly. The program can
determine the current working directory by itself, but this causes
problems in some NFS environments because home directory names can
differ between machines (PWD is always defined correctly by a shell
that supports it). So csh may sometimes cause problems. With csh, the
cd command can be redefined so that it always sets PWD as well. This
is done with something like:
alias cd 'chdir \!*; setenv PWD $cwd '
in the ~/.cshrc file.
If you get an error that looks something like "nonexistent
directory", define the PWD variable explicitly.
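The preference for PWD described above can be expressed as a small
check: use PWD when it names the same directory as the one the system
reports, otherwise fall back. This is a minimal sketch of the idea under
that assumption, not CHARMM's actual logic; the point is that PWD keeps
the path name as the user's shell sees it, which is more likely to be
valid on every node than a machine-specific physical path.

```python
import os


def working_directory(env=None):
    """Prefer $PWD when it refers to the same directory as os.getcwd()."""
    env = os.environ if env is None else env
    cwd = os.getcwd()
    pwd = env.get("PWD")
    # Only trust PWD if it exists and is the same directory as cwd;
    # a stale or mistyped PWD falls back to the system's answer.
    if pwd and os.path.isdir(pwd) and os.path.samefile(pwd, cwd):
        return pwd
    return cwd


if __name__ == "__main__":
    print(working_directory())
```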
[NIH specific:
If you want to use tcsh as your login shell you may run the
following command:
runall chsh username /usr/local/bin/tcsh
runall is a script that runs a command on every machine in the
cluster; at NIH it is in /usr/local/bin. ]
NODEx
In order to run CHARMM on more than one node, the environment variables
NODE0, NODE1, ..., NODEn have to be defined.
Example for a 4 node run:
setenv NODE0 par0
setenv NODE1 par1
setenv NODE2 par2
setenv NODE3 par4
charmm23 < input_file > output_file 1:parameter1 2:parameter2 ...
"par0,par1,par2,.." are the names of the machines in the local
network. There is no requirement that all machines should be of
the same type. There is nothing in the program to adjust for
unequal load balance, so all nodes will follow the slowest one. In the
near future we may implement a dynamic load-balancing method based on
the actual time required.
The assumption here is that the node from where CHARMM program is
started is always NODE0!
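The NODEx convention can be read as "collect NODE0, NODE1, ... in
order; no variables means a one-node run". A minimal sketch, assuming
the scan stops at the first undefined index (the documentation does not
say how gaps are handled) and using a hypothetical helper name:

```python
import os


def node_list(env=None):
    """Collect NODE0, NODE1, ... until the first undefined index.

    An empty result corresponds to the default scalar (one-node) run.
    Stopping at the first gap is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    nodes = []
    while f"NODE{len(nodes)}" in env:
        nodes.append(env[f"NODE{len(nodes)}"])
    return nodes


if __name__ == "__main__":
    nodes = node_list()
    print(nodes if nodes else "scalar run (no NODEx variables set)")
```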
Setup for your login environment
In order to run CHARMM in parallel you have to be able to rlogin to
any of the nodes defined in the NODEx environment variables. Before you
run CHARMM, check this:
rlogin $NODE1
If it doesn't ask you for a password, you are OK. If it asks for a
password, put a line like this:
machine_name user_name
in your ~/.rhosts file.
[NIH specific:
How to submit a job to the HP machines:
Currently we have assigned machines par0, par1, par2, and par4 to
work in parallel. You may use script
/usr/local/bin/charmm23.parallel and submit it to par0. Example:
submit par0 charmm23.parallel <input_file >output_file ^D
To construct your own parallel scripts look at
/usr/local/bin/charmm23.parallel ]
In the input scripts
Everything should work, but avoid using IOLEV and PRNLEV in your
parallel scripts.
Parallel Code Status (as of February 1998)
The symbol ++ indicates that parallel code development is underway.
-----------------------------------------------------
Fully parallel and functional features:
Energy evaluation
ENERgy, GETE
MINImization
DYNAmics (leap frog integrator)
BLOCK
CRYSTAL
IMAGES
INTEraction energy
CONStraints (SHAKE,HARM,IC,DIHEdral,FIX,NOE)
ANAL (energy partition)
NBONds (generic)
EWALD
PME
PERT
-----------------------------------------------------
Functional, but nonparallel code in the parallel version (no speedup):
( ** indicates that these can be very computationally intensive and are
not recommended on parallel systems)
VIBRAN **
CORREL **(Except for the energy time series evaluation, which is
parallel)
READ, WRITE, and PRINT (I/O in general)
NOTE:
always protect prnlev ...
with
if ?mynode .eq. 0 then prnlev ...
CORMAN commands
HBONds
HBUIld **
IC (internal coordinate commands)
SCALar commands
CONStraints (setup, DROPlet, SBOUnd)
Miscellaneous commands
GENErate, PATCh, DELEte, JOIN, RENAme, IMPAtch (all PSF
modification commands)
MERGE
NBONDS (BYCUbe option)
QUANtum ** ++
QUICk
REWInd (not fully supported on the Intel)
SOLANA
-----------------------------------------------------
Nonfunctional code in parallel version:
ANAL (table generation)
DYNAmics (old integrator, NOSE integrator)
GRAPhics
TSM
MMFP
PATH
RISM
TRAVEL
RXNCOR
-----------------------------------------------------
Untested Features (we don't know whether they work):
ANALysis
MOLVIB (No testcase for this code?)
MONItor
NMR
PRESsure (the command)
RMSD
Note: Currently one should specify the absolute path to the pvm include
files and the pvm library files. This is done because PVM installation
is not currently standard. During installation, through use of
install.com, you are asked to specify these paths.
Convex PVM
This version runs using PVM (Parallel Virtual Machine) versions 3.2.6 and
higher. To run:
1. create hostfile - as in the example below:
#host file
puma0 dx=/usr/lib/pvm/pvmd3 ep=/chem/sfleisch/c24a2/exec/cspp
The first field (puma0) is the hostname of the machine. The dx= field
is the absolute path to the PVM daemon, pvmd3. This includes the
filename, pvmd3. The last field, ep=, is the search path used to find the
executable when the tasks are spawned. This can be a colon (:) separated
string for searching multiple directories. The PVM system can be
monitored using the console program, pvm. It has some useful commands:
conf list machines in the virtual machine.
ps -a list the tasks that are running.
help list the commands.
quit exit the console program without killing the daemon.
halt kill everything that is running and the daemon and exit
the console program.
2. Run the PVM daemon, pvmd3:
pvmd3 hostfile &
3. Run the program e.g.:
/chem/sfleisch/c24a2/exec/cspp/charmm -n <ncpu> <input_file >output_file &
where -n <ncpu> indicates how many pvm controlled processes to run
4. Halt the daemon. See above.
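The hostfile fields described above (hostname, dx=, and a
colon-separated ep= search path) can be read with a few lines of code.
This is a minimal sketch that handles only the fields shown in the
example; real PVM hostfiles accept additional syntax that it ignores,
and the helper names are hypothetical.

```python
import os


def parse_hostfile(text):
    """Return {host: {"dx": path, "ep": [dirs...]}} for the fields shown."""
    hosts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        host, *options = line.split()
        entry = {}
        for opt in options:
            key, _, value = opt.partition("=")
            entry[key] = value
        if "ep" in entry:
            # ep= may list several directories separated by colons.
            entry["ep"] = entry["ep"].split(":")
        hosts[host] = entry
    return hosts


def find_executable(name, search_path):
    """Return the first executable file called name on search_path."""
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    text = "#host file\npuma0 dx=/usr/lib/pvm/pvmd3 ep=/chem/sfleisch/c24a2/exec/cspp\n"
    hosts = parse_hostfile(text)
    print(hosts["puma0"]["dx"], find_executable("charmm", hosts["puma0"]["ep"]))
```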
The Convex Exemplar PVM implementation uses shared memory via the System V
IPC routines shmget and shmat.
Generic PARALLEL PVM version for workstation clusters
Preflx directives required: <MACHTYPE> UNIX SCALAR CMPI PVM PARALLEL
PARAFULL SYNCHRON
Where <MACHTYPE> is the workstation you are compiling on, e.g.,
HPUX, ALPHA, etc.
Note: Currently one must specify the absolute path to the pvm include
files and the pvm library files. This is done because PVM installation
is not currently standard. During installation, through use of
install.com, you are asked to specify these paths.
This version runs using PVM (Parallel Virtual Machine) versions 3.2.6 and
higher. To run:
1. create hostfile - as in the example below:
#host file
boa0 dx=/usr/lib/pvm/pvmd3 ep=/cb/manet1/c24a2/exec/hpux
boa1 dx=/usr/lib/pvm/pvmd3 ep=/cb/manet1/c24a2/exec/hpux
boa2 dx=/usr/lib/pvm/pvmd3 ep=/cb/manet1/c24a2/exec/hpux
boa3 dx=/usr/lib/pvm/pvmd3 ep=/cb/manet1/c24a2/exec/hpux
The first field (boa0, etc) is the hostname of the machine. The dx= field
is the absolute path to the PVM daemon, pvmd3. This includes the
filename, pvmd3. The last field, ep=, is the search path used to find the
executable when the tasks are spawned. This can be a colon (:) separated
string for searching multiple directories. The PVM system can be
monitored using the console program, pvm. It has some useful commands:
conf list machines in the virtual machine.
ps -a list the tasks that are running.
help list the commands.
quit exit the console program without killing the daemon.
halt kill everything that is running and the daemon and exit
the console program.
2. Run the PVM daemon, pvmd3:
pvmd3 hostfile &
3. Run the program e.g.:
/cb/manet1/c24a2/exec/hpux/charmm -n <ncpu> <input_file >output_file &
where -n <ncpu> indicates how many pvm controlled processes to run
4. Halt the daemon. See above.
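For a cluster where every host uses the same daemon and executable
directory, the hostfile can be generated rather than typed by hand. A
minimal sketch with a hypothetical helper name; the host names and paths
in the usage lines are the ones from the example above.

```python
def make_hostfile(hosts, daemon, exec_dir):
    """Return PVM hostfile text with one 'host dx=... ep=...' line per host."""
    lines = ["#host file"]
    for host in hosts:
        lines.append(f"{host} dx={daemon} ep={exec_dir}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = make_hostfile(
        [f"boa{i}" for i in range(4)],
        "/usr/lib/pvm/pvmd3",
        "/cb/manet1/c24a2/exec/hpux",
    )
    with open("hostfile", "w") as f:  # then run: pvmd3 hostfile &
        f.write(text)
```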
CHARMM Documentation / rvenable@deimos.cber.nih.gov