LoBoS Pack Software
Version 0.1
| Date     | Author | Description              |
| 07/17/97 | EMB    | Initial Version          |
| 10/09/97 | EMB    | Recent additions to code |
1. Description
The queue manager is responsible for providing users with an interface to the LoBoS system which:
The following assumptions are inherent in the design of the queueing system:
The system manages resources using three distinct pieces of code:
| Program | Description | Configuration file | Output |
| Pulse | Nodes exchange datagrams reporting on CPUs, network links, disk space | /lobos/config/pulse.conf, /lobos/config/nodes, /lobos/config/topology | /lobos/node_status, /lobos/node.log |
| lobosq | Accepts job requests and dispatches them | /lobos/config/nodes, /lobos/config/topology | /lobos/job_status, /lobos/job.log |
| LoBoS Web Page | Displays system and job status | /lobos/config/nodes, /lobos/config/topology, /lobos/node_status, /lobos/job_status | Display |
2. Commands
In general, when a job is submitted the following sequence occurs:
2.1 To submit a job
Type:
lobosq <CR>
Enter the commands…
<ctrl>-d
or
lobosq < filename
2.2 To cancel a job
Type:
lobosq can username.yymmddhhmmss <CR>
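The job identifier used above has the form username.yymmddhhmmss. As a minimal sketch, it can be split into its two parts as shown below. The helper name parse_job_id is hypothetical, and the sketch assumes that two-digit years follow the POSIX convention (69-99 map to 19xx), which is what Python's %y does. What the timestamp represents is not stated in this document, so the code only decodes it.

```python
from datetime import datetime

def parse_job_id(job_id):
    """Split 'username.yymmddhhmmss' into (username, datetime)."""
    # rpartition splits on the last dot, so dots inside the username survive.
    username, sep, stamp = job_id.rpartition(".")
    if not sep or not username or len(stamp) != 12 or not stamp.isdigit():
        raise ValueError("expected username.yymmddhhmmss, got %r" % job_id)
    return username, datetime.strptime(stamp, "%y%m%d%H%M%S")

user, when = parse_job_id("billings.971008161803")
print(user, when.isoformat())
```

The same helper could be used to derive the name of the username.yymmddhhmmss.log file from a job identifier.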
2.3 To check on job status
Use the LoBoS web page to monitor the job prior to execution.
Use the username.yymmddhhmmss.log file to monitor the job during execution.
3. LoBoS Queue Directives
The following directives may be used in the scripts submitted to LoBoS.
| Directive   | Min | Max | Default | Units   | Description |
| #ETIME      | 1   |     | 60      | Minutes | Estimated time of job |
| #CONFidence | 0   | 100 | 90      | Percent | Confidence that ETIME is correct (future) |
| #NPROCS     | 1   | 128 | 4       | CPUs    | Number of requested processors. Syntax: n or nlow-nhigh (future) |
| #NPROC0     |     |     | None    | CPU     | Requested CPU to use for NODE0 (future) |
| #TPIP       | 1   | 2   | 1       | Threads | Number of threads per IP |
| #ADMIN      |     |     | None    |         | Schedules job immediately, permanently (root use only) |
| #RTIME      |     |     |         | Time    | Schedule job at the requested time (root use only) |
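As a rough illustration of how a submitted script might be scanned for directives, the Python sketch below range-checks the three directives whose Min, Max and Default are all given in the table (#NPROCS, #TPIP and #CONFidence). The exact directive syntax is not specified in this document: the sketch assumes one "#NAME value" pair per line and that #CONFidence may be abbreviated to #CONF. The function name read_directives is hypothetical, and the nlow-nhigh range syntax is not handled.

```python
# Bounds copied from the directive table: name -> (min, max, default).
LIMITS = {
    "NPROCS": (1, 128, 4),
    "TPIP": (1, 2, 1),
    "CONF": (0, 100, 90),
}

def read_directives(script_text):
    """Return {name: int} for the directives above, applying table defaults."""
    values = {name: default for name, (_, _, default) in LIMITS.items()}
    for line in script_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].startswith("#"):
            continue  # not a '#NAME value' line (assumed syntax)
        word = parts[0][1:].upper()
        # Assumption: '#CONFidence' may be shortened to '#CONF'.
        name = "CONF" if word.startswith("CONF") else word
        if name not in LIMITS:
            continue  # other directives are outside this sketch
        low, high, _ = LIMITS[name]
        number = int(parts[1])
        if not low <= number <= high:
            raise ValueError("%s=%d outside %d..%d" % (name, number, low, high))
        values[name] = number
    return values

print(read_directives("#NPROCS 8\n#TPIP 2\necho hello\n"))
```

Rejecting out-of-range values at submission time is one possible policy; clamping to the nearest bound would be another.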
4. Configuration Files
These configuration files collectively define the role and resources of nodes
as well as the network topology. The files are to be identical on all nodes of
the system. They comprise the database available to each node to determine its
role and network connections.
4.1 Node Configuration
The file is called /lobos/config/nodes
/lobos/config/nodes----------------------------------------------------------------
# Description of the nodes within the LoBoS system
#
# To do:
# Must add hub type
# type 0=compute;non-zero is rank of master node
# The format for the file is:
# Host CPU- Logical Physical
#id xxx.xxx.xxx.xxx name type CPUs Type clock speed RAM swap links x y x y
1 165.112.185.1 pe1 0 2 1 200 1.0 128 512 2 0.878 0.437 0.160 0.850
2 165.112.185.2 pe2 0 2 1 200 1.0 128 512 2 0.804 0.460 0.120 0.850
3 165.112.185.3 pe3 0 2 1 200 1.0 128 512 2 0.864 0.510 0.080 0.850
4 165.112.185.4 pe4 0 2 1 200 1.0 128 512 2 0.786 0.519 0.040 0.850
…
65 165.112.184.11 master1 1 2 1 200 1.0 128 128 2 0.250 0.850 0.260 0.850
66 165.112.184.12 master2 2 2 1 200 1.0 128 128 2 0.400 0.900 0.340 0.850
67 165.112.184.13 master3 3 2 1 200 1.0 128 128 2 0.600 0.900 0.260 0.680
68 165.112.184.14 master4 4 2 1 200 1.0 128 128 2 0.750 0.850 0.340 0.680
69 999.999.999.999 hub1 0 0 1 0 0.0 0 0 0 0.400 0.400 0.300 0.100
70 999.999.999.999 hub2 0 0 1 0 0.0 0 0 0 0.600 0.400 0.300 0.300
71 999.999.999.999 Gigabit 0 0 1 0 0.0 0 0 0 0.100 0.150 0.300 0.500
--------------------------------------------------------------------------------
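The listing above has 15 whitespace-separated fields per node. A minimal Python sketch of reading it follows. The field names in the Node tuple are illustrative labels derived from the header comment, not names used by the LoBoS code, and parse_nodes is a hypothetical helper.

```python
from collections import namedtuple

# Illustrative labels for the 15 columns described in the header comment.
Node = namedtuple(
    "Node",
    "node_id ip name role cpus cpu_type clock speed ram swap links "
    "logical_x logical_y physical_x physical_y",
)
CASTS = [int, str, str, int, int, int, int, float, int, int, int,
         float, float, float, float]

def parse_nodes(text):
    """Parse the contents of a nodes file into a list of Node tuples."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything without all 15 fields.
        if len(fields) != len(CASTS) or fields[0].startswith("#"):
            continue
        nodes.append(Node(*(cast(f) for cast, f in zip(CASTS, fields))))
    return nodes

sample = """\
# two entries copied from the listing above
1 165.112.185.1 pe1 0 2 1 200 1.0 128 512 2 0.878 0.437 0.160 0.850
65 165.112.184.11 master1 1 2 1 200 1.0 128 128 2 0.250 0.850 0.260 0.850
"""
for node in parse_nodes(sample):
    # Per the file comment, a non-zero type is the rank of a master node.
    kind = "master rank %d" % node.role if node.role else "type 0"
    print(node.name, node.ip, kind)
```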
4.2 Network Topology
The file is called /lobos/config/topology
/lobos/config/topology------------------------------------------------------
# Description of the topology within the LoBoS system
#
#
# The format for the file is:
# name1 name2 IP1 IP2
master1 hub1 165.112.184.11 165.112.184.1
master2 hub1 165.112.184.12 165.112.184.1
master3 hub1 165.112.184.13 165.112.184.1
master4 hub1 165.112.184.14 165.112.184.1
master1 hub2 165.112.184.11 165.112.184.1
master2 hub2 165.112.184.12 165.112.184.1
master3 hub2 165.112.184.13 165.112.184.1
master4 hub2 165.112.184.14 165.112.184.1
pe1 pe2 165.112.185.1 165.112.185.2
pe2 pe3 165.112.185.2 165.112.185.3
pe3 pe4 165.112.185.3 165.112.185.4
pe4 pe5 165.112.185.4 165.112.185.5
…
pe1 hub1 165.112.185.1 999.999.999.999
pe3 hub1 165.112.185.3 999.999.999.999
…
pe2 hub2 165.112.185.2 999.999.999.999
pe4 hub2 165.112.185.4 999.999.999.999
…
pe37 Gigabit 165.112.185.37 999.999.999.999
pe38 Gigabit 165.112.185.38 999.999.999.999
pe39 Gigabit 165.112.185.39 999.999.999.999
pe40 Gigabit 165.112.185.40 999.999.999.999
--------------------------------------------------------------------------------
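Each topology line names two endpoints and their IP addresses. The Python sketch below turns such lines into a neighbour table. It assumes that every link can be used in both directions, which this document does not state, and parse_topology is a hypothetical helper name.

```python
from collections import defaultdict

def parse_topology(text):
    """Map each name to the set of names it shares a link with."""
    neighbours = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        # A link line has exactly four fields: name1 name2 IP1 IP2.
        if len(fields) != 4 or fields[0].startswith("#"):
            continue
        name1, name2, _ip1, _ip2 = fields
        # Assumption: links are bidirectional.
        neighbours[name1].add(name2)
        neighbours[name2].add(name1)
    return neighbours

sample = """\
pe1 pe2 165.112.185.1 165.112.185.2
pe2 pe3 165.112.185.2 165.112.185.3
pe1 hub1 165.112.185.1 999.999.999.999
"""
links = parse_topology(sample)
print(sorted(links["pe1"]))
```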
4.3 Status
The file is called /lobos/node_status
/lobos/node_status---------------------------------------------------------------
# Node status file
# Version 0.10 Date: Sat Sep 20 13:53:04 1997
node 01:53:04 pe1 0
node 01:53:04 pe2 0
node 01:53:04 pe3 0
node 01:53:04 pe4 0
node 01:53:04 master1 0
node 01:53:04 master2 0
node 01:53:04 master3 0
node 01:53:04 master4 0
node 01:53:04 hub1 0
node 01:53:04 hub2 0
node 01:53:04 Gigabit 0
link 10:50:24 pe1 pe2 165.112.185.1 165.112.185.2 1
link 10:50:24 pe1 hub1 165.112.185.1 231.231.231.231 1
link 10:50:24 pe2 pe3 165.112.185.2 165.112.185.3 1
link 10:50:24 pe2 hub2 165.112.185.2 231.231.231.231 1
link 10:50:24 pe3 pe4 165.112.185.3 165.112.185.4 1
link 10:50:24 pe3 hub1 165.112.185.3 231.231.231.231 1
link 10:50:24 pe4 pe5 165.112.185.4 165.112.185.5 1
link 10:50:24 pe4 hub2 165.112.185.4 231.231.231.231 1
…
link 10:50:24 master1 master1 165.112.184.11 165.112.184.11 1
link 10:50:24 master1 hub1 165.112.184.11 231.231.231.231 1
link 10:50:24 master1 hub2 165.112.184.11 231.231.231.231 1
link 10:50:24 master2 master1 165.112.184.12 165.112.184.11 1
link 10:50:24 master2 hub1 165.112.184.12 231.231.231.231 1
link 10:50:24 master2 hub2 165.112.184.12 231.231.231.231 1
link 10:50:24 master3 hub1 165.112.184.13 231.231.231.231 1
link 10:50:24 master3 hub2 165.112.184.13 231.231.231.231 1
link 10:50:24 master4 hub1 165.112.184.14 231.231.231.231 1
link 10:50:24 master4 hub2 165.112.184.14 231.231.231.231 1
--------------------------------------------------------------------------------
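The status listing contains two record types: "node" lines with four fields and "link" lines with seven. A minimal Python sketch of reading them is given below. The meaning of the trailing status number is not defined in this document, so the sketch keeps it as a plain integer; parse_status is a hypothetical helper name.

```python
def parse_status(text):
    """Return (nodes, links) dictionaries built from node_status records."""
    nodes, links = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "node":
            _, stamp, name, status = fields
            nodes[name] = (stamp, int(status))
        elif len(fields) == 7 and fields[0] == "link":
            _, stamp, name1, name2, _ip1, _ip2, status = fields
            links[(name1, name2)] = (stamp, int(status))
        # Comment lines and elided lines fall through and are ignored.
    return nodes, links

sample = """\
# Node status file
node 01:53:04 pe1 0
link 10:50:24 pe1 pe2 165.112.185.1 165.112.185.2 1
"""
nodes, links = parse_status(sample)
print(nodes)
print(links)
```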
4.4 Pulse configuration
The file is called /lobos/pulse_config
/lobos/pulse_config---------------------------------------------------------------
# Configuration information for reporting status of the machine
#
# keyword parameters
HeartBeat 61
NodeTimeOut 90.
LinkTimeOut 90.
MasterTimeOut 90.
--------------------------------------------------------------------------------
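The pulse configuration is a list of keyword and value pairs. A small Python sketch of reading it follows. The units of the values are not stated in this document, so they are returned as plain floats; parse_pulse_config is a hypothetical helper name.

```python
def parse_pulse_config(text):
    """Return {keyword: float} from 'keyword value' lines."""
    settings = {}
    for line in text.splitlines():
        # Drop anything after a '#' so comment lines become empty.
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2:
            settings[fields[0]] = float(fields[1])
    return settings

sample = """\
# keyword parameters
HeartBeat 61
NodeTimeOut 90.
LinkTimeOut 90.
MasterTimeOut 90.
"""
print(parse_pulse_config(sample))
```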
5. Software
5.1 pulse
The pulse software is implemented as a daemon that runs on each of the LoBoS
nodes. On a compute node, the pulse daemon reports on each processor and network
connection defined for that node in the configuration files. On a master node,
the pulse daemon listens for reports from other nodes and updates the file
/lobos/node_status. Eventually the master nodes will communicate with one
another to provide a fail-over capability.
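How the master decides that a node has stopped reporting is not described here. One plausible approach, sketched below in Python, is to remember when each node last reported and to flag any node that has been silent for longer than NodeTimeOut. The class ReportTracker and its methods are hypothetical, and the sketch assumes that timestamps and NodeTimeOut share the same units.

```python
class ReportTracker:
    """Track the last report time per node and find nodes that went silent."""

    def __init__(self, node_timeout):
        self.node_timeout = node_timeout
        self.last_seen = {}

    def record(self, name, now):
        # Called whenever a report from `name` arrives.
        self.last_seen[name] = now

    def silent_nodes(self, now):
        # A node is silent if its last report is older than the timeout.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.node_timeout
        )

tracker = ReportTracker(node_timeout=90.0)
tracker.record("pe1", now=0.0)
tracker.record("pe2", now=61.0)
print(tracker.silent_nodes(now=100.0))
```

Passing the current time in explicitly, rather than reading the clock inside the class, keeps the logic easy to exercise without waiting for real timeouts.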
6. Lobosq
6.1 Job status
Job dispatching and assignment to individual CPUs are performed by lobosq.
It generates the following status file.
The file is called /lobos/job_status
/lobos/job_status-------------------------------------------------------------------------------
# Job Status Date:Wed Oct 8 16:23:00 1997
# job id uid gid pri CPUs Start Time Est Time Act Time cpus
#xxxxxxxxxxxxxxxxxxxxxxxx xxxx xxxx xxx xxxx mmm dd hh:mm:ss hhh:mm:ss hhh:mm:ss xxxxxxxxxxxxx
billings.971008161803 260 9000 1 2 1997 Oct 08 18:23:00 1:00:00 0:00:00 1,3
billings.971008161805 260 9000 1 2 1997 Oct 08 18:23:00 1:00:00 0:00:00 2,4
billings.971008162411 260 9000 1 2 1997 Oct 08 16:23:00 1:00:00 0:00:00 2,4
------------------------------------------------------------------------------------------------
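Each job line in the listing above has 12 whitespace-separated tokens, with the start time written as year, month name, day and time. A minimal Python sketch of reading these lines follows. The dictionary keys are illustrative, parse_job_status is a hypothetical helper, and the month abbreviation is assumed to be English (the default C locale).

```python
from datetime import datetime, timedelta

def parse_duration(text):
    """Convert 'h:mm:ss' (hours may exceed 24) to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def parse_job_status(text):
    """Return one dictionary per job line in a job_status listing."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 12 or fields[0].startswith("#"):
            continue  # skip comments and lines of a different shape
        jobs.append({
            "job_id": fields[0],
            "uid": int(fields[1]),
            "gid": int(fields[2]),
            "priority": int(fields[3]),
            "cpu_count": int(fields[4]),
            "start": datetime.strptime(" ".join(fields[5:9]),
                                       "%Y %b %d %H:%M:%S"),
            "estimated": parse_duration(fields[9]),
            "actual": parse_duration(fields[10]),
            "assigned_cpus": [int(c) for c in fields[11].split(",")],
        })
    return jobs

sample = "billings.971008161803 260 9000 1 2 1997 Oct 08 18:23:00 1:00:00 0:00:00 1,3\n"
for job in parse_job_status(sample):
    print(job["job_id"], job["start"], job["assigned_cpus"])
```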