Rocks 7 SGE Job Management Configuration

Hosts in the cluster fall into two types: the control node (master) and compute nodes (slaves). The control node is deployed on one machine only and also acts as a compute node; every other host is a compute node. Compute resources consist of the slots on each host. A subset of the cluster's hosts can be selected and defined as a host group. A queue is a container for compute resources in the cluster; for example, the queue named all.q corresponds to all of the cluster's compute resources. If you do not want certain users to use the entire cluster, define a new queue that can only use part of the cluster's resources. When computing on an SGE cluster, a parallel environment must be configured in order to run parallel jobs.
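
Once SGE is installed, each of these layers can be inspected with a single command. The lines below are only a quick sanity check, assuming the SGE environment (settings.sh) has already been loaded:

qhost            # execution hosts with their CPUs, memory and load
qconf -shgrpl    # host groups defined in the cluster
qconf -sql       # cluster queues (e.g. all.q)
qconf -spl       # parallel environments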

/opt/gridengine/bin/lx-amd64/sge_execd   # start the SGE execution daemon on a compute node


1. Common commands:


systemctl start rpcbind nfs-server
systemctl enable rpcbind nfs-server
systemctl disable iptables.service
netstat -lntup|grep sge
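
If the netstat line shows nothing, it can help to check the standard SGE ports directly; 6444 (sge_qmaster) and 6445 (sge_execd) are the usual defaults from /etc/services, though a site may use different values:

netstat -lntup | grep -E '6444|6445'   # sge_qmaster / sge_execd default ports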


2. Master node installation:


/opt/gridengine/install_qmaster

/opt/gridengine/install_execd   # optional; run only if the master should also execute jobs

3. Compute node installation:


On each compute node, edit /etc/hosts:
127.0.0.1       localhost.localdomain localhost
192.168.1.99    rocks rocks.local
10.1.1.254      compute-0-0.local compute-0-0


cd /opt/gridengine
Run the installer: ./install_execd
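
After either installer finishes, load the SGE environment in the current shell before using qconf or qsub; the path below assumes the default cell name "default":

source /opt/gridengine/default/common/settings.sh   # sets SGE_ROOT, PATH, etc.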


4. SGE configuration


qconf - this command provides the user interface for cluster and queue configuration.

qconf [options]

a: add; d: delete; m: modify; s: show; a trailing l: list


Common command reference
qconf -ae hostname
    Add an execution host
qconf -de hostname
    Delete an execution host
qconf -sel
    Show the list of execution hosts
qconf -ah hostname
    Add an administrative host
qconf -dh hostname
    Delete an administrative host
qconf -sh
    Show the list of administrative hosts
qconf -as hostname
    Add a submit host
qconf -ds hostname
    Delete a submit host
qconf -ss
    Show the list of submit hosts
qconf -ahgrp groupname
    Add a host group
qconf -mhgrp groupname
    Modify a host group
qconf -shgrp groupname
    Show the members of a host group
qconf -shgrpl
    Show the list of host groups
qconf -aq queuename
    Add a cluster queue
qconf -dq queuename
    Delete a cluster queue
qconf -mq queuename
    Modify a cluster queue configuration
qconf -sq queuename
    Show a cluster queue configuration
qconf -sql
    Show the list of cluster queues
qconf -ap PE_name
    Add a parallel environment
qconf -mp PE_name
    Modify a parallel environment
qconf -dp PE_name
    Delete a parallel environment
qconf -sp PE_name
    Show a parallel environment
qconf -spl
    Show the list of parallel environment names
qstat -f
    Show the status of execution hosts
qstat -u user
    Show a user's jobs
qhost
    Show resource information for execution hosts




1) Host group configuration


Before creating a queue, create a host group for its nodes.

# Create the SGE host group @general

qconf -ahgrp @general


# Add the hosts compute-0-0.local and compute-0-1.local to the @general host group

qconf -aattr hostgroup hostlist "compute-0-0.local compute-0-1.local" @general



# You can also add or change hosts with qconf -mhgrp @general.

# Show the existing host groups

qconf -shgrpl
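
For reference, qconf -shgrp @general prints the group definition; with the hosts added above it should look roughly like this:

group_name @general
hostlist compute-0-0.local compute-0-1.local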



# Delete the @general host group

qconf -dhgrp @general


2) Queue configuration

qconf -mq all.q   # modify the queue (an excerpt of an edited configuration is sketched after the parameter list below)
#slots 1,[compute-0-0.master=40],[compute-0-1.master=40],[compute-0-2.master=40]

  • hostlist: the host group(s) this queue uses, e.g. @allhosts
  • seq_no: queue sequence number; can be used to express priority, 0 being the highest
  • priority: priority; default 0 (highest); increase the value to lower the priority
  • slots: number of CPU slots used per host; default 1; either a single value for all hosts or per-host values such as 1,[host1=4],[host2=8]
  • shell: defaults to /bin/csh; can be changed to another shell (changed to /bin/sh here)
  • shell_start_mode: shell start mode; the default is posix_compliant; changed to unix_behavior here so that the interpreter on the first line of the job script (e.g. #!/bin/bash) is honored
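
As referenced above, here is a minimal excerpt of what an edited all.q might contain after these changes; the host names and slot counts are only illustrative:

qname                 all.q
hostlist              @allhosts
seq_no                0
priority              0
slots                 1,[compute-0-0.local=40],[compute-0-1.local=40]
shell                 /bin/sh
shell_start_mode      unix_behavior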




qconf -mc   # add a complex attribute
num_proc p INT <= YES YES 0 0
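
For reference, the columns of that line follow the complex configuration format shown in the header of the qconf -mc editor:

#name     shortcut   type   relop   requestable   consumable   default   urgency
num_proc  p          INT    <=      YES           YES          0         0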


Per-compute-node resource configuration
qconf -rattr exechost complex_values slots=40,num_proc=40,h_vmem=64g,virtual_free=64G  compute-0-0

qconf -rattr exechost complex_values slots=24,num_proc=24,h_vmem=125g,virtual_free=125G  compute-0-7
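
With these complexes in place, jobs can request the resources at submission time. A minimal sketch (run.sh is a placeholder, and the smp PE is the one defined later on this page):

qsub -cwd -pe smp 4 -l h_vmem=8G run.sh   # 4 slots, 8G of h_vmem accounted per slot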










Introduction

Sun Grid Engine is a job scheduling system that is widely used in computing clusters today. Users can use qmon (configure SGE through a GUI) or qconf (configure SGE from the command line) to configure SGE.

Detailed Reference

The most authoritative and comprehensive references are http://gridscheduler.sourceforge.net/htmlman/htmlman1/qstat.html and http://gridscheduler.sourceforge.net/htmlman/htmlman1/qconf.html

A simpler configuration tutorial is given at http://ait.web.psi.ch/services/linux/hpc/merlin3/sge/admin/sge_queues.html

SGE cheat sheet

qstat -f -u "*"                             all user jobs
qstat -g c                               show available queue and load (total/available cores)
qstat -f                                 detailed list of machines and job state 
qstat -explain c -j <job-id>               specific job status
qdel job-id                              delete job
qsub -l h_vmem=### job.sh                mem limit, see queue_conf(5) RESOURCE LIMITS


qconf -mc		change the complex configuration (very important command!)

Command 		Description
qconf -sp pename 	Show the configuration for the specified parallel environment.
qconf -spl 		Show a list of all currently configured parallel environments.
qconf -ap pename 	Add a new parallel environment.
qconf -Ap filename 	Add a parallel environment from file filename.
qconf -mp pename 	Modify the specified parallel environment using an editor.
qconf -Mp filename 	Modify a parallel environment from file filename.
qconf -dp pename 	Delete the specified parallel environment.
qconf -sql 		Show a list of currently defined queues.

To view the basic cluster and host configuration use the qconf -sconf command:

   qconf -sconf  [global]
   qconf -sconf  host_name

To modify the basic cluster or host specific configuration use the following commands:

   qconf -mconf  global
   qconf -mconf  host_name

To view and change the setup of hosts use qconf -XY [file] with the following options:

   =================================================================
                     HOST    execution   admin     submit   host_group
                       Y         e         h          s        hgrp
    ACTION         X
   -----------------------------------------------------------------
   add (edit)      a            *          *          *         *
   add (file)      A            *                               *
   delete          d            *          *          *         *
   modify (edit)   m            *                               *
   modify (file)   M            *                               *
   show            s            *          *          *         *
   show list       sYl          *                               *
   =================================================================

Great reference on SGE configuration http://arc.liv.ac.uk/SGE/howto/sge-configs.html

Configuration of h_vmem for any new cluster

This is extremely important to set up for any cluster: it makes it possible to limit the amount of memory each job can request and avoids memory problems caused by user jobs. The key is to make h_vmem a requestable, consumable resource.

Use qconf -mc to change the line to

h_vmem              h_vmem     MEMORY      <=    YES         YES        4G       0

This makes h_vmem consumable with a default value of 4G.

Now run the command echo "sleep 120" | qsub -cwd -V -l hostname=compute-0-0 twenty times to confirm (a loop is sketched below). If the default is 4G and compute-0-0 offers 48G of h_vmem, then only 12 jobs can run at once and the other 8 jobs will be put on the waiting list.
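
A minimal loop for submitting those 20 test jobs, using the same node as in the example above:

for i in $(seq 1 20); do
    echo "sleep 120" | qsub -cwd -V -l hostname=compute-0-0
done
qstat -u "$USER"   # 12 jobs should be running, 8 should sit in the qw state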

Configuration of parallel environment

check parameters of a pe

[kaiwang@biocluster ~/]$ qconf -sp mpi
pe_name            mpi
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/gridengine/mpi/startmpi.sh $pe_hostfile
stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE

To add a new pe such as smp first edit a file pe.txt

pe_name           smp
slots             9999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $pe_slots
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min
accounting_summary FALSE

Then run qconf -Ap pe.txt as root. Now smp can be used as a PE in the qsub -pe argument.

The next step is to add the new PE smp to the all.q queue: run qconf -mq all.q and add smp to the pe_list line in the file (see the sketch below).
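
A sketch of the resulting pe_list line; the other PE names shown here are only examples of whatever was already present:

pe_list               make mpi smp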

SGE complex value for host

qconf -me <HOSTNAME>

Then use "complex_values h_vmem=48G" to set the h_vmem value of that particular host to 48G.

Extremely important: whenever adding a new host, one must use "qconf -me" to set up complex_values. In combination with -l h_vmem=xG in the qsub command, this eliminates the possibility of running out of memory when multiple jobs on the same host all request large chunks of memory at the same time.
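
A minimal sketch of the relevant lines inside the qconf -me editor for a hypothetical 48G, 40-slot node:

hostname              compute-0-0.local
complex_values        h_vmem=48G,slots=40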

Temporarily enable or disable a host

qmod -d all.q@compute-0-4
qmod -e all.q@compute-0-4

qmod -d all.q@compute-* should disable all the queue instances

job status

'au' simply means that Grid Engine is likely not running on the node. The "a" means 'alarm' and the "u" means unheard/unreachable. The combination of the two more often than not means that SGE is not running on the compute node.

E is a worse state to see. It means that there was a major problem on the compute node (with the system or the job itself). SGE intentionally marked the queue as state "E" so that other jobs would not run into the same bad problem.

E states do not go away automatically, even if you reboot the cluster. Once you think the cluster is fine you can use the "qmod" command to clear the E state.

host status

  • 'au' - Host is in alarm and unreachable.
  • 'u' - Host is unreachable. Usually SGE or the machine itself is down; check this out.
  • 'a' - Host is in alarm. This is normal when the node is fully loaded, i.e. it is using most of its resources.
  • 'aS' - Host is in alarm and suspended. When the node is using most of its resources, SGE suspends it so that it takes no further jobs until resources become available.
  • 'd' - Host is disabled.
  • 'E' - ERROR. This requires the command qmod -c to clear the error state.

When job is in dr state

When deleting a job, sometimes a "dr" state will show up, indicating that the job is not running correctly and cannot easily be deleted. In this case, log in as root (su), then run "qdel <jobid>" to delete the job. If that does not work, run "qdel -f <jobid>" to delete it forcefully.

When node is in E state

re-install the node, then clear the Error log:

[root@biocluster /home/kaiwang]$ qmod -c all.q@compute-0-2
root@biocluster.med.usc.edu changed state of "all.q@compute-0-2.local" (no error)

See more explanations here: http://www.gridengine.info/2008/01/20/understanding-queue-error-state-e/

If a machine has restarted yet its slots still appear full, use qstat -u "*" to see whose jobs are in the dr state, then qdel those jobs.

If a node continues to be in E state after clearing the error multiple times by qmod, then it is likely that there is a hardware error. In this case, try rocks set host runaction compute-0-0 action=memtest and restart the node to check for potential issues.

Check error messages from SGE

less /opt/gridengine/default/spool/qmaster/messages

Restart SGE

You can find the startup scripts in $SGE_ROOT/<cell>/common/

If your cell is the usual "default" then all you need to do is:

cd $SGE_ROOT/default/common/
./sgemaster start
./sgeexecd start

Before you restart the master, make sure you don't have any old
sge_qmaster or sge_schedd processes hanging around.
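
A quick way to check for such leftover daemons before restarting (the process names are the standard SGE ones):

ps -ef | egrep 'sge_qmaster|sge_schedd|sge_execd'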

Fair share policy

There are two types of fair share policies: share tree and functional.

  1. Make two changes in the main SGE configuration ('qconf -mconf'): set enforce_user to auto and auto_user_fshare to 100.

  2. Make one change in the SGE scheduler configuration ('qconf -msconf'): set weight_tickets_functional to 10000. (The resulting configuration lines are sketched after this list.)
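
As referenced above, the corresponding lines look like this once the changes are saved (values as given in the two steps):

# in "qconf -mconf" (global configuration)
enforce_user                 auto
auto_user_fshare             100

# in "qconf -msconf" (scheduler configuration)
weight_tickets_functional    10000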

Very useful tricks

  • Restart a failed job

If your job fails on a node (the node typically shows up with 'au' status in qstat), you can restart the job on a different node. First mark the job as restartable, then reschedule it:

qalter -r y <jobid>
qmod -r <jobid>

You will see that the status of the job becomes "Rq", and soon it will be submitted to a different node.

  • Clear error for a job

Sometimes you will see that a job is in the "Eqw" state in qstat. This is caused by errors when starting the job, in my experience usually an NFS error on the node. Once you have fixed the underlying problem, clear the error with qmod -cj <jobid> and the job will be scheduled again.

  • Change priority of a job

Use qalter -p <priority> <jobid> to change the priority of a job. The valid range is -1023 to 1024; a lower number means lower priority. Regular users can only lower the priority. This applies only to queued jobs, not running jobs.

Adding a new queue

We want to add a new queue using all.q as the template:

[root@biocluster ~]# qconf -sq all.q > bigmem.q

Then edit the bigmem.q file (change qname to bigmem, change hostlist to @bigmemhosts, and change slots to something like 1,[dragon.local=32], where dragon.local is a host in @bigmemhosts), then

[root@biocluster ~]# qconf -Aq bigmem.q 

to add this queue.

Later you can edit it directly with qconf -mq bigmem and further adjust the hostlist and slots parameters there.

For example, to switch a host from all.q to bigmem: first run qconf -mhgrp @allhosts to remove the host from @allhosts, then qconf -ahgrp @bigmemhosts to create that group with the host in it. Finally add this host group to the bigmem queue (see the sketch below).
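
A sketch of that sequence, using the dragon.local example from above:

qconf -mhgrp @allhosts      # remove dragon.local from the hostlist in the editor
qconf -ahgrp @bigmemhosts   # create the group and put dragon.local in its hostlist
qconf -mq bigmem            # make sure hostlist contains @bigmemhosts and adjust slots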

default submission parameters

If you have multiple queues, it makes sense to set up a default queue, since SGE may otherwise assign jobs to queues more or less at random.

Edit the /opt/gridengine/default/common/sge_request file, add -q all.q as the default queue.

From the user's perspective, a default SGE request file .sge_request in the home directory can also be used; if parameters are not given on the command line, the defaults from this file are applied.
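
A minimal, illustrative ~/.sge_request; the entries are ordinary qsub options:

-q all.q
-cwd
-V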

Change appliance type

Sometimes you may want to change a NAS into a job execution host. This can be done by changing its appliance type.

[root@biocluster ~]# rocks list membership
MEMBERSHIP               APPLIANCE    DISTRIBUTION PUBLIC
Frontend:                frontend     rocks-dist   no    
Compute:                 compute      rocks-dist   yes   
NAS Appliance:           nas          rocks-dist   yes   
Ethernet Switch:         network      rocks-dist   yes   
Power Distribution Unit: power        rocks-dist   yes   
Development Appliance:   devel-server rocks-dist   yes   
Login:                   login        rocks-dist   yes   

shows all appliance types.

[root@biocluster ~]# rocks set host membership dragon membership="Compute"

changes the appliance type. Then re-install the node, and it will show up in qhost.

