First find out which nodes your running job is using:
qstat -awu <userid> -n1
Here <userid> is your user ID on Lengau. If your job uses many nodes, the following form is more legible, because the node list is printed below the job line instead of being appended to it:
qstat -awu <userid> -n
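If you would rather not copy node names by hand, the node list can be extracted from the “qstat -n1” output with standard text tools. The sketch below is a hypothetical helper (“job_nodes” is an illustrative name, not a CHPC-provided tool). It assumes, as in the example session at the end of this section, that the job's line starts with its job ID and that the node list is the last field, with multiple nodes joined by “+” (for example cnode0263/0*24+cnode0264/0*24).

```bash
# job_nodes JOBID  (hypothetical helper)
# Reads "qstat -awu <userid> -n1" output on stdin and prints the unique
# node names used by job JOBID, one per line.
# Assumption: the node list is the last field of the job's line, e.g.
#   cnode0263/0*24+cnode0264/0*24
job_nodes() {
  local jobid="$1"
  # Keep only the line whose first field is JOBID or starts with "JOBID."
  # (e.g. 3672969.sched01) and print its last field, the node list.
  awk -v id="$jobid" 'index($1, id ".") == 1 || $1 == id { print $NF }' \
    | tr '+' '\n' \
    | cut -d/ -f1 \
    | sort -u   # one entry per node, "/0*24" core suffix stripped
}
```

Usage, with the job from the example session: “qstat -awu alopis -n1 | job_nodes 3672969” should print cnode0263. This only works with the “-n1” form, where the node list is on the same line as the job.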
Then log in (with the “ssh” command) to the first node and, unless the job is a single-node smp job, to at least one or two of the other nodes. On each node run “htop” (or “top”) to see how CPU, memory and other resources are being used on that node.
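To glance at several nodes in turn without opening an interactive session on each, “top” can also be run once in batch mode over ssh. This is a minimal sketch, assuming you can ssh to the compute nodes of your own running job without being asked for a password (as in the example session); “snapshot_nodes” is a hypothetical helper name.

```bash
# snapshot_nodes NODE...  (hypothetical helper)
# For each node given, print a header and the first 15 lines of a single
# non-interactive "top" snapshot taken on that node.
snapshot_nodes() {
  local node
  for node in "$@"; do
    echo "=== $node ==="
    # -n: do not read from stdin, so a calling loop's input is not consumed
    # BatchMode=yes: fail immediately instead of prompting for a password
    # top -b -n 1: batch mode, one iteration, then exit
    ssh -n -o BatchMode=yes "$node" 'top -b -n 1 | head -n 15'
  done
}
```

Usage: “snapshot_nodes cnode0263 cnode0264”. A single snapshot is only a quick check; watching “htop” live for a minute on a node gives a more reliable picture.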
For example, if only the first node shows any real usage, there is probably a problem: the code is running on only one node and the other nodes are being wasted. In that case, investigate whether the code can run in parallel across more than one node. Also check how many of the 24 cores on each node (normal queue) are actually being used, and whether that is what you expect. Finally, look through the rest of the information htop provides for anything suspicious. If you need advice, please submit a ticket to our Help Desk.
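To put a rough number on how many cores are busy on a node, you can add up the %CPU of your own processes there (100% corresponds to one fully busy core). This is a sketch, assuming the procps “ps” found on Linux compute nodes; “busy_cores” is a hypothetical helper. Note that “ps” reports %CPU averaged over each process's lifetime rather than an instantaneous value, so treat the result as an estimate and compare it with what htop shows.

```bash
# busy_cores  (hypothetical helper)
# Reads one %CPU value per line on stdin (e.g. from "ps -o pcpu=") and
# prints their sum divided by 100, i.e. an estimate of busy cores.
busy_cores() {
  awk '{ total += $1 } END { printf "%.1f\n", total / 100 }'
}
```

Usage: “ssh cnode0263 'ps -u "$USER" -o pcpu=' | busy_cores”. If your job is meant to use all 24 cores of a normal-queue node, a value well below 24 suggests that cores are sitting idle.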
An example (htop info not shown):
[alopis@login2 ~]$ qstat -awu alopis -n1

sched01:
Job ID          Username Queue Jobname SessID NDS TSK Memory Time  S Time
3672969.sched01 alopis   smp   3P_13   108699   1  24 --     00:30 R 00:10:04 cnode0263/0*24
[alopis@login2 ~]$ ssh cnode0263
Warning: Permanently added 'cnode0263,172.18.1.206' (ECDSA) to the list of known hosts.
[alopis@cnode0263 ~]$ htop
[alopis@cnode0263 ~]$ exit
logout
Connection to cnode0263 closed.
[alopis@login2 ~]$