guide:intelphi

Differences

This shows you the differences between two versions of the page.

guide:intelphi [2015/03/18 11:35]
kevin created
guide:intelphi [2015/03/18 11:54] (current)
kevin
Line 37: Line 37:
 For **offload** execution you do not include the ''-mmic'' flag.
  
-=====PBSPro Job Scripts=====
 +=====PBSPro Job Script Examples=====
  
 ====Native Execution====
  
-**native.pbs**:
 +This is not recommended. You are better off using //offload// mode and running your program on the host node.
 +
 +Job script file **native.pbs**:
  
 <code bash> <code bash>
Line 92: Line 94:
 ====Offload Execution====
  
-**offload.pbs**:
 +This is the preferred way to run Phi code: execute on the host and offload the compute-intensive part to the Phi device.
 +
 +Job script file **offload.pbs**:
  
 <​code>​ <​code>​
Line 138: Line 142:
 As you can see above, //offload// execution is much simpler.
  
-===Fortran Examples===
 +=====Warning=====
 + 
 +The MIC architecture is designed for parallel code with //many// threads. The Phi cards at the CHPC have 60 cores each, but they are only effective if your code runs multiple threads per core: at least two, and up to four. Unless your code performs well with 120 to 240 threads in a shared-memory environment with 8GiB of memory, the Phi card will disappoint.
 + 
 +====Cons==== 
 + 
 +  * Only 8GiB of memory 
 +  * Only 1GHz CPU clock 
 +  * A single thread can only issue an instruction every //second// clock cycle, so each thread effectively runs at 500MHz.
 +  * You must run at least two threads per core to use every clock cycle.
 +  * Code must scale well from 120 to 240 threads. 
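
The thread counts and the per-thread clock quoted in this list follow directly from the card's figures. The sketch below redoes that arithmetic in bash. It assumes an OpenMP code, and the variable names ''CORES'', ''CLOCK_MHZ'' and so on are chosen here for illustration only; they are not part of any job script on this page.

```shell
#!/bin/bash
# Figures taken from the list above; adjust them for other hardware.
CORES=60          # cores per Phi card
CLOCK_MHZ=1000    # core clock in MHz (approximately 1GHz)

# Two threads per core are needed to use every clock cycle;
# four threads per core is the maximum.
MIN_THREADS=$((CORES * 2))
MAX_THREADS=$((CORES * 4))

# One thread can use only every second cycle.
PER_THREAD_MHZ=$((CLOCK_MHZ / 2))

echo "threads per card: ${MIN_THREADS} to ${MAX_THREADS}"
echo "effective per-thread clock: ${PER_THREAD_MHZ} MHz"

# OMP_NUM_THREADS is the standard OpenMP thread-count variable.
# Assumption: the code is OpenMP-based and reads it where it runs.
export OMP_NUM_THREADS=${MAX_THREADS}
```

Whether 120 or 240 threads is the better choice depends on the code, so it is worth timing both before settling on a value.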
 + 
 +====Pros==== 
 + 
 +  * 512-bit-wide vector processing unit: can execute ''z[i] = a[i]*x[i] + y[i]'' on 8 //double-precision// elements per tick.
 +  * 240 threads in one chip.  (But you have to give them all enough work.)
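
To put the vector unit in perspective, the figures on this page give a rough upper bound on double-precision throughput. The bash sketch below is only that arithmetic. It assumes every core completes one 8-wide multiply-add per clock cycle and counts the multiply and the add as two floating-point operations; real code will land well below this bound. The variable names are illustrative.

```shell
#!/bin/bash
# Figures taken from this page; the result is a theoretical bound only.
CORES=60          # cores per Phi card
CLOCK_MHZ=1000    # core clock in MHz (approximately 1GHz)
VECTOR_BITS=512   # width of the vector processing unit
DOUBLE_BITS=64    # size of one double-precision element

# Number of doubles that fit in one vector register.
LANES=$((VECTOR_BITS / DOUBLE_BITS))

# Assumption: z[i] = a[i]*x[i] + y[i] counts as two operations per element.
FLOPS_PER_ELEMENT=2

PEAK_MFLOPS=$((CORES * CLOCK_MHZ * LANES * FLOPS_PER_ELEMENT))
PEAK_GFLOPS=$((PEAK_MFLOPS / 1000))

echo "doubles per vector: ${LANES}"
echo "theoretical peak: ${PEAK_GFLOPS} GFLOP/s (double precision)"
```

Reaching a meaningful fraction of this number requires loops that vectorise and enough threads to keep all 60 cores busy, which is the point of the warning above.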
  
  
/var/www/wiki/data/pages/guide/intelphi.txt · Last modified: 2015/03/18 11:54 by kevin