Supercomputer documentation is always a work in progress! Please email questions, corrections, or suggestions to the HPC support team at help-hpc@uky.edu as usual. Thanks!

General Information



Logging In

What do I need before I can login?
You need an HPC account. See the Account Information page for detailed information on who is eligible for an account and how to get one.

You must follow UK and IT policies and procedures. See the UKIT Policies page.

You must use a secure shell (ssh) client to connect to the DLX cluster. See the Secure Shell page for more information.


What is a secure shell (ssh)?
The secure shell (ssh) is a means of encrypting your login session. See the Secure Shell page for more information.


Is it really necessary for me to use secure shell for login?
Yes, it is very important to encrypt your login session, so that others cannot see your private information, especially your password. Unencrypted logins to the cluster are not allowed. See the Secure Shell page for more information.


How can I get a secure shell (ssh) client?
Clients for ssh are available for all common operating systems, and many have the client already built in. See the Secure Shell page for more information.


Can I connect with an XDM/CDE session?
No, CDE sessions are not permitted for security reasons.


How do I login and run X applications?
It is important for security reasons to use an encrypted X session, so that others cannot see your private information, especially your password. You must tunnel (forward) your X connection through ssh, which depends on the proper client settings.

On Linux, your client may be configured globally by default to set up forwarded X connections, or you may override the default with the -X command-line flag, or you may specify the tunnel on a per-host basis in your own configuration file. See the man ssh page on your Linux workstation.
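
For example, from a Linux or MacOS X workstation (the userid is a placeholder):

ssh -X userid@dlx.uky.edu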

On MS-Windows clients there is often a check-box in the session setup for X forwarding. This may be on a global basis or a per-host basis. For more info about forwarding X11 connections with ssh clients, such as PuTTY, see http://www.tldp.org/HOWTO/XDMCP-HOWTO/ssh.html. Note: this is an off-site link and could disappear without notice.


How do I tell if my X connection is tunneled?
Check the DISPLAY variable to see if your X connection is being tunneled through ssh. If DISPLAY is set to a cluster node's IP number instead of your local workstation's IP number, then the connection is being tunneled. Use the command echo $DISPLAY while connected to the cluster with X.


Home and Scratch Disk

What is my 'home' directory?
Each user has space allocated on the /home filesystem for programs, code, and modest amounts of data. The path is /home/userid and there is a quota (limit) on how much space you can use. Your home directory is backed up nightly.


How do I check my disk quota?
Use the quota command.


Why do I get the message disk quota exceeded?
If you run a large job from your /home directory, then you can easily exceed your disk quota. In order to run jobs without exceeding your disk quota, create any large files in your scratch directory (/scratch/userid).
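
For example (the directory and program names here are hypothetical), have the job do its work under scratch rather than home:

cd /scratch/userid
mkdir -p bigjob1
cd bigjob1
~/my_app > output.dat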


What is 'scratch' space?
Scratch space is temporary disk space for actively running jobs. You have your own scratch area in /scratch/userid where your jobs won’t interfere with others. There is a link to this in your home directory (/home/userid/scratch) for your convenience.

There are no quotas on the /scratch filesystem and your jobs can write data to the limit of the filesystem. However, there is a finite amount of space and that space is shared among all users. Use the command df -h /scratch to check the current usage of the scratch file system.

If scratch fills completely, then most active jobs will fail. Whenever scratch becomes dangerously full, the system administrators will take countermeasures, including canceling running jobs.

Only put files in scratch temporarily! Make sure you don’t put source code or other hard to recreate files in scratch, unless you have another copy stored elsewhere. Large files that need to be stored for an extended time may be transferred to the HSM near-line storage system. See Long Term Storage for more information.

User directories in /scratch on the cluster are automounted by the file system as they are needed. When you list the subdirectories under /scratch (that is, ls /scratch), you will see only a few of them, the ones that are currently in use. However, there are hundreds of userids and each has its own scratch directory. When you cd to one (cd /scratch/userid) it will be ‘automagically’ mounted for you.


How do I check the available space on a filesystem?
Use the df -h command to list information about each available file system.


Is my scratch space the same on all machines in the cluster?
Yes. The global clustered file system presents the same home and scratch filesystem to the login nodes and all of the compute nodes.


Can two of my jobs interfere with each other? Will files from one job over-write files from the other?
Yes, but only if the files have the exact same pathname. The easiest and safest way to prevent that is to make separate scratch directories for the files for each job, either by hand or by using the mktemp command to create a unique directory name.
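
For example (the path and program names are hypothetical), each job can create its own unique working directory with mktemp before writing any files:

WORKDIR=$(mktemp -d /scratch/userid/job.XXXXXX)
cd $WORKDIR
~/my_app > results.out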

Note: this problem will not normally occur with Gaussian jobs. The script that sets up the job creates a unique scratch directory for each job. However, if you run two jobs from the same directory using the same Gaussian command file name, then the output file from the first job to finish will be overwritten by the second. Don't run jobs with the same Gaussian command file name from the same directory.


How long can files remain in my scratch directory?

Files in the scratch filesystem are NOT backed up. Files left in scratch more than 30 days may be deleted. Once a file is deleted from a scratch directory, it is permanently gone. It is each user's responsibility to keep copies, either in the home directory or in some other location.

Only put files in scratch temporarily! Make sure you don’t put source code or other hard to recreate files in scratch, unless you have another copy stored elsewhere. Large files that need to be stored for an extended time may be transferred to the HSM near-line storage system. See Long Term Storage for more information.


Batch Jobs FAQ

What is a batch job?
A batch job is a program or series of programs run on the cluster without manual intervention. Usually a script is used to supply the input data and parameters necessary for the job to run. The script is submitted to the batch scheduler and will be run when the resources are available. You don't need to specify the node or nodes on which the job will run; the batch system will select appropriate nodes for the job. The batch system also allows for accounting and tracking of jobs. For more information, see Getting Started: Running Jobs.

How do I submit a batch job?
A batch job is submitted to a batch queue using the sbatch command. For details on the commands for batch execution, see the examples (including batch job scripts) in /share/cluster/examples/dlx on the cluster and Getting Started: Job Scripts.
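
As a minimal sketch (the script and program names are hypothetical), a job script might contain:

#!/bin/bash
#SBATCH -n4
#SBATCH -t 1:00:00
./my_program

If this script were saved as myjob.sh, you would submit it with sbatch myjob.sh.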


Can I run jobs on the login node?
The login nodes are intended for editing files, compiling code, running short tests, and similar activities. Please run your jobs as batch jobs whenever you can. As a special case, interactive jobs up to 120 cpu-minutes may be run on the login node. If your job exceeds this limit it may be canceled. The login nodes are shared by all of the cluster users, so any job that does intensive computing, produces heavy I/O, or spawns a large number of processes will adversely affect the whole node. Please respect your fellow users!


What types of jobs must be run in batch?
Almost all jobs must be run through the batch system using the sbatch command. Non-batch jobs on any node except the login node will be killed, unless special permission has been obtained in advance. Send email to the Help HPC list at help-hpc@uky.edu to make arrangements.


Which batch queue should I use?
Normally you won't specify the queue, but just let the scheduler pick the correct one for you. Be sure to give the scheduler enough information to put your job into the proper queue. Always specify the number of processors (cores) that the job needs and the amount of time you expect it to run. There are only a few cases where you should specify the job queue yourself, and these are described in Getting Started: Job Queues.

Use the queue_wcl command to show the names of the queues and the limits associated with them. The sinfo command also displays information about the queues. Note that the queues are defined for particular types of jobs, to make sure the jobs have the appropriate resources and don't interfere with one another. Please run your job in the appropriate queue! If you have questions, or you need to do something you can't do under the queuing system, please send email to the Help HPC list at help-hpc@uky.edu describing your problem or question.


How do I check the status of my job?
Use the squeue command to show the status of batch jobs. The default is to show all pending and running jobs. Use squeue -u myloginid to see your own jobs. Use squeue --help or man squeue to get more information about the command.
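
A session might look like this (the job names, partition, and node numbers are illustrative, not real output from this cluster):

$ squeue -u myloginid
  JOBID PARTITION     NAME      USER ST       TIME  NODES NODELIST(REASON)
  12345    Normal   myjob1 myloginid  R       5:02      1 node042
  12346    Normal   myjob2 myloginid PD       0:00      4 (Resources)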


Why has my job been Pending for so long?
Use the checkjob -v jobid command to see why your job is pending.

There are very likely other pending jobs ahead of yours. You can see the pending jobs with the squeue -t pending command. The batch system manages the flow of jobs and allows jobs to run when the system resources are available to do so in a fair and orderly fashion. When the system is less busy, your jobs might execute quickly. When the system is busier, your jobs might wait for longer periods. This wait will vary based on many factors.

If your job has been pending for longer than you expect, please try to find out why before you submit a trouble report. More often than not, your job is just waiting on the resources you requested. For more information, see Getting Started: Running Jobs.

Your job will be scheduled automatically when the resources are available.


How many jobs can I run at once?
Job queues and node allocations have been established to assure equitable distribution and access to the entire complex. See Getting Started: Quotas for specific information.


Can I kill a batch job?
Use scancel job_id to terminate a batch job. The time required for the job to actually terminate will vary, depending on how busy the system and the network are, and on how many parallel processes the job is running. You can use the squeue command to find the job id (see How do I check the status of my job? above).


How do I checkpoint or restart a job?
A checkpoint allows a job to be restarted when it is terminated for some reason outside of its control. There are many reasons why a job might be abnormally terminated. Here are a couple of examples: a hardware or software failure can stop a job before it finishes normally and writes out its results, or a job might still be running when a planned outage occurs. Since HPC jobs often run for days or weeks on hundreds of cores, an abnormal termination can result in the loss of a great deal of work and computing resources. It sometimes causes researchers to miss deadlines. The longer a job takes to run, and the more nodes involved, the more likely a problem becomes.

We strongly urge that all jobs use checkpoints.

A checkpoint / restart facility must be built into the code. It cannot be done automatically by our current operating system or batch system. Every so often, the code must write to disk enough internal state information and intermediate results, so that the job can be restarted from that point. If you write your own code, you are responsible for building this into your code. If you are using a 3rd party package, then please see the application documentation for its checkpoint / restart capability.
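
As an illustration only, here is the shape such a script might take; the checkpoint file name and the --restart option are hypothetical and depend entirely on your application:

#!/bin/bash
#SBATCH -n16
#SBATCH -t 24:00:00
cd /scratch/userid/myjob
if [ -f checkpoint.dat ]; then
    # A previous run left a checkpoint; resume from it (hypothetical flag).
    ./my_app --restart checkpoint.dat
else
    # No checkpoint found; start from the beginning.
    ./my_app
fi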


What nodes is my job running on?
The squeue command shows the status of batch jobs, including the nodes assigned to each running job in the NODELIST column (see above).


What are MOAB and SLURM?
MOAB is the batch job scheduler that decides which jobs should run next. SLURM is the Resource Manager that allocates compute nodes upon request. Together they submit the jobs, select the most suitable hosts, and interact with the individual tasks of parallel batch jobs.

A batch job is submitted to a queue with the sbatch command. The batch system then attends to all of the details associated with running it. A batch job may not run immediately after being submitted if the resources it needs (usually compute nodes) are not available. The job will wait until it reaches the front of the queue and the resources become available. At that time the batch job will be dispatched to the most suitable host or hosts available for execution.


Where can I get more information on MOAB and SLURM?
The vendor’s documentation on MOAB is at http://www.clusterresources.com/products/mwm/docs/.

Also see the Lawrence Livermore Moab Quick-Start User Guide at https://computing.llnl.gov/jobs/moab/QuickStartGuide.pdf.

The Lawrence Livermore SLURM: A Highly Scalable Resource Manager is at https://computing.llnl.gov/linux/slurm/.


How do I do a timing run?
A timing run is a job used to determine how effectively and efficiently a program performs. The job must be run on a node that is not shared with any other job, or its timing could be affected by the other job. On some HPC systems, several jobs can share a single node, and exclusive use of a node for a timing run must be scheduled with the sysadmin. This is not an issue here, because the basic compute nodes, the fat nodes, and the GPU nodes are allocated exclusively to a single job by default.
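
For example (the program name is hypothetical), wrapping the program with the time command in an ordinary job script is usually all that is needed, since the node is yours alone for the duration of the job:

#!/bin/bash
#SBATCH -n16
#SBATCH -t 2:00:00
time ./my_app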


Can I run a job using multiple-level parallelism?
A multiple-level parallel job is an MPI job with sub-processes that use parallelized library routines, OpenMP, or loop-parallelism (also known as mixed-mode). Consult the following links for more OpenMPI information.

OpenMPI documentation: http://www.open-mpi.org/doc/v1.4/.

FAQ: http://www.open-mpi.org/faq/.

LLNL Tutorial: https://computing.llnl.gov/tutorials/mpi/.

Another tutorial: http://www.lam-mpi.org/tutorials/.

MPI FORUM: http://www.mpi-forum.org/.


Long Term Storage FAQ

How do I archive my files that are on the cluster?
You may copy your files to any place they will be safe, including your desktop computer, a departmental disk store, or some other research facility. Copies are usually made with the sftp or scp commands.
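
For example (the host and file names are hypothetical), a compressed archive can be pushed to another machine with scp:

tar -czf results.tar.gz myproject
scp results.tar.gz userid@somehost.example.edu:archive/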

However, you may want to use our Hierarchical Storage System (HSM), a fast near-line mass storage system available at hsm.uky.edu. Files transferred to HSM are moved off onto a tape robot where they can be recalled in just a few seconds. To set up an HSM account, please contact the IT Service Desk by calling 859-218-HELP (859-218-4357) or emailing helpdesk@uky.edu. For more information on HSM see Data Storage Information.

Note: You are responsible for the safety and integrity of your own files, wherever they are.


Why should I store my files somewhere else?
There are many reasons, but here are a few:

- You might want to keep the files after you leave the University.
- The quota on your home directory will not let you keep everything there. Large files that won't be used for a while can be moved somewhere else.
- Your scratch directory is not backed up. Make a copy of any scratch file that can't be easily recreated.
- Scratch files may be purged after 30 days. (There is no such limit on your home directory.)
- Moving inactive files off the cluster will keep them available, but make it unlikely they will be accidentally altered or destroyed by your current work.


Are my files backed up?
Your home directory is backed up nightly. Your scratch directory is not. We strongly recommend that you make copies of any important files in your scratch directory and keep them elsewhere.


MPI

How do I compile an executable file to run as an MPI application?
To compile a source file named my_app.c written in C, use the command:

mpicc -o my_app my_app.c

To compile a source file named myapp.f90 written in Fortran 90, use the command:

mpif90 -o myapp myapp.f90

See the MPI Documentation section of this Web site for more information on MPI compiling.

Also, see man mpicc for C compiler specifics and man mpif90 for Fortran90 compiler specifics.


Which compilers should I use to compile an MPI application?
Use the MPI wrapper compilers: mpicc for C, mpiCC for C++, mpif77 for Fortran 77, and mpif90 for Fortran 90.

How do I execute an MPI program on multiple nodes?
Create a shell script and submit it to the batch scheduler. For example, the script file myscript might look like this:

#!/bin/bash

mpirun mybinary

If you prefer a different shell, you can use sh, ksh, csh, or bash. We recommend bash, which is the default.

Run it with a command like sbatch -n t myscript, where t is the number of tasks. The batch scheduler will distribute the job over the required number of nodes.
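
A slightly fuller sketch (the program name is hypothetical; the module version shown is one used elsewhere in this document) puts the resource requests in the script itself:

#!/bin/bash
#SBATCH -n32
#SBATCH -t 12:00:00
module load mpi/openmpi/intel/1.6.3
mpirun mybinary

Submit it with sbatch myscript; there is then no need to give -n on the command line.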


Where can I get more MPI documentation?
See the MPI documentation in the Documentation section.


How do I use multiple-level parallelism?
To use multiple-level parallelism, which is running an MPI job that uses OpenMP or loop-parallelism, or that calls a parallelized library routine, consult the Multi Node Parallelism documentation.
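
As a rough sketch of the mixed-mode idea (the names and counts are illustrative; the right combination depends on your code and the documentation above), one MPI task per node with several OpenMP threads each might look like:

#!/bin/bash
#SBATCH -n2
#SBATCH --tasks-per-node=1
# Each MPI task spawns 8 OpenMP threads (hypothetical count).
export OMP_NUM_THREADS=8
mpirun my_hybrid_app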


GPU

How do I run a job in the GPU queue?
To run a job with GPU-enabled code, put this SBATCH option into your job script:

#SBATCH --partition=GPU

Or add the partition flag to the sbatch command:

sbatch -n12 -pGPU aaa.sh

One or the other is enough; you don't need to do both.
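
Putting it together, a minimal GPU job script might look like this (the script and program names are hypothetical):

#!/bin/bash
#SBATCH -p GPU
#SBATCH -n12
#SBATCH -t 4:00:00
./my_gpu_app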


How do I use Amber with GPUs?
Only the PMEMD module in Amber 11 is GPU-enabled, but the Amber sample jobs that CCS tested ran much faster when using GPUs.

See the page Amber on GPUs for information on running Amber on the GPU nodes.


How do I use NAMD with GPUs?
This information will be coming soon.


Can I write my own GPU code?
If you are interested in GPU-enabling your own code, then see the extensive Nvidia GPU developer information on the Nvidia web page http://developer.nvidia.com/gpu-computing-sdk.

Note that the “SDK” is a misnomer; this is mostly sample code. The Toolkit is the development environment, which you establish by loading the CUDA module (module load cuda).


Transferring Files

Why would I transfer files?
You will almost certainly want to upload program or data files from your workstation to the cluster and download results, programs, or data from the cluster to your workstation. You may want to copy data from another site or put data out somewhere for others to see and use.

Use scp or sftp to transfer files to or from the cluster, to or from your workstation, and to or from most other sites. These are related to the ssh (secure shell) client and will encrypt your transfer (and password) in the same way that ssh encrypts your login session.


How do I transfer files?
On Linux, Unix, or MacOS X: MacOS X and most Linux and Unix systems come with scp and sftp already installed. The free OpenSSH client includes scp and sftp, so if you installed it to get ssh, you will get scp and sftp at the same time. If you are using your link blue userid on your workstation, then you may omit it from the scp or sftp commands, as you do with the ssh command.

Using scp is much like using cp:

scp file1 file2 copies file1 (the source file) to file2 (the target file).

scp aprog.f77 dlx.uky.edu: copies aprog.f77 in the current directory on your workstation to your home directory on the DLX. (Note the trailing colon, which marks the destination as a remote host.) You might be prompted for your DLX password.

scp aprog.f77 dlx.uky.edu:newprog.f77 copies the same file to the DLX, but gives it a new name.

scp aprog.f77 userid@dlx.uky.edu:newprog.f77 copies the file to the DLX, but specifies the DLX userid.

Using sftp is much like using ftp:

$ sftp jdough@dlx.uky.edu
jdough@dlx.uky.edu's password:
Connected to dlx.uky.edu.
sftp> ls
amber  benchmarks1  sbatch.ps  scratch  test1
sftp> get sbatch.ps
Fetching /home/jdough/sbatch.ps to sbatch.ps
/home/jdough/sbatch.ps    100%  117KB 117.0KB/s   00:00
sftp> exit

On Windows: The programs PSCP and PSFTP are free implementations of scp and sftp for Windows that are widely used. Download them from the PuTTY web page.


What is a secure shell (ssh)?
The secure shell (ssh) is a means of encrypting your login and other sessions. See the Secure Shell page for more information.

SSH

Why must I use SSH?
SSH provides strong authentication and encrypted communications, replacing telnet, rlogin, rsh, rcp, and rdist. Those older programs transmit plain text, which could expose your password and other private data. Always use ssh and slogin in place of telnet and rlogin.

How does SSH encrypt my connection?
By default SSH uses automatically generated public-private key pairs to encrypt the connection. You use your password as usual to login. SSH can also use a manually generated public-private key pair, which will allow you to login without giving your password. See How can I set up an SSH key pair? below for details.

How do I use SSH from Unix or Linux?
Most Unix and Linux systems come with OpenSSH installed. If your workstation does not have it, then you can download it for free and compile it yourself. Use the ssh command to login to the remote machine:

ssh dlx.uky.edu

SSH will assume that your userid on the remote machine is the same as the one you're using on your local machine. If it's not, add the correct userid for the remote machine:

ssh userid@dlx.uky.edu

The very first time you connect to a remote machine, you'll see a message about a missing host key:

$ ssh dlx.uky.edu
Host key not found from the list of known hosts.
Are you sure you want to continue connecting (yes/no)?

Enter yes and the host key will be saved (in the file known_hosts in the hidden .ssh directory). On subsequent logins, you won't see the message.

Get more information by using the man ssh command.


How do I use SSH from Windows?
PuTTY is a free implementation of SSH (and other protocols) for Windows and Unix platforms that is widely used. You can download it from the UK Download server or from the PuTTY web page.

There are also commercial SSH Clients for Windows with more extensive features.


How do I use SSH from a Macintosh?
Mac OS X has the SSH commands already installed. Find Terminal.app in the Utilities folder in the Applications folder. Double-click it to get a window with a command line prompt, then follow the directions for a Unix or Linux client above.

How can I set up an ssh key pair?
Setting up a public/private key pair allows you to log onto the DLX (and other machines) without giving your password each time. You will have a private key stored on your workstation and a corresponding public key stored on the DLX.

On a Unix, Linux, or MacOS X workstation:

1. On your workstation, use the ssh-keygen command to generate a public/private RSA key pair. Be sure to use a strong passphrase when asked.

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/userid/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Your identification has been saved in /Users/userid/.ssh/id_rsa.
Your public key has been saved in /Users/userid/.ssh/id_rsa.pub.

2. Copy the public key file to the DLX with the scp command. The first argument is the source file on your workstation and the second is the destination file in your .ssh directory on the DLX.

$ scp ~/.ssh/id_rsa.pub userid@dlx.uky.edu:.ssh/new_key
userid@dlx.uky.edu's password:

3. Login to the DLX as usual, change directories to the hidden .ssh directory, and concatenate the new public key onto the end of the authorized_keys file. Be careful not to clobber the existing authorized_keys file. Your DLX account requires the key generated on the cluster in its authorized_keys file for intra-node authorization.

ssh userid@dlx.uky.edu
cd .ssh
cat new_key >> authorized_keys

4. Logout of the DLX and then login again. You won't need to give your password.

ssh userid@dlx.uky.edu

5. Copy the public key to any other machines you want to use it on.

There is a good discussion of setting up key pairs on this Getting started with SSH web page.

On a Windows machine using PuTTY:

1. Use the PuTTYgen.exe program, available with PuTTY, to generate a public/private RSA key pair. Set SSH-2 RSA as the Type of Key. Leave the Number of Bits at the default of 1024. Click the Generate button and follow the directions.

2. After the key is generated, which might take a minute:
- Fill in a strong passphrase.
- Click the Save Private Key button. Give the file a name (like private_key) and save it. This is the key file PuTTY will use for your workstation.
- Click the Save Public Key button. Give the file a name (like public_key) and save it.
- Copy the public key from the field Public key for pasting into OpenSSH authorized_keys file. You will paste this key into the authorized_keys file on the DLX.

3. Add the key to the DLX:
- Use PuTTY to login to the DLX as usual.
- Edit the file .ssh/authorized_keys and paste the public key you copied in the step above in at the end of the file. Be careful not to clobber any other entries in the existing authorized_keys file. Your DLX account requires the key generated on the cluster in its authorized_keys file for intra-node authorization.
- Save the file and logoff of the DLX.

4. Open the PuTTY Configuration and go to Connection → SSH → Auth. Go to the field Private key file for authentication. Click Browse, find the private key file you saved above, and click Open.

5. Login to the DLX with PuTTY as usual. You won't need to give your password.

6. Copy the public key to any other machines you want to use it on.


Compiling and Linking FAQ

What compilers should I use?
The Intel compilers are the compilers of choice for most applications. Use icc for C and C++ programs and ifort for Fortran 77 and Fortran 90 programs. Add the --help option to either of these to get more information. For example:

ifort --help


What versions of the Intel compilers are available?
At this writing, version 11.1 is the default for all of the Intel compilers. You don't need to do anything special to use that version of any of them. Older and newer versions are sometimes installed for testing or other special purposes. Use the module avail command to see all of the modules available, or module avail icc to see just the icc modules. For example:

$ module avail icc
--------------- /usr/share/Modules/modulefiles ---------------
icc/10.1 icc/11.1(default) icc/12.0 icc/12.0.3
If module avail shows that version icc/10.1 is available, then load it with the module load icc/10.1/default command at the beginning of your session to make it the default for the rest of your session.


How do I compile with the Open MPI libraries?
OpenMPI is the MPI of choice for most applications, and it is the default. Use mpicc for C programs, mpiCC for C++, mpif77 for Fortran 77, and mpif90 for Fortran 90.

How do I compile with the Intel MPI libraries?
If you need the Intel MPI libraries, load them at the beginning of your session with the module command. Use the module avail command to see all of the modules available, or module avail mpi/intel to see just the mpi/intel modules. For example:

$ module avail mpi/intel
--------------- /usr/share/Modules/modulefiles ---------------
mpi/intel/3.0 mpi/intel/3.1 mpi/intel/3.2
If module avail shows that version mpi/intel/3.2 is available, then load it with the module load mpi/intel/3.2/default command at the beginning of your session to make it the default for the rest of your session. Then use mpiicc and mpiifort to compile C and Fortran programs against the Intel MPI libraries.


Is there any more documentation on the compilers?
There is a lot of documentation on the Intel web site. These might be useful:

Intel© C++ Compiler User and Reference Guides

Intel© Fortran Compiler User and Reference Guides


Is there any documentation on the Intel Math Kernel Library (IMKL)?
There is a lot of documentation on the Intel web site. This might be useful:

Intel© Math Kernel Library Documentation.


How can I find out what library dependencies an application has?
Use the ldd command. To see the ldd online manual use the man ldd command.
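
For example (the binary name is hypothetical, and the libraries and addresses shown are only illustrative):

$ ldd ./my_app
        libm.so.6 => /lib64/libm.so.6 (0x00002b5a...)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b5a...)
        libc.so.6 => /lib64/libc.so.6 (0x00002b5a...)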

Are the GNU compilers available?
Yes. You can use gcc to compile C and C++ programs and gfortran for Fortran 77 through Fortran 95.

A short help text will be displayed by the gcc --help or gfortran --help commands.

More extensive documentation is in the online manual. Use the man gcc or man gfortran commands.

Documentation can also be found at http://gcc.gnu.org/onlinedocs/.


Application Packages

Amber

How do I submit an Amber job?
Build a job script to load the Amber and OpenMPI modules and use mpirun to run Amber across one or more nodes. For more information about job scripts, see Getting Started – Job Scripts. Here is a sample job script:

#!/bin/bash
#SBATCH -n16
#SBATCH -t 24:00:00
module load amber/openmpi/12p12 mpi/openmpi/intel/1.6.3
cd ~/amberjob1/
time mpirun pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd
This script requests 16 processors (one node) and 24 hours of run time. The modules shown in the module load command are examples. Use the module avail command to see which versions of Amber and OpenMPI are available and choose the ones you need.

Use the sbatch command to submit your job. If the job script file were named amber1.sh, then you could use the command sbatch amber1.sh to submit it. You can specify sbatch options in the job script or on the sbatch command itself. If an option is specified in both places, then the value on the sbatch command will be used.


How do I run Amber on GPUs?
Build a job script to load the Amber and OpenMPI modules and use mpirun to run Amber across one or more nodes. For more information about job scripts, see Getting Started – Job Scripts. See also Running Amber on GPUs. Here is a sample job script:

#!/bin/bash
# One CPU and one GPU on one node.
#SBATCH -p GPU
#SBATCH -t 1:00:00
#SBATCH -n1
#SBATCH --tasks-per-node=1
module load amber/openmpi/12p12 mpi/openmpi/intel/1.6.3
cd ~/amberjob1/
time mpirun pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd
This script requests 1 processor and one hour of run time on a GPU node. The modules shown in the module load command are examples. Use the module avail command to see which versions of Amber and OpenMPI are available and choose the ones you need.

Use the sbatch command to submit your job. If the job script file were named amber1.sh, then you could use the command sbatch amber1.sh to submit it. You can specify sbatch options in the job script or on the sbatch command itself. If an option is specified in both places, then the value on the sbatch command will be used.


Gaussian

How do I submit a Gaussian job?
Create a Gaussian input file, then use the batchg09 command to build the job script and submit it to the batch scheduler:

batchg09 infile.com [sbatch options]

where infile.com is the name of your Gaussian input file. You may omit the .com extension from the batchg09 command. Any sbatch options may be added except for the -q option.

Gaussian jobs will be routed to a queue running on a DLX Hi-Mem node. batchg09 defaults to 8 cores (one socket).


Can I combine multiple Gaussian jobs in a single input file?
Yes, combine multiple Gaussian calculations by using Link1. Separate the input for each successive job from that of the preceding job with the --Link1-- separator. For example:

%Chk = frequency
# B3LYP/6-31G(d,p) Freq

Frequency Calculation on XYZ at STP

0 1
. . .

--Link1--
%Chk = frequency
# B3LYP/6-31G(d,p) Geom=Checkpoint Guess=Read Freq=(ReadFC,ReadIsotopes)

Frequency Calculation on XYZ at Elevated Temperature

0 1

450.0 1.0
A blank line must be included between the last line of the first job and the --Link1-- separator. Failure to include a blank line will result in a segmentation violation error when the job is run.

If the checkpoint file from the first job is to be used in a subsequent job, then the names specified with the %Chk command in the Link 0 section of each job must be identical. Otherwise, the second job will not be able to find the checkpoint file from the first job, which will cause the second job to fail.


Can I use a checkpoint file created on a different computer or operating system?
Yes, but you must convert the file from binary to text on the source machine before you transfer it to the DLX. The Gaussian utility chkmove will convert a binary format checkpoint file to and from text format. Although the file is text, it is basically unreadable, but the text format allows the checkpoint file to be transferred between different computers and operating systems. The chkmove utility will only work on checkpoint files from completed calculations. If the file is from a failed job or some intermediate calculation, such as an unconverged optimization or numerical frequency, then chkmove cannot convert the checkpoint file.

To transfer a checkpoint file named mycheckfile.chk from machine ABC to the DLX cluster:

Login to machine ABC and convert mycheckfile.chk from binary to a text format with this command:
chkmove f mycheckfile.chk mycheckfile.xfr
where f indicates conversion from binary to text and .xfr is the extension for text checkpoint files.

Transfer mycheckfile.xfr from the machine ABC to the DLX by any means convenient.
Convert mycheckfile.xfr from text back to binary with this command:
chkmove u mycheckfile.xfr mycheckfile.chk
where u indicates conversion from text to binary.

The Gaussian utilities are located in a release-dependent directory. Before running chkmove, use this command:

source /usr/local/env/gaussian.csh
to make sure the appropriate paths and environment variables are set. This is not done automatically, to allow the use of earlier releases of Gaussian (for which the setup is somewhat different). Do this once per login session or add it to your startup file (such as .bashrc).


How do I convert a binary checkpoint file into a format that I can actually read?
The Gaussian utility formchk will convert data in a binary checkpoint file into a formatted text file that is human-readable. To reformat the binary checkpoint file mycheckfile.chk, issue this command:

formchk [options] mycheckfile.chk mycheckfile.fchk
where .fchk is the extension for formatted checkpoint files on UNIX systems.

[options]

-3  Produce a version 3 formatted checkpoint file, including all features supported in Gaussian 09 (e.g., user-defined MM types, redundant coordinate definitions, values, forces, and the Hessian).

-2  Produce a version 2 formatted checkpoint file. This was the version used with Gaussian 03.

-c  Causes the molecular mechanics atom types to appear in the formatted checkpoint file as strings rather than integers.

By default formchk produces a formatted checkpoint file that is backward compatible with all previous versions of Gaussian.

The Gaussian utilities are located in a release-dependent directory. Before running formchk, use this command:

source /usr/local/env/gaussian.csh
to make sure the appropriate paths and environment variables are set. This is not done automatically, to allow the use of earlier releases of Gaussian where the setup is somewhat different. Do this once per login session or add it to your startup file (such as .bashrc).


How do I change the number of processors used from the default of 8?
There are two steps. First, use %nproc=nn in your Gaussian input (.com) file to specify the number of processors. Then use the -n nn option when submitting the job with batchg09. Use the same number (nn) in both places. If the -n option is omitted, batchg09 will read your command file to get the number of processors to use. If you specify both %nproc and -n, but the numbers do not match, the job will not be submitted.
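
For example (the input file name is hypothetical), to run on 4 processors the input file would contain the line:

%nproc=4

and the job would be submitted with:

batchg09 myjob.com -n 4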

Matlab

How do I submit a Matlab job?
Build a job script to load the Matlab module. For more information about job scripts, see Getting Started – Job Scripts. Here is a sample job script:

#!/bin/bash
#SBATCH -n16
#SBATCH -t 24:00:00
#SBATCH --job-name=Matlabtest
module load matlab/R2012a
matlab -nodisplay -r "mymatlabprogram, exit"
This script requests 16 processors (one node) and one day (24 hours) of run time. The module shown in the module load command is an example. Use the module avail command to see which versions of Matlab are available, and choose the one you need.

Use the sbatch command to submit your job. If the job script file were named matlab1.sh, then you could use the command sbatch matlab1.sh to submit it. You can specify sbatch directives in the job script or on the sbatch command itself. If a directive is specified in both places, the value on the sbatch command will be used.