Update Documentation authored by Peter Heger
# Table of contents
1. [RAMSES specifications](#ramses-specs)
2. [Get access to the RAMSES cluster](#ramses-application)
3. [Generate SSH keypair](#ssh-gen)
4. [Set up Cisco Duo](Cisco-Duo)
5. [Login](#login)
6. [Data transfer](#data-transfer)
7. [Filesystem](#filesystem)
8. [Submitting jobs](#submitting)
9. [Backup options](#backup)
10. [Environment modules](#env-modules)
11. [Getting help](#help)
......@@ -39,7 +39,7 @@
## Get access
To gain access to RAMSES, you need to fulfill three requirements (in any order):
- apply for a project
- secure the connection with ssh keys
......@@ -49,11 +49,11 @@ Apply for a **user account**:
- [Application form for ITCC projects](https://hpc-access.itcc.uni-koeln.de/jards/WEB/application/login.php?appkind=itcc)
New users can apply for a trial account with limited core/GPU hours without a project description. Applications for a full account need a project description, which is then reviewed. For projects of up to 15 million core hours, a technical review (reasonable usage of resources) is sufficient. Beyond that, a scientific review (importance of research) will be necessary.
### 2FA
For security reasons, you can't log in with a combination of username and password. Instead, we use a system called **2-Factor-Authentication** (2FA/MFA), meaning you need to prove your identity with two different (as in different systems/locations) 'factors':
- The first factor is an SSH public key. Please send your SSH _public_ key to the HPC team. [General information on public key authentication](https://www.ssh.com/academy/ssh/public-key-authentication)
- As the second factor we use Cisco Duo. To use it, you will need to enroll your account; see [cisco-duo-setup.pdf](uploads/cd518a29f4362a9383c7345a975ed065/cisco-duo-setup.pdf).
......@@ -64,7 +64,7 @@ After you have successfully enrolled in Duo and prepared your SSH Key, please se
### Generate SSH keys
Here is a quick intro to SSH keys: A key pair always consists of a private key (as in **private - don't share, don't give away**) and a public key. The public key (\*.pub) is put into the file `~/.ssh/authorized_keys` on Ramses. When you have the matching private key, the login authentication works. Do not give away the private key and protect it with a passphrase. Key pairs are usually stored in a hidden directory (folder) named `.ssh` (same on Linux/Mac/Windows).
You can create a modern key (ed25519) using
......@@ -84,7 +84,7 @@ Next you **have to** enter a passphrase (you could use this [passphrase generato
Enter passphrase (empty for no passphrase):
```
**Don't** leave it empty! As usual, store your passphrase in a secure place or use a password manager (e.g. [KeePass](https://keepass.info/download.html)).
```
Enter same passphrase again:
......@@ -107,7 +107,7 @@ cat ~/.ssh/id_ed25519.pub
**Please send the public key to: hpc-mgr@uni-koeln.de**
If the `ssh` program on your computer is outdated, it will not know the key type ed25519. In this case use
```
ssh-keygen -t rsa -b 4096 -C "Your Name"
......@@ -117,7 +117,7 @@ and send us the file `~/.ssh/id_rsa.pub` instead.
To avoid having to enter the passphrase every time you log in, you can load the key into memory using the ssh-agent.
On most Linux and Mac systems, the agent is pre-installed; you can check with the command `ssh-add -l`. This should not return an error, but usually `The agent has no identities.` Otherwise you can start the ssh-agent:
```
eval "$(ssh-agent -s)"   # start the ssh-agent and export its variables into the current shell
......@@ -146,9 +146,9 @@ If you already have access to RAMSES but you are using the CHEOPS key, we advise
### Login
Ramses has four login servers: ramses1.itcc.uni-koeln.de to ramses4.itcc.uni-koeln.de. Do not use ramses2 or ramses3; they are for internal use only for now.
When logging in to ramses1, the public key you sent us is authenticated with the private key on your computer (1st factor; you will be asked for the SSH passphrase). If successful, a verification request is automatically pushed to the Duo app on your device, where you need to confirm the login (2nd factor).
On your terminal you should see something like this:
......@@ -179,9 +179,10 @@ Enter a passcode or select one of the following options:
Success. Logging you in...
```
In this example, if you choose '1', an authentication request is pushed to your phone and you just have to confirm it with a tap on the screen. Alternatively, instead of choosing a number in the above example, you could also open the Duo Mobile app on your device and enter the 6-digit passcode shown in the app. This code changes every 30 seconds.
**PLEASE NOTE**: Be careful with scripted logins: any login attempt with your SSH key that triggers Duo Autopush is counted by Duo. If you don't respond in the app, your account will be blocked after 10 attempts (and has to be unlocked by an admin).
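
The login described above can be shortened with an entry in the SSH client configuration. The following is a minimal sketch, not an official setup: the alias `ramses`, the placeholder `your_username`, and the scratch file name are assumptions for illustration. It writes the entry to a scratch file so you can review it before adding it to `~/.ssh/config`.

```shell
# Sketch: write an SSH client config entry to a scratch file for review.
# "your_username" is a placeholder and the alias "ramses" is an arbitrary choice.
cfg="ramses_ssh_config.txt"
cat > "$cfg" <<'EOF'
Host ramses
    HostName ramses1.itcc.uni-koeln.de
    User your_username
    IdentityFile ~/.ssh/id_ed25519
    IdentitiesOnly yes
EOF

# Show the entry so it can be checked before use.
cat "$cfg"

# After checking the entry, append it to your SSH config yourself, e.g.:
#   cat ramses_ssh_config.txt >> ~/.ssh/config
# Afterwards "ssh ramses" connects to ramses1 with the key named above.
```

The Duo confirmation is still required on every login; the entry only saves typing the full host name, user name, and key file.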
## Data transfer
......@@ -191,7 +192,7 @@ There is no automatic mechanism to sync/copy files between Cheops and Ramses. Yo
Please note: you can transfer data ONLY to the login nodes (ramses1 ... ramses4), not directly to compute nodes.
- for small numbers of files/folders:
```
scp local_file username@ramses1.itcc.uni-koeln.de:remote_destination_dir   # use . as destination for your home folder
......@@ -207,6 +208,7 @@ Please note: you can transfer data ONLY to the login nodes (ramses1 ... ramses4)
- you can also use [rsync](https://tldr.inbrowser.app/pages/common/rsync)
- if you prefer interactive transfer with a shiny GUI: e.g. [FileZilla (Linux/Mac/Win)](https://filezilla-project.org/), [WinSCP (Win only)](https://winscp.net/eng/download.php), [Cyberduck (Mac only)](https://cyberduck.io/download/)
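
For the rsync option mentioned above, a transfer of a whole directory might look like the following sketch. The user name, source directory, and destination directory are placeholders, and the chosen flags are one reasonable combination rather than a site recommendation. The script only prints the command, so you can inspect it before running it.

```shell
# Sketch: build an rsync command for a directory and show it before running it.
# All values below are placeholders (assumptions); adjust them to your setup.
remote_user="your_username"
remote_host="ramses1.itcc.uni-koeln.de"   # a login node; compute nodes cannot be used as targets
src="my_data/"                            # trailing slash: copy the contents of my_data
dest="remote_destination_dir/"

# -a: archive mode (recursive, keeps timestamps and permissions)
# -v: verbose, -h: human-readable sizes
# --partial: keep partially transferred files so an interrupted run can resume
# --dry-run: only list what would be transferred
rsync_cmd="rsync -avh --partial --dry-run $src ${remote_user}@${remote_host}:$dest"
echo "$rsync_cmd"

# Run the printed command yourself; drop --dry-run once the file list looks right.
```

Each rsync run opens its own SSH connection and will therefore most likely trigger a Duo request, so it is safer to transfer a directory in one run than to call rsync in a loop over single files.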
## Filesystem
The filesystem setup is exactly as on CHEOPS:
......@@ -267,7 +269,7 @@ When a partition isn't explicitly specified with the “-p” parameter, the aut
- “bigsmp”: when the requested memory exceeds 750GB per node
- “smp”: in all other cases
To get access to GPU cards, specify the “gpu” partition as well as the type and number of GPU cards with the “-G” parameter, e.g. “-p gpu -G h100:2” to get two H100 GPU cards. Types like “h100_2g.24gb” are instances of the H100 card created by [MIG](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/) (multi-instance GPU) partitioning; each instance behaves like a separate device.
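
The GPU request above can be put into a batch script. The following is a minimal sketch under stated assumptions: the job name, the group account `my_group`, the time limit, and the output file name are hypothetical, and it assumes that `nvidia-smi` is available on the GPU nodes. The snippet only writes the script to `gpu_job.sh`; it does not submit anything.

```shell
# Sketch: write a small Slurm batch script that requests two H100 cards.
# Job name, account, time limit and output file name are placeholders.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu_test          # hypothetical job name
#SBATCH -p gpu                       # GPU partition
#SBATCH -G h100:2                    # type and number of GPU cards
#SBATCH -A my_group                  # hypothetical group account, replace with yours
#SBATCH -t 00:10:00                  # time limit of 10 minutes
#SBATCH --output=gpu_test_%j.out     # %j is replaced by the job ID

# List the GPUs visible to the job (assumes nvidia-smi exists on the node).
nvidia-smi
EOF

echo "Wrote gpu_job.sh; submit it on a login node with: sbatch gpu_job.sh"
```

To request a MIG instance instead of full cards, the type in the “-G” line would be replaced by the corresponding MIG type name.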
Each user has a default group account in Slurm which corresponds to their workgroup (not uniuser/hpcuser/smail). For each job, the right group account must be specified with the “-A” parameter; without it, the default group account is chosen automatically. The default group account can be displayed with the following command:
......@@ -279,9 +281,10 @@ sacctmgr show assoc -n user=$USER format=Account
\[coming soon\]
## Environment Modules
To avoid software conflicts (resulting from incompatibilities, versioning, dependencies, etc.), software is provided as Environment Modules. By using Modules, it is possible to have different versions of software installed on the system.\
You can select the module(s) you need directly on the command line or in your scripts.
Basic commands are:
......
......