2. [Get access to the RAMSES cluster](#ramses-application)
3. [Generate SSH keypair](#ssh-gen)
4. [Set up Cisco Duo](Cisco-Duo)
5. [Login](#login)
6. [Data transfer](#data-transfer)
7. [Filesystem](#filesystem)
8. [Submitting jobs](#submitting)
9. [Backup options](#backup)
10. [Environment modules](#env-modules)
<br>
<br>
## RAMSES Specifications:<a name="ramses-specs"></a>
(**R**esearch **A**ccelerator for **M**odeling and **S**imulation with **E**nhanced **S**ecurity)
- 164 compute nodes, 10 Kubernetes nodes
- 348 CPUs = 31576 cores
- Accelerators
  - 40 NVIDIA Hopper H100 GPUs
  - 32 NVIDIA A30 GPUs
  - 2 AMD Instinct GPUs
  - 2 NEC Vector Engines
- Performance
  - CPU performance: 1.7 PFLOP/s
  - GPGPU performance: 3.1 PFLOP/s
  - Total performance: 4.8 PFLOP/s
- Main memory
  - 167 TB
- Storage
  - 15 PB HDD storage
  - 940 TB NVMe SSD storage
- High-speed interconnect
  - HDR100 InfiniBand
<br>
<br>
## Get access<a name="ramses-application"></a>
To gain access to RAMSES you need to fulfill three requirements (in any order):
- apply for a project
- secure the connection with ssh keys
- set up a second authentication factor

Apply for a **user account**:
- [Application form for ITCC projects](https://hpc-access.itcc.uni-koeln.de/jards/WEB/application/login.php?appkind=itcc)

New users can apply for a trial account with limited core/GPU hours without a project description. Applications for a full account need a project description, which will be reviewed. For projects of up to 15 million core hours, a technical review (reasonable usage of resources) is sufficient. Beyond that, a scientific review (importance of research) becomes necessary.
### 2FA
For security reasons, you can't log in with just a username and password. We use **two-factor authentication** (2FA/MFA), meaning you need to prove your identity with two different 'factors' (different as in different systems/locations):
- The first factor is an SSH public key. Please send your SSH *public* key to the HPC team. See also: [general information on public key authentication](https://www.ssh.com/academy/ssh/public-key-authentication)
- As the second factor we use Cisco Duo. To use it, you will need to enroll your account, see [cisco-duo-setup.pdf](uploads/cd518a29f4362a9383c7345a975ed065/cisco-duo-setup.pdf).
If you own a [Yubikey](https://en.wikipedia.org/wiki/YubiKey) hardware token, you can also use it (in OTP mode) as the second authentication factor instead of Cisco Duo. If you are interested in using a Yubikey, please contact the [HPC-Team](mailto:hpc-mgr@uni-koeln.de).
Please note: we can't provide Yubikeys to users, but at about 50€ it could be a worthwhile investment. You could also ask your department head whether they are willing to order some for your work group. Nitrokeys are not supported yet.
After you have successfully enrolled in Duo and prepared your SSH key, please send your **public** key to pabel@uni-koeln.de.
### Generate SSH keys<a name="ssh-gen"></a>
Here is a quick intro to SSH keys: a key pair always consists of a private key (as in **private - don't share, don't give away**) and a public key. The public key (`*.pub`) is put into the file `~/.ssh/authorized_keys` on RAMSES. When you have the matching private key, this makes the login authentication work. Do not give away the private key, and secure it with a passphrase.
Key pairs are usually stored in a hidden directory (folder) named `.ssh` in your home directory (the same on Linux/Mac/Windows).
You can create a modern key (ed25519) using
```
ssh-keygen -t ed25519 -C "Your Name"
```
You are asked for a file location; just press ENTER to accept the default (`~/.ssh/id_ed25519`):
```
Enter file in which to save the key (/home/<username>/.ssh/id_ed25519):
```
Next you **have to** enter a passphrase (you could use this [passphrase generator](https://www.tu-braunschweig.de/it-sicherheit/pwsec/pwgen)):
```
Enter passphrase (empty for no passphrase):
```
**Don't** leave it empty! As usual, store your passphrase in a secure place, for example in a password manager ([e.g. KeePass](https://keepass.info/download.html)).
```
Enter same passphrase again:
Your identification has been saved in /home/<username>/.ssh/id_ed25519
Your public key has been saved in /home/<username>/.ssh/id_ed25519.pub
The key fingerprint is:
SHA256:mDIS+q3blablaBLABLAjePkEMEoR4sAIVumEoJiCXDNVs Your Name
The key's randomart image is:
+--[ED25519 256]--+
...
| .. .. |
+----[SHA256]-----+
```
You can ignore the rest of the output. The keypair is stored under `~/.ssh/id_ed25519` (private key) and `~/.ssh/id_ed25519.pub` (public key). You can now send us the **public** key (`id_ed25519.pub`), either as a file or by copying and pasting the key itself:
```
cat ~/.ssh/id_ed25519.pub
```
Then send us the **id_ed25519.pub** file.
If the `ssh` on your computer is old, it will not know the key type ed25519.

To avoid having to enter the passphrase every time you log in, you can load the key into memory using the `ssh-agent`.
On most Linux and macOS systems it is pre-installed; you can check with the command `ssh-add -l`. This should not return an error, but usually `The agent has no identities.` Otherwise you can start the ssh-agent:
```
eval "$(ssh-agent -s)"   # start the ssh-agent and export its variables into the current shell
```
Then add the private key you just created (path to your key file, `~/.ssh/id_ed25519` or `~/.ssh/id_rsa`):
```
ssh-add ~/.ssh/id_ed25519
```
You can usually just run `ssh-add`, since `ssh-add` can find the files on its own.
You can now use it within your session without having to re-enter your SSH key passphrase.
If you have to use a Windows system, see [Key-based authentication in OpenSSH for Windows](https://learn.microsoft.com/en-gb/windows-server/administration/openssh/openssh_keymanagement).
If you already have access to RAMSES but you are using the CHEOPS key, I advise you to create your own SSH key on your local machine/laptop and then add the public key to your `.ssh/authorized_keys` file in your home on RAMSES. Any text editor will work for this.
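Besides a text editor, adding a key can also be sketched on the command line. The following is a minimal illustration, not an official procedure: `DEMO_HOME` is a temporary directory standing in for your home directory on RAMSES, and the key string is a made-up placeholder, not a real key.

```shell
#!/usr/bin/env bash
# Sketch: append a public key to an authorized_keys file with safe permissions.
# DEMO_HOME stands in for your home directory on RAMSES (assumption for this demo).
set -eu

DEMO_HOME="$(mktemp -d)"
# Placeholder key line, NOT a real key; use the content of your id_ed25519.pub instead.
PUBKEY='ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPlaceholderKeyDataOnly Your Name'

mkdir -p "$DEMO_HOME/.ssh"
chmod 700 "$DEMO_HOME/.ssh"            # only the owner may access the directory
AUTH_KEYS="$DEMO_HOME/.ssh/authorized_keys"
touch "$AUTH_KEYS"
chmod 600 "$AUTH_KEYS"                 # only the owner may read/write the file

# Append only if this exact key line is not present yet (avoids duplicates).
if ! grep -qxF "$PUBKEY" "$AUTH_KEYS"; then
    printf '%s\n' "$PUBKEY" >> "$AUTH_KEYS"
fi

KEY_COUNT="$(grep -c '' "$AUTH_KEYS")"
echo "authorized_keys now holds $KEY_COUNT key(s)"
```

Running the append step a second time leaves the file unchanged, because the key line is already present.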
**PLEASE NOTE**: Do not share SSH keys with other people and do not copy private keys to other computers. Just create a new SSH key pair on each computer you use regularly. You can also use SSH agent forwarding, where an SSH key is taken along into an SSH session to a remote computer, eliminating the need to create many keys.
Once you have received your access credentials and set up your SSH keys and Cisco Duo, you have to send us the **public** part of your keypair (`*.pub`). **AGAIN: NEVER EVER SHARE THE PRIVATE KEY.**
Please send the public key to: [hpc-mgr@uni-koeln.de](mailto:hpc-mgr@uni-koeln.de)
### LOGIN<a name="login"></a>
There are 4 login servers:
ramses1.itcc.uni-koeln.de up to ramses4.itcc.uni-koeln.de
Do not use ramses2 or ramses3; they are for internal use only for now.
When you log in to ramses1, the public key you sent us is authenticated against the private key on your computer (1st factor; you will be asked for the SSH passphrase, see also [here](#ssh-usage)). If this succeeds, a verification request is automatically pushed to the Duo App on your device, where you confirm the login (2nd factor).
On your terminal you should see something like this:
```
Duo two-factor login for rpabel2

Enter a passcode or select one of the following options:

 1. Duo Push to +XX XXX XXXX456
 2. Duo Push to Android

Passcode or option (1-2): 1
Success. Logging you in...
```
In this example, if you choose '1', an authentication request is pushed to your phone and you just have to confirm it with a tap on the screen. Alternatively, instead of choosing a number, you can open the Duo Mobile App on your device and enter the 6-digit TOTP passcode shown in the app. This code changes every 30 seconds.
**PLEASE NOTE**: Be careful with scripted logins: any login attempt with your SSH key that triggers Duo Autopush is counted by Duo. If you don't respond in your app, your account will be blocked after 10 attempts (and has to be unlocked by an admin).
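One way to keep scripted workflows from starting a new login for every command is OpenSSH connection multiplexing, where later `ssh`/`scp` calls reuse a connection that is already authenticated. The sketch below only writes such a client configuration entry to a demo file. The host alias `ramses`, the user name and the 10-minute timeout are assumptions for illustration, and whether multiplexing is permitted on the login nodes is not stated in this guide, so please check with the HPC team before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: write an OpenSSH client config entry that reuses one authenticated
# connection (ControlMaster multiplexing). Host alias, user name and timeout
# are assumptions made for this demo.
set -eu

SSH_CONFIG="$(mktemp)"        # demo file; the real file would be ~/.ssh/config

# ControlMaster auto : the first connection becomes the shared master
# ControlPath        : socket file through which later sessions are routed
# ControlPersist 10m : keep the master open for 10 minutes after last use
cat >> "$SSH_CONFIG" <<'EOF'
Host ramses
    HostName ramses1.itcc.uni-koeln.de
    User your_username
    IdentityFile ~/.ssh/id_ed25519
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
EOF

chmod 600 "$SSH_CONFIG"
N_OPTIONS="$(grep -c 'Control' "$SSH_CONFIG")"
echo "wrote $N_OPTIONS multiplexing options to $SSH_CONFIG"
```

With such an entry in `~/.ssh/config`, the first `ssh ramses` goes through both factors as usual, and further sessions opened while the master connection is alive should not trigger another Duo push.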
<br>
<br>
## Data transfer<a name="data-transfer"></a>
To transfer your data to the cluster, we recommend using scp (**s**ecure **c**o**p**y), either on the command line (CLI/terminal) or with a graphical client (e.g. WinSCP).\
There is no automatic mechanism to sync/copy files between CHEOPS and RAMSES. You have to copy your files yourself.

Please note: you can transfer data ONLY to the login nodes (ramses1 ... ramses4), NEVER directly to compute nodes.

- For a small number of files/folders:
```
scp local_file username@ramses1.itcc.uni-koeln.de:remote_destination_dir   # use . for the home folder
```
- For huge numbers of small files, use [tar](https://www.gnu.org/software/tar/manual/html_chapter/Tutorial.html#Tutorial) (or zip) to create an archive file before copying:
```
tar -czf name_of_archive.tar.gz files_or_folder_to_add   # create
tar -xvzf example.tar.gz                                 # extract
tar -tf <file>                                           # show contents
```
- You can also use rsync, see: [rsync](man.rsync)
- If you prefer interactive transfer with a GUI: e.g. [FileZilla (Linux/Mac/Win)](https://filezilla-project.org/), [WinSCP (Win only)](https://winscp.net/eng/download.php), [Cyberduck (Mac only)](https://cyberduck.io/download/)
<br>
<br>
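The archive-before-copy advice for data transfer can be tried locally. The following sketch creates 100 small demo files, packs them into one archive, lists it, and unpacks it again. All file and directory names are made up for the demo, and the `scp` step is only shown as a comment because it needs cluster access.

```shell
#!/usr/bin/env bash
# Sketch: bundle many small files into one archive before transferring them.
# All file and directory names are made up for this demo.
set -eu

WORK="$(mktemp -d)"
cd "$WORK"

# Create 100 small demo files.
mkdir results
for i in $(seq 1 100); do
    echo "sample $i" > "results/file_$i.txt"
done

# Create the archive, then list its contents and count the packed files.
tar -czf results.tar.gz results
N_ARCHIVED="$(tar -tzf results.tar.gz | grep -c '\.txt$')"
echo "archived $N_ARCHIVED files"

# The single archive would then be copied in one go (not executed here):
#   scp results.tar.gz username@ramses1.itcc.uni-koeln.de:.

# Extract again, as you would on the cluster after the transfer.
mkdir unpacked
tar -xzf results.tar.gz -C unpacked
N_EXTRACTED="$(find unpacked -type f | wc -l | tr -d ' ')"
echo "extracted $N_EXTRACTED files"
```

Copying one archive avoids the per-file overhead that makes transfers of many tiny files slow.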
## Filesystem<a name="filesystem"></a>
The filesystem setup is exactly as on CHEOPS:
- /home/\<user\>
  - size per user: 100 GB / 100,000 files
  - directory 'user' is created automatically
  - (Backup - coming soon)
- /scratch/\<user\>
  - size per user: 40 TB
  - create the directory yourself! (You _could_ use any name you like, but for clarity we recommend choosing your login name.)
  - NO AUTOMATIC BACKUP; automatic deletion of files will be activated soon
  - typical usage: input data should be copied to the scratch partition only for jobs that are running or about to run. Accordingly, after job completion, input and temporary data on /scratch should be deleted and output data transferred to longer-term storage.
- /project/\<user/group\>
  - size per user ...
  - created on request
  - NO AUTOMATIC BACKUP
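The typical /scratch usage described above (stage input, run, save output, clean up) can be sketched as follows. This is an illustration under assumptions: `SCRATCH_ROOT` and `PROJECT_DIR` point to temporary directories so the demo runs anywhere (on RAMSES, `SCRATCH_ROOT` would be `/scratch`), the job directory name and file names are hypothetical, and the `tr` call is only a stand-in for a real computation.

```shell
#!/usr/bin/env bash
# Sketch of the /scratch workflow: stage input, run, save output, clean up.
# Temp dirs are used so the demo runs anywhere; all names are made up.
set -eu

SCRATCH_ROOT="$(mktemp -d)"          # on RAMSES this would be /scratch
PROJECT_DIR="$(mktemp -d)"           # stand-in for your longer-term storage
USER_NAME="${USER:-demo_user}"
JOB_DIR="$SCRATCH_ROOT/$USER_NAME/job_001"

echo "input data" > "$PROJECT_DIR/input.dat"

# 1. Create your own scratch directory and copy the input there.
mkdir -p "$JOB_DIR"
cp "$PROJECT_DIR/input.dat" "$JOB_DIR/"

# 2. Stand-in for the real computation: convert the input to upper case.
tr 'a-z' 'A-Z' < "$JOB_DIR/input.dat" > "$JOB_DIR/output.dat"

# 3. Move the results to longer-term storage and delete the scratch data.
cp "$JOB_DIR/output.dat" "$PROJECT_DIR/"
rm -rf "$JOB_DIR"

RESULT="$(cat "$PROJECT_DIR/output.dat")"
echo "saved result: $RESULT"
```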
<br>
<br>
### SUBMITTING JOBS<a name="submitting"></a>
There are several partitions/queues in Slurm intended for general usage:
- "smp": default partition, with 136 smp nodes, for single node jobs
- "bigsmp": partition with 8 bigsmp nodes, for large single node jobs
- "mpi": same nodeset as "smp" but for MPI jobs
- "interactive": partition with 8 interactive nodes, dedicated for interactive usage
- "gpu": partition with 10 gpu nodes with the following gpu types: h100:38, h100_1g.12gb:1, h100_2g.24gb:3, h100_3g.47gb:1, h100_4g.47gb:1
- "ft-instinct": a partition with a single node that contains two AMD Instinct MI210 GPUs
- "ft-aurora": a partition with a single node that contains two NEC SX Aurora Vector Engine Cards
When a partition isn't explicitly specified with the "-p" parameter, the automatic routing mechanism determines the right partition for the job:
- "mpi" partition:

...

executing the following command:
```
sacctmgr show assoc -n user=$USER format=Account
```
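A minimal batch script for the "smp" partition might look like the sketch below. The `#SBATCH` options are standard Slurm options, but every value (job name, resources, time limit, account name) is an assumption for illustration and must be adapted. The account placeholder would be replaced by one of the accounts listed by the `sacctmgr` command above. The fallback defaults let the script also run outside of Slurm for a quick syntax check.

```shell
#!/usr/bin/env bash
# Sketch of a single-node batch job. All values below are placeholders.
# Leave out --partition to let the automatic routing choose the partition.
#SBATCH --job-name=demo_job
#SBATCH --partition=smp
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=00:10:00
#SBATCH --account=your_account

# Slurm sets SLURM_* variables inside a job; the defaults after ':-' are only
# used when the script is run outside of Slurm.
THREADS="${SLURM_CPUS_PER_TASK:-1}"
JOB_ID="${SLURM_JOB_ID:-local}"
export OMP_NUM_THREADS="$THREADS"

SUMMARY="job $JOB_ID running with $THREADS thread(s)"
echo "$SUMMARY"
```

Saved as, say, `job.sh` (a hypothetical file name), it would be submitted with `sbatch job.sh` and monitored with `squeue -u $USER`.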
<br>
<br>
## Backup your data<a name="backup"></a>
[coming soon]
<br>
<br>
## Environment Modules<a name="env-modules"></a>
To avoid software conflicts (resulting from incompatibilities, versioning, dependencies...), software is provided as Environment Modules. By using modules, it is possible to have different versions of software installed on the system.\
You can select the module(s) you need directly on the command line or in your scripts.
Basic commands are:
```
module avail              # list available modules
module whatis <module>    # show description
module load <module>      # load module
which <command>           # check the software environment
echo $PATH
module list               # list loaded modules
module unload <module>    # unload a module
module purge              # unload all loaded modules
```
<br>
<br>
<br>
<br>
If you encounter any problems, please write to hpc-mgr@uni-koeln.de.