## RAMSES Specifications:<a name="ramses-specs"></a>
## RAMSES Specifications:
(**R**esearch **A**ccelerator for **M**odeling and **S**imulation with **E**nhanced **S**ecurity)
...
...
@@ -36,12 +33,9 @@
- 15 PB HDD storage
- 940 TB SSD NVMe storage
- high-speed interconnect
- HDR100 InfiniBand
<br>
<br>
## Get access<a name="ramses-application"></a>
## Get access
To gain access to RAMSES you need to fulfill three requirements (in any order):
...
...
@@ -50,32 +44,25 @@ To gain access to RAMSES you need to fulfill three requirements (in any order):
- set up a second authentication factor (2FA)
Apply for a **user account**:
- [Application form for ITCC projects](https://hpc-access.itcc.uni-koeln.de/jards/WEB/application/login.php?appkind=itcc)
New users can apply for a trial account with limited core/GPU hours without a project description. Applications for a full account need a project description to be reviewed. For up to 15 million core hours per project, a technical review (reasonable usage of resources) is sufficient. Beyond that, a scientific review (importance of research) becomes necessary.
### 2FA
For security reasons, you can't log in with a username/password. We use a system called **2-Factor-Authentication** (2FA/MFA), meaning you need to prove your identity with two different (as in different systems/locations) 'factors':
- The first factor is an SSH public key. Please send your SSH *public* key to
the HPC team. [General information on public key authentication](https://www.ssh.com/academy/ssh/public-key-authentication)
- As the second factor we use Cisco Duo. To use it, you will need to enroll
your account, see [cisco-duo-setup.pdf](uploads/cd518a29f4362a9383c7345a975ed065/cisco-duo-setup.pdf) .
- The first factor is an SSH public key. Please send your SSH _public_ key to the HPC team. [General information on public key authentication](https://www.ssh.com/academy/ssh/public-key-authentication)
- As the second factor we use Cisco Duo. To use it, you will need to enroll your account, see [cisco-duo-setup.pdf](uploads/cd518a29f4362a9383c7345a975ed065/cisco-duo-setup.pdf) .
If you own a [Yubikey](https://en.wikipedia.org/wiki/YubiKey) hardware token, you can also use it (in OTP mode) as the second authentication factor instead of Cisco Duo. If you are interested in using a Yubikey, please contact the [HPC-Team](mailto:hpc-mgr@uni-koeln.de).
Please note: we can't provide Yubikeys to users, but it could be a worthwhile investment for about 50€.
If you own a [Yubikey](https://en.wikipedia.org/wiki/YubiKey) hardware token, you can also use it (in OTP mode) as the second authentication factor instead of Cisco Duo. If you are interested in using a Yubikey, please contact the [HPC-Team](mailto:hpc-mgr@uni-koeln.de). Please note: we can't provide Yubikeys to users, but it could be a worthwhile investment for about 50€.
After you have successfully enrolled in Duo and prepared your SSH Key, please
send your key.
After you have successfully enrolled in Duo and prepared your SSH Key, please send your key.
### Generate SSH keys
### Generate SSH keys<a name="ssh-gen"></a>
Here is a quick intro to SSH keys: There is always a private (as in **private - don't share, don't give away**) and a public key in a key pair. The public key (*.pub) is put into the file `~/.ssh/authorized_keys` on RAMSES. When you have the matching private key, this makes the login authentication work. Do not give away the private key and secure it with a passphrase.
The keypairs are usually stored in a hidden directory (folder) named .ssh (same on Linux/Mac/WIN).
Here is a quick intro to SSH keys: There is always a private (as in **private - don't share, don't give away**) and a public key in a key pair. The public key (\*.pub) is put into the file `~/.ssh/authorized_keys` on RAMSES. When you have the matching private key, this makes the login authentication work. Do not give away the private key and secure it with a passphrase. The keypairs are usually stored in a hidden directory (folder) named .ssh (same on Linux/Mac/WIN).
You can create a modern key (ed25519) using
...
...
@@ -115,10 +102,10 @@ You can ignore the rest of the output. The keypair is stored under \~/.ssh/id_ed
```
cat ~/.ssh/id_ed25519.pub
```
**Please send the public key to: [hpc-mgr@uni-koeln.de](mailto:hpc-mgr@uni-koeln.de)**
If `ssh` on your computer is old, it will not know the key type ed25519.
In this case use
**Please send the public key to: hpc-mgr@uni-koeln.de**
If `ssh` on your computer is old, it will not know the key type ed25519. In this case use
To avoid having to enter the passphrase every time you log in, you can load the key into memory using the ssh-agent.
On most Linux and Mac systems this is pre-installed; you can check with the command
`ssh-add -l`. This should not return an error, but usually
`The agent has no identities.` Otherwise you can start the ssh-agent:
On most Linux and Mac systems this is pre-installed; you can check with the command `ssh-add -l`. This should not return an error, but usually `The agent has no identities.` Otherwise you can start the ssh-agent:
```
eval "$(ssh-agent -s)" # start the ssh-agent and set its environment variables in this shell
```
Then you add the private key you just created:
```
ssh-add ~/.ssh/id_ed25519 # or ~/.ssh/id_rsa if you created an RSA key
```
You can usually just run `ssh-add` since `ssh-add` can find the files on its own.
`ssh-add` asks for the passphrase you set in the `ssh-keygen` step and afterwards
`ssh-add -l` should list your key like this:
You can usually just run `ssh-add` since `ssh-add` can find the files on its own. `ssh-add` asks for the passphrase you set in the `ssh-keygen` step and afterwards `ssh-add -l` should list your key like this:
You can now use it within your session without having to re-enter your SSH key
passphrase.
You can now use it within your session without having to re-enter your SSH key passphrase.
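
The check described above can be sketched as a small script. It relies on the exit status of `ssh-add -l` as documented in the OpenSSH manual (0: keys are listed, 1: the command failed, for example because no key is loaded, 2: no agent could be contacted). The function name `describe_agent_status` and the messages are made up for this illustration.

```shell
#!/bin/sh
# Translate the exit status of `ssh-add -l` into a readable hint.
# describe_agent_status is a hypothetical helper, not part of OpenSSH.
describe_agent_status() {
    case "$1" in
        0) echo "agent running, key loaded" ;;
        1) echo "agent running, no key loaded: run ssh-add" ;;
        2) echo "no agent reachable: start one with ssh-agent" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Only query the agent if an OpenSSH client is installed
if command -v ssh-add >/dev/null 2>&1; then
    ssh-add -l >/dev/null 2>&1
    describe_agent_status "$?"
else
    echo "ssh-add not found: install an OpenSSH client"
fi
```
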
If you have to use a Windows System: [Key-based authentication in OpenSSH for Windows](https://learn.microsoft.com/en-gb/windows-server/administration/openssh/openssh_keymanagement)
If you already have access to RAMSES but you are using the CHEOPS key, we
advise you to create your own SSH key on your local machine/laptop and then
add the public key to your `.ssh/authorized_keys` file in your home on RAMSES.
Any text editor will work for this.
If you already have access to RAMSES but you are using the CHEOPS key, we advise you to create your own SSH key on your local machine/laptop and then add the public key to your `.ssh/authorized_keys` file in your home on RAMSES. Any text editor will work for this.
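
Instead of a text editor, the key can also be appended on the command line. The following is a minimal sketch, assuming the public key file has already been copied to RAMSES; the helper name `add_public_key` and the file name `~/id_ed25519.pub` are placeholders. It skips keys that are already present and sets the restrictive permissions that `sshd` usually expects.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file unless it is already there.
# add_public_key is a hypothetical helper; both paths are parameters so the
# function can be tried on scratch files first.
add_public_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    chmod 700 "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    key=$(cat "$pubkey_file")
    # -x: match whole lines only, -F: treat the key as a fixed string
    if ! grep -qxF -e "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
}

# On RAMSES, after copying the .pub file there (assumed file name):
# add_public_key ~/id_ed25519.pub ~/.ssh/authorized_keys
```
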
**PLEASE NOTE**: Do not share SSH Keys with other people and do not copy private keys to other computers. Just create new SSH Key pairs on each computer you use regularly. You can also use [SSH Agent Forwarding](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/using-ssh-agent-forwarding), where an SSH Key is taken along into an SSH session to a remote computer, eliminating the need to create many keys.
### LOGIN
There are 4 login servers: ramses1.itcc.uni-koeln.de up to ramses4.itcc.uni-koeln.de. Do not use ramses2 or ramses3; they are for internal use only for now.
### LOGIN<a name="login"></a>
There are 4 login servers:
ramses1.itcc.uni-koeln.de up to ramses4.itcc.uni-koeln.de
Do not use ramses2 or ramses3; they are for internal use only for now.
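
To avoid typing the full host name each time, you could add a host alias to your local `~/.ssh/config`. The sketch below only prints such a stanza; the alias `ramses`, the helper name `ramses_ssh_config`, the user name `jdoe`, and the key path are assumptions that you need to adapt.

```shell
#!/bin/sh
# Print an ssh_config stanza for the first RAMSES login node.
# ramses_ssh_config is a hypothetical helper; "jdoe" is a placeholder user.
ramses_ssh_config() {
    user=$1
    cat <<EOF
Host ramses
    HostName ramses1.itcc.uni-koeln.de
    User $user
    IdentityFile ~/.ssh/id_ed25519
    IdentitiesOnly yes
EOF
}

ramses_ssh_config jdoe
# To use it: ramses_ssh_config <your-username> >> ~/.ssh/config
# Afterwards "ssh ramses" connects to ramses1.
```

`IdentitiesOnly yes` makes `ssh` offer only the listed key, which helps if your agent holds several keys.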
When logging in to ramses1, the public key you sent us is authenticated with the private key on your computer (1st factor, you will be asked for the ssh passphrase). If successful, a verification request is automatically pushed to the Duo App on your device where you confirm the login (2nd factor).
On your terminal you should see something like this:
...
...
@@ -185,11 +160,9 @@ Success. Logging you in...
rpabel2@ramses1:~>
```
Even though the message `Autopushing...` appears twice, only one push is
executed and only one verification is needed.
Even though the message `Autopushing...` appears twice, only one push is executed and only one verification is needed.
On ramses4, you can choose different Cisco Duo authenticators, if you have
configured any:
On ramses4, you can choose different Cisco Duo authenticators, if you have configured any:
```
rpabel2@soliton:~> ssh ramses4
...
...
@@ -204,21 +177,15 @@ Enter a passcode or select one of the following options:
Success. Logging you in...
```
In this example, if you choose '1', an authentication request is pushed to your phone and you just have to confirm it with a tap on the screen. Alternatively, instead of choosing a number in the above example, you could also open the Duo Mobile App on your device and enter the 6-digit passcode shown in the app. This code changes every 30 seconds.
**PLEASE NOTE**: Be careful with scripted logins: Any login attempt with your SSH
Key that triggers Duo Autopush is counted by Duo. If you don't respond in your
App, your account will be blocked after 10 attempts (and has to be unlocked by
an admin).
<br>
<br>
**PLEASE NOTE**: Be careful with scripted logins: Any login attempt with your SSH Key that triggers Duo Autopush is counted by Duo. If you don't respond in your App, your account will be blocked after 10 attempts (and has to be unlocked by an admin).
## Data transfer<a name="data-transfer"></a>
## Data transfer
To transfer your data to the cluster, we recommend using [scp](https://tldr.inbrowser.app/pages/common/scp) (**s**ecure **c**o**p**y) - either on the command line (CLI/Terminal) or with a graphical client (e.g. WinSCP).\
To transfer your data to the cluster, we recommend using [scp](https://tldr.inbrowser.app/pages/common/scp) (**s**ecure **c**o**p**y) - either on the command line (CLI/Terminal) or with a graphical client (e.g. WinSCP).\
There is no automatic mechanism to sync/copy files between Cheops and Ramses. You have to copy your files yourself.
Please note: you can transfer data ONLY to the login nodes (ramses1 ... ramses4), not directly to compute nodes.
...
...
@@ -236,14 +203,9 @@ Please note: you can transfer data ONLY to the login nodes (ramses1 ... ramses4)
- show contents: tar -tf <file>
```
- you can also use [rsync](https://tldr.inbrowser.app/pages/common/rsync)
- if you prefer interactive transfer with a shiny GUI: e.g. [FileZilla (Linux/Mac/Win)](https://filezilla-project.org/), [WinSCP (Win only)](https://winscp.net/eng/download.php), [Cyberduck (Mac only)](https://cyberduck.io/download/)
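
As a rough sketch of a command-line transfer: many small files are usually faster to copy as one archive. The script below builds a sample archive in a temporary directory and lists its contents; the actual `scp` and `rsync` calls are only printed, because they need your account. The user name `jdoe` and all paths are placeholders.

```shell
#!/bin/sh
# Pack a directory into one compressed archive before copying it.
# All paths and the user name "jdoe" are placeholders.
workdir=$(mktemp -d)
mkdir -p "$workdir/input_data"
echo "sample" > "$workdir/input_data/params.txt"

# create the archive (-C keeps the paths inside it relative)
tar -czf "$workdir/input_data.tar.gz" -C "$workdir" input_data

# show contents without unpacking
tar -tzf "$workdir/input_data.tar.gz"

# The transfer itself needs your account, so it is only printed here.
# Note that the target is a login node, not a compute node.
echo "scp $workdir/input_data.tar.gz jdoe@ramses1.itcc.uni-koeln.de:"
echo "rsync -av --partial $workdir/input_data/ jdoe@ramses1.itcc.uni-koeln.de:input_data/"
```

`rsync --partial` keeps partially transferred files, so an interrupted copy of a large file can continue where it stopped.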
<br>
<br>
## Filesystem<a name="filesystem"></a>
## Filesystem
The filesystem setup is exactly as on CHEOPS:
...
...
@@ -258,17 +220,12 @@ The filesystem setup is exactly as on CHEOPS:
- typical usage: input data should be copied to the scratch-partition only for running or soon running jobs. Accordingly, input and temporary data on /scratch should be deleted and output data transferred to longer term storage after job completion.
- /project/\<user/group\>
- created on request
- NO AUTOMATIC BACKUP
<br>
<br>
### SUBMITTING JOBS<a name="submitting"></a>
### SUBMITTING JOBS
There are several partitions/queues in slurm intended for general usage:
- _smp_
- default partition, for single node jobs
- 136 nodes
...
...
@@ -298,38 +255,26 @@ There are several partitions/queues in slurm intended for general usage:
- a partition with a single node that contains two NEC SX Aurora Vector Engine Cards
When a partition isn't explicitly specified with the “-p” parameter, the automatic routing mechanism determines the right partition for the job:
- "mpi" partition:
  - when the memory specification is core-oriented (mem_per_cpu) and multiple tasks are specified
  - when multiple nodes are specified
- "bigsmp": when the requested memory exceeds 750GB per node
- "smp": in all other cases
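
The routing rules above can be written down as a small decision function. This is only an illustration of the listed rules, not the cluster's actual routing code; the function name `route_partition`, its argument order, and the assumption that the rules are checked in the order listed are all hypothetical.

```shell
#!/bin/sh
# Sketch of the routing rules listed above; NOT the cluster's real implementation.
# Arguments: nodes, tasks, per-CPU memory requested (yes/no), memory per node in GB.
# Assumption: the rules are evaluated in the order in which they are listed.
route_partition() {
    nodes=$1; tasks=$2; per_cpu_mem=$3; mem_gb=$4
    if [ "$nodes" -gt 1 ]; then
        echo mpi
    elif [ "$per_cpu_mem" = yes ] && [ "$tasks" -gt 1 ]; then
        echo mpi
    elif [ "$mem_gb" -gt 750 ]; then
        echo bigsmp
    else
        echo smp
    fi
}

route_partition 4 64 no 100    # several nodes -> mpi
route_partition 1 16 yes 64    # per-CPU memory with multiple tasks -> mpi
route_partition 1 1 no 1000    # more than 750 GB on one node -> bigsmp
route_partition 1 8 no 32      # everything else -> smp
```

If in doubt, specifying the partition explicitly with “-p” avoids relying on the routing.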
In order to get access to GPU cards, make sure to specify the “gpu” partition
as well as the type and number of GPU cards with the “-G” parameter, e.g.
“-p gpu -G h100:2” in order to get 2x H100 GPU Cards. Types like
“h100_2g.24gb” are instances of the H100 card created by MIG partitioning; each
behaves like a separate device.
In order to get access to GPU cards, make sure to specify the “gpu” partition as well as the type and number of GPU cards with the “-G” parameter, e.g. “-p gpu -G h100:2” in order to get 2x H100 GPU Cards. Types like “h100_2g.24gb” are instances of the H100 card created by MIG partitioning; each behaves like a separate device.
Each user has a default group account in Slurm which corresponds to their
workgroup (not uniuser/hpcuser/smail). Each job runs under a group account,
which you can set with the “-A” parameter. Without it, the default group account
is chosen automatically. To find out your default group account, run the
following command:
Each user has a default group account in Slurm which corresponds to their workgroup (not uniuser/hpcuser/smail). Each job runs under a group account, which you can set with the “-A” parameter. Without it, the default group account is chosen automatically. To find out your default group account, run the following command:
```
sacctmgr show assoc -n user=$USER format=Account
```
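
Putting the options from this section together, a GPU job script might look like the following sketch. The account name `mygroup`, the job name, the time limit, and the output file name are placeholders; only “-p gpu”, “-G h100:2” and “-A” are taken from the text above. The `#SBATCH` lines are ordinary comments to the shell, so the script can also be run directly as a dry run.

```shell
#!/bin/bash
# Placeholder job name
#SBATCH --job-name=gpu-test
# GPU partition with two H100 cards
#SBATCH -p gpu
#SBATCH -G h100:2
# Placeholder: replace with your group account
#SBATCH -A mygroup
# Placeholder wall-time limit and output file (%j expands to the job id)
#SBATCH --time=00:10:00
#SBATCH --output=gpu-test-%j.out

# Slurm sets these variables inside a job; the fallbacks allow a dry run
job_summary() {
    echo "job ${SLURM_JOB_ID:-none} on partition ${SLURM_JOB_PARTITION:-none}"
}
job_summary

# List the GPUs visible to the job, if the tool is available
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
fi
```

Saved as, say, `gpu-test.sh` (a hypothetical file name), it would be submitted with `sbatch gpu-test.sh`.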
<br>
<br>
## Backup your data<a name="backup"></a>
## Backup your data
[coming soon]
\[coming soon\]
<br>
<br>
## Environment Modules<a name="env-modules"></a>
## Environment Modules
To avoid software conflicts (resulting from incompatibilities, versioning, dependencies...), software is provided as Environment Modules. By using Modules, it is possible to have different versions of software installed on the system.\
You can select the module(s) you need directly on the command line or in your scripts.
...
...
@@ -343,56 +288,43 @@ Basic commands are:
- check the software environment: which \<command>, echo $PATH
- module list : list **loaded** modules
- module unload \<module> (module purge: unload all loaded modules)
```
<br>
<br>
<br>
<br>
```
## Getting help
#### HPC support
For questions about the operation of the HPC system, about parallel computing, the batch system, or development software, please contact our HPC team at [hpc-mgr@uni-koeln.de](mailto:hpc-mgr@uni-koeln.de).
For questions about the operation of the HPC system, about parallel computing, the batch system, or development software, please contact our HPC team at hpc-mgr@uni-koeln.de.
#### Scientific support
For questions regarding scientific applications or scientific computing in general, please contact our staff for scientific support at [wiss-anwendung@uni-koeln.de](mailto:wiss-anwendung@uni-koeln.de).
For questions regarding scientific applications or scientific computing in general, please contact our staff for scientific support at wiss-anwendung@uni-koeln.de.
#### HPC accounts
Please have a look at [this page](accounts.md) for information on how to obtain an HPC account. Should you have further questions, our account team at [hpc-accounts@uni-koeln.de](mailto:hpc-accounts@uni-koeln.de) will be happy to help.
Please have a look at [this page](accounts.md) for information on how to obtain an HPC account. Should you have further questions, our account team at hpc-accounts@uni-koeln.de will be happy to help.
#### Help request
If you send a **support request**, please provide all relevant information to describe your case. In particular, **error messages** are crucial for analysis and should be provided with the request. Depending on your application, error messages are usually printed to the standard error (`stderr`) and/or the standard output (`stdout`) stream so that you will either see them passing by on the screen or find them in a corresponding file. In addition, accompanying information is often helpful to track down errors. For instance, if the batch system fails to run a job, you should provide the job identifier (`<jobid>`) with your report. If building an application fails, you should provide name and version of the compiler and the libraries used.
The HPC team handles hundreds of support requests per year. In order to ensure efficient and timely resolution of issues, please include in your request as much as possible of the following information:
#### Attention block NERSC
The RRZK team handles hundreds of support requests per year. In order to ensure efficient and timely resolution of issues, please include in your request as much as possible of the following information:
* error messages
* jobids
* location of relevant files, such as:
* input/output
* job scripts
* source code
* executables
* output of module list
* any steps you have tried
* steps to reproduce
```
* error messages
* jobids
* location of relevant files, such as:
* input/output
* job scripts
* source code
* executables
* output of module list
* any steps you have tried
* steps to reproduce
```
#### Phone support
Consulting and account-support phone services are not available.
To report an urgent system issue, you may call the RRZK at +49 221 470-89555 (local and international) or, preferably, write an email to [hpc-mgr@uni-koeln.de](mailto:hpc-mgr@uni-koeln.de).
To report an urgent system issue, you may call the RRZK at +49 221 470-89555 (local and international) or, preferably, write an email to hpc-mgr@uni-koeln.de.