Taurus HPC Cluster

1. Introduction

Taurus, as depicted in Uranographia by Johannes Hevelius

This guide walks you through the steps of connecting to the Taurus High Performance Computing (HPC) Cluster, installing Visual Studio Code on your computer and compiling and running programs in C/C++, OpenMP, CUDA and MPI. Before connecting to the cluster, you will need a username and password. Contact the system administrator, Dr. Vincent Roberge (vincent.roberge(at)rmc.ca), to get an account created for you.

The Taurus HPC Cluster’s configuration is based on the Linux Containers (LXC) virtualization technology. The host Linux operating systems offer minimal services besides the LXC virtualization, and all compute nodes are implemented in LXC containers. Containers run in unprivileged mode to allow direct access to the GPUs. A preconfigured set of containers has been deployed by the system administrator to allow users to program in C/C++, OpenMP, CUDA, MPI and Python. However, if your research project requires your own container(s) so that you can install your own software and libraries, this is possible. Discuss your requirements with the system administrator.

2. Cluster Specifications

The Taurus HPC Cluster is composed of eight Linux compute nodes (taurus1.local.net to taurus8.local.net) and one Linux client workstation (desktop-usb.local.net). All hosts are connected with a 10 Gbps low-latency converged Ethernet switch. The Linux nodes are Dell Precision servers equipped with 128 GB of RAM and dual Intel Xeon Silver 4214R CPUs with 12 hyper-threaded cores each, for a total of 24 cores or 48 virtual cores per node. Each node is configured with an NVIDIA RTX 4070 Ti graphics processing unit (GPU) with 7,680 cores and 12 GB of GDDR6X memory, supporting CUDA compute capability 8.9. The Linux nodes run Ubuntu 24.04. The Taurus HPC Cluster provides a combined processing power of 1.23 TFLOPS on the CPUs and 119.2 TFLOPS on the GPUs.

3. Connecting to the Taurus HPC Cluster

There are three ways to connect to the Taurus HPC Cluster:

Remote SSH connection
Local Wifi connection
Local GUI terminal connection with USB

Each method gives you different capabilities. The SSH connection allows you to connect to the cluster from anywhere on the internet, while Wifi and local terminal give you the best bandwidth.

SSH connection: SSH is the simplest of the three methods and allows you to program on the cluster using a Linux command prompt or Visual Studio Code. If you are programming in C, C++, CUDA or Python, SSH is all you should need. SSH uses a single port.

Local Wifi network: If you are physically located in an Electrical and Computer Engineering (ECE) lab, the preferred method to connect to the cluster is using the local Taurus Wifi network. You can connect to this network using your personal computer. If you require a standalone computer for this purpose (i.e., a computer that is not configured to connect to the RMC domain), you can borrow a laptop from the ECE tech shop.

Local terminal connection: The last method to connect to the Taurus HPC Cluster is to use the client workstation that is physically located beside the cluster in the ECE Power Lab s4205. This computer is configured with Linux and joined to the Taurus domain. It is mostly used to copy files to and from the cluster using USB ports.

The following subsections describe in detail how to connect to the Taurus HPC cluster using each one of the three methods.

3.1. SSH Connection to the Cluster

To connect to the computer cluster, you need to access an SSH gateway:

Host: tauruscluster.duckdns.org
Port: 81

Once on the gateway, you can access the first node of the cluster also using SSH:

Host: taurus1.local.net
Port: 22

Note that you cannot open an SSH terminal directly on the SSH gateway server. The gateway is configured as an SSH jump host only. You do not need to worry about the details, simply follow the instructions below. These instructions are written for a Windows computer, but can easily be adjusted for Linux or Mac.

Important note: In this manual, commands that are to be run on Windows start with “>” and commands that are to be run on the compute cluster start with “$”. Do not include the “>” or “$” when typing your commands.

On your Windows computer, start a command prompt (not PowerShell but cmd.exe) and type:

> mkdir %USERPROFILE%\.ssh
> type nul > %USERPROFILE%\.ssh\config
> notepad %USERPROFILE%\.ssh\config

Enter the following text and save the file. Make sure that you replace "username" with your actual username as assigned to you by the system administrator.

Host tauruscluster
    HostName tauruscluster.duckdns.org
    Port 81
    User username

Host taurus1
    HostName taurus1.local.net
    ProxyJump tauruscluster
    User username

From the command prompt, you can now connect to the taurus1 server using the following command. It should prompt you to enter your password twice.

> ssh taurus1

Once you have successfully logged onto taurus1, it is recommended that you change your password using the following command. You must use a complex password (more than 8 characters, with uppercase letters, lowercase letters and digits).

$ passwd

You can now log off.

$ exit

From your Windows computer, generate an SSH public key that you will upload onto the taurus1 server. This will allow you to log onto the cluster without a password. This will save you quite a bit of time later on when programming in Visual Studio Code.
In the Windows command prompt, generate an RSA public key and copy the key to the cluster:

> ssh-keygen -t rsa
> type %USERPROFILE%\.ssh\id_rsa.pub | ssh taurus1 "cat > id_rsa.pub"

Now, login to the server again and add the public key to your domain account. Do not forget to replace "username" with your actual username:

> ssh taurus1
$ dos2unix id_rsa.pub
$ kinit
$ ipa user-mod username --sshpubkey="$(cat id_rsa.pub)"

You can now log off and try logging into taurus1 again; it should not ask you for a password. Note that it may take a minute or two for your key to propagate to all the domain hosts. If the system still prompts you for a password, do not worry; simply try again later.

$ exit
> ssh taurus1

If you work from more than one computer and want to add multiple SSH keys to your account, upload all your keys to your home directory on taurus1 and then use the ipa user-mod command with one --sshpubkey option per key. All keys must be added at once, in a single command.

Create a key on the first computer and upload it to the server. Note that the first key is named id_rsa1.pub when uploaded onto the taurus1 server:

> type %USERPROFILE%\.ssh\id_rsa.pub | ssh taurus1 "cat > id_rsa1.pub"

Create a key on the second computer and upload it to the server. Now the key is named id_rsa2.pub:

> type %USERPROFILE%\.ssh\id_rsa.pub | ssh taurus1 "cat > id_rsa2.pub"

Then log in to taurus1 and add both keys to your domain account in a single command:

> ssh taurus1
$ dos2unix id_rsa1.pub id_rsa2.pub
$ kinit
$ ipa user-mod username --sshpubkey="$(cat id_rsa1.pub)" --sshpubkey="$(cat id_rsa2.pub)"

If needed, you can delete your SSH public keys from your domain account using:

$ kinit
$ ipa user-mod username --sshpubkey=

3.2. Local Wifi Network

If you are physically located at RMC, the recommended way to connect to the Taurus HPC Cluster is through the local Taurus Wifi network. Several Taurus Wifi access points have been installed throughout the ECE department. Search for the Wifi network named “Taurus” and connect to it. No password is required, but you need to register your wireless adapter MAC address with the network administrator. Once connected to the Taurus Wifi network, you can SSH to the cluster gateway at tauruscluster.duckdns.org on port 22 or port 81 as explained in the previous section.

Note: Internet access from the Taurus network is filtered to prevent accidental browsing to unprofessional websites. Also, all DNS requests are logged. If you find that a website is blocked wrongfully, please contact the system administrator.

3.3. Local GUI Terminal

There is one workstation that is physically co-located with the cluster to allow users to connect a USB external drive and transfer data to and from the cluster. This is useful, for example, for uploading and downloading machine learning datasets. The workstation is located in the ECE Power Lab s4205. You can log on using your Taurus HPC Cluster user account. The workstation runs Ubuntu Desktop and automatically mounts any FAT32, NTFS or Ext3/4 USB drive connected to it. You can then drag and drop your files to your user shared directory (discussed in the next section). The USB interface is 3.0 and allows for a fast transfer rate.

4. Shared Folder and Daily Backups

The Taurus HPC Cluster is configured with a shared directory accessible from every taurus node. When you log onto a taurus node, the shared directory is accessible from your home directory at /home/username/shared_link. It is recommended to use this folder for all your work. If you transfer data to and from the cluster using the Local GUI Terminal, make sure that you copy your files into your shared directory. The /home/username/shared_link directory is a symlink to /mnt/nfs_shared/username. A symlink is convenient and does not incur any performance issues; however, it does not play nicely with GDB when debugging code, as GDB does not follow symlinks. For this reason, we have created a mount point to your shared folder located at /home/username/shared_mount. This uses bindfs and works nicely with GDB; however, the performance is significantly degraded. Which one to use? Both, depending on what you are doing: shared_link when performance matters, and shared_mount when you need to debug with GDB.

Your shared folder is backed up every night for 7 days, every week for 4 weeks and every month for 3 months. If you accidentally delete files, browse to your /home/username/backup directory to recover your files. Note that files stored directly in your home directory are local to the node and not backed up.

The entries are named shared_link and shared_mount because the underlying folder is shared between all compute nodes on the cluster, not between users. It is, in fact, a private folder that only you can access. Save all your work in this private folder.

5. Installing Visual Studio Code

Visual Studio Code (VSCode) is a free Integrated Development Environment (IDE) that runs on several operating systems including Windows, Linux and Mac. It is highly configurable and allows you to develop code using Makefiles and perform remote debugging without any lag. This makes it the perfect IDE to program on the Taurus cluster. You can download VSCode from the internet and install it on your computer. It does not require admin privileges to install and installs, by default, in your user profile directory.

Once installed, start VSCode. On the left toolbar, click on the Extensions icon and install the following extensions:

Nsight Visual Studio Code Edition
C/C++ IntelliSense
Remote Development
Makefile Tools
Shell Debugger

Once these extensions are installed, you can use VSCode to connect to taurus1. Click on the remote connection button in the bottom left corner of the VSCode window and select Connect to Host. Type taurus1.

VSCode may ask you for the operating system of the remote host; select Linux. It may also ask you for your password a few times (this happens if your SSH key has not propagated to the SSH gateway host yet); enter it each time. Once you are connected, click on Menu > Terminal > New Terminal. This gives you a bash terminal on taurus1. You can type id and pwd to confirm that you are connected as yourself and that you are in your home directory.

Important note: Once connected to a remote server, VSCode will automatically install the VSCode server in your home directory on the remote server; this may take a few seconds. You will also need to reinstall the extensions listed above in VSCode once you are running on the remote server.

6. Running a C/C++ Program

This section covers how to compile, run and debug a C/C++ program on the Taurus HPC Cluster using VSCode. First, use the following commands in the VSCode terminal to download the example code into your home directory:

$ cd ~
$ wget --user cluster --password computing http://roberge.duckdns.org/files/cluster/prime_cpp.zip

Unzip the start code:

$ unzip prime_cpp.zip

Go to Menu > File > Open Folder and open the prime_cpp directory. Inspect the content of the prime.cpp file. This code computes the number of prime numbers between 2 and n. To compile and run the code, click on the debug icon on the left toolbar to open the debug window and then select Run debug from the drop-down menu in the left window. Then click on the green arrow just left of the drop-down menu. The start code should run successfully. You should see the output of the program in the TERMINAL window.
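The prime.cpp file in the archive is the reference and is not reproduced here. For orientation only, a program that counts the primes between 2 and n could be organized along the lines of the sketch below. The function names is_prime and count_primes and the use of trial division are assumptions made for illustration; the downloaded code may be structured differently.

```cpp
#include <cstdint>

// Hypothetical helper: trial division.
// n is prime if no d in [2, sqrt(n)] divides it.
bool is_prime(std::uint64_t n) {
    if (n < 2) return false;
    for (std::uint64_t d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Hypothetical helper: count the primes in the closed range [2, n].
std::uint64_t count_primes(std::uint64_t n) {
    std::uint64_t count = 0;
    for (std::uint64_t i = 2; i <= n; ++i) {
        if (is_prime(i)) ++count;
    }
    return count;
}
```

For instance, count_primes(100) returns 25. Trial division is used here because it is easy to read and gives each candidate an independent amount of work, not because it is the fastest method.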

Important note: To run your code, always use the Run and Debug (Ctrl+Shift+D) button on the left toolbar, select the run configuration in the drop-down menu and click Start Debugging (F5). This allows you to use the Makefile and the build and launch tasks that have been manually programmed in the tasks.json and launch.json files (keep reading to learn about these two files).

Tip: If you use the Debug or run button on the right-hand side of VSCode, you will be using some default task and launch parameters and the project will fail. To avoid this mistake, right-click on the Debug or run button on the right-hand side of VSCode and disable its visibility.

This project has been configured to be compiled using a Makefile. Inspect the content of the source file and the Makefile. The Makefile is a bit complex but complete. It can be used as a starting point when creating your own C/C++ projects. Now, inspect the content of the .vscode directory, which is used by VSCode to configure the project.

The c_cpp_properties.json file is used by VSCode to perform syntax highlighting and to allow you to see function definitions and perform auto-completion. It is not used during compilation. If c_cpp_properties.json is not configured correctly, you may see errors highlighted in your code even though your code still compiles correctly. These errors are false positives due to the misconfiguration.

The launch.json file configures the launch option for your program. In this example, there is a “Run debug” and a “Run release” launch configuration. For both configurations, you can see the path to the binary being executed. You can also see the arguments used. Note that the launch configurations were configured with a pre-launch task which compiles the program before it is run.

The tasks.json file configures the compilation tasks. Here, it calls the Makefile with the appropriate target.

To debug your program, add breakpoints by clicking to the left of the line number in the .cpp source file and compile and run your program in debug mode. When the program stops at a breakpoint, you can see the content of variables on the left window. To see the content of arrays, add an entry such as the following in the watch window:

(double[5]) *input

This entry would allow you to see the first 5 values of the array input. By the way, there is no array in this C++ example, but knowing this trick will prove invaluable when writing your own code in VSCode. To print the value of a single element of an array, you can type something like the following in the DEBUG CONSOLE tab:

input[5]

This entry would allow you to see the 6th element of array input. Remember that C/C++ is zero indexed.
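Since the prime example contains no array, a small function such as the following can be used to try the two expressions above. This is only a sketch: the function name sum_squares is hypothetical, and only the pointer name input is taken from the watch expressions in the text.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical practice function for the watch window.
// Fills an array with 0, 1, 4, 9, ... and returns the sum of its elements.
double sum_squares(std::size_t n) {
    std::vector<double> storage(n);
    double *input = storage.data();  // raw pointer, as used in the watch expressions

    for (std::size_t i = 0; i < n; ++i) {
        input[i] = static_cast<double>(i * i);
    }

    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += input[i];
    }
    return total;  // set a breakpoint on this line, then inspect input
}
```

Call the function with n of at least 6 so that input[5] exists, and place a breakpoint on the return statement. With n = 8, the first five values are 0, 1, 4, 9 and 16, and input[5] holds 25.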

Another useful trick: VSCode has an auto-format feature that fixes the indentation of your code. To use it, press:

On Windows: Shift + Alt + F or Ctrl + k followed by Ctrl + f
On Mac: Shift + Option + F
On Linux: Ctrl + Shift + I

Once your code is working in debug mode, compile it and run it in release mode. This will ensure that it works correctly in release mode. However, to measure accurate runtimes, you must run the code without the GDB debugger attached. For this, go to the Terminal tab and call the program directly. In this case, call:

$ release/prime
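If the program does not already report its own runtime, one option is to time the section of interest from inside the code. The sketch below uses std::chrono::steady_clock from the C++ standard library; the helper name elapsed_ms is an assumption introduced for illustration and is not part of the example project.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in milliseconds. steady_clock is monotonic, so the
// result is not affected by system clock adjustments.
template <typename F>
double elapsed_ms(F &&work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as elapsed_ms([&] { result = compute(); }) wraps the work to be measured, where compute stands for your own function. Measurements taken this way are only meaningful for a release build started from the terminal, as described above.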

7. Running an OpenMP Program (multicore CPU)

Now that you have run a sequential program, let’s use OpenMP to take advantage of the multicore processors installed in the Taurus HPC Cluster. First, download the example code:

$ cd ~
$ wget --user cluster --password computing http://roberge.duckdns.org/files/cluster/prime_omp.zip

Unzip the start code:

$ unzip prime_omp.zip

Go to Menu > File > Open Folder and open the prime_omp directory. Inspect the content of the prime.cpp file. This code also computes the number of prime numbers between 2 and n, but now uses multiple threads. You can inspect the Makefile to see the options used to compile the program.

Adjust the number of threads and run the program. Run it outside of the debugger to get the real runtime of the program. As an example, the following command will run the program with 4 to 12 threads with increments of 4.

$ make release
$ release/prime_omp 4 4 12

Inspect the launch.json file in the .vscode directory to see how VSCode can be configured to pass arguments to the program.
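The Makefile and prime.cpp in the archive show the actual options and code. As a rough illustration of the loop-level parallelism that OpenMP provides, the sketch below counts primes with a parallel for loop and a sum reduction. The function names, the dynamic schedule and the chunk size are assumptions chosen for illustration. With GCC, OpenMP is enabled by the -fopenmp flag; without it, the pragma is ignored and the loop simply runs sequentially.

```cpp
#include <cstdint>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: trial division, identical to a sequential version.
static bool is_prime_td(std::int64_t n) {
    if (n < 2) return false;
    for (std::int64_t d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Hypothetical helper: count the primes in [2, n] using num_threads threads.
// num_threads must be positive.
std::int64_t count_primes_omp(std::int64_t n, int num_threads) {
    std::int64_t count = 0;
#ifdef _OPENMP
    omp_set_num_threads(num_threads);
#else
    (void)num_threads;  // built without OpenMP: sequential fallback
#endif
    // Each thread keeps a private partial count; the reduction clause
    // sums the partial counts when the loop ends. Dynamic scheduling
    // helps because larger candidates take longer to test.
#pragma omp parallel for reduction(+ : count) schedule(dynamic, 256)
    for (std::int64_t i = 2; i <= n; ++i) {
        if (is_prime_td(i)) ++count;
    }
    return count;
}
```

The result does not depend on the number of threads: count_primes_omp(1000, 4) and count_primes_omp(1000, 12) both return 168. Only the runtime should change.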

8. Running a CUDA Program (GPU)

The GPU contains a very large number of cores when compared to multicore CPUs, but each core is much simpler in design. GPUs are optimized for massively parallel programs that exploit data-level parallelism. To test the GPU installed on the Taurus HPC Cluster, download the CUDA example program using the following commands:

$ cd ~
$ wget --user cluster --password computing http://roberge.duckdns.org/files/cluster/prime_cuda.zip

Unzip the start code:

$ unzip prime_cuda.zip

Go to Menu > File > Open Folder and open the prime_cuda directory. Inspect the content of the prime.cu file, the Makefile and the files in the .vscode directory. When you are ready, compile and run the example program using the launch button. You should note a much higher speedup compared to the OpenMP program. You can also launch the program right from the terminal to avoid the debugger overhead.

$ make release
$ release/prime

9. Running an MPI Program (Distributed and Multicore)

For highly complex tasks, it is sometimes necessary to use the computing power of multiple computers connected together in a cluster. This can be achieved using a high-performance multi-process library for distributed systems such as Message Passing Interface (MPI). MPI programs must run from a directory that is shared between all the nodes in the cluster. For this reason, change directory to ~/shared_mount and download the MPI example program there:

$ cd ~/shared_mount
$ wget --user cluster --password computing http://roberge.duckdns.org/files/cluster/prime_mpi.zip

Unzip the start code:

$ unzip prime_mpi.zip

Go to Menu > File > Open Folder and open the prime_mpi directory. Inspect the content of the various files in the directory.

Your MPI program will run on 4 nodes (taurus1 to taurus4) and will make use of all the cores on the CPUs. Before you run the program, you must acquire a Kerberos ticket from the domain controller, which will allow you to access the other nodes without a password. You must also log in to the other taurus nodes so that the soft link and the mount point to your shared directory get created in your home directory. The nodes have been configured so that the soft link and the mount point get created the first time you open a Bash shell on the node. To do this, run the following commands on taurus1. This can be done right from the terminal window of VSCode. Note that you will have to run these commands every time the taurus nodes reboot, as the shared_mount mount point gets unmounted at shutdown.

$ kinit $USER
$ ssh taurus2.local.net
$ exit
$ ssh taurus3.local.net
$ exit
$ ssh taurus4.local.net
$ exit

The previous step only needs to be done after a reboot of the Taurus HPC Cluster. However, at the beginning of every MPI programming session, you will need to enter the command below to reacquire your Kerberos ticket:

$ kinit $USER

You are now ready to compile and run the MPI example program. Use the launch button and select mpirun release. Mpirun is the application that launches the multiple processes on the cluster computers.

Debugging an MPI program is a bit tricky because mpirun is the first process to run, which, in turn, starts several instances of your program. If you try debugging your program in the standard way, the debugger will start debugging mpirun and not your application. This will fail catastrophically. To debug your MPI program, compile your program in DEBUG mode using the Makefile.

$ make debug

Note that there is a line of code at the beginning of the main() function that is included only in debug mode:

#ifndef NDEBUG
wait_for_debugger(world_rank, 0);
#endif

This line of code makes process 0 wait for the debugger to attach to it. Once the program is compiled in debug mode, insert at least one breakpoint into the source code and use the command prompt to launch the program:

$ make run-debug

Process 0 will spin lock until the debugger attaches to it. Now, from VSCode, use the launch button and select attach debug. You will be prompted to select the PID of the process to attach to. Typically, process 0 of your program will be the process named prime with the lowest PID. Once attached, process 0 will start running automatically and should stop at your first breakpoint. You can now use the debugger normally.
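The actual wait_for_debugger function ships with the example project. To give an idea of how such a function can work, here is a minimal sketch under stated assumptions: it relies on the Linux-specific TracerPid field of /proc/self/status to detect an attached debugger, it sleeps between checks instead of spinning, and the helper name debugger_attached is hypothetical. The implementation in the archive may differ.

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Hypothetical helper (Linux only): the kernel reports the PID of the
// tracing process in the "TracerPid:" line of /proc/self/status.
// The value is 0 when no debugger is attached.
bool debugger_attached() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 10, "TracerPid:") == 0) {
            return std::stol(line.substr(10)) != 0;
        }
    }
    return false;  // field not found: assume no debugger
}

// Sketch of a wait function with the same call shape as in the example:
// only the process whose rank equals target_rank blocks until a debugger
// attaches; all other ranks return immediately.
void wait_for_debugger(int world_rank, int target_rank) {
    if (world_rank != target_rank) return;
    while (!debugger_attached()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}
```

With this approach the waiting process resumes on its own once the debugger attaches, which is consistent with the behaviour described above. On a system without /proc, the target rank would wait indefinitely, so the sketch is only suitable for Linux hosts such as the taurus nodes.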

10. Conclusion

Thank you for using the Taurus HPC Cluster. If you have comments or requests, do not hesitate to contact the system administrator. If you would like to have your own Linux container (LXC) on the cluster so that you can install personalized packages (e.g., Python with different machine learning or artificial intelligence libraries), this is possible. Discuss it with the system administrator.