🤔 Problem

The initial implementation of Renku sessions at CSCS left several areas open for improvement. We have now received user feedback that shows where additional engineering effort would do the most to smooth over those problems.

🍴 Appetite

6 weeks

🎯 Solution

Improvements are sketched in sub-sections below.

Improving on-boarding

Right now, “getting access to CSCS” takes two steps: activating the integration and being granted access to a special resource pool. We should improve this by making the second step either automatic or unnecessary.

Improving feedback during session start

Users are left waiting for a while because:

  1. the Renku session starts and submits a job to CSCS on the user’s behalf
  2. CSCS has to schedule the job, and the user has to wait until it is up and running
  3. when the job starts, a number of setup steps need to run; during this time, the user may see a screen in their Renku session with a message about the remote being unavailable

This last step leads users to believe that something is wrong, although everything is fine and the session is still being prepared. We should communicate this better to avoid confusion or panic.
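
As a rough illustration of what better communication could look like, the three waiting stages above can be mapped to explicit, user-facing status messages. This is a minimal sketch, not the existing implementation: `SessionPhase`, `session_phase`, `MESSAGES`, and the `remote_ready` flag are hypothetical names, and the message wording is a placeholder. The job state strings (`PENDING`, `CONFIGURING`, `RUNNING`) are standard SLURM job states.

```python
from enum import Enum
from typing import Optional


class SessionPhase(Enum):
    """Hypothetical phases of a remote session start, as seen by the user."""
    SUBMITTING = "submitting"
    QUEUED = "queued"
    SETTING_UP = "setting_up"
    READY = "ready"
    FAILED = "failed"


# Placeholder wording; the real messages would be decided with UX input.
MESSAGES = {
    SessionPhase.SUBMITTING: "Submitting your job to CSCS.",
    SessionPhase.QUEUED: "Your job is waiting to be scheduled at CSCS.",
    SessionPhase.SETTING_UP: "Your job is running and the session is being prepared.",
    SessionPhase.READY: "Your session is ready.",
    SessionPhase.FAILED: "The job did not start. Please check the job logs.",
}


def session_phase(job_state: Optional[str], remote_ready: bool) -> SessionPhase:
    """Derive the user-facing phase from a SLURM job state.

    job_state is None until the job has been submitted. remote_ready is an
    assumed signal that the setup steps inside the job have finished.
    """
    if job_state is None:
        return SessionPhase.SUBMITTING
    if job_state in {"PENDING", "CONFIGURING"}:
        return SessionPhase.QUEUED
    if job_state == "RUNNING":
        return SessionPhase.READY if remote_ready else SessionPhase.SETTING_UP
    # Any other state (e.g. FAILED, CANCELLED, TIMEOUT) is treated as a failure.
    return SessionPhase.FAILED
```

With a mapping like this, the "remote unavailable" screen would only be shown in the `FAILED` phase; while the job is queued or setting up, the user would see the corresponding progress message instead.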

Allow mounting other filesystems

Right now we only mount the default filesystems that everyone has access to, which is the safe choice. However, some CSCS customers (e.g. MeteoSwiss) have their own filesystems. We should provide an option for them to specify what else to mount.
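
One possible shape for such an option is a comma-separated list of absolute paths that is validated before it reaches the job. The sketch below is only an assumption about how this could work: `parse_extra_mounts` is a hypothetical helper, the paths in the usage example are made up, and whether a user is actually allowed to access a given filesystem would still have to be enforced on the CSCS side.

```python
import posixpath
from typing import List


def parse_extra_mounts(spec: str) -> List[str]:
    """Parse a comma-separated list of extra mount paths (hypothetical format).

    Only absolute paths without '..' components are accepted. Paths are
    normalized and duplicates are dropped while preserving order.
    """
    mounts: List[str] = []
    for raw in spec.split(","):
        path = raw.strip()
        if not path:
            continue  # tolerate empty entries such as a trailing comma
        if not path.startswith("/"):
            raise ValueError(f"mount path must be absolute: {path!r}")
        if ".." in path.split("/"):
            raise ValueError(f"mount path must not contain '..': {path!r}")
        normalized = posixpath.normpath(path)
        if normalized not in mounts:
            mounts.append(normalized)
    return mounts
```

For example, `parse_extra_mounts("/store/project_a, /scratch/shared/")` would yield the two normalized paths `/store/project_a` and `/scratch/shared`, while a relative path or one containing `..` would be rejected with a `ValueError`.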

Allow modifying job submission parameters

At the moment the SLURM job parameters are fixed. We could allow some customization, potentially via environment variables.
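
If environment variables are used, one option is an allowlist that translates a small set of variables into `sbatch` options, so users can adjust a few parameters without being able to pass arbitrary flags. This is a sketch under stated assumptions: the `RENKU_SLURM_` prefix, the `sbatch_overrides` helper, and the choice of which options to expose are all hypothetical. The flags themselves (`--time`, `--partition`, `--account`, `--constraint`) are existing `sbatch` options.

```python
import re
from typing import List, Mapping

# Hypothetical prefix for user-settable variables.
ENV_PREFIX = "RENKU_SLURM_"

# Allowlist: variable suffix -> sbatch option. Anything else is ignored.
ALLOWED_OPTIONS = {
    "TIME": "--time",
    "PARTITION": "--partition",
    "ACCOUNT": "--account",
    "CONSTRAINT": "--constraint",
}

# Conservative character set for values: no whitespace, quotes, or shell metacharacters
# other than the '&' and '|' operators that --constraint expressions use.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_.:,&|+-]+")


def sbatch_overrides(env: Mapping[str, str]) -> List[str]:
    """Build extra sbatch arguments from allowlisted environment variables."""
    args: List[str] = []
    for suffix, flag in ALLOWED_OPTIONS.items():
        value = env.get(ENV_PREFIX + suffix)
        if value is None:
            continue
        if not _SAFE_VALUE.fullmatch(value):
            raise ValueError(f"unsupported value for {ENV_PREFIX}{suffix}: {value!r}")
        args.append(f"{flag}={value}")
    return args
```

The returned list is meant to be appended to the argument list of the existing `sbatch` call (passed as a list rather than through a shell), so the fixed defaults stay in place for any parameter the user does not override.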