πŸ€” Problem

Currently, when the csi-rclone daemonset pods restart, we cannot recover the mounts until users restart their sessions.

This is a continuation of ‣, where we made it possible to remount volumes after a restart. However, the mounts in sessions still do not work, even though they function properly on the daemonset pods.

🍴 Appetite

3 weeks

🎯 Solution

  1. Use both methods in the CSI driver. Currently we implement only NodePublishVolume, but the CSI spec also defines NodeStageVolume. The usual pattern in CSI drivers is that the β€œreal” mount happens in NodeStageVolume, and NodePublishVolume only creates a bind mount from the staging location to the location where the volume is needed in the pod. We expect that once we recover the mount in the staging location, the publish location will refresh and properly propagate to the session pod. As a bonus, if more than one pod on the same node uses the same volume, we do not have to run rclone mount twice: we mount once and create two bind mounts onto the same rclone mount.

    We saw this section in the juicefs csi driver (which uses fuse under the hood): https://juicefs.com/docs/csi/guide/configurations/#automatic-mount-point-recovery

    In the JuiceFS docs they mention seeing the same error we see when a mount fails to recover, and they explain that with HostToContainer mount propagation and a bind mount on the node the mount should recover. We tried setting HostToContainer mount propagation on our sessions, but it did not help. This is because we do the FUSE mount directly in NodePublishVolume and never create a bind mount at all.

  2. [Optional] Move the FUSE process out of the daemonset pod and onto the host, so that mounts persist across daemonset restarts.
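A minimal sketch of the stage/publish split from step 1, assuming hypothetical helper names (rcloneMountCmd, bindMountCmd, the staging path layout); it only builds the commands the two CSI calls would issue and does not perform any real mounts:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// rcloneMountCmd is the command NodeStageVolume would run: the single
// "real" FUSE mount, done once per volume per node in the staging path.
func rcloneMountCmd(remote, stagingPath string) string {
	return fmt.Sprintf("rclone mount %s %s --daemon", remote, stagingPath)
}

// bindMountCmd is the command NodePublishVolume would run: a cheap bind
// mount from the staging path to the pod's target path. Recovering the
// staging mount then refreshes every bind mount stacked on top of it.
func bindMountCmd(stagingPath, targetPath string) string {
	return fmt.Sprintf("mount --bind %s %s", stagingPath, targetPath)
}

func main() {
	// Paths below are illustrative, not the actual kubelet layout.
	staging := filepath.Join("/var/lib/kubelet/plugins/csi-rclone/staging", "vol-1")

	// One rclone mount per volume on the node...
	fmt.Println(rcloneMountCmd("remote:bucket", staging))

	// ...and one bind mount per pod that consumes it.
	fmt.Println(bindMountCmd(staging, "/var/lib/kubelet/pods/pod-a/volumes/vol-1"))
	fmt.Println(bindMountCmd(staging, "/var/lib/kubelet/pods/pod-b/volumes/vol-1"))
}
```

With this split, a daemonset restart only needs to re-run the one rclone mount per staged volume; the per-pod bind mounts recover through mount propagation instead of requiring a session restart.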

🐰 Rabbit Holes

πŸ” Security Implications

πŸ™…β€β™€οΈ No-gos