Different services in Renku use different logging formats, which makes problems hard to debug and troubleshoot. It is even harder for people who just want to deploy and run Renku and do not know the intricacies that someone on the Renku team knows.
4-5 weeks
I did some more digging, and this seems to be more or less what OpenTelemetry is trying to support. See https://opentelemetry.io/docs/languages/python/ for details. We may be able to use OpenTelemetry for all of this instead of coming up with our own custom JSON-like format. In addition, the OpenTelemetry packages for Python, Go and JavaScript generate and inject span and trace IDs so that requests can be followed across services. This looks like a more powerful and more standard approach than relying only on request IDs that we define ourselves. The instrumentation packages monkey patch the request libraries (for example httpx), so that when one service sends a request to another, the headers carrying the trace and span IDs are propagated properly. A nice bonus is that Keycloak, for example, supports OpenTelemetry. This means we can get tracing and visibility not just for our own components but also for external ones like Keycloak, Redis and Postgres. Even just connecting the gateway and the data service with OpenTelemetry would be a great start.
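To make the propagation idea concrete, here is a minimal sketch of the W3C `traceparent` header, which is the format OpenTelemetry propagates by default. It uses only the standard library and is not the OpenTelemetry API. The function names are hypothetical, and validation is simplified (only version `00` is accepted). In practice the instrumentation packages would do this for us.

```python
import re
import secrets
from typing import Dict, Optional

# traceparent layout: version - trace id (32 hex) - parent span id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace with a random trace id and span id, marked as sampled."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming: Optional[str]) -> str:
    """Keep the caller's trace id and mint a new span id for this hop."""
    match = TRACEPARENT_RE.match(incoming or "")
    if match is None:
        # Missing or malformed header: this service starts a new trace.
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outgoing_headers(incoming_headers: Dict[str, str]) -> Dict[str, str]:
    """Headers to attach to a downstream request (assumes lower-case header keys)."""
    return {"traceparent": child_traceparent(incoming_headers.get("traceparent"))}
```

For example, if the gateway calls `outgoing_headers({})` and the data service then calls `outgoing_headers(...)` with what it received, both headers share the same trace ID but carry different span IDs. That shared trace ID is what lets a single request be followed across services.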
When a 500 status code is received in the UI, I want to see the request ID so that I can track the request through all the different services in the logs.
When an error occurs somewhere and I know the request ID, I want to easily filter and show all relevant log lines, especially in services like Loki/Grafana.
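As a rough illustration of the second story, the sketch below emits one JSON object per log line with a `request_id` field, using only the Python standard library. The field names, the `build_logger` helper and the use of a context variable are assumptions made for illustration, not an agreed Renku format. If we adopt OpenTelemetry, the trace ID would probably take the place of `request_id`.

```python
import contextvars
import json
import logging

# Hypothetical: each service would set this once per incoming request.
request_id_var = contextvars.ContextVar("request_id", default=None)


class RequestIdFilter(logging.Filter):
    """Attach the current request id to every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(RequestIdFilter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

With every service logging in this shape, a LogQL query along the lines of `{app="data-service"} | json | request_id="req-123"` should return only the lines for that request. The `app` label here is an assumption about how the deployment labels its log streams.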