Resolved -
Our cloud provider has confirmed a service degradation between 10:48 AM and 11:01 AM PDT that increased task launch error rates. We are implementing additional retry behavior and fault tolerance so that similar provider issues do not cause runs to fail to launch.
Apr 29, 19:28 UTC
Monitoring -
We observed an increase in Dagster+ serverless runs failing to start between 10:50 AM and 11:00 AM PDT. Our cloud provider reported an incident on their end, which we have linked to these failures. The failure rate has since gone back down, and we will post more information as we learn more.
Apr 29, 18:16 UTC