Postmortem -
Read details
Sep 19, 12:34 UTC
Resolved -
We've rolled out our fallback change and the underlying network issue also appears to be resolved.
Sep 19, 04:03 UTC
Update -
AWS has not been able to solve the network issues that they identified in a number of AZs. We are continuing to monitor their progress on the underlying issue.
In the meantime, we have been rolling out a workaround that falls back to stable AZs. This workaround has been applied to most Dagster Cloud organizations on Serverless that encountered run launch failures today, and we are continuing to monitor as the rollout reaches the full set of Serverless deployments.
Sep 19, 01:40 UTC
Monitoring -
We've been rolling out changes across Serverless that have been mitigating the network issues.
Also, our underlying cloud provider (AWS) has been posting updates indicating that they expect full resolution within the next hour.
Sep 18, 23:53 UTC
Update -
We are continuing to experience networking issues with our underlying cloud provider. We have started to roll out a workaround to mitigate these networking issues.
Sep 18, 21:44 UTC
Update -
We are continuing to experience networking issues with our underlying cloud provider. We have begun implementing a workaround to fail over to an availability zone that is not experiencing these issues.
Sep 18, 20:06 UTC
Update -
We are continuing to monitor the networking issues with our underlying cloud provider, which are causing Serverless run launch failures.
Sep 18, 19:24 UTC
Update -
Our underlying cloud provider is experiencing networking issues, which are likely the root cause of the Serverless run launch failures. We are continuing to investigate these errors.
Sep 18, 18:39 UTC
Investigating -
We are currently investigating issues with some Serverless runs failing to launch.
Sep 18, 18:07 UTC