T-Mobile explains why its network went down, hard, on Monday

If you’ve been wondering what could knock out one of the United States’ three big cellular carriers’ ability to deliver calls and text messages — and keep it that way for most of an entire day — T-Mobile now has a partial answer that pertains to its extensive nationwide outage Monday.

The short version, if we’re reading this correctly: a fiber-optic circuit failed, and its backup circuit also failed, which caused a chain reaction that strained the network to the point that many calls and texts couldn’t make it through.

The longer version:

June 16th, 2020 6:23pm PST

Update on T-Mobile Voice and Text Performance

Every day we see the vital role technology plays in keeping us connected, and we know T-Mobile customers rely on our network to ensure they have connections with family, loved ones and service providers. This is a responsibility my team takes very seriously and is our highest priority. Yesterday, we didn’t meet our own bar for excellence.

Many of our customers experienced a voice and text issue yesterday, specifically with VoLTE (Voice over LTE) calling. My team took immediate action — hundreds of our engineers worked tirelessly alongside vendors and partners throughout the day to resolve the issue starting the minute we were aware of it. Data connections continued to work, as did our non-VoLTE calling for many customers and services like FaceTime, iMessage, Google Meet, Google Duo, Zoom, Skype and others allowed our customers to stay in touch. Additionally, many customers were able to use circuit-switched voice connections and customers on the Sprint network were unaffected. VoLTE and text in all regions were fully recovered by 10 p.m. PDT last night. I’m happy to say the network is fully operational… and we’re working day in and day out to keep it that way.

Our engineers worked through the night to understand the root cause of yesterday’s issues, address it and prevent it from happening again. The trigger event is known to be a leased fiber circuit failure from a third party provider in the Southeast. This is something that happens on every mobile network, so we’ve worked with our vendors to build redundancy and resiliency to make sure that these types of circuit failures don’t affect customers. This redundancy failed us and resulted in an overload situation that was then compounded by other factors. This overload resulted in an IP traffic storm that spread from the Southeast to create significant capacity issues across the IMS (IP Multimedia Subsystem) core network that supports VoLTE calls.

We have worked with our IMS (IP Multimedia Subsystem) and IP vendors to add permanent additional safeguards to prevent this from happening again and we’re continuing to work on determining the cause of the initial overload failure.

So, I want to personally apologize for any inconvenience that we created yesterday and thank you for your patience as we worked through the situation toward resolution.

Neville Ray

T-Mobile President of Technology

It’s not clear which third-party provider’s fiber circuit failed. There was a report on Monday that Level 3, one of the world’s major internet backbone providers, was experiencing an outage, but a spokesperson told TechCrunch otherwise.