Cardano’s resistance to global-scale network issues
Almost instantaneously, network and IT infrastructure monitoring systems all around the globe turned red. Cisco’s ThousandEyes live outage dashboard showed an extraordinary peak.
Here’s a description of the BGP routing mistake that caused this issue. It is surprising that such announcement failures can still occur even though RPKI (Resource Public Key Infrastructure) has been in use for some time. RPKI addresses exactly this trust problem by providing cryptographic evidence of which networks are authorized to originate which address prefixes.
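To make the idea concrete, here is a minimal sketch of RPKI route origin validation using the classification of RFC 6811 (valid, invalid, not found). It is simplified and assumes the validated ROA data has already been fetched; the `Roa` type and `validate_origin` function are hypothetical, and the prefixes and AS numbers are taken from the documentation ranges (192.0.2.0/24, AS64496 to AS64511). Origin validation only checks who may originate a prefix; it does not authenticate the full AS path.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from enum import Enum


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"


@dataclass(frozen=True)
class Roa:
    """Simplified Route Origin Authorization: which AS may originate a prefix."""
    prefix: ipaddress.IPv4Network | ipaddress.IPv6Network
    max_length: int
    origin_as: int


def validate_origin(announced: str, origin_as: int, roas: list[Roa]) -> Validity:
    """Classify a BGP announcement following the origin-validation rules of RFC 6811."""
    route = ipaddress.ip_network(announced)
    covered = False
    for roa in roas:
        # Only ROAs of the same address family whose prefix contains the route matter.
        if roa.prefix.version != route.version or not route.subnet_of(roa.prefix):
            continue
        covered = True
        # A match needs the authorized origin AS and a prefix no longer than max_length.
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return Validity.VALID
    # Covered but unmatched announcements are invalid; uncovered ones are unknown.
    return Validity.INVALID if covered else Validity.NOT_FOUND


roas = [Roa(ipaddress.ip_network("192.0.2.0/24"), 24, 64496)]
print(validate_origin("192.0.2.0/24", 64496, roas))  # Validity.VALID
print(validate_origin("192.0.2.0/25", 64496, roas))  # Validity.INVALID (too specific)
print(validate_origin("192.0.2.0/24", 64511, roas))  # Validity.INVALID (wrong origin)
```

A router that drops "invalid" routes would reject both the overly specific announcement and the one from the wrong origin AS, while "not found" routes are typically still accepted because many prefixes have no ROA yet.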
This was a first good opportunity to observe the effects of such an event on the Cardano Shelley Mainnet.
In its first month after launch, the Mainnet does not yet use an automatic P2P system. Instead, pool operators maintain peer lists manually or use semi-automatically generated lists from the TopologyUpdater service.
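For illustration, a manually maintained peer list might be turned into a node topology file roughly as follows. This is a minimal sketch, assuming the pre-P2P `Producers` layout with `addr`, `port` and `valency` fields; `build_topology` is a hypothetical helper, the hostnames are placeholders, and the actual output format of the TopologyUpdater service is not reproduced here.

```python
import json


def build_topology(peers, bootstrap=None, valency=1):
    """Build a pre-P2P topology dict from (host, port) pairs.

    Hypothetical helper: the "Producers"/"addr"/"port"/"valency" layout is
    assumed; verify it against the documentation of the node version you run.
    """
    candidates = ([bootstrap] if bootstrap else []) + list(peers)
    producers, seen = [], set()
    for host, port in candidates:
        if (host, port) in seen:
            continue  # drop duplicate peers so each relay appears only once
        seen.add((host, port))
        producers.append({"addr": host, "port": port, "valency": valency})
    return {"Producers": producers}


# Placeholder peers; the duplicate relay1 entry is removed.
peers = [("relay1.example.com", 3001), ("203.0.113.7", 6000), ("relay1.example.com", 3001)]
topology = build_topology(peers, bootstrap=("bootstrap.example.net", 3001))
print(json.dumps(topology, indent=2))
```

Including peers from several regions, rather than only a bootstrap relay, keeps a node's connectivity less dependent on any single network path.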
What stability can be achieved at this stage of development?
Have regional peering clusters formed that turn out to be unexpectedly fragile when sudden global routing problems occur? Have many operators decided to peer only with IOHK bootstrap nodes for simplicity?
The following diagram shows metrics from a relay node located in Germany. Three curves are of particular interest:
The upper green line shows significant drops of about 25% in the number of connected remote peers.
The middle red line shows chain density, which ideally stays around 5% (the Shelley active slot coefficient is 0.05, so on average one slot in twenty carries a block) and should not drop significantly. It did not, which is a very good sign.
The brown bottom line shows that transactions never accumulated in the mempool, so they were always processed promptly and included in blocks.
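The two numeric curves can be reproduced with simple arithmetic. The sketch below assumes the straightforward definition of chain density as blocks divided by slots over a window (the node's own metric may use a different window), and the peer counts and block slots are illustrative values, not data from the diagram. `chain_density` and `relative_drop` are hypothetical helper names.

```python
ACTIVE_SLOT_COEFF = 0.05  # Shelley mainnet: on average one block per 20 slots


def chain_density(block_slots, first_slot, last_slot):
    """Fraction of slots in [first_slot, last_slot) that contain a block."""
    n_slots = last_slot - first_slot
    if n_slots <= 0:
        raise ValueError("window must contain at least one slot")
    # At most one block per slot ends up on the chain, so count distinct slots.
    blocks = sum(1 for slot in set(block_slots) if first_slot <= slot < last_slot)
    return blocks / n_slots


def relative_drop(before, after):
    """Relative decrease, e.g. connected peers falling from 20 to 15 gives 0.25."""
    return (before - after) / before


# Illustrative data only: one block every 20 slots over a 100-slot window.
density = chain_density([0, 20, 40, 60, 80], 0, 100)
print(density, density / ACTIVE_SLOT_COEFF)  # 0.05 1.0
print(relative_drop(20, 15))  # 0.25
```

A density close to the active slot coefficient means blocks kept being produced and adopted at the expected rate, even while a quarter of the peer connections were lost.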
Nodes in other regions showed similar effects, to a greater or lesser degree. In no known case did this lead to insufficient connectivity and communication.
Outage and Recovery
CLIO1 developed the TopologyUpdater and provides it free of charge to all stake pools in the Cardano network until the P2P network is introduced. Its data also makes it possible to graph how this network problem affected the communication of the approximately 660 registered pool nodes.
The following graphic shows that about 60 nodes, roughly 10% of the total, lost their connection.
It also shows that connectivity was restored relatively quickly once the global routing problems were resolved (green = reachable, red = unreachable).
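A graph like this can be derived from periodic reachability checks. The sketch below assumes the data is available as timestamped snapshots mapping each node to a reachable/unreachable flag; this is a hypothetical input format, not the TopologyUpdater's actual data model, and `outage_series` and `recovery_time` are illustrative names.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def outage_series(
    snapshots: list[tuple[datetime, dict[str, bool]]],
) -> list[tuple[datetime, int]]:
    """Turn per-node reachability snapshots into (timestamp, unreachable count) pairs."""
    return [(ts, sum(1 for ok in status.values() if not ok)) for ts, status in snapshots]


def recovery_time(series: list[tuple[datetime, int]], baseline: int) -> timedelta | None:
    """Time from the worst snapshot until the unreachable count is back at or below baseline."""
    if not series:
        return None
    peak = max(range(len(series)), key=lambda i: series[i][1])
    peak_ts = series[peak][0]
    for ts, down in series[peak:]:
        if down <= baseline:
            return ts - peak_ts
    return None  # not recovered within the observed window
```

With the figures given in the text, 60 unreachable out of roughly 660 nodes gives 60 / 660 ≈ 0.091, which matches the "roughly 10%" share.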
All in all, this unusual event, with many failures of global Internet services, was no problem for the decentralized Cardano Mainnet.
But bigger challenges will follow, so it is important to have well-trained stake pool operators who really know what they are doing.