Carrier grade resilience in geographically distributed software defined networks

Master's Thesis


University of Cape Town

The Internet is a fundamental infrastructure in modern life, supporting many different communication services. One of its most critical properties is the ability to recover from failures, such as link or equipment failures. The goal of network resilience heavily influenced the design of the Internet and led to the use of distributed routing protocols. While distributed algorithms largely solve the problem of network resilience, other concerns remain. A significant one is network management, which is a complex and error-prone process. In addition, network control logic is tightly integrated into the forwarding devices, which makes it difficult to upgrade that logic to introduce new features. Finally, the lack of a common control platform forces each new network function to provide its own solutions to common but challenging problems that arise when operating in a distributed environment.

A new network architecture, software-defined networking (SDN), aims to alleviate many of these challenges by introducing useful abstractions into the control plane. In an SDN architecture, control functions are implemented as network applications that run on a logically centralized network operating system (NOS). The NOS provides these applications with abstractions for common functions, such as network discovery, installation of forwarding behaviour, and state distribution. Network management can therefore be handled programmatically instead of manually, and new features can be introduced simply by updating or adding a control application in the NOS. With proper design, an SDN architecture could improve the performance of reactive approaches to restoring traffic after a network failure. However, this dissertation shows that a reactive approach to traffic restoration will not meet the requirements of carrier-grade networks, which require traffic to be redirected onto a back-up route less than 50 ms after the failure is detected.
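Why controller latency matters for reactive recovery can be seen from a rough time budget. The sketch below is a simplified model, not the measurement method used in the dissertation: it assumes that, after detection, reactive recovery costs one switch-to-controller trip, a path computation, one controller-to-switch trip and a rule installation, with every switch involved equally far from the controller. It uses the approximate propagation speed of light in fibre (about 200 km per millisecond); the processing times `COMPUTE_MS` and `INSTALL_MS` and the helper names `reactive_ms` and `proactive_ms` are hypothetical.

```python
# Light in optical fibre covers roughly 200 km per millisecond (about 2/3 of c).
FIBRE_KM_PER_MS = 200.0
CARRIER_GRADE_BUDGET_MS = 50.0

# Hypothetical processing times, chosen for illustration only;
# they are not measurements from this dissertation.
COMPUTE_MS = 5.0   # controller path computation
INSTALL_MS = 10.0  # rule installation in the switch


def reactive_ms(controller_km, compute_ms=COMPUTE_MS, install_ms=INSTALL_MS):
    """Simplified reactive recovery time, counted from failure detection:
    the switch reports the failure to a controller `controller_km` away,
    the controller computes a new route, and the new rules travel back
    over the same distance and are installed."""
    one_way_ms = controller_km / FIBRE_KM_PER_MS
    return one_way_ms + compute_ms + one_way_ms + install_ms


def proactive_ms(switchover_ms):
    """Proactive recovery: back-up rules are already installed, so the
    detecting switch redirects traffic locally and the controller distance
    never enters the sum."""
    return switchover_ms


for km in (500, 2000, 4000, 8000):
    t = reactive_ms(km)
    verdict = "within" if t <= CARRIER_GRADE_BUDGET_MS else "exceeds"
    print(f"controller {km:>5} km away: {t:5.1f} ms ({verdict} the 50 ms budget)")
```

Under these assumed values the reactive budget runs out somewhere between 2000 km and 4000 km of controller distance, while the proactive time does not depend on distance at all. Slower rule installation, or having to update several switches in turn, would move the crossover closer.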
To achieve 50 ms recovery, a proactive approach must be used, in which back-up rules are calculated and installed before a failure occurs. Several protocols implement this proactive approach in traditional networks, and some work has also been done in the SDN space. However, current SDN solutions for fast recovery are not necessarily suitable for a carrier-grade environment. This dissertation proposes a new failure recovery strategy for SDN, based on protocols already used in traditional carrier-grade networks. Segment routing allows a back-up route to be encoded into the packet header when a failure occurs, so other switches do not need to be informed of the failure. Back-up routes follow the post-convergence path, that is, the path traffic would take once routing has reconverged around the failure, so they do not violate the traffic engineering constraints of the network. A multiprotocol label switching (MPLS) data plane is used to ensure compatibility with current carrier networks, where MPLS is widely deployed. The proposed solution was implemented as a network application on top of an open-source network operating system, and a geographically distributed network testbed was used to verify its suitability for geographically distributed carrier networks. Proof-of-concept tests showed that the proposed solution provides complete protection against any single link, link aggregate or node failure in the network. In addition, unlike in reactive approaches, communication latencies in the network do not influence the restoration time. Finally, an analysis of back-up path metrics, such as back-up path length and the number of labels required, showed that the application installed efficient back-up paths.
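The idea of encoding a post-convergence back-up route as segments can be illustrated with a small sketch. It is not the algorithm implemented in the dissertation: it covers single link failures only (not node or link-aggregate failures), assumes a single routing metric with unique shortest paths (no ECMP), and uses a hypothetical topology, hypothetical node labels in `NODE_LABEL`, and illustrative function names. The switch next to the failure (called the point of local repair, `plr`, in standard IP fast-reroute terminology) computes the post-convergence path with the failed link removed. It then greedily picks the furthest node on that path that the other switches' unchanged, pre-failure shortest paths would reach by exactly the same hops, and uses it as a node segment. Where no node segment fits, it falls back to a single-hop adjacency segment, left symbolic here.

```python
import heapq

# Hypothetical five-node ring topology: {node: {neighbour: link cost}}.
GRAPH = {
    "A": {"B": 1, "D": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "E": 1},
    "D": {"A": 1, "E": 1},
    "E": {"D": 1, "C": 1},
}
# Hypothetical node-segment labels (one MPLS label per node).
NODE_LABEL = {"A": 16001, "B": 16002, "C": 16003, "D": 16004, "E": 16005}


def shortest_path(graph, src, dst, failed=frozenset()):
    """Dijkstra; links listed in `failed` (as frozenset pairs) are skipped."""
    dist, prev, done = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, cost in graph[u].items():
            if frozenset((u, v)) in failed:
                continue
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    if dst not in done:
        return None  # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def backup_segments(graph, plr, dst, failed_link):
    """Segment list steering traffic from `plr` along the post-convergence
    path while all other switches keep their pre-failure shortest paths."""
    post = shortest_path(graph, plr, dst, {frozenset(failed_link)})
    if post is None:
        return None
    segments, i = [], 0
    while i < len(post) - 1:
        # Furthest node whose pre-failure shortest path from post[i]
        # matches the post-convergence path hop for hop.
        for k in range(len(post) - 1, i, -1):
            if shortest_path(graph, post[i], post[k]) == post[i:k + 1]:
                segments.append(("node", post[k]))
                i = k
                break
        else:
            # Not even the next hop matches: pin one link explicitly.
            segments.append(("adj", post[i], post[i + 1]))
            i += 1
    return segments


def label_stack(segments):
    """Top of stack first; adjacency segments stay symbolic in this sketch."""
    return [NODE_LABEL[s[1]] if s[0] == "node" else s for s in segments]


segs = backup_segments(GRAPH, "A", "C", ("A", "B"))
print(segs)               # [('node', 'E'), ('node', 'C')]
print(label_stack(segs))  # [16005, 16003]
```

In this ring, a failure of link A-B leaves A with the post-convergence path A, D, E, C. Two node segments (E, then C) are enough, because the unchanged pre-failure routing already carries traffic from A to E via D, and from E directly to C. The back-up route therefore needs only two labels, and no other switch has to learn about the failure.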