In my post The rise and maturity of MPLS, I mentioned a number of challenges facing organisations that wish to deliver or manage global Wide Area Networks (WANs). Whether it is a carrier, a systems integrator, an outsourcer or a global enterprise managing its own WAN, they all face one particular issue that simply cannot be ignored and is quite profound in its nature. It is also one of the least understood issues in the telecommunications and Information and Communication Technology (ICT) industries today. The issue is managing end-to-end service performance when a service is delivered over multiple carrier networks, as exemplified by global WANs.
The impact of this is wide-ranging, affecting closed VoIP VPNs, Internet VPNs, layer-2 VPNs and layer-3 IP-VPNs. This post focuses principally on Internet VPNs and MPLS-based layer-3 IP-VPNs, although the concepts discussed apply just as much to layer-2 services such as the Layer 2 Tunnelling Protocol (L2TP).
The downside of the Internet
It seems strange to talk about such issues in this day and age, when the Internet is all-pervasive in homes and businesses around the world. When we think of the Internet we often think of it as a single ‘cloud’, as shown on the right, with a web site on one side of the cloud and the users accessing it on the other. That this is possible at all is a testament to the founders of the Internet and the resilience of its IP routing algorithms (Where Wizards Stay Up Late is an excellent book that pragmatically traces the origins of the Internet and is well worth a read).
However, in reality the Internet is unable to deliver many types of services that individuals and enterprises need at an acceptable level of performance. Why should this be so?
At an abstract level, this perception of the Internet as a single cloud is correct, but at the network level the reality is somewhat different.
As the name implies (inter-net), the Internet is built from tens of thousands of individual networks, known as Autonomous Systems (AS), connected together in a hierarchy. If you go to An Atlas of Cyberspace or The Internet Mapping Project you can see this drawn in quite an artistic way. The hierarchy is made up of major tier-1 carriers, such as Level3, which provide the intercontinental backbones and connect to regional or national carriers, such as BT, which in turn connect to small local ISPs. Consumers and enterprises can connect to the cloud at any level of the hierarchy, depending on their scale and how deep their pockets are.
Each carrier uses standard IP routing protocols such as Open Shortest Path First (OSPF) internally within its own ‘domain’, and the Border Gateway Protocol (BGP) to interconnect domains, creating a highly resilient network. In fact, providers of geographic components of the Internet come and go frequently with little disruption to the Internet as a whole (though this is, of course, a pain if we happen to be one of their customers!).
So what could possibly be wrong with this? It comes down to a question of predictable end-to-end performance, or rather the lack of it. Think of one of the red dots in the picture above as your local broadband DSL provider (e.g. ZEN), connecting to your local incumbent carrier (the light blue diamond, e.g. BT), which provides the copper connection to your house or the fibre into your business premises and which in turn connects via one of the global carriers (the large blue dots, e.g. Level3) to the USA. In practice, you can end up transiting 60 to 800 separate routers and 40 different networks on the way to and from the web site you are accessing. This is shown below in the path from my PC to www.cisco.com on the West Coast of the USA, traced using Ping Plotter.
Getting back to technology, it may be that every one of these carriers has deployed MPLS within its own so-called private network (a bit of a misnomer, as it carries public traffic) that carries Internet traffic to and from its customers’ homes and business premises. This enables each carrier to manage Quality of Service better while traffic is on its network, but once packets leave that network, winding their way to and from the web site’s server, the carrier has no control over them at all. Although the Internet has bodies that look after things such as standards (IETF) and domain registration (ICANN), no one is in control of end-to-end performance.
If a particular path becomes congested for any reason, such as a cash-strapped carrier running under-powered routers, or a carrier lacking the bandwidth to support all the customers won by a successful advertising campaign, then packets are held in queues. Users may then experience unpredictable performance, high latency, lost packets when queues overflow, or even a dropped connection when a link times out.
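Why does congestion turn into delay so abruptly? A classic, heavily simplified way to see it is the M/M/1 queueing model, in which the mean time a packet spends at a router (waiting plus being transmitted) is 1/(μ − λ), where μ is the rate at which the link can send packets and λ the rate at which packets arrive. The sketch below assumes Poisson arrivals, a single output link, a hypothetical 2 Mbit/s access link and a fixed mean packet size of 500 bytes. Real Internet traffic is burstier than this, so treat the numbers as illustrative rather than predictive.

```python
import math


def mm1_mean_delay_ms(link_mbps: float, offered_mbps: float, mean_packet_bytes: int = 500) -> float:
    """Mean time (ms) a packet spends queued plus being transmitted on one link.

    Simplified M/M/1 model: Poisson arrivals, exponentially distributed service times.
    """
    bits_per_packet = mean_packet_bytes * 8
    mu = link_mbps * 1e6 / bits_per_packet      # packets per second the link can send
    lam = offered_mbps * 1e6 / bits_per_packet  # packets per second arriving
    if lam >= mu:
        return math.inf  # demand exceeds capacity: the queue grows without bound
    return 1000.0 / (mu - lam)


# A hypothetical 2 Mbit/s link at rising levels of offered load.
for load in (1.0, 1.5, 1.9, 1.98):
    print(f"{load / 2:.0%} utilised: {mm1_mean_delay_ms(2.0, load):6.1f} ms")
```

Under these assumptions the mean delay doubles between 50% and 75% utilisation and is about fifty times higher at 99% than at 50%. Real routers also have finite buffers, so once a queue fills, further packets are simply dropped, which is where much packet loss comes from.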
Business use of the Internet
Many companies use IPsec-based Internet VPNs to interconnect their office sites quite effectively, as they are a low-cost solution. However, for most larger enterprises the service they provide is too unpredictable and unreliable to replace a WAN.
Unpredictable performance may be acceptable for consumer web browsing and for what are known as ‘store and forward’ services such as email, but it is a real killer for real-time services and any other service that must have guaranteed, predictable end-to-end performance to function effectively. Here are some examples:
Real-time interactive services: Many of us have experienced the break-up of a Voice over IP call when using services such as Skype over the Internet. One minute the conversation is going well, and the next you experience weird distortion and ringing that make the person you are talking to unintelligible. This is caused by packet loss, and by the VoIP software trying to interpolate the gaps left by the lost packets. Would you be prepared to accept this if you were having a critical sales discussion with a potential customer or investor? Of course not. Would you ever accept this on a fixed landline? No. On the Public Switched Telephone Network, quality standards are in place to prevent it.
Ironically, we DO seem prepared to accept this on mobile or cell phones, where quality can sometimes be abysmal, because we can use the phone anywhere. More on this in a future post.
One other aspect that affects intelligibility and the natural back-and-forth of a voice call is delay. Once delay rises above about 180 ms it becomes very noticeable and starts to make conversations awkward. Combine this with packet loss and voice conversations become very difficult.
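Delay on a VoIP call is the sum of several components, and it is the total that has to stay under the roughly 180 ms mentioned above (the sketch treats that figure as a one-way budget, which is an assumption). The example below adds up a hypothetical budget: the route length, codec packetisation time, jitter buffer size and per-router delays are all illustrative values, not measurements. The one physical constant is that light in optical fibre travels at roughly 200,000 km/s, i.e. about 5 ms per 1,000 km.

```python
from dataclasses import dataclass, field

FIBRE_MS_PER_KM = 0.005      # ~200,000 km/s in fibre, i.e. ~5 ms per 1,000 km
NOTICEABLE_DELAY_MS = 180.0  # threshold quoted in the text, assumed to be one-way


@dataclass
class CallPath:
    route_km: float          # fibre route length, usually longer than great-circle distance
    packetisation_ms: float  # audio collected into each packet, e.g. 20 ms
    jitter_buffer_ms: float  # receiver playout buffer that smooths out delay variation
    per_hop_ms: list[float] = field(default_factory=list)  # queuing + serialisation per router

    def one_way_delay_ms(self) -> float:
        propagation = self.route_km * FIBRE_MS_PER_KM
        return (propagation + self.packetisation_ms
                + self.jitter_buffer_ms + sum(self.per_hop_ms))

    def is_comfortable(self) -> bool:
        return self.one_way_delay_ms() < NOTICEABLE_DELAY_MS


# Hypothetical UK to US West Coast call crossing 20 routers.
quiet = CallPath(10_000, 20, 40, per_hop_ms=[2.0] * 20)
# Same path, but three of the routers are congested.
busy = CallPath(10_000, 20, 40, per_hop_ms=[2.0] * 17 + [40.0] * 3)

for label, path in (("quiet", quiet), ("congested", busy)):
    print(f"{label}: {path.one_way_delay_ms():.0f} ms, comfortable={path.is_comfortable()}")
```

With these made-up figures the quiet path comes to 150 ms and stays inside the budget, while just three congested routers push the same call to 264 ms. Every carrier on the path consumes part of a budget that none of them owns.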
Video conferencing faces the same issue in a compounded form, as both voice and video are disrupted. Poor Internet performance killed many video conferencing start-ups in the late 1990s.
Streamed but non-interruptible services: The prime example is video streaming over the Internet, where packet loss can be catastrophic but delay is of less importance. Loss may be tolerable when watching short snippets such as the content on YouTube, but if you want to immerse yourself in a two-hour psychological thriller and ‘suspend disbelief’, then packet loss and drop-outs are a killer.
This concern has been raised by many people considering Joost, the new global video service recently announced by the founders of Skype. Is the quality of today’s Internet really adequate to support this type of service? I guess we will find out.
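The contrast between interactive voice and streamed video can be seen with a simple playout-buffer model. A receiver holds each packet for a fixed playout delay before playing it; a packet that never arrives, or arrives after its playout time, leaves a gap. The sketch below assumes the one-way delay of each packet is known (with None marking a lost packet) and that lost packets are not retransmitted; the packet trace and buffer sizes are hypothetical.

```python
from typing import Optional, Sequence


def unplayable(delays_ms: Sequence[Optional[float]], playout_delay_ms: float) -> int:
    """Count packets that leave a gap: lost (None) or arriving after their playout time."""
    return sum(1 for d in delays_ms if d is None or d > playout_delay_ms)


# The same hypothetical packet trace: one packet badly delayed, one lost.
trace = [60.0, 65.0, 250.0, None, 70.0, 62.0]

print("voice, 150 ms buffer:", unplayable(trace, 150.0))   # → 2 (late packet and lost packet)
print("video, 5 s buffer:   ", unplayable(trace, 5_000.0))  # → 1 (only the lost packet)
```

A streaming service can afford a buffer of several seconds, so delay hardly matters, but no buffer size recovers a packet that never arrives. A voice call cannot buffer for long without breaching its delay budget, so both late and lost packets hurt.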
Client-server, thin-client and web applications: Although client-server terminology seems a little dated and is being replaced by web-based services, as exemplified by salesforce.com, they all share the same need for high performance, wherever the application server resides.
If keystrokes get ‘lost’ or edited text ‘disappears’ due to packet loss, the service will soon become unusable and the company providing it will be buried under a snowstorm of complaints.
The main takeaway from the above is that Internet performance cannot really be relied upon for real-time or client-server services where a predictable service is mandatory. This is clearly the case for the majority of business services, where predictable, guaranteed end-to-end performance is an absolute requirement, covered by tight Service Level Agreements (SLAs) with network or service providers.
Carriers’ private IP networks
So how do carriers provide reliable IP data services to their business customers? They use MPLS to segment traffic on their networks into two strands:
(a) Label Switched Paths (LSPs) dedicated to Internet traffic, whether from their own customers or transiting their network. Carriers want to get this traffic off their network and pass the burden to another carrier as quickly as possible to reduce cost and risk. For traffic that does not originate from their own customers, they use a routing policy called hot potato routing to make sure this happens as quickly as possible.
(b) LSPs dedicated to VoIP, video and IP-VPN services sold with SLAs to their business customers. This traffic is kept on their own network for as long as possible, a policy called cold potato routing.
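At its simplest, the difference between the two policies is which cost a carrier minimises when choosing where traffic leaves its network. The sketch below uses a hypothetical carrier with three peering points and made-up costs. In real networks, hot-potato behaviour typically emerges from the IGP-cost tie-breaker in BGP best-path selection, and cold-potato routing is usually steered with BGP attributes such as MED or with local policy; this model deliberately collapses all of that into two numbers per exit.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Exit:
    name: str
    cost_on_our_network: int  # IGP cost from where the traffic entered to this exit
    cost_after_handoff: int   # rough cost from this exit to the destination on other networks


def hot_potato(exits: Iterable[Exit]) -> Exit:
    # Hand traffic to another carrier at the nearest exit: minimise our own carriage cost.
    return min(exits, key=lambda e: e.cost_on_our_network)


def cold_potato(exits: Iterable[Exit]) -> Exit:
    # Carry traffic ourselves as far as possible: exit as close to the destination as we can.
    return min(exits, key=lambda e: e.cost_after_handoff)


# Hypothetical carrier: traffic enters in London and is destined for San Jose.
exits = [
    Exit("London", cost_on_our_network=1, cost_after_handoff=90),
    Exit("New York", cost_on_our_network=40, cost_after_handoff=50),
    Exit("San Jose", cost_on_our_network=85, cost_after_handoff=2),
]
print(hot_potato(exits).name)   # → London
print(cold_potato(exits).name)  # → San Jose
```

Hot potato minimises what the carrier spends carrying someone else's traffic; cold potato spends more of the carrier's own capacity so that the SLA-backed traffic stays under its control for longest.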
In general, carriers can only provide IP performance guarantees for packets on their own network, i.e. only for users and business locations directly connected to that network. The reason is that carriers are generally neither able nor willing to physically interconnect their networks with other carriers to provide seamless, performance-guaranteed services that straddle multiple networks for their customers. In general, each carrier stands alone in isolation. Why should this be so in the age of the ubiquitous Internet? Therein lies the big elephant in the room!
- Each carrier defines Class of Service differently, as IETF standards define the building blocks (DiffServ code points and per-hop behaviours) but do not mandate a common set of service classes, so it is not easy to interconnect networks and maintain the same level of priority without translation. (The easy way around this is not to translate at all, but simply to place two routers back to back, each terminating one carrier’s IP-VPN. Several carriers and network integrators use this approach.)
- Each carrier has adopted an entirely different set of Operations Support Systems (OSS) tools, which makes interconnecting with other carriers to exchange performance data exceedingly challenging. OSS platforms are usually a mixture of internally developed proprietary tools and bought-in third-party products. (This is a major issue that affects all IP services, and one not seen in the PSTN world, where interconnect standards defined by the ITU exist.)
- Carriers are generally unwilling to provide SLAs on behalf of other carriers.
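To make the Class of Service point above concrete: a packet carries its class as a 6-bit Differentiated Services Code Point (DSCP) in the IP header, and where two carriers with different class schemes interconnect, that marking has to be rewritten. The sketch below assumes two hypothetical carriers, one with four classes and one with three. The standard code points (EF = 46, AF41 = 34, AF31 = 26, AF21 = 18, best effort = 0) are real, but which class each carrier uses for what is an illustrative assumption.

```python
# Standard DSCP code points; the per-carrier usage below is hypothetical.
EF, AF41, AF31, AF21, BEST_EFFORT = 46, 34, 26, 18, 0

# Hypothetical carrier A: voice=EF, video=AF41, critical data=AF31, the rest best effort.
# Hypothetical carrier B offers only three classes: real time=EF, business=AF21, best effort.
A_TO_B = {
    EF: EF,      # voice keeps its real-time treatment
    AF41: AF21,  # no video class on B, so video shares the business class
    AF31: AF21,  # critical data also lands in the business class
    BEST_EFFORT: BEST_EFFORT,
}


def translate_dscp(dscp: int, mapping: dict[int, int]) -> int:
    """Map a DSCP across the interconnect; unknown markings fall back to best effort."""
    if not 0 <= dscp <= 63:
        raise ValueError(f"DSCP must be 0..63, got {dscp}")
    return mapping.get(dscp, BEST_EFFORT)


def remark_tos_byte(tos: int, mapping: dict[int, int]) -> int:
    """Rewrite the DSCP (top 6 bits) of an IPv4 ToS / IPv6 Traffic Class byte, keeping the 2 ECN bits."""
    dscp, ecn = tos >> 2, tos & 0b11
    return (translate_dscp(dscp, mapping) << 2) | ecn


print(remark_tos_byte(AF41 << 2, A_TO_B) >> 2)  # → 18 (video re-marked as AF21)
```

The mapping is lossy: once video and critical data share carrier B's business class, B can no longer tell them apart, which is one reason the priority a customer paid for on one network is hard to preserve on the next.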
Note: I would like to make it clear that this is not always the case. Several large carriers and integrators, such as Cable and Wireless, Global Crossing and Vanco (with its MPLS Matrix), to name but three, have proactively pursued a strategy of IP-VPN interconnect with partners to better support their customers or extend the reach of their networks.
So, if you are a small to medium-sized UK enterprise (SME) and can connect all your offices and home workers to your national incumbent carrier, that carrier would be prepared to provide you with an SLA. If performance dropped below the level specified in the SLA, you would be able to claim compensation from that provider.
However, if you are a multinational enterprise with sites in many countries, you need to work with many carriers. In general, there is no way today that any one of those carriers could provide you with end-to-end performance guarantees (there is no such thing as a global carrier). So what are the alternatives for managing multi-carrier enterprise VoIP services, MPLS IP-VPNs or layer-2 VPNs in 2007?
(1) As a regional enterprise, manage the integration of multiple carriers yourself.
(2) Go to a systems integrator, outsourcer or carrier who will act as a prime contractor and hold a single SLA with you, backed up by separate SLAs with each carrier that provides a geographic component of the WAN.
To address IP QoS issues on their own networks, the majority of carriers around the world have now rolled out MPLS, and most offer SLA-based IP-VPNs to their business customers as one way to build WANs. IP-VPNs are generally perceived as a lower-cost solution than legacy services such as frame relay. Of course, enterprises can avoid relying on carriers altogether by simply buying bandwidth in the form of TDM E1 circuits and managing the WAN IP layer themselves, as per (1) above.
There is still no universal business Internet that mirrors the public Internet, so if a company requires end-to-end performance guarantees to support its client-server database or VoIP service, it will need to manage multiple carriers itself or go to an integrator willing to act as a prime contractor, as described above.
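Part of what makes the prime-contractor role hard is simple arithmetic: when a path crosses several carriers in series, their individual SLA figures compound. A minimal sketch, assuming the segments are in series with no redundancy and that outages and packet losses on different segments are independent: availabilities multiply, latencies add, and the fraction of packets delivered is the product of each segment's delivery rate. The carriers and figures are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class SegmentSla:
    carrier: str
    availability: float  # fraction of time the segment is up, e.g. 0.999
    latency_ms: float    # one-way latency commitment for this segment
    loss: float          # fraction of packets lost, e.g. 0.001


def end_to_end(segments: list[SegmentSla]) -> tuple[float, float, float]:
    """End-to-end figures for segments in series, assuming independent failures and losses."""
    availability = prod(s.availability for s in segments)
    latency_ms = sum(s.latency_ms for s in segments)
    loss = 1 - prod(1 - s.loss for s in segments)
    return availability, latency_ms, loss


# Hypothetical path: UK carrier -> transatlantic carrier -> US carrier.
path = [
    SegmentSla("UK carrier", 0.999, 15.0, 0.001),
    SegmentSla("Transatlantic carrier", 0.999, 70.0, 0.001),
    SegmentSla("US carrier", 0.999, 30.0, 0.001),
]
availability, latency_ms, loss = end_to_end(path)
print(f"availability {availability:.3%}, latency {latency_ms:.0f} ms, loss {loss:.3%}")
```

Even if every carrier meets a 99.9% availability target, the end-to-end figure is only about 99.7%, and loss roughly triples. The interconnect points themselves, not modelled here, add further delay and failure modes, so a single end-to-end SLA has to be looser than any of the SLAs backing it.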
If you are not a network person, this may all seem quite strange, and you may wonder why the global telecommunications industry has not got its act together to better support its enterprise customers, who all need this capability and spend hundreds of billions of dollars a year. It would seem to be one of the biggest, if not the biggest, commercial opportunities in telecoms today.
One technology company that is addressing this end-to-end solution delivery issue is Nexagent, and I will talk about them in a post shortly (Note: now posted as Nexagent, an enigmatic company?). Note: I should declare a personal interest in Nexagent, as a co-founder in 2000.