Quality of Service

Streamlining network traffic and server resources is not just a good idea, it’s vital for conducting business in an online world. Make sure your network services are on the fast track, enabling business processes and users to perform to their full potential.

By Jon Tullett | Published March 15, 2001

Introduction

The ITU's original recommendation on quality of service described it as "the collective effect of service performance which determines the degree of satisfaction of a user of the service". To put that in perspective, this was back when the ITU was still the CCITT and focused squarely on telephony.

In a data world, that definition needs broadening, but the context remains the same. Everything on your network is a service, regardless of who is using it. A web browser is a service, even if it is fronting a user, just as a database automatically extracting information from a repository is also a service. Rather than a nebulous "degree of satisfaction", what matters is a set of operational parameters. That sounds impersonal and clinical, but it need not be. A web user's operational parameters, for example, are mostly about personal experience, but that tends to translate into speed and reliability, both of which are easily modelled and managed in infrastructural terms. So HTTP streams need to connect reliably (no socket errors), at sufficient speed (which will vary with the content of the site: heavy animations and graphics are fine, so long as you deliver them quickly enough), and without data loss (missing pages mean the site itself needs close management).

From that example, it is clear that the quality of the service is not a network issue, it is an overall IT operational issue. Certainly the speed of the network is one part of the equation, but the fastest network in the world won't make a slow server speed up.

When thinking about quality of service, the whole picture needs to be kept in focus. That can be a daunting task, but it can be taken a step at a time.

The first step is to draft a list of the services you want to support, if you don't have one already. Chances are there won't be very many: email, web access, perhaps file transfer, telnet for administering network devices, distributed databases and so on. That list of services is central to drafting QoS policies. It will change over time, but it is the foundation of the network structure.

Step two is to find out what is actually happening out there on the network, and how it conforms to that list of services.

Run packet sniffers for a while and take a look at the results. What you will probably see is a large amount of traffic that doesn't match the list of services you drafted, which means one of three things: you missed a few services, users are making use of non-core applications like instant messaging, or your network has been compromised and those are someone else's packets flying about. Ignoring the third option for now (last month's NME dealt with post-attack methods), the other two are easily accommodated.
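One way to make that comparison concrete is to export the sniffer's output as a simple flow summary and tally it against the service list. The sketch below is purely illustrative: the service names, port numbers and the three-column CSV layout are assumptions, not any standard export format.

```python
import csv
from collections import Counter

# Illustrative service list: names and ports are assumptions; adjust them
# to match your own drafted list of services.
SERVICES = {
    ("tcp", 25): "email (SMTP)",
    ("tcp", 110): "email (POP3)",
    ("tcp", 80): "web (HTTP)",
    ("tcp", 21): "file transfer (FTP)",
    ("tcp", 23): "telnet (device admin)",
}

def classify(flows_csv: str) -> Counter:
    """Tally bytes per service from a hypothetical sniffer export of
    rows shaped as: protocol,dest_port,bytes."""
    totals = Counter()
    with open(flows_csv, newline="") as fh:
        for proto, port, nbytes in csv.reader(fh):
            service = SERVICES.get((proto, int(port)), "unclassified")
            totals[service] += int(nbytes)
    return totals

if __name__ == "__main__":
    for service, nbytes in classify("flows.csv").most_common():
        print(f"{service:24s} {nbytes:>12d} bytes")
```

Anything landing in the "unclassified" bucket is either a service you forgot or traffic you need to decide about.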


Consult the users

If you missed relevant services (and you may need to consult users about this), just add them to the list. For instance, users may be making extensive use of NetBIOS to exchange files via Windows shared folders. If that does not violate your security or usage policies, add the NetBIOS protocols to your list and move on. If, on the other hand, it is not a service you intend to support, such as ICQ or IRC, you can either ban it or control it. The latter is preferable, since it provokes less of an uprising among users. Control is a fundamental part of QoS, and explaining it in the framework of a policy document will go a long way towards helping users understand network limitations.

Because foreign protocols like Napster and IRC can clog the network, enforcing limitations on them is an important part of maintaining QoS, which is why you need that list of services. Anything not on it should be low priority, if not outright forbidden during working hours (and even then, beware: massive Napster downloads scheduled for overnight, when your firewall is still permitting connections, will affect visitors to your website).
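Controlling rather than banning usually means rate-limiting. The token bucket is the classic mechanism most traffic shapers use; the sketch below shows only the idea in isolation. The rates, the burst size and the per-packet interface are hypothetical, and in practice the shaping would be done in your network gear, not in Python.

```python
import time

class TokenBucket:
    """Classic token-bucket shaper: traffic may use up to rate_bps on
    average, with bursts of at most burst_bytes."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0          # refill rate in bytes/second
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, packet_bytes: int) -> bool:
        """True if the packet may be sent now, False if it should be
        delayed or dropped."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False

# Example: cap non-core traffic (IRC, file sharing) at 64 kbit/s.
shaper = TokenBucket(rate_bps=64_000, burst_bytes=8_000)
print(shaper.allow(1_500))   # a first full-size packet passes
```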

Before dealing with the network directly, consider the other components of each service. If you trace a service from source to destination, several other entities will be involved in its fulfilment, and thus in its quality. A database request in an ERP system will traverse several servers, undergo data manipulation along the way, and eventually be presented at the client. Each of those intermediate steps adds latency, which needs to be minimised. For each of your core services, make sure you have a clear picture of the path it takes and of the factors that influence its performance. A sluggish server may need to be upgraded, or to have other processes offloaded onto another system. Many companies, for example, run their Internet gateway and email server on the same machine. Users polling the mail server for new mail every five minutes create regular peaks in demand on the server's resources, even if only for a few moments, and that may hurt other services requiring the attention of proxy/firewall processing at the gateway. The more servers a service passes through between source and destination, the more sensitive it is to such latency.
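One simple way to keep that end-to-end view is a latency budget per service: record the measured delay of each stage and check the total against a target. The stage names, figures and budget below are invented for illustration.

```python
def check_latency_budget(stages: dict, budget_ms: float) -> None:
    """Report each stage's share of the end-to-end delay and flag the worst."""
    total = sum(stages.values())
    worst = max(stages, key=stages.get)
    for name, ms in stages.items():
        print(f"{name:16s} {ms:6.1f} ms ({ms / total:5.1%})")
    status = "within" if total <= budget_ms else "OVER"
    print(f"total {total:.1f} ms - {status} the {budget_ms:.0f} ms budget; "
          f"slowest stage: {worst}")

# Hypothetical measurements for one ERP database request.
check_latency_budget(
    {"client LAN": 2.0, "app server": 35.0, "database": 120.0, "gateway": 8.0},
    budget_ms=150.0,
)
```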

That brings me to the most significant problem in the quality of service field: sharing. Most resources on your infrastructure are shared in some way. Servers share memory and disk space among applications, network capacity is shared among users and services, your Internet pipe is shared among everyone (internal and external), and so on. That is especially relevant in the IP world, because IP does not have very extensive mechanisms to control QoS. The Type of Service (TOS) field in the IP header allows some basic controls, but it was only recently that vendors started to make any use of it at all. Because IP was designed to be an extremely simple end-to-end mechanism with very little intelligence in the protocol, relying on network devices at each end to manage the connection, providing QoS mechanisms has involved a certain amount of work-around and outright fudging.
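The TOS byte is still accessible to applications: a QoS-aware program can mark its own packets by setting the IP_TOS socket option on platforms that expose it, and DiffServ later reused the same byte for its code points. A minimal sketch follows; the DSCP value for "expedited forwarding" is a standard one, but the destination address is a placeholder, and whether any router along the path honours the marking is entirely a matter of its configuration.

```python
import socket

# DSCP 46 ("expedited forwarding") occupies the top six bits of the old
# TOS byte, so the byte value to set is 46 << 2 = 0xB8.
EF_TOS = 46 << 2

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_TOS)

# Packets from this socket now carry the EF marking; honouring it is a
# policy decision made by the network devices, not a given.
sock.sendto(b"latency-sensitive payload", ("192.0.2.10", 5004))
```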


Out of control

Unfortunately, when you are doing business online or otherwise interacting with WAN clients, there is a segment of the network over which you have no control. As soon as the traffic exits your router, you lose control of it. That does not mean you can't take the latency of the public network into account, of course; checking your external services from remote locations should be a vital step in deploying e-business services, or even a website. Sure, that Flash animation looks gorgeous on your developer's machine, but how many visitors will wait ten minutes for it to download at 3kbps? Not many.
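A crude but useful check is to time the fetch of a page from a machine outside your own network and work out what the same payload means over a slow last-mile link. The URL and the link speed below are placeholders; substitute your own site and the connection speeds your visitors actually use.

```python
import time
import urllib.request

def probe(url: str, link_bps: int = 33_600) -> None:
    """Fetch a URL, report how long delivery took from here, and estimate
    the transfer time for the same payload over a slow last-mile link."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    elapsed = time.monotonic() - start
    slow_link_estimate = len(body) * 8 / link_bps
    print(f"{url}: {len(body)} bytes in {elapsed:.2f}s "
          f"(~{slow_link_estimate:.0f}s at {link_bps // 1000} kbit/s)")

# Placeholder URL; run this from outside your own network.
probe("http://www.example.com/")
```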

It is also possible to over-budget. Spending a fortune on prioritising web traffic so it leaves your network at blinding speed is a bit pointless when it will promptly slow to a crawl as it traverses international Internet connections.

The same testing methodology applies to external services as to internal ones: know the links in the chain, and compensate where possible to optimise both the level of service and the level of investment you are making.

Inside your network, today's networking gear is capable of all sorts of clever tricks to make sure your important services get the bandwidth they require, limitations of IP notwithstanding.

One of the problems is that few applications are QoS-aware, in direct conflict with the original design of IP. That design assumes applications will make QoS decisions and flag their packets accordingly, relying on the intelligence of the routers and switches to handle the rest.

Instead, because the applications are not doing so, the switches are forced to make the decisions based on central policies set by the network administrators. That is not necessarily a bad thing, as it reduces the opportunities for abuse.

After all, every application would like to be highest-priority, and every user thinks their usage is most critical.


Raw bandwidth is not the cure

It is becoming increasingly evident that throwing bandwidth at the problem is not a solution, even in the short term, unless there is clearly insufficient bandwidth to support the critical services.

Further over-provisioning of bandwidth is not a solution because of the basic nature of traffic, which is bursty and peaky. Over a long enough period (which may be as short as a few hours) network traffic will appear fairly consistent, but look closer and you will see sudden bursts as connections are established, sharp peaks as simultaneous connections coincide, packet storms as network problems occur and resolve, and so on.

Unless you install insane amounts of backbone infrastructure, demand will always exceed capacity at some point. The solution is simple enough: better manage the bandwidth you have. If traffic peaks, the obvious answer is for lower-priority traffic to be delayed or dropped to accommodate more important services.

Minimum performance is the key; each of your critical services will have a minimum level of performance it requires.

Those will vary wildly. Voice over IP requires a minimum of several kbps per stream and a guaranteed ceiling on per-packet latency. Email, on the other hand, requires a reliable server more than reliable bandwidth; a message can be deferred by several seconds at any stage, because only the ultimate delivery matters. So SMTP traffic should take a back seat to VoIP when capacity is approaching a dangerous threshold.
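Conceptually, managing a peak comes down to admitting traffic in priority order until the link is full and deferring or dropping the rest. The sketch below models that decision; the services, priorities, demands and link capacity are invented numbers, not recommendations.

```python
def admit(demands: list, capacity_kbps: int):
    """Admit flows in priority order (lower number = more important) until
    the link is full; everything else is deferred or dropped.

    demands: list of (service, priority, required_kbps) tuples.
    """
    admitted, deferred = [], []
    remaining = capacity_kbps
    for service, _prio, kbps in sorted(demands, key=lambda d: d[1]):
        if kbps <= remaining:
            admitted.append(service)
            remaining -= kbps
        else:
            deferred.append(service)
    return admitted, deferred

# Invented figures: a 2 Mbit/s link under peak demand.
ok, wait = admit(
    [("VoIP", 0, 512), ("ERP database", 1, 768), ("web", 2, 512),
     ("SMTP", 3, 384), ("file sharing", 4, 512)],
    capacity_kbps=2_048,
)
print("admitted:", ok)      # VoIP, ERP database, web
print("deferred:", wait)    # SMTP, file sharing
```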

There are numerous ways to accomplish this, using mechanisms like RSVP and DiffServ, but ultimately they all come down to the same thing: establishing threshold levels for services which must be met wherever possible.
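Whichever mechanism you choose, the policy itself usually reduces to a simple mapping from your service list to a handful of traffic classes, from which the device configuration is then derived. The code points below are standard DiffServ values, but which service lands in which class is a local policy decision; the assignments shown are only an example.

```python
# Standard DiffServ code points; assigning services to classes is a
# local policy choice and will differ per organisation.
DSCP = {"EF": 46, "AF41": 34, "AF21": 18, "BE": 0}

POLICY = {
    "voip":         "EF",    # strict latency floor: expedited forwarding
    "erp":          "AF41",  # business-critical, assured forwarding
    "web":          "AF21",
    "smtp":         "BE",    # delivery matters, delay does not
    "file sharing": "BE",
}

for service, cls in POLICY.items():
    print(f"{service:14s} -> {cls:5s} (DSCP {DSCP[cls]})")
```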

Deriving those levels from your list of core services, establishing their minimum requirements, and projecting how they will grow in the future should give a clearer picture of how the network should be architected to accommodate the essentials.

Once you know you are at the very least maintaining those levels, additional bandwidth and resources will improve on them, but the business can be assured of a baseline performance regardless.

Without these assurances, and most networks are designed without them, unexpected outages or bottlenecks can be extremely damaging to applications, e-business and overall network performance; and without a clear picture of the services in use, it is very difficult to predict usage, forestall outages and successfully serve the business.
