Transaction Response Time: An Example of Dependence on Lower-Level Services

Transaction response time is the primary metric of application service quality as delivered to the end user. It's also an excellent example of the dependence of an application-level quality metric on lower-level services with their associated metrics. In addition, it shows the need for communications between applications design groups and the network services and operations groups. Because of its importance, and because it's such a good example, this section traces the dependencies of transaction response time through all the underlying services and their relevant metrics. (Chapters 9 and 10 present additional details about metrics for the web-server systems and transport infrastructures, respectively.)

You can view a transaction as having several time components. These are the serialization delay, queuing delays in transmission, propagation (transmission) delay, and the processing delays in both the network (modem, switching equipment, and so on) and the web-server system. There are also delays, commonly called think time, that are associated with external user activities. They include reading the delivered content, thinking, and talking on the telephoneall of which are not considered here but which will be important in load testing. (Note that there is a strategy in user interface design that can accommodate habits of user perception and influence think time by loading different components at different speeds, organizing the presentation, and using other tactics that create perceptions of positive or negative performance variations.)

In addition, each data block has additional overhead added to it before transmission. That overhead, in the form of headers and trailers, is needed by the lower infrastructure layers as they process the data blocks.

Specifics about different types of delays are discussed in the following subsections; serialization delay and propagation delay are shown in Figure 8-1.

Figure 8-1. Serialization and Propagation Delays

Serialization Delay

Serialization delay is caused by the process of converting a byte or word in the computer's memory to or from a serial string of bits on the communications line. Serialization causes delays in most routers and, of course, at the source and destination. The time needed for serialization is the time needed to write bits on to or off of the communications line; it's controlled by the line speed. For example, 1500 bytes requires 8 milliseconds (ms) to serialize at 1.5 Mbps and 300 ms to serialize at 40 kbps. The added header and trailer overhead increases serialization delay because of the time needed to write and read those bytes.

Decreasing overhead by fitting more data into each packet decreases download time by decreasing serialization delay. However, most systems that run over the public Internet use either 1460 bytes per packet (for high-speed connections) or 576 bytes per packet (for dial-up connections); it's not easy to change those values. Changes are more easily made on private systems. (Longer packets increase jitter and the penalty for a packet error; but in a private, dedicated network where the number of router hops is constrained and transmission quality is more controllable, this might not be a major issue.)

It's important to note that serialization delay is greatly influenced by compression and encryption of content. For example, the standard home-user, dial-up modems perform hardware compression within the modem itself. For some data patterns, the modem compression ratio is 4:1 or better. If a data block has been compressed, it is shorter and therefore takes much less time to serialize. On the other hand, encrypted data cannot be compressed. (An encrypted string appears to be purely random and therefore uncompressible.) The result is that secure web pages are transmitted much more slowly on transmission links that have a large serialization delay. Such web pages should be compressed before encryption. (This is also a strong argument in favor of using true end-user measurements instead of computed or simulated end-user measurements. A true end-user measurement would include the effects of modem hardware compression; no commercial emulated measurements do that.)

Queuing Delay

Queuing delay is caused by waits in queues at origin, destination, and intermediate switching or routing nodes. Variations in this delay cause jitter. For streaming media applications, a dejitter buffer is required at the receiving end. (The delay in the dejitter buffer is typically one or two times the typical jitter.)

Propagation Delay

Propagation delay is governed by the laws of physics; propagation delay cannot be decreased by increasing the line speed. It is a distance-sensitive parameter. The ITU-T standard G.114 specifies 4 µs/km for radio, 5 µs/km for optical fiber, and 6 µs/km for submarine coaxial cables, including repeaters. Therefore, it will require 20 ms to travel the 4000 kilometers (km) from New York City to Los Angeles, or 100 ms to travel the 17,000 km from New York City to Melbourne, Australia. A signal beamed up to a geosynchronous satellite and down again, a distance of 72,000 km, takes approximately 280 ms.

An example may help illustrate the massive importance of propagation delay. Imagine a 1-MB file to be transmitted over three different connections:

A local high-speed Ethernet connection at approximately 100 Mbps
An Internet connection from New York to Los Angeles with an effective bandwidth of 1.5 Mbps and a one-way propagation delay of 75 ms (a typical coast-to-coast latency on the Internet)
An Internet connection from New York to Los Angeles with an effective bandwidth of 15 Mbps and a one-way propagation delay of 75 ms

There's some additional complexity that must be mentioned here: the Transmission Control Protocol (TCP) used by web browsers and for reliable file transmission over the Internet has a typical data block size of 1460 bytes and a window size of 17,520 bytes.

NOTE

The window is th e maximum amount of unacknowledged data that can be outstanding at any given time; the value given here is for the Windows 2000 operating system (OS). Thus, for a window size of 17,520 bytes, twelve 1460-byte data packets can be transmitted before an acknowledgment must be received. An acknowledgment is sent after each even-numbered packet is received.

Note also that TCP's slow start algorithm, which slowly increases transmission rate at the start of a file to avoid congestion, is being ignored for this example. (The large file size makes slow start less important here, but it can be important for short files.)

Now you can see the effects of propagation delay on performance:

For local, high-speed Ethernet, the propagation delay is so low that there's never a problem receiving the acknowledgments before 17,520 bytes have been serialized. The transmission of 1 MB proceeds at full line speed and is complete in approximately .1 seconds.
For the 1.5-Mbps Internet connection in our example, serialization of 17,520 bytes takes approximately 100 ms, and the propagation delay across the U.S. takes approximately 75 ms. The round trip is therefore approximately 150 ms, and the first acknowledgment is generated when the second packet has finished arriving, approximately 15 ms after the first packet begins to arrive. Therefore, as shown in Figure 8-2, there's a 65 ms pause to wait for an acknowledgment after each block of 17,520 bytes is transmitted. Transmission of the 1 MB in 58 separate blocks of 17,520 bytes each takes approximately 9.5 seconds.
Figure 8-2. Packet Transmission at 1.5 Mbps
For the 15-Mbps Internet connection, serialization of 17,520 bytes takes approximately 10 ms, and the propagation delay across the U.S. takes approximately 75 ms. The round trip is therefore approximately 150 ms, and the first acknowledgment is generated approximately 1.5 ms after the first packet begins to arrive at the receiver. Therefore, as shown in Figure 8-3, there's a 142 ms pause to wait for an acknowledgment after each block of 17,520 bytes is transmitted. Transmission of the 1 MB in 58 blocks of 17,520 bytes takes approximately 9 secondsalmost the same as for the 1.5-Mbps connection!
Figure 8-3. Packet Transmission at 15 Mbps

This situation is even more important for web transactions, where each web page may require many files, each with this type of sensitivity to transmission delays. The number of round trips required by a web page or a transaction is sometimes referred to as turns, and decreasing that number clearly decreases the sensitivity to transmission delay. Another way of decreasing download time is to decrease transmission delay itself, and Chapter 9 discusses how content distribution networks can be used to place some of the page's content closer (in terms of transmission delay) to the end user.

Processing Delay

Processing delay in the network includes modem delays (typically 40 ms or more for a pair of V.34 modems without compression and error correction functions, for example), router delays, and telephone network switching equipment delays.

Processing delay at the web server encompasses such functions as authentication, database access, use of supporting services, and calculation. Increasing the server performance, improving caching and load distribution, accelerating encryption speeds, adding servers, or adding disc capacity are all ways to reduce the processing time or time spent on a server.

The Need for Communications Among Design and Operations Groups

Figures 8-2 and 8-3 demonstrate that different combinations of speed, location, and size can accentuate different sensitivities. Careful analysis and discussions between the operations and applications teams can avoid some problems and build the foundation for applications that are actually network aware. Designers must understand the burdens they impose on the network and other resources when they add turns to a transaction, add objects to a page, or enrich the current content.

The network administrators must also increase their application awareness as well. They need to select window and packet sizes that reduce latency and improve efficiency wherever they can. They, too, must understand that bandwidth does not solve every application performance problem.

The placement of content is becoming a concern as pressures to deliver and use richer content at higher quality continue. The content delivery infrastructure discussed in Chapter 9 and the sensitivities just covered in this chapter indicate the trade-offs that must be considered in application design and operations.