There is no way that we could hope to cover IP completely in a book about firewall diagnostics. It's a huge subject and has been covered in absolutely fantastic detail by the late, great W. Richard Stevens in TCP/IP Illustrated, Volume 1: The Protocols and in TCP/IP Illustrated, Volume 2: The Implementation, the latter coauthored by Gary R. Wright. A basic understanding of TCP/IP and how it fits with the concept of firewalling, however, is within the scope of this book. If you want to know more about TCP/IP, please check out the fine books mentioned here; if you already have a firm grasp of this subject, you can skip over some of this material. We provide it here as a brief introduction for the reader.

Understanding the Internet Protocol (IP)

IP is a Layer 3 protocol (Network Layer) and is documented fully in RFC 791. IP traffic contains routing and address information and is the medium by which packets traverse the Internet, hence the rather obvious name, "Internet Protocol." This, combined with the Transmission Control Protocol (TCP), forms the backbone of how most services on the Internet function. Aside from IP's routing duties, this is also where the fragmentation and reassembly of packets (a packet being a single unit of IP, TCP, ICMP, or UDP data) occurs when IP traffic moves through devices that dictate the Maximum Transmission Unit (MTU) size. Briefly, the MTU defines how big a packet a particular device can handle. With fast connections, this value isn't nearly as important because it's almost always set to the maximum size possible. MTU is mostly an issue with slower connections, such as modems, where the maximum packet size is much smaller. MTUs become important when packets are lost and have to be retransmitted: if a packet is very large and only a tiny portion of it was really lost, the entire packet must be retransmitted. The idea behind smaller MTUs is to help prevent resending too much data.

Figure 5.2. The Internet Protocol Packet.
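To make the MTU arithmetic concrete, here is a small sketch (our illustration, not from any particular implementation) of how an IPv4 payload would be carved into fragments for a given MTU. Every fragment except the last must carry a payload that is a multiple of 8 bytes, because the Fragment Offset field counts in 8-byte units.

```python
def fragment_sizes(payload_len, mtu, header_len=20):
    """Split an IP payload into fragment-sized chunks for a given MTU.

    Each fragment carries at most (mtu - header_len) bytes of payload,
    rounded down to a multiple of 8 (the Fragment Offset field counts
    in 8-byte units); the final fragment carries the remainder.
    """
    per_frag = (mtu - header_len) // 8 * 8  # largest 8-byte-aligned chunk
    sizes = []
    while payload_len > per_frag:
        sizes.append(per_frag)
        payload_len -= per_frag
    sizes.append(payload_len)
    return sizes

# A 4000-byte payload over an Ethernet-sized MTU of 1500:
print(fragment_sizes(4000, 1500))   # [1480, 1480, 1040]
```

Notice how a smaller MTU produces more, smaller fragments, which is exactly why losing one fragment costs less to retransmit on a slow link.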
What an IP Packet Looks Like

There are 14 fields in an IP packet. They are (from left to right):
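The fixed portion of the IPv4 header can be pulled apart in a few lines with Python's struct module. This is a minimal sketch showing a handful of the fields named in RFC 791; the sample addresses are hypothetical.

```python
import struct
import socket

def parse_ipv4_header(raw):
    """Unpack the fixed 20-byte portion of an IPv4 header (RFC 791)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack('!BBHHHBBH4s4s', raw[:20])
    return {
        'version': ver_ihl >> 4,
        'ihl': ver_ihl & 0x0F,          # header length in 32-bit words
        'total_length': total_len,
        'ttl': ttl,
        'protocol': proto,              # 1=ICMP, 6=TCP, 17=UDP
        'src': socket.inet_ntoa(src),
        'dst': socket.inet_ntoa(dst),
    }

# Hand-built sample header: version 4, IHL 5, TTL 64, protocol 6 (TCP)
hdr = struct.pack('!BBHHHBBH4s4s', 0x45, 0, 40, 1, 0, 64, 6, 0,
                  socket.inet_aton('10.0.0.1'), socket.inet_aton('10.0.0.2'))
print(parse_ipv4_header(hdr))
```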
A word of caution: all of the headers in an IP packet can, more or less, be manipulated by any party between two points. This means that you shouldn't arbitrarily trust the fields in an IP packet. You can encrypt the traffic between two points, but if those two points are using IP, the headers on those packets can still be changed by an attacker. The bottom line is not to trust the headers of a packet by themselves.

Understanding ICMP

The Internet Control Message Protocol (ICMP) is a quasi-Layer 3 (Network) protocol, documented in RFC 792 and RFC 1700. ICMP is used to pass IP packet error and processing information between IP devices; we call this information an "ICMP Message." A common ICMP packet is the "ping" packet, but there are many others. Some of them are no longer necessary, and some are critical to a healthy network. ICMP packets are somewhat unique in the world of TCP/IP. Because ICMP cannot live in Layer 3 all by itself, it's actually contained inside of an IP packet as documented here, much like the Layer 4 TCP and UDP packets documented in the following sections. Also unique to the ICMP packet is that the data portion of the packet does not contain the "ICMP Message." Rather, the "Message" component is in the Type and Code fields, as documented in Table 5.1.
Figure 5.3. The ICMP Packet.
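As a sketch of how the Type and Code fields sit at the front of the message, here is a hypothetical example that builds a ping packet (an Echo Request: Type 8, Code 0) using the standard Internet checksum from RFC 1071.

```python
import struct

def internet_checksum(data):
    """RFC 1071 checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b'\x00'
    total = sum(struct.unpack('!%dH' % (len(data) // 2), data))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident, seq, payload=b''):
    """ICMP Echo Request: Type=8, Code=0, checksum over the whole message."""
    header = struct.pack('!BBHHH', 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack('!BBHHH', 8, 0, csum, ident, seq) + payload

pkt = build_echo_request(ident=1, seq=1)
print(pkt.hex())
```

A handy property of this checksum: recomputing it over a correctly checksummed message yields zero, which is exactly how a receiver validates an incoming ICMP packet.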
What an ICMP Message Looks Like

The two fields of real importance with ICMP are the Type and Code fields. There are technically 255 possible ICMP message types, although presently only 34 are used, and each of those 34 types may have several sub-types, or codes. To put it succinctly, there are many ICMP messages out there. As stated previously, only some of them are necessary to keep your network healthy; the others you can, and probably should, learn to live without.

Understanding TCP

TCP stands for the Transmission Control Protocol and is situated at Layer 4 of the OSI model. It is a connection-oriented service and handles the transfer of data, flow control, reliability, and multiplexing all in one protocol. TCP is a very robust protocol and can handle most error conditions with automated recovery. TCP is a good protocol for higher-level protocols, such as HTTP and SMTP, which do not have built-in error recovery and flow control capabilities. But TCP is not ideal for protocols that handle this internally, such as VPN protocols that tunnel TCP connections within themselves. With VPNs, UDP is usually the better protocol to use; we discuss this in more detail later in this chapter when we cover UDP. There are 13 fields in a TCP packet. They are (from left to right):
TCP provides many additional capabilities, such as reliability, efficient and varied methods of flow control (Linux has the capability to modify this further through /proc, as detailed in later chapters), full-duplex communication, multiplexing, and streaming. With streaming, TCP can deliver bytes in an unstructured form identified by sequence numbers. This is used when an application does not or cannot break data into blocks that fit efficiently on the network; TCP is left to work this out for the application. In these cases, TCP groups the packets together into what are called sequences, based on internally determined maximum sizes derived from network conditions and the way the system is configured.

Reliability

TCP provides reliability to data through full-duplex connections. Unlike UDP, TCP packets always receive acknowledgment from the receiver, or they are sent again until acknowledged. TCP accomplishes this by performing what is called "forward acknowledgment," adding in an Acknowledgment number for the next byte of data the source expects the recipient to receive. A specified amount of time is tracked by the sender, and if the block of data is not acknowledged within that period of time, the packet is sent again to the receiver. This makes it possible for TCP to respond reliably to lost or damaged packets. The receiver may also request that a packet be resent.

Full Duplex and Multiplexing

TCP is truly full duplex, which means that the protocol can be used to both send and receive data at the same time. TCP also can be used to perform multiplexing, which means that many simultaneous conversations can be carried over one connection. This is accomplished via upper layers of the OSI model through higher-level protocols, such as SMTP and HTTP.

Flow Control

Flow control is another feature of TCP. Flow control is a technique used to stop the sender of data from sending more data than the receiver can accept.
This is achieved through the use of sequence numbers, where the receiver sends back the highest sequence number it can receive without exceeding its receive buffers. The sender is supposed to transmit packets up to, but not exceeding, that sequence number, and then wait until the receiver sends another ACK packet with a higher sequence number. There are a number of algorithms to achieve this, many of which are configurable through /proc under Linux; more detail on this is provided in later chapters.

Congestion Control

TCP also provides for what is called congestion control by adapting to network conditions, slowing down or speeding up the rate at which packets are sent. This is to respond to the "common" reality of IP networks, where guaranteed flow rates are difficult to accomplish and packets are lost due to overloaded routers, switches, hubs, hosts, and any other device in the path.

How TCP Connections Are Established

TCP, unlike UDP, is designed to operate via fully established connections. It is not a broadcast protocol, but a true three-way communications protocol. Nothing is assumed with TCP until both the receiver and sender acknowledge the communication. To do this, TCP uses a three-way handshake to establish a new connection and to synchronize the hosts on both ends to each other's capability to receive and send data. If you recall the sequence numbers discussed earlier, this is how TCP accomplishes flow control and readies both sides to send and receive data at the established rate. This helps to prevent unnecessary packet retransmissions. As each host starts the process, it is supposed to pick a sequence number randomly. It's worth noting here that not all random number generators are the same; some OSs, including some Linux kernels, may not pick sequence numbers that are truly random.
This may seem unimportant, but there are a number of attacks on TCP streams that are accomplished by predicting the next sequence number and spoofing the next packet in a stream to "hijack" the stream. A good random number generator for your sequence numbers helps to protect against these sorts of attacks and also makes certain types of DoS and DDoS bounce attacks against your systems more difficult. Thankfully, there are patches for the Linux kernel that make the sequence numbers and other random numbers used by the IP stack truly random. After the hosts have picked random numbers to use as their sequence numbers, the host that is initiating the connection, Bob, sends a TCP packet to the receiver, Alice, with the initial sequence number, XY, and the SYN flag set in the header of the packet. Alice, the receiver, then processes the packet, records the sequence number XY, and sends back a packet to Bob with the acknowledgment field set to XY+1; that is, the receiver increments the original sequence number by 1. Alice also adds her own initial sequence number, Z, to the packet and sets the ACK flag in the headers of the TCP packet. Each host then increments these sequence numbers by the number of bytes of data it has successfully received from the other host. This acts as a mechanism for each host to limit the amount of data sent in each packet to only the data the other party needs and can receive.

Figure 5.4. The TCP packet.
Figure 5.5. Three-Way Handshake (TWH).
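The sequence-number bookkeeping of the handshake can be sketched as a simple simulation (this is an illustration of the field values, not real TCP; the kernel performs the actual exchange for you when an application calls connect()):

```python
def three_way_handshake(client_isn, server_isn):
    """Model the seq/ack fields exchanged in a TCP three-way handshake."""
    # 1. Client sends SYN carrying its initial sequence number (ISN).
    syn = {'flags': 'SYN', 'seq': client_isn}
    # 2. Server acknowledges the client's ISN + 1 and supplies its own ISN.
    syn_ack = {'flags': 'SYN+ACK', 'seq': server_isn, 'ack': client_isn + 1}
    # 3. Client acknowledges the server's ISN + 1; the connection is established.
    ack = {'flags': 'ACK', 'seq': client_isn + 1, 'ack': server_isn + 1}
    return syn, syn_ack, ack

for pkt in three_way_handshake(client_isn=1000, server_isn=5000):
    print(pkt)
```

From this point on, each side advances its acknowledgment number by the number of bytes it has successfully received, which is exactly the mechanism described above.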
How TCP Connections Are Closed

TCP is equally as methodical about closing down connections as it is about creating them. TCP sessions can be closed in two ways. The first is very similar to the way in which a connection is established, via a four-way handshake referred to as a "close." The second method is called a TCP abort and uses a special packet flag called the reset, or RST, flag. A close request using the four-way handshake method starts with the party that wants to close the connection sending a FIN+ACK packet to the other party. This is the most graceful manner of closing a session and does not represent any sort of error state on the requester's end. We point this out because a TCP abort, the second method, is normally invoked when one of the parties experiences an error and wishes to close the connection in an error state. We have seen some vendors use the RST method a little too prodigiously, which causes some high-level applications that depend on this "niceness" to infer an error state from the state of the connection. If the close method is used, many programs assume the connection was successful; if the abort method is used, they infer, rightly in fact, that something went wrong. One fascinating example of this occurs with some high-level web applications that look for an error state from the TCP connect call; when a TCP abort occurs, the web application assumes the entire HTTP connection failed and will simply retry the connection. We found this problem at one large government customer of ours. They had an XML/HTTP application designed to send messages to another host running a web server. The client would send the data via an HTTP POST, the server would acknowledge the successful receipt of the data way up at Layer 7, and meanwhile, down at Layer 4, a load-balancing switch killed the session partway through the close process with an RST packet. The client inferred, incorrectly, that the server had not received the data correctly.
Technically speaking, this was one correct manner of interpreting the TCP ABORT call. Regardless, our client's application assumed it needed to resend all the data, because everything it got back from the server saying that the data was successfully received was discarded due to the error state created by the TCP ABORT call. In fairness, the program on the client could have been better written to take these exceptions into account, perhaps by simply looking at the HTTP data and ignoring the TCP error state. Nevertheless, the problem was so low in the OSI model that our customer spent months trying to figure out why their application kept randomly attempting to resend the same messages over and over again when the server had already acknowledged them.

TCP CLOSE

The TCP CLOSE is accomplished, as we have already discussed, via a four-way handshake. Unlike establishing a connection, closing a session requires both hosts to agree to the request; one host cannot arbitrarily close the connection via this method. The process starts with the party that wishes to close the connection sending a FIN packet. The other host, if it agrees to close the session, sends back a FIN+ACK packet. The other host also must send its own FIN packet to the first host, which must likewise send back a FIN+ACK packet. The process is basically a complete mirror on both ends. This is done so that both sides can empty their buffers of any remaining data. Until the final FIN is sent, data can still flow from the party that did not initiate the CLOSE request. This means that a connection can be half closed, meaning that one side has stopped sending data and is waiting for the other side to stop, while the other side is still sending data.

TCP ABORT

A TCP ABORT, or RST packet, is sent normally when data has been lost due to an unrecoverable error. This method of closing a TCP session can be used arbitrarily to shut a connection down; both hosts do not have to agree to close the session.

Figure 5.6. Three-way teardown.
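The difference between a graceful CLOSE and an RST abort is easy to observe with plain sockets on the loopback interface. The sketch below assumes typical Linux behavior: setting SO_LINGER to (on, 0 seconds) is a well-known way to make close() send an RST instead of a FIN, and the peer's recv() then fails with a connection-reset error rather than returning a clean end-of-file.

```python
import socket
import struct

def peer_sees(abort):
    """Open a loopback connection, close one end, report what the peer sees.

    A normal close() sends FIN: the peer's recv() returns b'' (clean EOF).
    With SO_LINGER set to (on, 0 seconds), close() sends RST instead, and
    the peer's recv() raises ConnectionResetError.
    """
    srv = socket.socket()
    srv.bind(('127.0.0.1', 0))
    srv.listen(1)
    cli = socket.socket()
    cli.connect(srv.getsockname())
    conn, _ = srv.accept()
    if abort:
        cli.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                       struct.pack('ii', 1, 0))   # linger on, 0s: RST on close
    cli.close()
    try:
        data = conn.recv(1)
        result = 'clean EOF' if data == b'' else 'data'
    except ConnectionResetError:
        result = 'connection reset'
    finally:
        conn.close()
        srv.close()
    return result

print(peer_sees(abort=False))   # typically 'clean EOF'
print(peer_sees(abort=True))    # typically 'connection reset'
```

This is the same distinction the web application in the story above tripped over: it treated the reset path as a total failure even though the data had already been delivered at Layer 7.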
There is much more to TCP than we have covered here, so don't consider this a complete explanation. For instance, we haven't covered TCP windows or the various methods used to recover from packet loss. Other books cover those issues in wonderful detail; our objective is to briefly cover those aspects of TCP that are most important to troubleshooting firewalls. We also will assume, in later portions of the book, that the reader has a more in-depth understanding of TCP if the subject matter requires it. Where this assumption exists, we will point the reader to other books on the subject.

Understanding UDP

User Datagram Protocol (UDP) is the other "big" protocol used on the Internet. Unlike TCP, it is entirely connectionless; like TCP, it is a transport layer protocol (Layer 4) and part of the Internet Protocol suite. Also unlike TCP, UDP has no inherent flow control, error recovery, or reliability capabilities. If a UDP packet is lost, the receiving host has no way of knowing or reporting this as part of the native functionality of UDP. Because of this, some applications that use UDP have implemented their own internal functions to compensate. This is not to say that UDP is a bad protocol, far from it. Sometimes a developer does not need the additional functionality of TCP, or the flow control characteristics of TCP may cause problems, as with VPNs. There are advantages to UDP because of its simpler design, as already indicated; another is that because UDP packets tend to be smaller, they use less bandwidth. This can be necessary for protocols that consume large amounts of bandwidth, such as file transfer protocols, VPNs, or voice over IP. UDP packets contain five fields, the first four of which make up the UDP header: source port, destination port, length, and checksum.
The body of the message, also known as its data or payload field, contains the actual message or data being sent via the UDP protocol. The checksum field is optional and can be used to provide some integrity check on the UDP header and data fields, but is not required. Figure 5.7. The UDP packet.
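Because the header is only four fixed-size fields, building and parsing a UDP datagram is trivial. This sketch leaves the checksum at zero, which RFC 768 permits over IPv4 to mean "no checksum"; the port numbers are hypothetical.

```python
import struct

def build_udp(src_port, dst_port, payload):
    """Prepend the 8-byte UDP header; the length field covers header plus payload."""
    length = 8 + len(payload)
    return struct.pack('!HHHH', src_port, dst_port, length, 0) + payload

def parse_udp(datagram):
    """Split a UDP datagram back into its header fields and payload."""
    src, dst, length, checksum = struct.unpack('!HHHH', datagram[:8])
    return {'src_port': src, 'dst_port': dst, 'length': length,
            'checksum': checksum, 'payload': datagram[8:]}

dgram = build_udp(53000, 53, b'query')
print(parse_udp(dgram))
```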
Troubleshooting with this Perspective in Mind

Aside from the obvious benefits of using this bottom-up methodology, we have found that when you don't really know where to start, looking at the problem through this lens can be very helpful in quickly ruling out elements that may not be causing the problem. It's also amazing how often these simple problems end up being the cause of what appears to be a complex problem. Following are some undoubtedly oversimplified but really common problems we have seen happen far too often, to people who should know better by now, to help illustrate the things to look for at these layers of the OSI model. The point being that sometimes you're just over-thinking the problem. Also note how the "quick checks" get more complicated the farther down the list you go. Start with the simple things first! The following one-liners on each layer help illustrate the point.

Layer 1: Test To Make Sure You Have Physical Connectivity
This is one of the simpler things to test but often goes overlooked. What you want to look for is not just that you have a link light, which is also important, but that the cable is properly terminated, that it's in spec for the connection it's being used for (CAT3 when you need CAT6), and that there are no other physical problems. The following is a general list of things to look for; the key thing to keep in mind is that you want to rule out physical connectivity issues before you move on to more difficult problems.
Layer 2: Test Your Driver
Layer 3: Test IP Layer
Layer 4: Test the TCP Layer
Layer 5: Test the Session Layer
Layer 6: Test the Presentation Layer
Layer 7: Test the Application Layer
The lesson to take away from this approach is to conclusively eliminate dependencies. This isn't a complete list of things to check; it's just a set of examples to help you remember the layers and what to look for as you work through them. The chapters in Section 3 of the book have more detailed lists of items to test at each layer for specific problems, but you may determine that there are others specific to your implementation that we do not cover. Remember, if you cannot eliminate a layer and must move on to another layer, keep in mind the unresolved problems from the previous layers; it could be a more complex problem. The intent is to eliminate variables to make it possible to work on increasingly complex layers. As you move up the OSI model, the number of dependencies increases dramatically; if you don't rule out a layer, you're making it that much harder to diagnose the next layer. Repeat this process and live by the old adage, "keep it simple," when you're troubleshooting, and you'll save yourself a lot of time by reducing the variables in your problem.
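The bottom-up methodology boils down to a trivial loop: run the cheapest check first and stop at the first layer that fails. The sketch below is purely illustrative; the layer checks are hypothetical placeholders you would replace with real tests (link state, driver status, ping, and so on).

```python
def diagnose(checks):
    """Run (layer, check) pairs bottom-up; return the first failing layer.

    Each check is a zero-argument callable returning True on success.
    Returns None if every layer passes.
    """
    for layer, check in checks:
        if not check():
            return layer
    return None

# Hypothetical run: Layers 1 and 2 pass, Layer 3 (IP) fails,
# so we never waste time testing the layers above it.
checks = [
    ('Layer 1: physical', lambda: True),
    ('Layer 2: driver',   lambda: True),
    ('Layer 3: IP',       lambda: False),
    ('Layer 4: TCP',      lambda: True),
]
print(diagnose(checks))   # Layer 3: IP
```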