Chapter 1: SSL

This chapter gives an overview of the whole subject of SSL encryption.

For most of this chapter we'll assume, for simplicity, that you are the sender of messages and someone else is the recipient. In practice, of course, you're likely to be sending messages to each other, and so everything we describe is actually happening in both directions.

Throughout this explanation, we have taken care to say "your SSL software does so-and-so" not "you do so-and-so". This is partly to emphasize that all the detailed work is done by the software - as a user, or even as a programmer, of SSL-enabled software you generally just set options. It is also to emphasize that you must keep your computer secure, with such things as login passwords. If someone else can use your SSL software in your place, obviously all security is lost.

What is SSL?

SSL is a standard mechanism for sending and receiving electronic communications in encrypted form. SSL can be used wherever confidential information is to be sent between machines over a network, such as a company intranet or the Internet. For brevity, this book generally talks about the Internet.

SSL relies heavily on Public Key Infrastructures (PKIs). A PKI is a system for issuing digital certificates to people and organizations involved in electronic communications - such as Web sites - to confirm that they are who they claim to be.

SSL is widely used by organizations large and small. For example, if you use online banking, you may notice that the Web addresses - also known as uniform resource locators (URLs) - of many pages on your bank's Web site begin with https rather than http. This indicates that messages between your browser and the Web site are sent via a version of HTTP that uses SSL.

Any two software programs that communicate with one another and have SSL support can send their messages to one another in encrypted form, provided that they also have a trust relationship with a PKI.

SSL can provide the following benefits:

Confidentiality – keeping information private
Integrity – ensuring information has not been tampered with
Authentication – confirming the identity of the individual who sent the information
Non-repudiation – proving that the individual who sent the information did so

Some of these benefits, such as authentication, depend on the PKI that SSL uses.

SSL and TLS Terminology

SSL was originally defined by Netscape. The first publicly released version, SSL v2, came out in 1994. Several vendor-specific variants appeared, notably Private Communications Technology (PCT) from Microsoft. PCT was effectively a rival to SSL v3 when that came out in 1995. The Internet Engineering Task Force (IETF) took on the task of creating an industry-wide standard, and developed what was effectively SSL v3.1. However, they adopted for this the vendor-neutral name of Transport Layer Security (TLS). The terms TLS and SSL are sometimes used interchangeably. In this book we generally keep to the term SSL.

Encryption and Decryption

When software with SSL enabled sends a message, it sends it encrypted - that is, in a secret code. The software that receives the message, provided it too has SSL enabled and knows the secret code used, decrypts it - turns it back into the original uncoded message.

This gives you the first of the benefits of SSL mentioned above: confidentiality. If some unintended recipient - known as an eavesdropper, or attacker - picks up the message, they can't decipher it, because they don't know the code.

To encrypt a message, you need a key and an algorithm. A key is a number, and an algorithm is a method of combining this number with your message to create the encrypted message. To take a simple example, imagine you encrypt a message by replacing each letter by the letter two places farther along the alphabet, so that, for example, IBM becomes KDO. Here, "replace each letter by the letter n places farther along the alphabet" is your algorithm, and 2 is your key.

If you were to use a different key, you'd get a completely different code, although you're still using the same algorithm. So if for example n = 1, IBM becomes JCN.
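The shift scheme above is small enough to write out in full. The following Python sketch is for illustration only: the function names shift_encrypt and shift_decrypt are our own, and a cipher this simple offers no real security. It shows one algorithm producing different codes under different keys.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters A-Z


def shift_encrypt(message: str, key: int) -> str:
    """Replace each letter by the letter `key` places farther along the alphabet."""
    result = []
    for ch in message.upper():
        if ch in ALPHABET:
            # Wrap around from Z back to A.
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation unchanged
    return "".join(result)


def shift_decrypt(ciphertext: str, key: int) -> str:
    """Undo shift_encrypt by shifting the same number of places backward."""
    return shift_encrypt(ciphertext, -key)


print(shift_encrypt("IBM", 2))  # KDO
print(shift_encrypt("IBM", 1))  # JCN
print(shift_decrypt("KDO", 2))  # IBM
```

Notice that the recipient needs both the algorithm (the function) and the key (the number) to recover the message.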

Keys and Algorithms

Now it seems obvious that the intended recipient of your coded message needs to know the algorithm and the key, so as to decode the message; and equally it seems obvious that the algorithm and key must be kept secret from everyone else. And so you get the problem of how to let your intended recipient know them - after all, anyone who might be eavesdropping on your messages might be also eavesdropping when you send the algorithm and key.

Luckily, the problem is not as hard as it looks.

The algorithms used by SSL are so complex (often taking several pages of mathematics to describe), and the keys are so long (sometimes hundreds of digits long, when written in hex or decimal), that even if someone does know the algorithm, the chances of them guessing the key, or finding it by trial and error, are virtually nil. So, far from keeping algorithms secret, SSL uses well-known algorithms, which have been developed and published over the years by experts in cryptography.
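To give a feel for "virtually nil", here is a rough back-of-the-envelope calculation in Python. The key length of 128 bits and the guessing rate of one trillion keys per second are assumptions chosen for illustration, not figures taken from any particular product or attacker.

```python
KEY_BITS = 128                    # assumed key length, in bits
GUESSES_PER_SECOND = 10**12       # assumed attacker speed: one trillion keys per second
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

possible_keys = 2**KEY_BITS
# On average an attacker finds the key after trying half of the possibilities.
average_years = possible_keys // 2 // GUESSES_PER_SECOND // SECONDS_PER_YEAR

print(f"{possible_keys:.3e} possible keys")
print(f"about {average_years:.3e} years to find the key by trial and error")
```

Under these assumptions the search would take on the order of a billion billion years, which is why the algorithm itself can safely be public.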

When you are configuring software that uses SSL, it will generally ask you which of these well-known algorithms you want to use. Since the software does all the encryption/decryption for you, you don't need to understand these algorithms (you'd need considerable mathematical knowledge to do so), but you do need to know the names of the best-known ones and a little about their relative advantages and disadvantages. We'll cover this later, and mention some of the well-known ones as we go along.

You don't need to decide a key yourself, either. Instead, your SSL software uses a random number generator to create one.
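SSL software does this for you internally, but if you are curious what key generation looks like, here is a minimal sketch using Python's standard secrets module. The key length of 256 bits is our assumption for the example; the length actually used depends on the algorithm you choose.

```python
import secrets

# Ask the operating system's cryptographically secure random source
# for 32 random bytes, that is, a 256-bit key.
key = secrets.token_bytes(32)

print(len(key) * 8, "bits")
print(key.hex())  # the same key written in hex: 64 digits
```

Each run prints a different key, which is the point: nobody, including you, chooses it.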

Public Keys, Private Keys, and Secret Keys

As for keeping the key secret from all but your intended recipient, some of the well-known algorithms need two keys - and a message encrypted with one of the keys can only be decrypted with the other. So once you and your intended recipient have agreed which algorithm to use, your recipient's software generates a pair of keys, and sends you one (called the recipient's public key) while keeping the other one secret, even from you (this one is called the recipient's private key). Your software then encrypts and sends your message using the public key, and the recipient's software decrypts it using the private key.

So there's no danger of the private key being discovered by an eavesdropper - it never gets sent anywhere. It is known only to its owner.

As for the public key, there's no need to keep it secret. On the contrary, it can safely be made available to anyone who requests it. We'll come back later to how this is done.

An algorithm like this, using a private key and a public key, is called asymmetric. An algorithm that uses just one key, which has to be a secret between the sender and recipient, is called symmetric, and the key is called a secret key.

An algorithm that uses a private key and a public key is also called a public key algorithm. One of the best known ones is RSA, developed by Ron Rivest, Adi Shamir, and Leonard Adleman in 1977. An algorithm that uses just one key, a secret key, is called a secret key algorithm. One of the best known ones is Data Encryption Standard (DES), published as a US standard in 1977.
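The idea of a key pair can be sketched with "textbook" RSA in a few lines of Python (3.9 or later). This is a toy under loud assumptions: the primes 61 and 53 are tiny values chosen by us so the arithmetic is easy to follow, the message is a single small number, and real SSL software adds padding and uses vastly larger keys. Do not use this for anything but learning.

```python
# Toy RSA key pair built from two tiny primes. Real keys use primes
# hundreds of digits long; these values are for illustration only.
p, q = 61, 53
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent: (n, e) is the public key
d = pow(e, -1, phi)            # private exponent: (n, d) is the private key


def encrypt(message: int, public_key: tuple[int, int]) -> int:
    """Encrypt a number smaller than n using the public key."""
    modulus, exponent = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, private_key: tuple[int, int]) -> int:
    """Recover the original number using the private key."""
    modulus, exponent = private_key
    return pow(ciphertext, exponent, modulus)


public_key, private_key = (n, e), (n, d)

ciphertext = encrypt(65, public_key)        # the sender needs only the public key
print(ciphertext)                           # 2790
print(decrypt(ciphertext, private_key))     # 65
```

The sender never sees d, and knowing n and e is not enough to decrypt, provided n is too large to factor.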

Using Asymmetric and Symmetric Algorithms Together

An asymmetric system has one major disadvantage - the algorithms are computationally expensive and therefore slow. No matter how fast your computer is, you wouldn't want to use them to encrypt/decrypt large amounts of data.

So encryption software normally uses a symmetric algorithm on the data itself, meaning the secret key has to be sent to the recipient - but in case an eavesdropper intercepts the secret key, the software sends the secret key encrypted, using an asymmetric algorithm.

In this kind of system, the secret key is called a session key, since it is generated when the would-be sender and recipient first connect, sent using the asymmetric algorithm, and then used for all the data transmissions until the session is complete and the connection broken. A new secret key is generated for every new session.
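The combination described above can be sketched as follows. Everything cryptographic here is a stand-in chosen by us for illustration: the recipient's key pair is a toy RSA pair with a tiny modulus, and keystream_xor is a home-made symmetric cipher built from SHA-256, not one of the algorithms SSL actually uses. Only the overall shape (fast symmetric algorithm for the data, asymmetric algorithm for the session key) reflects the text.

```python
import hashlib
import secrets

# Recipient's toy RSA key pair (tiny primes, illustration only).
N, E = 3233, 17                 # public key
D = 2753                        # private key, known only to the recipient


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a keystream derived from the key.

    Applying it twice with the same key restores the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Sender: create a fresh session key and encrypt the bulk data with it.
session_key = secrets.token_bytes(16)
message = b"A long message, encrypted with the fast symmetric algorithm."
encrypted_message = keystream_xor(message, session_key)

# Sender: encrypt the session key itself with the recipient's public key
# (one byte at a time, because the toy modulus is so small).
wrapped_key = [pow(byte, E, N) for byte in session_key]

# Recipient: recover the session key with the private key, then the message.
recovered_key = bytes(pow(c, D, N) for c in wrapped_key)
print(keystream_xor(encrypted_message, recovered_key).decode())
```

The slow asymmetric step touches only the 16-byte session key, however long the message is.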

Hashing

You may be familiar with hashing from the way some file systems work - the characters of a record's key field are combined in some way to create a number called the hash, which is then used as the basis of the disk address that the record is written to.

Hashing in SSL works on the same principle, although used for a different purpose. The sender's software combines the characters of the message to form a hash, which is sent with the message. The recipient's software does the same, and compares the hash it has calculated to the hash sent with the message. If they are the same, then the message has not been corrupted en route.

This is the second of the benefits of SSL mentioned above: integrity.
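As a minimal sketch of this check, the following Python uses SHA-256 from the standard hashlib module. The choice of SHA-256, the function name make_hash, and the sample messages are our assumptions for the example.

```python
import hashlib


def make_hash(message: bytes) -> str:
    """Return the SHA-256 hash of the message as a hex string."""
    return hashlib.sha256(message).hexdigest()


# Sender: compute the hash and send it along with the message.
message = b"Transfer 100 dollars to account 12345"
sent_hash = make_hash(message)

# Recipient: recompute the hash and compare it with the one received.
print(make_hash(message) == sent_hash)    # True: message arrived intact

corrupted = b"Transfer 900 dollars to account 12345"
print(make_hash(corrupted) == sent_hash)  # False: message was changed en route
```

Changing even one character of the message produces a completely different hash.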

Signing

But imagine the following:

You send a message, encrypted and accompanied by a hash.
A hacker intercepts it and prevents it reaching your intended recipient.
The hacker, pretending to be you, sends a new message, encrypted and accompanied by a hash.
The recipient's software decrypts this and finds that the hash matches the one it calculates, so the recipient believes the message is an uncorrupted one from you.

This is called a man-in-the-middle attack.

The solution to this is for your software to send the hash encrypted as well - but encrypted using your own private key. Since you have published your public key, the recipient's software will be able to decrypt the hash and use it. And the recipient can be sure it came from you, since only your software knows the private key that matches your public key.

A hash encrypted with your private key is called a digital signature.

So if you send a message signed in this way, the recipient knows it came from you. This is the third of the benefits of SSL mentioned above: authentication.

In fact, with a digital signature, your recipient can prove it came from you. If you should ever want to deny it, you can't. This is the fourth benefit mentioned above: non-repudiation. In some parts of the world, a digital signature has legal force, like a traditional hand-written signature.

Notice though that, unlike a hand-written signature, a digital signature is different for every message - it is an encryption of the hash of the message it is attached to.
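Signing and verifying can be sketched with textbook RSA in Python (3.8 or later). As before, this is a toy under stated assumptions: the key pair is built by us from two well-known Mersenne primes, the hash is simply reduced to fit the modulus, and the function names and sample messages are hypothetical. Real SSL software uses much larger keys and a carefully specified padding scheme.

```python
import hashlib

# Toy RSA key pair built from two known Mersenne primes. Far too small
# and too simple for real use; illustration only.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent


def message_hash(message: bytes) -> int:
    """Hash the message and reduce it to a number smaller than N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: encrypt the hash with the private key."""
    return pow(message_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: decrypt the signature with the public key and compare hashes."""
    return pow(signature, E, N) == message_hash(message)


signature = sign(b"Pay Alice 10 dollars")
print(verify(b"Pay Alice 10 dollars", signature))   # True
print(verify(b"Pay Alice 99 dollars", signature))   # False: hash does not match
print(sign(b"Pay Bob 10 dollars") != signature)     # True: each message gets its own signature
```

Anyone holding the public key (N, E) can run verify, but only the holder of D can produce a signature that passes.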

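A similar idea to this is a message authentication code (MAC), which is a hash computed using the secret key of a symmetric algorithm. MACs are used instead of digital signatures to provide integrity and authentication in some systems. Unlike a digital signature, a MAC cannot provide non-repudiation: the recipient holds the same secret key, and so could have created the MAC too.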
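One widely used MAC construction is HMAC, which mixes the secret key into the hash calculation. Here is a minimal sketch using Python's standard hmac module; the helper names make_mac and check_mac, the choice of SHA-256, and the sample messages are our assumptions, and the shared key is assumed to have been exchanged securely beforehand.

```python
import hashlib
import hmac
import secrets


def make_mac(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag for the message under the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def check_mac(key: bytes, message: bytes, mac: bytes) -> bool:
    """Recompute the tag and compare it with the one received, in constant time."""
    return hmac.compare_digest(make_mac(key, message), mac)


# Secret key known to both sender and recipient.
shared_key = secrets.token_bytes(32)

# Sender: attach a MAC to the message.
message = b"Meeting moved to 3 pm"
mac = make_mac(shared_key, message)

# Recipient: accept the message only if the MAC checks out.
print(check_mac(shared_key, message, mac))                    # True

# An attacker without the key cannot produce a valid MAC for a new message.
forged_mac = make_mac(b"attacker's guess", b"Meeting cancelled")
print(check_mac(shared_key, b"Meeting cancelled", forged_mac))  # False
```

Because both sides can run make_mac with the same key, a valid MAC shows only that the message came from someone who holds the key.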

Certificates

When asymmetric algorithms were first invented, it was suggested that people should publish their public keys in books like phone books. That idea didn't catch on. Instead, a new standard file type was invented, called a certificate.

A certificate contains its owner's public key and everyday elements of their identity such as their name, company, location, and the DNS host name of their computer. Your SSL software normally includes a function to create your certificate, but it's more usual to get an independent body, called a Certificate Authority, to create it. We'll go into this in the next chapter.

Anyone can have a certificate, and anyone needing to prove their identity online needs one. Notably, they are needed by organizations such as online banks and other Web sites that need clients or other organizations to communicate with them securely.

When you connect to such a Web site, the initial contact (called the handshake) between your SSL software and theirs includes their SSL software sending yours their certificate. This is how you get their public key, so that encrypted communication between you can begin, as described above.
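If you want to see a certificate arrive during the handshake, the following sketch uses Python's standard ssl and socket modules. The function names are ours, and the host name in the usage comment is a placeholder. Note that current versions of Python negotiate TLS, the successor to SSL described earlier.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client context that verifies certificates against the system's trusted CAs."""
    return ssl.create_default_context()


def fetch_peer_certificate(host: str, port: int = 443) -> dict:
    """Connect to host, perform the handshake, and return the server's certificate."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # wrap_socket performs the handshake; server_hostname is checked
        # against the names in the certificate the server sends.
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.getpeercert()


# Example use (needs network access; the host name is a placeholder):
#     cert = fetch_peer_certificate("www.example.com")
#     print(cert["subject"])    # identity of the certificate's owner
#     print(cert["issuer"])     # the Certificate Authority that issued it
#     print(cert["notAfter"])   # expiry date
```

If the certificate cannot be verified, or does not match the host name, wrap_socket raises an error instead of returning a connection.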