Chapter 7: Definitions for OpenSSL

This topic briefly explains some terms that, while they are not part of SSL, often appear in discussions and documentation of SSL, especially of OpenSSL.

Overview

Abstract Syntax Notation One (ASN.1) is a notation used extensively by standards committees and other bodies for defining protocols for use in data communications. It resembles a programming language, but consists mostly of features for defining data structures. There are various data types, and syntax for building up complex data structures out of these elementary types. You might think of a protocol specification written in ASN.1 as resembling the Data Division of a COBOL program, although the actual syntax of the language is more like C.

You may well never see an example of ASN.1, since it is for use in defining protocols, and even protocols themselves tend to be things that the user does not see directly. Nevertheless, ASN.1 is very widely used, and a great many electronic devices in everyday use, from air traffic control systems to washing machines, use data transmission protocols that were designed using ASN.1.

Many of the standards used in SSL that we have been discussing were designed in ASN.1, and consequently you will sometimes come across ASN.1 terminology when reading about SSL. This is especially true of the documentation for OpenSSL. Although as an end-user, or as a COBOL developer, you are unlikely to come directly in contact with these concepts, you will find the OpenSSL documentation easier to follow if we explain some of them.

ASN.1 Compiler

People who are implementing protocols designed in ASN.1 often use an ASN.1 compiler, which translates a protocol specification from ASN.1 into source code in a programming language. For the C programming language, for example, the compiler would probably output a C header file (similar to a COBOL copyfile), for inclusion in a C program.

Encoding Rules

A protocol specification in ASN.1 specifies the data structures in a source-like, human-readable way. It says nothing about how those data structures are to be represented when actually transmitted. The bodies responsible for defining ASN.1 have defined encoding rules, which specify how each data structure that you can specify in ASN.1 is to be represented as a bit pattern.

There are in fact several alternative sets of encoding rules that the committees have designed over the years. The earliest set is called the Basic Encoding Rules (BER), and a later one, which produces much more compact messages, is called the Packed Encoding Rules (PER).

Distinguished or Canonical Encoding Rules

Clearly a set of encoding rules must be such that any particular bit pattern corresponds to just one ASN.1 data structure - otherwise software receiving a message would not know which data structure to interpret the bit pattern as. However, the reverse is not necessarily the case - no ambiguity is caused if several different bit patterns can represent the same ASN.1 data structure.

Nevertheless, in some applications, if different bit patterns can represent the same ASN.1 data structure, it does cause problems. For these applications, it's desirable to have a set of encoding rules where each ASN.1 data structure converts to just one bit pattern, and vice versa. This is especially true in secure communications, because if you are digitally signing messages you need semantically equivalent messages to always have the same encoding.

Such a set of encoding rules is called canonical, or distinguished. Two such sets of rules have been developed - one was named the Distinguished Encoding Rules (DER) and the other was named the Canonical Encoding Rules (CER). CER has not caught on, and is very seldom used. Confusingly, however, .cer is a common file extension for a certificate file encoded in DER (.der is also used).

In reading documentation of OpenSSL, you will often see mentions of data stored in DER format.
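
The following Python sketch illustrates why signing needs a canonical encoding. It assumes two hand-written encodings of the same ASN.1 value, INTEGER 5: BER allows the length of the value to be written in a short or a long form, while DER permits only the shortest one. The names SHORT_FORM, LONG_FORM and decode_integer are invented for this illustration and are not part of OpenSSL.

```python
import hashlib

# Two encodings of the ASN.1 value INTEGER 5 (0x02 is the INTEGER tag).
SHORT_FORM = bytes([0x02, 0x01, 0x05])        # short-form length: valid BER and DER
LONG_FORM = bytes([0x02, 0x81, 0x01, 0x05])   # long-form length: valid BER, not DER


def decode_integer(encoded: bytes) -> int:
    """Decode a BER INTEGER, accepting short- or long-form definite lengths."""
    if encoded[0] != 0x02:
        raise ValueError("not an INTEGER")
    first = encoded[1]
    if first == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    if first < 0x80:
        # Short form: this octet is the length itself.
        length, start = first, 2
    else:
        # Long form: the low seven bits say how many length octets follow.
        count = first & 0x7F
        length = int.from_bytes(encoded[2:2 + count], "big")
        start = 2 + count
    contents = encoded[start:start + length]
    return int.from_bytes(contents, "big", signed=True)


# Both bit patterns decode to the same value...
same_value = decode_integer(SHORT_FORM) == decode_integer(LONG_FORM)
# ...but a hash (and therefore a signature) over the raw octets differs.
same_digest = (hashlib.sha256(SHORT_FORM).digest()
               == hashlib.sha256(LONG_FORM).digest())
print(same_value, same_digest)  # prints: True False
```

Because a digital signature is computed over the encoded octets, a receiver that re-encoded the message in the other form would fail to verify it. Restricting the sender and receiver to DER removes that choice.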

Object Identifiers

An important data type in ASN.1 is the object identifier. An object identifier (OID) is a reference number, unique in the world, that can be allocated to, in principle, any object; in practice, to any object that can be represented as a set of information. The object identifiers form a tree structure, which you can access at Object identifier tree. The branches of the tree are referred to as arcs, and various organizations have been allocated responsibility for assigning nodes in particular parts of the tree to anyone who requests one, to represent particular objects.

Each node has a number, and optionally, a name. An object's OID is the list of numbers representing the nodes in the tree from the root to that object's own node. They are normally shown enclosed in braces and separated by spaces. If a node's name is shown as well, the node's number is normally shown in parentheses. Only the numbers are intended to be machine-readable - the names are there to be human-readable.
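
As a small illustration of this notation, the Python sketch below formats one well-known OID, the arc allocated to RSA Data Security, Inc. The function names are invented for this example. The dotted form it also prints is not described above, but it is the form in which OIDs commonly appear in tool output.

```python
def oid_brace(arcs):
    """Format an OID in brace notation, with each node's number in parentheses after its name."""
    parts = [f"{name}({number})" if name else str(number) for name, number in arcs]
    return "{" + " ".join(parts) + "}"


def oid_dotted(arcs):
    """Format only the machine-readable numbers, separated by dots."""
    return ".".join(str(number) for _, number in arcs)


# Path from the root of the tree to the node for RSA Data Security, Inc.
RSADSI = [("iso", 1), ("member-body", 2), ("us", 840), ("rsadsi", 113549)]

print(oid_brace(RSADSI))   # prints: {iso(1) member-body(2) us(840) rsadsi(113549)}
print(oid_dotted(RSADSI))  # prints: 1.2.840.113549
```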

Octets

COBOL programmers are used to working with memory physically organized into groups of eight bits, and calling these groups bytes. However, in communications, where we are talking about a continuous stream of bits on the wire, each group of eight bits is called an octet.

The ASN.1 encoding rules define four types of octet: identifier octets, length octets, contents octets, and end-of-contents octets. There's no need to go into detail here about exactly how these are used (it's different for different encoding rules anyway); it suffices to say that the contents octets contain the actual data, while the others are needed for the syntax of the encoded bit-stream.
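
For readers who want a concrete picture, here is a minimal Python sketch of how the identifier, length and contents octets fit together under DER. It assumes a single-octet identifier and the definite-length form, so no end-of-contents octets appear (they are used only with the indefinite-length form that BER allows). The function names are invented for this illustration.

```python
def encode_length(n: int) -> bytes:
    """Return the length octets for a contents length of n, in the shortest form."""
    if n < 0x80:
        # Short form: a single octet holds the length.
        return bytes([n])
    # Long form: first octet has the top bit set and counts the octets that follow.
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_tlv(identifier: int, contents: bytes) -> bytes:
    """Concatenate identifier octet, length octets and contents octets."""
    return bytes([identifier]) + encode_length(len(contents)) + contents


UTF8STRING = 0x0C  # identifier octet for the ASN.1 UTF8String type

encoded = encode_tlv(UTF8STRING, "Hi".encode("utf-8"))
print(encoded.hex(" "))  # prints: 0c 02 48 69
```

In the output, 0c is the identifier octet, 02 is the length octet, and 48 69 are the contents octets holding the two characters.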

UTF8String

ASN.1 has a number of different string data types. They differ in the range of characters they can support, and in how many octets are used to represent each character.

The string data type you will see mentioned most often is the UTF8String. This is the best data type for internationalization. UTF-8 stands for UCS Transformation Format, 8-bit (where UCS stands for Universal Character Set). Despite the name, it uses a variable-length encoding, with between one and four octets per character. The original ASCII characters have exactly the same encoding as in ASCII itself: a single octet using 7 bits, with the top bit set to zero. All other characters are encoded as two or more octets, each of which has the top bit set to 1. Because the encoding is variable length, all known characters can be represented.
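
The variable-length behavior can be seen with Python's built-in UTF-8 encoder. This is only an illustrative sketch: the three sample characters are arbitrary choices, and the helper name utf8_bits is invented here.

```python
def utf8_bits(character: str) -> list:
    """Return the UTF-8 octets of one character as strings of eight bits."""
    return [f"{octet:08b}" for octet in character.encode("utf-8")]


for character in ("A", "é", "€"):
    # An ASCII character gives one octet starting with 0;
    # any other character gives several octets, each starting with 1.
    print(character, utf8_bits(character))
```

Here "A" occupies one octet, "é" two, and "€" three.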

Further Information

For detailed information on ASN.1, see ASN.1 Information Site.

The formal standards documents on ASN.1 and its encoding rules are published by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and the International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC).

Base64 Encoding

The bit patterns produced by the encoding rules described above are unlikely, in general, to correspond to printable characters. But sometimes it's convenient to represent the encoded data in printable form. This can be done by converting it to base64 encoding.

Base64 is a scheme for converting binary data to printable ASCII characters, namely the upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols, with the "=" symbol as a special suffix code.

The data's original length must be a multiple of eight bits. To convert data to base64, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes to encode, the corresponding buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/" and each character thus selected is output.

This is repeated until the input data is exhausted. If at the end only one or two input bytes remain, then only the first two or three characters of the output are used, and the output is then padded with two or one "=" characters respectively. The padding tells the decoder how many bytes the final group really contained, so that no extra bits are added to the reconstructed data.

The resulting base64-encoded data is approximately 33% longer than the original data, and typically appears as seemingly random characters.
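
The procedure described above can be sketched in a few lines of Python. This is an illustrative implementation of the 24-bit buffer method, not the code OpenSSL uses; the function name b64encode_sketch is invented here, and Python's standard base64 module is used only to cross-check the result.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64encode_sketch(data: bytes) -> str:
    """Encode bytes as base64 using a 24-bit buffer, three input bytes at a time."""
    output = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # First byte goes in the most significant eight bits;
        # missing bytes in a short final chunk are left as zero.
        buffer = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Use the buffer six bits at a time, most significant first.
        chars = [ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # One input byte keeps two characters, two keep three, three keep four.
        keep = len(chunk) + 1
        output.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(output)


for sample in (b"Man", b"Ma", b"M"):
    # Compare with the standard library encoder for the same input.
    print(sample, b64encode_sketch(sample), base64.b64encode(sample).decode("ascii"))
```

For the three samples the sketch produces TWFu, TWE= and TQ==, matching the standard library. Every three input bytes become four output characters, which is where the increase of roughly one third comes from.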

Base64 encoding is specified in full in RFC 1421 and RFC 2045.

Many of the commands provided by the openssl.exe utility can convert between normal binary encoding and base64 encoding.

SSLeay

The OpenSSL documentation occasionally mentions SSLeay, mainly where it needs to discuss compatibility of OpenSSL with SSLeay.

After SSL was first introduced by Netscape, an independent implementation of it was developed, the SSLeay library by Eric A. Young and Tim Hudson. This was released in 1995. OpenSSL is based on this SSLeay library.