Introduction to Multi-threading

Writing Multi-threaded Applications

Chapter 2: Synchronizing Execution and Resolving Contention

Uncontrolled multi-threading within an application can cause unpredictable end results. To achieve predictable results from an application, you need to synchronize the execution of threads, and resolve data contention between threads; there are various methods you can use to achieve this. Which of these methods you use depends on the characteristics of your application's data access, and the requirements for true multi-threading within the individual COBOL programs that make up your application. You can use:

Program attributes assigned by Compiler directives
The attributes of a data item to determine if there is a possibility that threads will contend for access to data items
Synchronization primitives

2.1 Multi-threaded Program Attributes

There are three main program attributes that affect multi-threading within a COBOL program. These attributes are assigned by including or omitting one of the multi-threading Compiler directives when you compile your program. The directives used affect the allocation of the program's system work areas and can cause automatic system locking of the compiled program. You can specify that a program:

Is not multi-threading, by specifying no multi-threading Compiler directives
Has the serial attribute. In a serial program, work areas are allocated statically, and the program is locked on entry and unlocked on exit
Has the reentrant attribute. In a reentrant program all system work areas on the stack are dynamically allocated (eliminating thread contention on these areas), and multiple threads can enter the program at the same time

2.1.1 Specifying No Multi-threading

You can specify that a program is not a multi-threading program, by omitting any of the multi-threading Compiler directives when you compile your program. If you do this, system work areas are allocated statically, and are thus subject to contention. This approach has several advantages, call speed and efficient stack usage among them, but it is up to your application to make sure that only one thread executes within a non-multi-threading program at a time. This can be accomplished through implicit program logic in any calling program; for example, when the application, by design, has only one thread calling a non-multi-threaded program. Alternatively, one of the synchronization primitives (such as a mutex) can be locked in the calling program just before the called program is entered and then unlocked when the called program has returned.

2.1.2 Serial Programs

In a program that has the serial attribute, system work areas are allocated statically, and the program is locked on entry and unlocked on exit. This enables only one thread to execute the program at a given time and so eliminates any contention on system or user work areas. No other explicit application logic is required.

A program can be given the serial attribute by specifying the SERIAL Compiler directive when you compile it.

By specifying a traditional COBOL program as a serial program it can be included in a multi-threaded application without any source changes.

The disadvantages of serial programs are that:

The level of multi-threading is limited within the application. Ideally, a multi-threaded application should enable free execution of as much code as possible to allow maximum performance.
Locking and unlocking the program can have a major run-time cost. This can significantly affect call speed and overall application performance. For this reason, as few modules as possible should be compiled with this directive and you should ensure that called programs have the no-multi-threading attribute. For example, if program A is to be compiled with the serial attribute and program A is the only program in the application that calls programs B and C, then by application design, program B and C are already serialized and can be compiled with the no-multi-threading attribute.

2.1.3 Reentrant Programs

You can specify that a multi-threaded program is to be reentrant by compiling it with the REENRANT Compiler directive. You should use reentrant programs for most (if not all) of the modules in a multi-threaded application.

If you specify REENTRANT"1", all compiler-generated temporary work areas are allocated on a per-thread basis. All user data and FD file areas allocated in the environment and data divisions are shared by all threads. It is the responsibility of the programmer to ensure serialization to the program's data by use of the CBL_ synchronization calls.

If you specify REENTRANT"2", all working-storage and file section data, as well as system work areas, are allocated dynamically on the stack. This eliminates thread contention on these areas, so that the program behaves safely if called from multiple-threads concurrently. No program locking or unlocking is needed. The disadvantage with this compiler directive setting is that there is no data shared (other than any data defined as EXTERNAL) between the threads.

REENTRANT"2" is a quick and simple way of getting a program running in a multi-threaded application, but you should aim to compile with REENTRANT"1".

A reentrant program must itself resolve all possible contention on data items in its Working-Storage Section and File Section. One or more of the techniques for resolving data contention must be used to accomplish this; see the next section for details.

2.2 Using Data Attributes

The attributes of a data item determine if threads that access it can contend for it. Threads can never contend for data items defined:

In the Local-Storage Section or Thread-Local-Storage Section
With the THREAD-LOCAL data attribute

Any other data items defined within the program are shared between threads, so contention is possible. Contention might occur for:

Data defined in the Working-Storage Section and the File Section
Various other implicitly defined data items such as file-status fields
Data stored in some Compiler special registers (except the RETURN-CODE register)

Data defined in the Local-Storage Section is traditionally used in recursive COBOL programs; for every recursion of the program a new instance of this data is allocated on the stack. Since each thread in a multi-threaded application has its own stack, this attribute is ideal for defining contention-free temporary and work items within reentrant programs. By design, a reentrant COBOL program also enables recursion within a single thread.

The disadvantage of using data defined in the Local-Storage Section is that a data item disappears when the reentrant or recursive program exits. If a program has exited, then on the next entry into that program the data items defined in the Local-Storage Section have undefined values. If a thread requires the program to preserve state across calls, some other mechanism must be used.

Data defined as thread-local, either in the Thread-Local-Storage Section, or by the THREAD-LOCAL attribute, resolves these problems. Thread-local data is unique to each thread, persists across calls and can be initialized with value clauses. Thread-local data can be viewed as a thread-specific Working-Storage data item. This kind of data is useful for resolving contention problems in most reentrant programs. In many cases, a program that does no file handling can be made completely reentrant by simply changing the Working-Storage Section header to a Thread-Local-Storage Section header. You can fine-tune data allocation by defining all read-only constants in the Working-Storage Section, and all read-write data items in Thread-Local-Storage Section.

The use of thread-local data has the following disadvantages:

Data values are not easily communicated between threads; for this it is better to define data in the Working-Storage Section and use appropriate thread synchronization.
It adds an overhead to program calls; if it is used in a high-traffic module then overall application performance can be seriously affected.
It is not possible to have EXTERNAL items within the Thread-Local-Storage Section. Currently, if thread-local data must be shared between application modules, the CBL_TSTORE_n routines must be used

Sometimes threads require more than just private data. Most multi-threaded applications communicate and coordinate thread execution through shared data that is accessed by each thread under a strict protocol. In COBOL this is accomplished through the use of Working-Storage Section or File Section data in conjunction with the use of various synchronization primitives that resolve any possibility for destructive contention on the data. See the next section for details of the synchronization primitives.

2.3 Using Synchronization Primitives

Synchronizing threads that access shared data is critical for predictable results in a multi-threaded application. Understanding the nature of the various data accesses between threads is the first step in determining what synchronization primitives and regimes should be used. Once the data access has been characterized, you can follow a synchronization regime for data items among all threads. The following sections outline common data sharing problems and their solutions.

2.3.1 Using a Mutex

The simplest form of a data sharing problem is when multiple threads require mutually exclusive access to shared data at some point during their processing. The area of code that accesses this shared data is called a critical section and these critical sections can be protected by the use of a mutex that is logically associated with the shared data items. The term mutex comes from the phrase mutual exclusion. The mutex associated with the data items is locked before a critical section is entered and unlocked when that critical section is finished.

It is vital that all threads lock the mutex before accessing any of the data that is being protected. If even one thread fails to follow this regime, then unpredictable results could occur (see the example below).

One problem with using a mutex is that it can severely limit the level of multi-threading within an application. For example, you have an application in which some programs add items to a table or count items in a table and, in order to maximize multi-threading, you want multiple threads to be able to count items in the table simultaneously. However, if a program was adding items to a table, you would not want other program to add items or count them.

A solution to this problem would be given by a synchronization primitive that enabled multiple threads to be active in read-only critical sections but prevented any other access when a thread is active in a write-critical section; such a synchronization primitive is the monitor.

Example - Protecting critical sections:

The following code illustrates the protection of two critical sections that access a table. These sections add to the table or count items in it. The Working-Storage Section data items table-xxx are protected by table-mutex. The mutex prevents a thread from adding data to the table while another thread reads the table at the same time.

 Working-Storage Section.
 78  table-length     value 20.
 01  table-mutex      usage mutex-pointer.
 01  table-current    pic x(4) comp-x value 0.
 01  table-1.
    05  table-item-1  pic x(10) occurs 20.
 Local-Storage Section.
 01  table-count      pic x(4) comp-x.
 01  table-i          pic x(4) comp-x.

*> Initialization code executed while in single threaded 
*> mode
     move 0    to table-max
     open table-mutex

*> Add an item to table-1, this is a critical section
     set table-mutex to on
     if table-current < table-length
         add 1 to table-current
         move 'filled in' to table-item-1(table-current)
     end-if
     set table-mutex to off

*> Count items in table-1, this is a critical section
     set table-mutex to on
     move 0 to table-count
     perform varying table-i from 1 by 1 
             until table-i > table-current
         if  table-item-1(table-i) = 'filled in'
             add 1 to table-count
         end-if
     end-perform
     set table-mutex to off

2.3.2 Using a Monitor

Monitors provide a solution to particular access problems not handled easily by mutexes. For example, you might want many threads to be able to simultaneously read data, but only one thread to be able write data. While the thread writes data, you might want to block read access by other threads.

A monitor is used by a critical section to declare what type of data access the critical section will be performing on the protected data; that is, reading, writing, or browsing.

The monitor synchronization facility can be extended to provide a critical section for browsing; this can be very useful. A browser reads data and, depending on conditions you have set, might or might not write to protected data items. While a browser is active, any number of critical sections that simply read the data are allowed, while any other critical sections that browse or write are not allowed. If the browser thread determines that it needs to write to the protected data, it requests a conversion of the browse lock to a write lock. The conversion process waits until all critical sections that read the data are finished, and then bars any other critical sections that read or write data from accessing the data. The browser proceeds to write with exclusive access to the protected data items (whose state is guaranteed to be the same as it was when the browser was just reading the protected data).

Example - Using a monitor to control the access of multiple threads:

The following example code shows a monitor that controls the access of multiple threads that count items in, or add items to, a table. The code:

Enables multiple threads to access data for counting
Disables any threads that add items to the table when a thread that counts items is active
Enables one thread that adds items to a table; while this thread is active, the code disables access to the table by any thread that counts items

 Working-Storage Section.
 78  table-length     value 20.
 01  table-monitor    usage monitor-pointer.
 01  table-current    pic x(4) comp-x value 0.
 01  table-1.
     05  table-item-1  pic x(10) occurs table-length.
 Local-Storage Section.
 01  table-count      pic x(4) comp-x.
 01  table-i          pic x(4) comp-x.

*> Initialization code, executed while in single
*> threaded mode
     move 0 to table-current
     open table-monitor

*> Add an item to table-1, this is a
*> writer critical section
     set table-monitor to writing
     if table-current < table-length
         add 1 to table-current
         move 'filled in' to table-item-1(table-current)
     end-if
     set table-monitor  to not writing

*> Count items in table-1, this is a 
*> reader critical section
     set table-monitor to reading
     move 0 to table-count
     perform varying table-i from 1 by 1 
             until table-i > table-current
         if  table-item-1(table-i) = 'filled in'
             add 1 to table-count
         end-if
     end-perform
     set table-monitor  to not reading

Example - A browser critical section using a monitor:

The following is a very simple example of a browser critical section:

 Working-Storage Section.
 01  data-monitor  usage monitor-pointer.
 01  data-value    pic x(4) comp-x value 0.

*> Initialization code executed while in single threaded 
*>mode
     open data-monitor

*> Add an item to table-1, this is a browser critical 
*> section
     set data-monitor to browsing
     if data-value < 20
         set data-monitor to writing 
                                converting from browsing
         add 5 to data-value
         set data-monitor to not writing
     else
         set data-monitor to not browsing
     end-if

We do not recommend that you use a browse lock for such a simple check. Usually, you need only use a browser if a significant amount of work has to be done to determine if a write lock is actually required, and you want to maximize multi-threading throughout your application. There are various other monitor conversions available to help you maximize the level of multi-threading in your application

2.3.3 Using a Semaphore

A semaphore is a synchronization primitive that acts like a gate that lowers to prevent passage of a thread through code and raises to enable passage (of another thread) through that code. A semaphore is similar to a mutex, and can be used instead of a mutex.

A semaphore is less efficient than a mutex, but it is more flexible: one thread can release a semaphore, enabling any other thread to acquire it. Contrast this with a mutex. A mutex must always be acquired before it can be released, and the operations of acquiring and releasing it must happen within the same thread. Semaphores provide a way of signaling from one thread to another.

The semaphores shown in the first two examples below count the number of releases that are outstanding on a semaphore and enable that many acquisitions to pass unblocked.

Example - Using a semaphore:

The following code illustrates the use of semaphores:

 Working-Storage Section.
 01  data-semaphore  usage semaphore-pointer.
 01  data-value      pic x(4) comp-x value 0.

*> Initialization code executed while in single threaded 
*> mode
     open data-semaphore
     set data-semaphore up by 1  *> Initialize as raised

*> Add change data-value, this is a critical section
     set data-semaphore down by 1
     add 1 to data-value
     set data-semaphore up by 1  
                     *> Allow other thread to pass semaphore

Just after the OPEN verb, the semaphore is raised by 1. This enables the first acquisition of the semaphore to succeed but any following acquisitions to be blocked until the semaphore is released again.

Example - Using a semaphore to establish handshaking between two threads:

The following code sample uses two different semaphores to establish handshaking between two threads. This handshaking enables one thread to signal the production of a new data value and the other thread to signal the corresponding consumption of that data value:

 Working-Storage Section.
 01  produced-semaphore  usage semaphore-pointer.
 01  data-value          pic x(4) comp-x value 0.
 01  consumed-semaphore  usage semaphore-pointer.

*> Initialization code executed while in single threaded 
*> mode
     open produced-semaphore
     open consumed-semaphore
     set consumed-semaphore  up by 1

*> This code is executed once to produce a data value
     set consumed-semaphore down by 1
     add 10 to data-value
     set produced-semaphore up by 1 
                       *> Signal that data value has changed

*> Another thread, waiting for the data-value to 
*> change, executes this code once as well.
     set produced-semaphore  down by 1
     display data-value
     set consumed-semaphore  up by 1 
                       *> Signal data value used

Example - Counting semaphore:

This example illustrates another common synchronization problem known as the Producer-Consumer Problem. The simplest Producer-Consumer Problem is that in which one thread produces data, while another thread consumes that data, and you need to synchronize execution between the producing and consuming threads so that when the consumer is active it always has data to operate on.

The following code illustrates a simple example of a producer/consumer pair where the producer is allowed to create data values until the data table cannot handle any more - at which point the producer blocks the creation of values until some values are consumed:

 Working-Storage Section.
 78  table-size    value 20.
 01  produced-semaphore   usage semaphore-pointer.
 01  filler.
     05  table-data-value pic x(4) comp-x 
             occurs table-size times value 0.
 01  consumed-semaphore   usage semaphore-pointer.
 Local-Storage Section.
 01  table-i     pic x(4) comp-x.

*> Initialization code executed while in single threaded 
*> mode
     open produced-semaphore
     open consumed-semaphore
     set consumed-semaphore up by table-size 
                                   *> Start raised 20 times

*> Producer thread
     move 1 to table-i
     perform until 1 = 0
         set consumed-semaphore down by 1
         add table-i to table-data-value(table-i)
         set produced-semaphore up by 1
         add 1 to table-i
         if  table-i > table-size
             move 1 to table-i
         end-if
     end-perform.

*> Consumer thread
     move 1 to table-i
     perform until 1 = 0
         set produced-semaphore down by 1
         display 'Current produced value is' 
                 table-data-value(table-i)
         set consumed-semaphore up by 1
         add 1 to table-i
         if table-i > table-size
             move 1 to table-i
         end-if
     end-perform.

2.3.4 Using an Event

An event is similar to a semaphore in that it enables one thread to signal to another that something has happened that requires attention. For several reasons an event is more flexible and slightly more complex. One reason is that an event, once posted, must be explicitly cleared.

Example - Using Event Synchronization:

The following is an example of code that solves the Producer-Consumer Problem by using event synchronization instead of semaphore synchronization. The simplest Producer-Consumer Problem is that in which one thread produces data, while another thread consumes that data, and you need to synchronize execution between the producing and consuming threads so that when the consumer is active it always has data to operate on.

 Working-Storage Section.
 01  produced-event  usage event-pointer.
 01  data-value   pic x(4) comp-x value 0.
 01  consumed-event  usage evnt-pointer.

*> Initialization code executed while in single threaded 
*> mode
     open produced-event
     open consumed-event
     set consumed-event to true *> Initialize as 'posted'

*> Protocol for the producer side
     wait for consumed-event
     set consumed-event to false *> Clear event
     add 10 to data-value

*> Signal that data value has changed
     set produced-event to true  *> Post event

*> Protocol for the consumer side, waiting for the 
*> data-value to change
     wait for produced-event
     set produced-event to false *> Clear event
     display data-value

*> Signal other thread that it can proceed, this thread 
*> has data-value
     set consumed-event to true  *> Post event

If there are only two threads (the producer and consumer threads) executing the above code, then the program works as expected. If another thread starts executing as a producer or consumer, unexpected events occur. This is because an event, once posted, wakes up all threads waiting for that event, unlike a semaphore which enables only one thread to pass after the semaphore has been released. After an event has been posted, and the waiting threads become active, it is necessary for each of the active threads to determine if it should take some form of action on the event (including clearing the event).

The last point can make events difficult to work with when there are multiple threads wating, but it also enables the building of custom synchronization objects for special needs.

Introduction to Multi-threading

Writing Multi-threaded Applications