Introduction to Multi-threading | Writing Multi-threaded Applications |
Uncontrolled multi-threading within an application can cause unpredictable end results. To achieve predictable results from an application, you need to synchronize execution of threads, and resolve data contention between threads; there are various methods you can use to achieve this. The methods you use depend on the characteristics of your application's data access, and the requirements for true multi-threading within the individual COBOL programs that make up your application. You can use:
There are three main program attributes that affect multi-threading within a COBOL program. These attributes are assigned by including or omitting one of the multi-threading Compiler directives when you compile your program. The directives used affect the allocation of the program's system work areas and can cause automatic system locking of the compiled program. You can specify that a program:
You can specify that a program is not a multi-threading program, by omitting any of the multi-threading Compiler directives when you compile your program. If you do this, system work areas are allocated statically, and are thus subject to contention. This approach has several advantages, call speed and efficient stack usage among them, but it is up to your application to make sure that only one thread executes within a non-multi-threading program at a time. This can be accomplished through implicit program logic in any calling program; for example, when the application, by design, has only one thread calling a non-multi-threaded program. Alternatively, one of the synchronization primitives (such as a mutex) can be locked in the calling program just before the called program is entered and then unlocked when the called program has returned.
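The caller-side locking described above can be sketched in the same mutex syntax used later in this chapter. This is an illustrative fragment only: the subprogram name SUBPROG is hypothetical, and it assumes the mutex is opened once while the application is still single-threaded.

```cobol
 Working-Storage Section.
 01 call-mutex  usage mutex-pointer.

*> Initialization code executed while in single threaded mode
     open call-mutex

*> Serialize entry into a non-multi-threaded subprogram:
*> only one thread at a time can be inside the CALL.
     set call-mutex to on
     call "SUBPROG"
     set call-mutex to off
```

Every thread that calls the non-multi-threaded program must go through the same mutex; a single caller that bypasses it reintroduces contention on the statically allocated work areas.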
In a program that has the serial attribute, system work areas are allocated statically, and the program is locked on entry and unlocked on exit. This allows only one thread to execute the program at a given time and so eliminates any contention on system or user work areas. No other explicit application logic is required.
A program can be given the serial attribute by specifying the SERIAL Compiler directive when you compile it.
By specifying a traditional COBOL program as a serial program, you can include it in a multi-threaded application without any source changes.
The disadvantages of serial programs are that:
You can specify that a multi-threaded program is to be reentrant by compiling it with the REENTRANT Compiler directive. You should use reentrant programs for most (if not all) of the modules in a multi-threaded application.
If you specify REENTRANT(1), all compiler-generated temporary work areas are allocated on a per-thread basis. All user data and FD file areas allocated in the Environment and Data Divisions are shared by all threads. It is the responsibility of the programmer to ensure serialized access to the program's data by use of the CBL_ synchronization calls.
If you specify REENTRANT(2), all Working-Storage and File Section data, as well as system work areas, are allocated dynamically on the stack. This eliminates thread contention on these areas, so that the program behaves safely if called from multiple threads concurrently. No program locking or unlocking is needed. The disadvantage with this Compiler directive setting is that no data (other than any data defined as EXTERNAL) is shared between the threads.
REENTRANT(2) is a quick and simple way of getting a program running in a multi-threaded application, but you should aim to compile with REENTRANT(1).
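As an illustration, the directive can be set in the source itself with a `$SET` line (assuming your compiler accepts source-embedded directives; the program shown is a minimal placeholder):

```cobol
$set reentrant(2)
 identification division.
 program-id. threadsafe-prog.
*> With REENTRANT(2), this program's Working-Storage is
*> allocated per call on the stack, so it is safe to call
*> from multiple threads without any locking.
 working-storage section.
 01 work-item pic x(4) comp-x value 0.
 procedure division.
     add 1 to work-item
     exit program.
```

Compiling the same source with `$set reentrant(1)` instead would share `work-item` between threads, requiring explicit synchronization around any access to it.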
A reentrant program must itself resolve all possible contention on data items in its WORKING-STORAGE and FILE SECTION. One or more of the techniques for resolving data contention must be used to accomplish this; see the next section for details.
The attributes of a data item determine if threads that access it can contend for it. Threads can never contend for data items defined in the Local-Storage Section, or defined as thread-local (in the Thread-Local-Storage Section or with the THREAD-LOCAL clause).
Any other data items defined within the program are shared between threads, so contention is possible. Contention might occur for data items defined in the Working-Storage Section, in the File Section, or as EXTERNAL.
Data defined in the Local-Storage Section is traditionally used in recursive COBOL programs; for every recursion of the program a new instance of this data is allocated on the stack. Since each thread in a multi-threaded application has its own stack, this attribute is ideal for defining contention-free temporary and work items within reentrant programs. By design, a reentrant COBOL program also allows recursion within a single thread.
The disadvantage of using data defined in the Local-Storage Section is that a data item disappears when the reentrant or recursive program exits. If a program has exited, then on the next entry into that program the data items defined in the Local-Storage Section have undefined values. If a thread requires the program to preserve state across calls, some other mechanism must be used.
Data defined as thread-local, either in the Thread-Local-Storage Section or with the THREAD-LOCAL clause, resolves these problems. Thread-local data is unique to each thread, persists across calls, and can be initialized with VALUE clauses. Thread-local data can be viewed as thread-specific Working-Storage data. This kind of data is very useful for resolving contention problems in most reentrant programs. In many cases, a program that does no file handling can be made completely reentrant by simply changing the Working-Storage Section header to a Thread-Local-Storage Section header. You can fine-tune data allocation by defining all read-only constants in the Working-Storage Section, and all read-write data items in the Thread-Local-Storage Section.
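The fine-tuning described above might look like the following sketch (the data names are illustrative): constants stay in Working-Storage, where a single shared copy is harmless because it is never written, while each thread gets its own copy of the read-write items.

```cobol
*> Read-only constants: shared by all threads, never written,
*> so no contention is possible.
 Working-Storage Section.
 78 max-retries     value 5.
 01 app-title       pic x(20) value "Order processor".

*> Read-write data: one private copy per thread, persists
*> across calls, initialized from the VALUE clauses.
 Thread-Local-Storage Section.
 01 retry-count     pic x(4) comp-x value 0.
 01 last-order-id   pic x(8)  value spaces.
```

A program structured this way needs no locking around `retry-count` or `last-order-id`, because no two threads ever see the same instance of them.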
The use of thread-local data has the following disadvantages:
Sometimes threads require more than just private data. Most multi-threaded applications communicate and coordinate thread execution through shared data that is accessed by each thread under a strict protocol. In COBOL this is accomplished through the use of Working-Storage Section or File Section data in conjunction with the use of various synchronization primitives that resolve any possibility for destructive contention on the data. See the next section for details of the synchronization primitives.
Synchronizing threads that access shared data is critical for predictable results in a multi-threaded application. Understanding the nature of the various data accesses between threads is the first step in determining what synchronization primitives and regimes should be used. Once the data access has been characterized, it is a simple matter to follow a synchronization regime for data items among all threads. The following topics outline common data sharing problems and their solutions:
The simplest form of a data sharing problem is when multiple threads require mutually exclusive access to shared data at some point during their processing. The area of code that accesses this shared data is called a critical section and these critical sections can be protected by the use of a mutex that is logically associated with the shared data items. The term mutex comes from the phrase mutual exclusion. The mutex associated with the data items is locked before a critical section is entered and unlocked when that critical section is finished.
It is vital that all threads lock the mutex before accessing any of the data that is being protected. If even one thread fails to follow this regime, then unpredictable results could occur.
For example, the following code illustrates the protection of two critical sections that access a table, adding to it or counting items in it. The Working-Storage data items table-xxx are protected by table-mutex.
Example - Protecting Critical Sections
In this example the mutex is required to prevent a thread from adding data to the table while another thread reads the table at the same time.
 Working-Storage Section.
 78 table-length value 20.
 01 table-mutex    usage mutex-pointer.
 01 table-current  pic x(4) comp-x value 0.
 01 table-1.
    05 table-item-1 pic x(10) occurs 20.
 Local-Storage Section.
 01 table-count    pic x(4) comp-x.
 01 table-i        pic x(4) comp-x.

*> Initialization code executed while in single threaded mode
     move 0 to table-current
     open table-mutex

*> Add an item to table-1, this is a critical section
     set table-mutex to on
     if table-current < table-length
         add 1 to table-current
         move 'filled in' to table-item-1(table-current)
     end-if
     set table-mutex to off

*> Count items in table-1, this is a critical section
     set table-mutex to on
     move 0 to table-count
     perform varying table-i from 1 by 1
             until table-i > table-current
         if table-item-1(table-i) = 'filled in'
             add 1 to table-count
         end-if
     end-perform
     set table-mutex to off
One problem with using a mutex is that it can severely limit the level of multi-threading within an application. For example, say you have an application in which some programs add items to a table or count items in a table. In order to maximize multi-threading, you would want multiple threads to be able to count items in the table simultaneously. However, while one thread is adding items to the table, you would not want any other thread to add items to, or count items in, the table.
A solution to this problem is provided by a synchronization primitive that enables multiple threads to be active in read-only critical sections but prevents any other access while a thread is active in a writer critical section; such a synchronization primitive is the monitor.
Monitors provide a solution to particular access problems not handled easily by mutexes. For example, you might want many threads to be able to read data simultaneously, but only one thread to be able to write data. While that thread writes data, you might want to block read access by other threads.
A monitor is used by a critical section to declare what type of data access the critical section will be performing on the protected data; that is, reading, writing, or browsing.
The monitor synchronization facility can be extended to provide a critical section for browsing; this can be very useful in real world applications. A browser reads data and, depending on conditions you have set, might or might not write to protected data items. While a browser is active, any number of critical sections that simply read the data are allowed, while any other critical sections that browse or write are not allowed. If the browser thread determines that it needs to write to the protected data, it requests a conversion of the browse lock to a write lock. The conversion process waits until all critical sections that read the data are finished, and then bars any other critical sections that read or write data from accessing the data. The browser proceeds to write with exclusive access to the protected data items (whose state is guaranteed to be the same as it was when the browser was just reading the protected data).
Example - Using a monitor to control the access of multiple threads
The following example code shows a monitor that controls the access of multiple threads that count items in, or add items to, a table. The code:
 Working-Storage Section.
 78 table-length value 20.
 01 table-monitor  usage monitor-pointer.
 01 table-current  pic x(4) comp-x value 0.
 01 table-1.
    05 table-item-1 pic x(10) occurs table-length.
 Local-Storage Section.
 01 table-count    pic x(4) comp-x.
 01 table-i        pic x(4) comp-x.

*> Initialization code, executed while in single
*> threaded mode
     move 0 to table-current
     open table-monitor

*> Add an item to table-1, this is a
*> writer critical section
     set table-monitor to writing
     if table-current < table-length
         add 1 to table-current
         move 'filled in' to table-item-1(table-current)
     end-if
     set table-monitor to not writing

*> Count items in table-1, this is a
*> reader critical section
     set table-monitor to reading
     move 0 to table-count
     perform varying table-i from 1 by 1
             until table-i > table-current
         if table-item-1(table-i) = 'filled in'
             add 1 to table-count
         end-if
     end-perform
     set table-monitor to not reading
Example - A browser critical section using a monitor
The following is an example of a browser critical section:
 Working-Storage Section.
 01 data-monitor  usage monitor-pointer.
 01 data-value    pic x(4) comp-x value 0.

*> Initialization code, executed while in single threaded mode
     open data-monitor

*> Conditionally update data-value, this is a browser critical section
     set data-monitor to browsing
     if data-value < 20
         set data-monitor to writing converting from browsing
         add 5 to data-value
         set data-monitor to not writing
     else
         set data-monitor to not browsing
     end-if
We do not recommend that you use a browse lock for such a simple check. Usually, you need only use a browser if a significant amount of work has to be done to determine if a write lock is actually required, and you want to maximize multi-threading throughout your application. There are various other monitor conversions available to help you maximize the level of multi-threading in your application.
A semaphore is a synchronization primitive that acts like a gate: it is lowered to block a thread from passing a point in the code, and raised to allow a thread through. A semaphore is similar to a mutex, and can be used instead of one.
Example - Using a semaphore
The following code illustrates the use of semaphores:
 Working-Storage Section.
 01 data-semaphore  usage semaphore-pointer.
 01 data-value      pic x(4) comp-x value 0.

*> Initialization code executed while in single threaded mode
     open data-semaphore
     set data-semaphore up by 1    *> Initialize as raised

*> Change data-value, this is a critical section
     set data-semaphore down by 1
     add 1 to data-value
     set data-semaphore up by 1    *> Allow other thread to pass semaphore
Note that just after the OPEN verb, the semaphore is raised by 1. This enables the first acquisition of the semaphore to succeed but any following acquisitions to be blocked until the semaphore is released again.
A semaphore is less efficient than a mutex, but it is more flexible: one thread can simply release a semaphore while any other thread can then acquire it. Contrast this with a mutex. A mutex must always be acquired before it can be released, and the operations of acquiring and releasing it must happen within the same thread. Semaphores provide a way of signaling from one thread to another.
Example - Using a Semaphore to Establish Handshaking Between Two Threads
The following code sample uses two different semaphores to establish handshaking between two threads. This handshaking enables one thread to signal the production of a new data value and the other thread to signal the corresponding consumption of that data value:
 Working-Storage Section.
 01 produced-semaphore  usage semaphore-pointer.
 01 data-value          pic x(4) comp-x value 0.
 01 consumed-semaphore  usage semaphore-pointer.

*> Initialization code executed while in single threaded mode
     open produced-semaphore
     open consumed-semaphore
     set consumed-semaphore up by 1

*> This code is executed once to produce a data value
     set consumed-semaphore down by 1
     add 10 to data-value
     set produced-semaphore up by 1  *> Signal that data value has changed

*> Another thread, waiting for the data-value to
*> change, executes this code once as well.
     set produced-semaphore down by 1
     display data-value
     set consumed-semaphore up by 1  *> Signal data value used
This example illustrates another common synchronization problem known as the Producer-Consumer Problem. The simplest Producer-Consumer Problem is where there is one thread that produces data, and one thread that consumes that data, and you need to synchronize execution between the producing and consuming threads so that when the consumer is active it always has data to operate on.
The semaphores shown above count the number of releases that are outstanding on a semaphore and allow that many acquires to pass unblocked. This kind of counting semaphore enables the producer to produce multiple data values (usually in an array) before blocking to wait for the consumer to catch up.
Example - A counting semaphore
The following code illustrates a simple example of a producer/consumer pair where the producer is allowed to create data values until the data table cannot hold any more, at which point the producer is blocked from creating values until some values are consumed:
 Working-Storage Section.
 78 table-size value 20.
 01 produced-semaphore  usage semaphore-pointer.
 01 filler.
    05 table-data-value pic x(4) comp-x
       occurs table-size times value 0.
 01 consumed-semaphore  usage semaphore-pointer.
 Local-Storage Section.
 01 table-i pic x(4) comp-x.

*> Initialization code executed while in single threaded mode
     open produced-semaphore
     open consumed-semaphore
     set consumed-semaphore up by table-size  *> Start raised 20 times

*> Producer thread
     move 1 to table-i
     perform until 1 = 0
         set consumed-semaphore down by 1
         add table-i to table-data-value(table-i)
         set produced-semaphore up by 1
         add 1 to table-i
         if table-i > table-size
             move 1 to table-i
         end-if
     end-perform.

*> Consumer thread
     move 1 to table-i
     perform until 1 = 0
         set produced-semaphore down by 1
         display 'Current produced value is'
                 table-data-value(table-i)
         set consumed-semaphore up by 1
         add 1 to table-i
         if table-i > table-size
             move 1 to table-i
         end-if
     end-perform.
An event is similar to a semaphore in that it enables one thread to signal to another that something has happened that requires attention. An event is, for several reasons, more flexible and slightly more complex. One reason is that an event, once posted, must be explicitly cleared.
Example - Using Event Synchronization
The following is an example of code that solves the Producer-Consumer problem by using event synchronization instead of semaphore synchronization:
 Working-Storage Section.
 01 produced-event  usage event-pointer.
 01 data-value      pic x(4) comp-x value 0.
 01 consumed-event  usage event-pointer.

*> Initialization code executed while in single threaded mode
     open produced-event
     open consumed-event
     set consumed-event to true    *> Initialize as 'posted'

*> Protocol for the producer side
     wait for consumed-event
     set consumed-event to false   *> Clear event
     add 10 to data-value
*> Signal that data value has changed
     set produced-event to true    *> Post event

*> Protocol for the consumer side, waiting for the data-value to change
     wait for produced-event
     set produced-event to false   *> Clear event
     display data-value
*> Signal other thread that it can proceed, this thread has data-value
     set consumed-event to true    *> Post event
If there are only two threads (the producer and consumer threads) executing the above code, then everything works as expected. If another thread comes in and starts executing as a producer or consumer, unexpected behavior can occur. This is because an event, once posted, wakes up all threads waiting for that event, unlike a semaphore, which enables only one thread to pass after the semaphore has been released. After an event has been posted and the waiting threads are woken, each of the woken threads must determine whether it should take some form of action on the event (including clearing the event).
The last point can make events difficult to work with when there are multiple waiting threads but it also enables the building of custom synchronization objects for special needs.
Copyright © 2000 MERANT International Limited. All rights reserved.
This document and the proprietary marks and names
used herein are protected by international law.