Memory Corruption

Describes common issues that can cause memory corruption in PL/I applications.

Memory corruption is one of the hardest defects to track down in an application, and it requires a dedicated professional programmer to isolate the issue logically and methodically. Even when "it worked on the mainframe," an application might not behave the same way under Open PL/I. The following is a discussion of the differences between mainframe and Windows/Linux/UNIX-based computing, together with some very simple examples of these differences.

Stack direction

The stack grows in a different direction on z/OS than it does on most distributed platforms, particularly Intel-based platforms. If an application performs pointer arithmetic and calculates an address incorrectly, a stray write that would have landed in unused stack space on z/OS can easily corrupt a live item on an Intel-based platform.

Stack size

On Windows, the maximum stack size and the commit size are embedded in the executable used to start the process. This differs from z/OS-based applications, where you can control the available stack size dynamically through run-time system options and/or JCL parameters.

On Linux/UNIX, the available stack size is controlled in two ways: through the ulimit settings and through the COBMAINSTACK environment variable.
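
As a starting point for checking these two settings, the following minimal sketch reports the stack limits and the COBMAINSTACK value that a process started from the current shell would inherit. The variable names soft_limit and hard_limit are illustrative only. The script does not set COBMAINSTACK, because the appropriate value and its units depend on your product documentation.

```shell
#!/bin/sh
# Report the stack limits inherited by processes started from this shell.
# Most shells print the value in units of 1024 bytes, or the word "unlimited".
soft_limit=$(ulimit -S -s)   # soft limit: the limit currently in effect
hard_limit=$(ulimit -H -s)   # hard limit: the ceiling for the soft limit

echo "soft stack limit: ${soft_limit}"
echo "hard stack limit: ${hard_limit}"

# COBMAINSTACK is only reported here, never changed.
echo "COBMAINSTACK: ${COBMAINSTACK:-<not set>}"
```

Run the script in the same shell session, and under the same user account, that starts the failing application, because limits and environment variables are inherited per process. A soft limit can normally be raised only as far as the hard limit.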

If you use very large automatic variables, and your program dies unexpectedly when executing a CALL statement or attempting to invoke a function, then this is the first place you should look. A compile listing with the -map option can be exceptionally useful in identifying AUTOMATIC variables that are larger than expected.

Diagnosing memory corruption can be further complicated by turning on CTF tracing, or even by introducing diagnostic code, because either can move the point of failure or eliminate the failure altogether. When a CTF trace or diagnostic code makes a problem disappear, the cause is typically an uninitialized AUTOMATIC variable: the additional logic overwrites the uninitialized stack area differently from the way it was overwritten when the program originally failed. When this happens, step back to the point where you can reliably recreate the failure, and then consider the next steps carefully.

The types of corruption, the tools for investigating them, and the associated processes and procedures are discussed in the following topics: