Mapping Java Data Types

When interfacing with external Java systems, it is often necessary to map COBOL data types to Java data types. Following are the Java primitive types:

boolean - 8 bits, unsigned 
byte - 8 bits, signed 
char - 16 bits, unsigned 
short - 16 bits, signed 
int - 32 bits, signed 
float - 32 bits, signed, floating point 
long - 64 bits, signed 
double - 64 bits, signed, floating point
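The widths and ranges listed above can be checked from the Java side with the constants on the standard wrapper classes. The following is a minimal sketch; the class name PrimitiveSizes and the helper describe are illustrative only. Java does not expose a SIZE constant for boolean, so that type is omitted here.

```java
// Illustrative sketch: report the bit width and range of Java's integral primitives.
class PrimitiveSizes {
    // Formats one line of the report; a hypothetical helper, not part of any product API.
    static String describe(String name, int bits, long min, long max) {
        return name + ": " + bits + " bits, " + min + " to " + max;
    }

    public static void main(String[] args) {
        System.out.println(describe("byte", Byte.SIZE, Byte.MIN_VALUE, Byte.MAX_VALUE));
        System.out.println(describe("char", Character.SIZE, Character.MIN_VALUE, Character.MAX_VALUE));
        System.out.println(describe("short", Short.SIZE, Short.MIN_VALUE, Short.MAX_VALUE));
        System.out.println(describe("int", Integer.SIZE, Integer.MIN_VALUE, Integer.MAX_VALUE));
        System.out.println(describe("long", Long.SIZE, Long.MIN_VALUE, Long.MAX_VALUE));
        // Floating-point types expose their width the same way.
        System.out.println("float: " + Float.SIZE + " bits");
        System.out.println("double: " + Double.SIZE + " bits");
    }
}
```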

All other types are objects that are composed of other objects or primitive types. For example, the String type is an object. Note that the encoding of native Java strings and numbers in memory may differ from the way you represent data in COBOL.

Java represents longs as 64 bits even on 32-bit platforms. On 32-bit platforms, the ACUCOBOL-GT runtime truncates longs to 32 bits regardless of whether the "-Dw32" or "-Dw64" flag is used for compilation. To interoperate with Java longs and use all 64 bits on a 32-bit platform, you must use the PIC S9(18) COMP-5 declaration shown below. Also, to use the entire range available to Java shorts (-32768 to 32767) and ints (-2147483648 to 2147483647) with USAGE IS SIGNED-SHORT and USAGE IS SIGNED-INT declarations, the "-Dw32" flag must be specified at compile time.
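To see why truncating a 64-bit value to 32 bits matters, the sketch below uses Java's own narrowing cast, which keeps only the low-order 32 bits. This is an illustration of the information that is lost, under the assumption that the effect is comparable; it does not claim to reproduce how the runtime performs its truncation. The class and method names are hypothetical.

```java
// Illustrative sketch: what is lost when a 64-bit long is narrowed to 32 bits.
class LongNarrowing {
    // Keeps only the low-order 32 bits, as a Java narrowing cast does.
    static int low32(long value) {
        return (int) value;
    }

    // True when the value survives a round trip through 32 bits unchanged.
    static boolean fitsInInt(long value) {
        return value == (int) value;
    }

    public static void main(String[] args) {
        long small = 2_147_483_647L;   // largest value a Java int can hold
        long big = 5_000_000_000L;     // needs more than 32 bits

        System.out.println(small + " fits in 32 bits: " + fitsInInt(small));
        System.out.println(big + " fits in 32 bits: " + fitsInInt(big));
        System.out.println(big + " narrowed to 32 bits: " + low32(big));
    }
}
```

A check such as fitsInInt on the Java side is one way to detect values that would not survive a 32-bit field before they are passed across.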

The following sample declarations have been used to test COBOL/Java interoperability.

01 FIELD-INT USAGE IS SIGNED-INT. 
01 FIELD-BOOL pic 9. 
01 FIELD-BYTE pic x. 
01 FIELD-CHAR pic x. 
01 FIELD-SHORT USAGE IS SIGNED-SHORT. 
01 FIELD-LONG PIC S9(18) COMP-5. 
01 FIELD-FLOAT USAGE IS FLOAT. 
01 FIELD-DOUBLE USAGE IS DOUBLE. 
01 FIELD-STRING PIC X(80).

Another method of declaring ints and shorts is shown below. With these two declarations, the "--TruncANSI" compiler switch is required so that range checking matches the range that the native platform allows. See Truncation Options for information about the "--TruncANSI" option.

01 FIELD-INT PIC S9(9) COMP-5. 
01 FIELD-SHORT PIC S9(5) COMP-5.

Currently, ACUCOBOL-GT converts Unicode UTF-16 Java strings to UTF-8 for representation in PIC X variables. If your program uses supplementary characters (code points that require more than 16 bits to represent) or if it uses UTF-32, you should use arrays of Java ints to represent the data.
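On the Java side, a string that contains supplementary characters can be turned into an array of ints with the standard String.codePoints method, and rebuilt with the String(int[], int, int) constructor. The following is a minimal sketch; the class and method names are hypothetical, and how the array is then passed to COBOL is outside its scope.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: represent a Java string as an array of Unicode code points.
class CodePointDemo {
    // One int per code point, so a surrogate pair becomes a single element.
    static int[] toCodePoints(String s) {
        return s.codePoints().toArray();
    }

    // Rebuilds the string from its code points.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // 'A' followed by U+1F600, which UTF-16 stores as the surrogate pair D83D DE00.
        String text = "A\uD83D\uDE00";

        System.out.println("UTF-16 code units: " + text.length());
        System.out.println("Code points:       " + toCodePoints(text).length);
        System.out.println("UTF-8 bytes:       " + text.getBytes(StandardCharsets.UTF_8).length);
    }
}
```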

If your Java strings or PIC X data items contain characters outside of the ISO-8859-1 range, you need to tell the runtime which character set to use by specifying it in the A_JAVA_CHARSET runtime configuration variable. The default setting is "ISO-8859-1". Be aware of the common misconception that ISO-8859-1 is equivalent to Windows-1252. This is true for the most part, but the two differ in the range 0x80 - 0x9F. Windows-1252 uses these values for letters and punctuation, while ISO-8859-1 uses them for control codes.
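The difference between the two character sets can be observed by decoding the same byte with each one in Java. The sketch below assumes that the JVM in use provides the "windows-1252" charset, which is not among the charsets every Java implementation is required to support. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: decode one byte as ISO-8859-1 and as Windows-1252.
class CharsetDifference {
    // Returns the Unicode code point that a single byte decodes to in the given charset.
    static int decodeSingleByte(int byteValue, Charset charset) {
        return new String(new byte[] {(byte) byteValue}, charset).codePointAt(0);
    }

    public static void main(String[] args) {
        // Assumption: this charset name is available on the JVM in use.
        Charset windows1252 = Charset.forName("windows-1252");

        int[] samples = {0x80, 0xE9};  // one byte inside 0x80 - 0x9F, one outside
        for (int b : samples) {
            System.out.printf("byte 0x%02X: ISO-8859-1 gives U+%04X, Windows-1252 gives U+%04X%n",
                    b,
                    decodeSingleByte(b, StandardCharsets.ISO_8859_1),
                    decodeSingleByte(b, windows1252));
        }
    }
}
```

Byte 0x80 decodes to the control code U+0080 under ISO-8859-1 and to the euro sign U+20AC under Windows-1252, whereas 0xE9 decodes to U+00E9 under both.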

Note that Java implementations represent data in big-endian format regardless of platform. For considerations on moving data between big-endian and little-endian hosts, see C$SOCKET and Usage Clause.

Note: Be careful when sending numeric data across the network via sockets, because some machines use different byte ordering than others, and native numeric data can appear swapped on different machines. COMP-4 data is in the order that most network servers expect binary data to be in, so if you are communicating with a non-COBOL client or server, you should use COMP-4 data of the correct size for the machine in question. If your client and server are both COBOL, you can use standard COBOL types.
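The byte-ordering concern described above can be made concrete with java.nio.ByteBuffer, which uses big-endian order unless told otherwise. The following minimal sketch encodes the same int in both orders; the class and method names are hypothetical, and it shows only the Java side of an exchange.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch: the same 32-bit value laid out in big-endian and little-endian order.
class ByteOrderDemo {
    // Encodes an int as four bytes in the requested byte order.
    static byte[] encodeInt(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Renders bytes as space-separated hexadecimal pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        System.out.println("big-endian:    " + hex(encodeInt(value, ByteOrder.BIG_ENDIAN)));
        System.out.println("little-endian: " + hex(encodeInt(value, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

A receiver that assumes the wrong order would read 0x04030201 instead of 0x01020304, which is the "swapped" appearance the note warns about.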