Differences between Server and Parallel VarChar

There are some major differences between the way the Server and Parallel engines handle strings. Some of these differences (described in more detail below) are only made apparent by the processing of certain data at runtime, which is a condition that cannot be detected by a static code analysis and automated conversion solution such as S2PX. For this reason some generated Parallel jobs may require a degree of manual remediation.

String processing differences relevant to S2PX conversion and described here are …

Quote characters inside strings

Consider the situation where a DataStage Server Job is reading data from a Sequential file with the following example contents:

#	Item : VARCHAR(36)	Description : VARCHAR(256)
1	“Apple”	“A sweet juicy apple”
2	“Golden Apple“	“An apple with special powers, gives regeneration effect”
3	“Enchanted Golden Apple”	“Also referred to as ““Notch Apple”” gives strength and all elemental resistances”

A potential issue arises in the Description value on row 3 where quote characters occur inside a supplied string. Server jobs handle this situation by automatically escaping each in-line quote character with another quote (i.e., using two adjacent quotes). The Parallel engine, on the other hand, does not provide an automated solution to this challenge and will generate invalid CSV data for those data:

#	Item : VARCHAR(36)	Description : VARCHAR(256)
1	“Apple”	“A sweet juicy apple”
2	“Golden Apple“	“An apple with special powers, gives regeneration effect”
3	“Enchanted Golden Apple”	“Also referred to as “Notch Apple” gives strength and all elemental resistances”

Solutions

The simplest solution to this problem is to manually change the Quote property of the generated Parallel Sequential File stage to use a quote character which does not oitherwise appear in the text being processed.

Screen Shot 2022-09-05 at 5.02.56 pm.png

This, of course, changes either what is output to the sequential file or alters the job’s expected input. Either way, it has further implications for your project that you will need to account for.

Empty Strings and Null Strings

Server strings provide null handling behaviour which cannot be fully replicated by Parallel Sequential Files. Consider the following example:

#	Item : VARCHAR(36)	Description : VARCHAR(256)
1	“Apple”	“”
2	“Golden Apple“
3	“Enchanted Golden Apple”	“Also referred to as Notch Apple gives strength and all elemental resistances”

The Server engine will interpret the Description values in rows 1 and 2 as being different (i.e., row 1 is an empty string and row two is null). The Parallel engine, on the other hand, does not differentiate between null and empty strings. Parallel jobs do support null handling in Transformer stages, but not when reading data using Sequential File stages. You could define the Sequential File stage property null_field_value="" which will force empty strings to be interpreted as null, which may or may not be adequate to meet your job’s functional requirements. In this case the data are interpreted thus:

#	Item : VARCHAR(36)	Description : VARCHAR(256)
1	“Apple”	“”
2	“Golden Apple“	““
3	“Enchanted Golden Apple”	“Also referred to as Notch Apple gives strength and all elemental resistances”