The config.yaml file is not shipped with the S2PX distribution media. You must create this file yourself using your favourite text editor.
You can find the example file described on this page available here.
All configurable aspects of the S2PX conversion process are guided by settings in a YAML-formatted configuration file. This file can be named anything, but for the purposes of this documentation we will refer to it as config.yaml. This file has a number of distinct sections, responsible for different areas of the S2PX conversion process.
The transfer section
This section of the config.yaml file is mandatory.
transfer: # General S2PX settings
strategy: RENAME_PARALLEL_JOB # The job naming strategy: Valid values are RENAME_PARALLEL_JOB or BACKUP_SERVER_JOB
suffix: Px # The job name suffix to be used when renaming assets
The transfer section allows you to specify how the S2PX conversion process handles the potential naming conflict between Server assets and their Parallel equivalents. Valid strategy values are:
- RENAME_PARALLEL_JOB: The generated Parallel Jobs are assigned new names, and the existing Server Jobs are unaffected, or
- BACKUP_SERVER_JOB: The original Server Jobs are renamed, and the generated Parallel assets (Jobs or Sequences, as appropriate) assume the name of the Server Job used to generate them.
S2PX derives a unique name for the new Parallel assets by appending a suffix (specified in the transfer section’s suffix value) to the end of the original asset name. These names are also used by newly-created Job references. Job Sequences generated using the RENAME_PARALLEL_JOB naming strategy, for example, will refer to the generated Parallel Jobs by their suffix-bearing names.
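To make the naming behaviour concrete, here is a sketch using the RENAME_PARALLEL_JOB strategy; the job name LoadCustomers is invented for illustration:

```yaml
transfer:
  strategy: RENAME_PARALLEL_JOB   # Leave the existing Server Jobs untouched
  suffix: Px                      # Appended to derive the Parallel asset names
# With a Server Job named 'LoadCustomers' (a hypothetical name), the generated
# Parallel Job is named 'LoadCustomersPx', and any generated Job Sequence
# refers to it by that suffix-bearing name.
```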
The decomposition section
This section of the config.yaml file is mandatory.
decomposition:
bufferDirectory: /data/project/sv2px/transient # A directory for the storage of temporary information during conversion
mode: OPTIMIZE_RUNTIME # (OPTIONAL) Optimise performance or minimize jobs. Valid values are OPTIMIZE_RUNTIME and MINIMIZE_JOBS
Job decomposition requires a path (accessible from DataStage jobs) in which to place any ‘buffer’ DataSets required during decomposition. For the vast majority of Server Jobs this will never be used, but the path must be specified in the configuration regardless. The optional decomposition mode (specifying whether decomposition optimizes runtime performance or minimizes the number of generated jobs) defaults to OPTIMIZE_RUNTIME if no value for mode is provided.
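Because mode is optional, a minimal decomposition section needs only the buffer directory (the path below is the one from the example above):

```yaml
decomposition:
  bufferDirectory: /data/project/sv2px/transient  # mode defaults to OPTIMIZE_RUNTIME
```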
The hashedFiles section
The hashedFiles section allows you to set defaults for the Parallel DRS stage used to replace Server Hashed Files.
Example:
hashedFiles: # Special settings for handling the translation of Server Hashed Files
type: ORACLE # The type setting to be used for the generated DRS Connector stages. Valid values are DB2, ORACLE or ODBC
variant: 11 # the DRS variant setting
connection: MCIDEMO # the DRS connection name
username: MCITESTDEV # the DRS database username (this can also be a job parameter name if required)
password: '{iisenc}CeGmkL8fluDU7OVw==' # the DRS database password (this can also be a job parameter name if required)
schema: myuser # (OPTIONAL) Prefix all Hashed File tablenames with the schema value (e.g. myuser.tablename)
For more information about where these settings are applied within the resulting stage’s configuration, refer to Defining a DRS Connector stage connection to the database - IBM Documentation.
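Since the username and password values can also be given as job parameter names rather than literals, a sketch such as the following is possible; the parameter names DBUser and DBPassword are invented for illustration:

```yaml
hashedFiles:
  type: ORACLE
  variant: 11
  connection: MCIDEMO
  username: DBUser      # A job parameter name (hypothetical) instead of a literal username
  password: DBPassword  # A job parameter name (hypothetical) instead of a literal password
```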
The customRoutines section
The customRoutines section supports two distinct aspects of S2PX conversion:
- mapping: Maps Server Routine calls to equivalent Parallel expressions. This mechanism is also used by S2PX itself to map Server Functions without an identically-named Parallel equivalent to a Parallel expression which achieves the same effect.
- libraries: Provides the specification for new Parallel Routines to act as user-created replacements for Server Custom Functions.
The customRoutines mapping section
The mapping clause within the customRoutines section of config.yaml allows you to map Server Functions without an identically-named Parallel equivalent to a Parallel expression which achieves the same effect. For example:
customRoutines: # A list of mappings from legacy Server functions to Parallel equivalent expressions
mapping: #
AddOneToNum: '{0} + 1' # 'AddOneToNum(MyParam)' --becomes--> 'MyParam + 1'
GetBalanceInUSD: '{0} * 0.75' # 'GetBalanceInUSD(MyParam)' --becomes--> 'MyParam * 0.75'
This mapping mechanism is also used by S2PX to perform its out-of-the-box function mappings, which you can edit if desired.
Mappings are NOT iterative, so mapping entries may not include a reference to another mapping. Here’s an example of an invalid mapping:
customRoutines: # A list of mappings from legacy Server functions to Parallel equivalent expressions
mapping: #
AddOneToNum: '{0} + 1' # 'AddOneToNum(MyParam)' is translated to 'MyParam + 1'
AddTwoToNum: 'AddOneToNum(AddOneToNum({0}))' # 'AddTwoToNum(MyParam)' is invalid, as it references another mapping
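Because mappings cannot reference one another, the intended composition must be written out inline. A valid equivalent of the invalid entry above would be:

```yaml
customRoutines:
  mapping:
    AddOneToNum: '{0} + 1'        # 'AddOneToNum(MyParam)' --becomes--> 'MyParam + 1'
    AddTwoToNum: '({0} + 1) + 1'  # The AddOneToNum logic expanded inline, applied twice
```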
The customRoutines libraries section
Where native DataStage Parallel engine functions do not provide a functional equivalent to a Server routine, you may need to specify a mapping to a new Parallel function which you will create in C++. The libraries clause within the customRoutines section of config.yaml allows you to describe the type signature of each of your newly-created Parallel functions, along with details of the library in which they can be found.
Here’s an annotated example illustrating the file’s structure and available settings:
libraries: # A list of libraries within which custom routines are defined
-
name: mylib # The name of a library: 'mylib'
path: /path/to/lib # The location of the mylib library
type: SHARED_LIBRARY # The library type. Valid values are SHARED_LIBRARY (for .so files) and OBJECT_FILE (for .o files)
routines: # A list of the routines available within the mylib library
ACustomRoutine: # A routine within the mylib library
type: STRING # The routine's return type
externalName: a_custom_func # Name of the C/C++ function in the specified library
arguments: # The arguments (parameters) to the function, described as pairs of...
-
name: arg1 # parameter names, and
type: FLOAT # parameter datatype. Valid values are CHAR, STRING, DOUBLE, FLOAT, INT, LONG, SHORT, UCHAR, UINT, ULONG or USHORT.
-
name: arg2 # etc.
type: STRING # etc.
See Custom Routine conversion for more details.