Sampling Options

Basic Options

Use the Sampling Options tab in the Distributed Sampler to specify your sampling criteria.

To export your sampling configuration, you need to provide the following information:

  • your workspace (machine ID, company)
  • your source data store credentials (source database, source schema, source user ID/password)
    Note: The source database is an ODBC DSN or Oracle TNS alias configured to access that data store.
  • types of sampling to be performed
  • directory where you want to place output files for Extension Technology (Output directory)
    Note: Only specify an Output directory if you are exporting your sampling configuration.

To export your configuration, click Export. After you have successfully exported from the Distributed Sampler, the following files are created in the specified output directory:

  • method.rc - Encoded connection criteria for each data store referenced.
    Note: If you have performed export functions for masking or subset extraction, the method.rc file already exists in your config directory. If you export your sampling configuration to the config directory, the previous method.rc file will be overwritten.
  • sampling.dat - Information regarding the data elements to be sampled.
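
Because an export into the config directory overwrites an existing method.rc, you may want to keep a copy of the previous file first. The following is a minimal sketch of that precaution, not part of the product: the helper name backup_method_rc and the .bak suffix are hypothetical, and the temporary directory only stands in for your real output directory.

```python
import shutil
import tempfile
from pathlib import Path


def backup_method_rc(output_dir):
    """Copy method.rc to method.rc.bak in the same directory.

    Hypothetical helper: returns the backup path, or None when there is
    no method.rc to protect.
    """
    source = Path(output_dir) / "method.rc"
    if not source.exists():
        return None
    backup = source.with_name("method.rc.bak")
    shutil.copy2(source, backup)  # copy2 also preserves timestamps
    return backup


# Demonstration in a throwaway directory (stand-in for the output directory).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "method.rc").write_text("placeholder")
    saved = backup_method_rc(tmp)
    backup_name = saved.name
    backup_text = saved.read_text()

# A directory without method.rc yields no backup.
with tempfile.TemporaryDirectory() as empty:
    missing = backup_method_rc(empty)
```

Run a step like this before clicking Export if the previous connection criteria must remain available.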

Note: If your Knowledge Base and the data stores you want to sample are located on the same Windows machine that is running Data Express, you do not need to export configuration information, execute sampling from a script, or load results. Instead, skip those steps and click the Start button in the Distributed Sampler to execute sampling.

Advanced Options

To further control your sampling results, you can specify advanced sampling options, which are accessible from the Advanced button.

Additional Data Element Attributes

You can restrict which data elements are sampled by their data element types:

Advanced Options - Data Element Types

By default, all types are checked. Uncheck a type to remove it as a sampling candidate.

Note: The Binary data element type represents the COBOL binary (COMP field) data type. True binary is not supported.

Data Element Sizes

You can restrict which data elements are sampled by their data element sizes:

Advanced Options - Data Element Sizes

By default, all values are 0 (zero), which means that all data elements are included regardless of their size values.

If a minimum value is set, only data elements whose size is greater than the specified minimum are included. For example, if the Min length is changed to 1, data elements with a length of 2 or greater are included.

Likewise, if a maximum value is set, only data elements whose size is less than the specified maximum are included. For example, if the Max length is changed to 99, data elements with a length of 98 or less are included.
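
The size rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the sampler's actual code: the function name size_in_range and its parameters are hypothetical, and it assumes that both limits are exclusive and that 0 means no restriction on that side.

```python
def size_in_range(length, min_length=0, max_length=0):
    """Return True if a data element of this length is a sampling candidate.

    Assumed semantics: both limits are exclusive, and 0 disables a limit.
    """
    if min_length and length <= min_length:
        return False
    if max_length and length >= max_length:
        return False
    return True


lengths = [1, 2, 50, 98, 99, 120]

# Min length 1 and Max length 99: lengths 2 through 98 qualify.
selected = [n for n in lengths if size_in_range(n, min_length=1, max_length=99)]
print(selected)  # [2, 50, 98]

# Default settings (both 0): every length qualifies.
unrestricted = [n for n in lengths if size_in_range(n)]
```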

Max. and Min. Recalculation Additional Options

You can restrict the Min/Max calculation sampling for data elements:

Advanced Options - Max. and Min. Recalculation Additional Options

The following options are available:

  • Ignore Special Values Zero / Space - If this box is checked, 0 (zero) and special characters such as the space character are ignored in the minimum and maximum value calculations.
  • Out-of-range minimum value - If you want to work with a defined range of data element values, the Out-of-range minimum value field defines the lower boundary value, which is itself outside that range. For example, if 5 is specified, values of 6 and greater are included in the range.

    By default, the Out-of-range minimum value is 0 (zero), which means that all data element values are included.

  • Out-of-range maximum value - If you want to work with a defined range of data element values, the Out-of-range maximum value field defines the upper boundary value, which is itself outside that range.

    By default, the Out-of-range maximum value is 0 (zero), which means that all data element values are included.

  • Use range in Min/Max calculation - If this box is checked, the range indicated by the fields Out-of-range minimum value and Out-of-range maximum value is applied to the Min/Max calculation sampling result.

Known Restriction: The Max. and Min. Recalculation Additional Options currently affect the minimum and maximum data element values reported for standard and compressed sampling.
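
As a rough illustration of how these options interact, the sketch below recalculates the minimum and maximum of a numeric column. It is not the product's implementation: the function recalc_min_max and its parameters are hypothetical. It assumes that both out-of-range values are exclusive boundaries, that 0 disables a boundary, and that the range is applied only when the range option is enabled. Space handling for character data is left out.

```python
def recalc_min_max(values, ignore_zero=False, out_min=0, out_max=0, use_range=False):
    """Return (minimum, maximum) of the values that survive the filters.

    Assumed semantics: out_min and out_max are exclusive boundaries,
    0 means "no boundary", and the range counts only if use_range is True.
    Returns None when no value survives.
    """
    kept = []
    for value in values:
        if ignore_zero and value == 0:
            continue  # Ignore Special Values Zero / Space (zero part only)
        if use_range:
            if out_min and value <= out_min:
                continue  # at or below the out-of-range minimum
            if out_max and value >= out_max:
                continue  # at or above the out-of-range maximum
        kept.append(value)
    if not kept:
        return None
    return min(kept), max(kept)


column = [0, 3, 5, 6, 42, 99, 100, 250]

plain = recalc_min_max(column)
no_zero = recalc_min_max(column, ignore_zero=True)
ranged = recalc_min_max(column, out_min=5, out_max=100, use_range=True)
print(plain, no_zero, ranged)  # (0, 250) (3, 250) (6, 99)
```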

Additional Restriction Options

You can restrict which data elements are sampled based on data element class assignments. You can also reduce the number of data elements to sample from.

Advanced Options - Additional Restriction Options

To add a class to the table shown on the left, select it in the Select a Class list. Only data elements that are assigned to a class specified in the Selected Class list are sampled. This is especially helpful when you have several data stores containing data elements with multiple class assignments.

Number of records is the total number of records to be sampled. The default value is 0, which means that all records are included.

Use the Select distinct values check box to improve the sampling process for large tables. When this box is checked, the SQL SELECT DISTINCT command is used to retrieve only the distinct values of a column, so values that have already been sampled are discarded and the sampling result contains each value once.

Note: The list of data elements is sorted either numerically or alphanumerically, depending on the data element type. Once the record limit is reached, the remaining data element records are not sampled.
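
The effect of Select distinct values and Number of records can be illustrated with an in-memory SQLite table. This is only a sketch: SQLite stands in for your real ODBC or Oracle data store, the customers table, the city column, and the sample_city function are hypothetical, and the exact SQL that the sampler generates is not documented here.

```python
import sqlite3

# In-memory table standing in for a column in a source data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("Rome",), ("Milan",), ("Rome",), ("Turin",), ("Milan",)],
)


def sample_city(distinct=False, number_of_records=0):
    """Return sorted column values; 0 records means no limit (assumed)."""
    keyword = "DISTINCT " if distinct else ""
    sql = f"SELECT {keyword}city FROM customers ORDER BY city"
    params = ()
    if number_of_records:
        sql += " LIMIT ?"
        params = (number_of_records,)
    return [row[0] for row in conn.execute(sql, params)]


all_rows = sample_city()
unique_rows = sample_city(distinct=True)
first_three = sample_city(number_of_records=3)
print(all_rows)     # ['Milan', 'Milan', 'Rome', 'Rome', 'Turin']
print(unique_rows)  # ['Milan', 'Rome', 'Turin']
print(first_three)  # ['Milan', 'Milan', 'Rome']
```

With the limit alone, duplicates use up the record budget and Turin is never reached. With distinct values, each value appears once.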