Data Resource - Generator Manifest


About

A data resource generator is a content data resource that generates its data: its records are produced by the data supplier defined on each of its columns.

Syntax

A generator is defined in a resource manifest file: a YAML file of kind generator whose spec follows the data definition format.

Example:

kind: generator
# Spec follows the data definition format
spec:
  # The maximum number of records generated
  max-record-count: 30
  # The number of records generated for each resource in a generate stream supplier
  stream-record-count: 10
  # The column definitions
  columns:
    - name: columnName
      comment: A column with a sequence integer generator and its properties
      data-supplier: # the data supplier
        type: sequence
        arguments:
          start: 3
          step: 2
          maxTick: 5
    - name: columnName2
      ........

See the column supplier page for all the types of data generation that you can choose for a column.

Attributes

max-record-count

max-record-count defines the maximum number of records that the generator generates.

stream-record-count

stream-record-count defines the number of records generated for each resource in a generate stream operation.
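To make the difference between the two attributes concrete, here is a minimal Python model. It assumes that a generate stream emits successive resources of at most stream-record-count records each and stops once max-record-count records have been produced. The function stream_batches is hypothetical and only illustrates the arithmetic; it is not Tabulify code.

```python
from typing import Iterator, List


def stream_batches(max_record_count: int, stream_record_count: int) -> Iterator[List[int]]:
    """Hypothetical model of a generate stream (not Tabulify's implementation).

    Yields one list per generated resource, each holding at most
    stream_record_count records, until max_record_count records exist.
    Records are represented only by their 1-based position.
    """
    if stream_record_count <= 0:
        raise ValueError("stream_record_count must be positive")
    produced = 0
    while produced < max_record_count:
        # The last resource may be smaller if the cap is not a multiple
        size = min(stream_record_count, max_record_count - produced)
        yield list(range(produced + 1, produced + size + 1))
        produced += size


# With the manifest values above (max 30, 10 per stream resource)
print([len(batch) for batch in stream_batches(30, 10)])  # [10, 10, 10]
```

Under this assumption, a max-record-count of 25 with a stream-record-count of 10 would give resources of 10, 10 and 5 records.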

Count

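The count attribute is not set in the manifest: it is calculated and gives the number of records that a generator would generate.

You can see it with the data info command.

For example, a sequence generator on an integer column could generate records up to the maximum integer value, 2147483647 (the SIZE_NOT_CAPPED attribute below). Because max-record-count is 100, the count is 100.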

tabul data info --strict-selection count--generator.yml@howto
Information about the data resource (count@memgen)
attribute             value                                                                                              description
-------------------   ------------------------------------------------------------------------------------------------   -----------------------------------------------------
MAX_RECORD_COUNT      100                                                                                                The maximum of records generated
SIZE                  100                                                                                                The size
SIZE_NOT_CAPPED       2147483647                                                                                         The number of records without max
STREAM_RECORD_COUNT                                                                                                      The records generated in a stream
ABSOLUTE_PATH         count                                                                                              The absolute path on the data system
ACCESS_TIME                                                                                                              The access time (access time)
COMMENT                                                                                                                  A comment
CONNECTION            memgen                                                                                             The connection name
COUNT                 100                                                                                                The number of records
CREATION_TIME         2025-11-10 19:48:59.159433321                                                                      The creation time (birth time)
DATA_URI              count@memgen                                                                                       The data uri
KIND                  generator                                                                                          The kind of media
LOGICAL_NAME          count                                                                                              The logical name
MD5                   ef69caaaeea9c17120821a9eb6c7f1de                                                                   The Md5 hash
MEDIA_SUBTYPE         vnd.tabulify.generator+yaml                                                                        The media subType
MEDIA_TYPE            text/vnd.tabulify.generator+yaml                                                                   The media type
NAME                  count                                                                                              The name of the data resource
PARENT                gen                                                                                                The parent
PATH                  count                                                                                              The relative path to the default connection path
SHA384                0c9b6656498be26d413bf3563198f01be3236d017f75943f9406922d08ba4ec137ffde15d2e95dcb4d77f9d6cd6eec79   The Sha384 hash
SHA384_INTEGRITY      sha384-DJtmVkmL4m1BO/NWMZjwG+MjbQF/dZQ/lAaSLQi6TsE3/94V0uldy013+dbNbux5                            The sha384 value used in the html integrity attribute
TABULAR_TYPE          data                                                                                               The tabular type
UPDATE_TIME                                                                                                              The last update time (modify time)
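The output above is consistent with the count being the uncapped size limited by max-record-count (COUNT 100 = min(100, 2147483647)). The sketch below models that relation; the function generator_count is hypothetical and inferred from this output, not taken from Tabulify.

```python
# Largest 32-bit signed integer, reported as SIZE_NOT_CAPPED above
INT_MAX = 2**31 - 1


def generator_count(max_record_count: int, size_not_capped: int) -> int:
    """Hypothetical model of the COUNT attribute.

    size_not_capped: records the column suppliers could produce without a cap
    max_record_count: the max-record-count attribute of the generator
    """
    return min(max_record_count, size_not_capped)


print(generator_count(100, INT_MAX))  # 100, as in the COUNT row
```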

MediaType

Manifest file

The media type of a generator manifest file is text/vnd.tabulify.generator+yaml.

Manifest fragment

The media type of a generator fragment in a define step (i.e. an inline generator) should be text/vnd.tabulify.generator+yaml-fragment.

Example:

kind: pipeline
spec:
  steps:
    - name: "Define"
      operation: "define"
      args:
        data-resource:
          # The media-type below ends with the `fragment` suffix
          # This special media-type makes it possible to define a generator inline, as YAML, in a `define` step
          media-type: text/vnd.tabulify.generator+yaml-fragment
          data-def:
            logical-name: "my-sequence"
            max-record-count: 5
            columns:
              - name: "id"
                type: integer
                data-supplier:
                  type: sequence
    - name: "Print"
      comment: "Print the sequence"
      operation: "print"
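The two media types differ only in their subtype suffix. The data info output earlier splits the media type into MEDIA_TYPE and MEDIA_SUBTYPE around the slash; the sketch below does the same split and checks for the fragment suffix. The helper names split_media_type and is_inline_fragment are hypothetical illustrations, not part of Tabulify.

```python
from typing import Tuple

FILE_MEDIA_TYPE = "text/vnd.tabulify.generator+yaml"
FRAGMENT_MEDIA_TYPE = "text/vnd.tabulify.generator+yaml-fragment"


def split_media_type(media_type: str) -> Tuple[str, str]:
    """Split a 'type/subtype' media type into its two parts."""
    top, sep, subtype = media_type.partition("/")
    if not sep or not top or not subtype:
        raise ValueError(f"not a type/subtype media type: {media_type!r}")
    return top, subtype


def is_inline_fragment(media_type: str) -> bool:
    """True when the media type names an inline generator fragment."""
    _, subtype = split_media_type(media_type)
    return subtype.endswith("+yaml-fragment")


print(split_media_type(FILE_MEDIA_TYPE))  # ('text', 'vnd.tabulify.generator+yaml')
print(is_inline_fragment(FRAGMENT_MEDIA_TYPE))  # True
print(is_inline_fragment(FILE_MEDIA_TYPE))  # False
```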


Creation

Generators can be created:

  • manually, by creating a YAML manifest file
  • automatically from pipeline input with the fill data operation



Related HowTo
Data Generator - How to generate a date dimension ?

A date dimension is a typical case for data generation and this article shows you how to generate it.
How to generate a number with the Regular Expression Generator?

This howto will show you how to generate a number with the Regular Expression Generator. The following expression will generate a double: where: [0-9]{3} asks for 3 digits \. print a point ...
How to generate data with a data set?

This howto will show you how to generate data with a data set generator. In these examples, we use a predefined csv entity file but you could use any data resource, such as a sql table, a sql query...
How to generate data with an entity?

This howto will show you how to generate data with an entity generator. Example of a basic entity generator resource that uses the firstname entity to fill a firstname column. You can see the output...
How to retrieve a third column from a data set generator ?

This howto will show you how to use the data set meta generator to retrieve a meta column (ie third column) from a data set defined in a data set generator (ie data set or entity generator) In this...
How to use the define operation?

This howto shows you how to use the define data operation to create data resources in the pipeline file (ie inline data resources) This operation is only available as step in a pipeline (ie not in a tabul...
How to write a Javascript expression generator?

This howto will show you how to write an expression for an expression generator. An expression generator generates data from another column based on an expression. This example generates a times table...
Learning Tabulify - Step 9 - How to fill a data resource with generated data ?

Tabulify integrates natively a data generator. You can generate realistic production data and start working on your project right away. anonymize production data in your development environment because...
SQLite - How to fill a table with a resource data generator?

This how-to will show you how to define the generation of data via a data definition file and load it into a table via the tabul data fill operation.
Tabulify - How to create a CSV File with generated data

This article shows you how to generate a csv file from a generator tabular file.
