How to generate data with a data set?

About

This how-to shows you how to generate data with a data set generator.

In these examples, we use a predefined CSV entity file, but you could use any data resource.

This generator resource uses

  • the firstname entity csv file
  • to fill a firstname column

Example

Basic

In this basic example, the data set is located by the data URI value.

This value firstname/firstname_fr.csv@entity locates:

  • the firstname_fr.csv file
  • stored in the entity directory
  • under the firstname/ directory
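
The breakdown above can be sketched in a few lines of Python. This is only an illustration of how the value splits into its parts, not tabul's own parser: the names `DataUri`, `parse_data_uri`, `path` and `location` are hypothetical.

```python
from typing import NamedTuple


class DataUri(NamedTuple):
    """Hypothetical container for the two parts of a data URI."""
    path: str      # e.g. "firstname/firstname_fr.csv"
    location: str  # e.g. "entity" (the part after the '@')


def parse_data_uri(data_uri: str) -> DataUri:
    """Split 'path@location' on the last '@' (illustrative only)."""
    path, sep, location = data_uri.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' found in data URI: {data_uri!r}")
    return DataUri(path, location)


uri = parse_data_uri("firstname/firstname_fr.csv@entity")
# Split the path into its directory and its file name.
directory, _, file_name = uri.path.rpartition("/")
print(uri.location, directory, file_name)
# prints: entity firstname firstname_fr.csv
```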

The content of the data set is:

tabul data head firstname/firstname_fr.csv@entity
The first 10 rows of the data resource (firstname/firstname_fr.csv@entity):
firstname   gender   probability
---------   ------   --------------------------
Aadam       M        3.14396359513727576E-7
Aadel       M        6.52081338250694231E-7
Aadil       M        0.000002142552968537995330
Aahil       M        2.44530501844010337E-7
Aakash      M        3.02752049902108036E-7
Aalia       F        4.77416694076401133E-7
Aaliya      F        0.000002422016399216864286
Aaliyah     F        0.000028086074783226330085
Aalya       F        0.000001490471630287301099
Aalyah      F        0.000002585036733779537844

Using this data set, we can generate 10 first names with this generator file.

kind: generator
spec:
  maxRecordCount: 10
  columns:
    - name: firstname
      type: varchar
      data-supplier:
        type: data-set
        arguments:
          dataUri: firstname/firstname_fr.csv@entity
          column: firstname # for demo purpose as this is the default value


tabul data print generator/dataset-basic--generator.yml@howto
firstname
------------
Lëana
Yelenna
Éliam
Aïley
Tinhinan
Jaufret
Mehmetali
Vyns
Ycham
Jale
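
To make the idea concrete, here is a minimal Python sketch of what a data set generator does conceptually: read a column from a data set and draw values from it. It is not tabul's implementation. The function name `generate_column`, the comma delimiter and the uniform random draw are all assumptions (this page does not say whether the probability column influences the draw), and the sample rows are a few of those shown in the preview above.

```python
import csv
import io
import random

# A few rows taken from the preview above; the comma delimiter is an assumption.
SAMPLE_CSV = """firstname,gender,probability
Aadam,M,3.14396359513727576E-7
Aahil,M,2.44530501844010337E-7
Aalia,F,4.77416694076401133E-7
Aaliyah,F,0.000028086074783226330085
"""


def generate_column(csv_text, column="firstname", max_record_count=10, seed=None):
    """Draw max_record_count values from one column of a CSV data set.

    Assumption: values are drawn uniformly at random, with replacement.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [row[column] for row in rows]
    rng = random.Random(seed)
    return [rng.choice(values) for _ in range(max_record_count)]


names = generate_column(SAMPLE_CSV, seed=42)
print("\n".join(names))
```

The `column` and `max_record_count` parameters mirror the column and maxRecordCount attributes of the generator file only loosely; they are named here for readability.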

Meta Column Dependency

A firstname depends on the gender. Each entity may have one or more meta columns such as gender.

To express this dependency, you can use the metaColumns attribute to map:

  • a local column (i.e. from the generator)
  • to an entity column (i.e. from the data set)

kind: generator
spec:
  maxRecordCount: 30
  columns:
    - name: gender
      type: varchar
      data-supplier:
        type: histogram
        arguments:
          buckets:
            M: 1.0
            F: 2.0
    - name: firstname
      type: varchar
      data-supplier:
        type: data-set
        arguments:
          dataUri: firstname/firstname_fr.csv@entity
          column: firstname # for demo purpose as this is the default value
          metaColumns:
            gender: gender


tabul data print generator/dataset-meta-columns--generator.yml@howto
gender   firstname
------   ----------------
M        Louis-gabriel
F        Amany
M        Guerino
M        Loucian
F        Soufia
F        Ieva
M        Lowen
F        Kellycia
M        Elya
F        Maïlyn
F        Sarina
F        Janel
F        Andy
F        Romaysa
F        Bélinda
M        Mayel
F        Menekse
F        Silja
F        Anne-elise
F        Shaé
F        Houyem
F        Razanne
F        Aysel
M        Remuald
M        Tajeddine
F        Tyhana
F        Elorie
F        Mehdia
F        Marie-thereze
F        Eugénie
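
The dependency shown above can be sketched in Python as a two-step draw: first pick the gender, then pick a first name among the data set rows with that gender. This is a conceptual sketch and not tabul's implementation. It assumes that the bucket values act as relative weights and that first names are drawn uniformly within a gender; `ROWS`, `BUCKETS` and `generate_records` are hypothetical names, and the rows are taken from the data set preview earlier on this page.

```python
import random

# (firstname, gender) pairs taken from the data set preview.
ROWS = [
    ("Aadam", "M"), ("Aadel", "M"), ("Aadil", "M"),
    ("Aalia", "F"), ("Aaliya", "F"), ("Aaliyah", "F"),
]

# Same buckets as the histogram supplier; assumed to be relative weights.
BUCKETS = {"M": 1.0, "F": 2.0}


def generate_records(max_record_count=30, seed=None):
    """Return (gender, firstname) pairs where the first name matches the gender."""
    rng = random.Random(seed)

    # Index the data set by its meta column (gender).
    by_gender = {}
    for firstname, gender in ROWS:
        by_gender.setdefault(gender, []).append(firstname)

    genders = list(BUCKETS)
    weights = list(BUCKETS.values())
    records = []
    for _ in range(max_record_count):
        # Step 1: the local gender column is generated first.
        gender = rng.choices(genders, weights=weights, k=1)[0]
        # Step 2: the first name is drawn only from rows with that gender.
        records.append((gender, rng.choice(by_gender[gender])))
    return records


for gender, firstname in generate_records(max_record_count=5, seed=1):
    print(gender, firstname)
```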



