---json
{
"aliases": [
{ "path": ":howto:generator:dataset" }
],
"page_id": "k57166jzaaj85t6rnd7z3"
}
---
====== How to generate data with a data set? ======
===== About =====
This howto shows you how to generate data with a [[:docs:generator:data-set|data set generator]].
In these examples, we use a predefined CSV [[:docs:generator:entity|entity file]], but you could use any [[:docs:resource:resource|data resource]], such as:
* a [[docs:resource:sql_table|sql table]]
* a [[docs:resource:sql_select|sql query]]
Each [[:docs:resource:generator|generator resource]] of this howto uses the `firstname` entity [[:docs:resource:csv|CSV]] file to fill a `firstname` column.
===== Example =====
==== Basic ====
In this basic example, the data set is located by the [[:docs:resource:data_uri|data URI]] value.
This value ''firstname/firstname_fr.csv@entity'' locates:
* the ''firstname_fr.csv'' file
* stored in the [[:docs:connection:built-in|entity]] directory
* under the ''firstname/'' directory
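To make the structure of the data URI concrete, here is a small Python sketch that splits ''firstname/firstname_fr.csv@entity'' into its parts. It assumes that the connection name is whatever follows the last ''@''. The ''parse_data_uri'' function and the ''DataUri'' type are hypothetical helpers written for illustration; they are not part of Tabulify.

```python
import posixpath
from typing import NamedTuple


class DataUri(NamedTuple):
    """Hypothetical holder for the two parts of a 'path@connection' data URI."""
    path: str        # path relative to the connection root
    connection: str  # connection name (here: the built-in entity connection)


def parse_data_uri(uri: str) -> DataUri:
    """Split a data URI on its last '@' (an assumption made for this sketch)."""
    path, sep, connection = uri.rpartition("@")
    if not sep or not path or not connection:
        raise ValueError(f"not a 'path@connection' data URI: {uri!r}")
    return DataUri(path, connection)


uri = parse_data_uri("firstname/firstname_fr.csv@entity")
# Separate the directory from the file name inside the connection.
directory, file_name = posixpath.split(uri.path)
print(uri.connection)  # entity
print(directory)       # firstname
print(file_name)       # firstname_fr.csv
```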
You can display the first rows of the data set with ''tabul data head'':
```bash
tabul data head firstname/firstname_fr.csv@entity
```
```
The first 10 rows of the data resource (firstname/firstname_fr.csv@entity):
firstname  gender  probability
---------  ------  --------------------------
Aadam      M       3.14396359513727576E-7
Aadel      M       6.52081338250694231E-7
Aadil      M       0.000002142552968537995330
Aahil      M       2.44530501844010337E-7
Aakash     M       3.02752049902108036E-7
Aalia      F       4.77416694076401133E-7
Aaliya     F       0.000002422016399216864286
Aaliyah    F       0.000028086074783226330085
Aalya      F       0.000001490471630287301099
Aalyah     F       0.000002585036733779537844
```
Using this data set, we can generate 10 `firstname` values with this [[:docs:resource:generator|generator file]]:
```yaml
kind: generator
spec:
  maxRecordCount: 10
  columns:
    - name: firstname
      type: varchar
      data-supplier:
        type: data-set
        arguments:
          dataUri: firstname/firstname_fr.csv@entity
          column: firstname # for demo purposes only: this is the default value
```
You can see the output with [[:docs:tabul:data:print|tabul data print]]:
```bash
tabul data print generator/dataset-basic--generator.yml@howto
```
```
firstname
------------
Lëana
Yelenna
Éliam
Aïley
Tinhinan
Jaufret
Mehmetali
Vyns
Ycham
Jale
```
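Conceptually, the ''data-set'' supplier fills a column with values taken from one column of the data set. The Python sketch below is not Tabulify's implementation. It inlines the ten rows shown by ''tabul data head'' as comma-separated text (the delimiter is an assumption) and picks values uniformly at random, which is also an assumption: Tabulify may, for instance, take the ''probability'' column into account. The ''data_set_supplier'' and ''generate'' functions are hypothetical.

```python
import csv
import io
import random

# The ten rows printed by `tabul data head`, inlined so the sketch runs on
# its own. The comma delimiter is an assumption of this sketch.
FIRSTNAME_CSV = """\
firstname,gender,probability
Aadam,M,3.14396359513727576E-7
Aadel,M,6.52081338250694231E-7
Aadil,M,0.000002142552968537995330
Aahil,M,2.44530501844010337E-7
Aakash,M,3.02752049902108036E-7
Aalia,F,4.77416694076401133E-7
Aaliya,F,0.000002422016399216864286
Aaliyah,F,0.000028086074783226330085
Aalya,F,0.000001490471630287301099
Aalyah,F,0.000002585036733779537844
"""


def data_set_supplier(rows, column="firstname", rng=None):
    """Endlessly yield values of `column`, picked uniformly at random from the rows."""
    if rng is None:
        rng = random.Random()
    values = [row[column] for row in rows]
    while True:
        yield rng.choice(values)


def generate(max_record_count, supplier):
    """Draw one value per record, mirroring `maxRecordCount`."""
    return [next(supplier) for _ in range(max_record_count)]


rows = list(csv.DictReader(io.StringIO(FIRSTNAME_CSV)))
# A seeded generator makes the run reproducible.
supplier = data_set_supplier(rows, column="firstname", rng=random.Random(42))
for name in generate(10, supplier):
    print(name)
```

As in the generator file, ''column'' defaults to ''firstname'' here, so passing it explicitly only documents the choice.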
==== Meta Column Dependency ====
The `firstname` value depends on the `gender` value. Each entity may have one or more meta columns, such as `gender`.
To express this dependency, you can use the ''metaColumns'' attribute to map:
* a local column (i.e. from the generator)
* to an entity column (i.e. from the data set)
```yaml
kind: generator
spec:
  maxRecordCount: 30
  columns:
    - name: gender
      type: varchar
      data-supplier:
        type: histogram
        arguments:
          buckets:
            M: 1.0
            F: 2.0
    - name: firstname
      type: varchar
      data-supplier:
        type: data-set
        arguments:
          dataUri: firstname/firstname_fr.csv@entity
          column: firstname # for demo purposes only: this is the default value
          metaColumns:
            gender: gender
```
You can see the output with [[:docs:tabul:data:print|tabul data print]]:
```bash
tabul data print generator/dataset-meta-columns--generator.yml@howto
```
```
gender  firstname
------  ----------------
M       Louis-gabriel
F       Amany
M       Guerino
M       Loucian
F       Soufia
F       Ieva
M       Lowen
F       Kellycia
M       Elya
F       Maïlyn
F       Sarina
F       Janel
F       Andy
F       Romaysa
F       Bélinda
M       Mayel
F       Menekse
F       Silja
F       Anne-elise
F       Shaé
F       Houyem
F       Razanne
F       Aysel
M       Remuald
M       Tajeddine
F       Tyhana
F       Elorie
F       Mehdia
F       Marie-thereze
F       Eugénie
```
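The meta column dependency can be pictured as a lookup: first draw the local ''gender'' value, then draw a first name only among the data set rows whose entity ''gender'' matches. The Python sketch below illustrates that idea; it is not Tabulify's implementation. It assumes that histogram buckets are relative weights (so ''F: 2.0'' is about twice as likely as ''M: 1.0'') and that names are picked uniformly within the matching rows. It reuses the ten sample rows from ''tabul data head''; the functions ''histogram_supplier'', ''index_data_set'' and ''generate'' are hypothetical.

```python
import csv
import io
import random
from collections import defaultdict

# Sample rows from `tabul data head`; the comma delimiter is an assumption.
FIRSTNAME_CSV = """\
firstname,gender,probability
Aadam,M,3.14396359513727576E-7
Aadel,M,6.52081338250694231E-7
Aadil,M,0.000002142552968537995330
Aahil,M,2.44530501844010337E-7
Aakash,M,3.02752049902108036E-7
Aalia,F,4.77416694076401133E-7
Aaliya,F,0.000002422016399216864286
Aaliyah,F,0.000028086074783226330085
Aalya,F,0.000001490471630287301099
Aalyah,F,0.000002585036733779537844
"""


def histogram_supplier(buckets, rng):
    """Yield bucket keys with probability proportional to their weight (assumed semantics)."""
    keys = list(buckets)
    weights = list(buckets.values())
    while True:
        yield rng.choices(keys, weights=weights, k=1)[0]


def index_data_set(rows, column, meta_columns):
    """Group the values of `column` by the values of the mapped entity meta columns."""
    index = defaultdict(list)
    for row in rows:
        key = tuple(row[entity_col] for entity_col in meta_columns.values())
        index[key].append(row[column])
    return index


def generate(max_record_count, buckets, rows, meta_columns, rng):
    genders = histogram_supplier(buckets, rng)
    index = index_data_set(rows, "firstname", meta_columns)
    records = []
    for _ in range(max_record_count):
        record = {"gender": next(genders)}
        # Build the lookup key from the local columns, in the same order as the index.
        key = tuple(record[local_col] for local_col in meta_columns)
        # Raises IndexError if no data set row matches the key.
        record["firstname"] = rng.choice(index[key])
        records.append(record)
    return records


rows = list(csv.DictReader(io.StringIO(FIRSTNAME_CSV)))
records = generate(
    max_record_count=30,
    buckets={"M": 1.0, "F": 2.0},
    rows=rows,
    meta_columns={"gender": "gender"},  # local column -> entity column
    rng=random.Random(7),
)
for record in records:
    print(record["gender"], record["firstname"])
```

Because the candidates are grouped by the mapped meta column, a record whose ''gender'' is ''F'' can only receive a first name listed with ''F'' in the data set, which matches the pairing visible in the output above.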