Tabulify - How to get data from a list of values at random

About

This how-to shows you how to generate data at random from a list of values with the histogram column generator.

Tip: The random column generator can also generate all primary data types (number, date, string) at random. See Tabulify - How to generate random data

Steps

Creation of the generator file

To generate data, you need to create a generator file that will describe the data to be generated.

The data resource generator below:

  • has the name histogram_random--generator.yml
  • has the simple name histogram_random
  • will generate 10 values (the MaxRecordCount property)
  • has a column named id with a sequence data generator that:
    • starts by default at the value 1
    • increments by default by 1
  • has a column named bucket_map with a histogram generator where the Buckets property defines a map where:
    • the key is the value to generate
    • the value is the chance factor of generation (the higher the factor, the greater the chance that the value is generated)
  • has a column named bucket_list with a histogram generator where the Buckets property defines:
    • a list of values (the chance factor of each value defaults to 1)

The two bucket columns (bucket_map and bucket_list) are equivalent.

They both define the buckets as:

  • a list of values
  • each with a chance factor of 1.

kind: generator
spec:
  MaxRecordCount: 10
  Columns:
    - name: id
      type: integer
      comment: An id column to easily see the number of values generated
      data-supplier:
        type: sequence
    - name: bucket_map
      type: varchar
      comment: A column with a random color generator and a map of values with the chance factor
      data-supplier:
        type: histogram
        arguments:
          Buckets:
            blue: 1
            red: 1
            green: 1
    - name: bucket_list
      type: varchar
      comment: A column with a random color generator and a list of values
      data-supplier:
        type: histogram
        arguments:
          Buckets:
            - blue
            - red
            - green
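
To illustrate the chance factor, the sketch below reuses the map form with unequal factors. It assumes the factors act as relative weights, so blue would be expected to come up roughly three times as often as red or green. The column name weighted_color and the factor values are made up for this example and are not part of the how-to files.

```yaml
kind: generator
spec:
  MaxRecordCount: 10
  Columns:
    # weighted_color is a hypothetical column name used only for illustration
    - name: weighted_color
      type: varchar
      comment: A color column where blue has a higher chance factor than red and green
      data-supplier:
        type: histogram
        arguments:
          Buckets:
            # assumed relative weights: 3 + 1 + 1 = 5, so blue is expected about 3 times out of 5
            blue: 3
            red: 1
            green: 1
```

Because the values are drawn at random, a run of only 10 records may deviate noticeably from these proportions.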




Printing the data

With the data print command, we can print the 10 generated values.

tabul data print histogram_random--generator.yml@howto

howto is the connection that contains the files used in the how-tos.

id   bucket_map   bucket_list
--   ----------   -----------
 1   blue         green
 2   blue         green
 3   red          red
 4   blue         red
 5   green        blue
 6   green        red
 7   red          blue
 8   green        red
 9   green        blue
10   green        green

Next

Because a generator is just a data resource, you can use it in any data operation.

How to use a generator in a data operation



