Tabulify - How to generate a normal distribution with a histogram generator

About

This how-to shows you how to generate data that follows a normal distribution with the column histogram generator.

Steps

Defining the buckets

The bucket list needs a series of values and their respective factors.

We will simulate the arrival of people at an event.

The arrival time is the value, and the factor follows the probability of the normal distribution.
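
As an illustration of how such factors could be derived, here is a minimal Python sketch that evaluates the normal density at each time slot and normalizes the results. The function name `normal_factors`, the mean of 9:00 and the standard deviation of 7 minutes are assumptions chosen for this example, not part of Tabulify, and the computed factors will not necessarily match the example values used in this how-to.

```python
import math


def normal_factors(times, mean, std_minutes):
    """Return {time: factor}, with factors proportional to the normal density.

    Hypothetical helper for illustration only; it is not a Tabulify function.
    """
    def to_minutes(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    mu = to_minutes(mean)
    # Unnormalized normal density at each time slot
    raw = {
        t: math.exp(-((to_minutes(t) - mu) ** 2) / (2 * std_minutes ** 2))
        for t in times
    }
    total = sum(raw.values())
    # Normalize so that the factors sum to roughly 1, rounded to 2 decimals
    return {t: round(weight / total, 2) for t, weight in raw.items()}


times = ["8:45", "8:50", "8:55", "9:00", "9:05", "9:10", "9:15"]
# Assumed parameters: arrivals centered on 9:00 with a 7-minute spread
factors = normal_factors(times, mean="9:00", std_minutes=7)
for t, f in factors.items():
    print(f'"{t}": {f}')
```

The printed lines use the same `"time": factor` shape as a bucket list, so they can be adapted by hand into a bucket definition.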

Example: the bucket definition below:

Buckets:
  "8:45": 0.05
  "8:50": 0.5
  "8:55": 0.22
  "9:00": 0.4
  "9:05": 0.22
  "9:10": 0.5
  "9:15": 0.05

models the following normal distribution:

[Figure: Normal distribution of the incoming time at the meeting]

Creation of the generator file

To generate data, create a generator file that describes the data to be generated.

The data resource generator below defines an id sequence column and a time column fed by the histogram buckets:

kind: generator
spec:
  MaxRecordCount: 30
  Columns:
    - name: id
      type: integer
      comment: An id column to easily see the number of values generated
      data-supplier:
        type: sequence
    - name: bucket_map
      type: time
      comment: A column with a histogram generator that generates a normal distribution of time
      data-supplier:
        type: histogram
        arguments:
          Buckets:
            "8:45": 0.05
            "8:50": 0.5
            "8:55": 0.22
            "9:00": 0.4
            "9:05": 0.22
            "9:10": 0.5
            "9:15": 0.05
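
To build intuition for what a histogram supplier does with these buckets, the following Python sketch draws values with a probability proportional to each factor. It is only an illustration under the assumption that factors act as relative weights; it is not Tabulify's implementation, and `sample_histogram` and the `seed` parameter are hypothetical names introduced here.

```python
import random
from collections import Counter

# Buckets copied from the generator file above
buckets = {
    "8:45": 0.05,
    "8:50": 0.5,
    "8:55": 0.22,
    "9:00": 0.4,
    "9:05": 0.22,
    "9:10": 0.5,
    "9:15": 0.05,
}


def sample_histogram(buckets, count, seed=None):
    """Draw `count` values, each picked with a probability proportional to its factor.

    Hypothetical helper: assumes factors are relative weights.
    """
    rng = random.Random(seed)
    values = list(buckets)
    weights = list(buckets.values())
    return rng.choices(values, weights=weights, k=count)


# 30 records, mirroring MaxRecordCount in the generator file
sample = sample_histogram(buckets, count=30, seed=42)
counts = Counter(sample)
for value in buckets:
    print(value, counts[value])
```

With only 30 records, the observed counts can deviate noticeably from the factors; a larger record count gives a histogram closer to the defined shape.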


Printing the data

With the data print command, we can print the 30 values generated.

tabul data print histogram_normal_distribution--generator.yml@howto

howto is the connection that contains the files used in the how-tos.

id   bucket_map
--   ----------
 1   08:55:00
 2   09:15:00
 3   09:15:00
 4   09:05:00
 5   09:15:00
 6   08:45:00
 7   08:45:00
 8   09:15:00
 9   08:55:00
10   09:05:00
11   08:45:00
12   08:55:00
13   09:10:00
14   09:00:00
15   08:45:00
16   09:00:00
17   08:50:00
18   09:00:00
19   09:05:00
20   08:45:00
21   09:00:00
22   08:55:00
23   09:15:00
24   09:05:00
25   09:10:00
26   08:55:00
27   09:05:00
28   09:15:00
29   08:55:00
30   08:50:00
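
To check how the generated values are spread over the buckets, you could tally the printed output. The Python sketch below assumes the two-column text layout shown above (a header line, a separator line, then one row per record) and uses only the first five rows as sample input; `tally` is a hypothetical helper name.

```python
from collections import Counter

# First rows of the printed output, pasted as text
printed = """\
id   bucket_map
--   ----------
 1   08:55:00
 2   09:15:00
 3   09:15:00
 4   09:05:00
 5   09:15:00
"""


def tally(table_text):
    """Count how often each time value appears in the second column."""
    # Skip the header line and the separator line
    rows = table_text.strip().splitlines()[2:]
    return Counter(row.split()[1] for row in rows)


counts = tally(printed)
# Print a small text histogram, ordered by time
for value in sorted(counts):
    print(value, "#" * counts[value])
```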

Next

Because a generator is just a data resource, you can use it in every data operation.

How to use a generator in a data operation