Histogram Generator

About

A histogram generator is a column data generator that returns each value with a probability proportional to its chance factor.

Use it to generate data that follows a given probability distribution: the factor of a value is its relative probability.

Example

Distribution over week days (Varchar)

For example, to simulate that people rarely work at the week-end (each week-end day is ten times less likely than a week day), define this bucket list:

Columns:
  - name: bucket_map
    type: varchar
    data-supplier:
      type: histogram
      arguments:
        Buckets:
          Monday: 10
          Tuesday: 10
          Wednesday: 10
          Thursday: 10
          Friday: 10
          Saturday: 1
          Sunday: 1
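
To make the meaning of the chance factors concrete, here is a minimal Python sketch of weighted sampling over the buckets above. It is not Tabulify's implementation: the function names `probabilities` and `draw` are hypothetical, and the sketch only assumes that a value is picked in proportion to its factor.

```python
import random

# Chance factors copied from the week-day example above.
BUCKETS = {
    "Monday": 10, "Tuesday": 10, "Wednesday": 10, "Thursday": 10,
    "Friday": 10, "Saturday": 1, "Sunday": 1,
}


def probabilities(buckets):
    """Normalize the chance factors so that they sum to 1."""
    total = sum(buckets.values())
    return {value: factor / total for value, factor in buckets.items()}


def draw(buckets, count, seed=None):
    """Draw `count` values, each picked in proportion to its factor."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.choices(list(buckets), weights=list(buckets.values()), k=count)
```

With these factors the total is 52, so each week day has a probability of 10/52 and each week-end day 1/52. Calling `draw(BUCKETS, 520, seed=42)` returns 520 day names in which Saturday and Sunday should appear only occasionally.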

Normal Distribution over Arrival Time

Example of a normal distribution over arrival times:

kind: generator
spec:
  MaxRecordCount: 30
  Columns:
    - name: id
      type: integer
      comment: An id column to easily see the number of values generated
      data-supplier:
        type: sequence
    - name: bucket_map
      type: time
      comment: A column with a histogram generator that generates arrival times weighted by the buckets below
      data-supplier:
        type: histogram
        arguments:
          Buckets:
            "8:45": 0.05
            "8:50": 0.5
            "8:55": 0.22
            "9:00": 0.4
            "9:05": 0.22
            "9:10": 0.5
            "9:15": 0.05
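
As a rough illustration of what this definition produces, the following Python sketch generates the same shape of output: 30 rows with a sequential id and a weighted arrival time. It is a sketch under assumptions, not the Tabulify engine: `generate_rows` is a hypothetical name, and starting the sequence at 1 is an assumption.

```python
import random

# Factors copied from the arrival-time example; they need not sum to 1
# because only their relative size matters for weighted sampling.
ARRIVAL_BUCKETS = {
    "8:45": 0.05, "8:50": 0.5, "8:55": 0.22, "9:00": 0.4,
    "9:05": 0.22, "9:10": 0.5, "9:15": 0.05,
}


def generate_rows(buckets, max_record_count, seed=None):
    """Return (id, time) rows, the time being drawn from the buckets."""
    rng = random.Random(seed)
    times = rng.choices(
        list(buckets), weights=list(buckets.values()), k=max_record_count
    )
    # Assumption: the sequence supplier starts at 1 and increases by 1.
    return list(enumerate(times, start=1))
```

For instance, `generate_rows(ARRIVAL_BUCKETS, 30, seed=7)` returns 30 tuples whose first element runs from 1 to 30 and whose second element is one of the seven bucket keys.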


Arguments

Buckets

This data-supplier has a single argument, Buckets, which defines the histogram.

A bucket is a pair made of a value and its chance factor.

Data Type

The following data types are supported:

| Name | YAML Format | Example |
|---|---|---|
| Integer | d | 8 |
| Double | d.dd | 8.00 |
| Date | YYYY-MM-DD | 1970-01-01 |
| Timestamp | YYYY-MM-DD HH:MM:SS | 1970-01-01 00:00:00 |
| Time | "HH:MM", "HH:MM:SS", "HH:MM:SS.SSS" | "08:00" (quotes are mandatory) |
| Varchar | ".*" | "a text" |

Why must the time be quoted? YAML has no native time type, so a time must be written as a quoted string.
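
One concrete risk of leaving a time unquoted: the YAML 1.1 integer type allows base-60 notation, so a parser following those rules may read `8:45` as a number rather than as text. The Python sketch below only reproduces that arithmetic. The function name is hypothetical, and whether a given parser applies the rule depends on the parser and the YAML version it implements.

```python
def yaml11_sexagesimal(text):
    """Return the integer obtained by reading `text` as base-60 digits.

    This mirrors the YAML 1.1 base-60 integer notation, where each
    colon-separated part is one base-60 digit.
    """
    value = 0
    for part in text.split(":"):
        value = value * 60 + int(part)
    return value
```

Under that reading, `8:45` becomes 8 × 60 + 45 = 525, which is no longer recognizable as a time of day. Quoting the key as `"8:45"` keeps it a string.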

How to define buckets in a data set

Note that the data set and the entity generators create a histogram from resources:

  • the value is defined by the column attribute
  • the chance factor is defined by the probability, weight or factor column
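
The mapping from a tabular resource to buckets can be sketched in Python as follows. This is an illustration only, not the way Tabulify reads resources: the CSV content, the `day` column name and the function `load_buckets` are all hypothetical. The only part taken from the text above is that the factor comes from a column named probability, weight or factor.

```python
import csv
import io

# Hypothetical resource; the column names are assumptions for this sketch.
RESOURCE = """day,probability
Monday,10
Saturday,1
"""

# Column names that may hold the chance factor, as listed above.
FACTOR_COLUMNS = ("probability", "weight", "factor")


def load_buckets(text, value_column):
    """Build a {value: factor} mapping from CSV text."""
    buckets = {}
    for row in csv.DictReader(io.StringIO(text)):
        factor_column = next((c for c in FACTOR_COLUMNS if c in row), None)
        if factor_column is None:
            raise ValueError("no probability, weight or factor column found")
        buckets[row[value_column]] = float(row[factor_column])
    return buckets
```

Here `load_buckets(RESOURCE, "day")` returns `{"Monday": 10.0, "Saturday": 1.0}`, which has the same shape as the Buckets argument shown in the examples.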



Related HowTo
Tabulify - How to generate a normal distribution with an histogram generator

This how-to shows you how to generate data that follows a normal distribution with the column histogram generator.
Tabulify - How to get data from a list of values at random

This how-to shows you how to generate data at random from a list of values with the column histogram generator.
