Histogram Generator
About
A histogram generator is a column data generator that picks each value according to its chance factor.
Use it to generate data that follows a probability distribution, where the factor of a value determines how likely that value is to be generated.
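The idea can be sketched in a few lines of Python. This is only an illustration of weighted sampling, not the generator's actual implementation: the `generate` function and the `red`/`blue` buckets are hypothetical, and the sketch assumes that factors act as relative weights (a value with factor 3 is drawn three times as often as a value with factor 1).

```python
import random
from collections import Counter

# Hypothetical buckets: value -> chance factor (assumed to be a relative weight).
buckets = {"red": 3, "blue": 1}

def generate(buckets, count, seed=None):
    """Draw `count` values, each with probability factor / sum(factors)."""
    rng = random.Random(seed)
    values = list(buckets)
    factors = list(buckets.values())
    return rng.choices(values, weights=factors, k=count)

sample = generate(buckets, count=1000, seed=42)
counts = Counter(sample)
# "red" should come out roughly three times as often as "blue".
print(counts.most_common())
```

Passing a seed makes the draw repeatable, which is convenient when the generated data feeds a test.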
Example
Distribution over week day (Varchar)
For example, to simulate that people work on the weekend only about once every 10 weeks, define this bucket list:
Columns:
  - name: bucket_map
    type: varchar
    data-supplier:
      type: histogram
      arguments:
        Buckets:
          Monday: 10
          Tuesday: 10
          Wednesday: 10
          Thursday: 10
          Friday: 10
          Saturday: 1
          Sunday: 1
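To see what these factors imply, the sketch below normalizes them into probabilities. It assumes the factors are relative weights that the generator divides by their sum; the source does not spell this out, so treat the `probabilities` helper as an interpretation of the example rather than a description of the tool.

```python
from fractions import Fraction

# Buckets copied from the week-day example above.
buckets = {
    "Monday": 10, "Tuesday": 10, "Wednesday": 10, "Thursday": 10,
    "Friday": 10, "Saturday": 1, "Sunday": 1,
}

def probabilities(buckets):
    """Turn integer chance factors into exact probabilities (factor / total)."""
    total = sum(buckets.values())
    return {value: Fraction(factor, total) for value, factor in buckets.items()}

probs = probabilities(buckets)
for day, p in probs.items():
    print(f"{day}: {p} = {float(p):.1%}")
```

Under that assumption each weekday has a probability of 10/52 (about 19.2%) and each weekend day 1/52 (about 1.9%), so a weekend day is ten times less likely than a weekday.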
Normal Distribution over Arrival Time
Example of normal distribution over time
kind: generator
spec:
  MaxRecordCount: 30
  Columns:
    - name: id
      type: integer
      comment: An id column to easily see the number of values generated
      data-supplier:
        type: sequence
    - name: bucket_map
      type: time
      comment: A column with a histogram generator that generates a normal distribution of time
      data-supplier:
        type: histogram
        arguments:
          Buckets:
            "8:45": 0.05
            "8:50": 0.5
            "8:55": 0.22
            "9:00": 0.4
            "9:05": 0.22
            "9:10": 0.5
            "9:15": 0.05
Arguments
Buckets
This data-supplier has a single argument, the buckets, which defines the histogram.
A bucket pairs a value with its chance factor.
Data Type
The following data types are supported:
| Name | Yaml Format | Example |
|---|---|---|
| Integer | d | 8 |
| Double | d.dd | 8.00 |
| Date | YYYY-MM-DD | 1970-01-01 |
| Timestamp | YYYY-MM-DD HH:MM:SS | 1970-01-01 00:00:00 |
| Time | "HH:MM", "HH:MM:SS", "HH:MM:SS.SSS" | "08:00" (quotes are mandatory) |
| Varchar | ".*" | "a text" |
Why must the time be quoted? YAML does not support time as a type, so a time value must be written as a quoted string.
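As an illustration of the three time formats listed in the table, the following sketch parses a quoted time string into a time value. How the generator itself parses these strings is not documented here; `parse_time` and `TIME_FORMATS` are hypothetical names, and the sketch relies only on the Python standard library.

```python
from datetime import datetime, time

# The three formats from the table, tried from shortest to longest.
TIME_FORMATS = ("%H:%M", "%H:%M:%S", "%H:%M:%S.%f")

def parse_time(text: str) -> time:
    """Parse a time string written as HH:MM, HH:MM:SS or HH:MM:SS.SSS."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt).time()
        except ValueError:
            continue  # Not this format, try the next one.
    raise ValueError(f"Unsupported time format: {text!r}")

print(parse_time("08:00"))
print(parse_time("08:00:30"))
print(parse_time("08:00:30.250"))
```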
How to define buckets in a data set
Note that the data set and the entity generators create their histogram from resources:
- the data is defined by the column attribute
- the chance factor is defined by the probability, weight or factor column
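The mapping from a resource to buckets can be pictured as follows. This is a minimal sketch, not the tool's code: the CSV content, the `day` and `weight` column names and the `buckets_from_rows` helper are all hypothetical, and summing the factors of repeated values is an assumption made for the example.

```python
import csv
import io

# Hypothetical resource: one value column and one chance-factor column.
RESOURCE = """day,weight
Monday,10
Saturday,1
"""

def buckets_from_rows(rows, value_column, factor_column):
    """Build a value -> chance factor mapping from tabular rows."""
    buckets = {}
    for row in rows:
        value = row[value_column]
        # Assumption: factors of a repeated value are added together.
        buckets[value] = buckets.get(value, 0.0) + float(row[factor_column])
    return buckets

rows = csv.DictReader(io.StringIO(RESOURCE))
buckets = buckets_from_rows(rows, value_column="day", factor_column="weight")
print(buckets)
```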