How to add information about the selected resources with the enrich operation?
About
enrich is an intermediate operation that adds virtual columns to its inputs by means of data suppliers.
enrich accepts a single argument, data-def, where you define extra columns, called virtual columns, each with its own data supplier.
A data supplier is a function that supplies a value to a column.
In the steps below, we add to the input resources:
- the input file name
- the input file extension
- and an increasing sequence
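The idea of a data supplier can be pictured with a minimal Python sketch. It is not Tabulify's implementation: the names `Supplier` and `enrich`, the `meta` dictionary, and the sample records are all hypothetical, chosen only to show virtual columns being filled by functions called once per record.

```python
from itertools import count
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Hypothetical model: a supplier receives the row built so far plus the
# resource metadata, and returns the value of one virtual column.
Supplier = Callable[[Dict[str, Any], Dict[str, Any]], Any]


def enrich(
    records: Iterable[Dict[str, Any]],
    meta: Dict[str, Any],
    virtual_columns: Dict[str, Supplier],
) -> Iterator[Dict[str, Any]]:
    """Yield each record extended with the virtual columns, in declaration order."""
    for record in records:
        row = dict(record)  # copy so the input record is left untouched
        for name, supplier in virtual_columns.items():
            row[name] = supplier(row, meta)
        yield row


sequence = count(1)  # increasing sequence starting at 1

virtual_columns: Dict[str, Supplier] = {
    # value read from the resource metadata
    "file_name": lambda row, meta: meta["name"],
    # value computed from a column that was supplied earlier
    "file_extension": lambda row, meta: row["file_name"].split(".")[-1],
    # value taken from a counter, one step per record
    "line_id": lambda row, meta: next(sequence),
}

# Made-up input: two records from a resource named example.md
records = [{"lines": "first line"}, {"lines": "second line"}]
rows: List[Dict[str, Any]] = list(
    enrich(records, {"name": "example.md"}, virtual_columns)
)
for row in rows:
    print(row)
```

In this sketch the suppliers run in declaration order, so `file_extension` can read the `file_name` value computed just before it. Whether Tabulify resolves column dependencies the same way is an assumption.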
Steps
The input example
In this example, we showcase the enrich operation with the enrich-me.md file, located in the pipeline/enrich subdirectory of the howto directory.
With the cat command:
tabul data cat pipeline/enrich/enrich-me.md@howto
We can see the content:
This is a file used in the [enrich pipeline](../enrich.yml)
for demonstration
The pipeline
In this example, the pipeline:
- defines a markdown text file as its input with the define operation
- enriches its records with 3 virtual columns:
  - the input file name, thanks to the meta data supplier
  - the input file extension, thanks to the expression data supplier
  - a column line_id with an increasing sequence, thanks to the sequence data supplier
- and prints the records
kind: pipeline
spec:
  steps:
    - operation: 'define'
      arguments:
        data-resource:
          data-uri: 'pipeline/enrich/enrich-me.md@howto'
    - operation: 'enrich'
      arguments:
        data-def:
          columns:
            - name: file_name
              data-supplier:
                type: meta
                arguments:
                  attribute: name
            - name: file_extension
              data-supplier:
                type: expression
                arguments:
                  column-variable: file_name
                  expression: "file_name.split('.').pop()"
            - name: line_id
              type: integer
              data-supplier:
                type: sequence
    - operation: 'print'
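The expression `file_name.split('.').pop()` appears to use JavaScript-style syntax: it splits the name on dots and keeps the last piece. A rough Python equivalent, given only to illustrate that behavior (the helper name `last_dot_segment` is hypothetical and not part of Tabulify), shows what to expect for a few file names:

```python
def last_dot_segment(file_name: str) -> str:
    """Mirror split('.').pop(): return the text after the last dot."""
    return file_name.split(".")[-1]


print(last_dot_segment("enrich-me.md"))    # md
print(last_dot_segment("archive.tar.gz"))  # gz
print(last_dot_segment("README"))          # README (no dot: the whole name)
```

For a name without a dot, this logic returns the whole name rather than an empty value, which is worth keeping in mind if a pipeline selects files that have no extension.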
The execution result
Executing the pipeline prints 4 columns, the lines column read from the file plus the 3 virtual columns:
- the lines column with the content of the file (lines is the value of the text file column-name attribute)
- the file_name column with the name of the input
- the file_extension column with the extension of the input
- the line_id column with the sequence value, 1 and 2 here because the file has 2 lines
tabul flow execute --no-results pipeline/enrich.yml@howto
pipeline/enrich/enrich-me.md@howto
lines                                                       file_name    file_extension line_id
----------------------------------------------------------- ------------ -------------- -------
This is a file used in the [enrich pipeline](../enrich.yml) enrich-me.md md                   1
for demonstration                                           enrich-me.md md                   2