How to add information about the selected resources with the enrich operation?

About

enrich is an intermediate operation that adds virtual columns to its inputs by means of data suppliers.

enrich accepts a single argument, data-def, in which you define the extra columns (called virtual columns) together with their respective data suppliers.

A data supplier is a function that supplies a value to a column.

In the steps below, we add to the input resources:

  • the input file name
  • the input file extension
  • and an increasing sequence
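
As a rough mental model only, the three additions listed above can be sketched in JavaScript by treating each data supplier as a function that returns the value of one virtual column. This is not Tabulify's implementation or API: the enrich function, the context object (row, index, meta), the file name notes.md and the sample rows are all hypothetical, and the sequence is assumed to start at 1.

```javascript
// Conceptual sketch, NOT Tabulify's implementation or API.
// Assumption: a data supplier is modeled as a function (context) => value.

// Hypothetical enrich: adds one virtual column per supplier to every row.
function enrich(rows, columns, meta) {
  return rows.map((row, index) => {
    const enriched = { ...row };
    for (const { name, supplier } of columns) {
      // Suppliers run in declaration order, so a later column
      // can read a virtual column added before it.
      enriched[name] = supplier({ row: enriched, index, meta });
    }
    return enriched;
  });
}

// The three virtual columns of this how-to, as hypothetical suppliers.
const columns = [
  // the input file name, read from the resource metadata
  { name: "file_name", supplier: ({ meta }) => meta.name },
  // the input file extension, derived from the file_name column
  { name: "file_extension", supplier: ({ row }) => row.file_name.split(".").pop() },
  // an increasing sequence (assumed to start at 1)
  { name: "line_id", supplier: ({ index }) => index + 1 },
];

// Hypothetical input: a two-line text file named notes.md.
const rows = [{ lines: "first line" }, { lines: "second line" }];
const result = enrich(rows, columns, { name: "notes.md" });

console.log(result);
```

Each output row keeps its original lines value and gains the three virtual columns; the input rows themselves are left unchanged.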

Steps

The input example

In this example, we showcase the enrich operation with the enrich-me.md file.

With the cat command:

tabul data cat pipeline/enrich/enrich-me.md@howto

We can see the content:

This is a file used in the [enrich pipeline](../enrich.yml)
for demonstration


The pipeline

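In this example, the pipeline defines the input resource, enriches it with three virtual columns (file_name, file_extension and line_id), and prints the result: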

kind: pipeline
spec:
  steps:
    - operation: 'define'
      arguments:
        data-resource:
          data-uri: 'pipeline/enrich/enrich-me.md@howto'
    - operation: 'enrich'
      arguments:
        data-def:
          columns:
            - name: file_name
              data-supplier:
                type: meta
                arguments:
                  attribute: name
            - name: file_extension
              data-supplier:
                type: expression
                arguments:
                  column-variable: file_name
                  expression: "file_name.split('.').pop()"
            - name: line_id
              type: integer
              data-supplier:
                type: sequence
    - operation: 'print'
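
The file_extension column relies on the expression file_name.split('.').pop(). Assuming the expression data supplier evaluates it with standard JavaScript string and array semantics (the syntax suggests this, but it is an assumption here), the short sketch below shows what it returns for a few file names. The extensionOf wrapper and every file name other than enrich-me.md are hypothetical.

```javascript
// The expression from the pipeline, wrapped in a hypothetical helper
// so that it can be tried on several names.
const extensionOf = (file_name) => file_name.split('.').pop();

console.log(extensionOf("enrich-me.md"));   // md
console.log(extensionOf("archive.tar.gz")); // gz (only the last segment is kept)
console.log(extensionOf("README"));         // README (no dot: the whole name comes back)
console.log(extensionOf(".gitignore"));     // gitignore
```

For inputs without a dot, the expression returns the full name rather than an empty value, which is worth keeping in mind when the pipeline selects files that have no extension.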


The execution result

By executing it, we can see 4 columns in the result:

  • the lines column with the content of the file (lines is the value of the text file column-name attribute)
  • the file_name column with the name of the input
  • the file_extension column with the extension of the input
  • the line_id column with the line id. We have 2 lines.

tabul flow execute --no-results pipeline/enrich.yml@howto
pipeline/enrich/enrich-me.md@howto
lines                                                         file_name      file_extension   line_id
-----------------------------------------------------------   ------------   --------------   -------
This is a file used in the [enrich pipeline](../enrich.yml)   enrich-me.md   md                     1
for demonstration                                             enrich-me.md   md                     2



