dagster_celery_k8s.CeleryK8sRunLauncher RunLauncher
The name of an existing Volume to mount into the pod in order to provide a ConfigMap for the Dagster instance. This Volume should contain a dagster.yaml with appropriate values for run storage, event log storage, etc.
The name of the Kubernetes Secret where the postgres password can be retrieved. Will be mounted and supplied as an environment variable to the Job Pod. The Secret must contain the key "postgresql-password", which will be exposed in the Job environment as the environment variable DAGSTER_PG_PASSWORD.
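To make the Secret-to-environment mapping concrete, here is a minimal sketch of the container `env` entry that would expose the Secret key as DAGSTER_PG_PASSWORD. The helper name `pg_password_env_entry` is hypothetical and this is not taken from the Dagster source; only the `secretKeyRef` structure is standard Kubernetes.

```python
def pg_password_env_entry(secret_name):
    """Build a Kubernetes container env entry (as a plain dict).

    Hypothetical helper for illustration: it maps the "postgresql-password"
    key of the named Secret to the DAGSTER_PG_PASSWORD environment variable.
    """
    return {
        "name": "DAGSTER_PG_PASSWORD",
        "valueFrom": {
            "secretKeyRef": {
                "name": secret_name,
                "key": "postgresql-password",
            }
        },
    }


# The Secret name below is the one used in the dagster.yaml example on this page.
entry = pg_password_env_entry("dagster-k8s-pg-password")
```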
The location of DAGSTER_HOME in the Job container; this is where the dagster.yaml file will be mounted from the instance ConfigMap specified here. Defaults to /opt/dagster/dagster_home.
Default Value: ‘/opt/dagster/dagster_home’
Set this value if you are running the launcher within a k8s cluster. If True, we assume the launcher is running within the target cluster and load config using kubernetes.config.load_incluster_config. Otherwise, we will use the k8s config specified in kubeconfig_file (using kubernetes.config.load_kube_config) or fall back to the default kubeconfig.
Default Value: True
The kubeconfig file from which to load config. Defaults to using the default kubeconfig.
Default Value: None
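The choice between in-cluster config and a kubeconfig file can be sketched as follows. This is an illustration of the decision described above, not Dagster's actual implementation; `choose_k8s_config_loader` and `load_k8s_config` are hypothetical names. The second function imports the `kubernetes` package lazily, so the decision logic can be read and exercised without it installed.

```python
def choose_k8s_config_loader(load_incluster_config=True, kubeconfig_file=None):
    """Return the name of the kubernetes.config loader to call, plus its kwargs."""
    if load_incluster_config:
        # Running inside the target cluster: use the pod's service account.
        return "load_incluster_config", {}
    if kubeconfig_file is not None:
        # Explicit kubeconfig path supplied.
        return "load_kube_config", {"config_file": kubeconfig_file}
    # No path supplied: load_kube_config falls back to the default kubeconfig.
    return "load_kube_config", {}


def load_k8s_config(load_incluster_config=True, kubeconfig_file=None):
    """Apply the chosen loader (requires the `kubernetes` package)."""
    import kubernetes

    name, kwargs = choose_k8s_config_loader(load_incluster_config, kubeconfig_file)
    getattr(kubernetes.config, name)(**kwargs)
```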
Whether the launched Kubernetes Jobs and Pods should fail if the Dagster run fails.
Docker image to use for launched Jobs. If this field is empty, the image that was used to originally load the Dagster repository will be used. (Ex: “mycompany.com/dagster-k8s-image:latest”).
Image pull policy to set on the launched task Job Pods. Defaults to “IfNotPresent”.
(Advanced) Specifies that Kubernetes should get the credentials from the Secrets named in this list.
(Advanced) Override the name of the Kubernetes service account under which to run the Job.
A list of custom ConfigMapEnvSource names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/#define-an-environment-variable-for-a-container
A list of custom Secret names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
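For readers unfamiliar with envFrom, the sketch below shows the shape of the container `envFrom` list that ConfigMap and Secret names translate to in a Kubernetes pod spec. The function name `build_env_from` and the example resource names are hypothetical, and this is not Dagster's own code; the `configMapRef` and `secretRef` keys are the standard Kubernetes field names.

```python
def build_env_from(config_map_names=(), secret_names=()):
    """Build a Kubernetes `envFrom` list (as plain dicts) from resource names."""
    env_from = [{"configMapRef": {"name": name}} for name in config_map_names]
    env_from += [{"secretRef": {"name": name}} for name in secret_names]
    return env_from


# Hypothetical resource names, for illustration only.
env_from = build_env_from(
    config_map_names=["pipeline-settings"],
    secret_names=["warehouse-credentials"],
)
```

Every key/value pair in each referenced ConfigMap or Secret becomes an environment variable in the Job's container.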
A list of environment variables to inject into the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
A list of volume mounts to include in the job’s container. Default: []. See: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volumemount-v1-core
Default Value: []
A list of volumes to include in the Job’s Pod. Default: []. For the many possible volume source types that can be included, see: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volume-v1-core
Default Value: []
Additional labels that should be included in the Job’s Pod. See: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
The URL of the Celery broker. Default: 'pyamqp://guest@{os.getenv('DAGSTER_CELERY_BROKER_HOST','localhost')}//'.
The URL of the Celery results backend. Default: ‘rpc://’.
Default Value: ‘rpc://’
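The default broker URL above is a template that depends on the DAGSTER_CELERY_BROKER_HOST environment variable. A minimal sketch of how it resolves, where `default_broker_url` is a hypothetical helper and the optional `env` parameter exists only to make the example easy to exercise:

```python
import os


def default_broker_url(env=None):
    """Resolve the documented default broker URL from the environment."""
    env = os.environ if env is None else env
    host = env.get("DAGSTER_CELERY_BROKER_HOST", "localhost")
    return f"pyamqp://guest@{host}//"


# With the variable unset, the host falls back to localhost.
local_url = default_broker_url({})
# With the variable set (hypothetical host name), that host is used instead.
cluster_url = default_broker_url({"DAGSTER_CELERY_BROKER_HOST": "rabbitmq"})
```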
List of modules every worker should import.
Additional settings for the Celery app.
{
"enabled": {}
}
{}
{}
In contrast to the K8sRunLauncher, which launches dagster runs as single K8s Jobs, this run launcher is intended for use in concert with dagster_celery_k8s.celery_k8s_job_executor().
With this run launcher, execution is delegated to:
1. A run worker Kubernetes Job, which traverses the dagster run execution plan and submits steps to Celery queues for execution;
2. Celery workers, which pick up the step executions submitted to Celery queues; each step execution spawns a step execution Kubernetes Job. See the implementation defined in dagster_celery_k8s.executor.create_k8s_job_task().
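The two-stage delegation can be pictured with a toy model. The sketch below uses no Celery or Kubernetes at all: a `deque` stands in for the Celery queue and a string stands in for a launched Job. All names are hypothetical, and step dependencies are ignored for brevity.

```python
from collections import deque


def run_worker(execution_plan, celery_queue):
    """Stand-in for the run worker Job: walk the plan and enqueue each step."""
    for step in execution_plan:
        celery_queue.append(step)


def celery_worker(celery_queue, launch_step_job):
    """Stand-in for a Celery worker: take each queued step and spawn a step Job."""
    launched = []
    while celery_queue:
        step = celery_queue.popleft()
        launched.append(launch_step_job(step))
    return launched


def fake_launch_step_job(step):
    """Pretend to create a Kubernetes Job for one step; returns a fake Job name."""
    return f"k8s-job-for-{step}"


queue = deque()
run_worker(["extract", "transform", "load"], queue)
jobs = celery_worker(queue, fake_launch_step_job)
```

In a real deployment the run worker and the Celery workers are separate processes, and the queue is the configured Celery broker.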
You can configure a Dagster instance to use this RunLauncher by adding a section to your dagster.yaml like the following:
run_launcher:
  module: dagster_celery_k8s
  class: CeleryK8sRunLauncher
  config:
    instance_config_map: "dagster-k8s-instance-config-map"
    dagster_home: "/some/path"
    postgres_password_secret: "dagster-k8s-pg-password"
    broker: "some_celery_broker_url"
    backend: "some_celery_backend_url"
dagster_celery_k8s.celery_k8s_job_executor ExecutorDefinition
The URL of the Celery broker. Default: 'pyamqp://guest@{os.getenv('DAGSTER_CELERY_BROKER_HOST','localhost')}//'.
The URL of the Celery results backend. Default: ‘rpc://’.
Default Value: ‘rpc://’
List of modules every worker should import.
Additional settings for the Celery app.
{
"enabled": {}
}
{}
{}
Docker image to use for launched Jobs. If this field is empty, the image that was used to originally load the Dagster repository will be used. (Ex: “mycompany.com/dagster-k8s-image:latest”).
Image pull policy to set on the launched task Job Pods. Defaults to “IfNotPresent”.
(Advanced) Specifies that Kubernetes should get the credentials from the Secrets named in this list.
(Advanced) Override the name of the Kubernetes service account under which to run the Job.
A list of custom ConfigMapEnvSource names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/#define-an-environment-variable-for-a-container
A list of custom Secret names from which to draw environment variables (using envFrom) for the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
A list of environment variables to inject into the Job. Default: []. See: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
A list of volume mounts to include in the job’s container. Default: []. See: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volumemount-v1-core
Default Value: []
A list of volumes to include in the Job’s Pod. Default: []. For the many possible volume source types that can be included, see: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volume-v1-core
Default Value: []
Additional labels that should be included in the Job’s Pod. See: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
Set this value if you are running the launcher within a k8s cluster. If True, we assume the launcher is running within the target cluster and load config using kubernetes.config.load_incluster_config. Otherwise, we will use the k8s config specified in kubeconfig_file (using kubernetes.config.load_kube_config) or fall back to the default kubeconfig. Default: True.
Default Value: True
Path to a kubeconfig file to use, if not using default kubeconfig.
The namespace into which to launch new jobs. Note that any other Kubernetes resources the Job requires (such as the service account) must be present in this namespace. Default: "default"
Default Value: ‘default’
The repository location name to use for execution.
Default Value: ‘<<in_process>>’
Wait this many seconds for a job to complete before marking the run as failed. Defaults to 86400.0 seconds.
Default Value: 86400.0
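The wait timeout can be thought of as a deadline on a polling loop. The following is a sketch under that assumption, not Dagster's implementation; `wait_for_job`, `poll_interval`, and the injectable `clock` and `sleep` parameters are hypothetical and exist to keep the example self-contained.

```python
import time


def wait_for_job(is_complete, timeout=86400.0, poll_interval=5.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll `is_complete` until it returns True or `timeout` seconds elapse.

    Returns True if the job completed in time, and False if the deadline
    passed (at which point a caller would mark the run as failed).
    """
    deadline = clock() + timeout
    while True:
        if is_complete():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


class FakeClock:
    """Deterministic clock so the example does not actually sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
timed_out = not wait_for_job(lambda: False, timeout=10.0, poll_interval=5.0,
                             clock=clock, sleep=clock.sleep)
```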
Celery-based executor which launches tasks as Kubernetes Jobs.
The Celery executor exposes config settings for the underlying Celery app under the config_source key. This config corresponds to the “new lowercase settings” introduced in Celery version 4.0 and the object constructed from config will be passed to the celery.Celery constructor as its config_source argument. (See https://docs.celeryproject.org/en/latest/userguide/configuration.html for details.)
The executor also exposes the broker, backend, and include arguments to the celery.Celery constructor.
In the most common case, you may want to modify the broker and backend (e.g., to use Redis instead of RabbitMQ). We expect that config_source will be less frequently modified, but that when op executions are especially fast or slow, or when there are different requirements around idempotence or retry, it may make sense to execute dagster jobs with variations on these settings.
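As a rough picture of how these four settings reach Celery, the sketch below collects them into keyword arguments for the celery.Celery constructor. The helper names and the app name "example_app" are hypothetical, and the executor's real wiring may differ; `task_acks_late` and `worker_prefetch_multiplier` are standard lowercase Celery settings used here only as sample values. The `celery` import is deferred so that the argument-building part runs without Celery installed.

```python
def celery_app_kwargs(broker=None, backend="rpc://", include=None, config_source=None):
    """Collect the executor's Celery-related config into constructor kwargs."""
    return {
        "broker": broker,
        "backend": backend,
        "include": include or [],
        "config_source": config_source or {},
    }


def make_celery_app(**executor_config):
    """Construct a Celery app from executor config (requires the `celery` package)."""
    from celery import Celery

    return Celery("example_app", **celery_app_kwargs(**executor_config))


# Example: switch to Redis and adjust worker behaviour for long-running ops.
kwargs = celery_app_kwargs(
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
    config_source={"task_acks_late": True, "worker_prefetch_multiplier": 1},
)
```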
To use the celery_k8s_job_executor, set it as the executor_def when defining a job:
from dagster import job
from dagster_celery_k8s.executor import celery_k8s_job_executor


@job(executor_def=celery_k8s_job_executor)
def celery_enabled_job():
    pass
Then you can configure the executor as follows:
execution:
  config:
    job_image: 'my_repo.com/image_name:latest'
    job_namespace: 'some-namespace'
    broker: 'pyamqp://guest@localhost//'  # Optional[str]: The URL of the Celery broker
    backend: 'rpc://'                     # Optional[str]: The URL of the Celery results backend
    include: ['my_module']                # Optional[List[str]]: Modules every worker should import
    config_source:                        # Dict[str, Any]: Any additional parameters to pass to the
      #...                                # Celery workers. This dict will be passed as the `config_source`
      #...                                # argument of celery.Celery().
Note that the YAML you provide here must align with the configuration with which the Celery workers on which you hope to run were started. If, for example, you point the executor at a different broker than the one your workers are listening to, the workers will never be able to pick up tasks for execution.
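One way to catch a broker mismatch early is a small pre-flight check that compares the broker in the run config with the one your workers were started against. This is only a sketch: `executor_broker` and `check_broker_matches` are hypothetical helpers, and the fallback URL is an assumption that DAGSTER_CELERY_BROKER_HOST is unset.

```python
def executor_broker(run_config, default="pyamqp://guest@localhost//"):
    """Read the broker URL from a run config dict, falling back to `default`."""
    return run_config.get("execution", {}).get("config", {}).get("broker", default)


def check_broker_matches(run_config, worker_broker_url):
    """Raise ValueError if the executor and the workers use different brokers."""
    configured = executor_broker(run_config)
    if configured != worker_broker_url:
        raise ValueError(
            f"Executor broker {configured!r} does not match worker broker "
            f"{worker_broker_url!r}; workers would never receive tasks."
        )
    return configured


run_config = {"execution": {"config": {"broker": "pyamqp://guest@localhost//"}}}
matched = check_broker_matches(run_config, "pyamqp://guest@localhost//")
```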
In deployments where the celery_k8s_job_executor is used, all appropriate celery and dagster_celery commands must be invoked with the -A dagster_celery_k8s.app argument.