Kubernetes Native Orchestrator
Starting with OpenMetadata 1.12, you can run ingestion pipelines directly on Kubernetes using native resources,
eliminating the need for Apache Airflow. This is ideal for organizations that:
Already run workloads on Kubernetes and prefer native solutions
Don’t need the full feature set of Apache Airflow
Orchestration Modes
The Kubernetes orchestrator supports two modes for running ingestion pipelines:
Option 1: OMJob Operator (Recommended)
Uses custom Kubernetes CRDs (OMJob and CronOMJob) managed by the OpenMetadata operator.
| Resource | Description |
| --- | --- |
| CronOMJob | Scheduled pipelines: runs on a cron schedule |
| OMJob | On-demand pipelines: one-off execution when triggered |
Recommended for production. The OMJob Operator provides guaranteed exit handler execution and failure diagnostics.
Advantages:
Exit Handler Guarantee: Even if the ingestion pod crashes (OOMKilled, node failure, etc.), the operator ensures pipeline status is always reported back to OpenMetadata
Failure Diagnostics: Automatically collects detailed error context from pod logs and events when pipelines fail
Pod Lifecycle Monitoring: The operator watches pod events and updates pipeline status in real time
Requirements:
Elevated permissions to install Custom Resource Definitions (CRDs)
The OMJob Operator deployment running in your cluster
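"Elevated permissions" here usually means that the identity running the installation can manage CustomResourceDefinition objects, which are cluster-scoped. As a rough sketch of what that amounts to, the ClusterRole below grants just that access. The name `omjob-crd-installer` is hypothetical, and the `cluster-admin` role already includes these permissions.

```yaml
# Sketch only: a ClusterRole that allows installing and updating CRDs.
# The role name is a placeholder, not something defined by OpenMetadata.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: omjob-crd-installer
rules:
  # CRDs live in the apiextensions.k8s.io API group and are not namespaced
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "list", "create", "update", "patch"]
```

If your platform team cannot grant this, Option 2 avoids the requirement entirely.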
Option 2: Native Kubernetes Jobs
Uses standard Kubernetes resources (Job and CronJob) without any custom CRDs.
| Resource | Description |
| --- | --- |
| CronJob | Scheduled pipelines: runs on a cron schedule |
| Job | On-demand pipelines: one-off execution when triggered |
Advantages:
No CRD installation required - uses only built-in Kubernetes resources
Works in environments with restricted permissions
Simpler setup
Limitations:
No guaranteed exit handler - if a pod is killed unexpectedly, status updates may not reach OpenMetadata
No automatic failure diagnostics
Features
Native K8s Integration: Pipelines run as standard Kubernetes Jobs, making them easy to monitor with existing K8s tooling.
Automatic Status Updates: Pipeline status is automatically reported back to OpenMetadata, including success/failure details.
Failure Diagnostics: When pipelines fail, detailed diagnostics are collected from pod logs and events. (OMJob Operator only)
Resource Control: Configure CPU, memory, node selectors, and security contexts for ingestion pods.
Setup Option 1: OMJob Operator (Recommended)
This setup uses custom CRDs for guaranteed exit handler execution and failure diagnostics.
Prerequisites
OpenMetadata deployed on Kubernetes (Helm chart recommended)
Permissions to install CRDs in your cluster
Ingestion image accessible from your cluster (docker.getcollate.io/openmetadata/ingestion-base)
Helm Values Configuration
# Enable the OMJob Operator
omjobOperator:
  enabled: true
  image:
    repository: docker.getcollate.io/openmetadata/omjob-operator
    tag: "1.12.0"
    pullPolicy: IfNotPresent
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "500m"
      memory: "256Mi"

openmetadata:
  config:
    pipelineServiceClientConfig:
      enabled: true
      type: "k8s"
      metadataApiEndpoint: http://openmetadata:8585/api
      k8s:
        # Use the OMJob Operator
        useOMJobOperator: true
        # Container image for ingestion jobs
        ingestionImage: "docker.getcollate.io/openmetadata/ingestion-base:1.12.0"
        imagePullPolicy: "IfNotPresent"
        imagePullSecrets: ""
        # Service account for ingestion jobs
        serviceAccountName: "openmetadata-ingestion"
        # Job lifecycle settings
        ttlSecondsAfterFinished: 86400  # Keep completed jobs for 24 hours
        activeDeadlineSeconds: 7200     # Max 2 hour runtime
        backoffLimit: 3                 # Retry up to 3 times
        # Job history
        successfulJobsHistoryLimit: 3
        failedJobsHistoryLimit: 3
        # Pod security context
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          fsGroup: 1000
          runAsNonRoot: true
        # Resource limits
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "500m"
            memory: "1Gi"
        # Enable failure diagnostics (only works with OMJob Operator)
        enableFailureDiagnostics: true
        # RBAC - set to false if managed externally
        rbac:
          enabled: true
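The lifecycle and resource settings are the ones most likely to need tuning per environment. As an illustration only, the override file below raises the runtime ceiling and resource limits for heavier pipelines. The file name `values-override.yaml` and the specific numbers are assumptions, and the nesting under `pipelineServiceClientConfig.k8s` should be checked against your chart version.

```yaml
# values-override.yaml (hypothetical file name and example numbers)
openmetadata:
  config:
    pipelineServiceClientConfig:
      k8s:
        ttlSecondsAfterFinished: 3600   # 1 hour = 60 * 60 seconds
        activeDeadlineSeconds: 21600    # 6 hours = 6 * 3600 seconds
        backoffLimit: 1                 # Retry once
        resources:
          limits:
            cpu: "4"
            memory: "8Gi"
          requests:
            cpu: "1"
            memory: "2Gi"
```

Helm merges values files in order, so passing this file as an additional `-f` argument after the main values file changes only the keys listed here. Note that on a standard Kubernetes Job, `activeDeadlineSeconds` bounds the whole Job including retries, so if these values are passed through unchanged, a high `backoffLimit` does not extend the total allowed runtime.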
Required RBAC Permissions
When using the OMJob Operator, additional permissions are needed for the custom resources:
rules:
  # Pod management for pipeline jobs and diagnostics
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # ConfigMaps for pipeline configuration
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Secrets for pipeline credentials
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Events for diagnostics
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list"]
  # Jobs and CronJobs management
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # OMJob CRDs
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["omjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["omjobs/status"]
    verbs: ["get", "patch"]
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["cronomjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["cronomjobs/status"]
    verbs: ["get", "patch"]
Setup Option 2: Native Kubernetes Jobs
This setup uses standard Kubernetes Jobs and CronJobs without any custom CRDs.
Prerequisites
OpenMetadata deployed on Kubernetes (Helm chart recommended)
RBAC permissions for the OpenMetadata service account to manage Jobs, CronJobs, ConfigMaps, and Secrets
Ingestion image accessible from your cluster (docker.getcollate.io/openmetadata/ingestion-base)
Helm Values Configuration
openmetadata:
  config:
    pipelineServiceClientConfig:
      enabled: true
      type: "k8s"
      metadataApiEndpoint: http://openmetadata:8585/api
      k8s:
        # Do NOT use the OMJob Operator (default)
        useOMJobOperator: false
        # Container image for ingestion jobs
        ingestionImage: "docker.getcollate.io/openmetadata/ingestion-base:1.12.0"
        imagePullPolicy: "IfNotPresent"
        imagePullSecrets: ""
        # Service account for ingestion jobs
        serviceAccountName: "openmetadata-ingestion"
        # Job lifecycle settings
        ttlSecondsAfterFinished: 86400
        activeDeadlineSeconds: 7200
        backoffLimit: 3
        # Job history
        successfulJobsHistoryLimit: 3
        failedJobsHistoryLimit: 3
        # Pod security context
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          fsGroup: 1000
          runAsNonRoot: true
        # Resource limits
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "500m"
            memory: "1Gi"
        # RBAC - set to false if managed externally
        rbac:
          enabled: true
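Most of these settings share their names with standard fields on `batch/v1` resources, which makes it easier to reason about what ends up in the cluster. The manifest below is a sketch of how the values above could appear on a CronJob for a scheduled pipeline. It is not the manifest OpenMetadata generates: the object name, namespace, schedule, and container name are hypothetical, and the real resource will carry additional fields such as the pipeline configuration.

```yaml
# Illustrative CronJob only; names, namespace, and schedule are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: om-pipeline-example
  namespace: openmetadata
spec:
  schedule: "0 2 * * *"              # example: every day at 02:00
  successfulJobsHistoryLimit: 3      # CronJob-level history settings
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 86400 # Job-level lifecycle settings
      activeDeadlineSeconds: 7200
      backoffLimit: 3
      template:
        spec:
          serviceAccountName: openmetadata-ingestion
          restartPolicy: Never       # Jobs require Never or OnFailure
          securityContext:
            runAsUser: 1000
            runAsGroup: 1000
            fsGroup: 1000
            runAsNonRoot: true
          containers:
            - name: ingestion
              image: docker.getcollate.io/openmetadata/ingestion-base:1.12.0
              imagePullPolicy: IfNotPresent
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
                requests:
                  cpu: "500m"
                  memory: "1Gi"
```

Because these are ordinary Jobs and CronJobs, existing tooling that lists or watches `batch/v1` resources should be able to observe them without extra configuration.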
Required RBAC Permissions
rules:
  # Pod management for pipeline jobs
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # ConfigMaps for pipeline configuration
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Secrets for pipeline credentials
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Events for diagnostics
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list"]
  # Jobs and CronJobs management
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
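If `rbac.enabled` is set to `false` and RBAC is managed externally, the rules above need to live in a Role (or ClusterRole) bound to the service account that creates the pipeline resources. The following is a minimal sketch under stated assumptions: the namespace `openmetadata`, the Role name `openmetadata-pipelines`, and the service account name `openmetadata` are all hypothetical, so substitute the values from your own deployment.

```yaml
# Sketch of externally managed RBAC for native Jobs mode.
# All names and the namespace are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: openmetadata-pipelines
  namespace: openmetadata
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "configmaps", "secrets"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: openmetadata-pipelines
  namespace: openmetadata
subjects:
  # The service account used by the OpenMetadata server (assumed name)
  - kind: ServiceAccount
    name: openmetadata
    namespace: openmetadata
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: openmetadata-pipelines
```

The three core-group rules with identical verbs are merged into one entry here for brevity; the effective permissions are the same as listing them separately. For the OMJob Operator setup, the `pipelines.openmetadata.org` rules from Option 1 would need to be added as well.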