Kubernetes Native Orchestrator

Starting with OpenMetadata 1.12, you can run ingestion pipelines natively on Kubernetes, with no need for Apache Airflow. This is ideal for organizations that:
  • Already run workloads on Kubernetes and prefer native solutions
  • Don’t need the full feature set of Apache Airflow

Orchestration Modes

The Kubernetes orchestrator supports two modes for running ingestion pipelines.

Option 1: OMJob Operator

Uses custom Kubernetes resources (OMJob and CronOMJob) managed by the OpenMetadata operator.

Resource     Description
CronOMJob    Scheduled pipelines - runs on a cron schedule
OMJob        On-demand pipelines - one-off execution when triggered
Recommended for production. The OMJob Operator provides guaranteed exit handler execution and failure diagnostics.
Advantages:
  • Exit Handler Guarantee: Even if the ingestion pod crashes (OOMKilled, node failure, etc.), the operator ensures pipeline status is always reported back to OpenMetadata
  • Failure Diagnostics: Automatically collects detailed error context from pod logs and events when pipelines fail
  • Pod Lifecycle Monitoring: The operator watches pod events and updates pipeline status in real-time
Requirements:
  • Elevated permissions to install Custom Resource Definitions (CRDs)
  • The OMJob Operator deployment running in your cluster

Option 2: Native Kubernetes Jobs

Uses standard Kubernetes resources (Job and CronJob) without any custom CRDs.
Resource   Description
CronJob    Scheduled pipelines - runs on a cron schedule
Job        On-demand pipelines - one-off execution when triggered
Advantages:
  • No CRD installation required - uses only built-in Kubernetes resources
  • Works in environments with restricted permissions
  • Simpler setup
Limitations:
  • No guaranteed exit handler - if a pod is killed unexpectedly, status updates may not reach OpenMetadata
  • No automatic failure diagnostics

Features

Native K8s Integration

Pipelines run as standard Kubernetes Jobs, making them easy to monitor with existing K8s tooling.

Automatic Status Updates

Pipeline status is automatically reported back to OpenMetadata, including success/failure details.

Failure Diagnostics

When pipelines fail, detailed diagnostics are collected from pod logs and events. (OMJob Operator only)

Resource Control

Configure CPU, memory, node selectors, and security contexts for ingestion pods.
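
At the pod level, these controls correspond to standard Kubernetes pod spec fields. The fragment below is a minimal sketch of how such a pod spec might look, not the manifest OpenMetadata actually generates: the node label and container name are hypothetical, and the remaining values are copied from the Helm examples on this page.

```yaml
# Fragment of a pod spec (illustrative only)
spec:
  nodeSelector:
    workload-type: ingestion      # hypothetical node label
  securityContext:                # pod-level security context
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    runAsNonRoot: true
  containers:
    - name: ingestion             # hypothetical container name
      image: docker.getcollate.io/openmetadata/ingestion-base:1.12.0
      resources:
        requests:                 # guaranteed minimum used for scheduling
          cpu: "500m"
          memory: "1Gi"
        limits:                   # hard ceiling; exceeding memory causes OOMKilled
          cpu: "2"
          memory: "4Gi"
```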

Setup Option 1: OMJob Operator

This setup uses custom CRDs for guaranteed exit handler execution and failure diagnostics.

Prerequisites

  1. OpenMetadata deployed on Kubernetes (Helm chart recommended)
  2. Permissions to install CRDs in your cluster
  3. Ingestion image accessible from your cluster (docker.getcollate.io/openmetadata/ingestion-base)

Helm Values Configuration

# Enable the OMJob Operator
omjobOperator:
  enabled: true
  image:
    repository: docker.getcollate.io/openmetadata/omjob-operator
    tag: "1.12.0"
    pullPolicy: IfNotPresent
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "500m"
      memory: "256Mi"

openmetadata:
  config:
    pipelineServiceClientConfig:
      enabled: true
      type: "k8s"
      metadataApiEndpoint: http://openmetadata:8585/api

      k8s:
        # Use the OMJob Operator
        useOMJobOperator: true
        
        # Container image for ingestion jobs
        ingestionImage: "docker.getcollate.io/openmetadata/ingestion-base:1.12.0"
        imagePullPolicy: "IfNotPresent"
        imagePullSecrets: ""
        
        # Service account for ingestion jobs
        serviceAccountName: "openmetadata-ingestion"
        
        # Job lifecycle settings
        ttlSecondsAfterFinished: 86400  # Keep completed jobs for 24 hours
        activeDeadlineSeconds: 7200      # Max 2 hour runtime
        backoffLimit: 3                  # Retry up to 3 times
        
        # Job history
        successfulJobsHistoryLimit: 3
        failedJobsHistoryLimit: 3
        
        # Pod security context
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          fsGroup: 1000
          runAsNonRoot: true
        
        # Resource limits
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "500m"
            memory: "1Gi"
        
        # Enable failure diagnostics (only works with OMJob Operator)
        enableFailureDiagnostics: true
        
        # RBAC - set to false if managed externally
        rbac:
          enabled: true
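
Compared with the native Jobs configuration shown later on this page, only the omjobOperator block, useOMJobOperator, and enableFailureDiagnostics are specific to the operator. If you keep a shared base values file, a small overlay such as the sketch below could carry just those keys. The file name is hypothetical, and the sketch assumes the chart's defaults for the operator image and resources are acceptable.

```yaml
# values-omjob.yaml (hypothetical overlay file)
omjobOperator:
  enabled: true                       # deploy the OMJob Operator
openmetadata:
  config:
    pipelineServiceClientConfig:
      k8s:
        useOMJobOperator: true        # use OMJob/CronOMJob instead of Job/CronJob
        enableFailureDiagnostics: true
```

Helm merges values files in the order they are passed with -f, and later files take precedence, so the overlay would go after the base file.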

Required RBAC Permissions

When using the OMJob Operator, additional permissions are needed for the custom resources:
rules:
  # Pod management for pipeline jobs and diagnostics
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # ConfigMaps for pipeline configuration
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Secrets for pipeline credentials
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Events for diagnostics
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list"]
  # Jobs and CronJobs management
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # OMJob CRDs
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["omjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["omjobs/status"]
    verbs: ["get", "patch"]
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["cronomjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["cronomjobs/status"]
    verbs: ["get", "patch"]

Setup Option 2: Native Kubernetes Jobs

This setup uses standard Kubernetes Jobs and CronJobs without any custom CRDs.

Prerequisites

  1. OpenMetadata deployed on Kubernetes (Helm chart recommended)
  2. RBAC permissions for the OpenMetadata service account to manage Jobs, CronJobs, ConfigMaps, and Secrets
  3. Ingestion image accessible from your cluster (docker.getcollate.io/openmetadata/ingestion-base)

Helm Values Configuration

openmetadata:
  config:
    pipelineServiceClientConfig:
      enabled: true
      type: "k8s"
      metadataApiEndpoint: http://openmetadata:8585/api

      k8s:
        # Do not use the OMJob Operator (false is the default)
        useOMJobOperator: false
        
        # Container image for ingestion jobs
        ingestionImage: "docker.getcollate.io/openmetadata/ingestion-base:1.12.0"
        imagePullPolicy: "IfNotPresent"
        imagePullSecrets: ""
        
        # Service account for ingestion jobs
        serviceAccountName: "openmetadata-ingestion"
        
        # Job lifecycle settings
        ttlSecondsAfterFinished: 86400
        activeDeadlineSeconds: 7200
        backoffLimit: 3
        
        # Job history
        successfulJobsHistoryLimit: 3
        failedJobsHistoryLimit: 3
        
        # Pod security context
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          fsGroup: 1000
          runAsNonRoot: true
        
        # Resource limits
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "500m"
            memory: "1Gi"
        
        # RBAC - set to false if managed externally
        rbac:
          enabled: true
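
The lifecycle and history settings above share their names with fields of the built-in batch/v1 Job and CronJob APIs. Assuming OpenMetadata passes them through unchanged (an assumption, not something this page confirms), a scheduled pipeline would correspond to a CronJob roughly like the sketch below. The resource name, schedule, and container name are hypothetical, and the container command is omitted because it is not documented here.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-ingestion-pipeline    # hypothetical
spec:
  schedule: "0 2 * * *"               # hypothetical: every day at 02:00
  successfulJobsHistoryLimit: 3       # CronJob-level: finished Jobs to keep
  failedJobsHistoryLimit: 3           # CronJob-level: failed Jobs to keep
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 86400  # Job-level: delete 24 hours after finishing
      activeDeadlineSeconds: 7200     # Job-level: terminate after 2 hours
      backoffLimit: 3                 # Job-level: retries before marking failed
      template:
        spec:
          serviceAccountName: openmetadata-ingestion
          restartPolicy: Never        # Jobs require Never or OnFailure
          containers:
            - name: ingestion         # hypothetical
              image: docker.getcollate.io/openmetadata/ingestion-base:1.12.0
              imagePullPolicy: IfNotPresent
```

An on-demand pipeline would correspond to a plain Job carrying the same Job-level fields. The two history limits exist only on CronJobs.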

Required RBAC Permissions

rules:
  # Pod management for pipeline jobs
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # ConfigMaps for pipeline configuration
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Secrets for pipeline credentials
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Events for diagnostics
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list"]
  # Jobs and CronJobs management
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
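
The rules list above is only the body of an RBAC object. If you set rbac.enabled to false and manage RBAC externally, you create the surrounding objects yourself. The sketch below assumes a namespace-scoped Role and RoleBinding. The object names, the namespace, and the bound service account are hypothetical placeholders, and only one rule is repeated for brevity.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: openmetadata-pipelines        # hypothetical
  namespace: openmetadata             # hypothetical
rules:
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # ...add the remaining rules from the list above
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: openmetadata-pipelines        # hypothetical
  namespace: openmetadata             # hypothetical
subjects:
  - kind: ServiceAccount
    name: openmetadata                # hypothetical: the account that creates pipeline resources
    namespace: openmetadata
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: openmetadata-pipelines        # must match the Role name above
```

For the OMJob Operator setup, the same wrapper applies, with the pipelines.openmetadata.org rules added to the Role.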

For validating your setup, viewing pipeline logs, troubleshooting, and migrating from Airflow, see the Operations & Troubleshooting guide.