Lineage Ingestion
Lineage ingestion extracts data flow relationships from SQL queries: which tables feed into which other tables, and at what column granularity. It parses query logs, view definitions, and stored procedures to build a lineage graph that is published to OpenMetadata.
Pipeline Overview
Lineage uses the same query log sources as usage ingestion but processes the queries differently: it analyzes directional data flow (source → target) rather than counting references. LineageSource parses queries, builds the lineage graph, resolves entities, and yields ready-to-publish AddLineageRequest objects.
Lineage Extraction Call Chain
SQL Parsing
LineageParser
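LineageParser (lineage/parser.py) is the core SQL analysis engine shared with usage ingestion. For lineage, it extracts directional information: the tables a query reads from (sources), the tables it writes to (targets), and the column mappings between them.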
Cascade Parsing Strategy
Three parsers are tried in order, each with a 30-second timeout and a 100 MB memory limit:
- SqlGlot: preferred for accuracy; handles most dialects
- SqlFluff: first fallback; good for complex SQL
- SqlParse: final fallback; always produces a result, though less accurately
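The fallback order can be sketched with the standard library. This is a minimal sketch, not the LineageParser code: the ParseFn signature, the parser callables, and parse_with_fallback are hypothetical, and the per-parser memory limit is not reproduced. Python threads cannot be killed, so a parse that exceeds its timeout is abandoned rather than stopped.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional, Sequence, Set, Tuple

# Hypothetical parser signature: SQL in, (source_tables, target_tables) out.
ParseResult = Tuple[Set[str], Set[str]]
ParseFn = Callable[[str], ParseResult]


def parse_with_fallback(
    sql: str,
    parsers: Sequence[Tuple[str, ParseFn]],
    timeout_seconds: float = 30.0,
) -> Optional[Tuple[str, ParseResult]]:
    """Try each (name, parser) in order; return the first successful result."""
    for name, parse in parsers:
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(parse, sql)
        try:
            return name, future.result(timeout=timeout_seconds)
        except FutureTimeout:
            continue  # too slow: fall through to the next parser
        except Exception:
            continue  # parser raised: fall through to the next parser
        finally:
            # Don't wait for the worker; a timed-out thread is abandoned, not killed.
            pool.shutdown(wait=False)
    return None  # every parser failed
```

In this shape the three entries would wrap SqlGlot, SqlFluff, and SqlParse in that order; listing parsers from most to least accurate is what lets the result degrade gracefully instead of failing outright.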
Query Cleaning
Before parsing, queries are cleaned:
- Remove COPY GRANTS (Snowflake)
- Remove MERGE ... WHEN MATCHED clauses (too complex for parsers)
- Filter out CREATE TRIGGER/FUNCTION/PROCEDURE statements (no lineage value)
- Normalize escape sequences
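A rough regex version of these cleaning steps is shown below. The patterns and clean_query are illustrative assumptions; the actual rules in lineage/parser.py may cover more (or different) variants. Stripping only the WHEN MATCHED tail keeps the MERGE INTO ... USING ... header, which is still enough to recover source → target lineage.

```python
import re
from typing import Optional

# Illustrative patterns only; not the exact rules used by lineage/parser.py.
_COPY_GRANTS = re.compile(r"\bCOPY\s+GRANTS\b", re.IGNORECASE)
_WHEN_MATCHED_TAIL = re.compile(r"\bWHEN\s+(NOT\s+)?MATCHED\b.*", re.IGNORECASE | re.DOTALL)
_NO_LINEAGE_DDL = re.compile(
    r"\s*CREATE\s+(OR\s+REPLACE\s+)?(TRIGGER|FUNCTION|PROCEDURE)\b", re.IGNORECASE
)


def clean_query(sql: str) -> Optional[str]:
    """Return a parser-friendly query, or None if it carries no lineage."""
    if _NO_LINEAGE_DDL.match(sql):
        return None  # triggers, functions, procedures: skip entirely
    sql = _COPY_GRANTS.sub("", sql)  # Snowflake-only clause
    sql = _WHEN_MATCHED_TAIL.sub("", sql)  # keep MERGE INTO ... USING ... ON ...
    # Turn literal "\n" / "\t" escape sequences into real whitespace.
    sql = sql.replace("\\n", "\n").replace("\\t", "\t")
    # Collapse whitespace runs (sketch only: this also touches string literals).
    return re.sub(r"\s+", " ", sql).strip()
```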
Query Masking
QueryMasker (lineage/masker.py) replaces literal values with ? before queries are stored in lineage details. This prevents sensitive data from leaking while preserving the query structure.
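The idea behind masking can be illustrated with two regular expressions. mask_literals is a hypothetical simplification: QueryMasker's actual implementation is not shown in this section and may parse the SQL rather than pattern-match it. The lookbehind keeps digits inside identifiers such as t1 from being masked.

```python
import re

# Simplified stand-in for QueryMasker: regexes, not a SQL parser.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")  # handles '' escapes inside strings
_NUMBER_LITERAL = re.compile(r"(?<![\w.])\d+(?:\.\d+)?\b")  # skips digits in identifiers


def mask_literals(sql: str) -> str:
    """Replace string and numeric literals with '?' so values are not stored."""
    sql = _STRING_LITERAL.sub("?", sql)  # strings first, so digits inside them vanish too
    return _NUMBER_LITERAL.sub("?", sql)
```

For example, mask_literals("SELECT * FROM t1 WHERE name = 'O''Brien' AND age > 42") keeps the table name t1 but replaces both literals with ?.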
Lineage-Relevant Query Types
Not all queries produce lineage. Each database connector filters for specific query types:

| Query Pattern | Lineage Produced |
|---|---|
| CREATE TABLE AS SELECT ... | source tables → new table |
| INSERT INTO ... SELECT ... | source tables → target table |
| UPDATE ... FROM ... | source tables → target table |
| MERGE INTO ... USING ... | source table → target table |
| CREATE VIEW AS ... | source tables → view |
| Stored procedure body | varies (parsed recursively) |
Plain SELECT statements, DROP statements, and other DDL without data movement are filtered out.
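A coarse pre-filter matching the patterns in the table might look like the following. The regexes and may_produce_lineage are assumptions for illustration: real filtering is connector-specific (it may happen in the SQL that reads the query log), and stored procedure bodies, the last row of the table, are not handled here.

```python
import re

_FLAGS = re.IGNORECASE | re.DOTALL

# Hypothetical patterns mirroring the table rows above.
_LINEAGE_PATTERNS = [
    re.compile(r"\s*CREATE\s+(OR\s+REPLACE\s+)?(TEMP(ORARY)?\s+)?TABLE\b.*\bAS\s*\(?\s*SELECT\b", _FLAGS),
    re.compile(r"\s*INSERT\s+INTO\b.*\bSELECT\b", _FLAGS),
    re.compile(r"\s*UPDATE\b.*\bFROM\b", _FLAGS),
    re.compile(r"\s*MERGE\s+INTO\b.*\bUSING\b", _FLAGS),
    re.compile(r"\s*CREATE\s+(OR\s+REPLACE\s+)?VIEW\b.*\bAS\b", _FLAGS),
]


def may_produce_lineage(sql: str) -> bool:
    """True if the statement starts like one of the lineage-producing patterns."""
    return any(p.match(sql) for p in _LINEAGE_PATTERNS)
```

A plain SELECT, a DROP, or an INSERT ... VALUES matches none of the patterns, so it is skipped before any parser runs.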
Graph-Based Lineage Analysis
lineage/sql_lineage.py uses NetworkX directed graphs to handle complex lineage scenarios.
Direct Lineage
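For simple queries (INSERT INTO target SELECT ... FROM source), each source table is linked directly to the target table, and an AddLineageRequest is created for each (source, target) pair.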
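Direct lineage amounts to pairing every source table with every target table. In this sketch LineageEdge is a hypothetical stand-in for AddLineageRequest (the real request refers to resolved entities, not table-name strings), and direct_lineage is not an OpenMetadata function.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class LineageEdge:
    """Hypothetical stand-in for one AddLineageRequest edge."""

    from_table: str
    to_table: str
    sql_query: str  # the (masked) query that produced this edge


def direct_lineage(sources: Iterable[str], targets: Iterable[str], sql: str) -> List[LineageEdge]:
    """Create one edge per (source, target) pair, in a deterministic order."""
    return [
        LineageEdge(from_table=src, to_table=tgt, sql_query=sql)
        for src, tgt in product(sorted(set(sources)), sorted(set(targets)))
    ]
```

An INSERT INTO report SELECT ... FROM orders JOIN customers ... therefore yields two edges: customers → report and orders → report.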
Intermediate Table Lineage
For queries involving temp/staging tables, get_lineage_by_graph() traces paths through the graph:
- Find weakly connected components
- Extract root-to-leaf paths (max depth: 20)
- Create a lineage request for each hop
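These three steps can be approximated without NetworkX, as below. weakly_connected_components computes the same grouping as the NetworkX function of that name; lineage_hops, the Edge alias, and the depth-cap semantics (at most 20 edges per path) are assumptions about how get_lineage_by_graph() behaves, not its actual code. Components with no root, such as a pure cycle, yield no hops in this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (upstream table, downstream table)
MAX_DEPTH = 20  # assumed meaning: maximum number of edges on one path


def weakly_connected_components(edges: List[Edge]) -> List[Set[str]]:
    """Group tables that are connected when edge direction is ignored."""
    undirected: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        undirected[a].add(b)
        undirected[b].add(a)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in list(undirected):
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:  # iterative DFS over the undirected view
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(undirected[node] - comp)
        seen |= comp
        components.append(comp)
    return components


def lineage_hops(edges: List[Edge], max_depth: int = MAX_DEPTH) -> Set[Edge]:
    """Collect every hop that lies on a root-to-leaf path of <= max_depth edges."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    has_parent: Set[str] = set()
    for a, b in edges:
        succ[a].add(b)
        has_parent.add(b)
    hops: Set[Edge] = set()

    def walk(node: str, path: List[str], leaves: Set[str]) -> None:
        if node in leaves:
            hops.update(zip(path, path[1:]))  # one hop per consecutive pair
            return
        if len(path) - 1 >= max_depth:
            return  # depth cap reached before a leaf: discard this path
        for nxt in succ[node]:
            if nxt not in path:  # simple paths only
                walk(nxt, path + [nxt], leaves)

    for comp in weakly_connected_components(edges):
        roots = [n for n in comp if n not in has_parent]
        leaves = {n for n in comp if not succ.get(n)}
        for root in roots:
            walk(root, [root], leaves)
    return hops
```

For raw.orders → tmp.stage → mart.daily, the staging table stays in the graph, so both hops are emitted and the temp table's role in the flow remains visible.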
Column-Level Lineage
For each (source, target) edge, the parser maps individual columns:
- Simple column mappings (col1 → col1)
- Renamed columns (source.old_name → target.new_name)
- Star selects (* → individual columns)
- Expressions (source.a + source.b → target.sum_ab)
- Intermediate column mappings through staging tables
For entity resolution, cross-database lineage, stored procedures, parallel processing, publishing, dialect mapping, and configuration, see Lineage Ingestion — Advanced Topics.