Summary

A hybrid data pipeline uses Python for ingestion, parsing, and complex transformations (Bronze and Silver layers) and SQL with dbt for business modeling and analytics (Gold layer), following the medallion architecture.
Combining the two leverages Python's richer ecosystem and data quality tooling for messy early-stage work while using SQL and dbt for readable, governable, and well-documented analytics tables.
The approach introduces risks such as operational complexity from running two runtimes, harder cross-layer debugging, context switching for engineers, split testing strategies, and orchestration coupling that requires a coordinator like Airflow, Prefect, or Dagster.
AI agents can automate code generation, enforce consistency, assist with maintenance, and support onboarding across both languages, but they do not replace code review, data quality monitoring, or architectural decisions, and their system prompt must be kept updated as the pipeline evolves.

Most data pipelines are built around a single language, but the real question isn’t which language is better. It is which one is right for each job.

A hybrid pipeline applies Python where flexibility matters (ingestion, parsing, complex transforms) and SQL + dbt where readability and governance take over (business modeling, analytics). This is the logic behind the medallion architecture (Bronze → Silver → Gold).

Layering two runtimes adds complexity, but also opens the door for AI agents to handle the repetitive work across both worlds, keeping architectural decisions where they belong: with the engineers.

What are the layers of a hybrid data pipeline?

A hybrid pipeline divides responsibilities across three layers, commonly called the medallion architecture. Each layer has a clear owner: Python handles the early, messy work; SQL and dbt take over once the data is clean and ready for analysis. The boundary between them is where the nature of the problem changes.

Bronze: Raw ingestion (Python)

Python reads data from all source systems (REST APIs, flat files, event streams, databases) and lands it as-is into storage. No transformations happen here; the raw data is preserved exactly as received for auditability and reprocessing.
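As a minimal sketch of "land it as-is", the function below writes a payload byte-for-byte and keeps ingestion metadata in a separate sidecar file, so the raw data itself is never modified. The function name, the directory layout, and the sidecar format are assumptions for illustration, not part of the original pipeline.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def land_raw(payload: bytes, source: str, bronze_root: Path) -> Path:
    """Write the payload unchanged and record metadata in a sidecar file."""
    ingested_at = datetime.now(timezone.utc)
    digest = hashlib.sha256(payload).hexdigest()

    # Hypothetical layout: <bronze_root>/<source>/<ingestion date>/<hash>.raw
    target_dir = bronze_root / source / ingested_at.strftime("%Y-%m-%d")
    target_dir.mkdir(parents=True, exist_ok=True)

    raw_path = target_dir / f"{digest[:16]}.raw"
    raw_path.write_bytes(payload)  # no parsing, no transformation

    # Metadata lives next to the payload, never inside it.
    meta = {
        "source": source,
        "ingested_at": ingested_at.isoformat(),
        "sha256": digest,
    }
    raw_path.with_suffix(".meta.json").write_text(json.dumps(meta))
    return raw_path
```

Naming the file after a content hash is one possible design choice: re-landing the same payload on the same day overwrites an identical file instead of creating a duplicate.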

Silver: Cleansing & transformation (Python)

Python flattens nested structures (JSON, XML, Avro), casts types, deduplicates records, and enforces schema validation and data quality checks. The result is clean, flat, typed data stored in a columnar format (e.g. Parquet or Delta).
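The Silver steps can be sketched with the standard library alone. The column names, the cast table, and the choice to deduplicate on order_id are hypothetical; a real pipeline would more likely use pandas or polars and write the result to Parquet, which this sketch leaves out.

```python
from datetime import datetime
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level using underscore-joined keys."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Hypothetical target types for the flattened columns.
CASTS = {
    "order_id": int,
    "customer_id": int,
    "total_amount": float,
    "created_at": datetime.fromisoformat,
}


def to_silver(raw_records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten, cast, and deduplicate on order_id, keeping the last occurrence."""
    by_key: dict[int, dict[str, Any]] = {}
    for raw in raw_records:
        flat = flatten(raw)
        # A missing column raises KeyError and a bad value raises ValueError,
        # so malformed records stop the run instead of passing through.
        row = {col: cast(flat[col]) for col, cast in CASTS.items()}
        by_key[row["order_id"]] = row  # later duplicates overwrite earlier ones
    return list(by_key.values())
```

For example, a record such as {"order_id": "1", "customer": {"id": "7"}, "total": {"amount": "19.90"}, "created_at": "2024-01-01T10:00:00"} becomes a single flat row with an integer order_id and customer_id, a float total_amount, and a datetime created_at.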

Gold: Business models (SQL + dbt)

dbt builds business-ready tables using SQL: aggregations, joins, slowly changing dimensions, and data mart views. These are the tables analysts and BI tools query directly.
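A dbt model is essentially a SELECT statement that dbt materializes as a table or view. To keep the examples in one language, the sketch below runs a Gold-style aggregation through Python's built-in sqlite3 module, which stands in for the warehouse here. The table and column names are hypothetical, and in a real dbt project the query would live in a model file and reference the Silver table through dbt rather than by a hard-coded name.

```python
import sqlite3

# The kind of SELECT a Gold model might contain (hypothetical names).
GOLD_DAILY_REVENUE_SQL = """
SELECT
    order_date,
    COUNT(*)          AS order_count,
    SUM(total_amount) AS revenue
FROM silver_orders
GROUP BY order_date
ORDER BY order_date
"""


def build_daily_revenue(conn: sqlite3.Connection) -> list[tuple]:
    """Run the aggregation and return one row per order date."""
    return conn.execute(GOLD_DAILY_REVENUE_SQL).fetchall()


# In-memory database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE silver_orders ("
    "order_id INTEGER PRIMARY KEY, order_date TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO silver_orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10.0), (2, "2024-01-01", 5.5), (3, "2024-01-02", 20.0)],
)
rows = build_daily_revenue(conn)
```

With the three sample orders above, rows holds one tuple per day: two orders totalling 15.5 on the first day and one order of 20.0 on the second.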

Why combine Python and SQL in a data pipeline?

Python has a far richer ecosystem for complex data manipulation. Libraries like pandas, polars, pyarrow, and orjson handle deeply nested structures, binary formats, and custom parsing logic far more elegantly than a data warehouse's built-in functions (such as unnesting and regular-expression functions). This avoids being locked into vendor-specific SQL dialects for tasks they were not designed for.

Python also brings stronger data quality tooling to the Silver layer. Libraries like Great Expectations and Pandera let you define and enforce schema contracts, catch anomalies early, and fail pipelines before bad data reaches analysts.
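The idea of a schema contract that fails the pipeline can be shown without either library. The sketch below is a plain-Python stand-in, not the Great Expectations or Pandera API; the Column class, the contract contents, and the exception name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Column:
    dtype: type
    nullable: bool = False
    check: Optional[Callable[[Any], bool]] = None


class ContractViolation(Exception):
    """Raised when a batch breaks the schema contract."""


# Hypothetical contract for a Silver orders table.
ORDERS_CONTRACT = {
    "order_id": Column(int),
    "total_amount": Column(float, check=lambda v: v >= 0),
    "coupon_code": Column(str, nullable=True),
}


def validate(rows: list[dict[str, Any]], contract: dict[str, Column]) -> None:
    """Raise ContractViolation listing every problem found in the batch."""
    errors = []
    for i, row in enumerate(rows):
        for name, col in contract.items():
            if name not in row:
                errors.append(f"row {i}: missing column {name}")
                continue
            value = row[name]
            if value is None:
                if not col.nullable:
                    errors.append(f"row {i}: {name} is null")
            elif not isinstance(value, col.dtype):
                errors.append(f"row {i}: {name} is not {col.dtype.__name__}")
            elif col.check is not None and not col.check(value):
                errors.append(f"row {i}: {name} failed its check")
    if errors:
        raise ContractViolation("; ".join(errors))
```

Calling validate at the end of the Silver step means a bad batch raises before anything is written for the Gold layer to read. Collecting all errors first, rather than stopping at the first one, makes the failure message more useful when a source changes several fields at once.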

SQL and dbt in the Gold layer align with how most data analysts and business stakeholders actually think. SQL is the language of analytics: readable, reviewable in pull requests, and self-documenting when paired with dbt’s schema files and tests. dbt also adds lineage tracking, automated testing, and documentation generation out of the box, making the Gold layer far easier to govern and maintain as the team grows.

What are the risks of a hybrid Python + dbt pipeline?

Operational complexity: Running two runtimes (Python orchestration + dbt) means two sets of dependencies, two deployment surfaces, and two failure modes to monitor. Teams need to be comfortable with both paradigms.
Debugging across layers is harder: When a data quality issue surfaces in the Gold layer, the root cause could be in the Python ingestion (Bronze), the Python transformation (Silver), or the dbt model (Gold). Tracing lineage across language boundaries requires good observability tooling (e.g. OpenLineage, dbt artifacts + Airflow logs).
Context switching for engineers: This architecture requires proficiency in both Python and SQL/dbt. On smaller teams, that dual requirement slows delivery and concentrates knowledge risk — when the one engineer who owns the Silver layer is out, progress stalls.
Testing strategy is split: Python unit tests (pytest) cover ingestion and transformation logic; dbt tests cover model correctness. Neither provides end-to-end data quality on its own. That requires deliberately stitching the two together, and the integration won’t happen by itself.
Orchestration coupling: The handoff between the Python Silver layer and the dbt Gold layer needs a coordinator like Airflow, Prefect, or Dagster. Define that boundary cleanly, or incremental runs and dependency management will become a reliability problem.
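The Silver-to-Gold handoff from the last item can be sketched as a plain function. This is not the Airflow, Prefect, or Dagster API; those tools add scheduling, retries, and observability on top of the same dependency. The Silver module path and the tag:gold selector are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


# Hypothetical commands: a Silver entry point and a dbt selection by tag.
SILVER_CMD = ["python", "-m", "pipeline.silver"]
GOLD_CMD = ["dbt", "build", "--select", "tag:gold"]


def run_pipeline(runner: Runner = run_command) -> list[str]:
    """Run Silver, then Gold; Gold never starts if Silver fails."""
    completed = []
    for name, cmd in (("silver", SILVER_CMD), ("gold", GOLD_CMD)):
        if runner(cmd) != 0:
            raise RuntimeError(f"{name} step failed; downstream steps skipped")
        completed.append(name)
    return completed
```

Passing the runner in as a parameter keeps the boundary testable: a test can substitute a fake runner and assert that a failed Silver step never triggers the dbt build.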

When should you use a hybrid medallion architecture?

This pattern is a strong fit when source data is structurally complex (deeply nested, semi-structured, or from many heterogeneous sources), the analytics team is SQL-fluent, and the engineering team has Python expertise.

It tends to struggle when the team is very small or when source data is already well-structured enough that Python’s added flexibility provides little real benefit.

How can AI agents help build and maintain data pipelines?

What role do AI agents play in a multilingual pipeline?

Each layer of the pipeline has a distinct knowledge domain: Python for ingestion and complex unnesting transformations, SQL and dbt conventions for modeling and business logic. This is where AI helps the most, with agents and layer-specific skills helping us create and maintain this challenging multilingual environment. Giving each agent the conventions of its own layer dramatically improves code quality and reduces the amount of prompting needed to get useful output.

What tasks can AI agents automate in a Python + dbt pipeline?

Code generation is the most immediate win. A well-configured set of skills (for example, one for API ingestion and one for data quality) can scaffold an entire Bronze ingestion script, a Silver unnesting transform, or a dbt staging model in seconds. Because the agent knows the conventions, the output rarely needs significant editing.

Consistency enforcement is the less obvious but arguably more valuable benefit. Agents apply the same patterns every time: error handling in every ingestion script, lineage columns in every Silver output, not_null + unique tests on every Gold primary key. This is hard to achieve with humans alone, especially as the team grows.
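Conventions like these can also be checked mechanically, whether the code came from an agent or a person. The sketch below assumes a dbt schema file has already been parsed into a dict (for example with a YAML library) and reports Gold models whose primary key lacks the required tests. The function names and the primary_keys mapping are hypothetical; the check accepts both the tests and data_tests keys, since dbt versions differ on which one they use.

```python
from typing import Any

REQUIRED_PK_TESTS = {"not_null", "unique"}


def declared_tests(column: dict[str, Any]) -> set[str]:
    """Return the test names declared on one column entry."""
    names = set()
    for test in column.get("data_tests", column.get("tests", [])):
        # A test is either a bare name or a single-key dict with arguments.
        names.add(test if isinstance(test, str) else next(iter(test)))
    return names


def missing_pk_tests(
    schema: dict[str, Any], primary_keys: dict[str, str]
) -> dict[str, set[str]]:
    """Map each model to the required tests its primary key column lacks."""
    problems = {}
    for model in schema.get("models", []):
        pk = primary_keys.get(model["name"])
        if pk is None:
            continue  # no primary key registered for this model
        columns = {c["name"]: c for c in model.get("columns", [])}
        missing = REQUIRED_PK_TESTS - declared_tests(columns.get(pk, {}))
        if missing:
            problems[model["name"]] = missing
    return problems
```

Run in CI, a check like this turns the convention into a failing build rather than a review comment.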

Maintenance assistance becomes critical when pipelines age. When a Silver schema changes (a new nested field appears, a source renames a column), the Python agent can generate the updated transform. When a Gold model needs a new mart or an additional aggregation, the dbt agent can produce it with the correct structure without the engineer needing to look up the dbt docs.
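Before an agent can update a transform, someone has to notice that the source changed. A minimal sketch of schema drift detection, assuming records are already parsed into dicts: it compares the field paths seen in a sample against the set the Silver transform expects. The function names and the dotted-path convention are assumptions.

```python
from typing import Any


def field_paths(record: dict[str, Any], prefix: str = "") -> set[str]:
    """Collect dotted paths for every leaf field in a nested record."""
    paths = set()
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= field_paths(value, prefix=f"{path}.")
        else:
            paths.add(path)
    return paths


def schema_drift(
    expected: set[str], sample: list[dict[str, Any]]
) -> dict[str, set[str]]:
    """Report fields that appeared in, or vanished from, a sample of records."""
    observed = set()
    for record in sample:
        observed |= field_paths(record)
    return {"added": observed - expected, "missing": expected - observed}
```

A renamed column shows up as one entry in "missing" and one in "added", which is the kind of concrete context that can be handed to an agent or a reviewer along with the existing transform.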

Onboarding is another strong use case. New engineers can query the agent to understand what a layer does, why a particular pattern was chosen, or how to add a new source — without pulling a senior engineer into every question.

What do AI agents still get wrong in data engineering?

Agents do not replace code review, data quality monitoring, or architectural decisions. The agent generates code based on the context it was given — if your Silver layer evolves significantly, the system prompt needs to be updated to reflect reality.

Think of the system prompt as a living document that encodes your team’s conventions, and treat updates to it with the same discipline as updates to a shared style guide.

FAQ

What is a hybrid data pipeline?

A hybrid pipeline applies Python where flexibility matters (ingestion, parsing, complex transforms) and SQL + dbt where readability and governance take over (business modeling, analytics), following the medallion architecture of Bronze, Silver, and Gold layers.

What are the layers of the medallion architecture and their roles?

Bronze uses Python for raw ingestion from source systems without transformations, preserving data as-is for auditability. Silver uses Python to unnest structures, apply type casting, deduplicate records, and enforce schema validation, producing clean typed data in columnar formats like Parquet or Delta. Gold uses SQL and dbt to build business-ready tables through aggregations, joins, slowly changing dimensions, and data mart views for analysts and BI tools.

Why combine Python and SQL in a data pipeline?

Python has a richer ecosystem for complex data manipulation (pandas, polars, pyarrow) that handles nested structures and custom parsing better than SQL warehouse functions, and offers stronger data quality tooling like Great Expectations and Pandera. SQL and dbt align with how analysts think, are readable and reviewable, and dbt adds lineage tracking, automated testing, and documentation generation for easier governance in the Gold layer.

What are the risks of using a hybrid Python + dbt pipeline?

Risks include operational complexity from running two runtimes with separate dependencies and failure modes, harder debugging across layers since issues could originate in Bronze, Silver, or Gold, context switching requiring proficiency in both Python and SQL/dbt, a split testing strategy between pytest and dbt tests, and orchestration coupling that requires a coordinator like Airflow, Prefect, or Dagster.

How can AI agents help with a Python + dbt data pipeline?

AI agents can automate code generation such as scaffolding Bronze ingestion scripts or Silver transforms, enforce consistency in patterns like error handling and lineage columns, assist with maintenance when schemas change, and support onboarding by helping new engineers understand pipeline layers. However, agents do not replace code review, data quality monitoring, or architectural decisions, and the system prompt must be kept updated to reflect the team's evolving conventions.

About the author.

Yuri Pontes

As a Data Engineer at nok with over two years in the role, I specialize in leveraging tools like Google BigQuery to develop efficient data engineering solutions. My work focuses on creating, maintaining, and optimizing ETL and ELT processes, enabling seamless data integration and validation. My mission is to contribute to data-driven decision-making by employing advanced technologies and scalable methods in a collaborative and innovative environment.

Python vs SQL in Data Pipelines: Why the Answer is Both
