The Data Architecture Decisions That Actually Matter (Before You Write a Single Query)
Douglas da Silva | Mar 31, 2026
Most data pipelines are built around a single language, but the real question isn’t which one is better, but which one is right for each job.
A hybrid pipeline applies Python where flexibility matters (ingestion, parsing, complex transforms) and SQL + dbt where readability and governance take over (business modeling, analytics). This is the logic behind the medallion architecture (Bronze → Silver → Gold).
Layering two runtimes adds complexity, but also opens the door for AI agents to handle the repetitive work across both worlds, keeping architectural decisions where they belong: with the engineers.
A hybrid pipeline divides responsibilities across three layers, commonly called the medallion architecture. Each layer has a clear owner: Python handles the early, messy work; SQL and dbt take over once the data is clean and ready for analysis. The boundary between them it’s where the nature of the problem changes.
Python reads data from all source systems (REST APIs, flat files, event streams, databases) and lands it as-is into storage. No transformations happen here; the raw data is preserved exactly as received for auditability and reprocessing.
Python unnests nested structures (JSON, XML, Avro), applies type casting, deduplicates records, and enforces schema validation and data quality. The result is clean, flat, typed data stored in a columnar format (e.g. Parquet or Delta).
dbt builds business-ready tables using SQL: aggregations, joins, slowly changing dimensions, and data mart views. These are the tables analysts and BI tools query directly.
Read more: The biggest benefits of Python, according to Cheesecake Labs
Python has a far richer ecosystem for complex data manipulation — libraries like pandas, polars, pyarrow, and or json handle deeply nested structures, binary formats, and custom parsing logic far more elegantly than data warehouse built-in functions (functions like Unnesting and Regex). This avoids being locked into vendor-specific SQL dialects for tasks they were not designed for.
Python also brings stronger data quality tooling to the Silver layer. Libraries like Great Expectations and Pandera let you define and enforce schema contracts, catch anomalies early, and fail pipelines before bad data reaches analysts.
SQL and dbt in the Gold layer align with how most data analysts and business stakeholders actually think. SQL is the language of analytics; readable, reviewable in pull requests, and self-documenting when paired with dbt’s schema files and tests. dbt also adds lineage tracking, automated testing, and documentation generation out of the box, making the Gold layer far easier to govern and maintain as the team grows.
Read more: The Data Architecture Decisions That Actually Matter (Before You Write a Single Query)
This pattern is a strong fit when source data is structurally complex (deeply nested, semi-structured, or from many heterogeneous sources), the analytics team is SQL-fluent, and the engineering team has Python expertise.
It tends to struggle when the team is very small or when source data is already well-structured enough that Python’s added flexibility provides little real benefit.
Each layer of the pipeline has a distinct knowledge domain — Python for ingestion and complex unnesting transformation, SQL and dbt conventions for modeling and business logic. Here is where AI will help the most, with the use of agents and skill to help us create and maintain this challenging multilingual environment. This dramatically improves code quality and reduces the amount of prompting needed to get useful output.
Code generation is the most immediate win. A well-configured set of skills can scaffold an entire Bronze ingestion script, like a skill for an API ingestion, a skill for data quality, a Silver unnesting transform, or a dbt staging model in seconds — and because it knows the conventions, the output rarely needs significant editing.
Consistency enforcement is the less obvious but arguably more valuable benefit. Agents apply the same patterns every time: error handling in every ingestion script, lineage columns in every Silver output, not_null + unique tests on every Gold primary key. This is hard to achieve with humans alone, especially as the team grows.
Maintenance assistance becomes critical when pipelines age. When a Silver schema changes (a new nested field appears, a source renames a column), the Python agent can generate the updated transform. When a Gold model needs a new mart or an additional aggregation, the dbt agent can produce it with the correct structure without the engineer needing to look up the dbt docs.
Onboarding is another strong use case. New engineers can query the agent to understand what a layer does, why a particular pattern was chosen, or how to add a new source — without pulling a senior engineer into every question.
Read more: Beyond “Vibe Coding”: Engineering with AI and Cursor
Agents do not replace code review, data quality monitoring, or architectural decisions. The agent generates code based on the context it was given — if your Silver layer evolves significantly, the system prompt needs to be updated to reflect reality.
Think of the system prompt as a living document that encodes your team’s conventions, and treat updates to it with the same discipline as updates to a shared style guide.

As a Data Engineer at nok with over two years in the role, I specialize in leveraging tools like Google BigQuery to develop efficient data engineering solutions. My work focuses on creating, maintaining, and optimizing ETL and ELT processes, enabling seamless data integration and validation. My mission is to contribute to data-driven decision-making by employing advanced technologies and scalable methods in a collaborative and innovative environment.