This repository has been archived by the owner on May 17, 2024. It is now read-only.
data-diff shows incorrect row differences between Postgres and BigQuery tables #532
Labels: bug (Something isn't working), non-dbt (Use cases outside of dbt), stale_immune (Immunity to stale bot), triage
I ran the following command to compare tables between PostgreSQL and BigQuery with data-diff:
The primary key column in both tables is id, a string in the format application-xxxxxxxxYYYYYY-zzzz (e.g., application-2gXxHwCux0HqFNsV721hvTPhJRa97A-0dJgh).
Issue:
data-diff reported more than 500 differing rows between the two tables, while the actual difference is only about 4 records.
Further Details:
PostgreSQL version: 13.4
After splitting the key range into segments, data-diff generates queries like the following for both Postgres and BigQuery (similar queries are created for every segment):
PostgreSQL query (generated by data-diff):
BigQuery query (generated by data-diff):
Running these queries separately on Postgres and BigQuery returns significantly different row counts.
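The issue does not identify a cause. One possible explanation, offered only as an assumption, is that the two databases order mixed-case string keys such as these id values differently. BigQuery is assumed to compare STRING values by code point. A Postgres database with a locale collation (for example en_US.UTF-8; the collation in use here is not stated) compares letters case-insensitively first. The sketch below is hypothetical and is not data-diff's implementation. segment_bounds, count_in_segment, the two ordering functions, and the sample ids are all made up, and case_insensitive_order is only a crude stand-in for a real locale collation. The sketch picks segment boundaries under one ordering, then evaluates the same half-open predicate (id >= lo AND id < hi) under each ordering.

```python
# Hypothetical sketch, NOT data-diff's code: shows how one half-open range
# predicate can select different rows when two databases order strings differently.

def code_point_order(s):
    # Plain code-point comparison, like a "C"/binary collation
    # (assumed here for BigQuery's default STRING comparison).
    return s

def case_insensitive_order(s):
    # Crude stand-in for a locale collation such as en_US.UTF-8 that compares
    # letters case-insensitively first (assumed for Postgres). Real glibc/ICU
    # collations are more involved, e.g. in how they treat punctuation.
    return (s.casefold(), s)

def segment_bounds(keys, n_segments, order):
    """Split keys (sorted by `order`) into [lo, hi) segments; hi=None means unbounded."""
    ordered = sorted(keys, key=order)
    size = max(1, -(-len(ordered) // n_segments))  # ceiling division
    bounds = []
    for start in range(0, len(ordered), size):
        end = start + size
        hi = ordered[end] if end < len(ordered) else None
        bounds.append((ordered[start], hi))
    return bounds

def count_in_segment(keys, lo, hi, order):
    """Emulate SELECT count(*) ... WHERE id >= lo AND id < hi under `order`."""
    return sum(
        1 for k in keys
        if order(lo) <= order(k) and (hi is None or order(k) < order(hi))
    )

# Made-up sample keys; both "databases" hold exactly the same five rows.
ids = ["application-2gX", "application-Abc", "application-abd",
       "application-Zz", "application-b"]

for lo, hi in segment_bounds(ids, 2, code_point_order):
    print(lo, hi,
          count_in_segment(ids, lo, hi, code_point_order),
          count_in_segment(ids, lo, hi, case_insensitive_order))
# application-2gX application-abd 3 2
# application-abd None 2 3
```

Both sides hold the same five rows, yet the per-segment counts disagree (3 vs 2, then 2 vs 3). A segment-by-segment comparison would therefore flag mismatches where none exist. Running SHOW lc_collate; on the Postgres side shows which collation is in effect.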