When a Delta merge operation modifies a table, it also affects DataFrames that were read from that table before the merge: evaluating such a DataFrame before and after the merge can return different rows, even if the DataFrame was cached.
Steps to reproduce
PySpark code:

```python
from pyspark.sql.types import StructType, StructField, StringType
from delta.tables import DeltaTable
from pyspark.sql import DataFrame

# `spark` is the SparkSession provided by the notebook.
print("Start")
table = "MYTABLE"
path = "MYPATH"  # placeholder: storage location of MYTABLE

schema = StructType([StructField("id", StringType(), True)])
df = spark.createDataFrame(
    [("A",), ("B",), ("C",), ("D",)],
    schema,
)
# Rows to delete from the table; "OTHER" matches no existing row.
df_del = spark.createDataFrame(
    [("A",), ("B",), ("OTHER",)],
    schema,
)

# Create the table, then read it back by path and cache the result.
df.write.format("delta").mode("overwrite").saveAsTable(table)
df_read = spark.read.format("delta").load(path).cache()
df_read.show()  # First read

# Delete every target row that matches df_del on all columns (here only "id").
delta_table = DeltaTable.forPath(df.sparkSession, path)
delta_table.alias("target").merge(
    source=df_del.alias("source"),
    condition=" AND ".join([f"target.{pk} = source.{pk}" for pk in df_del.columns]),
).whenMatchedDelete().execute()

df_read.show()  # Second read. It changed!
```
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?
Yes. I can contribute a fix for this bug independently.
Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
No. I cannot contribute a bug fix at this time.
Bug
Which Delta project/connector is this regarding?
Describe the problem
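Observed results
The second `df_read.show()` returns different rows from the first one, although `df_read` was cached before the merge.
Expected results
`df_read` returns the same rows before and after the merge, since it was cached before the merge ran.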
Further details
I am running this code in Microsoft Fabric.
Environment information
Willingness to contribute