Doc DeltaOptions as public API or make it a delta private class #598
Comments
I am also happy to file a PR as long as we have agreement on this. |
Only APIs that show up in the API docs are public. Totally agreed that this is confusing, but we basically follow Spark here. |
Yeah, I mean the way to eliminate the confusion might be to either 1) make this a public API, or 2) make it a package-private class so people will not refer to it in application code (or at least cannot do so easily, and when they do, they implicitly accept the risk). |
I found it pretty useful when debugging issues in a notebook environment. I'm inclined to leave it as it is and improve the documentation. |
How about DeltaLog? It is used even more, but it is still not a public API. |
It is not, because we don't guarantee API compatibility across versions, and
in the past we have refactored the structure and methods in that class.
Furthermore, we didn't start with the intention of making DeltaLog public,
so the way it is structured, it can expose a very large surface area of
internal classes to public access. It's non-trivial to set up those
public-private boundaries within and around the DeltaLog class.
Is there any specific functionality that you are trying to access?
|
Hi, @tdas. There are several methods of DeltaLog I am interested in, e.g. getChanges, and even something as simple as checkpointInterval. In general, our experience with Delta Lake is not that smooth in DBR, because there are many public classes we can get from our Delta Lake dependency (which has to be the open source version), and we can easily access all public classes under the org.apache.spark.sql package. However, they will not run in DBR, as all these classes are moved somewhere else (we have to use a very hacky way to make our code runnable in both our CI and DBR). Beyond the original purpose of filing this issue, I would like to know if there is any plan to make DBR and the open source version more compatible, which I personally feel would be beneficial in terms of both user experience and community growth. |
I faced the same problem while using DeltaOptions to create a DeltaSink in DBR. Apparently both of these classes are private. Did you find any workaround for your problem, @CodingCat? |
@TJZhou Which APIs in DeltaOptions are you using? Is it possible to avoid using DeltaOptions in your project? |
We create a DeltaOptions instance and pass it into DeltaSink, which looks like
It's almost impossible to avoid it, and I also checked that the DeltaSink API is private in Databricks too. |
@TJZhou |
Spark doesn't have good support for writing to multiple, dynamic output locations, so we had to interact with a lower level of the Delta Lake API when writing this. We customize the sink like the following.
|
Hm, so you are trying to write to multiple delta tables in foreachBatch but still require exactly-once? |
Could you try https://docs.databricks.com/delta/delta-streaming.html#idempotent-multi-table-writes instead? |
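For readers following that link, a minimal sketch of the idempotent multi-table pattern it describes might look like the following. It assumes a Delta Lake version that supports the `txnAppId` and `txnVersion` writer options; the application id, the table paths, and the function name are hypothetical placeholders, not names from this thread.

```python
# Hypothetical identifiers: pick a stable, unique id per streaming query.
APP_ID = "multi-table-writer"
TABLE_PATHS = ["/tmp/delta/table_a", "/tmp/delta/table_b"]


def write_to_tables(batch_df, batch_id, paths=TABLE_PATHS, app_id=APP_ID):
    """foreachBatch handler that appends one micro-batch to several Delta tables.

    Delta skips a write whose txnVersion is not greater than the last version
    already committed for the same txnAppId, so a retried micro-batch is not
    appended twice to a table that already received it.
    """
    for path in paths:
        (
            batch_df.write.format("delta")
            .option("txnAppId", app_id)      # identifies the writing application
            .option("txnVersion", batch_id)  # increases with each micro-batch
            .mode("append")
            .save(path)
        )
```

The handler would be attached with something like `stream_df.writeStream.foreachBatch(write_to_tables).start()`, which uses only public DataFrame writer calls rather than DeltaSink or DeltaOptions.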
Ahh, those are some new features that we haven't tried before. Yep, I shall give it a try. Thanks @zsxwing |
I was using Delta Lake and wanted to set OVERWRITE_SCHEMA_OPTION to true, with
However, this application is broken in Databricks, since the internal version is different from the open source one, and the support engineer said that DeltaOptions is a private API.
This actually confuses users a lot: I saw a public class and straightforwardly referred to it, only to find that we cannot use it on the platform of the company that created Delta Lake.
I would suggest either documenting DeltaOptions as a public API and committing to backward compatibility, or making it a Delta-private class so others will not fall into the same issue.
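One possible way to avoid referencing DeltaOptions from application code is to pass the option by its string key. The sketch below assumes the key behind `OVERWRITE_SCHEMA_OPTION` is `overwriteSchema`, the writer option named in the Delta Lake documentation; the function name and the path argument are hypothetical.

```python
def overwrite_table_with_schema(df, path):
    """Overwrite a Delta table and replace its schema with the DataFrame's schema."""
    (
        df.write.format("delta")
        .mode("overwrite")
        # String key used instead of DeltaOptions.OVERWRITE_SCHEMA_OPTION
        # (assumed to resolve to the same option name).
        .option("overwriteSchema", "true")
        .save(path)
    )
```

Because this relies only on the public DataFrame writer interface, it should not depend on where the internal Delta classes live in a given runtime.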