Spark 3.3.0 support #57


chncaesar
Contributor

Proposed changes

  1. Support Spark 3.3.0
    Removed log4j 1.x and switched to Spark's Logging trait, which uses log4j 2.x in Spark 3.3.0. This change does not break compatibility with older Spark versions. The code changes are in ScalaValueReader.scala.

  2. Close BufferedReader in DorisStreamLoad
    The BufferedReader used to read the Doris BE REST API response in DorisStreamLoad (function loadBatch) should be closed after use.

  3. Change spark.minor.version to spark.major.version
    In pom.xml, the property spark.minor.version actually holds the Spark major version, so it is renamed.

  4. Include Scala code in the source jar
    The changes are in the scala-maven-plugin configuration in pom.xml.
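
For reviewers, item 2 can be illustrated with a minimal sketch of reading a response body with try-with-resources, so the BufferedReader is closed even if reading throws. This is not the actual loadBatch code from this PR: the class name StreamLoadResponseReader, the helper readBody, and the sample JSON payload are hypothetical and only stand in for the real stream returned by the Doris BE.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the connector's DorisStreamLoad class.
class StreamLoadResponseReader {

    // Reads the whole response body and closes the reader when done.
    // Line terminators are dropped, so the lines are concatenated directly.
    static String readBody(InputStream in) throws IOException {
        StringBuilder body = new StringBuilder();
        // try-with-resources closes the BufferedReader (and the wrapped
        // stream) on both normal exit and exceptions.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample payload standing in for a BE response.
        InputStream fake = new ByteArrayInputStream(
            "{\"Status\":\"Success\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readBody(fake));
    }
}
```

In the real loadBatch the input stream would come from the HTTP connection to the BE; the same try-with-resources shape should apply there.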

Issue Number: close #xxx

Problem Summary:

This PR upgrades the code to support Spark 3.3.0 and includes a few other minor changes.

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
    No

  2. Has unit tests been added: (Yes/No/No Need)
    No unit tests were added, but the change was tested manually in the spark-sql CLI.

  3. Has document been added or modified: (Yes/No/No Need)

  4. Does it need to update dependencies: (Yes/No)

  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

Test results:

Versions:

  • spark-3.3.0-bin-hadoop3
  • JDK 1.8
  1. Truncate the Doris table from the CLI
    image

  2. Create a Spark view and insert data into the Doris table
    Start the spark-sql CLI in local mode and execute:

CREATE TEMPORARY VIEW spark_doris
USING doris
OPTIONS(
  "table.identifier"="zjc_1.table_hash",
  "fenodes"="localhost:8030",
  "user"="zjc",
  "password"="******"
);
insert into spark_doris select 5,15.0;
  3. Check data in Doris
    image

  4. Select data in spark-sql
    select * from spark_doris;
    image

How to build spark-doris-connector for Spark 3.3.0

Run the command:
sh build.sh --spark 3.3.0 --scala 2.12

@hf200012
Contributor

Please also rename spark.minor.version in the build.sh script.

- Support Spark 3.3.0
- Close BufferedReader in DorisStreamLoad
- Change spark.minor.version to spark.major.version
- source jar to include scala code
Contributor

@hf200012 hf200012 left a comment

LGTM

@JNSimba
Member

JNSimba commented Feb 22, 2023

Hello, thank you for your contribution. Could you resolve the conflict?

@bowenliang123
Contributor

Hi, are there any further plans or progress on this PR?

Also, not all of the listed changes are about adding Spark 3.3 support; they would be better split into several smaller PRs.

4 participants