[Enhancement] Cache Schema instances for classes in a weak reference cache since creating an instance could be CPU intensive #23777

Open
wants to merge 2 commits into
base: master

@walkinggo commented Dec 24, 2024

Main Issue: #23707

Motivation

Creating a Schema instance for a class can be CPU intensive, so Schema instances should be cached per class in a weak-reference cache.

Modifications

  • Added a weak-reference cache for Schema instances

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@walkinggo Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) and removed the doc-label-missing label on Dec 24, 2024
@walkinggo changed the title from "Enhancement schema cache" to "[Enhancement] Cache Schema instances for classes in a weak reference cache since creating an instance could be CPU intensive" on Dec 26, 2024
@lhotari (Member) left a comment

The implementation approach isn't correct. The cache should be an implementation detail of PulsarClientImplementationBindingImpl and should not be exposed externally.
The cache key could be org.apache.pulsar.client.impl.schema.SchemaDefinitionImpl; however, that would require adding hashCode and equals implementations to that class. The cache should have weak keys to prevent memory leaks.
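
To illustrate the key requirement, here is a minimal sketch, not the actual Pulsar code. `Definition` is a hypothetical, two-field stand-in for SchemaDefinitionImpl, and `schemaFor` and `builds` are names invented for the example. It assumes a plain `java.util.WeakHashMap`, which compares keys with `equals`/`hashCode`; other weak-key caches may compare keys by identity instead, in which case value-based keys would not help.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;

class DefinitionKeyedCacheSketch {

    // Hypothetical stand-in for SchemaDefinitionImpl, reduced to two fields.
    static final class Definition {
        private final Class<?> pojo;
        private final boolean allowNull;

        Definition(Class<?> pojo, boolean allowNull) {
            this.pojo = pojo;
            this.allowNull = allowNull;
        }

        // Value-based equality so that two equal definitions map to one cache entry.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Definition)) {
                return false;
            }
            Definition other = (Definition) o;
            return allowNull == other.allowNull && pojo.equals(other.pojo);
        }

        @Override
        public int hashCode() {
            return Objects.hash(pojo, allowNull);
        }
    }

    // Weak keys: an entry becomes collectable once its key is no longer strongly reachable.
    // The cached value must not strongly reference its key, or the entry is never cleared.
    private final Map<Definition, Object> cache =
            Collections.synchronizedMap(new WeakHashMap<>());

    // Counts how often the (simulated) expensive construction actually runs.
    int builds = 0;

    Object schemaFor(Definition def) {
        return cache.computeIfAbsent(def, d -> {
            builds++;
            return new Object(); // placeholder for the expensive Schema construction
        });
    }

    public static void main(String[] args) {
        DefinitionKeyedCacheSketch sketch = new DefinitionKeyedCacheSketch();
        Definition first = new Definition(String.class, true);
        Object a = sketch.schemaFor(first);
        Object b = sketch.schemaFor(new Definition(String.class, true)); // equal key
        System.out.println("same instance: " + (a == b));
        System.out.println("builds: " + sketch.builds);
        sketch.schemaFor(first); // keeps the original key strongly reachable until here
    }
}
```

Without `equals` and `hashCode`, the second lookup would miss the cache, because every new definition object would count as a distinct key.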

Comment on lines +58 to +63
public Schema getAvroSchemaCache(Class clazz);
public Schema getProtobufSchemaCache(Class clazz);
public Schema getJsonSchemaCache(Class clazz);
public void setAvroSchemaCache(Class clazz,Schema schema);
public void setProtobufSchemaCache(Class clazz,Schema schema);
public void setJsonSchemaCache(Class clazz,Schema schema);

The cache should be encapsulated in PulsarClientImplementationBindingImpl and not exposed through the interface.
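
A rough sketch of that encapsulation, using simplified hypothetical types rather than the real Pulsar interfaces: `SchemaSketch`, `BindingSketch`, `BindingSketchImpl`, and the `newAvroSchema(Class<?>)` signature are all assumptions made for illustration. The interface keeps only the factory method, and the cache lives in a private field of the implementation.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical, simplified stand-in for a schema type.
interface SchemaSketch {
    String describe();
}

// Hypothetical stand-in for the binding interface: only the factory method is
// visible to callers. There are no get*SchemaCache or set*SchemaCache methods.
interface BindingSketch {
    SchemaSketch newAvroSchema(Class<?> clazz);
}

final class BindingSketchImpl implements BindingSketch {

    // Private cache with weak keys; callers cannot observe or modify it.
    private final Map<Class<?>, SchemaSketch> avroCache =
            Collections.synchronizedMap(new WeakHashMap<>());

    @Override
    public SchemaSketch newAvroSchema(Class<?> clazz) {
        // Construction runs only on a cache miss.
        return avroCache.computeIfAbsent(clazz, BindingSketchImpl::buildAvroSchema);
    }

    // Placeholder for the CPU-intensive construction.
    private static SchemaSketch buildAvroSchema(Class<?> clazz) {
        // Capture only the name so the cached value does not strongly
        // reference its own key, which would defeat the weak keys.
        String name = clazz.getName();
        return () -> "avro:" + name;
    }
}
```

With this shape, callers keep calling the factory method as before, and caching can be changed or removed later without touching the interface.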

@lhotari (Member) commented Jan 3, 2025

There's another PR #23808 where I've provided suggestions about the minimal implementation: #23808 (comment)

Labels
doc-not-needed Your PR changes do not impact docs