
Support for Anthropic Citations #52

Open
db0sch opened this issue Mar 20, 2025 · 5 comments
Labels
enhancement New feature or request

Comments


db0sch commented Mar 20, 2025

Anthropic recently introduced the Citations feature, which forces the model to add references/footnotes (other providers call this "grounding").

In addition to helping create content with sources and footnotes, it also apparently reduces the model's tendency to hallucinate.

I'm sorry to be that person posting an issue to ask for a feature support :)
But I feel this would be a great add-on to RubyLLM.

That being said, it might have quite an impact on how RubyLLM handles the model response: with citations enabled, the API returns the text sliced into fragments (some with citations, some without), all in a JSON payload.

Here is an example response payload (from Anthropic's docs):

{
    "content": [
        {
            "type": "text",
            "text": "According to the document, "
        },
        {
            "type": "text",
            "text": "the grass is green",
            "citations": [{
                "type": "char_location",
                "cited_text": "The grass is green.",
                "document_index": 0,
                "document_title": "Example Document",
                "start_char_index": 0,
                "end_char_index": 20
            }]
        },
        {
            "type": "text",
            "text": " and "
        },
        {
            "type": "text",
            "text": "the sky is blue",
            "citations": [{
                "type": "char_location",
                "cited_text": "The sky is blue.",
                "document_index": 0,
                "document_title": "Example Document",
                "start_char_index": 20,
                "end_char_index": 36
            }]
        }
    ]
}
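To make the processing concrete, here is a minimal sketch of how a consumer of such a payload could stitch the fragments back into markdown with footnotes. The `render_with_footnotes` helper is hypothetical (not part of RubyLLM or Anthropic's SDK); it only assumes the fragment shape shown in the payload above.

```ruby
# Hypothetical helper: joins Anthropic citation fragments into a single
# markdown string with [^n] footnote markers, plus the footnote list.
def render_with_footnotes(content_blocks)
  text = +""
  footnotes = []
  content_blocks.each do |block|
    text << block["text"]
    # A fragment may carry zero or more citations; each gets a marker.
    Array(block["citations"]).each do |citation|
      footnotes << citation
      text << "[^#{footnotes.size}]"
    end
  end
  notes = footnotes.each_with_index.map do |c, i|
    "[^#{i + 1}]: #{c["document_title"]}: \"#{c["cited_text"]}\""
  end
  [text, notes.join("\n")]
end
```

For the payload above, this would yield the concatenated sentence with `[^1]` and `[^2]` markers plus two footnote lines, which is roughly the post-processing the gem would otherwise leave to the user.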

Also, I'll be happy to discuss how to model these citations in the RubyLLM response object.

Source: https://docs.anthropic.com/en/docs/build-with-claude/citations

I'll try to dive into the source code later today to figure out the first steps toward supporting citations.

Also, let me know if you feel this is out of scope.

Thanks

@db0sch db0sch changed the title Support Anthropic Citations Support for Anthropic Citations Mar 20, 2025

db0sch commented Mar 20, 2025

After reading the source files, I realize that we store the response text concatenated (in the Message's content property), whereas here we would need to keep track of the fragments to preserve the citation granularity.
Concretely, this would mean storing the array of objects in a raw_content property (in ActiveRecord, a raw_content jsonb column).
This would shift the responsibility to the user to use raw_content instead of content: processing the text fragments, appending [^1] markers at the end of sentences (if formatting the text as markdown), and fetching the citations to build the footnotes. All of that would stay out of the scope of the gem.

That's just a first thought.


db0sch commented Mar 20, 2025

As for the API, we could probably add an option to the with parameters, for example:

chat.ask "Summarize this document", with: { pdf: "https://example.com/document.pdf", citations: true }
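Under the hood, a provider adapter would then have to translate that option into the request body Anthropic expects. A sketch, assuming the payload shape from Anthropic's Citations docs (the `document_block` method name and its place in RubyLLM's internals are my assumptions, not existing code):

```ruby
# Hypothetical adapter helper: builds the Anthropic "document" content
# block, enabling citations when requested. The payload keys follow
# Anthropic's Messages API; the method itself is illustrative only.
def document_block(base64_pdf, citations: false)
  block = {
    type: "document",
    source: { type: "base64", media_type: "application/pdf", data: base64_pdf }
  }
  # Per Anthropic's docs, citations are opted into per document.
  block[:citations] = { enabled: true } if citations
  block
end
```

The `with:` option would stay provider-agnostic at the Chat level, and only the Anthropic adapter would know about `citations: { enabled: true }`.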

@crmne crmne added the enhancement New feature or request label Mar 23, 2025

crmne commented Mar 23, 2025

This would be a fantastic addition to RubyLLM. Feel free to open a PR!

I like with: { citations: true }. However, can it also work on a simple text prompt? In that case, Chat.with_citations would also be welcome.


db0sch commented Mar 23, 2025

Thank you for your reply.
At the moment, we store only the LLM's processed answer (plain text). Would you be open to storing the raw_content, which would be an array of {text: } objects?
This would let developers decide how they process the text with citations.

I'll try to open a PR this week.


crmne commented Mar 24, 2025

Hold up! First, look at content.rb and message.rb's normalize_content - we already support rich structured content (text, images, PDFs, audio). No need for a separate raw_content system.

But here's the thing: RubyLLM's whole philosophy is having the most elegant API in the business. We should look at how Anthropic, OpenAI and others handle citations, pick the best approach, and then wrap it in a beautiful Ruby API. Users should never have to deal with raw provider responses - that's our job!
