Skip to content

feature: AI #1674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 237 commits into
base: main
Choose a base branch
from
Open

feature: AI #1674

wants to merge 237 commits into from

Conversation

YousefED
Copy link
Collaborator

@YousefED YousefED commented May 9, 2025

This PR adds AI functionality to BlockNote!

There's still some work required but don't expect major changes at this point

Preview @ https://blocknote-git-feature-ai-typecell.vercel.app/ai/playground

Review notes

  • Large PR, recommended to review in VS Code
  • I'd recommend first reviewing the parts outside of the AI packages (core, react, etc). We could even merge those as part of a separate PR
  • I left quite some comments, but please leave feedback if things are unclear - we want the code to be in a state you feel comfortable working on future iterations

TODOs

  • Manual Testing (N+M)
  • test / fix dark mode (M)
  • Review AI Playground example (Y)
  • Directory structure xl-ai (Y)
  • Iterate on API and create docs (Y)
  • fix CI (N)
  • Review (N+M)
  • apply review feedback (Y)
  • Help with commented TODOs (M)
  • Error handling / Loading state (M+Y)
  • Undo / Redo support (N+Y)
  • Test in collaboration mode and / or integrate forking PR (N)
  • UX when adding new commands after existing (Y)

After v1:

* - replace the text with the first character of the replacement (if any) (1 transaction per ReplaceStep)
* - insert the replacement character by character (strlen-1 transactions per ReplaceStep)
*/
export function getStepsAsAgent(doc: Node, pmSchema: Schema, steps: Step[]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes more sense to operate on a whole transform, and give a transform back rather than an array of steps:

export function getStepsAsAgent(trToTransform: Transform) {
  const pmSchema = getPmSchema(trToTransform);
  const { modification } = pmSchema.marks;

  const agentSteps: AgentStep[] = [];

  const tr = new Transform(trToTransform.doc);

  for (const step of trToTransform.steps) {
    ...

return tr;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the input, but the current return type does have additional metadata, right? So we can't really return a transform?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, a transform wouldn't be enough for that.

referenceId = referenceId.slice(0, -1);
}

const block = editor.getBlock(referenceId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always pulling from the current editor state. I would probably have an in-memory version of the document & pull from that instead. Probably more of an issue for collab mode than anything

}
}

export function agentStepToTr(tr: Transaction, step: AgentStep) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applyAgentSteps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually may also be better implemented as a prosemirror-command that can be editor.exec'd, because right now this can just throw if it cannot be applied, whereas what you'd want is for it to just not be applied at all and continue streaming

...Object.keys(oldNode.attrs),
]);
for (const attr of attrNames) {
if (newNode.attrs[attr] !== oldNode.attrs[attr]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attributes can be objects in prosemirror, though this works for blocknote attributes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. I think prosemirror-suggest-changes also only supports string attrs fwiw

Comment on lines +166 to +168
// It might be cleaner to;
// a) make this optional
// b) actually delete / insert the content and let prosemirror-suggest-changes handle the marks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I would consider doing b, it feels like prosemirror-suggest-changes should be doing this transformation, though I can understand wanting the fine-grained control of how it displays visually.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the initial prototype used the current approach, I only later hooked up prosemirror-suggest-changes. I tried to migrate to having that handle the insertions, but was running into a few edge-cases which made it more cumbersome than I initially hoped (ofc, might have just been a single mapping / positioning error). For now I think we can keep it as is, but good to leave the note here if we run into things that the "other approach" could address, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally fine with that. If we experience a bug with it, I'd probably just throw it away and re-implement

Comment on lines +185 to +189
// note, instead of inserting one charachter at a time at the end (a, b, c)
// we're replacing the entire part every time (a, ab, abc)
// would be cleaner to do just insertions, but didn't get this to work with the add operation
// (and this kept code relatively simple)
const replacement = new Slice(step.slice.content.cut(0, i), 0, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will affect how the ops are laid out in the undo/redo stack.
Here is where in prosemirror-history steps are merged together: https://github.com/ProseMirror/prosemirror-history/blob/master/src/history.ts#L237
This is a ReplaceStep's merge method: https://github.com/ProseMirror/prosemirror-transform/blob/137ff74738bd1b50d49416cd6cfdbbf52cb059ef/src/replace_step.ts#L48-L62
So, this can only merge ops which occur one after the other, not overlapping ranges like you've got here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss undo / redo tomorrow morning - let's look at the zoomed out issue first because I think the way undo / redo works in prosemirror might not be applicable to our use-case at all

Comment on lines +191 to +195
tr.replace(replaceFrom, replaceEnd, replacement).addMark(
replaceFrom,
replaceFrom + replacement.content.size,
pmSchema.mark("insertion", {}),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also affect how undo/redo stack works, since an addMark step cannot be merged with a replace step.

What I've done before for this, is to add the mark to the replacement content separately, so it doesn't need a separate addMark step

return true;
}
if (node.isBlock) {
tr.addNodeMark(pos, pmSchema.mark("insertion", {}));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Comment on lines +20 to +21
* NOTE: we might be able to replace this code with a custom encoder
* (this didn't exist yet when this was written)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think the whole custom encoder thing is just for this. Probably okay for now, but could be simplified

Comment on lines +143 to +145
const tableCells = new Set(
[...tableCellsOld].filter((cell) => tableCellsNew.has(cell)),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const tableCells = tableCellsOld.intersection(tableCellsNew);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know if this consistently works or requires extra setup? I tried this but got an error at some point that intersection is not available

Comment on lines +149 to +155
encodeNodeStart: (node) => {
if (node.type.name === "tableCell") {
const str = JSON.stringify(node.toJSON());
if (tableCells.has(str)) {
return str;
}
return node.type.name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth a comment that you are encoding it like this to "flatten" the changeset

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here was to give two equal table cells in before / after a unique Encoding, to nudge prosemirror-changeset to keep them intact.

I think this encoder improved the table handling a bit, but I'm still running into issues like described here: ProseMirror/prosemirror-changeset#22

will require more research unless you have a good idea. At least, I'll add more documentation to the code here

Comment on lines +224 to +234
_operationsSource = createAsyncIterableStreamFromAsyncIterable(
preprocessOperationsStreaming(
filterNewOrUpdatedOperations(
streamOnStartCallback(
partialObjectStreamThrowError(ret.fullStream),
onStart,
),
),
streamTools,
),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to wonder whether it is easier to implement this using streams or as async iterables (like you've done here).

Streams are a somewhat awkward API, but they are pretty good at this sort of a thing with their ability to pipe through transforms. Just a thought

/**
* Validates an stream of operations and throws an error if an invalid operation is found.
*/
export async function* preprocessOperationsNonStreaming<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these not the same? streaming vs. not streaming?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with non-streaming, we assume all operations must be valid, and we throw an error if not.

with streaming, operations can be partial, so we just drop invalid operations

* promptbuilder poc

* rename

* other formats
];
}

export const defaultHTMLPromptBuilder: PromptBuilder = async (editor, opts) => {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nperez0111 as discussed on slack:

I think it would be even nicer if you can provide a function that takes data similar to promptManipulateSelectionHTMLBlocks, but then we’d need to call getPromptData automatically and also make that optionally configurable. Now you just need to call the helpers yourself (similar to how defaultHTMLPromptBuilder does this)
I think if we generalize this further it might get quite complicated considering the difference between selection / non selection, and different formats

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants