
I/O Tensors are dynamically allocated #39

Open
Victor-Jung opened this issue Feb 21, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@Victor-Jung
Member

Victor-Jung commented Feb 21, 2025

I/O tensors are allocated in the InitNetwork function and never deallocated, so they effectively have an infinite lifetime. Since I/O tensor dimensions are known at compile time, we should allocate and deallocate them explicitly.
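
A rough sketch of the current pattern, with made-up buffer names and sizes (not the actual generated code):

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sizes; the real dimensions are known at compile time. */
#define INPUT_SIZE_BYTES  (3 * 224 * 224)
#define OUTPUT_SIZE_BYTES (1000)

/* Hypothetical I/O tensor pointers owned by the network runtime. */
static int8_t *input_tensor;
static int8_t *output_tensor;

/* Sketch of the current behavior: buffers are heap-allocated once in
 * InitNetwork and there is no matching free, so they stay alive for
 * the entire program run. */
void InitNetwork(void) {
    input_tensor  = (int8_t *)malloc(INPUT_SIZE_BYTES);
    output_tensor = (int8_t *)malloc(OUTPUT_SIZE_BYTES);
    /* ... remaining network setup ... */
}
```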

As a first step, we should display them in the memory allocation visualization and raise an error if they push us above the memory capacity limit. Then, we should perform static memory allocation for them.
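
What the static allocation could look like once the buffers are handled by the memory planner, again with illustrative names and sizes:

```c
#include <stdint.h>

/* Same illustrative sizes as above; in practice they come from the
 * compile-time tensor shapes. */
#define INPUT_SIZE_BYTES  (3 * 224 * 224)
#define OUTPUT_SIZE_BYTES (1000)

/* Statically placed I/O buffers: their size and lifetime are fixed at
 * compile time, so they can appear in the memory allocation
 * visualization and be checked against the memory capacity limit. */
static int8_t input_tensor[INPUT_SIZE_BYTES];
static int8_t output_tensor[OUTPUT_SIZE_BYTES];
```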

This is especially an issue for Llama, where the KV cache is treated as an input and then as an output.

Victor-Jung added the bug label Feb 21, 2025