I/O tensors are allocated in the `InitNetwork` function and never deallocated, so they effectively have an infinite lifetime. Since their dimensions are known at compile time, we should allocate and deallocate them explicitly.
As a first step, we should display them in the memory allocation visualization and raise an error if we exceed the memory capacity limit. Then, we should perform static memory allocation for them.
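A minimal sketch of the direction this could take, with hypothetical names and example shapes (not the project's actual API): I/O tensors whose dimensions are known at compile time are placed in static buffers, and the "above the memory capacity limit" error is raised at build time rather than at run time.

```c
/* Sketch only: names, shapes, and the capacity value are assumptions. */
#include <stdint.h>

#define MEM_CAPACITY_BYTES (512 * 1024)   /* assumed on-chip memory budget      */
#define INPUT_BYTES  (1 * 3 * 224 * 224)  /* example int8 input tensor size     */
#define OUTPUT_BYTES (1 * 1000)           /* example int8 output tensor size    */

/* Fail the build instead of silently exceeding the memory capacity limit. */
_Static_assert(INPUT_BYTES + OUTPUT_BYTES <= MEM_CAPACITY_BYTES,
               "I/O tensors exceed the memory capacity limit");

/* Statically placed I/O buffers: today these would be allocated in InitNetwork
 * and never freed; here their placement is decided at compile time instead. */
static int8_t network_input[INPUT_BYTES];
static int8_t network_output[OUTPUT_BYTES];
```

Under a scheme like this, the memory allocation visualization could show the I/O buffers alongside the intermediate tensors, since their sizes and placement are known before the network runs.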
This is especially an issue for Llama, where the KV cache is treated as an input and then as an output.
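One way to picture the Llama case (again a hypothetical layout, not the project's actual data structures): because the KV cache is consumed as an input and produced as an output of every decode step, a static allocator could let both views alias a single persistent region instead of keeping two never-freed copies.

```c
/* Sketch only: sizes and struct names are assumptions for illustration. */
#include <stdint.h>

#define KV_CACHE_BYTES (2 * 8 * 256 * 64)  /* example: K+V, 8 heads, 256 ctx, 64 dim, int8 */

/* Single statically placed KV cache region that persists across decode steps. */
static int8_t kv_cache[KV_CACHE_BYTES];

typedef struct {
    const int8_t *kv_in;   /* KV cache read by the current decode step    */
    int8_t       *kv_out;  /* KV cache written by the current decode step */
} DecodeStepIO;

/* Input and output views alias the same buffer, so the allocator only has to
 * account for KV_CACHE_BYTES once rather than as two separate I/O tensors. */
static const DecodeStepIO kStepIO = { kv_cache, kv_cache };
```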