Keras symbolic inputs/outputs do not implement __len__
#22
Comments
Hi, thanks! May I see the full script?
Hey, I created a Google Colab so that you can check it. I included a cell where Nobuco is executed to produce a fixed-size Keras model (works fine), and another cell where I try to convert with a dynamic size (crashes). NOTE: you need a premium account to run it, because you need a high-RAM machine (CPU only).
Okay, this one is... complicated.
First of all, THANK YOU! I went ahead and made modifications to address your points in the same colab https://colab.research.google.com/drive/1vxSotUx_tfAl1gjEdsg542SKRlL7ypBV?usp=sharing
With the first two modifications, I'm happy to let you know the model now works dynamically just fine, and moreover it produces the EXACT same result as the original PyTorch counterpart - VICTORY! Now, I would like to ask for your help with something that is probably more complex: key-value caching. This model in fact accepts a fourth parameter, past_key_values. I see a few problems with even converting that aspect of the model.
Technically this means:
Anyway, I realize my explanation is not clear, and also that if you don't have a background in HF transformers it would be difficult to follow. Perhaps the first pointer you can give me is this: in the Colab, right before the Nobuco conversion, when we generate the inputs for conversion, you'll see I call the PyTorch model to create an example past_key_values. If you inspect that variable, you'll notice it's composed of tuples, some of which contain tensors. If you could tell me how to pass that variable as the fourth parameter to the conversion (that is, what shape to declare for it, taking into account that one of its dimensions is dynamic), that would be a phenomenal start. Thank you so much! Federico
TensorFlow only accepts tensors as inputs, so None is not allowed. We can, however, represent an empty past_key_values:

```python
past_key_values = []
for i in range(24):
    k = torch.zeros(size=(1, 32, 0, 64))
    v = torch.zeros(size=(1, 32, 0, 64))
    past_key_values.append((k, v))
past_key_values = tuple(past_key_values)
```

Also keep in mind that input padding is extremely important for TensorFlow, as changes in input sizes will trigger graph re-tracing. I put together an example for Zephyr here. I'm not an LLM guy, so it might be substandard, but it works quite well overall.
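To illustrate the padding point: if every call sees the same input shape, the traced graph is reused instead of re-traced. Here is a minimal sketch of right-padding token IDs to a fixed length with PyTorch; the `MAX_LEN` value, `pad_input_ids` helper, and the token IDs are all hypothetical, not from the thread's model.

```python
import torch
import torch.nn.functional as F

MAX_LEN = 32  # hypothetical fixed sequence length

def pad_input_ids(input_ids: torch.Tensor, pad_token_id: int = 0) -> torch.Tensor:
    """Right-pad a (batch, seq) tensor to MAX_LEN so the traced graph
    always sees the same input shape."""
    pad_amount = MAX_LEN - input_ids.shape[1]
    return F.pad(input_ids, (0, pad_amount), value=pad_token_id)

ids = torch.tensor([[101, 7592, 2088, 102]])  # shape (1, 4)
padded = pad_input_ids(ids)
print(padded.shape)  # torch.Size([1, 32])
```

In practice you would also pass an attention mask so the padded positions do not influence the result.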
I'll try that. You make me wonder, though: if changes in input shapes trigger retracing in TF (I'm assuming this is true for TFLite as well), then that would be problematic for LLM inference speed. You typically begin with a large input (the prompt), say 24 tokens/words, but during generation you only feed back the last generated token (since the context is kept precisely through past_key_values). So the main input (the first parameter passed to the model) goes from shape [1, 24] on the first iteration to [1, 1] on all further iterations. Does this mean you would get a full retracing between iteration 1 and the others? Also, past_key_values grows linearly with each iteration... so you would get a retrace on every iteration? Maybe TensorFlow is just not adapted to work with LLMs? I find it difficult to believe they would let such an important framework die.
@AlexanderLutsenko your example runs perfectly! Thank you so much, everybody should know about this library!
@federicoparra Awesome!
Well, not quite, as you would allocate buffers of the maximum allowed size in advance. BTW, TFLite does not support dynamic shapes, although they do work sometimes. See PINTO0309/onnx2tf#543 for an example of how it can be made to work, and the drawbacks.
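The "allocate buffers of the maximum allowed size" idea can be sketched as follows. This is an illustrative fragment, not the library's API: the cache tensors are preallocated at a fixed maximum length and written in place, so their shapes never change between iterations. The dimensions reuse the (24 layers, 32 heads, head dim 64) figures from the empty-cache snippet above; `MAX_LEN` and `write_kv` are assumptions for the example.

```python
import torch

NUM_LAYERS, NUM_HEADS, MAX_LEN, HEAD_DIM = 24, 32, 128, 64  # MAX_LEN is hypothetical

# Preallocate the KV cache at its maximum size so its shape stays constant.
k_cache = torch.zeros(NUM_LAYERS, 1, NUM_HEADS, MAX_LEN, HEAD_DIM)
v_cache = torch.zeros(NUM_LAYERS, 1, NUM_HEADS, MAX_LEN, HEAD_DIM)

def write_kv(layer: int, pos: int, k: torch.Tensor, v: torch.Tensor) -> None:
    """Write one position's keys/values in place; no reallocation, no reshape."""
    k_cache[layer, :, :, pos] = k
    v_cache[layer, :, :, pos] = v

# First generated token goes into slot 0 of layer 0.
write_kv(layer=0, pos=0,
         k=torch.randn(1, NUM_HEADS, HEAD_DIM),
         v=torch.randn(1, NUM_HEADS, HEAD_DIM))
print(k_cache.shape)  # torch.Size([24, 1, 32, 128, 64])
```

The trade-off is exactly the one mentioned: memory for the full `MAX_LEN` window is reserved up front, even for short sequences.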
Sure feels like it. PyTorch already supports FlashAttention-2 out of the box, while in TensorFlow we are stuck with the naive implementation.
Dunno, that'd be a shame. TFLite runs really well on mobile and the web, so they've still got the upper hand there.
Hi! Your library is amazing, thank you so much!
I'm trying to convert this LLM https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b from PyTorch to TensorFlow. As usual, the input is dynamic. When I run:
```python
keras_model = nobuco.pytorch_to_keras(
    model,
    args=[padded_input], kwargs=None,
    inputs_channel_order=ChannelOrder.TENSORFLOW,
    outputs_channel_order=ChannelOrder.TENSORFLOW
)
```
the conversion works flawlessly and the resulting Keras model produces the same result as the original model, except that the input size of the Keras model is fixed to whatever the size of padded_input was.
If instead I run the conversion like so:
```python
keras_model = nobuco.pytorch_to_keras(
    model,
    args=[padded_input], kwargs=None,
    input_shapes={padded_input: (1, None)},
    trace_shape=True,
    inputs_channel_order=ChannelOrder.TENSORFLOW,
    outputs_channel_order=ChannelOrder.TENSORFLOW
)
```
then it crashes towards the end of the conversion with this error:

```
TypeError: Keras symbolic inputs/outputs do not implement `__len__`. You may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model. This error will also get raised if you try asserting a symbolic input/output directly.
```

Any pointers on what the problem might be?