Wip: ltx-video support #491
base: master
Conversation
Force-pushed from e6c6000 to c5e01af
This is great, thank you so much, I can't wait to test it when it's done. Please add support for quantized LTX models and conversion too; there are fp8 models on Hugging Face, and fp16 GGUFs are there as well. Please make it work for CPU users too. It would also be great to have img2vid. In the LTX-Video playground on Hugging Face there are advanced options that let us make videos up to 11 seconds, e.g. 512x320 resolution with 257 frames; it would be great if we could make long videos here too.
Conversion/quantization should be working already.
The 5D tensors in the VAE are a pain to deal with. I'm losing motivation...
It's okay, this is a hard task; we have non-working SVD too, so it seems video is harder to implement in sd.cpp. Thank you for your hard work.
@stduhpf Which operators require more than 4-dimensional tensors? Can these tensors be transformed to fewer dimensions? Maybe with an appropriate combination of
@ggerganov Basically the whole VAE is made of 3D convolutions, so this means a 3x3x3 kernel for each input/output channel pair. Maybe there is a way to flatten it to use conv2d instead, but I couldn't figure it out.
Hm, indeed it's not obvious. I guess we will need to increase `GGML_MAX_DIMS`.
It's understandable that this task is challenging, and I appreciate everyone's efforts so far. Based on the comments, the issue seems to stem from the lack of a conv3d implementation in the GGML library. Although I'm not familiar with GGML, I noticed that conv2d is implemented using im2col followed by a matrix multiplication. The same principle can be extended to conv3d using a 3D version of im2col. Here's a high-level approach:

Implementing conv3d: create an im2col_3d tensor and perform a matrix multiplication for the convolution, similar to the conv2d implementation.

Implementing im2col_3d: since GGML lacks im2col_3d, you can emulate it using a composition of two im2col operations.

As already stated, `GGML_MAX_DIMS` should be increased to 5 to support 5D tensors. These are just starting points and will need testing and optimization.
For now, the diffusion model seems to load into memory. The 128-D VAE is still completely unimplemented. The forward logic might be off.
TODO:
- Fix: tensor 'model.diffusion_model.proj_out.weight' has wrong shape in model file: got [1, 1, 2048, 128], expected [2048, 128, 1, 1] (the diffusion model will hopefully load properly after that)