Issue with missing limbs #7
Comments
Could you please provide an example input image where the problem occurs? This will help us investigate further. In our testing, the model has generally performed well for most in-the-wild cases.
Thank you for your detailed feedback and for sharing your observations! Currently, the automatic segmentation in our demo relies on rm_anime_bg (https://github.com/shirayu/rm_anime_bg), which works well for anime-style images but may not perform as effectively for other styles such as 2.5D or real-world images. If the background isn't fully removed, it can significantly impact the results. For better results, you could try another background removal tool, such as Clipdrop, before uploading the image. Our method does work for real humans and other styles, as demonstrated in the paper. Additionally, our training data consists of full-body images, so inputting half-body photos (e.g., missing feet or legs) may lead to suboptimal results. You could experiment with tools like Clipdrop's image uncrop feature to extend the image to a full-body format. We'll also consider supporting more diverse inputs, such as half-body photos, as part of our future work.
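The two input conditions described above (background fully removed, full body in frame) can be checked roughly before uploading. The following is only a minimal sketch, not part of this project: the function names, the thresholds, and the "subject touches the bottom edge" heuristic are all assumptions made for illustration. It works on a plain 2D list of alpha values (0 = transparent, 255 = opaque), which you would extract from the cut-out image with whatever image library you use.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: rows of alpha values, 0 (transparent) to 255 (opaque).
AlphaMask = List[List[int]]


def foreground_bbox(alpha: AlphaMask, threshold: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the opaque pixels, or None if there are none."""
    rows = [y for y, row in enumerate(alpha) if any(a >= threshold for a in row)]
    if not rows:
        return None
    cols = [x for row in alpha for x, a in enumerate(row) if a >= threshold]
    return rows[0], min(cols), rows[-1], max(cols)


def check_input(alpha: AlphaMask, threshold: int = 128, max_coverage: float = 0.9) -> List[str]:
    """Return warnings about a cut-out image; an empty list means no problem was spotted.

    Both checks are heuristics (assumptions, not the project's own validation):
    - very high opaque coverage suggests the background was not removed;
    - a subject touching the bottom edge suggests cropped feet or legs.
    """
    height, width = len(alpha), len(alpha[0])
    bbox = foreground_bbox(alpha, threshold)
    if bbox is None:
        return ["no foreground found"]

    warnings = []
    opaque = sum(1 for row in alpha for a in row if a >= threshold)
    if opaque / (width * height) > max_coverage:
        warnings.append("background may not be removed")
    if bbox[2] == height - 1:
        warnings.append("subject touches the bottom edge; feet or legs may be cropped")
    return warnings


if __name__ == "__main__":
    cropped = [
        [0, 0, 0, 0],
        [0, 255, 255, 0],
        [0, 255, 255, 0],
        [0, 255, 255, 0],  # subject runs off the bottom of the frame
    ]
    for message in check_input(cropped):
        print(message)
```

A warning from the second check would be a cue to try an uncrop or outpainting tool first. The heuristic can misfire on full-body images framed tightly at the feet, so treat it as a hint only.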
I find that Google's new multimodal Flash model works well for background removal. With a bit of prompting (ask it what objects are in the image, tell it your goal, and ask it to meet that goal), it produces perfect results. The issues are hallucination, which happens rarely, and image degradation, because the resolution and VAE channels alter small details. It can also zoom out images to better match the remaining images, but the hallucination rate there is significant.
I've noticed an issue with missing limbs, most commonly feet. Many of the images generally available are 3/4-type shots. Without the feet in the image, the posed result is unusable. I can think of one way to fix this: add feet via fast image generation; PixArt is fast and light. Otherwise it's an issue with the model that processes the posing, and I don't have any idea how that really works. Accuracy for that step alone is not great, and there's probably some easy improvement there.