Issue with missing limbs #7

Open
linkpharm opened this issue Mar 21, 2025 · 4 comments

Comments

@linkpharm

I've noticed an issue with missing limbs, most commonly feet. Many of the images generally available are 3/4 shots. Without the feet in the image, the posed result is unusable. I can think of one way to fix this: add feet via fast image generation; PixArt is fast and light. Otherwise it's an issue with the model that processes the posing, and I don't have any idea how that really works. Accuracy for that step alone is not great, and there's probably some easy improvement there.

@hyz317
Owner

hyz317 commented Mar 21, 2025

Could you please provide an example input image where the problem occurs? This will help us investigate further. In our testing, the model has generally performed well for most in-the-wild cases.

@linkpharm
Author

Without:

Image

Image

Image

Image

With feet+legs:

Image

Image

It's a bit of a mixed bag. There's some inaccuracy caused by the missing legs, but not enough to explain the other sample "without" images' results. Is there a better way to use this?

@hyz317
Owner

hyz317 commented Mar 24, 2025

Thank you for your detailed feedback and for sharing your observations!

Currently, the automatic segmentation in our demo relies on rm_anime_bg (https://github.com/shirayu/rm_anime_bg), which works well for anime-style images but may not perform as effectively for other styles like 2.5D or real-world images. If the background isn’t fully removed, it can significantly impact the results. For better results, you could try using other background removal tools like Clipdrop before uploading the image. Our method does work for real human and other styles, as demonstrated in the paper.
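A quick sanity check before uploading can show whether the background was actually removed. The following is only a minimal sketch, assuming the cutout's alpha channel is available as rows of 0-255 values. The function names and the 0.9 cutoff are illustrative assumptions, not part of this repo or of rm_anime_bg.

```python
def opaque_fraction(alpha, threshold=128):
    """Return the share of pixels whose alpha is at or above `threshold`.

    `alpha` is a list of rows, each row a list of ints in 0..255.
    """
    total = sum(len(row) for row in alpha)
    if total == 0:
        raise ValueError("alpha mask is empty")
    opaque = sum(1 for row in alpha for a in row if a >= threshold)
    return opaque / total


def background_probably_remains(alpha, max_fraction=0.9, threshold=128):
    """Heuristic: a cutout that is almost fully opaque likely still has
    its background. The 0.9 cutoff is an arbitrary assumption."""
    return opaque_fraction(alpha, threshold) > max_fraction


# A 4x4 mask where only the 2x2 centre is opaque (a clean cutout).
clean = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
# A 4x4 mask where nothing was removed.
untouched = [[255] * 4 for _ in range(4)]

print(opaque_fraction(clean))                  # 0.25
print(background_probably_remains(clean))      # False
print(background_probably_remains(untouched))  # True
```

If the check reports that the background probably remains, it may be worth re-running the image through a different background removal tool before posing.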

Additionally, our training data consists of full-body images, so inputting half-body photos (e.g., missing feet or legs) may lead to suboptimal results. You could experiment with tools like Clipdrop’s image uncrop feature to extend the image to a full-body format. We’ll also consider supporting more diverse inputs, such as half-body photos, as part of our future work.
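Cropped inputs can be flagged before posing with a simple heuristic: if the subject's mask reaches the bottom row of the image, the legs or feet are probably cut off. The sketch below assumes the same rows-of-alpha representation; the helper names are hypothetical, and `visible_fraction` is a manual guess by the user (for example 0.75 for a 3/4 shot), not something the model reports.

```python
def touches_bottom(alpha, threshold=128):
    """Return True if any pixel in the last row is opaque, which
    suggests the subject continues past the bottom edge."""
    if not alpha:
        raise ValueError("alpha mask is empty")
    return any(a >= threshold for a in alpha[-1])


def extra_rows_for_full_body(height, visible_fraction):
    """Estimate how many rows to add below the image so that the
    visible part occupies `visible_fraction` of the new height."""
    if not 0 < visible_fraction <= 1:
        raise ValueError("visible_fraction must be in (0, 1]")
    return round(height / visible_fraction) - height


# Subject runs off the bottom edge: likely a cropped shot.
cropped = [
    [0, 0, 0],
    [0, 255, 0],
    [0, 255, 0],
]
# Subject ends above the bottom edge: likely full body.
full_body = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 0, 0],
]

print(touches_bottom(cropped))              # True
print(touches_bottom(full_body))            # False
print(extra_rows_for_full_body(768, 0.75))  # 256
```

The row count only sizes the canvas for an uncrop tool; the missing legs and feet still have to be generated by that tool.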

@linkpharm
Author

I find that Google's new multimodal flash model works well for background removal. It produces perfect results with a bit of prompting (ask it what objects are in the image, tell it your goal, then ask it to meet that goal). Issues: hallucination, which happens rarely, and image degradation, because the resolution and VAE channels alter small details. It can also zoom out images to better match the remaining images, but there's a significant hallucination rate.
