Issue with missing limbs #7

linkpharm · 2025-03-21T04:05:58Z

I've noticed an issue with missing limbs, most commonly feet. Many of the images that are generally available are three-quarter-length shots, and without the feet in the image the posed result is unusable. One way I can think of to fix this is to add the feet with fast image generation first; PixArt is fast and lightweight. Otherwise it's an issue with the model that handles the posing, and I don't really know how that works. The accuracy of that step on its own is not great, and there's probably some easy improvement to be had there.

hyz317 · 2025-03-21T05:32:53Z

Could you please provide an example input image where the problem occurs? This will help us investigate further. In our testing, the model has generally performed well for most in-the-wild cases.

linkpharm · 2025-03-23T18:04:53Z

Without feet and legs: [image not preserved]

With feet and legs: [image not preserved]

It's a bit of a mixed bag. There's some inaccuracy caused by the missing legs, but not enough to explain the results on the other "without" samples. Is there a better way to use this?

hyz317 · 2025-03-24T03:38:05Z

Thank you for your detailed feedback and for sharing your observations!

Currently, the automatic segmentation in our demo relies on rm_anime_bg (https://github.com/shirayu/rm_anime_bg), which works well for anime-style images but may be less effective for other styles, such as 2.5D or real-world images. If the background isn’t fully removed, it can significantly degrade the results. For better results, you could remove the background with another tool, such as Clipdrop, before uploading the image. As demonstrated in the paper, our method does work for real human photos and other styles.
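One rough way to tell whether a cutout still has background left over is to look at the alpha channel along the image border: after a clean removal, most border pixels should be transparent. The sketch below is a hypothetical heuristic, not part of the demo or rm_anime_bg. It assumes the alpha values are already available as a list of rows (with Pillow, `list(img.getchannel("A").getdata())` gives a flat list you can split into rows of the image width). The function names and the 5% tolerance are illustrative assumptions; the tolerance is there because a full-body figure whose feet touch the bottom edge will legitimately contribute some opaque border pixels.

```python
from typing import List


def border_opaque_fraction(alpha: List[List[int]], threshold: int = 128) -> float:
    """Return the fraction of border pixels whose alpha is >= threshold."""
    h = len(alpha)
    w = len(alpha[0]) if h else 0
    if h == 0 or w == 0:
        return 0.0
    # Collect each border coordinate once (corners would otherwise be counted twice).
    coords = set()
    for x in range(w):
        coords.add((0, x))
        coords.add((h - 1, x))
    for y in range(h):
        coords.add((y, 0))
        coords.add((y, w - 1))
    opaque = sum(1 for y, x in coords if alpha[y][x] >= threshold)
    return opaque / len(coords)


def background_looks_removed(
    alpha: List[List[int]], max_fraction: float = 0.05, threshold: int = 128
) -> bool:
    """Heuristic: treat the cutout as clean if few border pixels are opaque."""
    return border_opaque_fraction(alpha, threshold) <= max_fraction


# A 4x4 cutout with an opaque 2x2 subject in the middle and a transparent border.
clean = [[0] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        clean[y][x] = 255
print(background_looks_removed(clean))  # True

# A fully opaque image, i.e. nothing was removed.
print(background_looks_removed([[255] * 4 for _ in range(4)]))  # False
```

A check like this only flags obviously incomplete removals; leftover background fragments that do not touch the border would still need a visual check.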

Additionally, our training data consists of full-body images, so inputting half-body photos (e.g., missing feet or legs) may lead to suboptimal results. You could experiment with tools like Clipdrop’s image uncrop feature to extend the image to a full-body format. We’ll also consider supporting more diverse inputs, such as half-body photos, as part of our future work.
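Before handing a half-body image to an uncrop or outpainting tool, you typically extend the canvas and mark the new region that should be generated. The following is a minimal sketch of that bookkeeping under two assumptions. First, the missing part of the body lies below the frame. Second, a full-body crop has a height:width ratio of about 2:1. This ratio is an illustrative guess, not a value from the paper or the demo. `UncropPlan` and `plan_bottom_uncrop` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class UncropPlan:
    canvas_width: int
    canvas_height: int
    # (left, top, right, bottom) region the outpainting tool should fill.
    fill_box: Tuple[int, int, int, int]


def plan_bottom_uncrop(width: int, height: int, target_ratio: float = 2.0) -> UncropPlan:
    """Extend the canvas downward until height / width reaches target_ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    target_height = round(width * target_ratio)
    if target_height <= height:
        # Already tall enough: keep the canvas and return an empty fill region.
        return UncropPlan(width, height, (0, height, width, height))
    return UncropPlan(width, target_height, (0, height, width, target_height))


print(plan_bottom_uncrop(512, 640))
# UncropPlan(canvas_width=512, canvas_height=1024, fill_box=(0, 640, 512, 1024))
```

With Pillow, the original image would then be pasted at the top of a new canvas of that size, and `fill_box` drawn as the white area of a mask for whichever outpainting tool you use.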

linkpharm · 2025-03-24T06:12:33Z

I find that Google's new multimodal Flash model works well for background removal. With a bit of prompting (ask it what objects are in the image, tell it your goal, then ask it to meet that goal), it produces perfect results. Issues: occasional hallucination, which is rare, and image degradation because the resolution and VAE channels alter small details. It can also zoom out images to better match the other images, but it hallucinates a significant percentage of the time.
