Current LoRA workflow

Most of my public LoRAs are here: https://civitai.com/user/chairfull

Model           Images Used   Note
Josh Brolin     19            Fewest images used.
Guy Pierce      32            Used CLIP instead of BLIP.
Jack Nicholson  52            Most downloaded [Male].
Kelly Brook     146           Captioned with CLIP Interrogator 2.1 at best setting. For most models I use BLIP.
Anne Hathaway   147           Most downloaded.
Maitland Ward   325           Most images used.

1) Training Data

Image quality = Model quality.

Image quantity = Model flexibility.

Image quality is a big part of how well a LoRA turns out, so try to find the highest quality images you can.

Many images I've used are over 2000x3000. Some >8000x5000. I only crop out other people and text. I don't resize.

High quality image != big image. A high quality image is one where, if you zoom in, you see details like skin pores, eye flecks, and fabric threads.

If you zoom in and it looks blurry, that image is someone's crummy upscale. Using too many of those images in training will give the model a cartoon airbrush look.

Finding images

(Optional) Chrome extensions

Imagus: See the full image by hovering over it or its link, and hit Ctrl+S to save it.

Double Click Image Downloader: For quicker downloading.

uBlock Origin: Nicest adblocker, imo.

Sources

Yandex: This is my go-to. Better than Google's image search, and it makes finding different sizes of an image easy.

  1. Search your subject.

  2. Sort images by largest.

  3. On the right is a size dropdown; try to find the biggest.

Only do this if the largest is actually better quality. It may be a crummy upscale, or the link may not work.

You can also hunt for better quality versions by dragging an image into Yandex to do a similar image search.

For a person, attempt to find at least one of each:

  • Profile left + profile right.
  • 3/4 left + 3/4 right.
  • Looking at camera.
  • Looking up + looking down.
  • (Bonus) Looking up + down at 3/4 left and right.
  • (Bonus) All these angles with multiple expressions (happy, neutral, angry).

Processing images

Dealing with duplicates

While looking for images I save as many as look decent, sometimes coming across higher quality versions later, so I end up with duplicates.

To remove duplicates I use the Geeqie image viewer.

  1. Open Geeqie and go to your folder of images.

  2. Select all of them in the lower right panel.

  3. Right click and select Find duplicates.

  4. Sort on Similarity (low, med, high).

  5. If it finds any, get rid of whichever ones seem lower quality by right clicking and selecting Delete.
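
If you'd rather script the check, here's a minimal sketch of the same idea using the Pillow and imagehash Python libraries (my substitution, not part of Geeqie; the folder name and distance threshold are guesses to tune):

    # pip install pillow imagehash
    from itertools import combinations
    from pathlib import Path
    from PIL import Image
    import imagehash

    folder = Path("my_pics")  # hypothetical image folder
    paths = [p for p in folder.iterdir() if p.suffix.lower() in {".jpg", ".jpeg", ".png"}]

    # Perceptual hashes: visually similar images produce similar hashes.
    hashes = {p: imagehash.phash(Image.open(p)) for p in paths}

    # Hamming distance <= 5 is a rough near-duplicate threshold; tune to taste.
    for a, b in combinations(paths, 2):
        if hashes[a] - hashes[b] <= 5:
            print(f"possible duplicates: {a.name} <-> {b.name}")

From the printed pairs, delete whichever file looks lower quality, same as in Geeqie.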

Bulk Cropping

Select the few images that need cropping and drag them into bulkimagecrop.com.

While I try to remove other similar subjects (other males, if training on a male) and text, I don't try to center the subject. Having the subject dead center in every image could train the model to think you always want that.

I put the subject at the far left, the far top, the bottom right...

Once you've cropped and downloaded the images as zips, you can mass unzip with: unzip \*.zip (the backslash stops the shell from expanding the glob, so unzip matches each zip itself).

Zip

Zip the images: zip ./my_pics -r . (this creates my_pics.zip from the current folder).

Upload

  • Upload the zip to your Google drive.
  • Right Click it in GDrive, select Share or Get link.
  • Toggle Make Public.
  • Click Copy link.
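
If you later want to pull the zip back down from a script or Colab cell, the gdown package can take a public share link directly. A small sketch (gdown is my suggestion here, not part of the original workflow):

    # pip install gdown
    import gdown

    # fuzzy=True lets gdown pull the file id out of a normal share link.
    share_link = "https://drive.google.com/file/d/FILE_ID/view"  # hypothetical link
    gdown.download(share_link, "my_pics.zip", fuzzy=True)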

2) Kohya

I used: https://github.com/Linaqruf/kohya-trainer (Dreambooth method, top one.)

I use the Google Colab version as my GPU sucks, but I assume it works the same if you run it on your PC.

I mostly use BLIP to auto-caption the images.
Recently I started upping the word count from 15-75 to 30-100, and the results have seemed a tinge better.
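
The notebook has its own captioning cell, but as a rough sketch of what those length settings correspond to, here's BLIP captioning via the Hugging Face transformers library (my stand-in, not the notebook's exact code; note that min/max lengths here count tokens rather than words):

    # pip install transformers torch pillow
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    name = "Salesforce/blip-image-captioning-large"
    processor = BlipProcessor.from_pretrained(name)
    model = BlipForConditionalGeneration.from_pretrained(name)

    image = Image.open("img1.png").convert("RGB")  # hypothetical training image
    inputs = processor(image, return_tensors="pt")

    # Longer captions, roughly matching the 30-100 setting above.
    out = model.generate(**inputs, min_length=30, max_length=100)
    print(processor.decode(out[0], skip_special_tokens=True))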

Leave pretty much all the settings at their defaults, except:

  • For the pretrained model download: Stable-Diffusion-v1-5.
  • For the VAE model download: stablediffusion.vae.pt.
  • Set pretrained_model_name_or_path to /content/pretrained_model/Stable-Diffusion-v1-5.safetensors.
  • Set vae to /content/vae/stablediffusion.vae.pt.
  • Set class_token to man.

Experiments

Random ideas I'm trying out:

Higher quality through tokens

Tokens in the captions mark what you don't want trained into your model, with the exception of the class_token.
So for a man, a man, in a red hat, in a forest would extract only the man, not the red hat or the forest.
Theoretically this should also work for style and image quality, so for old images I might add blurry, old image, scan, jpeg artifacts, low quality in hopes the model will pull a sharper image.
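
A minimal sketch of that idea, assuming the usual kohya layout of one .txt caption file per image; the folder and the list of flagged files are hypothetical:

    from pathlib import Path

    QUALITY_TAGS = ", blurry, old image, scan, jpeg artifacts, low quality"
    flagged = ["old_scan_01.png", "old_scan_02.png"]  # hypothetical low quality images

    for name in flagged:
        caption = Path("my_pics") / Path(name).with_suffix(".txt")
        # Append the tags so (in theory) they soak up the artifacts
        # instead of the subject doing so.
        caption.write_text(caption.read_text().rstrip() + QUALITY_TAGS + "\n")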

CLIP instead of BLIP

For this model I captioned 146 images with CLIP Interrogator 2.1 on the best setting.

It took a long time, and I don't know that it was worth it. Theoretically it should be more flexible than other models. Needs more testing.
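
For reference, here's roughly what that captioning looks like with the clip-interrogator Python package. I'm assuming that package matches the tool used; its default interrogate() is the slow "best" mode:

    # pip install clip-interrogator
    from pathlib import Path
    from PIL import Image
    from clip_interrogator import Config, Interrogator

    # ViT-L-14/openai is the CLIP model that pairs with SD 1.5.
    ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

    for path in Path("my_pics").glob("*.jpg"):  # hypothetical image folder
        image = Image.open(path).convert("RGB")
        caption = ci.interrogate(image)  # "best" mode: slow but most detailed
        path.with_suffix(".txt").write_text(caption + "\n")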

Sentiment analyzer for better facial expressiveness

To get more expressiveness out of training data, I'm going to try a sentiment analyzer on a set of photos.

Maybe instead of a single subject, I'll train on a ton of random faces showing emotions at different angles, and then caption each like:

img1.png: a man on the beach, neutral_90 sad_20 fear_5 happy_3
img2.png: a woman at work, happy_40 neutral_20 sad_9
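
A sketch of how those tags could be generated, using the fer Python package as the sentiment analyzer. The library choice, threshold, and tag format are all my assumptions:

    # pip install fer opencv-python tensorflow
    import cv2
    from fer import FER

    detector = FER()
    img = cv2.imread("img1.png")  # hypothetical training image

    faces = detector.detect_emotions(img)  # one dict of scores per detected face
    if faces:
        emotions = faces[0]["emotions"]  # scores 0-1 for angry, happy, neutral, ...
        tags = " ".join(
            f"{name}_{round(score * 100)}"
            for name, score in sorted(emotions.items(), key=lambda e: -e[1])
            if score >= 0.03  # drop near-zero emotions
        )
        print(tags)  # e.g. "neutral_90 sad_20 fear_5 happy_3"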
