Usage¶
GUI app¶
A simple graphical user interface for the command line ezsam
tool can be started via:
ezsam-gui
Note
The gui can only process a single image or video file at a time, and the output is written to <current_directory>/<input_filename>.out.<output_extension>
Options¶
The command-line app ezsam
contains more options than the gui:
ezsam --help
Examples¶
The example images are sourced from rembg for easy comparison.
Simple image extraction¶
Process images extracting foreground specified by prompt to examples/animal*.out.png
.
ezsam examples/animal*.jpg -p animal -o examples
Note
For image extractions, which require adding an alpha channel, the output image format is always png
.
Video filtering¶
Video files are handled automatically.
ezsam car.mkv -p car
Warning
In order to output most of the allowed video formats, FFmpeg needs to be installed and on your $PATH
. For GIF output, ImageMagick needs to be installed, with the convert
command available. See Installation.
Multiple subjects¶
Multiple objects can be selected as the foreground. The output image ./car-1.out.png
contains the car and the person.
ezsam examples/car-1.jpg -p car, person
Debug mode¶
Use debug mode to fine tune or troubleshoot prompts. This writes output with foreground mask and object detections
annotated over the original image file. Here we write out to test/car-3.debug.jpg
.
ezsam examples/car-3.jpg -p white car -o test -s .debug --debug
Note
Note the original image format jpg
is preserved in debug mode!
Object detection box tuning¶
The object detection box threshold parameter can be used to fine tune objects for selection.
ezsam examples/car-3.jpg -p white car -o test --bmin 0.45
Or...
ezsam examples/food.mp4 -p turkey -o examples -s .turkey --hq -m vit_h --keep --bmin 0.46
Complex prompts¶
Writing prompts with specificity can also help.
ezsam examples/anime-girl-2.jpg -o examples -s .debug -p girl, phone, bag, railway crossing sign post --debug
Note
When the GroundingDINO object detection model can't map your input prompt onto any classes for a detection box with confidence, in debug mode the generated label for that box will be "Error" instead.
Negative prompting¶
Negative (inverse) prompt selections can be used to remove specific objects from selection.
ezsam examples/anime-girl-2.jpg -o examples -s .out -p train -n window
Models¶
The tool uses GroundingDINO for object detection.
To perform image segmentation, you can pick SAM or SAM-HQ:
For the best results use the biggest model your GPU has memory for. ViT = Vision Transformer, the model type. From best/slowest to worst/fastest: ViT-h(uge) > ViT-l(arge) > ViT-b(ase) > ViT-tiny.
Note
ViT-tiny is for SAM-HQ only, you must use the --hq
flag.
Troubleshooting¶
GPU memory¶
If you always get an error stating "CUDA out of memory", try using a smaller Segment Anything model (vit_tiny, vit_b) or lower resolution (or less) input.
If you only get a CUDA OOM error occasionally, or after a while, try to free up some memory by closing processes using the GPU:
# List commands using nvidia gpu
fuser -v /dev/nvidia*
You can also try manually getting the GPU to clear some processes:
# Clears all processes accounted so far
sudo nvidia-smi -caa
If you are using multiple GPUs, and so the GPU you're running CUDA on isn't driving your displays, you can also reset the GPU using:
# Trigger reset of one or more GPUs
sudo nvidia-smi -r
Note
nvidia-smi is in the nvidia-utils package of NVIDIA's CUDA repo for Ubuntu.
GUI¶
Job failures¶
On certain job failures the gui might not detect the job as ended, keeping the cursor spinning and preventing another run from being queued. A workaround is to just restart.
Slow to load¶
The one-file build takes a couple seconds to extract itself and start up, see here.