Usage¶

GUI app¶

A simple graphical user interface for the ezsam command-line tool can be started with:

ezsam-gui

Note

The GUI can process only a single image or video file at a time. The output is written to <current_directory>/<input_filename>.out.<output_extension>

Options¶

The command-line app ezsam offers more options than the GUI. To list them:

ezsam --help

Examples¶

The example images are sourced from rembg for easy comparison.

Simple image extraction¶

Extract the foreground specified by the prompt from each matching image, writing the results to examples/animal*.out.png.

ezsam examples/animal*.jpg -p animal -o examples

Note

For image extractions, which require adding an alpha channel, the output image format is always png.

Video filtering¶

Video files are handled automatically.

ezsam car.mkv -p car

Warning

In order to output most of the allowed video formats, FFmpeg needs to be installed and on your $PATH. For GIF output, ImageMagick needs to be installed, with the convert command available. See Installation.
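As a quick pre-flight check for the requirement above, the following sketch reports whether the two external commands are reachable on your $PATH. The helper name check_tool is hypothetical and not part of ezsam; it relies only on the standard shell builtin command -v.

```shell
# Hypothetical helper (not part of ezsam): report whether a command is on $PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# FFmpeg is needed for most video formats, ImageMagick's convert for GIF output.
check_tool ffmpeg  || echo "Install FFmpeg for video output." >&2
check_tool convert || echo "Install ImageMagick for GIF output." >&2
```

If either tool is reported missing, see Installation before processing video files.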

Multiple subjects¶

Multiple objects can be selected as the foreground. The output image ./car-1.out.png contains the car and the person.

ezsam examples/car-1.jpg -p car, person

Debug mode¶

Use debug mode to fine-tune or troubleshoot prompts. It writes the output with the foreground mask and object detections annotated over the original image. Here we write out to test/car-3.debug.jpg.

ezsam examples/car-3.jpg -p white car -o test -s .debug --debug

Note

In debug mode the original image format (jpg) is preserved.
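The examples so far suggest a naming pattern of <output_dir>/<input_name><suffix>.<extension>. The sketch below reproduces that pattern so you can predict where a result will land. The function predict_output is hypothetical, and its defaults (current directory, .out suffix, png extension) are assumptions inferred from the examples on this page, not taken from the ezsam source.

```shell
# Hypothetical helper: predict the output path from the pattern seen in the examples.
predict_output() {
    input=$1            # e.g. examples/car-3.jpg
    outdir=${2:-.}      # value passed to -o (assumed default: current directory)
    suffix=${3:-.out}   # value passed to -s (assumed default: .out)
    ext=${4:-png}       # png for extractions; the original extension in debug mode
    stem=$(basename "$input")
    stem=${stem%.*}     # drop the final extension
    printf '%s/%s%s.%s\n' "$outdir" "$stem" "$suffix" "$ext"
}

# Matches the debug example: -o test -s .debug on a jpg input.
predict_output examples/car-3.jpg test .debug jpg
```

Called with only an input path, for example predict_output examples/car-1.jpg, it prints ./car-1.out.png.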

Object detection box tuning¶

The object detection box threshold parameter can be used to fine-tune which objects are selected.

ezsam examples/car-3.jpg -p white car -o test --bmin 0.45

Or, on a video and combined with other options:

ezsam examples/food.mp4 -p turkey -o examples -s .turkey --hq -m vit_h --keep --bmin 0.46
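Finding a good threshold usually takes a few tries, so it can help to run the same prompt at several values in debug mode and compare the annotated images. The sketch below does that with the flags shown on this page. The function name bmin_sweep, the DRY_RUN switch, and the three threshold values are illustrative assumptions, not ezsam features or recommended settings.

```shell
# Hypothetical helper: run one prompt at several box thresholds in debug mode.
# Set DRY_RUN=1 to print the commands instead of running them.
bmin_sweep() {
    image=$1
    prompt=$2           # left unquoted below so "white car" becomes two words, as in -p white car
    for bmin in 0.35 0.45 0.55; do
        ${DRY_RUN:+echo} ezsam "$image" -p $prompt -o test -s ".bmin-$bmin" --debug --bmin "$bmin"
    done
}

# Usage (not run here):
#   bmin_sweep examples/car-3.jpg "white car"
```

Each run uses a different suffix, so the debug images for the different thresholds do not overwrite each other.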

Complex prompts¶

Writing more specific prompts can also help.

ezsam examples/anime-girl-2.jpg -o examples -s .debug -p girl, phone, bag, railway crossing sign post --debug

Note

When the GroundingDINO object detection model can't map your input prompt onto any classes for a detection box with confidence, in debug mode the generated label for that box will be "Error" instead.

Negative prompting¶

Negative (inverse) prompt selections can be used to remove specific objects from selection.

ezsam examples/anime-girl-2.jpg -o examples -s .out -p train -n window

Models¶

The tool uses GroundingDINO for object detection.

For image segmentation, you can pick SAM or SAM-HQ.

For the best results use the biggest model your GPU has memory for. ViT = Vision Transformer, the model type. From best/slowest to worst/fastest: ViT-h(uge) > ViT-l(arge) > ViT-b(ase) > ViT-tiny.

Note

ViT-tiny is available for SAM-HQ only, so you must use the --hq flag with it.
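One way to apply the "biggest model that fits" advice is to try the models from largest to smallest and stop at the first that succeeds. The sketch below assumes that ezsam exits with a non-zero status on failure (for example when it runs out of GPU memory) and that vit_l is the identifier for ViT-l, by analogy with vit_h and vit_b; neither is confirmed on this page. The function run_with_fallback and the EZSAM override variable are hypothetical.

```shell
# Hypothetical helper: try segmentation models from largest to smallest.
# EZSAM can be overridden for testing; it defaults to the real CLI.
run_with_fallback() {
    for model in vit_h vit_l vit_b; do   # vit_l is an assumed identifier
        if ${EZSAM:-ezsam} "$@" -m "$model"; then
            echo "succeeded with $model"
            return 0
        fi
        echo "failed with $model, trying a smaller model" >&2
    done
    return 1
}

# Usage (not run here):
#   run_with_fallback examples/car-1.jpg -p car -o examples
```

ViT-tiny is left out of the loop because it additionally requires the --hq flag.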

Troubleshooting¶

GPU memory¶

If you always get an error stating "CUDA out of memory", try using a smaller Segment Anything model (vit_tiny, vit_b) or lower-resolution (or fewer) inputs.

If you only get a CUDA OOM error occasionally, or after a while, try to free up some memory by closing processes using the GPU:

# List commands using nvidia gpu
fuser -v /dev/nvidia*

You can also try manually getting the GPU to clear some processes:

# Clears all processes accounted so far
sudo nvidia-smi -caa

If you have multiple GPUs and the one running CUDA isn't driving your displays, you can also reset that GPU using:

# Trigger reset of one or more GPUs
sudo nvidia-smi -r

Note

nvidia-smi is in the nvidia-utils package of NVIDIA's CUDA repo for Ubuntu.
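To see how much GPU memory is free before choosing a model, nvidia-smi can print per-GPU usage in CSV form. The sketch below turns that output into a short summary. The function name free_mib is hypothetical, and the query is skipped when nvidia-smi is not installed.

```shell
# Hypothetical helper: convert "index, used, total" CSV lines (MiB) into a free-memory summary.
free_mib() {
    awk -F', *' '{ printf "GPU %d: %d MiB free of %d MiB\n", $1, $3 - $2, $3 }'
}

# Only query when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits | free_mib
fi
```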

GUI¶

Job failures¶

On certain job failures the GUI might not detect that the job has ended, which keeps the cursor spinning and prevents another run from being queued. A workaround is to restart the GUI.

Slow to load¶

The one-file build takes a couple of seconds to extract itself and start up; see here.