YOLOE: A Faster Model for Object Detection

Published on June 2, 2025
Shaoni Mukherjee

Technical Writer

Object detection and segmentation are key parts of computer vision, used in everything from self-driving cars to medical image analysis. Popular models like the YOLO series are fast and accurate, but they can only recognize a fixed set of object categories. This makes them less useful in real-world scenarios where new or uncommon objects may appear. To fix this, recent research has focused on “open-set” models that can detect and label any object, even those not seen during training, using prompts like text or visual cues.

YOLOE is a powerful and efficient model that works like a human eye, recognizing any object across various prompt types: text prompts, visual hints, or no prompts at all. It builds on the strengths of the YOLO family but is designed for more flexible real-world use, while keeping the speed and light weight that made YOLO famous.

How does YOLOE work?

Here’s how YOLOE works across the three prompt types:

  1. Text Prompts (RepRTA Strategy)
    For situations where you describe what you’re looking for (e.g., “find all bicycles”), YOLOE uses a strategy called Re-parameterizable Region-Text Alignment (RepRTA). It improves how the model connects text and images using a lightweight helper network. During inference, this helper network is folded into the main model, so there’s no extra cost or delay.

  2. Visual Prompts (SAVPE Strategy)
    If you provide an example region or visual cue, YOLOE uses the Semantic-Activated Visual Prompt Encoder (SAVPE). It splits the job into two branches—one for understanding the meaning (semantics) and another for activating relevant regions. This smart separation allows the model to stay accurate while keeping things simple and fast.

  3. Prompt-Free (LRPC Strategy)
    When no prompt is given, YOLOE uses Lazy Region-Prompt Contrast (LRPC). Instead of relying on large, slow language models, it matches detected objects with a built-in list of known categories. This allows it to perform well while saving on memory and computation.
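The re-parameterization trick behind RepRTA (item 1 above) is worth pausing on. Below is a minimal numpy sketch of the general folding idea: an auxiliary linear layer refines the text embeddings during training, and at inference it is composed into the main head once, so the deployed model does the same work with no extra layer. All shapes, names, and values here are illustrative, not YOLOE's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Main head projecting region features into a shared embedding space.
W_main = rng.standard_normal((16, 8))
# Lightweight auxiliary layer that refines text embeddings during training.
W_aux = rng.standard_normal((8, 8))

text_embeddings = rng.standard_normal((3, 8))   # 3 class prompts
region_features = rng.standard_normal((5, 16))  # 5 candidate regions

# Training-time path: refine the text embeddings, then score regions.
refined = text_embeddings @ W_aux.T
scores_train = (region_features @ W_main) @ refined.T

# Inference-time path: fold the auxiliary layer into the main head once,
# so each image pays no extra cost.
W_folded = W_main @ W_aux
scores_infer = (region_features @ W_folded) @ text_embeddings.T

print(np.allclose(scores_train, scores_infer))  # both paths agree
```

The point is that two consecutive linear maps compose into one, which is why the helper network "disappears" at inference time.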


YOLOE supports detection and segmentation across diverse open prompt types by using re-parameterizable region-text alignment for text, SAVPE for efficient visual prompt embedding, and lazy region-prompt contrast for prompt-free object categorization.
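The prompt-free LRPC idea, matching detected regions against a built-in vocabulary instead of invoking a large language model, can be sketched in a few lines. This is a toy illustration with made-up 2-D embeddings and category names, not YOLOE's real vocabulary or embedding space:

```python
import numpy as np

def lrpc_match(region_embeddings, vocab_embeddings, vocab_names, threshold=0.5):
    """Toy sketch of Lazy Region-Prompt Contrast: compare each detected
    region's embedding against a built-in vocabulary and keep the best
    match above a similarity threshold."""
    # Normalize so dot products become cosine similarities.
    regions = region_embeddings / np.linalg.norm(region_embeddings, axis=1, keepdims=True)
    vocab = vocab_embeddings / np.linalg.norm(vocab_embeddings, axis=1, keepdims=True)
    sims = regions @ vocab.T                      # (num_regions, vocab_size)
    best = sims.argmax(axis=1)
    return [vocab_names[i] if sims[r, i] >= threshold else None
            for r, i in enumerate(best)]

# Toy vocabulary and three detected regions.
vocab_names = ["cat", "dog", "car"]
vocab_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
regions = np.array([[0.9, 0.1],     # points mostly along the "cat" axis
                    [0.1, 0.1],     # equal mix of both axes, closest to "car"
                    [-1.0, -0.5]])  # far from everything -> no label
print(lrpc_match(regions, vocab_emb, vocab_names))
```

Because the vocabulary comparison is just a matrix multiply, this stays cheap at inference time, which is the efficiency claim behind LRPC.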

Getting Started with YOLOE: Zero-Shot Object Detection and Segmentation

Here is the code walkthrough to use YOLOE for your projects:

# Step 1: Clone the YOLOE Repository
git clone https://github.com/THU-MIG/yoloe.git
cd yoloe
# Step 2: Install Dependencies
pip install -r requirements.txt
# Step 3: Download Pretrained Models
# Visit https://github.com/THU-MIG/yoloe to download pretrained weights (e.g., yoloe-v8l-seg.pt)
# Place them so the path matches the --checkpoint flag below (e.g., yoloe/pretrain/)
# Step 4: Prepare Your Dataset
# Place your test images in a folder (e.g., ./data/images/)
# For zero-shot detection, make sure you have text prompts or class descriptions ready
# Step 5: Run Inference
python predict_text_prompt.py \
    --source ./data/images/  \
    --checkpoint pretrain/yoloe-v8l-seg.pt \
    --text_prompts "cat, dog, car, person" \
    --device cuda:0

# Step 6: Visualize Results
# Each image will show:
# - Bounding boxes
# - Segmentation masks
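Step 6 relies on the repo's own plotting utilities. As a rough, self-contained illustration of what "boxes plus masks" means, here is a small numpy sketch that draws a box outline and alpha-blends a segmentation mask onto an image; the helper names, box, and mask values are all made up for illustration:

```python
import numpy as np

def draw_box(image, box, color=(255, 0, 0)):
    """Draw a 1-pixel rectangle outline on an HxWx3 uint8 image (in place)."""
    x1, y1, x2, y2 = box
    image[y1:y2 + 1, [x1, x2]] = color   # left and right edges
    image[[y1, y2], x1:x2 + 1] = color   # top and bottom edges
    return image

def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.4):
    """Alpha-blend a boolean segmentation mask over the image."""
    out = image.astype(float)
    out[mask] = (1 - alpha) * out[mask] + alpha * np.array(color, dtype=float)
    return out.astype(np.uint8)

canvas = np.zeros((64, 64, 3), dtype=np.uint8)   # dummy black image
canvas = draw_box(canvas, (10, 10, 40, 40))      # red detection box
mask = np.zeros((64, 64), dtype=bool)
mask[15:35, 15:35] = True                        # dummy segmentation mask
canvas = overlay_mask(canvas, mask)              # green mask overlay
print(canvas[10, 10], canvas[20, 20])            # box pixel, masked pixel
```

In practice the model's predicted boxes and masks would replace the dummy values above, but the rendering idea is the same.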

Conclusion

To conclude, YOLOE is another breakthrough model that combines speed, flexibility, and simplicity. It works across all prompt types, text, visual, or none, without the heavy cost of more complex open-set models. It is a big step toward truly intelligent, real-time computer vision that adapts to whatever the world throws at it. Personally, I find YOLOE's design not just impressive but a promising shift toward real-time AI that is actually deployable in production applications.


About the author

Shaoni Mukherjee
Technical Writer

With a strong background in data science and over six years of experience, I am passionate about creating in-depth content on emerging technologies. Currently focused on AI, machine learning, and GPU computing, I work on topics ranging from deep learning frameworks to optimizing GPU-based workloads.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.