Video & Image Support

2 min read · Updated March 27, 2026

Your agent can interpret images, YouTube videos, and uploaded documents directly in chat.

Images

Visitors can send images to your agent by:

  • Dragging and dropping into the chat
  • Clicking the upload button
  • Pasting from clipboard

Supported formats: PNG, JPG, WebP, and GIF. Before sending, images are compressed in the visitor's browser: they are resized to a maximum of 1600px, with WebP as the preferred output format.
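
To make the resize rule concrete, here is a minimal sketch of the size calculation. It is not the platform's actual code: the helper names are hypothetical, and it assumes the 1600px limit applies to the longer edge of the image.

```typescript
// Hypothetical helpers for illustration only.
// MIME types corresponding to the supported formats (PNG, JPG, WebP, GIF).
const SUPPORTED_IMAGE_TYPES = ["image/png", "image/jpeg", "image/webp", "image/gif"];

// Assumption: the 1600px limit applies to the longer edge.
const MAX_EDGE_PX = 1600;

interface Size {
  width: number;
  height: number;
}

// Returns true when the MIME type is one of the supported image formats.
function isSupportedImage(mimeType: string): boolean {
  return SUPPORTED_IMAGE_TYPES.indexOf(mimeType.toLowerCase()) !== -1;
}

// Scales a size down so its longer edge is at most maxEdge,
// preserving the aspect ratio. Smaller images are returned unchanged.
function fitWithin(size: Size, maxEdge: number = MAX_EDGE_PX): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxEdge) {
    return { width: size.width, height: size.height };
  }
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```

In a browser, the resulting size could be applied to a canvas and exported with canvas.toBlob using the "image/webp" type, falling back to another format where WebP encoding is unavailable.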

YouTube Videos

When a visitor pastes a YouTube URL into the chat, the agent can watch and understand the video. This relies on Google's Gemini model, which has native video understanding, so no transcription step is needed.

How it works:

  1. Visitor sends a message containing a YouTube URL
  2. The URL is detected automatically
  3. The message is routed to Gemini 2.5 Flash (optimized for video)
  4. The agent responds based on the video content
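
The detection and routing steps above can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the regular expression covers only common YouTube URL shapes, and the function names, the Route type, and both model identifier strings are hypothetical.

```typescript
// Matches common YouTube URL shapes (watch?v=, youtu.be/, shorts/, embed/)
// and captures the 11-character video ID. Other shapes are not covered.
const YOUTUBE_URL =
  /https?:\/\/(?:www\.|m\.)?(?:youtube\.com\/(?:watch\?(?:[^\s#]*&)?v=|shorts\/|embed\/)|youtu\.be\/)([A-Za-z0-9_-]{11})/;

// Hypothetical model identifiers used only for this sketch.
const VIDEO_MODEL = "gemini-2.5-flash";
const DEFAULT_MODEL = "default-chat-model";

interface Route {
  model: string;
  videoId: string | null;
}

// Step 2: detect a YouTube URL anywhere in the message text.
function findYouTubeId(message: string): string | null {
  const match = YOUTUBE_URL.exec(message);
  return match ? match[1] : null;
}

// Step 3: pick the video model when a YouTube URL is present.
function routeMessage(message: string): Route {
  const videoId = findYouTubeId(message);
  return { model: videoId ? VIDEO_MODEL : DEFAULT_MODEL, videoId: videoId };
}
```

For example, routeMessage("What happens in https://youtu.be/dQw4w9WgXcQ?") selects the video model, while a message containing a link to another video platform falls through to the default model.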

Limitations:

  • Tools (web search, @mentions) are disabled for video messages
  • Only YouTube URLs are supported (not other video platforms)
  • Very long videos may be truncated by the model
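
The first limitation amounts to a simple gate on the outgoing request. A minimal sketch, assuming a hypothetical ChatRequest shape and hypothetical tool names:

```typescript
// Hypothetical request shape for illustration only.
interface ChatRequest {
  message: string;
  tools: string[];
}

// Video messages are sent without tools; all other messages keep
// whatever tools the agent normally has enabled.
function buildRequest(message: string, hasVideo: boolean, enabledTools: string[]): ChatRequest {
  return {
    message: message,
    tools: hasVideo ? [] : enabledTools.slice(),
  };
}
```

With enabledTools set to ["web_search", "mentions"], a video message yields an empty tools list, so a visitor who needs web search should ask in a separate message without the YouTube URL.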

PDFs & Documents

Visitors can also upload PDFs and DOCX files:

  • PDFs: Text is extracted automatically. Image-only PDFs (scanned documents) have their pages rendered as images (up to 10 pages).
  • DOCX: Text is extracted via the mammoth library.
  • Max upload size: 25MB
  • Text truncation: 100K characters max
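
The limits above can be expressed as a few small checks. This is a sketch rather than the platform's code: the names are hypothetical, 25MB is assumed to mean 25 × 1024 × 1024 bytes, 100K is assumed to mean 100,000, and characters are counted as JavaScript string length (UTF-16 code units).

```typescript
// Limits taken from the list above; the exact byte and character
// interpretations are assumptions made for this sketch.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;
const MAX_TEXT_CHARS = 100000;
const MAX_RENDERED_PAGES = 10;

interface ExtractedText {
  text: string;
  truncated: boolean;
}

// Rejects files larger than the upload limit.
function isWithinUploadLimit(sizeBytes: number): boolean {
  return sizeBytes >= 0 && sizeBytes <= MAX_UPLOAD_BYTES;
}

// Cuts extracted text to the character limit and reports whether it was cut.
function truncateText(text: string): ExtractedText {
  if (text.length <= MAX_TEXT_CHARS) {
    return { text: text, truncated: false };
  }
  return { text: text.slice(0, MAX_TEXT_CHARS), truncated: true };
}

// For image-only PDFs: how many pages would be rendered as images.
function pagesToRender(pageCount: number): number {
  return Math.min(Math.max(pageCount, 0), MAX_RENDERED_PAGES);
}
```

Under these assumptions, a 40-page scanned PDF would have only its first 10 pages rendered, and a long text document would reach the agent cut off at the character limit, so key information is best placed early in the file.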

Tips for Agent Owners

If your agent deals with visual content (design, art, screenshots, etc.):

  • Mention this capability in your soul's identity or knowledge section
  • Add output format rules for how to describe images
  • Consider choosing the "Website agent" purpose during interview setup