Video & Image Support
2 min read · Updated March 27, 2026
Your agent can understand images, YouTube videos, and documents (PDF and DOCX) that visitors share directly in chat.
Images
Visitors can send images to your agent by:
- Dragging and dropping them into the chat
- Clicking the upload button
- Pasting them from the clipboard
Supported formats: PNG, JPG, WebP, and GIF. Before an image is sent, it is compressed in the visitor's browser: it is scaled down to a maximum of 1600px, and WebP is the preferred output format.
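The resizing step can be pictured as a pure size calculation. The Python sketch below assumes that "max 1600px" applies to the longest side, that the aspect ratio is preserved, and that smaller images are never upscaled. It also checks the file extension against the supported formats. The names `is_supported`, `target_size`, and `MAX_DIMENSION` are hypothetical, and a real client would more likely inspect the MIME type and do the actual re-encoding in the browser.

```python
from typing import Tuple

# Formats listed as supported in this article (extension-based check is an assumption).
SUPPORTED_FORMATS = {"png", "jpg", "jpeg", "webp", "gif"}

# Hypothetical constant: 1600px, assumed to apply to the longest side.
MAX_DIMENSION = 1600


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the supported image formats."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in SUPPORTED_FORMATS


def target_size(width: int, height: int, max_dim: int = MAX_DIMENSION) -> Tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_dim.

    The aspect ratio is preserved, and images already within the limit
    are returned unchanged (never upscaled).
    """
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000x3000 photo would be scaled to 1600x1200, while an 800x600 screenshot would be sent at its original size.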
YouTube Videos
When a visitor pastes a YouTube URL into the chat, the agent can watch and understand the video. This relies on Google's Gemini model, which understands video natively, so no transcription is needed.
How it works:
- Visitor sends a message containing a YouTube URL
- The URL is detected automatically
- The message is routed to Gemini 2.5 Flash (optimized for video)
- The agent responds based on the video content
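The detection and routing steps above can be sketched as follows. This is a minimal illustration, not the product's implementation: the regular expression (covering `youtube.com/watch?v=`, `youtu.be/`, and `youtube.com/shorts/` links), the `Route` type, and the `DEFAULT_MODEL` placeholder are all assumptions. It does mirror the behavior described in this article: a message containing a YouTube URL is sent to Gemini 2.5 Flash with tools disabled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: matches common YouTube URL shapes and captures
# the 11-character video ID. Other video platforms are deliberately not matched.
YOUTUBE_URL = re.compile(
    r"https?://(?:www\.|m\.)?"
    r"(?:youtube\.com/(?:watch\?(?:\S*?&)?v=|shorts/)|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)

VIDEO_MODEL = "gemini-2.5-flash"  # the model this article says handles video
DEFAULT_MODEL = "default"  # hypothetical placeholder for the agent's usual model


@dataclass
class Route:
    model: str
    tools_enabled: bool
    video_url: Optional[str] = None


def find_youtube_url(message: str) -> Optional[str]:
    """Return the first YouTube URL in the message (up to the video ID), or None."""
    match = YOUTUBE_URL.search(message)
    return match.group(0) if match else None


def route_message(message: str) -> Route:
    """Decide which model handles a message and whether tools are available."""
    url = find_youtube_url(message)
    if url:
        # Video messages: Gemini, with tools (web search, @mentions) disabled.
        return Route(model=VIDEO_MODEL, tools_enabled=False, video_url=url)
    return Route(model=DEFAULT_MODEL, tools_enabled=True)
```

A message without a YouTube link, including one that links to another video platform, falls through to the agent's normal model with tools enabled.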
Limitations:
- Tools (web search, @mentions) are disabled for video messages
- Only YouTube URLs are supported (not other video platforms)
- Very long videos may be truncated by the model
PDFs & Documents
Visitors can also upload PDFs and DOCX files:
- PDFs: Text is extracted automatically. For image-only PDFs (such as scanned documents), up to 10 pages are rendered as images instead.
- DOCX: Text is extracted via the mammoth library.
- Maximum upload size: 25MB
- Text limit: extracted text is truncated at 100,000 characters
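The document limits can be expressed as a few small checks. In this sketch, 25MB is assumed to mean 25 * 1024 * 1024 bytes, a PDF is assumed to be image-only when none of its pages yields any extracted text, and the rendered pages are assumed to be the first ten. The names `check_size`, `truncate_text`, `pages_to_render`, and `UploadTooLarge` are hypothetical. Text extraction itself (by a PDF library, or by mammoth for DOCX) is outside the scope of the sketch.

```python
from typing import List

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25MB; binary megabytes are an assumption
MAX_TEXT_CHARS = 100_000  # extracted-text limit from this article
MAX_RENDERED_PAGES = 10  # page cap for image-only PDFs from this article


class UploadTooLarge(ValueError):
    """Raised when an upload exceeds the size limit (hypothetical error type)."""


def check_size(size_bytes: int) -> None:
    """Reject uploads larger than the limit."""
    if size_bytes > MAX_UPLOAD_BYTES:
        raise UploadTooLarge(f"{size_bytes} bytes exceeds the 25MB limit")


def truncate_text(text: str, limit: int = MAX_TEXT_CHARS) -> str:
    """Keep at most `limit` characters of extracted text."""
    return text[:limit]


def pages_to_render(page_texts: List[str]) -> List[int]:
    """Return the indices of PDF pages to render as images.

    Assumption: if no page yields any non-whitespace text, the PDF is treated
    as image-only and up to the first 10 pages are rendered. Otherwise the
    extracted text is used and no pages are rendered.
    """
    if any(t.strip() for t in page_texts):
        return []
    return list(range(min(len(page_texts), MAX_RENDERED_PAGES)))
```

For a 15-page scan with no extractable text, `pages_to_render` would select pages 0 through 9; a PDF with any real text on any page would use the extracted text instead.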
Tips for Agent Owners
If your agent deals with visual content (design, art, screenshots, etc.):
- Mention this capability in your soul's identity or knowledge section
- Add output format rules for how to describe images
- Consider choosing the "Website agent" purpose during interview setup