Video & Image Support
Your agent can understand images, YouTube videos, X/Twitter posts, Reddit posts, PDFs, and DOCX files directly in chat. Paste a URL or upload a file, and the agent handles it automatically.
Images
Visitors can send images to your agent by:
- Dragging and dropping into the chat
- Clicking the upload button
- Pasting from clipboard
Supported formats: PNG, JPG, WebP, GIF. Images are compressed client-side (max 1600px, WebP preferred) before sending.
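The 1600px limit amounts to an aspect-ratio-preserving downscale. The minimal Python sketch below assumes the cap applies to the longer side and that smaller images are never upscaled. The MIME-type set and the function names are hypothetical, and the WebP re-encoding step is not shown.

```python
# Supported formats from the docs, expressed as MIME types (the mapping is an assumption).
SUPPORTED_IMAGE_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}
MAX_DIMENSION = 1600  # assumed to cap the longer side, in pixels


def is_supported_image(mime_type: str) -> bool:
    """Return True if the upload's MIME type is one of the supported image formats."""
    return mime_type.lower() in SUPPORTED_IMAGE_TYPES


def fit_within(width: int, height: int, max_dim: int = MAX_DIMENSION) -> tuple:
    """Scale down so the longer side is at most max_dim, preserving aspect ratio.

    Images already within the limit are returned unchanged (never upscaled).
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    # Round to whole pixels, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under these assumptions, a 4000×3000 photo would be sent at 1600×1200, while an 800×600 image would pass through unchanged.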
YouTube Videos
When a visitor pastes a YouTube URL in chat, the agent analyzes the video itself using Google's Gemini model, so no transcription is needed.
How it works:
- Visitor sends a message containing a YouTube URL
- The URL is detected automatically
- The message is routed to Gemini 2.5 Flash (native video understanding)
- The agent responds based on the video content
Note: Web search and @mentions are disabled for video messages.
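The detection and routing steps above can be sketched as a URL match followed by a routing decision. This is a minimal Python sketch, not the product's implementation: the regular expression, the `Route` fields, and the `"default"` model name are hypothetical, and `"gemini-2.5-flash"` simply stands in for Gemini 2.5 Flash.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern covering common YouTube URL shapes:
# youtube.com/watch?v=ID, youtube.com/shorts/ID, and youtu.be/ID.
YOUTUBE_URL = re.compile(
    r"https?://(?:www\.|m\.)?"
    r"(?:youtube\.com/(?:watch\?(?:\S*&)?v=|shorts/)|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)


@dataclass
class Route:
    model: str                # hypothetical model identifiers
    video_id: Optional[str]   # None when no YouTube URL was found
    web_search_allowed: bool  # False means this rule blocks web search
    mentions_allowed: bool    # False means this rule blocks @mentions


def route_message(text: str) -> Route:
    """Route messages containing a YouTube URL to a video-capable model,
    with web search and @mentions turned off for them."""
    match = YOUTUBE_URL.search(text)
    if match:
        return Route("gemini-2.5-flash", match.group(1), False, False)
    # No video URL: keep the agent's usual model and leave tools to normal settings.
    return Route("default", None, True, True)
```

A real detector would likely need to handle more URL variants (for example embed links); the pattern here only covers the three most common forms.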
X/Twitter Posts
When a visitor pastes an X/Twitter post URL, the agent fetches the post content automatically:
- Video posts — the video is extracted and sent to Gemini for understanding
- Photo posts — the image is sent to the vision model
- Text-only posts — the post text is injected as context
No API key required. Works with public posts only.
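Handling an X/Twitter link comes down to recognizing a post URL and then choosing a path based on the post's media. In this hedged Python sketch the URL pattern, the handler labels, and the shape of the fetched `post` dict are all hypothetical, and checking video before photos before text is an assumed precedence. Fetching the post itself is out of scope.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for post URLs on x.com or twitter.com.
STATUS_URL = re.compile(
    r"https?://(?:www\.|mobile\.)?(?:x|twitter)\.com/([A-Za-z0-9_]{1,15})/status/(\d+)"
)


def parse_status_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (handle, post_id) for an X/Twitter post URL, or None."""
    match = STATUS_URL.search(url)
    return (match.group(1), match.group(2)) if match else None


def plan_for_post(post: dict) -> Tuple[str, object]:
    """Pick a handler for an already-fetched public post.

    `post` is a hypothetical dict with optional "video_url", "photo_urls",
    and "text" keys. Checks run video first, then photos, then text only.
    """
    if post.get("video_url"):
        return ("video_model", post["video_url"])
    if post.get("photo_urls"):
        return ("vision_model", post["photo_urls"])
    return ("context", post.get("text", ""))
```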
Reddit Posts
When a visitor pastes a Reddit post URL, the agent fetches and processes the post:
- Image posts — the image is sent to the vision model
- Video posts — the hosted video is sent to Gemini
- Gallery posts — up to 4 images from the gallery are processed
- Text/self posts — the post text is injected as context
- External link posts — the post title and URL are included as context
Works with public subreddits only.
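The Reddit cases map to a similar dispatch, with the gallery cap applied by slicing. This Python sketch assumes a simplified, hypothetical `post` dict (not Reddit's real JSON schema) and assumes the first four gallery images are the ones kept.

```python
from typing import Tuple

MAX_GALLERY_IMAGES = 4  # from the docs: up to 4 gallery images are processed


def plan_for_reddit_post(post: dict) -> Tuple[str, object]:
    """Choose how to handle an already-fetched public Reddit post.

    The keys used here ("kind", "image_url", "video_url", "gallery_urls",
    "selftext", "title", "url") are illustrative, not Reddit's API schema.
    """
    kind = post.get("kind")
    if kind == "image":
        return ("vision_model", [post["image_url"]])
    if kind == "video":
        return ("video_model", post["video_url"])
    if kind == "gallery":
        # Keep at most MAX_GALLERY_IMAGES; taking the first ones is an assumption.
        return ("vision_model", post["gallery_urls"][:MAX_GALLERY_IMAGES])
    if kind == "self":
        return ("context", post.get("selftext", ""))
    # External link post: include the title and link as context.
    return ("context", f"{post.get('title', '')}\n{post.get('url', '')}")
```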
PDFs & Documents
Visitors can also upload PDFs and DOCX files:
- PDFs: Text is extracted automatically. Image-only PDFs (such as scanned documents) are instead rendered as page images, up to 10 pages.
- DOCX: Text is extracted via the mammoth library.
- Maximum upload size: 25MB
- Text limit: extracted text is truncated at 100K characters
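The document limits above can be expressed as a few checks. This Python sketch is illustrative only: it assumes 25MB means 25 MiB, treats a PDF with no extractable text as image-only (a heuristic, not a documented rule), and leaves the actual text extraction and page rendering out. The function names are hypothetical.

```python
from typing import Tuple

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # "25MB"; reading it as MiB is an assumption
MAX_TEXT_CHARS = 100_000             # "100K characters"
MAX_RENDERED_PAGES = 10              # page cap for image-only PDFs


def check_upload_size(size_bytes: int) -> None:
    """Raise ValueError for files above the upload limit."""
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")


def truncate_text(text: str) -> str:
    """Cap extracted text at the character limit."""
    return text[:MAX_TEXT_CHARS]


def pdf_strategy(extracted_text: str, page_count: int) -> Tuple[str, int]:
    """Choose text extraction or page rendering for a PDF.

    Returns ("text", 0) when text was found, otherwise
    ("render", number_of_pages_to_render).
    """
    if extracted_text.strip():
        return ("text", 0)
    # Assumed heuristic: no extractable text means a scanned, image-only PDF.
    return ("render", min(page_count, MAX_RENDERED_PAGES))
```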
Tips for Agent Owners
If your agent deals with visual content (design, art, screenshots, etc.):
- Mention this capability in your soul's identity or knowledge section
- Add output format rules for how to describe images
- Consider choosing the "Website agent" purpose during interview setup