Video & Image Support

Updated May 7, 2026

Your agent can understand images, YouTube videos, X/Twitter posts, Reddit posts, and uploaded PDF and DOCX files directly in chat. Visitors paste a URL or upload a file, and the agent handles it automatically.

Images

Visitors can send images to your agent by:

  • Dragging and dropping into the chat
  • Clicking the upload button
  • Pasting from clipboard

Supported formats: PNG, JPG, WebP, GIF. Images are compressed client-side (max 1600px, WebP preferred) before sending.
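
As a rough illustration of the resize rule, the sketch below checks a file's extension and computes downscaled dimensions. It assumes the 1600px cap applies to the longer side and that aspect ratio is preserved; neither detail is stated above, and the function names are hypothetical rather than part of the product.

```python
from pathlib import PurePosixPath

# Formats listed above; ".jpeg" is included as the common alternate JPG extension.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MAX_SIDE_PX = 1600  # assumption: the 1600px cap applies to the longer side


def is_supported_image(filename: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def fit_within(width: int, height: int, max_side: int = MAX_SIDE_PX) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    Images already within the limit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under these assumptions, a 3200x2400 photo would be sent at 1600x1200, while an 800x600 screenshot would keep its original size.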

YouTube Videos

When a visitor pastes a YouTube URL in chat, the agent watches and understands the video using Google's Gemini model — no transcription needed.

How it works:

  1. Visitor sends a message containing a YouTube URL
  2. The URL is detected automatically
  3. The message is routed to Gemini 2.5 Flash (native video understanding)
  4. The agent responds based on the video content

Note: Web search and @mentions are disabled for video messages.
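
The detection and routing steps can be pictured with a small sketch. The URL pattern, the `Route` structure, and the `"default"` model label are hypothetical stand-ins, and the non-video defaults are assumed; the actual detector and routing logic are not documented here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern covering common YouTube URL shapes; the real detector may differ.
YOUTUBE_URL = re.compile(
    r"https?://(?:www\.|m\.)?(?:youtube\.com/(?:watch\?\S*?v=|shorts/)|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)


@dataclass
class Route:
    model: str               # which model handles the message
    video_id: Optional[str]  # YouTube video ID, if one was found
    web_search: bool         # whether web search stays enabled
    mentions: bool           # whether @mentions stay enabled


def route_message(text: str) -> Route:
    """Route to the video model when a YouTube URL is present, else to a default."""
    match = YOUTUBE_URL.search(text)
    if match:
        # Per the note above, web search and @mentions are off for video messages.
        return Route("gemini-2.5-flash", match.group(1), web_search=False, mentions=False)
    # Assumption: both features are available for ordinary messages.
    return Route("default", None, web_search=True, mentions=True)
```

For example, `route_message("what is this? https://youtu.be/dQw4w9WgXcQ")` would select the video route with both features turned off.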

X/Twitter Posts

When a visitor pastes an X/Twitter post URL, the agent fetches the post content automatically:

  • Video posts — the video is extracted and sent to Gemini for understanding
  • Photo posts — the image is sent to the vision model
  • Text-only posts — the post text is injected as context

No API key required. Works with public posts only.
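
A minimal sketch of how a post URL might be recognized and mapped to one of the three paths above. The regular expression, the function names, and the handler labels are illustrative assumptions, as is giving video priority when a post has both video and photos.

```python
import re
from typing import Optional

# Hypothetical pattern for public post URLs on x.com / twitter.com.
X_POST_URL = re.compile(
    r"https?://(?:www\.|mobile\.)?(?:x|twitter)\.com/(\w+)/status/(\d+)"
)


def parse_x_post_url(text: str) -> Optional[tuple[str, str]]:
    """Return (username, post_id) for the first X/Twitter post URL in text, else None."""
    match = X_POST_URL.search(text)
    return (match.group(1), match.group(2)) if match else None


def plan_x_post(has_video: bool, has_photo: bool) -> str:
    """Pick a handling path; video takes priority over photo (an assumption)."""
    if has_video:
        return "gemini_video"   # video is extracted and sent to Gemini
    if has_photo:
        return "vision_model"   # image is sent to the vision model
    return "text_context"       # post text is injected as context
```

Profile or home-page links such as `https://x.com/home` do not match the pattern, so they would be treated as ordinary text in this sketch.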

Reddit Posts

When a visitor pastes a Reddit post URL, the agent fetches and processes the post:

  • Image posts — the image is sent to the vision model
  • Video posts — the hosted video is sent to Gemini
  • Gallery posts — up to 4 images from the gallery are processed
  • Text/self posts — the post text is injected as context
  • External link posts — the post title and URL are included as context

Works with public subreddits only.
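
The per-type handling above amounts to a dispatch on post kind. The sketch below uses a simplified, hypothetical `RedditPost` shape (not Reddit's actual API schema) and assumes the first four gallery images are the ones kept; the text only says "up to 4".

```python
from dataclasses import dataclass, field

MAX_GALLERY_IMAGES = 4  # from the text: up to 4 gallery images are processed


@dataclass
class RedditPost:
    # Hypothetical, simplified shape for illustration only.
    kind: str  # "image", "video", "gallery", "self", or "link"
    title: str = ""
    url: str = ""
    text: str = ""
    media_urls: list[str] = field(default_factory=list)


def plan_reddit_post(post: RedditPost) -> dict:
    """Map a post to a processing target plus the media or context to pass along."""
    if post.kind == "image":
        return {"target": "vision_model", "media": post.media_urls[:1]}
    if post.kind == "video":
        return {"target": "gemini_video", "media": post.media_urls[:1]}
    if post.kind == "gallery":
        # Assumption: keep the first images in gallery order, up to the cap.
        return {"target": "vision_model", "media": post.media_urls[:MAX_GALLERY_IMAGES]}
    if post.kind == "self":
        return {"target": "text_context", "context": post.text}
    if post.kind == "link":
        return {"target": "text_context", "context": f"{post.title}\n{post.url}"}
    raise ValueError(f"unknown post kind: {post.kind!r}")
```

With this sketch, a gallery of six images yields a plan containing only four of them, and an external link post contributes just its title and URL as context.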

PDFs & Documents

Visitors can also upload PDFs and DOCX files:

  • PDFs: Text is extracted automatically. Image-only PDFs (scanned documents) have their pages rendered as images (up to 10 pages).
  • DOCX: Text is extracted via the mammoth library.
  • Max upload size: 25MB
  • Text truncation: extracted text is cut off at 100K characters
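
The limits above can be expressed as a few small checks. This is a sketch under stated assumptions: 25MB is read as 25 MiB, a PDF counts as image-only when no text could be extracted, and all function names are hypothetical.

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: 25MB interpreted as 25 MiB
MAX_TEXT_CHARS = 100_000             # 100K character cap on extracted text
MAX_RENDERED_PAGES = 10              # page cap for image-only PDFs


def check_upload_size(size_bytes: int) -> bool:
    """Return True if a non-empty upload is within the size limit."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES


def truncate_text(text: str, limit: int = MAX_TEXT_CHARS) -> str:
    """Keep at most `limit` characters of extracted text."""
    return text[:limit]


def pages_to_render(page_count: int, extracted_text: str) -> int:
    """Pages to render as images: none if text was extracted, else up to the cap."""
    if extracted_text.strip():
        return 0
    return min(page_count, MAX_RENDERED_PAGES)
```

Under these assumptions, a 30-page scanned PDF with no extractable text would have its first 10 pages rendered, while a text PDF of any length would skip rendering and have its text cut at 100,000 characters.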

Tips for Agent Owners

If your agent deals with visual content (design, art, screenshots, etc.):

  • Mention this capability in your soul's identity or knowledge section
  • Add output format rules for how to describe images
  • Consider choosing the "Website agent" purpose during interview setup