Introduction to Valossa AI
Valossa AI is a suite of AI-powered video intelligence technologies that create deep scene metadata by analyzing speech, sounds, and visual content in video files.
What Valossa AI Does
Valossa AI processes video files and produces structured, time-coded metadata in JSON format. The analysis engine examines both the visual and auditory content of your videos using a broad set of machine learning models and algorithms.
With Valossa AI, you can:
- Index and search video scenes by their content
- Generate transcripts with time-coded tags for key semantics
- Identify people including celebrities and custom-trained faces
- Detect objects, scenes, and actions in the visual content
- Detect sensitive content (sexual, violent, or disturbing material) for compliance and moderation
- Extract emotions and sentiment from faces and speech
- Classify videos using IAB Content Taxonomy categories for ad placement
- Detect audio events such as music, applause, and environmental sounds
- Recognize text via OCR in video frames
- Generate automatic video summaries and promotional clips
Metadata Formats
Valossa AI produces several metadata output formats, each serving a specific purpose:
| Metadata Type | Current Version | Description |
|---|---|---|
| Core | 1.8.1 | The primary metadata format containing all detections, detection groupings, segmentations, and time-coded data |
| frames_faces | 1.0.5 | Per-frame face bounding box coordinates for all detected faces |
| seconds_objects | 1.0.3 | Per-second object bounding box coordinates (e.g., logos) |
| frames_objects | 1.0.0 | Per-frame object bounding box coordinates |
| visual_captions | 1.2.0 | Visual scene description metadata |
The Core metadata is the default and most comprehensive format. The other formats provide specialized data (such as spatial coordinates) that would increase the Core file size significantly if included inline.
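As a rough illustration of how the time-coded Core metadata might be consumed once downloaded, the Python sketch below walks the detections and prints their occurrence times. The field names used here (detections, label, occs, ss, se) are assumptions made for the example only; see Core Concepts for the actual schema.

```python
import json

# Load a previously downloaded Core metadata file
# (file name, structure, and field names assumed for illustration).
with open("core_metadata.json", "r", encoding="utf-8") as f:
    metadata = json.load(f)

# Walk the detections and print each label with its time-coded occurrences.
for detection_id, detection in metadata.get("detections", {}).items():
    label = detection.get("label", detection_id)
    for occ in detection.get("occs", []):
        start = occ.get("ss")  # occurrence start time in seconds (assumed field name)
        end = occ.get("se")    # occurrence end time in seconds (assumed field name)
        print(f"{label}: {start}s - {end}s")
```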
How to Access Valossa AI
There are four primary ways to use Valossa AI:
Valossa Assistant (Conversational AI)
Valossa Assistant is a conversational AI for video question answering -- ask questions about your videos, search transcripts, extract clips, and generate reports using natural language. No code required.
Valossa Portal (Web Interface)
The Valossa Portal provides a graphical interface for uploading videos, viewing analysis reports, searching within videos, and managing your account. It is the fastest way to start exploring Valossa AI capabilities.
Valossa Core API (Programmatic Access)
The Valossa Core API is a REST API that lets you integrate video analysis directly into your applications. Submit videos for analysis, poll for status, and download structured metadata results -- all via standard HTTPS requests with JSON responses.
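A minimal sketch of that submit/poll/download flow in Python is shown below. The endpoint paths (new_job, job_status, job_results), payload fields, and status values are illustrative assumptions, not the definitive interface; the API Reference documents the actual requests.

```python
import time
import requests

API_BASE = "https://api.valossa.com/core/1.0"  # assumed base URL; check the API Reference
API_KEY = "YOUR_API_KEY"

# 1. Submit a video for analysis (endpoint and payload names are assumptions).
new_job = requests.post(
    f"{API_BASE}/new_job",
    json={
        "api_key": API_KEY,
        "media": {"video": {"url": "https://example.com/video.mp4"}},
    },
).json()
job_id = new_job["job_id"]

# 2. Poll until the analysis finishes (status values assumed for illustration).
while True:
    status = requests.get(
        f"{API_BASE}/job_status",
        params={"api_key": API_KEY, "job_id": job_id},
    ).json()
    if status["status"] in ("finished", "error"):
        break
    time.sleep(60)  # analysis can take a while; poll sparingly

# 3. Download the Core metadata as JSON.
if status["status"] == "finished":
    metadata = requests.get(
        f"{API_BASE}/job_results",
        params={"api_key": API_KEY, "job_id": job_id},
    ).json()
    print(sorted(metadata.keys()))
```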
Docker (On-Premises Deployment)
For organizations that require on-premises processing, Valossa AI can be deployed as a Docker-based installation running on your own infrastructure.
What You Can Build
Valossa AI metadata enables a wide range of applications:
- Content moderation pipelines -- Automatically flag videos containing violence, nudity, substance use, or other sensitive content before publication.
- Contextual advertising platforms -- Match ads to video content using IAB Content Taxonomy categories and Ad Score values.
- Video search engines -- Build searchable indexes across large video libraries using detections, transcripts, and topic tags.
- Automated captioning workflows -- Generate SRT/VTT subtitle files from speech-to-text analysis results (see the sketch after this list).
- Media asset management -- Enrich video assets with structured metadata for cataloging and discovery.
- Sentiment and emotion dashboards -- Track facial expressions and speech sentiment over time for audience research.
- Highlight reel generators -- Use detection data and shot boundaries to automatically create video summaries.
- Brand monitoring tools -- Detect logos and brand mentions across visual and audio channels.
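As an illustration of the captioning workflow mentioned above, the sketch below converts time-coded speech segments into an SRT file. The segment structure used here is hypothetical and is not the actual speech-to-text output schema.

```python
# Hypothetical time-coded speech segments; the real speech-to-text output
# structure is documented in the metadata reference.
segments = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to the show."},
    {"start": 2.5, "end": 6.0, "text": "Today we talk about video intelligence."},
]

def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:00:02,500."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

# Write a standard SRT file: index, timestamp range, caption text, blank line.
with open("captions.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(segments, start=1):
        f.write(f"{i}\n")
        f.write(f"{to_srt_timestamp(seg['start'])} --> {to_srt_timestamp(seg['end'])}\n")
        f.write(f"{seg['text']}\n\n")
```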
What Next?
- Quickstart Guide -- Analyze your first video in under 15 minutes
- Authentication -- Set up your API key
- Core Concepts -- Understand the data model
- API Reference -- Full endpoint documentation