Metadata Overview
Valossa Metadata is the structured output of a video analysis job. It contains all detected concepts, time-coded occurrences, detection groupings, shot segmentations, and other information extracted from your video.
Metadata Types
Valossa Metadata is an umbrella term covering several distinct metadata types:
| Type | Description | Format |
|---|---|---|
| Core | The primary and most comprehensive metadata. Contains all detections, detection groupings, segmentations, and temporal data. | JSON |
| frames_faces | Per-frame face bounding box coordinates for all detected faces. | JSON |
| seconds_objects | Per-second object bounding box coordinates (currently for logos). | JSON |
| frames_objects | Per-frame object bounding box coordinates. | JSON |
| visual_captions | Visual scene description metadata. | JSON |
| speech_to_text_srt | Speech-to-text transcript in SRT subtitle format. | SRT text |
The Core metadata is always the starting point. It contains the detection IDs that link to the spatial data in the specialized metadata files (frames_faces, seconds_objects, etc.).
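As an illustration, the sketch below collects the face detection IDs from a locally saved Core metadata file; those same IDs identify the per-frame bounding boxes in the frames_faces file. The file name is a placeholder, and the `detections` map and `t` (detection type) key are assumptions based on the Core JSON structure, so verify them against the JSON Structure page.

```python
import json

# Placeholder file name for a locally saved Core metadata file.
with open("core_metadata.json") as f:
    core = json.load(f)

# Collect the IDs of all face detections from the Core metadata.
# "detections" and "t" are assumed key names -- see the JSON Structure page.
face_ids = [
    det_id
    for det_id, det in core.get("detections", {}).items()
    if det.get("t") == "human.face"
]

print(f"Core metadata lists {len(face_ids)} face detections")
# These same detection IDs identify the corresponding per-frame bounding
# boxes in the frames_faces metadata file.
```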
How to Download Metadata
Use the `job_results` API endpoint with the `type` parameter:
```bash
# Core metadata (default)
curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID"

# Face bounding boxes
curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID&type=frames_faces"

# Object bounding boxes (seconds-based)
curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID&type=seconds_objects"

# Speech-to-text SRT
curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID&type=speech_to_text_srt"
```
Metadata is also downloadable from the Valossa Portal results page.
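The same endpoint can also be called from a script. The following Python sketch uses the `requests` library to download a chosen metadata type and save it to a local file; the endpoint URL and the `api_key`, `job_id`, and `type` parameters come from the examples above, while the function name and file names are illustrative.

```python
import requests

API_URL = "https://api-eu.valossa.com/core/1.0/job_results"
API_KEY = "YOUR_API_KEY"   # replace with your API key
JOB_ID = "JOB_ID"          # replace with the ID of your analysis job

def download_metadata(metadata_type=None, out_path="metadata.json"):
    """Fetch one metadata type for a finished job and save it locally."""
    params = {"api_key": API_KEY, "job_id": JOB_ID}
    if metadata_type is not None:          # omit "type" to get Core metadata
        params["type"] = metadata_type
    resp = requests.get(API_URL, params=params)
    resp.raise_for_status()
    with open(out_path, "wb") as f:        # store as-is (JSON or SRT text)
        f.write(resp.content)
    return out_path

download_metadata()                                       # Core metadata
download_metadata("frames_faces", "frames_faces.json")    # face bounding boxes
download_metadata("speech_to_text_srt", "speech.srt")     # SRT transcript
```

Saving the response straight to disk also covers the storage recommendation below.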
Versioning
Each metadata type has its own version number in x.y.z format:
- x (major) -- Breaking changes to the structure
- y (minor) -- New fields or features that may require parser updates
- z (patch) -- Purely additive changes that will not break existing parsers
The version number is found in the `version_info.metadata_format` field of each metadata JSON file. Core metadata versioning is independent of frames_faces or seconds_objects versioning.
For a complete version history, see the Changelog.
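A parser can read this field to guard against format changes it does not support. The sketch below assumes `version_info.metadata_format` is a dotted x.y.z string in the Core JSON and treats only a changed major number as breaking, per the scheme above.

```python
import json

SUPPORTED_MAJOR = 1   # example: the Core major version this parser targets

with open("core_metadata.json") as f:
    core = json.load(f)

# version_info.metadata_format holds the x.y.z version of this file's format.
version = core["version_info"]["metadata_format"]
major = int(version.split(".")[0])

if major != SUPPORTED_MAJOR:
    raise RuntimeError(
        f"Unsupported Core metadata format {version}; "
        f"this parser targets {SUPPORTED_MAJOR}.y.z"
    )
# Minor bumps may need parser updates; patch bumps are purely additive.
print(f"Core metadata format {version} accepted")
```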
Storage Recommendations
- Save metadata locally. Download and store the metadata JSON in your database or file system after each analysis job completes. Metadata may not be stored permanently on Valossa servers, and download count limits may be imposed in the future.
- File sizes vary from a few kilobytes to several megabytes depending on video length and the number of detections.
- JSON is compact. The API returns JSON without line breaks or indentation. For human-readable viewing, use a pretty-printing tool:
  - Browser: the JSONView plugin
  - Command line: `cat metadata.json | python -m json.tool > pretty.json`
  - Command line: `jq . metadata.json > pretty.json`
What Questions Does Metadata Answer?
Valossa Core metadata addresses four fundamental questions:
- What does the video contain? -- Browse detections by type (visual objects, faces, audio events, speech, topics).
- When does X appear in the video? -- Read the `occs` (occurrences) of a specific detection.
- What is happening at time X? -- Read the `by_second` array at the desired time index.
- What is the video about overall? -- Read the `topic.iab`, `topic.general`, and `topic.genre` detections.
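In practice these questions translate into a few lookups in the Core JSON. The sketch below assumes the structure referenced above: a `detections` map keyed by detection ID, `by_detection_type` and `by_second` groupings under `detection_groupings`, and `ss`/`se` start and end seconds inside each occurrence. Treat the exact keys as assumptions and confirm them against the JSON Structure page.

```python
import json

with open("core_metadata.json") as f:
    core = json.load(f)

detections = core["detections"]
groupings = core["detection_groupings"]

# What does the video contain? List detection counts per type.
for det_type, det_ids in groupings["by_detection_type"].items():
    print(det_type, "->", len(det_ids), "detections")

# When does X appear? Read the occurrences of one detection.
some_id = next(iter(detections))
for occ in detections[some_id].get("occs", []):
    print(f"detection {some_id}: {occ['ss']}s to {occ['se']}s")

# What is happening at time X? Index by_second at that second.
second = 42
if second < len(groupings["by_second"]):
    for entry in groupings["by_second"][second]:
        det = detections[entry["d"]]       # "d" = detection ID (assumed key)
        print(f"at {second}s:", det.get("label"))

# What is the video about overall? Read the topic detections.
for det_id in groupings["by_detection_type"].get("topic.iab", []):
    print("IAB topic:", detections[det_id].get("label"))
```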
Next Steps
- JSON Structure -- Detailed breakdown of the Core metadata format
- Detection Types -- All 36+ detection types explained
- Occurrences -- How temporal data is represented