Valossa Metadata Reader
Valossa Metadata Reader is an open-source Python command-line tool for parsing and exploring Valossa Core Metadata JSON files. It provides quick extraction of detections, categories, occurrences, and summaries, and can generate visualizations, all without writing custom code.
Installation
The tool is available on GitHub:
git clone https://github.com/valossalabs/metadata-reader.git
cd metadata-reader
pip install --user .
Optional: Install matplotlib (version 1.5.3+) if you want to use the plot mode for generating charts:
pip install matplotlib
Verify the installation:
python -m metareader --help
Getting Started
To explore metadata with the tool, you need a Valossa Core Metadata JSON file. You can obtain one by running a video analysis job via the API:
# Download core metadata for a finished job
curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=YOUR_JOB_ID" \
-o core_metadata.json
Or use the example file included in the repository:
python -m metareader list-detections metadata_example.json
This will print all detected concepts in the video — faces, objects, speech segments, topics, audio events, and more.
Modes of Operation
The Metadata Reader provides seven modes for extracting different views of the metadata.
list-detections
Lists all detections without examining the temporal by_second structure. Shows detection type, label, confidence, and concept identifiers.
python -m metareader list-detections core_metadata.json
python -m metareader list-detections --type "visual.context.*" core_metadata.json
python -m metareader list-detections --format csv core_metadata.json
Useful for: Getting an overview of everything detected in a video — visual objects, audio events, faces, speech, topics.
See also: Video Tagging Guide, Content Moderation Guide, Detection Types Reference
list-detections-by-second
Examines the by_second temporal structure to show what is detected at each second of the video, including per-second confidence values and sentiment data.
python -m metareader list-detections-by-second core_metadata.json
python -m metareader list-detections-by-second --type "human.face" core_metadata.json
python -m metareader list-detections-by-second --start 30 --end 60 core_metadata.json
Useful for: Understanding the temporal distribution of detections, tracking face emotions over time, analyzing per-second activity.
See also: Emotion Analysis Guide, Video Tagging Guide, Segmentation & Shots Reference
list-categories
Lists detection category tags with duration metrics. Categories provide higher-level groupings of detections (e.g., content compliance tags, IAB topic categories).
python -m metareader list-categories core_metadata.json
python -m metareader list-categories --format csv core_metadata.json
Useful for: Reviewing content compliance flags, extracting IAB topic categories, understanding the high-level content profile of a video.
See also: Content Moderation Guide, IAB Ad Suitability Guide, Detection Categories Reference
list-occurrences
Lists all time-coded occurrences for one or multiple detections, showing start/end times, shot indices, and confidence values.
python -m metareader list-occurrences core_metadata.json
python -m metareader list-occurrences --type "audio.speech" core_metadata.json
python -m metareader list-occurrences --person "John" core_metadata.json
Useful for: Finding when specific detections appear in the video, extracting temporal segments, building timelines.
See also: Occurrences Reference, Face Recognition Guide, Speech-to-Text Guide
metadata-info
Displays information about the metadata file itself: job details, media properties (duration, resolution, FPS), metadata version, and processing parameters.
python -m metareader metadata-info core_metadata.json
Useful for: Quick verification that you have the right file and understanding the technical properties of the analyzed video.
See also: Core Concepts, JSON Structure Reference
summary
Creates an aggregated summary based on total detection occurrence time. Shows each detection's screentime as a percentage of total video duration.
python -m metareader summary core_metadata.json
python -m metareader summary --type "visual.context.*" core_metadata.json
python -m metareader summary --format csv core_metadata.json
Useful for: Getting a quick overview of the most prominent content in a video, ranked by screentime; a natural fit for tagging and categorization workflows.
See also: Video Tagging Guide, IAB Ad Suitability Guide
plot
Generates matplotlib visualizations: sentiment/valence graphs, intensity plots, and horizontal bar charts. Requires matplotlib.
# Facial sentiment over time
python -m metareader plot --sentiment core_metadata.json
# Detection frequency bar chart
python -m metareader plot --barh core_metadata.json
# Intensity graph
python -m metareader plot --intensity core_metadata.json
Useful for: Visualizing emotional arcs in video (face valence over time), comparing detection frequencies, creating charts for presentations or reports.
See also: Emotion Analysis Guide, Face Recognition Guide, Sentiment & Emotion Reference
Output Formats
Most modes support multiple output formats via the --format flag:
| Format | Flag | Description |
|---|---|---|
| Free text | --format free (default) | Human-readable, terminal-width-aware table format |
| CSV | --format csv | Comma-separated values for import into spreadsheets or data pipelines |
| JSON | --format json | Structured JSON output for programmatic consumption |
| SRT | --format srt | SubRip subtitle format with hh:mm:ss,mmm timestamps (applicable to speech/temporal modes) |
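In practice, the machine-readable formats are usually redirected to a file. For example, combining only the flags documented above (SRT output assumes a speech/temporal mode and speech detections in the file):
# Export the screentime summary as CSV for spreadsheets
python -m metareader summary --format csv core_metadata.json > summary.csv
# Write speech detections as SubRip subtitles
python -m metareader list-detections-by-second --type "audio.speech" --format srt core_metadata.json > speech.srt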
Metadata Type Coverage
The Metadata Reader operates on Valossa Core Metadata (the default type=core output from job_results). Here's what each mode reads:
| Mode | Data Source | Key Detection Types |
|---|---|---|
| list-detections | detections object | All types (visual.context, audio.context, human.face, audio.speech, topic.*, etc.) |
| list-detections-by-second | detection_groupings.by_second | All types with per-second entries |
| list-categories | Detection categ attribute | Detections with category tags |
| list-occurrences | Detection occs arrays | All types with occurrence data |
| metadata-info | Top-level job/media objects | N/A (file metadata) |
| summary | Aggregated from occs | All types with occurrence data |
| plot | by_second, occs, and a.sen sentiment values | human.face (sentiment), all types (bar charts) |
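Before writing your own parsing code against these structures, it can help to confirm they are present in a given file. A minimal Python sketch, assuming only the object names listed in the table above:
import json

# Load a Core Metadata file and check which structures the modes read from.
with open("core_metadata.json") as fh:
    meta = json.load(fh)

print("top-level objects:", sorted(meta))  # job/media objects appear here
print("detections:", len(meta.get("detections", {})))
print("seconds indexed in by_second:",
      len(meta.get("detection_groupings", {}).get("by_second", [])))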
What the Tool Does Not Cover
The Metadata Reader focuses on Core Metadata. The following require custom code or separate tools:
| Data Type | Why Not Covered | Alternative |
|---|---|---|
| Visual scene descriptions (visual_captions) | Separate metadata file, not part of Core metadata | See Scene Descriptions Guide for parsing code |
| Face bounding boxes (frames_faces) | Spatial data requiring per-frame coordinate processing | See Faces & Identity Reference |
| Object bounding boxes (seconds_objects, frames_objects) | Spatial data | See Localized Objects Reference |
| Custom filtering logic | Business-specific rules beyond built-in filters | See Code Examples |
| Multi-file analysis | Tool processes one file at a time | Write custom scripts for batch processing |
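For the multi-file case, a small wrapper script around the CLI is usually enough. A minimal sketch, assuming a hypothetical metadata/ directory of Core Metadata files and using only the documented summary mode and --format flag:
import pathlib
import subprocess
import sys

# Run the summary mode over every JSON file in a directory,
# writing one CSV file per input file.
for path in sorted(pathlib.Path("metadata").glob("*.json")):
    out_path = path.with_suffix(".summary.csv")
    with open(out_path, "w") as out:
        subprocess.run(
            [sys.executable, "-m", "metareader", "summary",
             "--format", "csv", str(path)],
            stdout=out,
            check=True,
        )
    print("wrote", out_path)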
Planned Enhancements
Two new modes are planned for upcoming releases:
- list-scene-descriptions — Read visual_captions JSON files and output time-coded natural-language scene descriptions in text, CSV, and SRT formats. An optional --combine-speech flag interleaves scene descriptions with the speech transcript from Core metadata.
- list-face-expressions — Extract per-face expression timelines from Core metadata, showing valence and named facial expressions for each detected face as time-coded occurrences. Supports --face-id filtering and --identified-only for recognized faces.
Source Code as Reference
The Metadata Reader source code is a useful reference for understanding how to navigate the metadata JSON structure. The tool is organized into three modules:
| Module | Purpose |
|---|---|
| mdreader.py | Data access layer — MetadataReader class with query methods for detections, occurrences, categories, sentiment, and summaries. Wraps CoreMetadata for efficient indexed access. |
| mdprinter.py | Output formatting — abstract MetadataPrinter base class with MetadataCSVPrinter, MetadataFreePrinter, MetadataSubtitlePrinter, and MetadataJSONPrinter implementations. |
| mdplotter.py | Visualization — MetadataPlotter class using matplotlib for sentiment graphs, intensity plots, and bar charts. |
Review the source to learn:
- How to iterate over detections using by_detection_type groupings
- How to read face identity matches from the similar_to attribute
- How to extract time-coded data from by_second
- How to access occurrence data from occs arrays
- How to read sentiment/valence from a.sen.val
- How to extract named emotions from a.sen.emo
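To make that list concrete, here is a minimal navigation sketch in plain Python. The attribute names are the ones listed above; the surrounding layout (detection IDs as dictionary keys, the per-second entry fields) is an assumption to verify against your own file or the module sources:
import json

with open("core_metadata.json") as fh:
    meta = json.load(fh)

detections = meta["detections"]
groupings = meta["detection_groupings"]

# Iterate over detections via the by_detection_type groupings
# (assumed layout: type name -> list of detection IDs).
for det_type, det_ids in groupings["by_detection_type"].items():
    for det_id in det_ids:
        det = detections[det_id]

        # Occurrence data lives in the occs arrays; inspect one entry to
        # confirm the start/end field names in your metadata version.
        print(det_type, det.get("label"), "occurrences:", len(det.get("occs", [])))

        # Face identity matches from the similar_to attribute, when present.
        for match in det.get("a", {}).get("similar_to", []):
            print("  recognized as:", match.get("name"))

# Per-second data, including sentiment where present
# (assumed entry layout: "a" holds attributes such as sen).
for second, entries in enumerate(groupings["by_second"]):
    for entry in entries:
        sen = entry.get("a", {}).get("sen")
        if sen:
            print(f"t={second}s valence={sen.get('val')} emotions={sen.get('emo')}")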
Related Resources
Guides:
- Content Moderation — Detect and filter unsafe content
- Video Tagging — Extract structured tags and labels
- Face Recognition — Detect and identify faces
- Speech-to-Text — Extract spoken word transcripts
- IAB Ad Suitability — Content categorization for advertising
- Emotion Analysis — Face valence, named emotions, voice emotion
- Scene Descriptions — Natural language scene descriptions
Metadata Reference:
- Metadata Overview — All metadata types
- JSON Structure — Core metadata format
- Detection Types — All 38 detection types
- Code Examples — Custom parser examples