Content Moderation Guide

Available on Transcribe Pro Vision MAX

Content moderation (visual, audio, and speech compliance detection) is included in the Transcribe Pro Vision MAX free trial.

This guide walks you through using Valossa AI to automatically detect inappropriate or sensitive content in videos. Content moderation covers visual, speech-based, and audio-based detections for use in brand safety, platform compliance, and age-rating workflows.

Quick Exploration with Metadata Reader

Before writing custom code, you can quickly inspect compliance detections using the Metadata Reader CLI tool:

# List all content compliance category tags
python -m metareader list-categories core_metadata.json

# List all detections and filter by compliance type
python -m metareader list-detections --type "audio.keyword.compliance" core_metadata.json

Overview

Valossa AI detects sensitive content through three channels:

Channel | What It Detects | Key Detection Types
Visual | Nudity, violence, weapons, substance use, etc. | visual.context with content_compliance category tag
Speech | Profanity, references to violence/substances | audio.keyword.compliance
Audio sounds | Gunshots, explosions, violent sounds | audio.context with content_compliance category tag
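For orientation, a single detection entry in the results JSON looks roughly like the sketch below. The field names ("t", "label", "categ", "occs", "ss", "se") are the ones used by the code in this guide; the values are illustrative, not real analysis output:

```python
# Illustrative shape of one compliance detection (sample values, not API output).
# "t" is the detection type, "categ"/"tags" carry category tags, and "occs"
# lists occurrences with start ("ss") and end ("se") times in seconds.
detection = {
    "t": "visual.context",
    "label": "gun",
    "categ": {"tags": ["content_compliance", "gun_weapon"]},
    "occs": [{"ss": 12.4, "se": 15.0}],
}

# The content_compliance tag is what marks a detection as sensitive:
is_sensitive = "content_compliance" in detection["categ"]["tags"]
print(is_sensitive)  # → True
```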

Step 1: Analyze Your Video

Submit the video for analysis:

import requests

response = requests.post(
    "https://api-eu.valossa.com/core/1.0/new_job",
    json={
        "api_key": "YOUR_API_KEY",
        "media": {
            "video": {"url": "https://example.com/video.mp4"},
            "language": "en-US"
        }
    }
)
job_id = response.json()["job_id"]

Wait for completion, then download the results:

import time

while True:
    status = requests.get(
        "https://api-eu.valossa.com/core/1.0/job_status",
        params={"api_key": "YOUR_API_KEY", "job_id": job_id}
    ).json()
    if status["status"] == "finished":
        break
    time.sleep(10)

metadata = requests.get(
    "https://api-eu.valossa.com/core/1.0/job_results",
    params={"api_key": "YOUR_API_KEY", "job_id": job_id}
).json()
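The bare loop above polls forever and assumes the job always succeeds. A minimal sketch of a more defensive version is shown below; the status-fetching call is injected as a callable so the logic is testable without HTTP, and the "error" failure value is an assumption to verify against the API reference:

```python
import time

def wait_for_job(fetch_status, poll_s=10, timeout_s=3600):
    """Poll until the job finishes, fails, or the timeout expires.

    fetch_status is any callable returning the job-status dict, e.g. a
    lambda wrapping the requests.get call from the loop above.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()["status"]
        if status == "finished":
            return True
        if status == "error":  # assumed failure value; check the API docs
            raise RuntimeError("analysis job failed")
        time.sleep(poll_s)
    raise TimeoutError("job did not finish in time")
```

With the real API you would call `wait_for_job(lambda: requests.get(...).json())`.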

Step 2: Find All Content Compliance Detections

The content_compliance category tag is present on all sensitive detections. Filter for it across all detection types:

compliance_issues = []

for det_id, detection in metadata["detections"].items():
    if "categ" in detection:
        tags = detection["categ"]["tags"]
        if "content_compliance" in tags:
            compliance_issues.append({
                "id": det_id,
                "type": detection["t"],
                "label": detection["label"],
                "tags": tags,
                "occurrences": detection.get("occs", [])
            })

print(f"Found {len(compliance_issues)} content compliance detections")

Step 3: Filter by Specific Sensitive Categories

For more granular control, filter by specific sensitive tags:

# Define which categories to flag
flagged_categories = {
    "sexual",
    "violence",
    "act_of_violence",
    "substance_use",
    "gun_weapon",
    "explicit_content",
    "bad_language"
}

severe_issues = []
for issue in compliance_issues:
    matched_tags = set(issue["tags"]) & flagged_categories
    if matched_tags:
        severe_issues.append({
            **issue,
            "matched_categories": list(matched_tags)
        })

for issue in severe_issues:
    print(f"[{issue['type']}] {issue['label']}")
    print(f"  Categories: {', '.join(issue['matched_categories'])}")
    for occ in issue["occurrences"]:
        print(f"  Time: {occ['ss']:.1f}s - {occ['se']:.1f}s")
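To get a quick sense of which sensitive categories dominate the video, you can tally the matched categories across all flagged issues. This is a small local helper, not part of the Valossa API:

```python
from collections import Counter

def category_summary(issues):
    """Count how many flagged issues matched each sensitive category."""
    counts = Counter()
    for issue in issues:
        counts.update(issue["matched_categories"])
    return counts

# Example with two hypothetical issues (real input: severe_issues from above):
summary = category_summary([
    {"matched_categories": ["violence", "gun_weapon"]},
    {"matched_categories": ["violence"]},
])
print(summary.most_common())  # → [('violence', 2), ('gun_weapon', 1)]
```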

Step 4: Check Speech-Based Compliance

Speech compliance detections use a separate detection type:

speech_compliance_ids = metadata["detection_groupings"]["by_detection_type"].get(
    "audio.keyword.compliance", []
)

for det_id in speech_compliance_ids:
    detection = metadata["detections"][det_id]
    print(f"Speech compliance: '{detection['label']}'")
    for occ in detection.get("occs", []):
        print(f"  Time: {occ['ss']:.1f}s - {occ['se']:.1f}s")

Step 5: Check Audio Context for Sensitive Sounds

Some audio events (explosions, gunshots) may be flagged:

audio_ids = metadata["detection_groupings"]["by_detection_type"].get("audio.context", [])

for det_id in audio_ids:
    detection = metadata["detections"][det_id]
    if "categ" in detection and "content_compliance" in detection["categ"]["tags"]:
        print(f"Sensitive audio: '{detection['label']}'")
        for occ in detection.get("occs", []):
            print(f"  Time: {occ['ss']:.1f}s - {occ['se']:.1f}s")

Step 6: Generate a Compliance Report

Combine all findings into a structured report:

import json

report = {
    "job_id": metadata["job_info"]["job_id"],
    "duration_s": metadata["media_info"]["technical"]["duration_s"],
    "visual_flags": [],
    "speech_flags": [],
    "audio_flags": [],
    "is_safe": True
}

for issue in compliance_issues:
    entry = {
        "label": issue["label"],
        "categories": issue["tags"],
        "time_segments": [
            {"start": o["ss"], "end": o["se"]}
            for o in issue["occurrences"]
        ]
    }

    if issue["type"] == "visual.context":
        report["visual_flags"].append(entry)
    elif issue["type"].startswith("audio.keyword"):
        report["speech_flags"].append(entry)
    elif issue["type"] == "audio.context":
        report["audio_flags"].append(entry)

    # Any compliance detection at all means the video needs review
    report["is_safe"] = False

print(json.dumps(report, indent=2))
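A useful extra metric for such a report is how much of the runtime is actually flagged. Since occurrences from different detections can overlap, merge the time segments before summing; this is a generic interval-merge helper, not a Valossa API call:

```python
def flagged_seconds(segments):
    """Total seconds covered by {"start", "end"} segments, overlaps counted once."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted((s["start"], s["end"]) for s in segments):
        if current_end is None or start > current_end:
            # New disjoint interval: close out the previous one
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping/adjacent interval: extend the current one
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

# With the report above you would gather every flag["time_segments"] entry
# from visual_flags, speech_flags, and audio_flags; sample data here:
sample = [{"start": 0.0, "end": 5.0}, {"start": 3.0, "end": 8.0},
          {"start": 10.0, "end": 12.0}]
print(flagged_seconds(sample))  # → 10.0
```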

Using Shot Boundaries for Clipping

When you need to extract the full shot containing a flagged detection (for human review or automated removal):

shots = metadata["segmentations"]["detected_shots"]

for issue in compliance_issues:
    for occ in issue["occurrences"]:
        if "shs" not in occ:
            continue  # speech/audio occurrences may not carry shot indices
        shot_start = shots[occ["shs"]]
        shot_end = shots[occ.get("she", occ["shs"])]
        print(
            f"Flagged: '{issue['label']}' -> "
            f"Full shot clip: {shot_start['ss']:.2f}s - {shot_end['se']:.2f}s"
        )
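If the flagged shots should then be extracted for review, each shot span can be turned into an ffmpeg trim command. A sketch, with placeholder filenames; `-ss`/`-to` placed after `-i` give frame-accurate output seeking, and `-c copy` avoids re-encoding at the cost of cutting on keyframes:

```python
def clip_command(src, start_s, end_s, out):
    """Build an ffmpeg command that copies one shot span without re-encoding."""
    return f"ffmpeg -i {src} -ss {start_s:.2f} -to {end_s:.2f} -c copy {out}"

# One command per flagged shot; times come from shot_start["ss"]/shot_end["se"]
cmd = clip_command("video.mp4", 12.4, 15.0, "flag_001.mp4")
print(cmd)  # → ffmpeg -i video.mp4 -ss 12.40 -to 15.00 -c copy flag_001.mp4
```

For frame-accurate cuts you would drop `-c copy` and accept a re-encode.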