Detection Types
Every detection in Valossa Metadata has a t field containing its detection type identifier. Detection types follow a hierarchical, dot-separated naming convention in which the first segment (the prefix) indicates the source modality.
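As a minimal sketch of how the prefix convention can be used, the snippet below groups detections by the first dot-separated segment of their t field. It assumes the detections are already available as a list of dictionaries; the helper names (modality_of, group_by_modality) and the sample list are illustrative and not part of the Valossa API.

```python
from collections import defaultdict

def modality_of(detection_type):
    """Return the prefix before the first dot, e.g. 'visual' for 'visual.context'."""
    return detection_type.split(".", 1)[0]

def group_by_modality(detections):
    """Group detection dicts by the modality prefix of their 't' field."""
    groups = defaultdict(list)
    for det in detections:
        groups[modality_of(det["t"])].append(det)
    return dict(groups)

# Illustrative sample; labels are taken from the examples on this page.
sample = [
    {"t": "visual.context", "label": "hair"},
    {"t": "audio.context", "label": "exciting music"},
    {"t": "audio.speech_detailed", "label": "stay"},
    {"t": "highlight", "label": "action"},
]
grouped = group_by_modality(sample)
print(sorted(grouped))  # ['audio', 'highlight', 'visual']
```

A type without a dot, such as highlight, is treated as its own prefix.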
Detection Type Reference
Visual Detections (visual.*)
Detected from the visual content of video frames.
| Type | Description | Has occs | Has cid |
|---|---|---|---|
| visual.context | Broad visual concept detection (objects, scenes, actions, explicit content) | Yes | Yes |
| visual.object.localized | Visual objects with bounding box location data (currently logos). Requires seconds_objects or frames_objects metadata for coordinates. | Yes | Yes |
| visual.color | Dominant colors per second. Color values are in the by_second structure as RGB hex strings. | Single occurrence | No |
| visual.text_region.full_frame_analysis | OCR text detected from the full video frame | Yes | No |
| visual.text_region.lower_third | OCR text detected from the lower third of frames (typical subtitle area) | Yes | No |
| visual.text_region.middle_third | OCR text detected from the middle third of frames | Yes | No |
| visual.text_region.upper_third | OCR text detected from the upper third of frames | Yes | No |
| visual.text_region.keyword.compliance | Compliance-flagged keywords from OCR (e.g., profanity in visual text) | Yes | No |
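Since visual.color values are described as RGB hex strings, a small conversion helper can be useful when working with them. The sketch below assumes six hex digits per color, with or without a leading "#"; the exact string format and the layout of the by_second structure are not specified here, so the function only handles a single color string. The name hex_to_rgb is illustrative.

```python
def hex_to_rgb(value):
    """Convert an RGB hex string such as 'ff8000' into an (r, g, b) tuple of ints.

    Assumption: six hex digits, optionally prefixed with '#'.
    """
    digits = value.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {value!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("ff8000"))  # (255, 128, 0)
```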
Audio Detections (audio.*)
Detected from the audio track of the video.
| Type | Description | Has occs | Has cid |
|---|---|---|---|
| audio.context | Audio event detection (music, applause, laughter, environmental sounds) | Yes | Yes |
| audio.speech | Speech-to-text transcript segments (roughly corresponding to subtitle groupings) | Yes | No |
| audio.speech_detailed | Individual words with precise timestamps, confidence scores, and speaker diarization | Yes | No |
| audio.speech_detailed.stats | Statistics about the detailed speech analysis | No | No |
| audio.speech_summary | AI-generated summary of the speech content | No | No |
| audio.speech_summary.keyword | Keywords extracted from the speech summary | No | No |
| audio.voice_emotion | Voice emotion data (valence and arousal from voice prosodics). Data is in by_second. | Yes | No |
| audio.keyword.compliance | Compliance-flagged words from speech (profanity, substance references, etc.) | Yes | No |
| audio.keyword.novelty_word | Noteworthy or distinguishing keywords from speech | Yes | No |
| audio.keyword.name.person | Person names mentioned in speech | Yes | No |
| audio.keyword.name.location | Location names mentioned in speech | Yes | No |
| audio.keyword.name.organization | Organization names mentioned in speech | Yes | No |
| audio.keyword.name.general | Other named entities mentioned in speech | Yes | No |
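The four audio.keyword.name.* types share a prefix, so the named entities mentioned in speech can be collected by entity kind with a simple prefix match. This is a sketch that assumes a list of detection dictionaries with t and label fields; names_by_kind and the sample labels are illustrative.

```python
NAME_PREFIX = "audio.keyword.name."

def names_by_kind(detections):
    """Map entity kind ('person', 'location', ...) to the labels found in speech."""
    result = {}
    for det in detections:
        det_type = det.get("t", "")
        if det_type.startswith(NAME_PREFIX):
            kind = det_type[len(NAME_PREFIX):]
            result.setdefault(kind, []).append(det["label"])
    return result

# Illustrative sample, not real analysis output.
sample = [
    {"t": "audio.keyword.name.person", "label": "Jane Doe"},
    {"t": "audio.keyword.name.location", "label": "Chillsbury Hills"},
    {"t": "audio.keyword.novelty_word", "label": "budget"},
]
print(names_by_kind(sample))
# {'person': ['Jane Doe'], 'location': ['Chillsbury Hills']}
```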
Human Detections (human.*)
Human-centered detections, currently face-related only.
| Type | Description | Has occs | Has cid |
|---|---|---|---|
| human.face | Detected face with optional identity matching, gender, and screen time | Yes | No |
| human.face_group | Group of faces with temporal correlation (likely interacting people) | Yes | No |
Transcript Detections (transcript.*)
Derived from a user-provided pre-existing SRT transcript (not from automatic speech-to-text).
| Type | Description | Has occs | Has cid |
|---|---|---|---|
| transcript.keyword.compliance | Compliance-flagged words from the provided transcript | Yes | No |
| transcript.keyword.novelty_word | Noteworthy keywords from the provided transcript | Yes | No |
| transcript.keyword.name.person | Person names from the provided transcript | Yes | No |
| transcript.keyword.name.location | Location names from the provided transcript | Yes | No |
| transcript.keyword.name.organization | Organization names from the provided transcript | Yes | No |
| transcript.keyword.name.general | Other named entities from the provided transcript | Yes | No |
Topic Detections (topic.*)
Video-level or section-level topic classifications.
| Type | Description | Has occs | Has cid |
|---|---|---|---|
| topic.iab | IAB Content Taxonomy categories for the entire video | No | No |
| topic.iab.section | IAB categories for time-based sub-sections of the video (with optional Ad Score) | Yes | No |
| topic.general | Non-IAB topic categories for the entire video | No | No |
| topic.genre | Genre classification of the video | No | No |
| topic.iab.audio | (Deprecated) Audio-based IAB categories | No | No |
| topic.iab.transcript | (Deprecated) Transcript-based IAB categories | No | No |
| topic.iab.visual | (Deprecated) Visual-based IAB categories | No | No |
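Because three of the topic types are marked deprecated, a consumer may want to skip them when reading topics. The following sketch filters them out by type name; the set contents come from the table above, while the function name current_topics and the sample labels are illustrative assumptions.

```python
DEPRECATED_TOPIC_TYPES = {
    "topic.iab.audio",
    "topic.iab.transcript",
    "topic.iab.visual",
}

def current_topics(detections):
    """Return topic.* detections, excluding the deprecated per-modality IAB types."""
    return [
        det for det in detections
        if det.get("t", "").startswith("topic.")
        and det["t"] not in DEPRECATED_TOPIC_TYPES
    ]

# Illustrative sample; the genre label is a placeholder.
sample = [
    {"t": "topic.iab", "label": "Personal Finance"},
    {"t": "topic.iab.visual", "label": "Personal Finance"},
    {"t": "topic.genre", "label": "example genre"},
    {"t": "visual.context", "label": "hair"},
]
print([det["t"] for det in current_topics(sample)])  # ['topic.iab', 'topic.genre']
```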
External Detections (external.*)
Derived from user-provided title and description text (submitted in the new_job request).
| Type | Description | Has occs | Has cid |
|---|---|---|---|
| external.keyword.novelty_word | Noteworthy keywords from the video title/description | No | No |
| external.keyword.name.person | Person names from the video title/description | No | No |
| external.keyword.name.location | Location names from the video title/description | No | No |
| external.keyword.name.organization | Organization names from the video title/description | No | No |
| external.keyword.name.general | Other named entities from the video title/description | No | No |
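Types marked "No" in the Has occs column, such as the external.* keywords, carry no time-coded occurrences, so code that reads occs should not assume the field is present. The sketch below assumes such detections simply omit the occs key (an assumption, not confirmed here) and falls back to an empty list; occurrences and first_seen are illustrative helper names.

```python
def occurrences(det):
    """Return the detection's occurrences, or an empty list if it has none."""
    return det.get("occs", [])

def first_seen(det):
    """Return the earliest start time in seconds, or None for detections without occs."""
    occs = occurrences(det)
    return min(occ["ss"] for occ in occs) if occs else None

# Illustrative sample: one detection without occs, one with.
title_name = {"t": "external.keyword.name.person", "label": "Jane Doe"}
spoken_place = {
    "t": "transcript.keyword.name.location",
    "label": "Chillsbury Hills",
    "occs": [{"ss": 109.075, "se": 110.975, "id": "460"}],
}
print(first_seen(title_name))    # None
print(first_seen(spoken_place))  # 109.075
```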
Highlight Detections (highlight)
Automatically identified highlight segments of the video.
| Type | Description | Has occs | Has cid |
|---|---|---|---|
| highlight | Highlight segments with a relevance score. Labels indicate the highlight category (e.g., "action"). | Yes | No |
Each occurrence includes a scr (score) field from 0.0 to 1.0, indicating how significant or interesting the segment is. See Occurrences for details.
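To illustrate how the scr field might be used, the sketch below collects highlight occurrences at or above a score threshold and orders them from most to least significant. The input is assumed to be a list of detection dictionaries; top_highlights, the default threshold, and the sample times and scores are illustrative, not real output.

```python
def top_highlights(detections, min_score=0.5):
    """Return (score, start, end, label) tuples for highlight occurrences, best first."""
    segments = []
    for det in detections:
        if det.get("t") != "highlight":
            continue
        for occ in det.get("occs", []):
            score = occ.get("scr")
            if score is not None and score >= min_score:
                segments.append((score, occ["ss"], occ["se"], det["label"]))
    return sorted(segments, reverse=True)

# Illustrative sample values.
sample = [
    {
        "t": "highlight",
        "label": "action",
        "occs": [
            {"id": "1", "ss": 10.0, "se": 14.0, "scr": 0.9},
            {"id": "2", "ss": 30.0, "se": 33.0, "scr": 0.3},
        ],
    }
]
print(top_highlights(sample))  # [(0.9, 10.0, 14.0, 'action')]
```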
Explicit Content (explicit_content.*) -- Deprecated
| Type | Description |
|---|---|
| explicit_content.audio.offensive | (Deprecated) Offensive audio content. Use audio.keyword.compliance instead. |
| explicit_content.transcript.offensive | (Deprecated) Offensive transcript content. Use transcript.keyword.compliance instead. |
Explicit visual content detections are now part of visual.context and are identified using category tags such as content_compliance, sexual, violence, etc.
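A sketch of the category-tag approach: select visual.context detections whose categ.tags list contains any of the tags of interest. The tag names come from the sentence above and the categ structure follows the visual.context JSON example on this page; the helper name and the second sample detection are illustrative.

```python
def visual_detections_with_tags(detections, wanted_tags):
    """Return visual.context detections that carry at least one of the wanted tags."""
    wanted = set(wanted_tags)
    return [
        det for det in detections
        if det.get("t") == "visual.context"
        and wanted & set(det.get("categ", {}).get("tags", []))
    ]

# Illustrative sample; the second label is a placeholder.
sample = [
    {"t": "visual.context", "label": "hair", "categ": {"tags": ["human_features"]}},
    {"t": "visual.context", "label": "example concept",
     "categ": {"tags": ["content_compliance", "violence"]}},
    {"t": "visual.context", "label": "untagged concept"},
]
flagged = visual_detections_with_tags(sample, ["content_compliance", "sexual", "violence"])
print([det["label"] for det in flagged])  # ['example concept']
```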
Understanding Novelty Words
The term "novelty word" appears in several detection types (e.g., audio.keyword.novelty_word). A novelty word is a keyword or phrase detected as particularly relevant or distinguishing in the content. The term is established in NLP and separates these content-descriptive keywords from proper names (person, location, organization), which have their own detection types.
JSON Examples
visual.context Detection
{
  "t": "visual.context",
  "label": "hair",
  "cid": "lC4vVLdd5huQ",
  "ext_refs": {
    "wikidata": { "id": "Q28472" },
    "gkg": { "id": "/m/03q69" }
  },
  "categ": { "tags": ["human_features"] },
  "occs": [
    { "id": "267", "ss": 60.227, "se": 66.191, "c_max": 0.804, "c_med": 0.75, "shs": 47, "she": 48 }
  ]
}
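As a rough illustration of reading the occs array, the sketch below summarizes a detection's occurrences: how many there are, the total time covered (se minus ss, in seconds), and the highest c_max value. The function name and the returned keys are illustrative; the input is the visual.context example above, trimmed to the fields used.

```python
def summarize_occurrences(det):
    """Return occurrence count, total covered seconds, and peak c_max for a detection."""
    occs = det.get("occs", [])
    total = sum(occ["se"] - occ["ss"] for occ in occs)
    confidences = [occ["c_max"] for occ in occs if "c_max" in occ]
    return {
        "count": len(occs),
        "total_seconds": round(total, 3),
        "peak_confidence": max(confidences) if confidences else None,
    }

hair = {
    "t": "visual.context",
    "label": "hair",
    "occs": [
        {"id": "267", "ss": 60.227, "se": 66.191, "c_max": 0.804, "c_med": 0.75}
    ],
}
print(summarize_occurrences(hair))
# {'count': 1, 'total_seconds': 5.964, 'peak_confidence': 0.804}
```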
human.face Detection
{
  "t": "human.face",
  "label": "face",
  "a": {
    "gender": { "c": 0.929, "value": "female" },
    "s_visible": 4.4,
    "similar_to": [
      {
        "c": 0.928,
        "name": "Jane Doe",
        "gallery": { "id": "a3ead7b4-8e84-43ac-9e6b-d1727b05f189" },
        "gallery_face": { "id": "f6a728c6-5991-47da-9c17-b5302bfd0aff", "name": "Jane Doe" }
      }
    ]
  },
  "occs": [
    { "id": "123", "ss": 28.333, "se": 33.567, "shs": 8, "she": 9 }
  ]
}
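Identity matching is optional, so a face detection may have no similar_to entries. The sketch below picks the highest-confidence match from a.similar_to when one exists and otherwise returns None. The helper name best_identity and the min_confidence parameter are illustrative; the sample is the example above, trimmed to the fields used.

```python
def best_identity(face, min_confidence=0.0):
    """Return the similar_to entry with the highest confidence, or None if there is none."""
    candidates = face.get("a", {}).get("similar_to", [])
    candidates = [c for c in candidates if c.get("c", 0.0) >= min_confidence]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["c"])

face = {
    "t": "human.face",
    "label": "face",
    "a": {
        "gender": {"c": 0.929, "value": "female"},
        "s_visible": 4.4,
        "similar_to": [{"c": 0.928, "name": "Jane Doe"}],
    },
}
match = best_identity(face)
print(match["name"] if match else "unidentified")  # Jane Doe
```

Raising min_confidence trades recall for precision; with a threshold of 0.95 the example face would be reported as unidentified.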
audio.context Detection
{
  "t": "audio.context",
  "label": "exciting music",
  "cid": "o7WLKO1GuL5r",
  "ext_refs": {
    "gkg": { "id": "/t/dd00035" }
  },
  "occs": [
    { "id": "8", "ss": 15.0, "se": 49.0, "shs": 14, "she": 29, "c_max": 0.979 }
  ]
}
audio.speech_detailed Detection
{
  "t": "audio.speech_detailed",
  "label": "stay",
  "c": 0.59,
  "a": { "s": { "id": "14" } },
  "occs": [
    { "id": "341", "ss": 44.32, "se": 44.4, "fs": 1064, "fe": 1066, "sdur": 0.08, "shs": 28, "she": 28 }
  ]
}
The a.s.id field contains the speaker ID for diarization purposes. Occurrences include fs/fe (frame start/end) and sdur (segment duration) for frame-accurate word timing.
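One way the speaker ID and word timestamps could be combined is to rebuild speaker turns: order the words by start time and merge consecutive words from the same speaker. This is a sketch over a list of audio.speech_detailed detections; the function name is illustrative, and apart from "stay" the sample words, times, and speaker "15" are made up for the example.

```python
def transcript_by_speaker(detections):
    """Return a list of (speaker_id, text) turns built from word-level detections."""
    words = []
    for det in detections:
        if det.get("t") != "audio.speech_detailed":
            continue
        speaker = det.get("a", {}).get("s", {}).get("id")
        for occ in det.get("occs", []):
            words.append((occ["ss"], speaker, det["label"]))
    words.sort(key=lambda word: word[0])  # chronological order

    turns = []
    for _, speaker, label in words:
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(label)  # same speaker keeps talking
        else:
            turns.append((speaker, [label]))
    return [(speaker, " ".join(labels)) for speaker, labels in turns]

# Illustrative sample; only "stay" comes from the example above.
sample = [
    {"t": "audio.speech_detailed", "label": "stay", "a": {"s": {"id": "14"}},
     "occs": [{"id": "341", "ss": 44.32, "se": 44.4}]},
    {"t": "audio.speech_detailed", "label": "okay", "a": {"s": {"id": "15"}},
     "occs": [{"id": "342", "ss": 45.0, "se": 45.3}]},
    {"t": "audio.speech_detailed", "label": "please", "a": {"s": {"id": "14"}},
     "occs": [{"id": "340", "ss": 43.9, "se": 44.2}]},
]
print(transcript_by_speaker(sample))  # [('14', 'please stay'), ('15', 'okay')]
```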
topic.iab Detection
{
  "t": "topic.iab",
  "label": "Personal Finance",
  "ext_refs": {
    "iab": {
      "labels_hierarchy": ["Personal Finance"],
      "id": "IAB13"
    }
  }
}
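The IAB identifier and category path live under ext_refs.iab, as in the example above. A minimal sketch for pulling them out of a list of detections follows; the function name and the " > " separator used to join labels_hierarchy are illustrative choices.

```python
def iab_categories(detections):
    """Return (iab_id, hierarchy_path) pairs for video-level topic.iab detections."""
    categories = []
    for det in detections:
        if det.get("t") != "topic.iab":
            continue
        iab = det.get("ext_refs", {}).get("iab")
        if iab is None:
            continue  # no IAB reference attached to this detection
        categories.append((iab["id"], " > ".join(iab["labels_hierarchy"])))
    return categories

sample = [
    {
        "t": "topic.iab",
        "label": "Personal Finance",
        "ext_refs": {"iab": {"labels_hierarchy": ["Personal Finance"], "id": "IAB13"}},
    }
]
print(iab_categories(sample))  # [('IAB13', 'Personal Finance')]
```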
Keyword Detection
{
  "t": "transcript.keyword.name.location",
  "label": "Chillsbury Hills",
  "occs": [
    { "ss": 109.075, "se": 110.975, "id": "460" }
  ]
}
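To show how keyword occurrences might be turned into a readable timeline, the sketch below converts the ss and se values (seconds) into HH:MM:SS.mmm timestamps and lists each mention in chronological order. The helper names and the timestamp format are illustrative choices; the sample is the detection above.

```python
def to_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    total_s, ms = divmod(total_ms, 1000)
    minutes, s = divmod(total_s, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{s:02d}.{ms:03d}"

def keyword_mentions(detections):
    """Return (start, end, label) tuples for every keyword occurrence, earliest first."""
    mentions = []
    for det in detections:
        if ".keyword." not in det.get("t", ""):
            continue
        for occ in det.get("occs", []):
            mentions.append((occ["ss"], occ["se"], det["label"]))
    mentions.sort()
    return [(to_timestamp(ss), to_timestamp(se), label) for ss, se, label in mentions]

sample = [
    {
        "t": "transcript.keyword.name.location",
        "label": "Chillsbury Hills",
        "occs": [{"ss": 109.075, "se": 110.975, "id": "460"}],
    }
]
print(keyword_mentions(sample))
# [('00:01:49.075', '00:01:50.975', 'Chillsbury Hills')]
```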