Detection Types

Every detection in Valossa Metadata has a t field containing its detection type identifier. Detection types follow a hierarchical naming convention where the prefix indicates the source modality.
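This makes type-based filtering straightforward. Below is a minimal Python sketch; the metadata.json path and the exact location of the detections inside the metadata file are assumptions for illustration, not part of this reference:

import json

# Load a Valossa Metadata file (the path is hypothetical).
with open("metadata.json") as f:
    metadata = json.load(f)

# ASSUMPTION: detections are stored as a dict of detection objects;
# adjust the lookup to match the actual file structure.
detections = list(metadata["detections"].values())

# Select every visual detection by its type prefix.
visual = [d for d in detections if d["t"].startswith("visual.")]

# Or match one exact type.
faces = [d for d in detections if d["t"] == "human.face"]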

Detection Type Reference

Visual Detections (visual.*)

Detected from the visual content of video frames.

| Type | Description | Has occs | Has cid |
| --- | --- | --- | --- |
| visual.context | Broad visual concept detection (objects, scenes, actions, explicit content) | Yes | Yes |
| visual.object.localized | Visual objects with bounding-box location data (currently logos). Requires seconds_objects or frames_objects metadata for coordinates. | Yes | Yes |
| visual.color | Dominant colors per second. Color values are in the by_second structure as RGB hex strings. | Single occurrence | No |
| visual.text_region.full_frame_analysis | OCR text detected from the full video frame | Yes | No |
| visual.text_region.lower_third | OCR text detected from the lower third of frames (the typical subtitle area) | Yes | No |
| visual.text_region.middle_third | OCR text detected from the middle third of frames | Yes | No |
| visual.text_region.upper_third | OCR text detected from the upper third of frames | Yes | No |
| visual.text_region.keyword.compliance | Compliance-flagged keywords from OCR text (e.g., profanity in visual text) | Yes | No |

Audio Detections (audio.*)

Detected from the audio track of the video.

| Type | Description | Has occs | Has cid |
| --- | --- | --- | --- |
| audio.context | Audio event detection (music, applause, laughter, environmental sounds) | Yes | Yes |
| audio.speech | Speech-to-text transcript segments (roughly corresponding to subtitle groupings) | Yes | No |
| audio.speech_detailed | Individual words with precise timestamps, confidence scores, and speaker diarization | Yes | No |
| audio.speech_detailed.stats | Statistics about the detailed speech analysis | No | No |
| audio.speech_summary | AI-generated summary of the speech content | No | No |
| audio.speech_summary.keyword | Keywords extracted from the speech summary | No | No |
| audio.voice_emotion | Voice emotion data (valence and arousal from voice prosodics). Data is in by_second. | Yes | No |
| audio.keyword.compliance | Compliance-flagged words from speech (profanity, substance references, etc.) | Yes | No |
| audio.keyword.novelty_word | Noteworthy or distinguishing keywords from speech | Yes | No |
| audio.keyword.name.person | Person names mentioned in speech | Yes | No |
| audio.keyword.name.location | Location names mentioned in speech | Yes | No |
| audio.keyword.name.organization | Organization names mentioned in speech | Yes | No |
| audio.keyword.name.general | Other named entities mentioned in speech | Yes | No |

Human Detections (human.*)

Human-centered detections, currently face-related only.

| Type | Description | Has occs | Has cid |
| --- | --- | --- | --- |
| human.face | Detected face with optional identity matching, gender, and screen time | Yes | No |
| human.face_group | Group of faces with temporal correlation (likely interacting people) | Yes | No |

Transcript Detections (transcript.*)

Derived from a user-provided pre-existing SRT transcript (not from automatic speech-to-text).

| Type | Description | Has occs | Has cid |
| --- | --- | --- | --- |
| transcript.keyword.compliance | Compliance-flagged words from the provided transcript | Yes | No |
| transcript.keyword.novelty_word | Noteworthy keywords from the provided transcript | Yes | No |
| transcript.keyword.name.person | Person names from the provided transcript | Yes | No |
| transcript.keyword.name.location | Location names from the provided transcript | Yes | No |
| transcript.keyword.name.organization | Organization names from the provided transcript | Yes | No |
| transcript.keyword.name.general | Other named entities from the provided transcript | Yes | No |

Topic Detections (topic.*)

Video-level or section-level topic classifications.

| Type | Description | Has occs | Has cid |
| --- | --- | --- | --- |
| topic.iab | IAB Content Taxonomy categories for the entire video | No | No |
| topic.iab.section | IAB categories for time-based sub-sections of the video (with optional Ad Score) | Yes | No |
| topic.general | Non-IAB topic categories for the entire video | No | No |
| topic.genre | Genre classification of the video | No | No |
| topic.iab.audio | (Deprecated) Audio-based IAB categories | No | No |
| topic.iab.transcript | (Deprecated) Transcript-based IAB categories | No | No |
| topic.iab.visual | (Deprecated) Visual-based IAB categories | No | No |

External Detections (external.*)

Derived from user-provided title and description text (submitted in the new_job request).

| Type | Description | Has occs | Has cid |
| --- | --- | --- | --- |
| external.keyword.novelty_word | Noteworthy keywords from the video title/description | No | No |
| external.keyword.name.person | Person names from the video title/description | No | No |
| external.keyword.name.location | Location names from the video title/description | No | No |
| external.keyword.name.organization | Organization names from the video title/description | No | No |
| external.keyword.name.general | Other named entities from the video title/description | No | No |

Highlight Detections (highlight)

Automatically identified highlight segments of the video.

| Type | Description | Has occs | Has cid |
| --- | --- | --- | --- |
| highlight | Highlight segments with a relevance score. Labels indicate the highlight category (e.g., "action"). | Yes | No |

Each occurrence includes a scr (score) field from 0.0 to 1.0, indicating how significant or interesting the segment is. See Occurrences for details.
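As a sketch, selecting only the strongest highlight segments could look like this (detections as in the earlier snippet; the 0.8 threshold is an arbitrary example value):

def top_highlights(detections, min_score=0.8):
    # Yield (start, end, score) for highlight occurrences scoring at least min_score.
    for d in detections:
        if d["t"] != "highlight":
            continue
        for occ in d.get("occs", []):
            if occ.get("scr", 0.0) >= min_score:
                yield occ["ss"], occ["se"], occ["scr"]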

Explicit Content (explicit_content.*) -- Deprecated

| Type | Description |
| --- | --- |
| explicit_content.audio.offensive | (Deprecated) Offensive audio content. Use audio.keyword.compliance instead. |
| explicit_content.transcript.offensive | (Deprecated) Offensive transcript content. Use transcript.keyword.compliance instead. |

Explicit visual content detections are now part of visual.context and are identified by category tags such as content_compliance, sexual, and violence.

Understanding Novelty Words

The term "novelty word" appears in several detection types (e.g., audio.keyword.novelty_word). A novelty word is a keyword or phrase detected as particularly relevant or distinguishing in the content. This established NLP term distinguishes these content-descriptive keywords from proper names (person, location, organization) which have their own detection types.

JSON Examples

visual.context Detection

{
  "t": "visual.context",
  "label": "hair",
  "cid": "lC4vVLdd5huQ",
  "ext_refs": {
    "wikidata": { "id": "Q28472" },
    "gkg": { "id": "/m/03q69" }
  },
  "categ": { "tags": ["human_features"] },
  "occs": [
    { "id": "267", "ss": 60.227, "se": 66.191, "c_max": 0.804, "c_med": 0.75, "shs": 47, "she": 48 }
  ]
}
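Since each occurrence carries ss/se start and end times in seconds, a detection's total screen time can be summed directly from its occurrences; a small helper as a sketch:

def total_duration(detection):
    # Sum occurrence durations (in seconds) across the whole video.
    return sum(occ["se"] - occ["ss"] for occ in detection.get("occs", []))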

human.face Detection

{
  "t": "human.face",
  "label": "face",
  "a": {
    "gender": { "c": 0.929, "value": "female" },
    "s_visible": 4.4,
    "similar_to": [
      {
        "c": 0.928,
        "name": "Jane Doe",
        "gallery": { "id": "a3ead7b4-8e84-43ac-9e6b-d1727b05f189" },
        "gallery_face": { "id": "f6a728c6-5991-47da-9c17-b5302bfd0aff", "name": "Jane Doe" }
      }
    ]
  },
  "occs": [
    { "id": "123", "ss": 28.333, "se": 33.567, "shs": 8, "she": 9 }
  ]
}
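Identity matching is optional, so the similar_to list may be absent and code should treat it as such. A sketch of picking the most confident identity candidate (the 0.9 threshold is an arbitrary example):

def best_identity(face_detection, min_confidence=0.9):
    # Return the name of the highest-confidence identity match, or None.
    candidates = face_detection.get("a", {}).get("similar_to", [])
    matches = [m for m in candidates if m["c"] >= min_confidence]
    return max(matches, key=lambda m: m["c"])["name"] if matches else None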

audio.context Detection

{
  "t": "audio.context",
  "label": "exciting music",
  "cid": "o7WLKO1GuL5r",
  "ext_refs": {
    "gkg": { "id": "/t/dd00035" }
  },
  "occs": [
    { "id": "8", "ss": 15.0, "se": 49.0, "shs": 14, "she": 29, "c_max": 0.979 }
  ]
}

audio.speech_detailed Detection

{
  "t": "audio.speech_detailed",
  "label": "stay",
  "c": 0.59,
  "a": { "s": { "id": "14" } },
  "occs": [
    { "id": "341", "ss": 44.32, "se": 44.4, "fs": 1064, "fe": 1066, "sdur": 0.08, "shs": 28, "she": 28 }
  ]
}

The a.s.id field contains the speaker ID for diarization purposes. Occurrences include fs/fe (frame start/end) and sdur (segment duration) for frame-accurate word timing.
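Combining the word label, the occurrence timing, and the a.s.id speaker ID makes it possible to rebuild a per-speaker word stream; a sketch, reusing the detections list from the first snippet:

from collections import defaultdict

def words_by_speaker(detections):
    # Collect (start_time, word) pairs per speaker ID, in temporal order.
    speakers = defaultdict(list)
    for d in detections:
        if d["t"] != "audio.speech_detailed":
            continue
        speaker_id = d["a"]["s"]["id"]
        for occ in d.get("occs", []):
            speakers[speaker_id].append((occ["ss"], d["label"]))
    for words in speakers.values():
        words.sort()
    return dict(speakers)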

topic.iab Detection

{
  "t": "topic.iab",
  "label": "Personal Finance",
  "ext_refs": {
    "iab": {
      "labels_hierarchy": ["Personal Finance"],
      "id": "IAB13"
    }
  }
}
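The labels_hierarchy list holds the IAB category path (top-level category first, as the name suggests), so a human-readable path can be produced with a simple join:

def iab_path(detection):
    # e.g. ["Personal Finance"] -> "Personal Finance"; deeper paths join with " > ".
    return " > ".join(detection["ext_refs"]["iab"]["labels_hierarchy"])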

Keyword Detection

{
  "t": "transcript.keyword.name.location",
  "label": "Chillsbury Hills",
  "occs": [
    { "ss": 109.075, "se": 110.975, "id": "460" }
  ]
}