Occurrences

Occurrences represent the time segments during which a detection is present in the video. They are stored in the occs array of each detection.

Occurrence Fields

Field	Type	Description
`id`	string	Unique occurrence ID within this metadata file
`ss`	float	Start second -- seconds from the beginning of the video
`se`	float	End second -- seconds from the beginning of the video
`c_max`	float	Maximum confidence during this occurrence (for `visual.context` and `audio.context`)
`c_med`	float	Median confidence during this occurrence (for `visual.context`). Complements `c_max` by showing the typical confidence rather than the peak.
`c`	float	Confidence for the entire occurrence (for `topic.iab.section`)
`shs`	integer	Shot start -- 0-based index into the `detected_shots` array in `segmentations`
`she`	integer	Shot end -- 0-based index into the `detected_shots` array
`fs`	integer	Frame start -- 0-based frame number (for `audio.speech_detailed` and `visual.text_region.keyword.compliance`)
`fe`	integer	Frame end -- 0-based frame number
`sdur`	float	Segment duration in seconds (for `audio.speech_detailed`). Equals `se - ss`.
`a`	object	Per-occurrence attributes (for `topic.iab.section`). See Type-Specific Occurrence Fields below.
`scr`	float	Score for the occurrence (for `highlight` detections). Ranges from 0.0 to 1.0.

Example

A visual.context detection for "umbrella" that appears twice in the video:

{
  "t": "visual.context",
  "label": "umbrella",
  "cid": "abc123",
  "occs": [
    {
      "id": "45",
      "ss": 0.3,
      "se": 3.6,
      "c_max": 0.912,
      "c_med": 0.87,
      "shs": 0,
      "she": 1
    },
    {
      "id": "46",
      "ss": 64.4,
      "se": 68.2,
      "c_max": 0.876,
      "c_med": 0.801,
      "shs": 22,
      "she": 23
    }
  ]
}

This means:

First appearance: from 0.3s to 3.6s, with peak confidence 0.912 (median 0.87), spanning shots 0-1
Second appearance: from 64.4s to 68.2s, with peak confidence 0.876 (median 0.801), spanning shots 22-23

Confidence Behavior

Confidence values differ by detection type:

visual.context and audio.context: Occurrences have c_max (the maximum confidence observed at any frame or second during the occurrence) and c_med (the median confidence). Because confidence varies from moment to moment within an occurrence, the occurrence reports these summary statistics instead of a single value. For per-second confidence, use the c field in the by_second structure.
topic.iab.section: Occurrences have c (a single confidence for the entire occurrence).
human.face: Occurrences do not have a confidence field. Face confidence is only available in the similar_to items (gallery match confidence).
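The rules above can be folded into one lookup so that downstream code does not need to remember which field each type uses. This is a minimal sketch: `occurrence_confidence` is a hypothetical helper, it reports the peak (`c_max`) rather than the median as the summary value, and returning `None` for every type not listed above is an assumption.

```python
from typing import Optional


def occurrence_confidence(detection_type: str, occ: dict) -> Optional[float]:
    """Return one summary confidence for an occurrence, or None if it has none.

    - visual.context / audio.context: peak confidence in "c_max"
    - topic.iab.section: whole-occurrence confidence in "c"
    - human.face and any other type: no occurrence-level confidence
    """
    if detection_type in ("visual.context", "audio.context"):
        return occ.get("c_max")
    if detection_type == "topic.iab.section":
        return occ.get("c")
    # human.face occurrences carry no confidence; the gallery-match
    # confidence lives in the detection's similar_to items instead.
    return None


umbrella_occ = {"id": "45", "ss": 0.3, "se": 3.6, "c_max": 0.912, "c_med": 0.87}
print(occurrence_confidence("visual.context", umbrella_occ))  # 0.912
print(occurrence_confidence("human.face", {"id": "7", "ss": 1.0, "se": 2.0}))  # None
```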

Type-Specific Occurrence Fields

Some detection types include additional fields on their occurrences beyond the common set.

audio.speech_detailed

Word-level speech occurrences include frame-level timing and duration:

{
  "id": "1064",
  "ss": 7.262,
  "se": 7.482,
  "fs": 174,
  "fe": 179,
  "sdur": 0.22,
  "shs": 5,
  "she": 5
}

Field	Description
`fs`	Start frame number (0-based)
`fe`	End frame number (0-based)
`sdur`	Segment duration in seconds
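Because `sdur` is documented as equal to `se - ss`, a consumer can cross-check it or fall back on the subtraction when it is missing. A minimal sketch; `word_duration` and `sdur_consistent` are hypothetical helper names, and the tolerance is an arbitrary choice to absorb floating-point rounding.

```python
import math

# Word-level occurrence copied from the audio.speech_detailed example above
word_occ = {"id": "1064", "ss": 7.262, "se": 7.482, "fs": 174, "fe": 179,
            "sdur": 0.22, "shs": 5, "she": 5}


def word_duration(occ: dict) -> float:
    """Duration in seconds: sdur if present, otherwise se - ss."""
    return occ.get("sdur", occ["se"] - occ["ss"])


def sdur_consistent(occ: dict, tol: float = 1e-6) -> bool:
    """True if sdur matches se - ss within a small floating-point tolerance."""
    return math.isclose(occ["sdur"], occ["se"] - occ["ss"], abs_tol=tol)


print(word_duration(word_occ))
print(sdur_consistent(word_occ))  # True
```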

visual.text_region.keyword.compliance

OCR compliance keyword occurrences also include frame references:

{
  "id": "902",
  "ss": 9.009,
  "se": 13.513,
  "fs": 216,
  "fe": 323,
  "shs": 6,
  "she": 7
}
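The frame rate is not stated in this section, but the ratio `fs / ss` gives a rough estimate that can be used to map other frame numbers back to seconds (for the example above, 216 / 9.009 is about 23.976). This is a sketch under that assumption only: `estimate_fps` and `frame_to_seconds` are hypothetical helpers, and an estimate from a single occurrence is approximate.

```python
compliance_occ = {"id": "902", "ss": 9.009, "se": 13.513, "fs": 216, "fe": 323,
                  "shs": 6, "she": 7}


def estimate_fps(occ: dict) -> float:
    """Rough frame-rate estimate from the start frame and start second.

    Only an approximation; undefined for occurrences starting at 0 s.
    """
    if occ["ss"] <= 0:
        raise ValueError("cannot estimate fps from an occurrence starting at 0 s")
    return occ["fs"] / occ["ss"]


def frame_to_seconds(frame: int, fps: float) -> float:
    """Convert a 0-based frame number to seconds at the given frame rate."""
    return frame / fps


fps = estimate_fps(compliance_occ)
print(round(fps, 3))  # 23.976
print(f"frame {compliance_occ['fe']} is at about {frame_to_seconds(compliance_occ['fe'], fps):.3f}s")
```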

topic.iab.section

IAB section occurrences include per-occurrence attributes indicating the source modalities and optional ad score:

{
  "id": "1687",
  "ss": 7.262,
  "se": 31.642,
  "c": 0.779,
  "a": {
    "sources": ["speech"],
    "ad_score": 0.85
  },
  "shs": 5,
  "she": 10
}

Field	Description
`a.sources`	Array of modalities that contributed to this classification. Values: `"speech"`, `"visual"`, `"ocr"`
`a.ad_score`	Optional. Ad suitability score for this section (0.0 to 1.0)
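A common use of these attributes is to keep only sections backed by a particular modality or above an ad-suitability threshold. A minimal sketch; `matching_sections` is a hypothetical helper, and since `ad_score` is optional, treating occurrences without it as failing any minimum is a design choice, not documented behavior.

```python
def matching_sections(occs, required_source=None, min_ad_score=None):
    """Filter topic.iab.section occurrences by source modality and ad score."""
    result = []
    for occ in occs:
        attrs = occ.get("a", {})
        # Skip sections not supported by the requested modality
        if required_source is not None and required_source not in attrs.get("sources", []):
            continue
        if min_ad_score is not None:
            score = attrs.get("ad_score")  # optional field
            if score is None or score < min_ad_score:
                continue
        result.append(occ)
    return result


iab_occs = [
    {"id": "1687", "ss": 7.262, "se": 31.642, "c": 0.779,
     "a": {"sources": ["speech"], "ad_score": 0.85}, "shs": 5, "she": 10},
]
print([o["id"] for o in matching_sections(iab_occs, "speech", 0.8)])  # ['1687']
print([o["id"] for o in matching_sections(iab_occs, "visual")])       # []
```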

highlight

Highlight occurrences include a score indicating the significance of the segment:

{
  "id": "12932",
  "ss": 3.545,
  "se": 43.377,
  "shs": 3,
  "she": 51,
  "scr": 1.0
}

Field	Description
`scr`	Highlight score (0.0 to 1.0), indicating how significant or interesting the segment is
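To build a highlight reel, one option is to rank occurrences by `scr` and keep the best few. A minimal sketch; `top_highlights` is a hypothetical helper, and the second occurrence below is invented for illustration.

```python
def top_highlights(occs, n=3):
    """Return the n highest-scoring highlight occurrences, best first."""
    return sorted(occs, key=lambda o: o.get("scr", 0.0), reverse=True)[:n]


highlight_occs = [
    {"id": "12932", "ss": 3.545, "se": 43.377, "shs": 3, "she": 51, "scr": 1.0},
    # Hypothetical second highlight for illustration
    {"id": "12933", "ss": 50.0, "se": 60.0, "shs": 55, "she": 58, "scr": 0.4},
]
for occ in top_highlights(highlight_occs, n=1):
    print(f"{occ['ss']:.1f}s - {occ['se']:.1f}s (score {occ['scr']})")
```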

Which Detections Have Occurrences?

Not all detection types include occurrences:

Has Occurrences	Detection Types
Yes	`visual.context`, `visual.object.localized`, `audio.context`, `audio.speech`, `audio.speech_detailed`, `audio.voice_emotion`, `audio.keyword.*`, `transcript.keyword.*`, `human.face`, `human.face_group`, `topic.iab.section`, `visual.text_region.*`, `highlight`
No	`topic.iab`, `topic.general`, `topic.genre`, `external.keyword.*`, `visual.color`

Detections without occurrences are either video-level (topics, external keywords) or have their temporal data represented differently (color is in by_second only).
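The table can be encoded as a list of glob patterns when code needs to decide up front whether a type carries occurrences. A sketch under assumptions: `has_occurrences` is a hypothetical helper, and reading the `audio.keyword` and `transcript.keyword` entries as wildcard families is an interpretation. On an actual detection, checking `"occs" in detection` (as the code example later in this page does) is simpler.

```python
from fnmatch import fnmatchcase

# Patterns transcribed from the table above; the keyword families are
# assumed to be wildcards like visual.text_region.*
WITH_OCCURRENCES = [
    "visual.context", "visual.object.localized", "audio.context",
    "audio.speech", "audio.speech_detailed", "audio.voice_emotion",
    "audio.keyword.*", "transcript.keyword.*", "human.face",
    "human.face_group", "topic.iab.section", "visual.text_region.*",
    "highlight",
]


def has_occurrences(detection_type: str) -> bool:
    """True if the detection type is listed as carrying an occs array."""
    return any(fnmatchcase(detection_type, pattern) for pattern in WITH_OCCURRENCES)


print(has_occurrences("visual.text_region.keyword.compliance"))  # True
print(has_occurrences("topic.iab"))                              # False
```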

Shot References

The shs and she fields connect occurrences to the shot boundary data in segmentations.detected_shots. This is useful for workflows that need to clip entire shots rather than just occurrence segments.

For example, when detecting inappropriate content, you might want to flag the entire shot containing the occurrence rather than just the specific seconds.

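# Expand an occurrence to the full shots it spans.
# Assumes `metadata` is the loaded JSON and `occurrence` is one entry of a detection's "occs".
shots = metadata["segmentations"]["detected_shots"]
first_shot = shots[occurrence["shs"]]  # shot containing the occurrence start
last_shot = shots[occurrence["she"]]   # shot containing the occurrence end
shot_start = first_shot["ss"]
shot_end = last_shot["se"]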
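For the flagging workflow, the same idea extends to every occurrence of a detection: collect each shot index from `shs` to `she` inclusive and report each shot once. A minimal sketch; `flagged_shot_ranges` is a hypothetical helper, it assumes each `detected_shots` entry has `ss` and `se` keys as in the snippet above, and the shot list below is invented for illustration.

```python
def flagged_shot_ranges(occs, detected_shots):
    """Return (start, end) seconds of every shot touched by any occurrence.

    Each occurrence covers shots shs..she inclusive; shots shared by
    several occurrences are reported once, in shot order.
    """
    shot_indices = set()
    for occ in occs:
        shot_indices.update(range(occ["shs"], occ["she"] + 1))
    return [(detected_shots[i]["ss"], detected_shots[i]["se"]) for i in sorted(shot_indices)]


# Hypothetical shot list and occurrences for illustration
shots = [{"ss": 0.0, "se": 2.0}, {"ss": 2.0, "se": 5.0}, {"ss": 5.0, "se": 9.0}]
occs = [{"ss": 0.3, "se": 3.6, "shs": 0, "she": 1},
        {"ss": 4.0, "se": 6.0, "shs": 1, "she": 2}]
print(flagged_shot_ranges(occs, shots))
```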

Occurrences vs. by_second

The two structures answer different questions:

Structure	Question Answered	Access Pattern
`occs` (in a detection)	"When does this specific thing appear?"	Start from a detection, read its time segments
`by_second` (in detection_groupings)	"What is happening at this specific second?"	Start from a time position, see all active detections

When to Use Occurrences

Use occurrences when you need to:

Find all time segments where a specific concept appears
Calculate total screen time for a detection
Generate clips containing a specific object or person
Get the peak confidence for a detection's appearance
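Two of these tasks, total screen time and peak confidence, reduce to simple aggregations over `occs`. A minimal sketch using the umbrella example from earlier; `screen_time` and `peak_confidence` are hypothetical helpers, and summing durations assumes a detection's occurrences do not overlap.

```python
umbrella = {
    "t": "visual.context", "label": "umbrella", "cid": "abc123",
    "occs": [
        {"id": "45", "ss": 0.3, "se": 3.6, "c_max": 0.912, "c_med": 0.87, "shs": 0, "she": 1},
        {"id": "46", "ss": 64.4, "se": 68.2, "c_max": 0.876, "c_med": 0.801, "shs": 22, "she": 23},
    ],
}


def screen_time(detection: dict) -> float:
    """Total seconds covered by the detection's occurrences (overlaps not merged)."""
    return sum(occ["se"] - occ["ss"] for occ in detection.get("occs", []))


def peak_confidence(detection: dict):
    """Highest c_max across occurrences, or None if no occurrence has c_max."""
    values = [occ["c_max"] for occ in detection.get("occs", []) if "c_max" in occ]
    return max(values) if values else None


print(f"{screen_time(umbrella):.1f}s")  # 7.1s
print(peak_confidence(umbrella))        # 0.912
```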

When to Use by_second

Use by_second when you need to:

Build a timeline view of all detections
Find what is happening at a specific moment in the video
Read per-second confidence values
Access rapidly changing data (face emotions, colors) that only makes sense on a per-second basis
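Reading per-second confidence for a single detection means scanning `by_second` for items that point at it. A minimal sketch; `confidence_series` is a hypothetical helper, and it assumes `by_second` is a list indexed by second whose entries are lists of items with `d` (detection ID) and `c` (confidence), as the second-45 example later in this page reads them. The three-second excerpt and its IDs are invented.

```python
def confidence_series(by_second, det_id):
    """Map second -> confidence for one detection; absent seconds are omitted."""
    series = {}
    for second, items in enumerate(by_second):
        for item in items:
            if item["d"] == det_id and "c" in item:
                series[second] = item["c"]
    return series


# Hypothetical three-second excerpt with invented detection IDs
by_second = [
    [{"d": "det-1", "c": 0.71}],
    [{"d": "det-1", "c": 0.91}, {"d": "det-2", "c": 0.55}],
    [{"d": "det-2", "c": 0.60}],
]
print(confidence_series(by_second, "det-1"))  # {0: 0.71, 1: 0.91}
```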

Code Examples

Python: List All Occurrences of a Detection

import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

# Find all occurrences of visual.context detections
visual_ids = metadata["detection_groupings"]["by_detection_type"].get("visual.context", [])

for det_id in visual_ids[:5]:  # First 5 detections
    detection = metadata["detections"][det_id]
    print(f"\n{detection['label']}:")
    if "occs" in detection:
        for occ in detection["occs"]:
            print(f"  {occ['ss']:.1f}s - {occ['se']:.1f}s (confidence: {occ.get('c_max', 'N/A')})")

Python: What Is Happening at Second 45?

# Reuses `metadata` loaded in the previous example
second = 45
by_second = metadata["detection_groupings"]["by_second"]
if second < len(by_second):
    for item in by_second[second]:
        det = metadata["detections"][item["d"]]
        confidence = item.get("c", "N/A")
        print(f"  {det['t']}: {det['label']} (confidence: {confidence})")
