Occurrences
Occurrences represent the time segments during which a detection is present in the video. They are stored in the occs array of each detection.
Occurrence Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique occurrence ID within this metadata file |
| ss | float | Start second -- seconds from the beginning of the video |
| se | float | End second -- seconds from the beginning of the video |
| c_max | float | Maximum confidence during this occurrence (for visual.context and audio.context) |
| c_med | float | Median confidence during this occurrence (for visual.context). Complements c_max by showing the typical confidence rather than the peak. |
| c | float | Confidence for the entire occurrence (for topic.iab.section) |
| shs | integer | Shot start -- 0-based index into the detected_shots array in segmentations |
| she | integer | Shot end -- 0-based index into the detected_shots array |
| fs | integer | Frame start -- 0-based frame number (for audio.speech_detailed and visual.text_region.keyword.compliance) |
| fe | integer | Frame end -- 0-based frame number |
| sdur | float | Segment duration in seconds (for audio.speech_detailed). Equals se - ss. |
| a | object | Per-occurrence attributes (for topic.iab.section). See Type-Specific Occurrence Fields below. |
| scr | float | Score for the occurrence (for highlight detections). Ranges from 0.0 to 1.0. |
Example
A visual.context detection for "umbrella" detected twice in the video:
{
  "t": "visual.context",
  "label": "umbrella",
  "cid": "abc123",
  "occs": [
    {
      "id": "45",
      "ss": 0.3,
      "se": 3.6,
      "c_max": 0.912,
      "c_med": 0.87,
      "shs": 0,
      "she": 1
    },
    {
      "id": "46",
      "ss": 64.4,
      "se": 68.2,
      "c_max": 0.876,
      "c_med": 0.801,
      "shs": 22,
      "she": 23
    }
  ]
}
This means:
- First appearance: from 0.3s to 3.6s, with peak confidence 0.912 (median 0.87), spanning shots 0-1
- Second appearance: from 64.4s to 68.2s, with peak confidence 0.876 (median 0.801), spanning shots 22-23
Confidence Behavior
Confidence values differ by detection type:
- visual.context and audio.context: Occurrences have c_max (the maximum confidence observed at any frame/second during the occurrence) and c_med (the median confidence). Since confidence varies from moment to moment, these summary statistics are reported in the occurrence. For per-second confidence, use the c field in the by_second structure.
- topic.iab.section: Occurrences have c (a single confidence for the entire occurrence).
- human.face: Occurrences do not have a confidence field. Face confidence is only available in the similar_to items (gallery match confidence).
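The per-type rules above can be folded into one small helper. This is a minimal sketch, not part of any official client: the function name `occurrence_confidence` is hypothetical, and it assumes an occurrence is a plain dict as loaded from the metadata JSON.

```python
from typing import Optional


def occurrence_confidence(detection_type: str, occ: dict) -> Optional[float]:
    """Return the headline confidence of an occurrence, or None if it has none.

    Hypothetical helper: it only encodes the rules listed in this section.
    """
    if detection_type in ("visual.context", "audio.context"):
        # Peak confidence over the occurrence; c_med can be read separately.
        return occ.get("c_max")
    if detection_type == "topic.iab.section":
        # One confidence value for the entire occurrence.
        return occ.get("c")
    # human.face (and any type not covered here): no occurrence-level confidence.
    return None


# First "umbrella" occurrence from the example earlier in this section.
example = {"id": "45", "ss": 0.3, "se": 3.6, "c_max": 0.912, "c_med": 0.87}
print(occurrence_confidence("visual.context", example))
print(occurrence_confidence("human.face", {"id": "7", "ss": 1.0, "se": 2.0}))
```

Returning None rather than a placeholder string keeps the result usable in numeric comparisons; callers decide how to display a missing value.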
Type-Specific Occurrence Fields
Some detection types include additional fields on their occurrences beyond the common set.
audio.speech_detailed
Word-level speech occurrences include frame-level timing and duration:
{
  "id": "1064",
  "ss": 7.262,
  "se": 7.482,
  "fs": 174,
  "fe": 179,
  "sdur": 0.22,
  "shs": 5,
  "she": 5
}
| Field | Description |
|---|---|
| fs | Start frame number (0-based) |
| fe | End frame number (0-based) |
| sdur | Segment duration in seconds |
visual.text_region.keyword.compliance
OCR compliance keyword occurrences also include frame references:
{
  "id": "902",
  "ss": 9.009,
  "se": 13.513,
  "fs": 216,
  "fe": 323,
  "shs": 6,
  "she": 7
}
topic.iab.section
IAB section occurrences include per-occurrence attributes indicating the source modalities and optional ad score:
{
  "id": "1687",
  "ss": 7.262,
  "se": 31.642,
  "c": 0.779,
  "a": {
    "sources": ["speech"],
    "ad_score": 0.85
  },
  "shs": 5,
  "she": 10
}
| Field | Description |
|---|---|
| a.sources | Array of modalities that contributed to this classification. Values: "speech", "visual", "ocr" |
| a.ad_score | Optional. Ad suitability score for this section (0.0 to 1.0) |
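Because a.ad_score is optional, code that filters sections should tolerate its absence. The sketch below shows one way to select sections by source modality and, optionally, by a minimum ad score. The function name `sections_from_source` is hypothetical, and the second sample occurrence is invented for illustration; only the first comes from the example above.

```python
def sections_from_source(occs, source, min_ad_score=None):
    """Return occurrences whose a.sources contains `source`.

    If min_ad_score is given, occurrences without an ad_score, or with a
    lower one, are skipped. Hypothetical helper for illustration only.
    """
    selected = []
    for occ in occs:
        attrs = occ.get("a", {})
        if source not in attrs.get("sources", []):
            continue
        if min_ad_score is not None:
            ad_score = attrs.get("ad_score")  # optional field
            if ad_score is None or ad_score < min_ad_score:
                continue
        selected.append(occ)
    return selected


iab_occs = [
    # From the example above.
    {"id": "1687", "ss": 7.262, "se": 31.642, "c": 0.779,
     "a": {"sources": ["speech"], "ad_score": 0.85}, "shs": 5, "she": 10},
    # Invented sample without an ad_score.
    {"id": "1688", "ss": 40.0, "se": 55.0, "c": 0.6,
     "a": {"sources": ["visual", "ocr"]}, "shs": 12, "she": 14},
]

speech_sections = sections_from_source(iab_occs, "speech", min_ad_score=0.8)
print([occ["id"] for occ in speech_sections])
```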
highlight
Highlight occurrences include a score indicating the significance of the segment:
{
  "id": "12932",
  "ss": 3.545,
  "se": 43.377,
  "shs": 3,
  "she": 51,
  "scr": 1.0
}
| Field | Description |
|---|---|
| scr | Highlight score (0.0 to 1.0), indicating how significant or interesting the segment is |
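A common use of scr is ranking highlight segments. The following sketch sorts occurrences by score and keeps the top few. The name `top_highlights` is hypothetical, and every sample occurrence except the first is invented for illustration.

```python
def top_highlights(occs, n=3):
    """Return up to n occurrences with the highest scr, best first.

    Occurrences without scr are treated as 0.0 (an assumption of this sketch).
    """
    return sorted(occs, key=lambda occ: occ.get("scr", 0.0), reverse=True)[:n]


highlight_occs = [
    {"id": "12932", "ss": 3.545, "se": 43.377, "shs": 3, "she": 51, "scr": 1.0},
    {"id": "12933", "ss": 50.0, "se": 58.0, "shs": 60, "she": 63, "scr": 0.4},   # invented
    {"id": "12934", "ss": 70.0, "se": 81.5, "shs": 70, "she": 75, "scr": 0.75},  # invented
]

for occ in top_highlights(highlight_occs, n=2):
    print(f"{occ['id']}: {occ['ss']:.1f}s - {occ['se']:.1f}s (score: {occ['scr']})")
```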
Which Detections Have Occurrences?
Not all detection types include occurrences:
| Has Occurrences | Detection Types |
|---|---|
| Yes | visual.context, visual.object.localized, audio.context, audio.speech, audio.speech_detailed, audio.voice_emotion, audio.keyword.*, transcript.keyword.*, human.face, human.face_group, topic.iab.section, visual.text_region.*, highlight |
| No | topic.iab, topic.general, topic.genre, external.keyword.*, visual.color |
Detections without occurrences are either video-level (topics, external keywords) or have their temporal data represented differently (color is in by_second only).
Shot References
The shs and she fields connect occurrences to the shot boundary data in segmentations.detected_shots. This is useful for workflows that need to clip entire shots rather than just occurrence segments.
For example, when detecting inappropriate content, you might want to flag the entire shot containing the occurrence rather than just the specific seconds.
# Get the shot that contains the start of an occurrence
shot_index = occurrence["shs"]
shot = metadata["segmentations"]["detected_shots"][shot_index]
shot_start = shot["ss"]
shot_end = shot["se"]
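An occurrence can span several shots, so clipping whole shots usually means going from the start of shot shs to the end of shot she. A minimal sketch of that lookup follows. The helper name `shot_span` is hypothetical, the shot list is invented sample data, and shots are assumed to carry ss and se fields as in the snippet above.

```python
def shot_span(occurrence, detected_shots):
    """Return (start, end) in seconds covering every shot the occurrence touches."""
    first_shot = detected_shots[occurrence["shs"]]
    last_shot = detected_shots[occurrence["she"]]
    return first_shot["ss"], last_shot["se"]


# Invented shot boundaries; in real use, read metadata["segmentations"]["detected_shots"].
sample_shots = [
    {"ss": 0.0, "se": 2.1},
    {"ss": 2.1, "se": 5.0},
    {"ss": 5.0, "se": 9.4},
]

# First "umbrella" occurrence from the example: it spans shots 0-1.
umbrella_occ = {"id": "45", "ss": 0.3, "se": 3.6, "shs": 0, "she": 1}

clip_start, clip_end = shot_span(umbrella_occ, sample_shots)
print(f"Occurrence {umbrella_occ['ss']}s-{umbrella_occ['se']}s -> shots {clip_start}s-{clip_end}s")
```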
Occurrences vs. by_second
The two structures answer different questions:
| Structure | Question Answered | Access Pattern |
|---|---|---|
| occs (in a detection) | "When does this specific thing appear?" | Start from a detection, read its time segments |
| by_second (in detection_groupings) | "What is happening at this specific second?" | Start from a time position, see all active detections |
When to Use Occurrences
Use occurrences when you need to:
- Find all time segments where a specific concept appears
- Calculate total screen time for a detection
- Generate clips containing a specific object or person
- Get the peak confidence for a detection's appearance
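As an illustration of the screen-time use case, the sketch below sums the durations of a detection's occurrences. It merges overlapping segments defensively; whether occurrences of a single detection can overlap is not stated here, so treat that as an assumption. The function name `total_screen_time` is hypothetical.

```python
def total_screen_time(occs):
    """Sum occurrence durations in seconds, counting overlapping time only once."""
    intervals = sorted((occ["ss"], occ["se"]) for occ in occs)
    total = 0.0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # Close the previous merged interval, if any, and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# The two "umbrella" occurrences from the example earlier in this section.
umbrella_occs = [
    {"id": "45", "ss": 0.3, "se": 3.6},
    {"id": "46", "ss": 64.4, "se": 68.2},
]
print(f"Total screen time: {total_screen_time(umbrella_occs):.1f}s")
```

For the umbrella example this adds 3.3 s and 3.8 s, for about 7.1 s in total.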
When to Use by_second
Use by_second when you need to:
- Build a timeline view of all detections
- Find what is happening at a specific moment in the video
- Read per-second confidence values
- Access rapidly changing data (face emotions, colors) that only makes sense on a per-second basis
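For the per-second confidence use case, one option is to walk by_second and collect the c value for a single detection. This sketch assumes by_second is a list indexed by second whose entries are lists of items with a d (detection ID) key and an optional c key, as in the code example at the end of this section. The function name and the sample data are hypothetical.

```python
def confidence_by_second(by_second, detection_id):
    """Map second -> confidence for one detection, skipping items without c."""
    series = {}
    for second, items in enumerate(by_second):
        for item in items or []:  # tolerate empty or missing entries
            if item.get("d") == detection_id and "c" in item:
                series[second] = item["c"]
    return series


# Invented sample; in real use, pass metadata["detection_groupings"]["by_second"].
sample_by_second = [
    [{"d": "101", "c": 0.91}],
    [],
    [{"d": "202", "c": 0.55}, {"d": "101", "c": 0.84}],
    [{"d": "101"}],  # active, but no per-second confidence
]

for second, confidence in confidence_by_second(sample_by_second, "101").items():
    print(f"second {second}: confidence {confidence}")
```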
Code Example
Python: List All Occurrences of a Detection
import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

# Find all occurrences of visual.context detections
visual_ids = metadata["detection_groupings"]["by_detection_type"].get("visual.context", [])
for det_id in visual_ids[:5]:  # First 5 detections
    detection = metadata["detections"][det_id]
    print(f"\n{detection['label']}:")
    # Not every detection type carries occurrences, so check before iterating
    if "occs" in detection:
        for occ in detection["occs"]:
            print(f"  {occ['ss']:.1f}s - {occ['se']:.1f}s (confidence: {occ.get('c_max', 'N/A')})")
Python: What is Happening at Second 45?
# Assumes metadata has already been loaded, as in the previous example
second = 45
by_second = metadata["detection_groupings"]["by_second"]
if second < len(by_second):
    detections_at_second = by_second[second]
    for item in detections_at_second:
        # Each item references its detection by ID in the "d" field
        det = metadata["detections"][item["d"]]
        confidence = item.get("c", "N/A")
        print(f"  {det['t']}: {det['label']} (confidence: {confidence})")