Segmentation and Shots

The segmentations section of Valossa Core metadata contains automatically detected video segmentation data. Currently, shot boundary detection is the only supported segmentation type.

Shot Boundary Detection

Shot boundaries divide the video into individual shots (continuous camera takes). The detected_shots array provides precise start and end times for each shot.

Structure

{
  "segmentations": {
    "detected_shots": [
      {
        "ss": 0.083,
        "se": 5.214,
        "fs": 0,
        "fe": 122,
        "sdur": 5.131
      },
      {
        "ss": 5.214,
        "se": 10.177,
        "fs": 123,
        "fe": 241,
        "sdur": 4.963
      }
    ]
  }
}
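The timing fields are internally consistent: each shot's sdur is the difference between its end and start times. A minimal sanity check over the sample data above:

```python
# Sample shots copied from the structure above
detected_shots = [
    {"ss": 0.083, "se": 5.214, "fs": 0, "fe": 122, "sdur": 5.131},
    {"ss": 5.214, "se": 10.177, "fs": 123, "fe": 241, "sdur": 4.963},
]

for shot in detected_shots:
    # Duration equals end time minus start time (allowing float rounding)
    assert abs(shot["sdur"] - (shot["se"] - shot["ss"])) < 1e-6
```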

Shot Fields

Field   Type      Description
ss      float     Start time in seconds
se      float     End time in seconds
fs      integer   Start frame number (0-based)
fe      integer   End frame number (0-based)
sdur    float     Shot duration in seconds

Shots are ordered chronologically in the array. Frame numbers are 0-based (the first frame of the video is frame 0).
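Because each shot carries both a frame span and a duration, you can estimate the video's frame rate from any single shot. The frame rate itself is not part of the metadata, so this is an approximation that assumes a constant frame rate:

```python
def approx_fps(shot):
    """Estimate the frame rate from a shot's frame span and duration.

    Assumes a constant frame rate; the true rate is not part of the
    metadata, so treat the result as an approximation only.
    """
    frame_count = shot["fe"] - shot["fs"] + 1  # fe is inclusive
    return frame_count / shot["sdur"]

# First shot from the sample structure above
shot = {"ss": 0.083, "se": 5.214, "fs": 0, "fe": 122, "sdur": 5.131}
print(round(approx_fps(shot), 2))  # roughly 23.97 for this sample shot
```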

How Shots Relate to Occurrences

Detection occurrences reference shots via the shs (shot start) and she (shot end) fields. These are 0-based indexes into the detected_shots array.

This connection is useful for workflows that need to operate at the shot level. For example, when flagging inappropriate content, you might want to extract the entire shot containing the detection rather than just the detection's time segment.

Example: Get the Full Shot for an Occurrence

# Given an occurrence (metadata is the parsed Core metadata JSON)
occurrence = {"ss": 12.5, "se": 15.3, "shs": 3, "she": 3}

# Look up the corresponding shot by its 0-based index
shot = metadata["segmentations"]["detected_shots"][occurrence["shs"]]
print(f"Occurrence: {occurrence['ss']}s - {occurrence['se']}s")
print(f"Full shot: {shot['ss']}s - {shot['se']}s (duration: {shot['sdur']}s)")
print(f"Shot frames: {shot['fs']} - {shot['fe']}")
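Once you have the full shot, you can act on it, for example by cutting it out of the source file. The sketch below only builds an ffmpeg argument list from a shot's times; the file names are placeholders, and note that stream copy (-c copy) may snap cut points to the nearest keyframe:

```python
def ffmpeg_cut_command(shot, src, dst):
    """Build an ffmpeg argument list that copies one shot to its own file.

    src and dst are placeholder file names. Stream copy avoids
    re-encoding but may not be frame-accurate at the cut points.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-ss", str(shot["ss"]),   # seek to the shot start
        "-to", str(shot["se"]),   # stop at the shot end
        "-c", "copy",             # copy streams without re-encoding
        dst,
    ]

shot = {"ss": 12.5, "se": 16.4, "fs": 300, "fe": 393, "sdur": 3.9}
print(" ".join(ffmpeg_cut_command(shot, "input.mp4", "shot.mp4")))
```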

Use Cases

  • Video editing: Use shot boundaries to split a video into its constituent shots for editing workflows.
  • Content moderation: Flag entire shots containing sensitive content rather than precise time segments.
  • Highlight extraction: Select shots with high-confidence detections for automated highlight reels.
  • Scene analysis: Group consecutive shots with similar detections to identify logical scenes.
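The scene-analysis idea above can be sketched as follows. This assumes you have already collected occurrences with the shs/she indexes described earlier; the "label" field name is a hypothetical stand-in for whatever detection identifier your workflow uses:

```python
def group_shots_into_scenes(num_shots, occurrences):
    """Group consecutive shots that share at least one detected label.

    occurrences is a list of dicts with shs/she shot indexes and a
    hypothetical "label" key (an assumption for illustration).
    """
    # Collect the set of labels seen in each shot
    labels_per_shot = [set() for _ in range(num_shots)]
    for occ in occurrences:
        for idx in range(occ["shs"], occ["she"] + 1):
            labels_per_shot[idx].add(occ["label"])

    # Merge consecutive shots whose label sets overlap
    scenes = [[0]]
    for i in range(1, num_shots):
        if labels_per_shot[i] & labels_per_shot[i - 1]:
            scenes[-1].append(i)
        else:
            scenes.append([i])
    return scenes

occurrences = [
    {"label": "beach", "shs": 0, "she": 1},
    {"label": "car", "shs": 2, "she": 2},
]
print(group_shots_into_scenes(3, occurrences))  # [[0, 1], [2]]
```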

Code Example

Python: List All Shots

import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

shots = metadata["segmentations"]["detected_shots"]
print(f"Total shots: {len(shots)}")

for i, shot in enumerate(shots):
    print(
        f"Shot {i}: {shot['ss']:.2f}s - {shot['se']:.2f}s "
        f"(duration: {shot['sdur']:.2f}s, frames {shot['fs']}-{shot['fe']})"
    )