Segmentation and Shots
The segmentations section of Valossa Core metadata contains automatically detected video segmentation data. Currently, shot boundary detection is the only supported segmentation type.
Shot Boundary Detection
Shot boundaries divide the video into individual shots (continuous camera takes). The detected_shots array provides precise start and end times for each shot.
Structure
{
  "segmentations": {
    "detected_shots": [
      {
        "ss": 0.083,
        "se": 5.214,
        "fs": 0,
        "fe": 122,
        "sdur": 5.131
      },
      {
        "ss": 5.214,
        "se": 10.177,
        "fs": 123,
        "fe": 241,
        "sdur": 4.963
      }
    ]
  }
}
Shot Fields
| Field | Type | Description |
|---|---|---|
| ss | float | Start time in seconds |
| se | float | End time in seconds |
| fs | integer | Start frame number (0-based) |
| fe | integer | End frame number (0-based) |
| sdur | float | Shot duration in seconds |
Shots are ordered chronologically in the array. Frame numbers are 0-based (the first frame of the video is frame 0).
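These field relationships can be sanity-checked directly on the parsed JSON. A minimal sketch using the sample values from the structure above (plain arithmetic, not an API call):

```python
# Sample shots copied from the structure example above.
shots = [
    {"ss": 0.083, "se": 5.214, "fs": 0, "fe": 122, "sdur": 5.131},
    {"ss": 5.214, "se": 10.177, "fs": 123, "fe": 241, "sdur": 4.963},
]

for i, shot in enumerate(shots):
    # sdur equals se - ss (within float rounding)
    assert abs((shot["se"] - shot["ss"]) - shot["sdur"]) < 0.001
    if i > 0:
        # shots are chronological, and frame ranges are consecutive
        assert shot["ss"] >= shots[i - 1]["se"] - 0.001
        assert shot["fs"] == shots[i - 1]["fe"] + 1

print("all shot invariants hold")
```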
How Shots Relate to Occurrences
Detection occurrences reference shots via the shs (shot start) and she (shot end) fields. These are 0-based indexes into the detected_shots array.
This connection is useful for workflows that need to operate at the shot level. For example, when flagging inappropriate content, you might want to extract the entire shot containing the detection rather than just the detection's time segment.
Example: Get the Full Shot for an Occurrence
import json

# Load the Core metadata file
with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

# Given an occurrence (here shs == she: the occurrence falls within one shot)
occurrence = {"ss": 12.5, "se": 15.3, "shs": 3, "she": 3}

# Look up the corresponding shot by its index
shot = metadata["segmentations"]["detected_shots"][occurrence["shs"]]

print(f"Occurrence: {occurrence['ss']}s - {occurrence['se']}s")
print(f"Full shot: {shot['ss']}s - {shot['se']}s (duration: {shot['sdur']}s)")
print(f"Shot frames: {shot['fs']} - {shot['fe']}")
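When an occurrence spans several shots (shs < she), the same indexes can be used as an inclusive slice over detected_shots. A sketch with synthetic data (the shot and occurrence values below are made up for illustration):

```python
# Synthetic detected_shots list: six 2-second shots (illustrative values).
detected_shots = [
    {"ss": 2.0 * i, "se": 2.0 * (i + 1), "sdur": 2.0} for i in range(6)
]

# Hypothetical occurrence spanning shots 3 through 5 (shs < she).
occurrence = {"ss": 6.4, "se": 11.1, "shs": 3, "she": 5}

# Inclusive slice of every shot the occurrence touches.
covered = detected_shots[occurrence["shs"] : occurrence["she"] + 1]
clip_start = covered[0]["ss"]
clip_end = covered[-1]["se"]

print(f"Clip to extract: {clip_start}s - {clip_end}s ({len(covered)} shots)")
```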
Use Cases
- Video editing: Use shot boundaries to split a video into its constituent shots for editing workflows.
- Content moderation: Flag entire shots containing sensitive content rather than precise time segments.
- Highlight extraction: Select shots with high-confidence detections for automated highlight reels.
- Scene analysis: Group consecutive shots with similar detections to identify logical scenes.
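The scene-analysis idea above can be sketched as follows. Note that the labels field is hypothetical, attached here only for illustration; in real Core metadata, detections are stored separately and would first need to be mapped onto shots via their shs/she indexes:

```python
# Group consecutive shots into "scenes" whenever adjacent shots share at
# least one detection label. The `labels` field is a hypothetical
# per-shot annotation, not part of the detected_shots format.
shots = [
    {"ss": 0.0, "se": 5.0, "labels": {"beach", "sea"}},
    {"ss": 5.0, "se": 9.0, "labels": {"sea", "boat"}},
    {"ss": 9.0, "se": 14.0, "labels": {"office"}},
    {"ss": 14.0, "se": 18.0, "labels": {"office", "desk"}},
]

scenes = []
for shot in shots:
    if scenes and scenes[-1][-1]["labels"] & shot["labels"]:
        scenes[-1].append(shot)   # shares a label with the previous shot
    else:
        scenes.append([shot])     # no overlap: start a new scene

print(f"{len(scenes)} scenes")   # 2 scenes: beach/sea/boat, then office
```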
Code Example
Python: List All Shots
import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

shots = metadata["segmentations"]["detected_shots"]
print(f"Total shots: {len(shots)}")

for i, shot in enumerate(shots):
    print(
        f"Shot {i}: {shot['ss']:.2f}s - {shot['se']:.2f}s "
        f"(duration: {shot['sdur']:.2f}s, frames {shot['fs']}-{shot['fe']})"
    )