Localized Objects

In addition to face bounding boxes, Valossa AI provides spatial coordinates for certain types of visual detections through the visual.object.localized detection type. Currently, this feature is available for logo detection and could be extended to custom detection models.

How It Works

Localized objects appear in two places:

  1. Core metadata: The visual.object.localized detection (with label, occurrences, etc.) appears in the detections structure and in by_second and by_detection_type groupings.
  2. seconds_objects metadata (or frames_objects metadata): Contains the per-second (or per-frame) bounding box coordinates for these detections.

The split design keeps the Core metadata file manageable in size while providing precise spatial data in a separate download.

Downloading Localized Object Coordinates

Seconds-Based (seconds_objects)

curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID&type=seconds_objects"

Frames-Based (frames_objects)

curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID&type=frames_objects"
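
The same requests can be issued from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are taken from the curl commands above; the helper names build_job_results_url and download_metadata are hypothetical, and the sketch assumes the response body is the JSON metadata itself.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl examples above.
JOB_RESULTS_URL = "https://api-eu.valossa.com/core/1.0/job_results"


def build_job_results_url(api_key, job_id, metadata_type):
    """Build the job_results URL for "seconds_objects" or "frames_objects"."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "job_id": job_id, "type": metadata_type}
    )
    return f"{JOB_RESULTS_URL}?{query}"


def download_metadata(api_key, job_id, metadata_type, timeout=60):
    """Fetch and parse one metadata file (assumption: the body is JSON)."""
    url = build_job_results_url(api_key, job_id, metadata_type)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With real credentials, download_metadata("YOUR_API_KEY", "JOB_ID", "seconds_objects") would return the parsed structure described in the next section.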

seconds_objects Structure

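The objects_by_second array is indexed by second number (0-based). Each element is an array of detection items for that second; in the example below, seconds 0 and 1 have no localized detections and are therefore empty arrays.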

{
  "version_info": { "metadata_type": "seconds_objects", ... },
  "objects_by_second": [
    [],
    [],
    [
      {
        "d": "137",
        "o": ["314"],
        "b": [
          {
            "x": 0.267,
            "y": 0.747,
            "w": 0.081,
            "h": 0.066,
            "c": 0.995
          },
          {
            "x": 0.498,
            "y": 0.669,
            "w": 0.081,
            "h": 0.069,
            "c": 0.984
          }
        ]
      }
    ]
  ]
}

Detection Item Fields

Field  Description
d      Detection ID (references the visual.object.localized detection in Core metadata)
o      Array of occurrence IDs overlapping with this second
b      Array of bounding boxes for this detection in this second
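
To follow one detection over time, it can help to regroup the per-second items by their d value. The sketch below is one possible way to do that; the function name boxes_by_detection is hypothetical, and the sample data simply mirrors the example structure shown earlier.

```python
from collections import defaultdict


def boxes_by_detection(objects):
    """Map detection ID -> list of (second, occurrence IDs, bounding box)."""
    index = defaultdict(list)
    for second, items in enumerate(objects["objects_by_second"]):
        for item in items:
            for bbox in item["b"]:
                index[item["d"]].append((second, item["o"], bbox))
    return dict(index)


# Sample data shaped like the seconds_objects example above.
sample = {
    "objects_by_second": [
        [],
        [],
        [
            {
                "d": "137",
                "o": ["314"],
                "b": [
                    {"x": 0.267, "y": 0.747, "w": 0.081, "h": 0.066, "c": 0.995},
                    {"x": 0.498, "y": 0.669, "w": 0.081, "h": 0.069, "c": 0.984},
                ],
            }
        ],
    ]
}

index = boxes_by_detection(sample)
for second, occurrence_ids, bbox in index["137"]:
    print(second, occurrence_ids, bbox["c"])
```

Each tuple keeps the second index and the occurrence IDs next to the box, so the result can still be joined back to the occurrences listed in Core metadata.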

Bounding Box Fields

Field  Description
x      X offset of the upper-left corner (fraction of frame width)
y      Y offset of the upper-left corner (fraction of frame height)
w      Width (fraction of frame width)
h      Height (fraction of frame height)
c      Confidence of the detection at this bounding box location

All coordinate values are relative to the frame size (0.0 to 1.0). Values may be slightly outside this range when an object is partially off-screen.
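
Drawing or cropping usually requires pixel coordinates. The following sketch converts one bounding box to pixels and clamps it to the frame, since values can fall slightly outside 0.0 to 1.0. The function name to_pixel_box is hypothetical, and the frame dimensions are an assumption: the caller has to supply them from the video itself.

```python
def to_pixel_box(bbox, frame_width, frame_height):
    """Convert a relative box to clamped (left, top, right, bottom) pixels."""
    # Clamp the relative edges to [0.0, 1.0]; boxes may extend off-screen.
    left = min(max(bbox["x"], 0.0), 1.0)
    top = min(max(bbox["y"], 0.0), 1.0)
    right = min(max(bbox["x"] + bbox["w"], 0.0), 1.0)
    bottom = min(max(bbox["y"] + bbox["h"], 0.0), 1.0)
    # Scale by the frame size supplied by the caller and round to whole pixels.
    return (
        round(left * frame_width),
        round(top * frame_height),
        round(right * frame_width),
        round(bottom * frame_height),
    )


# Hypothetical 1000x800 frame, chosen only to keep the arithmetic easy to follow.
print(to_pixel_box({"x": 0.25, "y": 0.5, "w": 0.5, "h": 0.25, "c": 0.9}, 1000, 800))
```

Returning corner coordinates rather than width and height is a design choice here; many drawing libraries expect a box in that form.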

Multiple Bounding Boxes

A single detection can have multiple bounding boxes in the same second. For example, if two instances of the same logo appear simultaneously in the frame, the b array will contain two bounding box objects.

In the Core metadata's by_second, the c confidence for a visual.object.localized detection is the highest confidence among all simultaneously observed bounding boxes. To see individual bounding box confidences, read the seconds_objects metadata.

Confidence in Core vs. seconds_objects

Source                          Confidence Meaning
Core metadata by_second         Maximum confidence across all bounding boxes in that second
seconds_objects bounding box c  Confidence for that specific bounding box instance
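
The relationship between the two confidence values can be illustrated with a short sketch. It computes the maximum over the b array of one detection item, which, per the description above, is the value expected in Core metadata's by_second for that second. The function name max_confidence is hypothetical, and whether Core metadata applies any additional rounding is not confirmed here.

```python
def max_confidence(item):
    """Highest bounding-box confidence for one detection item in one second.

    Returns None if the item has no bounding boxes.
    """
    return max((bbox["c"] for bbox in item["b"]), default=None)


# Detection item taken from the seconds_objects example above.
item = {
    "d": "137",
    "o": ["314"],
    "b": [
        {"x": 0.267, "y": 0.747, "w": 0.081, "h": 0.066, "c": 0.995},
        {"x": 0.498, "y": 0.669, "w": 0.081, "h": 0.069, "c": 0.984},
    ],
}

print(max_confidence(item))
```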

Code Example

Python: Extract Logo Positions

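import json

# Core metadata holds the detection labels; seconds_objects holds the boxes.
with open("core_metadata.json", "r") as f:
    core = json.load(f)

with open("seconds_objects.json", "r") as f:
    objects = json.load(f)

# objects_by_second is 0-indexed by second; each entry lists detection items.
for second_idx, second_data in enumerate(objects["objects_by_second"]):
    for item in second_data:
        # "d" is the detection ID, used as a key into core["detections"].
        det_id = item["d"]
        detection = core["detections"].get(det_id, {})
        label = detection.get("label", "Unknown")

        # One detection can have several bounding boxes in the same second.
        for bbox in item["b"]:
            print(
                f"Second {second_idx}: {label} "
                f"at ({bbox['x']:.3f}, {bbox['y']:.3f}), "
                f"size {bbox['w']:.3f}x{bbox['h']:.3f}, "
                f"confidence {bbox['c']:.3f}"
            )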