Localized Objects
In addition to face bounding boxes, Valossa AI provides spatial coordinates for certain types of visual detections through the visual.object.localized detection type. Currently, this feature is available for logo detection and may be extended to custom detection models.
How It Works
Localized objects appear in two places:
- Core metadata: The visual.object.localized detection (with label, occurrences, etc.) appears in the detections structure and in the by_second and by_detection_type groupings.
- seconds_objects metadata (or frames_objects metadata): Contains the per-second (or per-frame) bounding box coordinates for these detections.
The split design keeps the Core metadata file manageable in size while providing precise spatial data in a separate download.
Downloading Localized Object Coordinates
Seconds-Based (seconds_objects)
curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID&type=seconds_objects"
Frames-Based (frames_objects)
curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID&type=frames_objects"
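The same requests can be made from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint returns the metadata as a JSON body, as the curl commands above suggest. The helper names build_results_url and download_metadata are hypothetical and not part of any Valossa client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl examples above.
BASE_URL = "https://api-eu.valossa.com/core/1.0/job_results"


def build_results_url(api_key, job_id, metadata_type="seconds_objects"):
    """Build the job_results URL for the requested metadata type (hypothetical helper)."""
    query = urlencode({"api_key": api_key, "job_id": job_id, "type": metadata_type})
    return f"{BASE_URL}?{query}"


def download_metadata(api_key, job_id, metadata_type="seconds_objects", timeout=60):
    """Fetch and parse the metadata JSON. Performs a network request.

    Assumption: the response body is a JSON document.
    """
    url = build_results_url(api_key, job_id, metadata_type)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling download_metadata("YOUR_API_KEY", "JOB_ID", "frames_objects") would request the frames-based variant instead.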
seconds_objects Structure
The objects_by_second array is indexed by second number (0-based). Each second contains an array of detection items.
{
  "version_info": { "metadata_type": "seconds_objects", ... },
  "objects_by_second": [
    [],
    [],
    [
      {
        "d": "137",
        "o": ["314"],
        "b": [
          {
            "x": 0.267,
            "y": 0.747,
            "w": 0.081,
            "h": 0.066,
            "c": 0.995
          },
          {
            "x": 0.498,
            "y": 0.669,
            "w": 0.081,
            "h": 0.069,
            "c": 0.984
          }
        ]
      }
    ]
  ]
}
Detection Item Fields
| Field | Description |
|---|---|
| d | Detection ID (references the visual.object.localized detection in Core metadata) |
| o | Array of occurrence IDs overlapping with this second |
| b | Array of bounding boxes for this detection in this second |
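As an illustration of how the d and o fields relate, the sketch below builds a simple timeline that maps each (detection ID, occurrence ID) pair to the seconds in which it has bounding boxes. The function name seconds_by_occurrence is hypothetical, and the sample data mirrors the structure shown above with the bounding boxes shortened.

```python
from collections import defaultdict


def seconds_by_occurrence(objects_by_second):
    """Map (detection ID, occurrence ID) to the list of second indexes where it appears."""
    timeline = defaultdict(list)
    for second_idx, items in enumerate(objects_by_second):
        for item in items:
            # "o" lists the occurrence IDs that overlap with this second.
            for occurrence_id in item.get("o", []):
                timeline[(item["d"], occurrence_id)].append(second_idx)
    return dict(timeline)


# Sample data shaped like the seconds_objects example (one box kept for brevity).
sample = [
    [],
    [],
    [{"d": "137", "o": ["314"], "b": [{"x": 0.267, "y": 0.747, "w": 0.081, "h": 0.066, "c": 0.995}]}],
]
timeline = seconds_by_occurrence(sample)
```

Here timeline contains a single key, ("137", "314"), whose value is the list of seconds in which that occurrence was localized.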
Bounding Box Fields
| Field | Description |
|---|---|
| x | X offset of the upper-left corner (fraction of frame width) |
| y | Y offset of the upper-left corner (fraction of frame height) |
| w | Width (fraction of frame width) |
| h | Height (fraction of frame height) |
| c | Confidence of the detection at this bounding box location |
All coordinate values are relative to the frame size (0.0 to 1.0). Values may be slightly outside this range when an object is partially off-screen.
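To draw a box on a video frame, the relative values need to be scaled to pixels. The sketch below shows one way to do this, assuming the frame width and height in pixels are known from elsewhere (they are not part of the structure shown above). The function name to_pixel_box is hypothetical. Clamping handles values that fall slightly outside the 0.0 to 1.0 range.

```python
def to_pixel_box(bbox, frame_width, frame_height):
    """Convert a relative bounding box to (left, top, right, bottom) in pixels.

    Coordinates are clamped to the frame, since relative values may fall
    slightly outside 0.0 to 1.0 for partially off-screen objects.
    """
    left = bbox["x"] * frame_width
    top = bbox["y"] * frame_height
    right = (bbox["x"] + bbox["w"]) * frame_width
    bottom = (bbox["y"] + bbox["h"]) * frame_height

    def clamp(value, upper):
        return min(max(value, 0.0), float(upper))

    return (
        round(clamp(left, frame_width)),
        round(clamp(top, frame_height)),
        round(clamp(right, frame_width)),
        round(clamp(bottom, frame_height)),
    )


# First bounding box from the example above, on an assumed 1920x1080 frame.
box = to_pixel_box({"x": 0.267, "y": 0.747, "w": 0.081, "h": 0.066, "c": 0.995}, 1920, 1080)
# box == (513, 807, 668, 878)
```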
Multiple Bounding Boxes
A single detection can have multiple bounding boxes in the same second. For example, if two instances of the same logo appear simultaneously in the frame, the b array will contain two bounding box objects.
In the Core metadata's by_second, the c confidence for a visual.object.localized detection is the highest confidence among all simultaneously observed bounding boxes. To see individual bounding box confidences, read the seconds_objects metadata.
Confidence in Core vs. seconds_objects
| Source | Confidence Meaning |
|---|---|
| Core metadata by_second | Maximum confidence across all bounding boxes in that second |
| seconds_objects bounding box c | Confidence for that specific bounding box instance |
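The relationship in the table can be reproduced from the seconds_objects data alone. The sketch below computes, for one detection ID, the highest bounding box confidence in each second; per the description above, this should correspond to the c value reported for that detection in the Core metadata's by_second. The function name max_confidence_by_second is hypothetical.

```python
def max_confidence_by_second(objects_by_second, detection_id):
    """Return {second index: highest bounding box confidence} for one detection ID."""
    result = {}
    for second_idx, items in enumerate(objects_by_second):
        confidences = [
            bbox["c"]
            for item in items
            if item["d"] == detection_id
            for bbox in item["b"]
        ]
        if confidences:  # skip seconds where the detection has no boxes
            result[second_idx] = max(confidences)
    return result


# Sample data shaped like the seconds_objects example (two boxes in second 2).
sample_seconds = [
    [],
    [],
    [{"d": "137", "o": ["314"], "b": [
        {"x": 0.267, "y": 0.747, "w": 0.081, "h": 0.066, "c": 0.995},
        {"x": 0.498, "y": 0.669, "w": 0.081, "h": 0.069, "c": 0.984},
    ]}],
]
peaks = max_confidence_by_second(sample_seconds, "137")
# peaks == {2: 0.995}
```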
Code Example
Python: Extract Logo Positions
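import json

# Load the Core metadata (labels) and the seconds_objects metadata (coordinates).
with open("core_metadata.json", "r") as f:
    core = json.load(f)
with open("seconds_objects.json", "r") as f:
    objects = json.load(f)

# The index into objects_by_second is the 0-based second number.
for second_idx, second_data in enumerate(objects["objects_by_second"]):
    for item in second_data:
        # "d" references the detection in the Core metadata.
        det_id = item["d"]
        detection = core["detections"].get(det_id, {})
        label = detection.get("label", "Unknown")
        # A detection may have several bounding boxes in the same second.
        for bbox in item["b"]:
            print(
                f"Second {second_idx}: {label} "
                f"at ({bbox['x']:.3f}, {bbox['y']:.3f}), "
                f"size {bbox['w']:.3f}x{bbox['h']:.3f}, "
                f"confidence {bbox['c']:.3f}"
            )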
Related Resources
- Faces & Identity -- Face bounding boxes (frames_faces format)
- Detection Types -- Full list of detection types