Faces and Identity

Face detection in Valossa AI produces human.face detections in the Core metadata. This page covers face identity matching, face grouping, bounding box coordinates, and the specialized frames_faces metadata.

Face Detection Basics

Each detected face in the video is represented as a human.face detection with:

| Field | Description |
| --- | --- |
| label | Always "face" |
| occs | Time segments when the face appears |
| a.gender | Detected gender (value: "male"/"female", c: confidence) |
| a.s_visible | Total screen time in seconds (actual frame-by-frame visibility, usually less than the combined occurrence duration) |
| a.quality | Either "normal" or "low" (low means the face is not frontal enough or is otherwise less reliable) |
| a.under_18_years | Minor detection data: c_max (peak confidence), c_median (median confidence), and intervals (time segments where detection was triggered). Present when the face is estimated to be under 18 years old. Also flagged via categ.tags with "under_18_years". |
| a.similar_to | Array of gallery face matches (if any) |

When a detected face matches one or more faces in a face gallery (Valossa Celebrities Gallery or your custom gallery), the similar_to array contains match details:

{
  "t": "human.face",
  "label": "face",
  "a": {
    "gender": { "c": 0.929, "value": "female" },
    "s_visible": 4.4,
    "similar_to": [
      {
        "c": 0.928,
        "name": "Jane Doe",
        "gallery": { "id": "a3ead7b4-8e84-43ac-9e6b-d1727b05f189" },
        "gallery_face": {
          "id": "f6a728c6-5991-47da-9c17-b5302bfd0aff",
          "name": "Jane Doe"
        }
      }
    ]
  }
}

| Field | Description |
| --- | --- |
| c | Confidence that this face is the named person (0.0 to 1.0) |
| name | Person name |
| gallery.id | UUID of the face gallery |
| gallery_face.id | UUID of the face identity within the gallery |
| gallery_face.name | Name of the person in the gallery |

Matches are sorted by confidence (highest first). A face may match multiple gallery faces with varying confidence levels.

Note: Face occurrences do not have their own c_max confidence. For faces, confidence values appear only in the similar_to items.
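
As a minimal sketch of how the similar_to array might be consumed, the helper below returns the strongest gallery match for one detection. The function name best_match and the 0.5 default threshold are illustrative assumptions, not part of the Valossa API.

```python
def best_match(detection, min_confidence=0.5):
    """Return the highest-confidence similar_to entry, or None.

    best_match and min_confidence are hypothetical names for this sketch.
    """
    matches = detection.get("a", {}).get("similar_to", [])
    if not matches:
        return None
    # Matches are documented as sorted by confidence; max() simply avoids
    # depending on that ordering.
    top = max(matches, key=lambda m: m["c"])
    return top if top["c"] >= min_confidence else None


# The example detection shown above, trimmed to the fields used here.
face = {
    "t": "human.face",
    "label": "face",
    "a": {
        "similar_to": [
            {
                "c": 0.928,
                "name": "Jane Doe",
                "gallery_face": {"id": "f6a728c6-5991-47da-9c17-b5302bfd0aff"},
            }
        ]
    },
}

match = best_match(face)
print(match["name"] if match else "Unknown")  # prints: Jane Doe
```

Raising min_confidence trades recall for precision; a suitable value depends on the gallery and the footage.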

Minor Detection (under_18_years)

When a face is estimated to be under 18 years old, the under_18_years attribute provides confidence scores and the time intervals where the detection was triggered:

{
  "t": "human.face",
  "label": "face",
  "a": {
    "gender": { "value": "male", "c": 0.5 },
    "s_visible": 8.267,
    "quality": "low",
    "under_18_years": {
      "c_max": 0.516,
      "c_median": 0.475,
      "intervals": [
        { "ss": 1834.333, "se": 1840.333, "c_max": 0.516 }
      ]
    }
  },
  "categ": {
    "tags": ["content_compliance", "under_18_years"]
  }
}

| Field | Description |
| --- | --- |
| c_max | Peak confidence that the face is under 18 across all intervals |
| c_median | Median confidence across all intervals |
| intervals | Array of time segments with ss (start), se (end), and c_max (peak confidence in that interval) |

The detection is also flagged in categ.tags with both "content_compliance" and "under_18_years" tags.
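
The attribute can be scanned across all face detections with a short loop. The sketch below assumes the Core metadata keeps detections in a dictionary keyed by detection ID; the function name, the 0.5 threshold, and the detection ID "12" in the sample are hypothetical.

```python
def minor_intervals(detections, min_confidence=0.5):
    """Yield (det_id, ss, se, c_max) for under-18 intervals at or above a threshold.

    `detections` is assumed to be the Core metadata "detections" dict,
    keyed by detection ID. min_confidence is an illustrative default.
    """
    for det_id, det in detections.items():
        if det.get("t") != "human.face":
            continue
        minor = det.get("a", {}).get("under_18_years")
        if not minor:
            continue  # attribute is absent when no minor was estimated
        for interval in minor.get("intervals", []):
            if interval["c_max"] >= min_confidence:
                yield det_id, interval["ss"], interval["se"], interval["c_max"]


# Sample data based on the example above; the ID "12" is made up.
sample_detections = {
    "12": {
        "t": "human.face",
        "label": "face",
        "a": {
            "under_18_years": {
                "c_max": 0.516,
                "c_median": 0.475,
                "intervals": [{"ss": 1834.333, "se": 1840.333, "c_max": 0.516}],
            }
        },
    }
}

for det_id, ss, se, c in minor_intervals(sample_detections):
    print(f"detection {det_id}: {ss:.3f}s to {se:.3f}s (c_max {c})")
```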

Multiple Detections of the Same Person

The AI may create multiple human.face detections for the same person if the face appears sufficiently different across the video (different angles, lighting, etc.). Each of these detections can independently match the same gallery face via similar_to.

Merged Occurrences (similar_to_face_id)

To make it easy to find all appearances of a recognized person, the by_detection_property grouping merges occurrences across all face detections that match the same gallery face.

{
  "detection_groupings": {
    "by_detection_property": {
      "human.face": {
        "similar_to_face_id": {
          "cb6f580b-fa3f-4ed4-94b6-ec88c6267143": {
            "moccs": [
              { "ss": 5.0, "se": 10.0 },
              { "ss": 21.0, "se": 35.0 },
              { "ss": 64.0, "se": 88.0 }
            ],
            "det_ids": ["3", "4"]
          }
        }
      }
    }
  }
}

| Field | Description |
| --- | --- |
| Key (UUID) | Gallery face ID |
| moccs | Merged occurrences from all matching face detections |
| det_ids | Array of detection IDs that matched this gallery face |

Use det_ids to look up the original detections and read the person's name from similar_to.
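
The lookup described above could be sketched as follows: for each gallery face ID, resolve the name through det_ids and sum the merged occurrence durations. The function name gallery_appearances and the sample detection contents (including the name) are assumptions for illustration.

```python
def gallery_appearances(core):
    """Map gallery face ID -> {"name": ..., "seconds": ...} from Core metadata.

    gallery_appearances is a hypothetical helper, not a Valossa API.
    """
    groups = (
        core.get("detection_groupings", {})
        .get("by_detection_property", {})
        .get("human.face", {})
        .get("similar_to_face_id", {})
    )
    detections = core.get("detections", {})
    result = {}
    for face_id, group in groups.items():
        name = None
        # Any detection in det_ids can supply the name; stop at the first hit.
        for det_id in group.get("det_ids", []):
            matches = detections.get(det_id, {}).get("a", {}).get("similar_to", [])
            for match in matches:
                if match.get("gallery_face", {}).get("id") == face_id:
                    name = match.get("name")
                    break
            if name is not None:
                break
        seconds = sum(occ["se"] - occ["ss"] for occ in group.get("moccs", []))
        result[face_id] = {"name": name, "seconds": seconds}
    return result


FACE_ID = "cb6f580b-fa3f-4ed4-94b6-ec88c6267143"
sample_core = {
    "detections": {
        # Hypothetical detection that matched the gallery face.
        "3": {
            "t": "human.face",
            "a": {"similar_to": [
                {"c": 0.9, "name": "Jane Doe", "gallery_face": {"id": FACE_ID}}
            ]},
        },
    },
    "detection_groupings": {"by_detection_property": {"human.face": {
        "similar_to_face_id": {FACE_ID: {
            "moccs": [
                {"ss": 5.0, "se": 10.0},
                {"ss": 21.0, "se": 35.0},
                {"ss": 64.0, "se": 88.0},
            ],
            "det_ids": ["3", "4"],
        }}
    }}},
}

print(gallery_appearances(sample_core)[FACE_ID])
```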

Minor Grouping (refined_from_multiple_detection_types)

The by_detection_property grouping also includes a refined_from_multiple_detection_types section that aggregates minor detections across detection types:

{
  "by_detection_property": {
    "human.face": {
      "similar_to_face_id": { ... }
    },
    "refined_from_multiple_detection_types": {
      "people_under_18_years": {
        "human.face": [
          {
            "det_id": "6",
            "intervals": [
              { "ss": 3151.667, "se": 3152.667, "c_max": 0.961 },
              { "ss": 3155.667, "se": 3211.133, "c_max": 0.979 }
            ]
          }
        ],
        "visual.context": []
      }
    }
  }
}

| Field | Description |
| --- | --- |
| people_under_18_years | Groups detections where a person under 18 was identified |
| human.face / visual.context | Arrays of detections per type that contributed to the minor detection |
| det_id | Detection ID |
| intervals | Time segments with ss (start), se (end), and c_max (peak confidence) |

This grouping allows you to find all under-18 detections in a single lookup, regardless of which detection type produced them.
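
A minimal sketch of that single lookup is shown below. It assumes by_detection_property sits under detection_groupings, as in the merged-occurrences example earlier; the function name is hypothetical.

```python
def under_18_intervals(core):
    """Return sorted (ss, se, c_max, detection_type, det_id) tuples.

    Assumes by_detection_property is nested under detection_groupings.
    """
    grouping = (
        core.get("detection_groupings", {})
        .get("by_detection_property", {})
        .get("refined_from_multiple_detection_types", {})
        .get("people_under_18_years", {})
    )
    rows = []
    for det_type, entries in grouping.items():
        for entry in entries:
            for interval in entry.get("intervals", []):
                rows.append((
                    interval["ss"], interval["se"], interval["c_max"],
                    det_type, entry["det_id"],
                ))
    return sorted(rows)  # chronological order by start time


sample_grouping = {"detection_groupings": {"by_detection_property": {
    "refined_from_multiple_detection_types": {"people_under_18_years": {
        "human.face": [{
            "det_id": "6",
            "intervals": [
                {"ss": 3151.667, "se": 3152.667, "c_max": 0.961},
                {"ss": 3155.667, "se": 3211.133, "c_max": 0.979},
            ],
        }],
        "visual.context": [],
    }}
}}}

for row in under_18_intervals(sample_grouping):
    print(row)
```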

Face Groups

human.face_group detections group faces that have high temporal correlation, meaning they likely appear together and may be interacting.

Per-Second Face Data

In the by_second structure, face detections include additional per-second attributes:

Face Size

{
  "d": "1",
  "o": ["1"],
  "a": {
    "sz": { "h": 0.188 }
  }
}

The h field is the face height as a fraction of the video frame height (1.0 = full frame height). The value is measured at the first frame within that second where the face is detected.
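
To illustrate the fraction-to-pixels arithmetic, the sketch below converts sz.h into pixel heights for one detection. It assumes by_second is a list with one entry per second, each entry being a list of items shaped like the example above; that layout, the function name, and the 1080-pixel frame height are assumptions.

```python
def face_heights_px(by_second, frame_height_px, det_id):
    """Return {second_index: face height in pixels} for one detection.

    Assumes by_second[i] is the list of per-second items for second i.
    """
    heights = {}
    for second, items in enumerate(by_second):
        for item in items:
            h = item.get("a", {}).get("sz", {}).get("h")
            if item.get("d") == det_id and h is not None:
                # h is a fraction of the frame height (1.0 = full height).
                heights[second] = round(h * frame_height_px)
    return heights


# Second 0 has no detections; second 1 holds the example item from above.
sample_by_second = [
    [],
    [{"d": "1", "o": ["1"], "a": {"sz": {"h": 0.188}}}],
]

print(face_heights_px(sample_by_second, 1080, "1"))  # prints: {1: 203}
```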

Face Emotions

When face emotion analysis is enabled, the by_second data includes sentiment and emotion data. See Sentiment & Emotion for details.

Face Bounding Boxes (frames_faces Metadata)

For per-frame face coordinates, download the frames_faces metadata:

curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID&type=frames_faces"
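
The same request can be issued from Python with the standard library. This is a sketch only: the endpoint and query parameters are taken from the curl command above, while the helper names and the output path are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api-eu.valossa.com/core/1.0/job_results"


def job_results_url(api_key, job_id, metadata_type="frames_faces"):
    """Build the job_results URL used in the curl example."""
    query = urlencode({"api_key": api_key, "job_id": job_id, "type": metadata_type})
    return f"{BASE_URL}?{query}"


def download_metadata(api_key, job_id, path="frames_faces.json"):
    """Fetch the metadata and write the raw response body to `path`."""
    with urlopen(job_results_url(api_key, job_id)) as response:
        body = response.read()
    with open(path, "wb") as out:
        out.write(body)


print(job_results_url("YOUR_API_KEY", "JOB_ID"))
```

Using urlencode keeps the query string correctly escaped if a key or job ID ever contains reserved characters.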

Structure

The frames_faces metadata contains a faces_by_frame array indexed by frame number (0-based):

{
  "version_info": { "metadata_type": "frames_faces", ... },
  "faces_by_frame": [
    [],
    [],
    [
      {
        "id": "1",
        "x": 0.445,
        "y": 0.194,
        "w": 0.120,
        "h": 0.214
      }
    ],
    [
      {
        "id": "1",
        "x": 0.435,
        "y": 0.196,
        "w": 0.120,
        "h": 0.215
      },
      {
        "id": "5",
        "x": 0.338,
        "y": 0.239,
        "w": 0.205,
        "h": 0.399
      }
    ]
  ]
}

Bounding Box Fields

| Field | Description |
| --- | --- |
| id | Detection ID matching the human.face detection in Core metadata |
| x | X offset of the upper-left corner (fraction of frame width, 0.0 to 1.0) |
| y | Y offset of the upper-left corner (fraction of frame height, 0.0 to 1.0) |
| w | Width of the bounding box (fraction of frame width) |
| h | Height of the bounding box (fraction of frame height) |

All coordinate values are relative to the frame dimensions. Values may be slightly less than 0.0 or greater than 1.0 when a face is partially outside the frame.
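
Because values can fall slightly outside the 0.0 to 1.0 range, code that draws or crops boxes usually needs to clamp them. The sketch below converts a relative box into integer pixel corners; the function name and the frame sizes are illustrative assumptions.

```python
def to_pixel_box(face, frame_w, frame_h):
    """Convert a relative frames_faces box to clamped (left, top, right, bottom) pixels."""

    def clamp(value, upper):
        # Keep the rounded coordinate inside [0, upper].
        return max(0, min(upper, round(value)))

    left = face["x"] * frame_w
    top = face["y"] * frame_h
    right = (face["x"] + face["w"]) * frame_w
    bottom = (face["y"] + face["h"]) * frame_h
    return (
        clamp(left, frame_w),
        clamp(top, frame_h),
        clamp(right, frame_w),
        clamp(bottom, frame_h),
    )


# First box from the structure example, on an assumed 1920x1080 frame.
box = {"id": "1", "x": 0.445, "y": 0.194, "w": 0.120, "h": 0.214}
print(to_pixel_box(box, 1920, 1080))  # prints: (854, 210, 1085, 441)

# A hypothetical face partly outside the frame is clipped to the frame edges.
partial = {"id": "9", "x": -0.05, "y": 0.9, "w": 0.2, "h": 0.2}
print(to_pixel_box(partial, 1000, 1000))  # prints: (0, 900, 150, 1000)
```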

Code Example: Reading Face Bounding Boxes

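import json

# Load the per-frame face boxes and the Core metadata for the same job.
with open("frames_faces.json", "r") as f:
    faces_metadata = json.load(f)

with open("core_metadata.json", "r") as f:
    core = json.load(f)

# The frame rate converts a 0-based frame index into a timestamp in seconds.
fps = core["media_info"]["technical"]["fps"]

for frame_idx, faces in enumerate(faces_metadata["faces_by_frame"]):
    if not faces:
        continue  # no faces detected in this frame
    time_s = frame_idx / fps
    for face in faces:
        # The box id is the detection ID of a human.face detection.
        det = core["detections"].get(face["id"], {})
        matches = det.get("a", {}).get("similar_to", [])
        # Matches are sorted by confidence, so the first one is the best.
        name = matches[0]["name"] if matches else "Unknown"
        print(f"Frame {frame_idx} ({time_s:.2f}s): {name} at ({face['x']:.3f}, {face['y']:.3f}), size {face['w']:.3f}x{face['h']:.3f}")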