Faces and Identity
Face detection in Valossa AI produces human.face detections in the Core metadata. This page covers face identity matching, face grouping, bounding box coordinates, and the specialized frames_faces metadata.
Face Detection Basics
Each detected face in the video is represented as a human.face detection with:
| Field | Description |
|---|---|
| label | Always "face" |
| occs | Time segments when the face appears |
| a.gender | Detected gender (value: "male" or "female", c: confidence) |
| a.s_visible | Total screen time in seconds (actual frame-by-frame visibility; usually less than the combined duration of the occurrences) |
| a.quality | "normal" or "low" ("low" means the face is not frontal enough, or is otherwise less reliable) |
| a.under_18_years | Minor-detection data: c_max (peak confidence), c_median (median confidence) and intervals (time segments where the detection was triggered). Present only when the face is estimated to be under 18 years old; also flagged via categ.tags with "under_18_years". |
| a.similar_to | Array of gallery face matches, if any |
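The fields above can be read with a short filter over the detections object. A minimal sketch, assuming Core metadata has been loaded into a dict named core (the inline sample data is illustrative, not real output):

```python
# Illustrative sample of Core metadata; in practice, load it with json.load().
core = {
    "detections": {
        "1": {
            "t": "human.face",
            "label": "face",
            "a": {"gender": {"value": "female", "c": 0.93},
                  "s_visible": 4.4,
                  "quality": "normal"},
        },
        "2": {"t": "visual.context", "label": "nature"},
    }
}

# Keep only face detections (type "human.face").
faces = {det_id: det for det_id, det in core["detections"].items()
         if det["t"] == "human.face"}

for det_id, det in faces.items():
    a = det.get("a", {})
    gender = a.get("gender", {}).get("value", "unknown")
    print(f"Face {det_id}: gender={gender}, "
          f"visible {a.get('s_visible', 0)}s, quality={a.get('quality', 'normal')}")
```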
Gallery Matching (similar_to)
When a detected face matches one or more faces in a face gallery (Valossa Celebrities Gallery or your custom gallery), the similar_to array contains match details:
{
"t": "human.face",
"label": "face",
"a": {
"gender": { "c": 0.929, "value": "female" },
"s_visible": 4.4,
"similar_to": [
{
"c": 0.928,
"name": "Jane Doe",
"gallery": { "id": "a3ead7b4-8e84-43ac-9e6b-d1727b05f189" },
"gallery_face": {
"id": "f6a728c6-5991-47da-9c17-b5302bfd0aff",
"name": "Jane Doe"
}
}
]
}
}
| Field | Description |
|---|---|
| c | Confidence that this face is the named person (0.0 to 1.0) |
| name | Person name |
| gallery.id | UUID of the face gallery |
| gallery_face.id | UUID of the face identity within the gallery |
| gallery_face.name | Name of the person in the gallery |
Matches are sorted by confidence (highest first). A face may match multiple gallery faces with varying confidence levels.
Note that face occurrences carry no c_max confidence of their own; for faces, confidence values appear only in the similar_to items.
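Because matches are sorted highest-confidence first, the best gallery match is simply the first similar_to item. A minimal sketch (the helper name best_match and the confidence threshold are our own, not part of the API):

```python
def best_match(face_det, min_confidence=0.5):
    """Return (name, confidence) of the most confident gallery match, or None.

    similar_to is sorted by confidence, so the first item is the best match.
    """
    matches = face_det.get("a", {}).get("similar_to", [])
    if matches and matches[0]["c"] >= min_confidence:
        return matches[0]["name"], matches[0]["c"]
    return None

# Sample detection shaped like the JSON above.
sample = {
    "t": "human.face",
    "a": {
        "similar_to": [
            {"c": 0.928, "name": "Jane Doe",
             "gallery_face": {"id": "f6a728c6-5991-47da-9c17-b5302bfd0aff",
                              "name": "Jane Doe"}}
        ]
    },
}
print(best_match(sample))  # -> ('Jane Doe', 0.928)
```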
Minor Detection (under_18_years)
When a face is estimated to be under 18 years old, the under_18_years attribute provides confidence scores and the time intervals where the detection was triggered:
{
"t": "human.face",
"label": "face",
"a": {
"gender": { "value": "male", "c": 0.5 },
"s_visible": 8.267,
"quality": "low",
"under_18_years": {
"c_max": 0.516,
"c_median": 0.475,
"intervals": [
{ "ss": 1834.333, "se": 1840.333, "c_max": 0.516 }
]
}
},
"categ": {
"tags": ["content_compliance", "under_18_years"]
}
}
| Field | Description |
|---|---|
| c_max | Peak confidence that the face is under 18, across all intervals |
| c_median | Median confidence across all intervals |
| intervals | Array of time segments with ss (start), se (end) and c_max (peak confidence within that interval) |
The detection is also flagged in categ.tags with both "content_compliance" and "under_18_years" tags.
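For compliance screening, you typically want only the intervals whose peak confidence clears a threshold of your choosing. A minimal sketch (the helper name minor_intervals and the default threshold are our own):

```python
def minor_intervals(det, threshold=0.5):
    """Return under-18 intervals whose peak confidence meets the threshold."""
    u18 = det.get("a", {}).get("under_18_years")
    if not u18:
        return []
    return [iv for iv in u18["intervals"] if iv["c_max"] >= threshold]

# Sample detection shaped like the JSON above.
sample = {
    "t": "human.face",
    "a": {
        "under_18_years": {
            "c_max": 0.516,
            "c_median": 0.475,
            "intervals": [{"ss": 1834.333, "se": 1840.333, "c_max": 0.516}],
        }
    },
}
print(minor_intervals(sample))
```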
Multiple Detections of the Same Person
The AI may create multiple human.face detections for the same person if the face appears sufficiently different across the video (different angles, lighting, etc.). Each of these detections can independently match the same gallery face via similar_to.
Merged Occurrences (similar_to_face_id)
To make it easy to find all appearances of a recognized person, the by_detection_property grouping merges occurrences across all face detections that match the same gallery face.
{
"detection_groupings": {
"by_detection_property": {
"human.face": {
"similar_to_face_id": {
"cb6f580b-fa3f-4ed4-94b6-ec88c6267143": {
"moccs": [
{ "ss": 5.0, "se": 10.0 },
{ "ss": 21.0, "se": 35.0 },
{ "ss": 64.0, "se": 88.0 }
],
"det_ids": ["3", "4"]
}
}
}
}
}
}
| Field | Description |
|---|---|
| Key (UUID) | Gallery face ID |
| moccs | Merged occurrences from all matching face detections |
| det_ids | Array of detection IDs that matched this gallery face |
Use det_ids to look up the original detections and read the person's name from similar_to.
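Putting those two steps together, you can map each recognized person to their total merged screen time. A minimal sketch, assuming Core metadata loaded as a dict named core (the sample data mirrors the JSON above and is illustrative):

```python
def person_appearances(core):
    """Resolve similar_to_face_id groups to person names and total merged time."""
    groups = (core["detection_groupings"]["by_detection_property"]
              ["human.face"]["similar_to_face_id"])
    out = {}
    for face_id, group in groups.items():
        # Any detection listed in det_ids carries the person's name in similar_to.
        det = core["detections"][group["det_ids"][0]]
        name = det["a"]["similar_to"][0]["name"]
        out[name] = sum(m["se"] - m["ss"] for m in group["moccs"])
    return out

core = {
    "detections": {
        "3": {"t": "human.face",
              "a": {"similar_to": [{"c": 0.9, "name": "Jane Doe"}]}},
        "4": {"t": "human.face",
              "a": {"similar_to": [{"c": 0.8, "name": "Jane Doe"}]}},
    },
    "detection_groupings": {
        "by_detection_property": {
            "human.face": {
                "similar_to_face_id": {
                    "cb6f580b-fa3f-4ed4-94b6-ec88c6267143": {
                        "moccs": [{"ss": 5.0, "se": 10.0},
                                  {"ss": 21.0, "se": 35.0},
                                  {"ss": 64.0, "se": 88.0}],
                        "det_ids": ["3", "4"],
                    }
                }
            }
        }
    },
}
print(person_appearances(core))  # {'Jane Doe': 43.0}
```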
Minor Grouping (refined_from_multiple_detection_types)
The by_detection_property grouping also includes a refined_from_multiple_detection_types section that aggregates minor detections across detection types:
{
"by_detection_property": {
"human.face": {
"similar_to_face_id": { ... }
},
"refined_from_multiple_detection_types": {
"people_under_18_years": {
"human.face": [
{
"det_id": "6",
"intervals": [
{ "ss": 3151.667, "se": 3152.667, "c_max": 0.961 },
{ "ss": 3155.667, "se": 3211.133, "c_max": 0.979 }
]
}
],
"visual.context": []
}
}
}
}
| Field | Description |
|---|---|
| people_under_18_years | Groups detections in which a person under 18 years old was identified |
| human.face / visual.context | Per-type arrays of detections that contributed to the minor detection |
| det_id | Detection ID |
| intervals | Time segments with ss (start), se (end) and c_max (peak confidence) |
This grouping allows you to find all under-18 detections in a single lookup, regardless of which detection type produced them.
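That single lookup can be flattened into a simple list of rows for reporting. A minimal sketch (the helper name all_minor_intervals is our own; the sample mirrors the JSON above):

```python
def all_minor_intervals(by_detection_property):
    """Flatten people_under_18_years into (type, det_id, ss, se, c_max) rows."""
    refined = by_detection_property["refined_from_multiple_detection_types"]
    rows = []
    for det_type, entries in refined["people_under_18_years"].items():
        for entry in entries:
            for iv in entry["intervals"]:
                rows.append((det_type, entry["det_id"],
                             iv["ss"], iv["se"], iv["c_max"]))
    return rows

sample = {
    "refined_from_multiple_detection_types": {
        "people_under_18_years": {
            "human.face": [
                {"det_id": "6",
                 "intervals": [
                     {"ss": 3151.667, "se": 3152.667, "c_max": 0.961},
                     {"ss": 3155.667, "se": 3211.133, "c_max": 0.979},
                 ]}
            ],
            "visual.context": [],
        }
    }
}
print(all_minor_intervals(sample))
```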
Face Groups
human.face_group detections group faces that have high temporal correlation, meaning they likely appear together and may be interacting.
Per-Second Face Data
In the by_second structure, face detections include additional per-second attributes:
Face Size
{
"d": "1",
"o": ["1"],
"a": {
"sz": { "h": 0.188 }
}
}
The h field is the face height as a fraction of the video frame height (1.0 = full frame height). The value is measured at the first frame within that second where the face is detected.
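To turn the fraction into pixels, multiply by the frame height. A minimal worked example (the 1080 px frame height is an assumption for illustration):

```python
frame_height_px = 1080               # assumed frame height of the source video
face_sz_h = 0.188                    # sz.h from the by_second face entry above
face_height_px = round(face_sz_h * frame_height_px)
print(face_height_px)  # 203
```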
Face Emotions
When face emotion analysis is enabled, the by_second data includes sentiment and emotion data. See Sentiment & Emotion for details.
Face Bounding Boxes (frames_faces Metadata)
For per-frame face coordinates, download the frames_faces metadata:
curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID&type=frames_faces"
Structure
The frames_faces metadata contains a faces_by_frame array indexed by frame number (0-based):
{
"version_info": { "metadata_type": "frames_faces", ... },
"faces_by_frame": [
[],
[],
[
{
"id": "1",
"x": 0.445,
"y": 0.194,
"w": 0.120,
"h": 0.214
}
],
[
{
"id": "1",
"x": 0.435,
"y": 0.196,
"w": 0.120,
"h": 0.215
},
{
"id": "5",
"x": 0.338,
"y": 0.239,
"w": 0.205,
"h": 0.399
}
]
]
}
Bounding Box Fields
| Field | Description |
|---|---|
| id | Detection ID matching the human.face detection in the Core metadata |
| x | X offset of the upper-left corner (fraction of frame width, 0.0 to 1.0) |
| y | Y offset of the upper-left corner (fraction of frame height, 0.0 to 1.0) |
| w | Width of the bounding box (fraction of frame width) |
| h | Height of the bounding box (fraction of frame height) |
All coordinate values are relative to the frame dimensions. Values may be slightly less than 0.0 or greater than 1.0 when a face is partially outside the frame.
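When drawing overlays, convert the normalized box to pixel coordinates and clamp to the frame, since values can spill slightly outside it. A minimal sketch (the helper name to_pixels and the 1920x1080 frame size are our own assumptions):

```python
def to_pixels(box, frame_w, frame_h):
    """Convert a normalized frames_faces box to clamped pixel corner coordinates."""
    x1 = max(0.0, box["x"]) * frame_w
    y1 = max(0.0, box["y"]) * frame_h
    x2 = min(1.0, box["x"] + box["w"]) * frame_w
    y2 = min(1.0, box["y"] + box["h"]) * frame_h
    return (round(x1), round(y1), round(x2), round(y2))

# Box taken from the frames_faces sample above, on an assumed 1920x1080 frame.
print(to_pixels({"x": 0.445, "y": 0.194, "w": 0.120, "h": 0.214}, 1920, 1080))
```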
Code Example: Reading Face Bounding Boxes
import json

with open("frames_faces.json", "r") as f:
    faces_metadata = json.load(f)
with open("core_metadata.json", "r") as f:
    core = json.load(f)

fps = core["media_info"]["technical"]["fps"]

for frame_idx, faces in enumerate(faces_metadata["faces_by_frame"]):
    if not faces:
        continue
    time_s = frame_idx / fps
    for face in faces:
        det = core["detections"].get(face["id"], {})
        name = "Unknown"
        # Guard against faces with no gallery match (similar_to absent or empty).
        if det.get("a", {}).get("similar_to"):
            name = det["a"]["similar_to"][0]["name"]
        print(f"Frame {frame_idx} ({time_s:.2f}s): {name} "
              f"at ({face['x']:.3f}, {face['y']:.3f}), "
              f"size {face['w']:.3f}x{face['h']:.3f}")
Related Resources
- Face Training API -- Create custom face galleries
- Face Recognition Guide -- End-to-end face recognition workflow
- Sentiment & Emotion -- Face emotion and valence data