Face Recognition Guide

This guide covers detecting faces in videos, identifying known people, and training custom face galleries for your own face identities.

Quick Exploration with Metadata Reader

Before writing custom code, you can quickly inspect face detections using the Metadata Reader CLI tool:

# List face detections with identity matches
python -m metareader list-detections --type "human.face" core_metadata.json

# See when each face appears in the video
python -m metareader list-occurrences --type "human.face" core_metadata.json

# Generate facial sentiment charts (requires matplotlib)
python -m metareader plot --sentiment core_metadata.json

How Face Recognition Works

  1. Valossa AI detects all visible faces in the video and creates human.face detections.
  2. Each detected face is compared against face galleries (Valossa Celebrities Gallery and/or your custom galleries).
  3. Matches are reported in the similar_to field with confidence scores (see the example below).
  4. Face grouping (human.face_group) identifies people who appear together.
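
For orientation, a single human.face detection looks roughly like this in the metadata. The field names match the code examples below; the name, IDs, and values are placeholders, not real output:

{
  "a": {
    "gender": {"value": "female", "c": 0.98},
    "s_visible": 42.5,
    "quality": "normal",
    "similar_to": [
      {
        "name": "Jane Example",
        "c": 0.91,
        "gallery": {"id": "my_gallery"},
        "gallery_face": {"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"}
      }
    ]
  }
}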

Step 1: Analyze a Video

import requests
import time

# Submit the video for analysis
response = requests.post(
    "https://api-eu.valossa.com/core/1.0/new_job",
    json={
        "api_key": "YOUR_API_KEY",
        "media": {
            "video": {"url": "https://example.com/video.mp4"}
        }
    }
)
job_id = response.json()["job_id"]

# Wait for completion
while True:
    status = requests.get(
        "https://api-eu.valossa.com/core/1.0/job_status",
        params={"api_key": "YOUR_API_KEY", "job_id": job_id}
    ).json()
    if status["status"] == "finished":
        break
    time.sleep(10)

# Download the core metadata
metadata = requests.get(
    "https://api-eu.valossa.com/core/1.0/job_results",
    params={"api_key": "YOUR_API_KEY", "job_id": job_id}
).json()
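
Saving the result to disk lets you inspect it with the Metadata Reader commands shown earlier (the core_metadata.json filename matches those examples):

import json

# Write the metadata to a file for the Metadata Reader CLI
with open("core_metadata.json", "w") as f:
    json.dump(metadata, f)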

Step 2: List All Detected Faces

face_ids = metadata["detection_groupings"]["by_detection_type"].get("human.face", [])

print(f"Total faces detected: {len(face_ids)}")

for det_id in face_ids:
    detection = metadata["detections"][det_id]
    attrs = detection.get("a", {})

    gender = attrs.get("gender", {})
    screen_time = attrs.get("s_visible", 0)
    quality = attrs.get("quality", "normal")

    print(f"\nFace (ID: {det_id}):")
    print(f"  Gender: {gender.get('value', 'N/A')} (confidence: {gender.get('c', 0):.2f})")
    print(f"  Screen time: {screen_time:.1f}s")
    print(f"  Quality: {quality}")

    # Check for identity matches
    if "similar_to" in attrs:
        for match in attrs["similar_to"]:
            print(f"  Identified as: {match['name']} (confidence: {match['c']:.2f})")
            if "gallery" in match:
                print(f"    Gallery: {match['gallery']['id']}")
    else:
        print("  Identity: Unknown (no gallery match)")

Step 3: Find All Appearances of a Recognized Person

Use the similar_to_face_id grouping for merged occurrences:

face_property = (
    metadata["detection_groupings"]
    .get("by_detection_property", {})
    .get("human.face", {})
    .get("similar_to_face_id", {})
)

for face_uuid, data in face_property.items():
    # Look up the name from one of the detections
    det_id = data["det_ids"][0]
    detection = metadata["detections"][det_id]
    name = "Unknown"
    for match in detection.get("a", {}).get("similar_to", []):
        if match.get("gallery_face", {}).get("id") == face_uuid:
            name = match["name"]
            break

    total_time = sum(m["se"] - m["ss"] for m in data["moccs"])
    print(f"\n{name} (gallery face: {face_uuid}):")
    print(f"  Total appearances: {len(data['moccs'])}")
    print(f"  Total time: {total_time:.1f}s")
    print(f"  Detection IDs: {data['det_ids']}")
    for mocc in data["moccs"]:
        print(f"  Segment: {mocc['ss']:.1f}s - {mocc['se']:.1f}s")

Step 4: Get Face Bounding Boxes (Optional)

For spatial face coordinates, download the frames_faces metadata:

faces_metadata = requests.get(
    "https://api-eu.valossa.com/core/1.0/job_results",
    params={
        "api_key": "YOUR_API_KEY",
        "job_id": job_id,
        "type": "frames_faces"
    }
).json()

fps = metadata["media_info"]["technical"]["fps"]

# Get bounding boxes at a specific time
target_second = 30
start_frame = int(target_second * fps)

for frame_offset in range(int(fps)):
    frame_idx = start_frame + frame_offset
    if frame_idx < len(faces_metadata["faces_by_frame"]):
        faces = faces_metadata["faces_by_frame"][frame_idx]
        for face in faces:
            print(f"Frame {frame_idx}: Face {face['id']} at ({face['x']:.3f}, {face['y']:.3f}), size {face['w']:.3f}x{face['h']:.3f}")

Training Custom Face Galleries

To recognize people not in the Valossa Celebrities Gallery, create a custom gallery and upload reference images.

Using the Training API

  1. Create a gallery (optional; skip this step to use the default gallery):
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
        "api_key": "YOUR_API_KEY",
        "name": "My Team Gallery"
      }' \
  https://api-eu.valossa.com/training/1.0/create_face_gallery
  2. Upload face images for each person. You can upload through the Training API (see its documentation for details), upload images in Valossa Portal, or use the download-from-URL functionality. The example below uploads a local image file:
curl \
  -F "api_key=YOUR_API_KEY" \
  -F "image_data=@ricky_1.jpg" \
  https://api-eu.valossa.com/training/1.0/upload_image

The return value contains a file reference that begins with valossaupload://.

Remember to create face identities and assign each image (by its file reference) to a face identity; see the Training API documentation for details.

  3. Use the gallery in analysis by specifying it in your new_job request, as sketched below.
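
A minimal sketch of such a request. The "face_galleries" field and the gallery ID value are hypothetical, used only for illustration; check the Training API documentation for the actual way to reference a gallery in new_job:

import requests

response = requests.post(
    "https://api-eu.valossa.com/core/1.0/new_job",
    json={
        "api_key": "YOUR_API_KEY",
        "media": {
            "video": {"url": "https://example.com/video.mp4"}
        },
        "face_galleries": ["MY_GALLERY_ID"]  # hypothetical field name
    }
)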

Using Valossa Portal

You can also manage face galleries through the Valossa Portal GUI:

  • Navigate to the face gallery management section
  • Create galleries, upload face images, and name identities visually
  • Use the Tag & Train feature in Valossa Report to train faces directly from analysis results

Tips for Best Results

  • Upload multiple photos per person with different angles, lighting, and expressions.
  • Use sharp, well-lit images in which the face is clearly visible.
  • Higher quality images produce more accurate recognition.