Face Recognition Guide

This guide covers detecting faces in videos, identifying known people, and training custom face galleries for your own face identities.

Quick Exploration with Metadata Reader

Before writing custom code, you can quickly inspect face detections using the Metadata Reader CLI tool:

# List face detections with identity matches
python -m metareader list-detections --type "human.face" core_metadata.json

# See when each face appears in the video
python -m metareader list-occurrences --type "human.face" core_metadata.json

# Generate facial sentiment charts (requires matplotlib)
python -m metareader plot --sentiment core_metadata.json

How Face Recognition Works

  1. Valossa AI detects all visible faces in the video and creates human.face detections.
  2. Each detected face is compared against face galleries (Valossa Celebrities Gallery and/or your custom galleries).
  3. Matches are reported in the similar_to field with confidence scores (see the example below).
  4. Face grouping (human.face_group) identifies people who appear together.
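
For orientation, a single human.face detection looks roughly like this in the metadata. The field names match the code examples below; the name, IDs, and values are placeholders, not real output:

{
  "a": {
    "gender": {"value": "female", "c": 0.98},
    "s_visible": 42.5,
    "quality": "normal",
    "similar_to": [
      {
        "name": "Jane Example",
        "c": 0.91,
        "gallery": {"id": "my_gallery"},
        "gallery_face": {"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"}
      }
    ]
  }
}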

Step 1: Analyze a Video

import requests
import time

# Submit the video for analysis
response = requests.post(
    "https://api-eu.valossa.com/core/1.0/new_job",
    json={
        "api_key": "YOUR_API_KEY",
        "media": {
            "video": {"url": "https://example.com/video.mp4"}
        }
    }
)
job_id = response.json()["job_id"]

# Wait for completion
while True:
    status = requests.get(
        "https://api-eu.valossa.com/core/1.0/job_status",
        params={"api_key": "YOUR_API_KEY", "job_id": job_id}
    ).json()
    if status["status"] == "finished":
        break
    time.sleep(10)

# Download the core metadata
metadata = requests.get(
    "https://api-eu.valossa.com/core/1.0/job_results",
    params={"api_key": "YOUR_API_KEY", "job_id": job_id}
).json()
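
Saving the result to disk lets you inspect it with the Metadata Reader commands shown earlier (the core_metadata.json filename matches those examples):

import json

# Write the metadata to a file for the Metadata Reader CLI
with open("core_metadata.json", "w") as f:
    json.dump(metadata, f)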

Step 2: List All Detected Faces

face_ids = metadata["detection_groupings"]["by_detection_type"].get("human.face", [])

print(f"Total faces detected: {len(face_ids)}")

for det_id in face_ids:
    detection = metadata["detections"][det_id]
    attrs = detection.get("a", {})

    gender = attrs.get("gender", {})
    screen_time = attrs.get("s_visible", 0)
    quality = attrs.get("quality", "normal")

    print(f"\nFace (ID: {det_id}):")
    print(f"  Gender: {gender.get('value', 'N/A')} (confidence: {gender.get('c', 0):.2f})")
    print(f"  Screen time: {screen_time:.1f}s")
    print(f"  Quality: {quality}")

    # Check for identity matches
    if "similar_to" in attrs:
        for match in attrs["similar_to"]:
            print(f"  Identified as: {match['name']} (confidence: {match['c']:.2f})")
            if "gallery" in match:
                print(f"    Gallery: {match['gallery']['id']}")
    else:
        print("  Identity: Unknown (no gallery match)")

Step 3: Find All Appearances of a Recognized Person

Use the similar_to_face_id grouping for merged occurrences:

face_property = (
    metadata["detection_groupings"]
    .get("by_detection_property", {})
    .get("human.face", {})
    .get("similar_to_face_id", {})
)

for face_uuid, data in face_property.items():
    # Look up the name from one of the detections
    det_id = data["det_ids"][0]
    detection = metadata["detections"][det_id]
    name = "Unknown"
    for match in detection.get("a", {}).get("similar_to", []):
        if match.get("gallery_face", {}).get("id") == face_uuid:
            name = match["name"]
            break

    total_time = sum(m["se"] - m["ss"] for m in data["moccs"])
    print(f"\n{name} (gallery face: {face_uuid}):")
    print(f"  Total appearances: {len(data['moccs'])}")
    print(f"  Total time: {total_time:.1f}s")
    print(f"  Detection IDs: {data['det_ids']}")
    for mocc in data["moccs"]:
        print(f"  Segment: {mocc['ss']:.1f}s - {mocc['se']:.1f}s")

Step 4: Get Face Bounding Boxes (Optional)

For spatial face coordinates, download the frames_faces metadata:

faces_metadata = requests.get(
    "https://api-eu.valossa.com/core/1.0/job_results",
    params={
        "api_key": "YOUR_API_KEY",
        "job_id": job_id,
        "type": "frames_faces"
    }
).json()

fps = metadata["media_info"]["technical"]["fps"]

# Get bounding boxes at a specific time
target_second = 30
start_frame = int(target_second * fps)

for frame_offset in range(int(fps)):
    frame_idx = start_frame + frame_offset
    if frame_idx < len(faces_metadata["faces_by_frame"]):
        faces = faces_metadata["faces_by_frame"][frame_idx]
        for face in faces:
            print(f"Frame {frame_idx}: Face {face['id']} at ({face['x']:.3f}, {face['y']:.3f}), size {face['w']:.3f}x{face['h']:.3f}")

Training Custom Face Galleries

To recognize people not in the Valossa Celebrities Gallery, create a custom gallery and upload reference images.

Using the Training API

  1. Create a gallery (optional; skip this step to use the default gallery):
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
        "api_key": "YOUR_API_KEY",
        "name": "My Team Gallery"
      }' \
  https://api-eu.valossa.com/training/1.0/create_face_gallery
  2. Upload face images for each person. You can upload through the Training API (see its documentation for details), upload images in Valossa Portal, or use the download-from-URL functionality. The example below uploads a local image file:
curl \
  -F "api_key=YOUR_API_KEY" \
  -F "image_data=@ricky_1.jpg" \
  https://api-eu.valossa.com/training/1.0/upload_image

The return value contains a file reference that begins with valossaupload://.

Remember to create face identities and assign each image (by its file reference) to a face identity; see the Training API documentation for details.

  3. Use the gallery in analysis by specifying it in your new_job request, as sketched below.
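
A minimal sketch of such a request. The "face_galleries" field and the gallery ID value are hypothetical, used only for illustration; check the Training API documentation for the actual way to reference a gallery in new_job:

import requests

response = requests.post(
    "https://api-eu.valossa.com/core/1.0/new_job",
    json={
        "api_key": "YOUR_API_KEY",
        "media": {
            "video": {"url": "https://example.com/video.mp4"}
        },
        "face_galleries": ["MY_GALLERY_ID"]  # hypothetical field name
    }
)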

Using Valossa Portal

You can also manage face galleries through the Valossa Portal GUI:

  • Navigate to the face gallery management section
  • Create galleries, upload face images, and name identities visually
  • Use the Tag & Train feature in Valossa Report to train faces directly from analysis results

Tips for Best Results

  • Upload multiple photos per person with different angles, lighting, and expressions.
  • Use sharp, well-lit images in which the face is clearly visible.
  • Higher quality images produce more accurate recognition.