Emotion Analysis Guide
This guide covers how to extract and interpret emotion and sentiment data from Valossa AI analysis results, including face-based emotions, speech sentiment, and voice emotion.
When Valossa AI reports "emotion", "mood", or "sentiment", these terms refer to apparent, external signs that can be described with emotion-related vocabulary. They must not be interpreted as indicating the internal emotional states of a person. AI-detected emotions reflect observable visual and auditory patterns, not psychological assessments.
Prerequisites
Your Valossa subscription must include emotion analytics. Face valence, named emotions, speech sentiment, and voice emotion are all included in Transcribe Pro Vision MAX, which is also available as a free trial with no sales call needed. For higher-volume or custom configurations, contact Valossa sales.
The Metadata Reader CLI tool is especially powerful for emotion data — it can generate sentiment visualizations and show per-second valence without writing any code:
# Per-second emotion data for all faces
python -m metareader list-detections-by-second --type "human.face" core_metadata.json
# Generate facial sentiment timeline chart (requires matplotlib)
python -m metareader plot --sentiment core_metadata.json
# Bar chart of detection frequencies
python -m metareader plot --barh core_metadata.json
Four Types of Emotion Data
| Type | Source | Data Location | Description |
|---|---|---|---|
| Face valence | Facial expressions | by_second for human.face | Positivity/negativity of facial expression (-1.0 to 1.0) |
| Named emotions | Facial expressions | by_second for human.face | Specific emotion labels (joy, sadness, anger, etc.) |
| Speech valence | Meaning of spoken words | audio.speech attributes | Positivity/negativity of speech content (-1.0 to 1.0) |
| Voice emotion | Voice prosodics (tone/pitch) | by_second for audio.voice_emotion | Valence and arousal from how the voice sounds |
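Before extracting anything, it can help to check which of these sources actually appear in a given result file. A minimal sketch using the same core_metadata.json file and detection type keys as the examples below; note that this only confirms the detection types are present, not that every emotion attribute is enabled in your plan:

import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

by_type = metadata["detection_groupings"]["by_detection_type"]

# Face detections can exist even if face emotion attributes are not enabled,
# so treat this as a first sanity check only.
print("Face detections:         ", "yes" if by_type.get("human.face") else "no")
print("Speech detections:       ", "yes" if by_type.get("audio.speech") else "no")
print("Voice emotion detections:", "yes" if by_type.get("audio.voice_emotion") else "no")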
Extracting Face Emotions
Face Valence Over Time
import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

face_ids = metadata["detection_groupings"]["by_detection_type"].get("human.face", [])
if not face_ids:
    print("No faces detected")
    exit()

# Track the most prominent face
main_face_id = face_ids[0]
valence_timeline = []
for second_idx, second_data in enumerate(metadata["detection_groupings"]["by_second"]):
    for item in second_data:
        if item["d"] == main_face_id:
            sen = item.get("a", {}).get("sen", {})
            if "val" in sen:
                valence_timeline.append({
                    "second": second_idx,
                    "valence": sen["val"]
                })

print(f"Face valence over time (Face ID: {main_face_id}):")
for entry in valence_timeline:
    # Draw a "+" bar for positive valence, a "-" bar for negative valence
    bar = "+" * int(max(0, entry["valence"]) * 20) or "-" * int(abs(min(0, entry["valence"])) * 20)
    print(f"  Second {entry['second']:4d}: {entry['valence']:+.2f} {bar}")
Named Emotions
emotion_counts = {}
for second_idx, second_data in enumerate(metadata["detection_groupings"]["by_second"]):
    for item in second_data:
        if item["d"] == main_face_id:
            emotions = item.get("a", {}).get("sen", {}).get("emo", [])
            for emo in emotions:
                name = emo["value"]
                emotion_counts[name] = emotion_counts.get(name, 0) + 1

print("\nEmotion frequency for main face:")
for emotion, count in sorted(emotion_counts.items(), key=lambda x: -x[1]):
    print(f"  {emotion}: {count} seconds")
Available Emotions
V2 (current, 13 emotions): joy, mild joy, sadness, serious expression, fear, tension/anxiousness, disgust, displeasure, anger, concentration/displeasure, surprise, startlement, neutral
V1 (legacy, 6 emotions): happiness, sadness, anger, disgust, surprise, neutral
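To locate a particular expression rather than count overall frequency, scan the per-second data for its label. A sketch that collects the seconds where a chosen V2 label (here "joy", purely as an example) is reported for the main face:

target_emotion = "joy"  # any label from the V2 list above
matching_seconds = []
for second_idx, second_data in enumerate(metadata["detection_groupings"]["by_second"]):
    for item in second_data:
        if item["d"] == main_face_id:
            emotions = item.get("a", {}).get("sen", {}).get("emo", [])
            if any(emo["value"] == target_emotion for emo in emotions):
                matching_seconds.append(second_idx)

print(f"\nSeconds showing '{target_emotion}': {matching_seconds}")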
Extracting Speech Sentiment
Speech valence reflects the emotional tone of the content of spoken words (currently English only):
speech_ids = metadata["detection_groupings"]["by_detection_type"].get("audio.speech", [])

print("\nSpeech sentiment:")
for det_id in speech_ids:
    detection = metadata["detections"][det_id]
    text = detection["label"]
    valence = detection.get("a", {}).get("sen", {}).get("val")
    start = detection["occs"][0]["ss"] if detection.get("occs") else 0
    if valence is not None:
        sentiment = "positive" if valence > 0.1 else "negative" if valence < -0.1 else "neutral"
        display_text = text[:60] + "..." if len(text) > 60 else text
        print(f"  [{start:.1f}s] ({sentiment}, {valence:+.2f}) \"{display_text}\"")
Extracting Voice Emotion
Voice emotion captures apparent emotion from how the voice sounds (tone, pitch, rhythm), independent of what is being said:
voice_ids = metadata["detection_groupings"]["by_detection_type"].get("audio.voice_emotion", [])
if voice_ids:
    voice_det_id = voice_ids[0]
    print("\nVoice emotion (valence and arousal):")
    for second_idx, second_data in enumerate(metadata["detection_groupings"]["by_second"]):
        for item in second_data:
            if item["d"] == voice_det_id and "a" in item:
                valence = item["a"].get("val", 0)
                arousal = item["a"].get("aro", 0)
                print(f"  Second {second_idx}: valence={valence:+.3f}, arousal={arousal:.3f}")
Voice emotion provides:
- Valence (-1.0 to 1.0): How positive or negative the voice sounds
- Arousal (0.0 to 1.0): How energetic or excited the voice sounds
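One common way to read the two values together is a simple valence/arousal quadrant: positive and energetic reads as excited, negative and low-energy as subdued, and so on. The labels and the 0.5 arousal threshold below are illustrative choices, not part of the Valossa metadata:

def describe_voice(valence, arousal, arousal_threshold=0.5):
    """Illustrative quadrant label for a single voice emotion sample."""
    if valence >= 0:
        return "excited / enthusiastic" if arousal >= arousal_threshold else "calm / content"
    return "agitated / tense" if arousal >= arousal_threshold else "subdued / flat"

print(describe_voice(0.4, 0.8))   # excited / enthusiastic
print(describe_voice(-0.3, 0.2))  # subdued / flat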
Combined Emotion Dashboard
Build a second-by-second emotion overview combining all sources:
def build_emotion_timeline(metadata, face_id, voice_det_id=None):
    """Build a combined per-second emotion timeline from face and voice data."""
    duration = int(metadata["media_info"]["technical"]["duration_s"])
    timeline = []
    for second in range(duration):
        entry = {"second": second, "face_valence": None, "face_emotion": None,
                 "voice_valence": None, "voice_arousal": None}
        if second < len(metadata["detection_groupings"]["by_second"]):
            for item in metadata["detection_groupings"]["by_second"][second]:
                if item["d"] == face_id:
                    sen = item.get("a", {}).get("sen", {})
                    entry["face_valence"] = sen.get("val")
                    emos = sen.get("emo", [])
                    if emos:
                        entry["face_emotion"] = emos[0]["value"]
                if voice_det_id and item["d"] == voice_det_id:
                    entry["voice_valence"] = item.get("a", {}).get("val")
                    entry["voice_arousal"] = item.get("a", {}).get("aro")
        timeline.append(entry)
    return timeline
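For example, assuming the main_face_id and voice_ids resolved in the earlier snippets, the timeline can be built and filtered to the seconds that carry any emotion data:

timeline = build_emotion_timeline(
    metadata,
    face_id=main_face_id,
    voice_det_id=voice_ids[0] if voice_ids else None,
)

for entry in timeline:
    if entry["face_valence"] is not None or entry["voice_valence"] is not None:
        print(entry)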
Related Resources
- Sentiment & Emotion Reference -- Metadata format details
- Faces & Identity -- Face detection basics
- Speech & Transcription -- Speech detection details
- Metadata Reader -- CLI tool for sentiment visualization and per-second emotion extraction