Sentiment and Emotion
Valossa AI detects sentiment and emotion from both visual (face) and audio (speech, voice) modalities. There are four distinct types of sentiment and emotion data.
When Valossa AI reports "emotion", "mood", or "sentiment", these terms refer to apparent, external signs that can be described with emotion-related vocabulary. They must not be interpreted as indicating the internal emotional states of a person. AI-detected emotions reflect observable patterns, not psychological assessments.
Overview of Sentiment Types
| Type | Source | Location in Metadata | Scope |
|---|---|---|---|
| Face valence | Facial expression analysis | by_second for human.face detections | Per face, per second |
| Named emotions | Facial expression analysis | by_second for human.face detections | Per face, per second |
| Speech valence | Meaning of spoken words | audio.speech detection attributes | Per speech segment |
| Voice emotion | Voice prosodics (tone, pitch) | by_second for audio.voice_emotion detection | Per second |
These features require face and speech emotion analytics to be activated for your subscription.
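A quick way to see which of these data types are present in a given analysis result is to check the by_detection_type grouping. A minimal sketch, assuming the core_metadata.json layout used in the Code Example section later on this page:

import json

# List how many detections of each sentiment-bearing type are present.
with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

by_type = metadata["detection_groupings"]["by_detection_type"]
for det_type in ("human.face", "audio.speech", "audio.voice_emotion"):
    ids = by_type.get(det_type, [])
    print(f"{det_type}: {len(ids)} detection(s)")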
Face Valence
Valence describes the apparent emotional positivity or negativity of a facial expression at a specific moment, on a scale from -1.0 (most negative) through 0.0 (neutral) to 1.0 (most positive).
Face valence data is in the by_second structure for human.face detections:
{
  "d": "9",
  "o": ["51"],
  "a": {
    "sen": {
      "val": -0.82
    }
  }
}
| Field | Description |
|---|---|
| a.sen.val | Valence value (-1.0 to 1.0) |
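Since valence is a continuous score rather than a category, a common first step is to bucket it into coarse sentiment classes. The sketch below does this with illustrative thresholds (the cutoff values are not part of the API):

def valence_bucket(val):
    # Illustrative cutoffs; tune them for your own use case.
    if val <= -0.33:
        return "negative"
    if val >= 0.33:
        return "positive"
    return "neutral"

print(valence_bucket(-0.82))  # prints "negative", matching the sample above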
Named Emotions from Faces
Multiple emotions can be recognized on faces, each with a confidence score.
V2 Emotions (Current)
Most subscriptions use V2 face expressions with 13 named emotions:
- joy
- mild joy
- sadness
- serious expression
- fear
- tension/anxiousness
- disgust
- displeasure
- anger
- concentration/displeasure
- surprise
- startlement
- neutral
V1 Emotions (Legacy)
Some long-standing subscriptions (created before December 2020) may still use V1, which has 6 named emotions:
- happiness
- sadness
- anger
- disgust
- surprise
- neutral
Data Format
Named emotions appear alongside valence in the sen structure:
{
  "d": "1",
  "o": ["1"],
  "a": {
    "sen": {
      "emo": [
        { "c": 0.772, "value": "disgust" }
      ],
      "val": -0.796
    }
  }
}
| Field | Description |
|---|---|
| a.sen.emo | Array of detected emotions |
| a.sen.emo[].value | Emotion identifier string |
| a.sen.emo[].c | Confidence (0.0 to 1.0) |
The emo array may contain multiple emotions if more than one is detected simultaneously.
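When several emotions are reported for the same second, you often want only the strongest one. A small helper for that, assuming the sen structure shown above (top_emotion is a hypothetical name, not part of the metadata API):

def top_emotion(sen):
    """Return (identifier, confidence) for the highest-confidence emotion, or None."""
    emotions = sen.get("emo", [])
    if not emotions:
        return None
    best = max(emotions, key=lambda e: e["c"])
    return best["value"], best["c"]

sen = {"emo": [{"c": 0.772, "value": "disgust"}], "val": -0.796}
print(top_emotion(sen))  # ('disgust', 0.772)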
Speech Valence
Speech valence is derived from the meaning of the spoken words (not the sound of the voice). It is available for English only.
Speech valence appears in the a.sen.val field of audio.speech detections:
{
  "t": "audio.speech",
  "label": "we profoundly believe that justice will win despite the looming challenges",
  "a": {
    "sen": {
      "val": 0.307
    }
  }
}
Here, the valence of 0.307 indicates that the spoken content has mildly positive sentiment.
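To scan the speech sentiment of a whole video, you can walk through the audio.speech detections and print each segment's valence next to its transcript. A minimal sketch; it assumes the metadata has a top-level detections object keyed by detection ID, which is an assumption not shown in the snippets above:

import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

speech_ids = metadata["detection_groupings"]["by_detection_type"].get("audio.speech", [])
for det_id in speech_ids:
    det = metadata["detections"][det_id]
    sen = det.get("a", {}).get("sen", {})
    if "val" in sen:
        # Print the signed valence next to the transcribed segment.
        print(f'{sen["val"]:+.3f}  "{det["label"]}"')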
Voice Emotion
Voice emotion detects emotional states from voice prosodics (tone, pitch, rhythm) rather than from the content of the words. This is fundamentally different from speech valence.
Voice emotion data is in a single audio.voice_emotion detection, with per-second values in by_second:
{
  "d": "1034",
  "o": ["1688"],
  "a": {
    "val": -0.022,
    "aro": 0.655
  }
}
| Field | Description |
|---|---|
| a.val | Voice emotional valence (-1.0 to 1.0) |
| a.aro | Voice emotional arousal (0.0 to 1.0, where 1.0 is maximum arousal) |
The audio.voice_emotion detection has occurrences (occs) indicating the time segments where voice emotion was detected.
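Combining val and aro per second makes it possible to spot specific tones, for example flagging seconds that look distressed (negative valence with high arousal, per the comparison table below). The thresholds here are illustrative; the by_second lookup follows the same pattern as the face examples:

import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

voice_ids = metadata["detection_groupings"]["by_detection_type"].get("audio.voice_emotion", [])
if voice_ids:
    voice_id = voice_ids[0]  # there is a single audio.voice_emotion detection
    for second_idx, second_data in enumerate(metadata["detection_groupings"]["by_second"]):
        for item in second_data:
            if item["d"] == voice_id and "a" in item:
                val = item["a"].get("val")
                aro = item["a"].get("aro")
                # Illustrative cutoffs for a "distressed" tone.
                if val is not None and aro is not None and val < -0.2 and aro > 0.6:
                    print(f"Second {second_idx}: possible distressed tone (val={val}, aro={aro})")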
Comparison of Sentiment Types
| Type | What It Measures | Example |
|---|---|---|
| Face valence | How positive/negative a face looks | A frowning face: -0.7 |
| Named emotions | Specific emotion categories from facial expressions | "joy" with confidence 0.85 |
| Speech valence | Positive/negative meaning of spoken words | "I love this" = positive valence |
| Voice emotion | Emotional tone of the voice itself | High arousal, negative valence = distressed tone |
Code Example
Python: Extract Emotion Timeline for a Face
import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

# Find all face detection IDs
face_ids = metadata["detection_groupings"]["by_detection_type"].get("human.face", [])

if not face_ids:
    print("No faces detected")
else:
    target_face_id = face_ids[0]  # First (most prominent) face

    # Walk the per-second data and report this face's valence and emotions
    for second_idx, second_data in enumerate(metadata["detection_groupings"]["by_second"]):
        for item in second_data:
            if item["d"] == target_face_id and "sen" in item.get("a", {}):
                sen = item["a"]["sen"]
                valence = sen.get("val", "N/A")
                emotions = sen.get("emo", [])
                emotion_str = ", ".join(f"{e['value']}({e['c']:.2f})" for e in emotions)
                print(f"Second {second_idx}: valence={valence}, emotions=[{emotion_str}]")
Related Resources
- Emotion Analysis Guide -- Complete workflow for emotion analysis
- Faces & Identity -- Face detection details
- Speech & Transcription -- Speech-to-text details