Sentiment and Emotion

Valossa AI detects sentiment and emotion from both visual (face) and audio (speech, voice) modalities. There are four distinct types of sentiment and emotion data.

Important Caveat

When Valossa AI reports "emotion", "mood", or "sentiment", these terms refer to apparent, external signs that can be described with emotion-related vocabulary. They must not be interpreted as indicating the internal emotional states of a person. AI-detected emotions reflect observable patterns, not psychological assessments.

Overview of Sentiment Types

Type           | Source                        | Location in Metadata                        | Scope
---------------|-------------------------------|---------------------------------------------|----------------------
Face valence   | Facial expression analysis    | by_second for human.face detections         | Per face, per second
Named emotions | Facial expression analysis    | by_second for human.face detections         | Per face, per second
Speech valence | Meaning of spoken words       | audio.speech detection attributes           | Per speech segment
Voice emotion  | Voice prosodics (tone, pitch) | by_second for audio.voice_emotion detection | Per second
Note: These features require face and speech emotion analytics to be activated for your subscription.

Face Valence

Valence describes the apparent emotional positivity or negativity of a facial expression at a specific moment, ranging from -1.0 (most negative) to 1.0 (most positive), with 0.0 being neutral.

Face valence data is in the by_second structure for human.face detections:

{
  "d": "9",
  "o": ["51"],
  "a": {
    "sen": {
      "val": -0.82
    }
  }
}
Field     | Description
----------|-----------------------------
a.sen.val | Valence value (-1.0 to 1.0)
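
As a minimal sketch of how this per-second data can be aggregated (assuming, as described above, that a sen attribute on a by_second entry belongs to a human.face detection), the following computes the mean valence observed for each face:

import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

# Collect every per-second valence reading, grouped by face detection ID.
# Assumes "sen" appears in by_second entries only for human.face detections.
valence_by_face = {}
for second_data in metadata["detection_groupings"]["by_second"]:
    for item in second_data:
        sen = item.get("a", {}).get("sen", {})
        if "val" in sen:
            valence_by_face.setdefault(item["d"], []).append(sen["val"])

for face_id, vals in valence_by_face.items():
    print(f"Face {face_id}: mean valence {sum(vals) / len(vals):+.3f} over {len(vals)} seconds")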

Named Emotions from Faces

Multiple emotions can be recognized on faces, each with a confidence score.

V2 Emotions (Current)

Most subscriptions use V2 face expressions with 13 named emotions:

  • joy
  • mild joy
  • sadness
  • serious expression
  • fear
  • tension/anxiousness
  • disgust
  • displeasure
  • anger
  • concentration/displeasure
  • surprise
  • startlement
  • neutral

V1 Emotions (Legacy)

Some long-standing subscriptions (pre-December 2020) may use V1 with 6 named emotions:

  • happiness
  • sadness
  • anger
  • disgust
  • surprise
  • neutral

Data Format

Named emotions appear alongside valence in the sen structure:

{
  "d": "1",
  "o": ["1"],
  "a": {
    "sen": {
      "emo": [
        { "c": 0.772, "value": "disgust" }
      ],
      "val": -0.796
    }
  }
}
Field             | Description
------------------|----------------------------
a.sen.emo         | Array of detected emotions
a.sen.emo[].value | Emotion identifier string
a.sen.emo[].c     | Confidence (0.0 to 1.0)

The emo array may contain multiple emotions if more than one is detected simultaneously.
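
As a sketch of how the emo array can be summarized over time, the following tallies the number of seconds during which each named emotion was detected on the first face (the first-face choice and the core_metadata.json filename follow the example at the end of this page):

import json
from collections import Counter

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

face_ids = metadata["detection_groupings"]["by_detection_type"].get("human.face", [])
emotion_seconds = Counter()
if face_ids:
    target_face_id = face_ids[0]  # First (most prominent) face
    for second_data in metadata["detection_groupings"]["by_second"]:
        for item in second_data:
            if item["d"] == target_face_id:
                # Count each named emotion detected during this second
                for emo in item.get("a", {}).get("sen", {}).get("emo", []):
                    emotion_seconds[emo["value"]] += 1

for emotion, seconds in emotion_seconds.most_common():
    print(f"{emotion}: detected during {seconds} seconds")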

Speech Valence

Speech valence is derived from the meaning of the spoken words (not the sound of the voice). It is available for English only.

Speech valence appears in the a.sen.val field of audio.speech detections:

{
  "t": "audio.speech",
  "label": "we profoundly believe that justice will win despite the looming challenges",
  "a": {
    "sen": {
      "val": 0.307
    }
  }
}

This indicates the text content has a mildly positive sentiment.
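
A sketch of how speech valence could be used to list positive and negative lines of dialogue follows. It assumes that detection objects are stored under a top-level detections structure keyed by detection ID (not shown on this page), and the ±0.1 neutral band is an arbitrary illustrative threshold:

import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

speech_ids = metadata["detection_groupings"]["by_detection_type"].get("audio.speech", [])
for det_id in speech_ids:
    det = metadata["detections"][det_id]  # assumed location of detection objects
    val = det.get("a", {}).get("sen", {}).get("val")
    if val is None:
        continue  # no sentiment attribute for this speech segment
    tone = "positive" if val > 0.1 else "negative" if val < -0.1 else "neutral"
    print(f"{tone:>8} ({val:+.3f}): {det['label']}")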

Voice Emotion

Voice emotion detects emotional states from voice prosodics (tone, pitch, rhythm) rather than from the content of the words. This is fundamentally different from speech valence.

Voice emotion data is in a single audio.voice_emotion detection, with per-second values in by_second:

{
  "d": "1034",
  "o": ["1688"],
  "a": {
    "val": -0.022,
    "aro": 0.655
  }
}
Field | Description
------|-------------------------------------------------------------------
a.val | Voice emotional valence (-1.0 to 1.0)
a.aro | Voice emotional arousal (0.0 to 1.0, where 1.0 is maximum arousal)

The audio.voice_emotion detection has occurrences (occs) indicating the time segments where voice emotion was detected.
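
As an illustrative sketch, the following scans the per-second voice emotion values and flags seconds whose combination of high arousal and negative valence suggests a distressed-sounding tone (the 0.6 arousal cutoff is an arbitrary assumption, not part of the metadata format):

import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

voice_ids = metadata["detection_groupings"]["by_detection_type"].get("audio.voice_emotion", [])
if voice_ids:
    voice_id = voice_ids[0]  # the single audio.voice_emotion detection
    for second_idx, second_data in enumerate(metadata["detection_groupings"]["by_second"]):
        for item in second_data:
            if item["d"] != voice_id:
                continue
            val = item.get("a", {}).get("val")
            aro = item.get("a", {}).get("aro")
            # High arousal combined with negative valence: distressed-sounding tone
            if val is not None and aro is not None and aro > 0.6 and val < 0:
                print(f"Second {second_idx}: distressed tone (val={val:+.3f}, aro={aro:.3f})")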

Comparison of Sentiment Types

Type           | What It Measures                                     | Example
---------------|------------------------------------------------------|--------------------------------------------------
Face valence   | How positive/negative a face looks                   | A frowning face: -0.7
Named emotions | Specific emotion categories from facial expressions  | "joy" with confidence 0.85
Speech valence | Positive/negative meaning of spoken words            | "I love this" = positive valence
Voice emotion  | Emotional tone of the voice itself                   | High arousal, negative valence = distressed tone

Code Example

Python: Extract Emotion Timeline for a Face

import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

# Find all face detection IDs
face_ids = metadata["detection_groupings"]["by_detection_type"].get("human.face", [])
if not face_ids:
    print("No faces detected")
else:
    target_face_id = face_ids[0]  # First (most prominent) face

    for second_idx, second_data in enumerate(metadata["detection_groupings"]["by_second"]):
        for item in second_data:
            if item["d"] == target_face_id and "sen" in item.get("a", {}):
                sen = item["a"]["sen"]
                valence = sen.get("val", "N/A")
                emotions = sen.get("emo", [])
                emotion_str = ", ".join(f"{e['value']}({e['c']:.2f})" for e in emotions)
                print(f"Second {second_idx}: valence={valence}, emotions=[{emotion_str}]")