Sentiment and Emotion

Valossa AI detects sentiment and emotion from both visual (face) and audio (speech, voice) modalities. There are four distinct types of sentiment and emotion data.

Important Caveat

When Valossa AI reports "emotion", "mood", or "sentiment", these terms refer to apparent, external signs that can be described with emotion-related vocabulary. They must not be interpreted as indicating the internal emotional states of a person. AI-detected emotions reflect observable patterns, not psychological assessments.

Overview of Sentiment Types

| Type | Source | Location in Metadata | Scope |
|---|---|---|---|
| Face valence | Facial expression analysis | `by_second` for `human.face` detections | Per face, per second |
| Named facial expressions | Facial expression analysis | `by_second` for `human.face` detections | Per face, per second |
| Speech valence | Meaning of spoken words | `audio.speech` detection attributes | Per speech segment |
| Voice emotion | Voice prosodics (tone, pitch) | `by_second` for `audio.voice_emotion` detection | Per second |
**Note:** These features require face and speech emotion analytics to be activated for your subscription.

Face Valence

Valence describes the emotional positivity or negativity of a person at a specific moment, ranging from -1.0 (most negative) to 1.0 (most positive), with 0.0 being neutral.

Face valence data is in the by_second structure for human.face detections:

```json
{
  "d": "9",
  "o": ["51"],
  "a": {
    "sen": {
      "val": -0.82
    }
  }
}
```

| Field | Description |
|---|---|
| `a.sen.val` | Valence value (-1.0 to 1.0) |
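Since valence is a continuous score, a common downstream step is bucketing it into coarse labels for display or filtering. Below is a minimal sketch; the ±0.1 neutral band is an illustrative threshold chosen for this example, not something defined by the Valossa API:

```python
def valence_label(val, neutral_band=0.1):
    """Map a valence value in [-1.0, 1.0] to a coarse label.

    The +/-0.1 neutral band is an illustrative choice, not part of the API.
    """
    if val > neutral_band:
        return "positive"
    if val < -neutral_band:
        return "negative"
    return "neutral"

print(valence_label(-0.82))  # -> "negative" (the sample value above)
```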

Named Facial Expressions

Multiple facial expressions can be recognized on a face at the same second, each with a confidence score.

V2 Expressions (Current)

Most subscriptions use V2 face expressions (apparent emotions) with this complete 13-label vocabulary:

  • joy
  • mild joy
  • sadness
  • serious expression
  • fear
  • tension/anxiousness
  • disgust
  • displeasure
  • anger
  • concentration/displeasure
  • surprise
  • startlement
  • neutral

V1 Emotions (Legacy)

Some long-standing subscriptions (pre-December 2020) may use V1 with 6 named expressions:

  • happiness
  • sadness
  • anger
  • disgust
  • surprise
  • neutral

Data Format

Named facial expressions appear alongside valence in the sen structure:

```json
{
  "d": "1",
  "o": ["1"],
  "a": {
    "sen": {
      "emo": [
        { "c": 0.772, "e": "disgust" }
      ],
      "val": -0.796
    }
  }
}
```

| Field | Description |
|---|---|
| `a.sen.emo` | Array of detected apparent emotions |
| `a.sen.emo[].e` | Emotion identifier string |
| `a.sen.emo[].c` | Confidence (0.0 to 1.0) |

The emo array may contain multiple expressions if more than one is detected simultaneously. Entries are ranked by confidence.
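When you only need one dominant expression per second, take the highest-confidence entry of `emo`. A small sketch operating on the `sen` dict shown above; entries are documented to be ranked by confidence already, but sorting defensively costs little:

```python
def top_expression(sen):
    """Return (label, confidence) for the strongest expression, or None.

    `sen` is the per-second "sen" dict from a human.face detection.
    """
    emo = sen.get("emo", [])
    if not emo:
        return None
    best = max(emo, key=lambda e: e["c"])
    return best["e"], best["c"]

sen = {"emo": [{"c": 0.772, "e": "disgust"}], "val": -0.796}
print(top_expression(sen))  # -> ('disgust', 0.772)
```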

Speech Valence

Speech valence is derived from the meaning of the spoken words, not from the sound of the voice. It is available for English only and appears in the `a.sen.val` field of `audio.speech` detections:

```json
{
  "t": "audio.speech",
  "label": "we profoundly believe that justice will win despite the looming challenges",
  "a": {
    "sen": {
      "val": 0.307
    }
  }
}
```

This indicates the text content has a mildly positive sentiment. The value range is -1.0 to 1.0.

When enabled for a job, this field is typically present on each audio.speech segment. It is a text-level sentiment score and should not be confused with audio.voice_emotion, which is based on vocal delivery rather than transcript meaning.
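One simple aggregate is the average text-level sentiment across all speech segments in a video. The sketch below assumes that detections are stored under `metadata["detections"]` keyed by detection ID, and that `audio.speech` IDs are listed in `by_detection_type`; the `detections` layout is an assumption of this example, not confirmed on this page:

```python
def mean_speech_valence(metadata):
    """Average the text-level sentiment over all speech segments.

    Assumes a metadata["detections"] dict keyed by detection ID
    (an assumption of this sketch), with audio.speech IDs listed
    under detection_groupings.by_detection_type.
    """
    ids = metadata["detection_groupings"]["by_detection_type"].get("audio.speech", [])
    vals = [
        metadata["detections"][det_id].get("a", {}).get("sen", {}).get("val")
        for det_id in ids
    ]
    vals = [v for v in vals if v is not None]
    return sum(vals) / len(vals) if vals else None
```

Segments without a `sen.val` field (e.g. non-English speech) are skipped rather than counted as zero.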

Voice Emotion

Voice emotion detects emotional states from voice prosodics (tone, pitch, rhythm) rather than from the content of the words. This is fundamentally different from speech valence.

Voice emotion data is in a single audio.voice_emotion detection, with per-second values in by_second:

```json
{
  "d": "1034",
  "o": ["1688"],
  "a": {
    "val": -0.022,
    "aro": 0.655
  }
}
```

| Field | Description |
|---|---|
| `a.val` | Voice emotional valence (-1.0 to 1.0) |
| `a.aro` | Voice emotional arousal (0.0 to 1.0, where 1.0 is maximum arousal) |

The audio.voice_emotion detection has occurrences (occs) indicating the time segments where voice emotion was detected. The per-second val and aro values are computed for the mixed audio track, not separately per diarized speaker.
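Combining the two axes makes the per-second values actionable: for instance, seconds with high arousal and negative valence suggest a tense or distressed vocal tone. A sketch over the `by_second` structure; the 0.6 arousal cutoff is an illustrative threshold, not defined by the API:

```python
def tense_seconds(by_second, voice_emotion_id, aro_min=0.6):
    """List second indices where the mixed audio track sounds tense:
    arousal at or above aro_min combined with negative valence.

    The 0.6 default cutoff is illustrative, not part of the API.
    """
    hits = []
    for idx, second in enumerate(by_second):
        for item in second:
            if item["d"] != voice_emotion_id:
                continue
            a = item.get("a", {})
            if a.get("aro", 0.0) >= aro_min and a.get("val", 0.0) < 0.0:
                hits.append(idx)
    return hits
```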

Comparison of Sentiment Types

| Type | What It Measures | Example |
|---|---|---|
| Face valence | How positive or negative a face looks | A frowning face: -0.7 |
| Named facial expressions | Specific categories of apparent facial expressions of emotion | "joy" with confidence 0.85 |
| Speech valence | Positive or negative meaning of the spoken words | "I love this" = positive valence |
| Voice emotion | Emotional tone of the voice itself | High arousal, negative valence = distressed tone |

Code Example

Python: Extract Emotion Timeline for a Face

```python
import json

with open("core_metadata.json", "r") as f:
    metadata = json.load(f)

# Find all face detection IDs
face_ids = metadata["detection_groupings"]["by_detection_type"].get("human.face", [])
if not face_ids:
    print("No faces detected")
else:
    target_face_id = face_ids[0]  # First (most prominent) face

    for second_idx, second_data in enumerate(metadata["detection_groupings"]["by_second"]):
        for item in second_data:
            if item["d"] == target_face_id and "sen" in item.get("a", {}):
                sen = item["a"]["sen"]
                valence = sen.get("val", "N/A")
                emotions = sen.get("emo", [])
                emotion_str = ", ".join(f"{e['e']}({e['c']:.2f})" for e in emotions)
                print(f"Second {second_idx}: valence={valence}, emotions=[{emotion_str}]")
```