
"Two Counters - One Solution": How the Brain Combines Sound and Picture to Press a Button Faster

Last reviewed: 18.08.2025
2025-08-15 13:30
">

When a rustle in the grass coincides with a flickering shadow, we react faster than to a sound or a flash alone. That much is classic. But what exactly happens in the brain in those split seconds? A new paper in Nature Human Behaviour shows that vision and hearing accumulate evidence separately, and at the moment of decision their "sum" launches a single motor trigger. In other words, the head holds two sensory accumulators that together co-activate a single motor mechanism.

Background

How the brain makes quick decisions in a "noisy" world of sounds and images is an old question without a clear answer. Since the early 20th century, psychophysics has known the "redundant signals effect" (RSE): if a target is presented in two modalities at once (for example, a flash and a tone), the reaction is faster than to a single signal. The dispute has been about the mechanism: a "race" of independent channels (the race model), in which the fastest sensory process wins, or co-activation, in which evidence from the different modalities is genuinely added up before the response is triggered. Formal tests (such as Miller's race-model inequality) helped at the behavioral level, but they did not show where exactly the convergence occurs - on the side of the sensory accumulators or only at the motor trigger.
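
To make the race-model test concrete, here is a minimal Python sketch of a Miller-style inequality check (an illustration with simulated placeholder RTs, not code or data from the study): if the cumulative RT distribution in the redundant condition exceeds the sum of the two unimodal distributions at any time point, an independent race cannot explain the speed-up, and co-activation becomes the more plausible account.

```python
import numpy as np

def empirical_cdf(rts, t_grid):
    """Fraction of reaction times at or below each point of the time grid."""
    rts = np.asarray(rts)
    return np.array([(rts <= t).mean() for t in t_grid])

def race_model_violations(rt_av, rt_a, rt_v, t_grid):
    """Time points where P(RT<=t | AV) exceeds P(RT<=t | A) + P(RT<=t | V)."""
    cdf_av = empirical_cdf(rt_av, t_grid)
    bound = np.clip(empirical_cdf(rt_a, t_grid) + empirical_cdf(rt_v, t_grid), 0, 1)
    return t_grid[cdf_av > bound]

# Hypothetical RTs (ms): the redundant condition is fast enough to violate the bound early.
rng = np.random.default_rng(0)
rt_a = rng.normal(420, 60, 500)    # audio-only
rt_v = rng.normal(440, 60, 500)    # visual-only
rt_av = rng.normal(360, 50, 500)   # redundant audio-visual
grid = np.arange(250, 601, 10)
print(race_model_violations(rt_av, rt_a, rt_v, grid))
```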

Over the past 10-15 years, neurophysiology has offered reliable markers of these latent stages. The most notable are the centro-parietal positivity (CPP), a supramodal EEG "accumulation-to-threshold" signal that fits well with drift-diffusion models of decision making, and the decrease in beta-band power (~20 Hz) over the left motor cortex, an index of movement preparation. These signals have made it possible to link computational models to real brain circuits. But key gaps remain: is auditory and visual evidence accumulated in one accumulator or in two separate ones? And is there a single motor threshold for multimodal decisions, or is each modality judged against its own criterion?
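
For intuition about what an "accumulation-to-threshold" signal such as the CPP is taken to index, here is a minimal drift-diffusion sketch (all parameter values are illustrative assumptions, not the model fitted in the paper): noisy evidence drifts toward a bound, and the crossing time plus a non-decision time gives the reaction time.

```python
import numpy as np

def simulate_trial(drift=2.0, noise=1.0, threshold=1.0,
                   dt=0.001, non_decision=0.25, rng=None):
    """One trial: accumulate noisy evidence until it crosses the threshold."""
    rng = rng or np.random.default_rng()
    evidence, t = 0.0, 0.0
    while evidence < threshold and t < 3.0:          # 3-second safety cap
        evidence += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return non_decision + t                          # reaction time, seconds

rng = np.random.default_rng(1)
rts = np.array([simulate_trial(rng=rng) for _ in range(200)])
print(f"mean RT ≈ {rts.mean():.3f} s")               # right-skewed RTs, as in real behavior
```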

An additional complication is timing. In real conditions, vision and hearing arrive with millisecond-scale desynchronies, and even a slight time shift can mask the true architecture of the process. Paradigms are therefore needed that simultaneously control the response rule (respond to either modality, or only to both at once), vary the asynchrony, and allow behavioral reaction-time distributions to be modeled jointly with the dynamics of the EEG markers. It is this approach that makes it possible to distinguish "summation of sensory accumulators followed by a single motor start" from scenarios of a "channel race" or "early merging into a single sensory stream".

Finally, there are practical motivations beyond the basic theory. If the sensory accumulators are indeed separate and the motor trigger is shared, then in clinical groups (e.g., parkinsonism, ADHD, autism spectrum disorders) the bottleneck may lie at different levels - in accumulation, in convergence, or in motor preparation. For human-machine interfaces and warning systems, the phase and timing of cues are critical: correct phasing of sound and image should maximize their joint contribution to the motor threshold, not simply "turn up the volume and brightness." These questions form the context of the new paper in Nature Human Behaviour, which examines multimodal detection simultaneously at the level of behavior, EEG dynamics (CPP and beta), and computational modeling.

What exactly did they find out?

  • In two EEG experiments (n=22 and n=21), participants detected changes in a dot animation (visual stream) and in a series of tones (auditory stream), pressing a button either when either stream changed (redundant detection) or only when both changed (conjunctive detection).
  • The researchers tracked a neural evidence "counter" - the centro-parietal positivity (CPP) - and the dynamics of left-hemisphere beta activity (~20 Hz) as a marker of movement preparation. These signals were compared with reaction-time distributions and computational models.
  • Bottom line: auditory and visual evidence accumulate in separate processes, and under redundant detection their combined contribution co-activates a single threshold motor process - the very "trigger" of the action - subadditively (less than a simple sum).

An important detail is the "out-of-sync" check. When the researchers introduced a small asynchrony between the auditory and visual signals, a model in which the sensory accumulators are first integrated and then inform the motor system explained the data better than accumulators "racing" against each other. This reinforces the idea that the sensory streams run in parallel but converge on a single motor decision node.
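
The contrast between the two architectures can be illustrated with a toy simulation (a sketch under assumed parameters, not the published model): in a "race," the first sensory accumulator to reach its own bound triggers the response; under co-activation, the two accumulators jointly drive a single motor threshold, and an onset asynchrony (SOA) can be imposed on one stream.

```python
import numpy as np

def trial(soa=0.0, mode="coactivation", drift=1.5, noise=1.0,
          sens_bound=1.0, motor_bound=1.2, dt=0.001, rng=None):
    """Decision time (s) for one trial under the chosen architecture."""
    rng = rng or np.random.default_rng()
    audio = visual = 0.0
    t = 0.0
    while t < 3.0:                                   # safety cap
        audio += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        if t >= soa:                                 # visual stream starts after the SOA
            visual += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        if mode == "race":                           # first accumulator at its own bound wins
            if audio >= sens_bound or visual >= sens_bound:
                return t
        elif audio + visual >= motor_bound:          # co-activation: summed evidence drives
            return t                                 # a single motor threshold
        t += dt
    return np.nan

rng = np.random.default_rng(2)
for mode in ("race", "coactivation"):
    rts = [trial(soa=0.05, mode=mode, rng=rng) for _ in range(300)]
    print(mode, f"mean decision time ≈ {np.nanmean(rts) * 1000:.0f} ms")
```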

Why you need to know this (examples)

  • Clinic and diagnostics. If the sensory accumulators are separate and the motor threshold is shared, then different patient groups (with ASD, ADHD, parkinsonism) may have different "breakdown nodes" - in accumulation, in convergence, or in motor triggering. This helps design biomarkers and attention/reaction training more precisely.
  • Human-machine interfaces. The design of warning signals and multimodal interfaces can benefit from optimal phasing of auditory and visual cues, so that motor co-activation is reached faster and more reliably.
  • Neural models of decision making. The results tie a long-standing behavioral debate (race vs. co-activation) to specific EEG markers (the CPP and the beta rhythm of the motor cortex), bringing computational models closer to real physiology.

How it was done (methodology, but briefly)

  • Paradigms: redundant (respond to either modality) and conjunctive (respond only to both at once) - a classic design that makes it possible to "weigh" the contribution of each sensory branch, plus a separate experiment with a controlled asynchrony between the auditory and visual streams.
  • Neurosignals:
    • the CPP - a "supramodal" index of the accumulation of sensory evidence up to a threshold;
    • the decrease in beta power over the left motor cortex - an index of movement preparation. Comparing their time profiles revealed different CPP amplitudes for auditory vs. visual targets (a sign of separate accumulators) and a joint drive of the beta mechanism (a sign of a shared motor threshold).
  • Modeling: joint fitting of behavioral RT distributions and EEG dynamics. The model in which the sensory accumulators are integrated before the motor node won the comparison, especially in the presence of asynchrony (a simplified sketch of the fitting idea follows below).
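
As a rough illustration of the fitting logic only (a simplified, assumption-laden sketch; the actual study fit behavior and EEG dynamics jointly, which is not reproduced here): simulate RT distributions from candidate parameters of a single accumulator, compare their quantiles with the "observed" ones, and keep the best-matching parameter set.

```python
import numpy as np

def simulate_rts(drift, threshold, n=500, noise=1.0, dt=0.002, ndt=0.25, rng=None):
    """Simulate n reaction times from a single noisy accumulator."""
    rng = rng or np.random.default_rng(0)            # shared seed -> common random numbers
    rts = []
    for _ in range(n):
        x, t = 0.0, 0.0
        while x < threshold and t < 3.0:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts.append(ndt + t)
    return np.array(rts)

def quantile_loss(observed, simulated, qs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Squared distance between observed and simulated RT quantiles."""
    return np.sum((np.quantile(observed, qs) - np.quantile(simulated, qs)) ** 2)

# "Observed" data are themselves simulated, so the generating values (1.8, 1.0) are known.
observed = simulate_rts(drift=1.8, threshold=1.0, rng=np.random.default_rng(3))
candidates = [(d, b) for d in (1.2, 1.5, 1.8, 2.1) for b in (0.8, 1.0, 1.2)]
best = min(candidates, key=lambda p: quantile_loss(observed, simulate_rts(*p)))
print("best-fitting (drift, threshold):", best)      # should land near the generating values
```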

What does this change in our picture of the brain?

  • Multimodality ≠ "mix and forget." The brain does not dump all the evidence into one pot; it keeps parallel records per channel, and integration happens closer to the action. This explains why multimodal cues speed up reaction times - they jointly raise the same motor flag.
  • Subadditivity is the norm. The "sum" of the sensory inputs is less than simple arithmetic would suggest, yet it is enough to reach the motor threshold sooner. So the goal of interface design is not to "add volume and brightness" but to synchronize the convergence.
  • A bridge between psychophysics and neurophysiology. Old behavioral "redundant cue" effects receive a mechanistic explanation via the CPP and beta markers.

Limitations and the next step

  • The sample consists of healthy adults performing laboratory tasks; clinical conclusions are the next stage. Tests are needed in patient groups and in natural multimodal environments.
  • EEG provides excellent temporal but limited spatial resolution; it makes sense to supplement it with MEG/invasive recordings and models of effective connectivity.
  • The theory predicts that training the timing of audio-visual cues should selectively improve the motor stage without changing the sensory accumulators - a testable hypothesis in applied settings (sport, aviation, rehabilitation).

Summary

The brain keeps separate "counters" for vision and hearing but decides with a single button. By understanding where exactly sensory information is folded into action, we can tune diagnostics, interfaces, and rehabilitation more precisely - from pilot helmets to telemedicine and attention training.

Source: Egan, J. M., Gomez-Ramirez, M., Foxe, J. J., et al. Distinct audio and visual accumulators co-activate motor preparation for multisensory detection. Nature Human Behaviour (2025). https://doi.org/10.1038/s41562-025-02280-9

