Probabilistic Reversal Learning

Overview

The probabilistic reversal learning (PRL) task measures cognitive flexibility and reward-guided learning under uncertainty. The subject chooses between two response options with asymmetric reward probabilities, typically 80% versus 20%. Once the subject reaches a learning criterion on the initial discrimination, the contingencies reverse so that the previously less-rewarded option becomes the more-rewarded one. The task engages orbitofrontal cortex, anterior cingulate cortex, and dorsal striatal circuits that compute prediction errors and update action values. PRL is a direct translational analogue of the tasks used to assess cognitive inflexibility in depression, obsessive-compulsive disorder (OCD), and schizophrenia.

Performance is decomposed with computational reinforcement learning models that estimate a learning rate (alpha), an inverse temperature (beta), and separate sensitivity parameters for positive and negative prediction errors. Win-stay probability quantifies the tendency to repeat a choice after a rewarded trial, while lose-shift probability quantifies the tendency to switch after a non-rewarded trial. Standard behavioral endpoints are the number of trials to reach the reversal criterion, perseverative errors (continued selection of the previously correct option after a reversal), and the total number of reversals achieved within a session. Together, these measures dissociate value updating from response selection.
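As an illustration of the kind of model described above, here is a minimal sketch of a Rescorla-Wagner value update combined with a softmax choice rule. It is not ConductMaze's fitting routine, which this page does not specify. The function names, the initial values of 0.5, the optional alpha_neg argument, and the grid-search fit are all assumptions made for the example.

```python
import math


def negative_log_likelihood(choices, rewards, alpha, beta, alpha_neg=None):
    """Negative log-likelihood of a two-lever choice sequence.

    choices: list of 0/1 lever indices; rewards: list of 0/1 outcomes.
    alpha is the learning rate, beta the inverse temperature.
    alpha_neg (hypothetical option) sets a separate learning rate for
    negative prediction errors; it defaults to alpha.
    """
    if alpha_neg is None:
        alpha_neg = alpha
    q = [0.5, 0.5]  # initial action values: an assumption of this sketch
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        # A two-option softmax reduces to a logistic of the value difference.
        diff = q[choice] - q[1 - choice]
        p_choice = 1.0 / (1.0 + math.exp(-beta * diff))
        nll -= math.log(max(p_choice, 1e-12))
        # The prediction error updates the chosen option only.
        delta = reward - q[choice]
        rate = alpha if delta >= 0 else alpha_neg
        q[choice] += rate * delta
    return nll


def fit_grid(choices, rewards, alphas, betas):
    """Return the (alpha, beta) pair with the lowest negative log-likelihood."""
    return min(
        ((a, b) for a in alphas for b in betas),
        key=lambda ab: negative_log_likelihood(choices, rewards, ab[0], ab[1]),
    )
```

A grid search keeps the example dependency-free; in practice a numerical optimizer or a hierarchical Bayesian fit is more common for estimating alpha and beta.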

ConductMaze automates PRL by delivering probabilistic reward outcomes on each lever press according to configurable probability matrices and detecting criterion-based reversals without experimenter intervention. The system logs every trial outcome, fits trial-by-trial reinforcement learning models, and exports win-stay/lose-shift matrices alongside raw event data. Reversal triggers, perseverative error counts, and session-wide learning curves are computed in real time. The platform supports multiple reversals within a single session to capture both initial learning and cognitive flexibility dynamics.

Trial Flow

  1. Session Start: Illuminate house light and extend both levers; assign initial 80/20 contingency mapping.
  2. Choice Trial: Subject presses one of two levers; outcome determined probabilistically by current contingency.
  3. Outcome Delivery: Deliver pellet on rewarded trials or illuminate cue light on non-rewarded trials; begin ITI.
  4. Criterion Check: Evaluate whether subject has met reversal criterion on the current contingency mapping.
  5. Contingency Reversal: If criterion met, silently reverse the 80/20 contingency mapping across levers.
  6. Post-Reversal Monitoring: Track perseverative errors and trials to new criterion following reversal.
  7. Model Fitting: Fit reinforcement learning model to trial-by-trial choice data; export alpha, beta, and prediction errors.
  8. Session End: Terminate session after maximum trials or time limit; export all reversal metrics.
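The trial flow can be sketched as a small simulation loop. This is a simplified illustration, not the platform's control code: simulate_session, wsls_policy, and the trial record fields are hypothetical names, and the sketch ignores omissions, the ITI, and the session time limit. It also assumes the criterion window is cleared after each reversal, which the protocol description does not state.

```python
import random
from collections import deque


def simulate_session(choose, p_high=0.80, p_low=0.20, criterion=8,
                     window=10, max_trials=200, seed=0):
    """Run a simulated PRL session and return (trials, reversal_trials).

    choose is a hypothetical policy callback: it receives the list of
    trial records so far and returns a lever index (0 or 1).
    """
    rng = random.Random(seed)
    correct = 0                    # lever currently assigned p_high
    recent = deque(maxlen=window)  # correctness of the most recent choices
    trials, reversal_trials = [], []
    for t in range(max_trials):
        choice = choose(trials)
        p_reward = p_high if choice == correct else p_low
        rewarded = rng.random() < p_reward
        trials.append({"trial": t, "choice": choice,
                       "correct_lever": correct, "rewarded": rewarded})
        recent.append(choice == correct)
        # Criterion check: enough correct choices within the sliding window.
        if sum(recent) >= criterion:
            correct = 1 - correct  # uncued contingency reversal
            recent.clear()         # assumption: window restarts after reversal
            reversal_trials.append(t)
    return trials, reversal_trials


def wsls_policy(trials):
    """Example policy: repeat the last choice after reward, switch otherwise."""
    if not trials:
        return 0
    last = trials[-1]
    return last["choice"] if last["rewarded"] else 1 - last["choice"]
```

Calling simulate_session(wsls_policy) yields one record per trial plus the indices of the trials on which a reversal was triggered, which is enough to derive the reversal metrics listed on this page.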

Parameters

Parameter | Type | Default | Description
Reward Probability (High) | float | 0.80 | Probability of reward delivery on the correct lever.
Reward Probability (Low) | float | 0.20 | Probability of reward delivery on the incorrect lever.
Reversal Criterion | integer | 8 | Number of correct choices in the last 10 trials required to trigger a contingency reversal.
Maximum Trials | integer | 200 | Maximum number of choice trials per session.
Inter-Trial Interval | duration | 5 s | Interval between outcome delivery and the next lever extension.
Response Window | duration | 10 s | Maximum time to make a lever choice before the trial is scored as an omission.
Session Duration | duration | 60 min | Maximum session time; the session ends when the trial limit or the time limit is reached, whichever comes first.
Pellet Reward Size | integer | 1 | Number of pellets delivered on rewarded trials.
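For scripting or record-keeping, the parameters above could be captured in a small configuration object. The sketch below is an assumption about how one might represent them in Python; PRLConfig and its field names are hypothetical and do not correspond to a documented ConductMaze API. The defaults mirror the table, and criterion_window encodes the "last 10 trials" from the Reversal Criterion description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PRLConfig:
    """Hypothetical container for the PRL session parameters."""
    reward_prob_high: float = 0.80
    reward_prob_low: float = 0.20
    reversal_criterion: int = 8
    criterion_window: int = 10
    max_trials: int = 200
    inter_trial_interval_s: float = 5.0
    response_window_s: float = 10.0
    session_duration_min: float = 60.0
    pellets_per_reward: int = 1

    def __post_init__(self):
        # Basic sanity checks; the platform's own validation may differ.
        for name in ("reward_prob_high", "reward_prob_low"):
            p = getattr(self, name)
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {p}")
        if self.reward_prob_high <= self.reward_prob_low:
            raise ValueError("reward_prob_high must exceed reward_prob_low")
        if not 1 <= self.reversal_criterion <= self.criterion_window:
            raise ValueError("reversal_criterion must be between 1 and criterion_window")
```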

Metrics

Metric | Unit | Description
Win-Stay Probability | proportion | Probability of repeating a choice following a rewarded trial.
Lose-Shift Probability | proportion | Probability of switching choice following a non-rewarded trial.
Trials to Reversal Criterion | count | Number of trials required to reach criterion after each contingency reversal.
Perseverative Errors | count | Continued selections of the previously correct lever in the first block after reversal.
Total Reversals Achieved | count | Number of successful contingency reversals completed within the session.
Learning Rate (alpha) | parameter | Reinforcement learning model parameter reflecting speed of value updating.
Inverse Temperature (beta) | parameter | Model parameter reflecting choice stochasticity; higher values indicate more deterministic selection.
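The behavioral metrics can be derived directly from per-trial choice and outcome records. The following is a minimal sketch under stated assumptions: the function names are hypothetical, choices and levers are coded 0/1, and the "first block after reversal" is taken to be a fixed number of trials (block_size, 10 here), since this page does not define the block length.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win_stay, lose_shift) proportions, or None where undefined."""
    wins = stays = losses = shifts = 0
    # Pair each trial's choice and outcome with the choice on the next trial.
    for prev_choice, prev_reward, next_choice in zip(choices, rewards, choices[1:]):
        if prev_reward:
            wins += 1
            stays += next_choice == prev_choice
        else:
            losses += 1
            shifts += next_choice != prev_choice
    win_stay = stays / wins if wins else None
    lose_shift = shifts / losses if losses else None
    return win_stay, lose_shift


def perseverative_errors(choices, correct_levers, block_size=10):
    """Count, per reversal, choices of the previously correct lever.

    correct_levers[t] is the high-probability lever on trial t. Counting
    stops after block_size trials or at the next reversal, whichever
    comes first (an assumption of this sketch).
    """
    counts = []
    for t in range(1, len(correct_levers)):
        if correct_levers[t] != correct_levers[t - 1]:
            old = correct_levers[t - 1]
            errors = 0
            for k in range(t, min(t + block_size, len(choices))):
                if correct_levers[k] == old:  # mapping has reversed again
                    break
                errors += choices[k] == old
            counts.append(errors)
    return counts
```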

Sample Data

Subject | Group | Win-Stay | Lose-Shift | Reversals | Perseverative Errors | Alpha | Beta

Representative data for illustration purposes. Actual values will vary by species, strain, and experimental conditions.

Applications

  1. Depression and Cognitive Rigidity: Assess inflexible reward learning and impaired negative feedback processing in chronic stress and serotonin depletion models.
  2. OCD Models: Quantify compulsive perseveration and resistance to contingency change in SAPAP3 knockout or quinpirole-sensitized rodents.
  3. Schizophrenia Cognitive Deficits: Measure learning rate and prediction error signaling abnormalities in NMDA receptor hypofunction or dopamine dysregulation models.
  4. Serotonergic Drug Screening: Evaluate how SSRIs, 5-HT2C antagonists, or psilocybin microdosing affect probabilistic learning and reversal flexibility.

Compatible Products

ME-OC-BASE, ME-OC-LEVER, ME-OC-PELLET, CS-958344
