Probabilistic Reversal Learning

Overview

The probabilistic reversal learning (PRL) task measures cognitive flexibility and reward-guided learning under uncertainty by presenting two response options with asymmetric probabilistic contingencies, typically 80% versus 20% reward delivery. Once the subject reaches a learning criterion on the initial discrimination, contingencies reverse so that the previously less-rewarded option becomes more rewarding. The task engages orbitofrontal cortex, anterior cingulate cortex, and dorsal striatal circuits that compute prediction errors and update action values. PRL is a direct translational analogue of tasks used to assess cognitive inflexibility in depression, OCD, and schizophrenia.

Performance is decomposed using computational reinforcement learning models that estimate learning rate (alpha), inverse temperature (beta), and separate parameters for positive and negative prediction error sensitivity. Win-stay probability quantifies the tendency to repeat a choice after reward, while lose-shift probability measures switching after non-reward. The number of trials to reach reversal criterion, perseverative errors (continued selection of the previously correct option after reversal), and total reversals achieved within a session are standard behavioral endpoints. These measures dissociate value updating from response selection processes.
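As a rough illustration of the model family described above, the sketch below implements a two-option value update with separate learning rates for positive and negative prediction errors, plus a softmax choice rule governed by beta. The function names, the starting values of 0.5, and the example parameter values are assumptions made for illustration; they are not taken from ConductMaze.

```python
import math

def softmax_choice_prob(q_chosen, q_other, beta):
    """Probability of selecting the option valued q_chosen in a two-option softmax."""
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))

def update_value(q, reward, alpha_pos, alpha_neg):
    """Return (updated value, prediction error) for the chosen option.

    alpha_pos is applied when the prediction error is non-negative,
    alpha_neg when it is negative.
    """
    delta = reward - q                      # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q + alpha * delta, delta

# Illustrative single trial: both levers start at 0.5, the left lever is rewarded.
q = {"left": 0.5, "right": 0.5}
q["left"], delta = update_value(q["left"], reward=1, alpha_pos=0.4, alpha_neg=0.2)
p_left = softmax_choice_prob(q["left"], q["right"], beta=3.0)
```

A higher beta makes the same value difference produce a more deterministic choice, while a larger learning rate makes values track recent outcomes more closely.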

ConductMaze automates PRL by delivering probabilistic reward outcomes on each lever press according to configurable probability matrices and detecting criterion-based reversals without experimenter intervention. The system logs every trial outcome, fits trial-by-trial reinforcement learning models, and exports win-stay/lose-shift matrices alongside raw event data. Reversal triggers, perseverative error counts, and session-wide learning curves are computed in real time. The platform supports multiple reversals within a single session to capture both initial learning and cognitive flexibility dynamics.
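The export format ConductMaze uses for win-stay/lose-shift matrices is not specified here. As a hedged sketch of how such a matrix might be tallied from logged trials, the example below counts stay and shift responses following rewarded and non-rewarded trials. The function names and the toy choice sequence are hypothetical, and omission trials are assumed to have been removed beforehand.

```python
def wsls_matrix(choices, rewards):
    """Tally stay/shift responses after win/lose outcomes.

    choices: sequence of lever labels, one per trial.
    rewards: sequence of 0/1 outcomes aligned with choices.
    """
    counts = {"win": {"stay": 0, "shift": 0}, "lose": {"stay": 0, "shift": 0}}
    # Pair each trial's choice and outcome with the choice on the next trial.
    for prev_choice, prev_reward, choice in zip(choices, rewards, choices[1:]):
        outcome = "win" if prev_reward else "lose"
        action = "stay" if choice == prev_choice else "shift"
        counts[outcome][action] += 1
    return counts

def proportion(numerator, denominator):
    """Safe proportion; NaN when there were no qualifying trials."""
    return numerator / denominator if denominator else float("nan")

# Toy data for illustration only, not recorded results.
choices = ["L", "L", "R", "R", "L", "L"]
rewards = [1, 0, 1, 1, 0, 1]
m = wsls_matrix(choices, rewards)
win_stay = proportion(m["win"]["stay"], sum(m["win"].values()))
lose_shift = proportion(m["lose"]["shift"], sum(m["lose"].values()))
```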

Trial Flow

1. Session Start: Illuminate house light and extend both levers; assign initial 80/20 contingency mapping.
2. Choice Trial: Subject presses one of two levers; outcome determined probabilistically by current contingency.
3. Outcome Delivery: Deliver pellet on rewarded trials or illuminate cue light on non-rewarded trials; begin ITI.
4. Criterion Check: Evaluate whether subject has met reversal criterion on the current contingency mapping.
5. Contingency Reversal: If criterion met, silently reverse the 80/20 contingency mapping across levers.
6. Post-Reversal Monitoring: Track perseverative errors and trials to new criterion following reversal.
7. Model Fitting: Fit reinforcement learning model to trial-by-trial choice data; export alpha, beta, and prediction errors.
8. Session End: Terminate session after maximum trials or time limit; export all reversal metrics.
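The trial flow above can be sketched as a simple simulation loop. This is a minimal illustration under stated assumptions, not the ConductMaze implementation: the function and field names are hypothetical, the criterion is evaluated only once a full 10-trial window has accumulated, the window is cleared after each reversal, and omissions, ITI timing, and the session time limit are not modelled.

```python
import random
from collections import deque

def simulate_session(choose, p_high=0.80, p_low=0.20, criterion=8,
                     window=10, max_trials=200, seed=0):
    """Run one simulated PRL session.

    choose: callable taking the list of past trial records and
            returning "left" or "right".
    Returns (trial records, number of reversals).
    """
    rng = random.Random(seed)
    high_lever = "left"                  # assumed initial mapping
    recent = deque(maxlen=window)        # 1 = chose the high-probability lever
    trials, reversals = [], 0
    for t in range(max_trials):
        choice = choose(trials)
        correct = choice == high_lever
        rewarded = rng.random() < (p_high if correct else p_low)
        trials.append({"trial": t, "choice": choice, "high_lever": high_lever,
                       "correct": correct, "rewarded": rewarded})
        recent.append(int(correct))
        # Criterion check: at least `criterion` correct in the last `window` trials.
        if len(recent) == window and sum(recent) >= criterion:
            high_lever = "right" if high_lever == "left" else "left"
            reversals += 1
            recent.clear()               # start a fresh window after the reversal
    return trials, reversals

def wsls_agent(trials):
    """Toy policy: repeat the last choice after reward, switch after non-reward."""
    if not trials:
        return "left"
    last = trials[-1]
    if last["rewarded"]:
        return last["choice"]
    return "right" if last["choice"] == "left" else "left"

trials, reversals = simulate_session(wsls_agent)
```

Swapping `wsls_agent` for another policy (for example, one driven by a fitted reinforcement learning model) leaves the criterion and reversal logic unchanged.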

Parameters

Parameter	Type	Default	Description
Reward Probability (High)	`float`	0.80	Probability of reward delivery on the correct (currently high-probability) lever.
Reward Probability (Low)	`float`	0.20	Probability of reward delivery on the incorrect (currently low-probability) lever.
Reversal Criterion	`integer`	8	Number of correct choices within the last 10 trials required to trigger a contingency reversal.
Maximum Trials	`integer`	200	Maximum number of choice trials per session.
Inter-Trial Interval	`duration`	5 s	Interval between outcome delivery and the next lever extension.
Response Window	`duration`	10 s	Maximum time to make a lever choice before the trial is scored as an omission.
Session Duration	`duration`	60 min	Maximum session time; the session ends when either the trial limit or the time limit is reached, whichever comes first.
Pellet Reward Size	`integer`	1	Number of pellets delivered on rewarded trials.
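For readers scripting their own analyses or simulations, the defaults in the table above could be collected into a single validated object. The sketch below is hypothetical: the class name, field names, and validation rules are assumptions for illustration and do not describe the ConductMaze configuration format. The 10-trial criterion window is taken from the Reversal Criterion description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PRLConfig:
    """Hypothetical container for the PRL parameter defaults."""
    p_reward_high: float = 0.80
    p_reward_low: float = 0.20
    reversal_criterion: int = 8
    criterion_window: int = 10
    max_trials: int = 200
    iti_s: float = 5.0
    response_window_s: float = 10.0
    session_duration_min: float = 60.0
    pellets_per_reward: int = 1

    def __post_init__(self):
        # Probabilities must be ordered and lie within [0, 1].
        if not 0.0 <= self.p_reward_low < self.p_reward_high <= 1.0:
            raise ValueError("need 0 <= p_reward_low < p_reward_high <= 1")
        # The criterion cannot exceed the window it is evaluated over.
        if not 1 <= self.reversal_criterion <= self.criterion_window:
            raise ValueError("need 1 <= reversal_criterion <= criterion_window")
        # Remaining limits and durations must be positive.
        for name in ("max_trials", "iti_s", "response_window_s",
                     "session_duration_min", "pellets_per_reward"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")

cfg = PRLConfig()
```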

Metrics

Metric	Unit	Description
Win-Stay Probability	`proportion`	Probability of repeating a choice following a rewarded trial.
Lose-Shift Probability	`proportion`	Probability of switching choice following a non-rewarded trial.
Trials to Reversal Criterion	`count`	Number of trials required to reach criterion after each contingency reversal.
Perseverative Errors	`count`	Continued selections of the previously correct lever in the first block after reversal.
Total Reversals Achieved	`count`	Number of successful contingency reversals completed within the session.
Learning Rate (alpha)	`parameter`	Reinforcement learning model parameter reflecting speed of value updating.
Inverse Temperature (beta)	`parameter`	Model parameter reflecting choice stochasticity; higher values indicate more deterministic selection.
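The fitting procedure behind the alpha and beta estimates is not detailed here. One common approach, sketched below under stated assumptions, is maximum-likelihood estimation: compute the negative log-likelihood of the observed choices under a single-learning-rate model with softmax choice, then pick the parameter pair that minimizes it. The grid search, the starting values of 0.5, the function names, and the toy data are all illustrative choices rather than the platform's method.

```python
import math

def neg_log_likelihood(alpha, beta, choices, rewards):
    """Negative log-likelihood of a choice sequence.

    choices: lever index per trial (0 or 1); rewards: 0/1 outcomes.
    """
    q = [0.5, 0.5]                       # assumed initial values
    nll = 0.0
    for c, r in zip(choices, rewards):
        # Two-option softmax probability of the choice actually made.
        p_c = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        nll -= math.log(max(p_c, 1e-12))
        q[c] += alpha * (r - q[c])       # update only the chosen lever
    return nll

def fit_grid(choices, rewards, alphas, betas):
    """Exhaustive grid search for the (alpha, beta) pair with the lowest NLL."""
    nll, alpha, beta = min(
        (neg_log_likelihood(a, b, choices, rewards), a, b)
        for a in alphas for b in betas
    )
    return {"nll": nll, "alpha": alpha, "beta": beta}

# Toy data for illustration only, not recorded results.
choices = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
rewards = [1, 1, 0, 0, 1, 1, 1, 0, 1, 1]
alphas = [i / 20 for i in range(1, 20)]      # 0.05 ... 0.95
betas = [0.5 * i for i in range(1, 21)]      # 0.5 ... 10.0
fit = fit_grid(choices, rewards, alphas, betas)
```

In practice a gradient-based optimizer or hierarchical Bayesian fit is often preferred over a grid, particularly with few trials per subject, where alpha and beta can trade off against each other.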

Sample Data

Subject	Group	Win-Stay	Lose-Shift	Reversals	Perseverative Errors	Alpha	Beta

Representative data for illustration purposes. Actual values will vary by species, strain, and experimental conditions.

Applications

1. Depression and Cognitive Rigidity: Assess inflexible reward learning and impaired negative feedback processing in chronic stress and serotonin depletion models.
2. OCD Models: Quantify compulsive perseveration and resistance to contingency change in SAPAP3 knockout or quinpirole-sensitized rodents.
3. Schizophrenia Cognitive Deficits: Measure learning rate and prediction error signaling abnormalities in NMDA receptor hypofunction or dopamine dysregulation models.
4. Serotonergic Drug Screening: Evaluate how SSRIs, 5-HT2C antagonists, or psilocybin microdosing affect probabilistic learning and reversal flexibility.

Related Protocols

Delay Discounting Touchscreen Extinction Learning Sign Tracking Goal Tracking

Compatible Products

ME-OC-BASE, ME-OC-LEVER, ME-OC-PELLET, CS-958344
