Probabilistic Reversal Learning
Overview
The probabilistic reversal learning (PRL) task measures cognitive flexibility and reward-guided learning under uncertainty by presenting two response options with asymmetric probabilistic contingencies, typically 80% versus 20% reward delivery. Once the subject reaches a learning criterion on the initial discrimination, contingencies reverse so that the previously less-rewarded option becomes more rewarding. The task engages orbitofrontal cortex, anterior cingulate cortex, and dorsal striatal circuits that compute prediction errors and update action values. PRL is a direct translational analogue of tasks used to assess cognitive inflexibility in depression, OCD, and schizophrenia.
Performance is decomposed using computational reinforcement learning models that estimate learning rate (alpha), inverse temperature (beta), and separate parameters for positive and negative prediction error sensitivity. Win-stay probability quantifies the tendency to repeat a choice after reward, while lose-shift probability measures switching after non-reward. The number of trials to reach reversal criterion, perseverative errors (continued selection of the previously correct option after reversal), and total reversals achieved within a session are standard behavioral endpoints. Together, these measures dissociate value updating (indexed by the learning rates) from response selection (indexed by inverse temperature).
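The model described above can be illustrated with a minimal sketch. It assumes a Rescorla-Wagner (delta-rule) value update, where the positive and negative prediction error sensitivities are implemented as two learning rates (alpha_pos, alpha_neg), and a softmax choice rule governed by beta. The function names `softmax` and `rw_update` are hypothetical and are not part of the ConductMaze software.

```python
import math
from typing import List


def softmax(q: List[float], beta: float) -> List[float]:
    """P(a) = exp(beta*Q[a]) / sum_b exp(beta*Q[b]); the max is subtracted for numerical stability."""
    m = max(beta * v for v in q)
    exps = [math.exp(beta * v - m) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def rw_update(q: List[float], choice: int, reward: float,
              alpha_pos: float, alpha_neg: float) -> float:
    """Delta-rule update of the chosen option's value, applied in place.

    Uses alpha_pos when the prediction error is positive and alpha_neg otherwise
    (an assumed parameterization); returns delta = reward - Q[choice].
    """
    delta = reward - q[choice]
    alpha = alpha_pos if delta > 0 else alpha_neg
    q[choice] += alpha * delta
    return delta


if __name__ == "__main__":
    q = [0.5, 0.5]                       # initial values for the two levers (assumed)
    print(softmax(q, beta=5.0))          # equal values -> [0.5, 0.5]
    rw_update(q, choice=0, reward=1.0, alpha_pos=0.4, alpha_neg=0.1)
    print(q, softmax(q, beta=5.0))       # lever 0 is now preferred
```

A larger beta sharpens the softmax toward the higher-valued lever. When alpha_neg is smaller than alpha_pos, non-rewarded outcomes update values more slowly than rewarded ones.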
ConductMaze automates PRL by delivering probabilistic reward outcomes on each lever press according to configurable probability matrices and detecting criterion-based reversals without experimenter intervention. The system logs every trial outcome, fits trial-by-trial reinforcement learning models, and exports win-stay/lose-shift matrices alongside raw event data. Reversal triggers, perseverative error counts, and session-wide learning curves are computed in real time. The platform supports multiple reversals within a single session to capture both initial learning and cognitive flexibility dynamics.
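The fitting procedure ConductMaze uses is not specified here. The following is only a minimal sketch, assuming a single-learning-rate Rescorla-Wagner model with softmax choice and initial values of 0.5, of how alpha and beta can be estimated from trial-by-trial choices by maximum likelihood. A coarse grid search stands in for a gradient-based optimizer. `neg_log_likelihood` and `fit_grid` are hypothetical names.

```python
import math
from typing import List, Optional, Sequence, Tuple


def neg_log_likelihood(choices: Sequence[int], rewards: Sequence[float],
                       alpha: float, beta: float, q0: float = 0.5) -> float:
    """Negative log-likelihood of a two-lever choice sequence (levers coded 0/1, rewards 0/1)."""
    q = [q0, q0]
    nll = 0.0
    for c, r in zip(choices, rewards):
        # Two-option softmax reduces to a logistic function of the value difference.
        p_choice = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        nll -= math.log(max(p_choice, 1e-12))
        q[c] += alpha * (r - q[c])       # update only the chosen lever
    return nll


def fit_grid(choices: Sequence[int], rewards: Sequence[float],
             alphas: Optional[List[float]] = None,
             betas: Optional[List[float]] = None) -> Tuple[float, float, float]:
    """Return (alpha, beta, nll) minimizing the negative log-likelihood over a grid."""
    alphas = alphas or [i / 20 for i in range(1, 20)]   # 0.05 .. 0.95
    betas = betas or [0.5 * i for i in range(1, 41)]    # 0.5 .. 20.0
    best = (math.inf, alphas[0], betas[0])
    for a in alphas:
        for b in betas:
            nll = neg_log_likelihood(choices, rewards, a, b)
            if nll < best[0]:
                best = (nll, a, b)
    return best[1], best[2], best[0]
```

A grid search is easy to inspect but limited in resolution. In practice a continuous optimizer, often combined with several random starts or priors on the parameters, would usually replace it.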
Trial Flow
Session Start
Illuminate house light and extend both levers; assign initial 80/20 contingency mapping.
Choice Trial
Subject presses one of two levers; outcome determined probabilistically by current contingency.
Outcome Delivery
Deliver pellet on rewarded trials or illuminate cue light on non-rewarded trials; begin ITI.
Criterion Check
Evaluate whether subject has met reversal criterion on the current contingency mapping.
Contingency Reversal
If criterion met, silently reverse the 80/20 contingency mapping across levers.
Post-Reversal Monitoring
Track perseverative errors and trials to new criterion following reversal.
Model Fitting
Fit reinforcement learning model to trial-by-trial choice data; export alpha, beta, and prediction errors.
Session End
Terminate session after maximum trials or time limit; export all reversal metrics.
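The trial flow above can be sketched as a simulation loop. This is a minimal sketch, not the ConductMaze controller. `run_session`, `SessionLog`, and the `choose` callback are hypothetical, and hardware timing (ITI, response window, omissions) is omitted. The sketch also makes two assumptions. First, the criterion requires a full 10-trial window counted from the most recent reversal. Second, perseverative errors are the unbroken run of choices of the previously correct lever immediately after each reversal, which is one of several definitions in use.

```python
import random
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical subject model: choose(trial_index, last_choice, last_reward) -> lever 0 or 1
Chooser = Callable[[int, Optional[int], Optional[bool]], int]


@dataclass
class SessionLog:
    choices: List[int] = field(default_factory=list)
    rewards: List[bool] = field(default_factory=list)
    reversal_trials: List[int] = field(default_factory=list)       # trial index of each reversal
    trials_to_criterion: List[int] = field(default_factory=list)   # trials in each completed block
    perseverative_errors: List[int] = field(default_factory=list)  # one count per reversal


def run_session(choose: Chooser, max_trials: int = 200, p_high: float = 0.8,
                p_low: float = 0.2, criterion: int = 8, window: int = 10,
                seed: int = 0) -> SessionLog:
    rng = random.Random(seed)
    high = rng.randrange(2)            # Session Start: assign the initial 80/20 mapping
    recent = deque(maxlen=window)      # 1 if the high-probability lever was chosen
    log = SessionLog()
    block_start = 0
    prev_high: Optional[int] = None    # lever that was correct before the latest reversal
    persev, counting = 0, False
    last_choice: Optional[int] = None
    last_reward: Optional[bool] = None

    for t in range(max_trials):
        c = choose(t, last_choice, last_reward)                      # Choice Trial
        rewarded = rng.random() < (p_high if c == high else p_low)   # Outcome Delivery
        log.choices.append(c)
        log.rewards.append(rewarded)
        recent.append(1 if c == high else 0)

        # Post-Reversal Monitoring: count consecutive choices of the old correct lever
        if counting:
            if c == prev_high:
                persev += 1
            else:
                log.perseverative_errors.append(persev)
                counting = False

        # Criterion Check on a full window of trials since the last reversal
        if len(recent) == window and sum(recent) >= criterion:
            log.trials_to_criterion.append(t - block_start + 1)
            log.reversal_trials.append(t)
            prev_high, high = high, 1 - high                         # Contingency Reversal
            recent.clear()
            block_start = t + 1
            persev, counting = 0, True

        last_choice, last_reward = c, rewarded

    if counting:                       # Session End reached while still perseverating
        log.perseverative_errors.append(persev)
    return log
```

For example, a simple win-stay/lose-shift agent can be passed as `choose`. Its log then yields total reversals as `len(log.reversal_trials)`, together with per-block trials to criterion and perseverative error counts.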
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Reward Probability (High) | float | 0.80 | Probability of reward delivery on the correct lever. |
| Reward Probability (Low) | float | 0.20 | Probability of reward delivery on the incorrect lever. |
| Reversal Criterion | integer | 8 | Number of correct (high-probability lever) choices within the last 10 trials required to trigger a contingency reversal. |
| Maximum Trials | integer | 200 | Maximum number of choice trials per session. |
| Inter-Trial Interval | duration | 5 s | Interval between outcome delivery and the next lever extension. |
| Response Window | duration | 10 s | Maximum time to make a lever choice before the trial is scored as an omission. |
| Session Duration | duration | 60 min | Maximum session time; the session ends when either the trial limit or the time limit is reached, whichever comes first. |
| Pellet Reward Size | integer | 1 | Number of pellets delivered on rewarded trials. |
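The parameter set above can be captured in a validated configuration object. The sketch below is purely illustrative. `PRLConfig` and its field names are hypothetical and do not reflect ConductMaze's configuration format. Exposing the 10-trial criterion window as its own field is also an assumption, because the table fixes it at 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PRLConfig:
    """Hypothetical container mirroring the Parameters table."""
    p_high: float = 0.80                # Reward Probability (High)
    p_low: float = 0.20                 # Reward Probability (Low)
    reversal_criterion: int = 8         # correct choices required within the window
    criterion_window: int = 10          # "last 10 trials"; a separate field by assumption
    max_trials: int = 200               # Maximum Trials
    iti_s: float = 5.0                  # Inter-Trial Interval, seconds
    response_window_s: float = 10.0     # Response Window, seconds; no press -> omission
    session_duration_min: float = 60.0  # Session Duration, minutes
    pellets_per_reward: int = 1         # Pellet Reward Size

    def __post_init__(self) -> None:
        # Contingencies must be valid probabilities and asymmetric.
        if not 0.0 <= self.p_low < self.p_high <= 1.0:
            raise ValueError("expected 0 <= p_low < p_high <= 1")
        if not 1 <= self.reversal_criterion <= self.criterion_window:
            raise ValueError("reversal_criterion must lie in 1..criterion_window")
        if self.max_trials < 1 or self.pellets_per_reward < 1:
            raise ValueError("max_trials and pellets_per_reward must be >= 1")
        if min(self.iti_s, self.response_window_s, self.session_duration_min) <= 0:
            raise ValueError("durations must be positive")
```

For example, `PRLConfig(p_high=0.7, p_low=0.3)` describes a harder 70/30 discrimination. A swapped pair such as `PRLConfig(p_high=0.2, p_low=0.8)` raises `ValueError`.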
Metrics
| Metric | Unit | Description |
|---|---|---|
| Win-Stay Probability | proportion | Probability of repeating a choice following a rewarded trial. |
| Lose-Shift Probability | proportion | Probability of switching choice following a non-rewarded trial. |
| Trials to Reversal Criterion | count | Number of trials required to reach criterion after each contingency reversal. |
| Perseverative Errors | count | Continued selections of the previously correct lever in the first block after reversal. |
| Total Reversals Achieved | count | Number of successful contingency reversals completed within the session. |
| Learning Rate (alpha) | unitless (0 to 1) | Reinforcement learning model parameter reflecting speed of value updating. |
| Inverse Temperature (beta) | unitless (≥ 0) | Model parameter reflecting choice stochasticity; higher values indicate more deterministic selection. |
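Win-stay and lose-shift probabilities follow directly from consecutive trial pairs. The sketch below assumes that omitted trials have already been removed. It also returns `None` when a session contains no rewarded or no non-rewarded trials, where a ratio would be undefined. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def win_stay_lose_shift(choices: List[int],
                        rewards: List[bool]) -> Tuple[Optional[float], Optional[float]]:
    """Return (P(stay | previous win), P(shift | previous loss))."""
    wins = stays = losses = shifts = 0
    # Pair each trial's choice and outcome with the next trial's choice.
    for prev_choice, prev_reward, choice in zip(choices, rewards, choices[1:]):
        if prev_reward:
            wins += 1
            stays += choice == prev_choice
        else:
            losses += 1
            shifts += choice != prev_choice
    win_stay = stays / wins if wins else None
    lose_shift = shifts / losses if losses else None
    return win_stay, lose_shift
```

For example, choices `[0, 0, 1, 1, 0]` with outcomes `[True, False, True, False, False]` give a win-stay probability of 1.0 and a lose-shift probability of 1.0.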
Sample Data
| Subject | Group | Win-Stay | Lose-Shift | Reversals | Perseverative Errors | Alpha | Beta |
|---|---|---|---|---|---|---|---|
Representative data for illustration purposes. Actual values will vary by species, strain, and experimental conditions.
Applications
- Depression and Cognitive Rigidity: Assess inflexible reward learning and impaired negative feedback processing in chronic stress and serotonin depletion models.
- OCD Models: Quantify compulsive perseveration and resistance to contingency change in SAPAP3 knockout or quinpirole-sensitized rodents.
- Schizophrenia Cognitive Deficits: Measure learning rate and prediction error signaling abnormalities in NMDA receptor hypofunction or dopamine dysregulation models.
- Serotonergic Drug Screening: Evaluate how SSRIs, 5-HT2C antagonists, or psilocybin microdosing affect probabilistic learning and reversal flexibility.
Ready to Automate Your Behavioral Protocols?
Contact us for a demo and pricing information.
