Question 1

How does live keyboard scoring work for the FST and TST?

Accepted Answer

The scoring pad assigns each behavioral category (immobility, swimming, climbing, or other behaviors) to a keyboard key. When you press a key, the tool records the time of the key press and switches the active behavior state. Because behavioral states are mutually exclusive, pressing a new key automatically ends the previous behavior and begins the new one. This approach eliminates the need to start and stop a stopwatch for each behavior, reduces transcription errors, and produces a continuous time-series record of the animal's behavior throughout the trial.
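The key-press logic can be sketched as a small state machine. This is a minimal Python sketch, not the tool's implementation. The key bindings (`i`, `s`, `c`, `o`), the `ScoringPad` class, and its `press`/`bouts`/`totals` methods are hypothetical names chosen for illustration, and timestamps are assumed to be seconds from trial start.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical key bindings for illustration; the tool's actual mapping may differ.
DEFAULT_KEYS = {"i": "immobility", "s": "swimming", "c": "climbing", "o": "other"}


@dataclass
class ScoringPad:
    """Keyboard-driven scorer with mutually exclusive behavior states."""

    key_map: dict[str, str] = field(default_factory=lambda: dict(DEFAULT_KEYS))
    # State changes recorded as (onset in seconds, behavior label).
    events: list[tuple[float, str]] = field(default_factory=list)
    current: Optional[str] = None

    def press(self, key: str, t: float) -> None:
        """Record a key press at trial time t (seconds from trial start)."""
        behavior = self.key_map.get(key)
        # Unmapped keys and repeated presses of the active state change nothing.
        if behavior is None or behavior == self.current:
            return
        # Starting a new state implicitly ends the previous one.
        self.events.append((t, behavior))
        self.current = behavior

    def bouts(self, trial_end: float) -> list[tuple[str, float, float]]:
        """Return (behavior, onset, offset) bouts; the last bout ends at trial_end."""
        result = []
        for idx, (onset, behavior) in enumerate(self.events):
            offset = self.events[idx + 1][0] if idx + 1 < len(self.events) else trial_end
            result.append((behavior, onset, offset))
        return result

    def totals(self, trial_end: float) -> dict[str, float]:
        """Total seconds spent in each behavior over the scored period."""
        out: dict[str, float] = {}
        for behavior, onset, offset in self.bouts(trial_end):
            out[behavior] = out.get(behavior, 0.0) + (offset - onset)
        return out
```

Storing only state changes (onset plus label) and deriving each bout's offset from the next onset makes the states mutually exclusive by construction. No two behaviors can overlap, and no time after the first key press is left unassigned.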

Question 2

What is the standard definition of immobility in the FST and TST?

Accepted Answer

In the Forced Swim Test, immobility is defined as the absence of all movement other than that needed to keep the head above water, typically small paddling movements of the hind limbs. In the Tail Suspension Test, immobility is defined as the complete cessation of limb and body movements while the animal hangs passively. Both definitions require the scorer to make a judgment call at the boundary between minimal movement and true immobility, which is why inter-rater reliability assessment is critical. Operational definitions should be established and documented before scoring begins, and all scorers in a study should be trained on the same criteria.

Question 3

What is inter-rater reliability and why is it important for behavioral scoring?

Accepted Answer

Inter-rater reliability quantifies the degree of agreement between two or more independent scorers observing the same behavioral trial. In manual behavioral scoring, distinguishing immobility from low-level activity involves subjective judgment, so it is essential to demonstrate that different scorers produce consistent results. For two scorers, Cohen's kappa is the standard metric, calculated as kappa = (P_observed - P_expected) / (1 - P_expected), where P_observed is the proportion of time bins in which both scorers assigned the same behavior and P_expected is the agreement expected by chance, obtained by multiplying the two scorers' overall proportions for each behavior and summing across behaviors. Fleiss' kappa extends the same idea to three or more scorers. A kappa above 0.80 is generally considered excellent agreement, 0.60-0.80 is substantial, and values below 0.60 may indicate a need for retraining or clearer operational definitions.
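As a worked illustration of the formula, the sketch below computes kappa for two scorers. It assumes each record has already been discretized into equal-length time bins (for example, 1-second bins labelled with the behavior that occupied most of the bin). The function name `cohens_kappa` and the single-letter bin labels are illustrative assumptions, not part of the tool.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(scorer_a: Sequence[Hashable], scorer_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two scorers who labelled the same sequence of time bins."""
    if len(scorer_a) != len(scorer_b) or len(scorer_a) == 0:
        raise ValueError("both scorers need the same, non-zero number of bins")
    n = len(scorer_a)

    # P_observed: fraction of bins where both scorers chose the same behavior.
    p_observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n

    # P_expected: chance agreement from each scorer's overall proportions,
    # i.e. the sum over behaviors k of p_A(k) * p_B(k).
    counts_a, counts_b = Counter(scorer_a), Counter(scorer_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if p_expected == 1.0:
        # Both scorers used one identical behavior throughout; kappa is 0/0.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)
```

For two 10-bin records that agree on 8 bins, with each scorer labelling 6 bins immobile (`"I"`) and 4 mobile (`"M"`), P_observed = 0.80, P_expected = 0.6 × 0.6 + 0.4 × 0.4 = 0.52, and kappa = 0.28 / 0.48 ≈ 0.58. That falls below 0.60 despite 80% raw agreement, which is why kappa rather than percent agreement is reported.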

Question 4

What is the difference between the Forced Swim Test and the Tail Suspension Test?

Accepted Answer

The Forced Swim Test (Porsolt et al., 1977) places a rodent in an inescapable cylinder of water and measures immobility as a proxy for behavioral despair. The Tail Suspension Test (Steru et al., 1985) suspends a mouse by its tail and measures immobility in the same way. The FST can be used in both rats and mice and allows scoring of active behaviors (swimming and climbing) in addition to immobility, which can help differentiate serotonergic from noradrenergic drug mechanisms: in the rat, serotonergic antidepressants tend to increase swimming, whereas noradrenergic antidepressants tend to increase climbing. The TST is used primarily in mice and avoids the confound of hypothermia from water exposure. Session lengths are similar: the standard TST lasts 6 minutes, the mouse FST is usually a single 6-minute session (often with only the last 4 minutes scored), and the classic rat FST uses a 15-minute pre-test followed 24 hours later by a 5-minute test. Both assays are widely used for antidepressant screening, and results from one assay do not always predict the other.

Question 5

What are time-bin analyses and why are they useful?

Accepted Answer

Time-bin analysis divides the total trial duration into equal intervals (typically 1-minute bins) and calculates the percentage of time spent immobile within each bin. This approach reveals the temporal dynamics of behavioral despair rather than collapsing the entire trial into a single immobility score. For instance, most rodents show increasing immobility over the course of a 6-minute FST session, and some drug treatments selectively reduce immobility in later bins. Time-bin analysis can also reveal ceiling or floor effects, such as immobility that saturates near 100% in the final minutes, that a single total immobility score masks. Reporting time-course data alongside total scores is increasingly recommended in the behavioral pharmacology literature.
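A minimal sketch of the binning step, assuming the scored record is a list of `(behavior, onset, offset)` bouts in seconds. The function name, the default 60-second bin width, and the label `"immobility"` are assumptions for illustration. A bout that spans a bin boundary is split between the bins it overlaps, and a final partial bin is scored against its own shorter width.

```python
from __future__ import annotations

import math
from collections.abc import Iterable


def percent_immobile_by_bin(
    bouts: Iterable[tuple[str, float, float]],
    trial_duration: float,
    bin_width: float = 60.0,
    target: str = "immobility",
) -> list[float]:
    """Percentage of each time bin spent in the target behavior."""
    n_bins = math.ceil(trial_duration / bin_width)
    # (start, end) of each bin; the last bin is truncated if the trial length
    # is not a multiple of bin_width.
    edges = [(i * bin_width, min((i + 1) * bin_width, trial_duration)) for i in range(n_bins)]
    seconds = [0.0] * n_bins
    for behavior, onset, offset in bouts:
        if behavior != target:
            continue
        for i, (start, end) in enumerate(edges):
            # Overlap between this bout and this bin; negative means no overlap.
            overlap = min(offset, end) - max(onset, start)
            if overlap > 0:
                seconds[i] += overlap
    return [100.0 * s / (end - start) for s, (start, end) in zip(seconds, edges)]
```

For example, a 6-minute record with immobility from 90-150 s and 200-360 s gives 0%, 50%, 50%, about 67%, 100%, and 100% across the six 1-minute bins. Passing `bin_width=30.0` re-bins the same record at 30-second resolution without re-scoring.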

Question 6

How does this tool compare to automated video scoring systems?

Accepted Answer

Automated video scoring systems such as ANY-maze, EthoVision, SMART, or ConductVision use algorithms to classify behaviors from video recordings, offering high throughput and eliminating variability between human scorers. However, automated systems require careful calibration, may struggle with behaviors near the immobility threshold, and can produce systematic errors that differ from human judgment. Manual live scoring with this tool provides a human-in-the-loop ground truth that is essential for validating automated classifiers. Many laboratories use both approaches: manual scoring to establish reliability on a subset of trials and automated scoring for the full dataset, then report the correlation between the two methods on the trials scored both ways.
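One caveat when comparing the two methods: a correlation can be near perfect even when one method reads consistently higher than the other, which is exactly the kind of systematic error described above. The sketch below pairs Pearson's r with the mean difference between methods (the bias term of a Bland-Altman analysis). It assumes per-animal immobility totals in seconds from each method; both function names are hypothetical.

```python
import math
from collections.abc import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired per-animal scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired series with at least 2 animals")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one method gives identical scores")
    return sxy / math.sqrt(sxx * syy)


def mean_bias(manual: Sequence[float], automated: Sequence[float]) -> float:
    """Mean (automated - manual) difference: the systematic offset r ignores."""
    if len(manual) != len(automated) or not manual:
        raise ValueError("need two non-empty paired series")
    return sum(b - a for a, b in zip(manual, automated)) / len(manual)
```

If an automated system scored every animal exactly 10 s higher than the manual scorer, `pearson_r` would return 1.0 while `mean_bias` would return 10.0, so reporting both numbers keeps such an offset visible.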

Question 7

Can I import previously scored data into this tool?

Accepted Answer

Yes, the scoring pad supports importing pre-scored event logs so you can perform time-bin analysis and inter-rater reliability calculations on data that was collected outside the tool. The import format expects timestamped behavioral state changes with the behavior label and onset time for each event. This is useful when you have existing scoring records from a previous experiment or from another scoring tool and want to compute Cohen's kappa against a second scorer's data, or when you need to re-analyze archived datasets with different time-bin widths.
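The tool's exact import format is not specified beyond timestamped state changes, so the sketch below assumes a hypothetical CSV layout with a header row `onset_s,behavior` and one row per state change. It sorts rows by onset, merges consecutive rows that repeat the same behavior, and closes each bout at the next onset (or at the trial end), yielding `(behavior, onset, offset)` bouts. The function name `load_event_log` is illustrative.

```python
from __future__ import annotations

import csv
import io


def load_event_log(text: str, trial_end: float) -> list[tuple[str, float, float]]:
    """Parse a CSV of state changes into (behavior, onset, offset) bouts.

    Assumed (hypothetical) layout: header row "onset_s,behavior", one row per
    state change, onsets in seconds from trial start. Time before the first
    onset is not assigned to any behavior.
    """
    reader = csv.DictReader(io.StringIO(text))
    events = sorted((float(row["onset_s"]), row["behavior"].strip()) for row in reader)
    if not events:
        raise ValueError("event log is empty")
    if events[-1][0] >= trial_end:
        raise ValueError("an event starts at or after the end of the trial")

    # Drop rows that repeat the state already in effect, so back-to-back
    # identical labels merge into one bout.
    changes: list[tuple[float, str]] = []
    for onset, behavior in events:
        if not changes or changes[-1][1] != behavior:
            changes.append((onset, behavior))

    # Each bout ends where the next state change begins.
    bouts = []
    for idx, (onset, behavior) in enumerate(changes):
        offset = changes[idx + 1][0] if idx + 1 < len(changes) else trial_end
        bouts.append((behavior, onset, offset))
    return bouts
```

Because the result is a plain list of bouts, an archived record can be re-binned at any width or discretized for a kappa comparison against a second scorer without re-scoring the video.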

FST/TST Live Scoring Pad.

Blind the scorer to treatment group

Define immobility operationally before scoring begins

Use consistent time-bin widths across studies

Score from the correct trial segment

Do not confuse behavioral despair with learned helplessness