mRNA Raman Calibrator

Build a PLS calibration from Raman spectra and known mRNA concentrations, then predict unknowns with 95% prediction intervals and outlier diagnostics. Runs entirely in your browser.

Chemometrics · PLS Regression · Raman Spectroscopy · Client-Side

Try it out

Load example mRNA Raman data to see the full workflow

1. Calibration spectra

Drop SPC, JCAMP-DX, or two-column CSV/TSV files and enter the known concentration for each.

2. Build model

Pipeline: Savitzky–Golay smoothing → baseline correction (AsLS) → SNV → optional SG 1st derivative → PLS regression with leave-one-out CV.

3. Predict unknowns

Drop spectrum files of unknown concentration. Each prediction includes a 95% interval and an outlier flag based on Hotelling's T² and Q-residuals.

Use for

  • Quantifying mRNA concentration in IVT reactions or LNP formulations from Raman spectra
  • Building a per-instrument calibration when reference standards span the working concentration range
  • Routine QC where you score dozens to hundreds of unknowns against a fixed calibration set
  • Replacing a univariate peak-height assay with a multivariate model that uses the whole spectrum

Don't use for

  • Work with no vetted calibration set: the tool reports uncertainty honestly, but only if the reference concentrations of your standards are accurate
  • Single-spectrum prediction with no reference samples (use a published, pre-built model instead)
  • Hyperspectral imaging cubes: the tool is sized for ≤ 200 spectra of ≤ 4000 wavenumber points each

Raman quantification of mRNA

Raman spectra of mRNA in aqueous solution are dominated by a small number of vibrational bands.

Diagnostic regions:
  • ~785 cm⁻¹: uracil ring breathing
  • ~1080 cm⁻¹: phosphate-backbone symmetric PO₂⁻ stretch
  • ~1240 cm⁻¹: uracil + cytosine in-plane bending
  • ~1485 cm⁻¹: guanine + adenine ring stretches
  • ~1660 cm⁻¹: carbonyl (C=O) stretching of pyrimidines (this region overlaps the amide I band of any protein present; mRNA itself has no amide I band)

Intensity at the phosphate band (~1080) tracks total nucleic acid concentration well over the 0.05–5 mg/mL range typical of IVT and LNP work. Combining several bands in a multivariate model (PLS) outperforms univariate calibration, especially when buffers contribute background bands.
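
For comparison with the multivariate model, a univariate calibration on the phosphate band can be sketched as below. This is a minimal illustration, not the tool's code: the integration window (1060–1100 cm⁻¹), the function names, and the synthetic Gaussian band are all assumptions made for the example.

```python
import numpy as np

def band_area(wavenumbers, intensities, lo=1060.0, hi=1100.0):
    """Integrate intensity over [lo, hi] cm^-1 with the trapezoid rule.
    The window limits are an assumed choice around the ~1080 cm^-1 band."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    x, y = wavenumbers[mask], intensities[mask]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Synthetic, noise-free standards (illustrative only, not measured data)
wn = np.linspace(600.0, 1800.0, 601)
peak = np.exp(-0.5 * ((wn - 1080.0) / 8.0) ** 2)
conc = np.array([0.05, 0.5, 1.0, 2.5, 5.0])        # mg/mL
spectra = conc[:, None] * peak[None, :]

# Straight-line calibration: concentration as a function of band area
areas = np.array([band_area(wn, s) for s in spectra])
slope, intercept = np.polyfit(areas, conc, 1)

def predict(wavenumbers, intensities):
    """Predict concentration from a single spectrum via the band area."""
    return slope * band_area(wavenumbers, intensities) + intercept

estimate = predict(wn, 1.7 * peak)
```

A model like this has no way to separate the analyte band from an overlapping buffer band, which is the situation where a whole-spectrum PLS model helps.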

How PLS regression works

Partial Least Squares regression solves the same problem as linear regression — predicting concentration from spectrum — but copes with the fact that adjacent wavenumbers are highly correlated.

The algorithm (NIPALS):
  1. Find the direction in spectral space (a 'latent variable') whose scores have maximum covariance with concentration
  2. Project all spectra onto that direction to get scores
  3. Subtract the explained variation from spectra and concentrations
  4. Repeat for the next latent variable
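
The four steps can be sketched for a single response (PLS1) as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the data are synthetic and noise-free, and the tool's own implementation may differ in details such as scaling.

```python
import numpy as np

def pls1_nipals(X, y, n_lv):
    """Fit PLS1 by NIPALS. Returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean                  # spectral residuals, deflated each round
    f = y - y_mean                  # concentration residuals
    W, P, q = [], [], []
    for _ in range(n_lv):           # step 4: repeat per latent variable
        w = E.T @ f                 # step 1: direction of maximum covariance
        w /= np.linalg.norm(w)
        t = E @ w                   # step 2: scores
        tt = t @ t
        p = E.T @ t / tt            # spectral loading
        q_a = (f @ t) / tt          # concentration loading
        E = E - np.outer(t, p)      # step 3: remove explained variation
        f = f - q_a * t
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in spectral space
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    return (X - x_mean) @ coef + y_mean

# Synthetic example: an analyte band overlapped by a broad buffer band
rng = np.random.default_rng(0)
wn = np.linspace(600.0, 1800.0, 301)
analyte = np.exp(-0.5 * ((wn - 1080.0) / 8.0) ** 2)
buffer_band = np.exp(-0.5 * ((wn - 1060.0) / 40.0) ** 2)
conc = rng.uniform(0.05, 5.0, size=12)
buf = rng.uniform(0.0, 3.0, size=12)
X = np.outer(conc, analyte) + np.outer(buf, buffer_band)

coef, x_mean, y_mean = pls1_nipals(X, conc, n_lv=2)
unknown = 1.7 * analyte + 0.9 * buffer_band
predicted = pls1_predict(unknown, coef, x_mean, y_mean)
```

With two independent sources of variation and no noise, two latent variables recover the analyte concentration despite the overlapping buffer band.
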

Choosing latent variables: too few under-fits, too many over-fits. Leave-one-out cross-validation picks the LV count that minimizes the prediction error on samples the model has not seen, usually 2–6 LVs for Raman calibration.
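
A sketch of leave-one-out selection is shown below. It assumes a compact PLS1 fit (`pls1_coef`, a hypothetical helper) and synthetic data; the point it illustrates is that each sample is predicted by a model, including its mean-centring, built only from the other samples.

```python
import numpy as np

def pls1_coef(X, y, n_lv):
    """Compact PLS1 fit by NIPALS. Returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        q.append((f @ t) / (t @ t))
        E, f = E - np.outer(t, p), f - q[-1] * t
        W.append(w); P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean

def loo_rmsecv(X, y, max_lv):
    """RMSECV for 1..max_lv latent variables, leaving one sample out at a time."""
    n = len(y)
    press = np.zeros(max_lv)                 # summed squared held-out errors
    for i in range(n):
        train = np.arange(n) != i            # every sample except i
        for a in range(1, max_lv + 1):
            coef, x_mean, y_mean = pls1_coef(X[train], y[train], a)
            pred = (X[i] - x_mean) @ coef + y_mean
            press[a - 1] += (pred - y[i]) ** 2
    return np.sqrt(press / n)

# Synthetic calibration set: analyte band, overlapping buffer band, small noise
rng = np.random.default_rng(1)
wn = np.linspace(600.0, 1800.0, 301)
analyte = np.exp(-0.5 * ((wn - 1080.0) / 8.0) ** 2)
buffer_band = np.exp(-0.5 * ((wn - 1060.0) / 40.0) ** 2)
conc = rng.uniform(0.05, 5.0, size=15)
buf = rng.uniform(0.0, 3.0, size=15)
X = (np.outer(conc, analyte) + np.outer(buf, buffer_band)
     + rng.normal(0.0, 0.01, size=(15, wn.size)))

rmsecv = loo_rmsecv(X, conc, max_lv=5)
best_lv = int(np.argmin(rmsecv)) + 1         # LV count with the lowest RMSECV
```

Refitting for every LV count is wasteful but keeps the sketch short; a practical implementation would fit once per fold and reuse the intermediate components.
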

Diagnostics to check:
  • R²cv (Q²): how well predictions match held-out truth; > 0.9 is good for routine assays
  • RMSECV: root-mean-square cross-validated prediction error, in concentration units
  • T² and Q residuals: detect when an unknown spectrum doesn't look like the training set. T² flags extreme scores inside the model; Q flags spectral features the model cannot reproduce
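
The two outlier statistics can be sketched as follows. To keep the example short it uses PCA scores from an SVD as a stand-in for PLS scores, which is an assumption of this sketch and not necessarily what the tool does; function names and data are likewise hypothetical. Control limits are omitted because the source does not state how the tool sets them.

```python
import numpy as np

def fit_pca(X, n_comp):
    """Mean-centre X and keep the first n_comp principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_comp].T                        # (n_vars, n_comp), orthonormal columns
    scores = (X - mean) @ P
    score_var = scores.var(axis=0, ddof=1)   # variance of each training score
    return mean, P, score_var

def t2_and_q(x, mean, P, score_var):
    """Hotelling's T^2 (distance within the model) and Q (distance from it)."""
    e = x - mean
    t = e @ P
    t2 = float(np.sum(t ** 2 / score_var))
    resid = e - P @ t                        # part of the spectrum the model misses
    q = float(resid @ resid)
    return t2, q

# Synthetic training set with two sources of variation plus noise
rng = np.random.default_rng(2)
wn = np.linspace(600.0, 1800.0, 301)
analyte = np.exp(-0.5 * ((wn - 1080.0) / 8.0) ** 2)
buffer_band = np.exp(-0.5 * ((wn - 1060.0) / 40.0) ** 2)
X = (np.outer(rng.uniform(0.05, 5.0, 20), analyte)
     + np.outer(rng.uniform(0.0, 3.0, 20), buffer_band)
     + rng.normal(0.0, 0.01, size=(20, wn.size)))
mean, P, score_var = fit_pca(X, n_comp=2)

# One spectrum that resembles the training set, one with an unmodelled band
in_model = 1.7 * analyte + 0.9 * buffer_band + rng.normal(0.0, 0.01, wn.size)
contaminant = 0.5 * np.exp(-0.5 * ((wn - 1450.0) / 10.0) ** 2)
t2_in, q_in = t2_and_q(in_model, mean, P, score_var)
t2_out, q_out = t2_and_q(in_model + contaminant, mean, P, score_var)
```

The contaminated spectrum has a much larger Q than the in-model one, because the extra band lies outside the space spanned by the training spectra.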

Preprocessing rationale

Savitzky–Golay smoothing fits a low-order polynomial to a sliding window and replaces the centre point with the polynomial's value. This reduces shot noise without flattening narrow Raman peaks. A window of 11 points with a 2nd-order polynomial is a safe default for typical 1–4 cm⁻¹ resolution data.
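
A minimal NumPy sketch of this filter is given below. The function name and the edge handling (repeating the end values) are assumptions of the example; library implementations offer other edge modes.

```python
import numpy as np

def savgol_smooth(y, window=11, polyorder=2):
    """Savitzky-Golay smoothing via a least-squares polynomial fit per window."""
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)       # positions within the window
    A = np.vander(z, polyorder + 1, increasing=True)  # columns z^0, z^1, ..., z^p
    # Row 0 of the pseudo-inverse maps window values to the fitted constant
    # term, which is the polynomial's value at the window centre (z = 0).
    coeffs = np.linalg.pinv(A)[0]
    padded = np.pad(y, half, mode="edge")             # assumed edge handling
    # np.convolve flips its kernel, so reverse it to get a sliding dot product
    return np.convolve(padded, coeffs[::-1], mode="valid")

# A quadratic is reproduced exactly away from the edges (polyorder = 2)
x = np.arange(60.0)
quad = 0.5 * x ** 2 - 3.0 * x + 2.0
smoothed_quad = savgol_smooth(quad)

# Pure noise is attenuated
rng = np.random.default_rng(3)
noise = rng.normal(0.0, 1.0, 500)
smoothed_noise = savgol_smooth(noise)
```

Features that a 2nd-order polynomial can follow across 11 points pass through unchanged, which is why the filter preserves peaks better than a moving average of the same width.
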

AsLS baseline correction (Eilers & Boelens 2005) estimates a smooth baseline by penalized least squares with asymmetric weights: points above the current estimate are downweighted, so the baseline tracks the bottom envelope of the spectrum. λ = 10⁵ and p = 0.01 work well for fluorescent biological matrices.
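
The iteration can be sketched with dense NumPy matrices as below. This is an illustration only: the function name and the fixed iteration count are assumptions, and dense solves are practical only for short spectra (a sparse or banded solver would be the usual choice for thousands of points).

```python
import numpy as np

def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline estimate (dense-matrix sketch)."""
    n = len(y)
    D = np.diff(np.eye(n), 2, axis=0)        # second-difference operator, (n-2, n)
    penalty = lam * (D.T @ D)                # smoothness penalty
    w = np.ones(n)                           # start with equal weights
    for _ in range(n_iter):
        # Weighted penalized least squares: (W + lam * D'D) z = W y
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline (peaks) get weight p, points below get 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z

# Synthetic spectrum: sloping background plus one narrow peak
x = np.arange(200.0)
true_baseline = 0.01 * x + 5.0
peak = 10.0 * np.exp(-0.5 * ((x - 100.0) / 3.0) ** 2)
y = true_baseline + peak

baseline = asls_baseline(y)
corrected = y - baseline
```

Larger λ gives a stiffer baseline; smaller p makes the estimate hug the bottom of the spectrum more tightly.
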

SNV (Standard Normal Variate) subtracts the per-spectrum mean and divides by the per-spectrum standard deviation. This removes multiplicative scatter effects from path-length and density variations between samples.
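
In code, SNV is a row-wise standardization. The sketch below assumes spectra are stored one per row and uses the sample standard deviation (ddof=1); conventions differ on that last point.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) separately."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)   # ddof=1 is an assumed convention
    return (spectra - mean) / std

# The same spectrum measured with a different gain and offset
wn = np.linspace(600.0, 1800.0, 301)
spectrum = np.exp(-0.5 * ((wn - 1080.0) / 8.0) ** 2)
scaled = 3.0 * spectrum + 2.0

snv_original = snv(spectrum)[0]
snv_scaled = snv(scaled)[0]
```

Both versions map to the same SNV spectrum, which is the property that makes the transform useful against gain and offset differences between samples.
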

SG 1st derivative (optional) takes the local slope of the fitted polynomial rather than its smoothed value. It removes constant offsets exactly and strongly suppresses residual broad backgrounds, at the cost of amplifying high-frequency noise.
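
The derivative uses the same local polynomial fit but reads off the slope term. A minimal sketch, with a hypothetical function name and an assumed uniform point spacing `delta`:

```python
import math
import numpy as np

def savgol_derivative(y, window=11, polyorder=2, deriv=1, delta=1.0):
    """Savitzky-Golay derivative on uniformly spaced data (spacing = delta)."""
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)
    A = np.vander(z, polyorder + 1, increasing=True)  # columns z^0 ... z^p
    # Row `deriv` of the pseudo-inverse gives the fitted coefficient of z^deriv;
    # the derivative at the window centre is deriv! times that coefficient.
    coeffs = np.linalg.pinv(A)[deriv] * math.factorial(deriv) / delta ** deriv
    padded = np.pad(y, half, mode="edge")             # assumed edge handling
    # np.convolve flips its kernel, so reverse it to get a sliding dot product
    return np.convolve(padded, coeffs[::-1], mode="valid")

# Check on a quadratic sampled every 2 units: d/dx (0.5 x^2 - 3 x + 2) = x - 3
x = np.arange(0.0, 120.0, 2.0)
quad = 0.5 * x ** 2 - 3.0 * x + 2.0
slope = savgol_derivative(quad, delta=2.0)

# Adding a constant background does not change the derivative
slope_offset = savgol_derivative(quad + 100.0, delta=2.0)
```

Values within half a window of either end depend on the edge handling and are best ignored or trimmed.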

Frequently Asked Questions