Question 1

What file formats are supported?

Accepted Answer

Galactic / Thermo SPC binary (.spc), JCAMP-DX text (.jdx, .dx, .jcm, .jcamp), and any two-column CSV / TSV / whitespace-delimited text where the first column is Raman shift (cm⁻¹) and the second is intensity. Spectra with mismatched wavenumber axes are linearly interpolated onto a common grid.
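The interpolation step can be sketched as follows. This is a minimal illustration, not the tool's own code: the function name, the 1 cm⁻¹ step, and the choice to restrict the grid to the region where all spectra overlap are assumptions made for the example.

```python
import numpy as np

def resample_to_common_grid(spectra, step=1.0):
    """Linearly interpolate (shift, intensity) pairs onto one shared axis.

    The grid covers only the range where all spectra overlap, so nothing is
    extrapolated. `step` (cm^-1) is an illustrative choice, not a tool setting.
    """
    lo = max(shift.min() for shift, _ in spectra)
    hi = min(shift.max() for shift, _ in spectra)
    grid = np.arange(lo, hi + step / 2, step)
    rows = []
    for shift, intensity in spectra:
        order = np.argsort(shift)  # np.interp needs ascending x values
        rows.append(np.interp(grid, shift[order], intensity[order]))
    return grid, np.vstack(rows)

# Two toy spectra sampled on different axes
a = (np.array([400.0, 402.0, 404.0, 406.0]), np.array([1.0, 3.0, 5.0, 7.0]))
b = (np.array([401.0, 403.0, 405.0]), np.array([10.0, 20.0, 30.0]))
grid, X = resample_to_common_grid([a, b], step=1.0)
```

Each row of `X` then shares the same wavenumber axis `grid`, which is what a regression model needs as input.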

Question 2

How many calibration spectra do I need?

Accepted Answer

At least 4, but realistically 8–20 standards spanning the concentration range you expect to measure. The tool uses leave-one-out cross-validation, which becomes meaningful around n = 8 and statistically informative around n = 12. More replicates per concentration tighten the prediction interval.
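To make the leave-one-out idea concrete, here is a small sketch that refits a single-response PLS model n times, each time predicting the one standard that was held out. The NIPALS routine, the function names, and the synthetic 12-standard data set are written for this illustration and are not taken from the tool.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; returns (x_mean, y_mean, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # weight vector
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def loo_rmsecv(X, y, n_components):
    """Leave each standard out once, refit on the rest, and predict it."""
    residuals = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        residuals[i] = y[i] - pls1_predict(model, X[i:i + 1])[0]
    return float(np.sqrt(np.mean(residuals ** 2)))

# Synthetic calibration set: one Gaussian band scaled by concentration, plus noise
rng = np.random.default_rng(0)
conc = np.linspace(1.0, 12.0, 12)
shift = np.arange(50)
band = np.exp(-0.5 * ((shift - 25) / 4.0) ** 2)
X = np.outer(conc, band) + 0.01 * rng.normal(size=(12, 50))
rmsecv = loo_rmsecv(X, conc, n_components=1)
```

With only 4 standards, each refit rests on 3 points, so the residuals say little about real predictive error; that is the practical reason for aiming at 8 or more.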

Question 3

Why PLS instead of a deep learning model?

Accepted Answer

For Raman quantification on a few dozen standards, PLS (partial least squares regression) is the reference method in chemometrics. It handles the highly correlated wavenumber columns natively, gives interpretable diagnostics (R²cv, RMSECV, latent-variable loadings), and trains in milliseconds. Convolutional neural networks (CNNs) help when you have hundreds of training spectra and care about non-linear lineshape variation, which is rare in routine assay calibration.

Question 4

What does the "outlier" flag mean?

Accepted Answer

For each unknown, we compute Hotelling's T² (how far its PLS scores are from the calibration mean) and the Q-residual (how much of the spectrum is left unexplained by the model). If either statistic exceeds the 95% confidence limit derived from the calibration set, the prediction is flagged. Common causes are samples outside the calibration range, a different sample matrix, or instrument drift since calibration.
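The two statistics can be sketched as below. Several things here are assumptions made to keep the example short: a 2-component SVD (PCA-like) latent space stands in for the PLS model, the limits are simply the empirical 95th percentile of the calibration values (the tool may derive its limits differently, for example from an F-distribution), and all names and data are hypothetical.

```python
import numpy as np

def hotelling_t2(t_new, T_cal):
    """Squared Mahalanobis distance of a score vector from the calibration scores."""
    centre = T_cal.mean(axis=0)
    cov = np.atleast_2d(np.cov(T_cal, rowvar=False))
    d = t_new - centre
    return float(d @ np.linalg.solve(cov, d))

def q_residual(e_new):
    """Sum of squared spectral residuals left after model reconstruction."""
    return float(e_new @ e_new)

# Synthetic calibration set with two underlying components plus noise
rng = np.random.default_rng(1)
n, m, A = 30, 40, 2
basis = rng.normal(size=(A, m))
X = rng.normal(size=(n, A)) @ basis + 0.01 * rng.normal(size=(n, m))
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
P = Vt[:A].T  # m x A loadings with orthonormal columns (stand-in for PLS)

def project(x):
    t = (x - mean) @ P             # scores
    e = (x - mean) - t @ P.T       # part of the spectrum the model cannot explain
    return t, e

cal = [project(x) for x in X]
T_cal = np.array([t for t, _ in cal])
t2_limit = np.percentile([hotelling_t2(t, T_cal) for t, _ in cal], 95)
q_limit = np.percentile([q_residual(e) for _, e in cal], 95)

def is_outlier(x):
    t, e = project(x)
    return bool(hotelling_t2(t, T_cal) > t2_limit or q_residual(e) > q_limit)

typical = mean.copy()              # sits at the calibration centre
extreme = mean + 10.0 * basis[0]   # inside the model plane, far from the centre
spiked = mean.copy()
spiked[5] += 5.0                   # contains a feature the model has never seen
```

In this toy setup `extreme` trips the T² check (the out-of-range case) and `spiked` trips the Q check (the different-matrix or new-interferent case), while `typical` passes both.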

Question 5

How is the 95% prediction interval calculated?

Accepted Answer

The interval is the predicted value ± PI, with half-width PI = t·s·√(1 + 1/n + T²/(n−1)), where s is the standard deviation of the leave-one-out CV residuals, n is the number of calibration samples, T² is the sample's Hotelling's T², and t is the two-tailed t critical value at α = 0.05 (0.025 in each tail) with n−1 degrees of freedom. The interval widens for samples whose PLS scores are far from the calibration centroid.
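A worked instance of the formula, as a sketch: the function name and the numbers are illustrative, and the critical value is passed in by hand (2.201 is the 97.5th percentile of Student's t with 11 degrees of freedom, i.e. the two-sided 95% value for n = 12).

```python
import math

def prediction_half_width(t_crit, s, n, t2):
    """Half-width t * s * sqrt(1 + 1/n + T^2/(n - 1)) of the prediction interval."""
    return t_crit * s * math.sqrt(1.0 + 1.0 / n + t2 / (n - 1))

n = 12          # calibration samples (illustrative)
s = 0.5         # std. dev. of leave-one-out CV residuals (illustrative units)
t_crit = 2.201  # two-sided 95% Student's t critical value, n - 1 = 11 d.o.f.

at_centroid = prediction_half_width(t_crit, s, n, t2=0.0)
far_from_centroid = prediction_half_width(t_crit, s, n, t2=5.5)
```

The second value is larger than the first because the T² term grows with the distance of the sample's scores from the calibration centroid; a prediction would then be reported as the predicted value ± the half-width.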

Question 6

Are my spectra uploaded anywhere?

Accepted Answer

No. The entire pipeline (parsing, preprocessing, PLS fit, prediction) runs in your browser tab. Files never leave your machine, and no data is logged. Closing the tab discards everything.

Question 7

Why Savitzky–Golay smoothing first, before baseline?

Accepted Answer

AsLS (asymmetric least squares) baseline correction estimates a smooth pseudo-baseline by penalized least squares. High-frequency noise can leak into the baseline estimate and shift it locally, so smoothing first stabilizes the baseline fit. SNV (standard normal variate) is applied after baseline correction, and before the optional 1st derivative, so that scaling reflects the corrected signal, not the raw drift.
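The point about SNV ordering can be illustrated with a short sketch. The `snv` helper and the toy spectrum are assumptions for the example (the tool's own implementation is not shown here); SNV itself is the usual per-spectrum centring and scaling to unit standard deviation.

```python
import numpy as np

def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it to unit std. dev."""
    spectrum = np.asarray(spectrum, dtype=float)
    return (spectrum - spectrum.mean()) / spectrum.std(ddof=1)

shift = np.arange(100)
peak = np.exp(-0.5 * ((shift - 50) / 5.0) ** 2)  # baseline-corrected toy signal
drift = 0.05 * shift                             # sloping background left in place

# SNV removes a constant offset and an overall intensity scale...
same_shape = np.allclose(snv(3.0 * peak + 2.0), snv(peak))

# ...but a sloping baseline changes both the mean and the standard deviation,
# so scaling before baseline correction would be driven by the drift instead.
distorted = not np.allclose(snv(peak + drift), snv(peak))
```

Here `same_shape` and `distorted` both come out true, which is the reason for running the baseline correction before SNV.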

Question 8

When should I turn on the SG 1st derivative?

Accepted Answer

Use the derivative when residual baseline drift survives AsLS, for example with strongly fluorescent samples, or when broad sloping backgrounds dominate sharp peaks. Derivatives sharpen features but also amplify noise, so start with it off and enable it only if your RMSECV improves.
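Why the derivative removes residual background can be seen in a small sketch. The window length, polynomial order, and function name are illustrative assumptions, and the filter is built directly from a local least-squares polynomial fit in NumPy so the example has no further dependencies.

```python
import numpy as np

def sg_first_derivative(y, window=7, polyorder=2, spacing=1.0):
    """Savitzky-Golay 1st derivative from a local least-squares polynomial fit.

    `window` must be odd. Only interior points are returned, so the output has
    len(y) - window + 1 values.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, polyorder + 1, increasing=True)  # columns: 1, k, k^2, ...
    weights = np.linalg.pinv(A)[1] / spacing  # row 1 = fitted slope at the window centre
    return np.correlate(np.asarray(y, dtype=float), weights, mode="valid")

shift = np.arange(60, dtype=float)
peak = np.exp(-0.5 * ((shift - 30) / 3.0) ** 2)

d_peak = sg_first_derivative(peak)
d_offset = sg_first_derivative(peak + 5.0)        # constant background added
d_slope = sg_first_derivative(0.02 * shift + 5.0)  # purely sloping background
```

A constant background vanishes entirely (`d_offset` equals `d_peak`) and a linear slope collapses to a constant 0.02, which is why the derivative helps when drift survives baseline correction. The cost is that point-to-point noise is differentiated too, so compare RMSECV with and without it.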

mRNA Raman Calibrator

