Results of Exercise 4 (Validation study)

Main error

This exercise was about evaluating a GLP validation in practice. The results were not as good as I had hoped: many of you used the wrong formula to calculate the standard deviation. There are two formulas for the standard deviation: one for the case where the data are a representative sample of a larger group (S), and one for the case where the data are the entire population (P). The difference between them is the denominator, which is (N-1) in the S case and N in the P case. Therefore, for the same data, the P value is a little lower than the S value.
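Both formulas use the same sum of squared deviations from the mean x̄, so the two results differ only by a factor that depends on N. Writing s for the S value and σ for the P value:

$$\frac{\sigma}{s} = \sqrt{\frac{\tfrac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}{\tfrac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2}} = \sqrt{\frac{N-1}{N}}$$

For N = 6 the factor is √(5/6) ≈ 0.913, so the P value is about 8.7% lower than the S value. The difference shrinks as N grows.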

In our case, the samples we prepared for validation are a subset of all samples that will be made in the future, and therefore a sample in the statistical sense (this is my opinion). This is also the default in Excel, where =STDEV() uses the (S) formula, so if you specified nothing, I have to assume you used (S). The case where all samples are measured is rare: even if you measure, e.g., all six samples of a treatment group, this is still a sample (of all treatment cases). I hope this explanation is correct and helps you.

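Excel: =STDEV() or =STDEV.S()

Formula for the sample standard deviation (S):

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

where x̄ is the mean of the N values.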

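Excel: =STDEV.P()

Formula for the population standard deviation (P):

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

where x̄ is the mean of all N values (here the whole population).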
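The two formulas can be written out directly and cross-checked against Python's standard library, whose `statistics.stdev` and `statistics.pstdev` correspond to the (S) and (P) variants. This is a minimal sketch; the function names `sample_sd` and `population_sd` and the six values are made up for illustration and are not exercise data.

```python
import math
import statistics


def sample_sd(values):
    """Standard deviation with N-1 in the denominator (like Excel STDEV.S)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for the sample SD")
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return math.sqrt(ss / (n - 1))


def population_sd(values):
    """Standard deviation with N in the denominator (like Excel STDEV.P)."""
    n = len(values)
    if n < 1:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # same sum as above
    return math.sqrt(ss / n)


# Illustrative values only (not data from the exercise)
values = [98.0, 102.0, 101.0, 97.0, 103.0, 99.0]
s = sample_sd(values)
p = population_sd(values)

# The standard library offers the same two variants
assert math.isclose(s, statistics.stdev(values))
assert math.isclose(p, statistics.pstdev(values))
print(f"S = {s:.3f}, P = {p:.3f}")
```

Here the mean is 100 and the sum of squared deviations is 28, so the script prints `S = 2.366, P = 2.160`: dividing by 5 instead of 6 makes the S value the larger one.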


The sections below give example reports for the three data sets. There is some room for interpretation as to whether ‘out-of-spec’ calibrators should be used or not. My opinion is that for validation, all results should be considered and only obvious, operation-related mistakes corrected. Why? Because if deviations are excluded during validation, the method performance looks good, but it will be difficult to maintain that performance in routine operation, and that is a much bigger problem than “not-so-nice” validation data. As a rule of thumb: since the routine operation requirement is ±15%, it is comforting to have validation inaccuracies within ±10% and imprecisions (CVs) below 10%.
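The rule of thumb can be sketched for a single QC level. In this sketch, inaccuracy is the relative deviation of the mean from the nominal value and imprecision is the CV based on the sample (S) standard deviation. The function names, the illustrative measurements, and the choice to judge the worse of |inaccuracy| and CV against both limits are assumptions made for this example, not requirements taken from any guidance.

```python
import statistics


def inaccuracy_percent(measured, nominal):
    """Relative deviation of the mean from the nominal value, in percent."""
    return (statistics.mean(measured) - nominal) / nominal * 100.0


def imprecision_cv_percent(measured):
    """Coefficient of variation in percent, using the sample SD (N-1)."""
    return statistics.stdev(measured) / statistics.mean(measured) * 100.0


def assess(measured, nominal, routine_limit=15.0, target=10.0):
    """Classify one QC level against the routine limit and the stricter target.

    Assumption: the worse of |inaccuracy| and CV decides the verdict.
    """
    bias = inaccuracy_percent(measured, nominal)
    cv = imprecision_cv_percent(measured)
    worst = max(abs(bias), cv)
    if worst < target:
        verdict = "comfortable"
    elif worst <= routine_limit:
        verdict = "within routine limit, little margin"
    else:
        verdict = "fails routine limit"
    return bias, cv, verdict


# Hypothetical QC level with nominal value 100 (illustrative, not exercise data)
bias, cv, verdict = assess([96.0, 104.0, 99.0, 101.0, 95.0, 105.0], nominal=100.0)
print(f"inaccuracy {bias:+.1f}%, CV {cv:.1f}%: {verdict}")
```

With these made-up values the mean hits the nominal exactly and the CV is about 4.1%, so the level lands in the comfortable range.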

The Excel sheets supplied below show how I would set up a sheet that calculates everything automatically from the data on tab 1. Have a look, but don’t use them further, because they contain some tweaks to handle the sample mix-ups in sheets 1 and 2!

Data set 1

Data set 1 was an example of a well-performing method: inaccuracies and imprecisions are well below 10%. But QCs 2 and 3 were mixed up on all three days. Since the same mix-up occurred every day, the mistake most likely happened when the samples were prepared (and mislabeled).
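One way to spot such a mix-up is to check, for each QC result, which nominal level it matches best. This is a minimal sketch with hypothetical names (`nearest_level`, `flag_possible_swaps`) and made-up QC levels and results; comparing relative rather than absolute distance is an assumption chosen because QC levels often span orders of magnitude.

```python
def nearest_level(measured, nominal_levels):
    """Return the label of the nominal level closest to `measured` in relative terms."""
    return min(
        nominal_levels,
        key=lambda lvl: abs(measured - nominal_levels[lvl]) / nominal_levels[lvl],
    )


def flag_possible_swaps(results, nominal_levels):
    """Yield (day, label, measured, best_match) where a result fits another level better."""
    for day, label, measured in results:
        best = nearest_level(measured, nominal_levels)
        if best != label:
            yield day, label, measured, best


# Hypothetical nominal QC levels and results (not the exercise data)
nominal = {"QC1": 3.0, "QC2": 30.0, "QC3": 80.0}
results = [
    (1, "QC1", 2.9), (1, "QC2", 79.0), (1, "QC3", 31.0),
    (2, "QC1", 3.1), (2, "QC2", 81.5), (2, "QC3", 29.4),
    (3, "QC1", 3.0), (3, "QC2", 78.8), (3, "QC3", 30.6),
]

for day, label, measured, best in flag_possible_swaps(results, nominal):
    print(f"day {day}: {label} measured {measured} looks like {best}")
```

If the same pair of levels is flagged on every day, as here, a preparation or labelling error is a more plausible explanation than random measurement error.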


Does the performance match the guidance’s expectations? Yes

Accuracy and precision are good. There is no trend in the calibration slopes across the three days.

Data set 2

In data set 2 the method is good, but an unreliable technician seems to be at work: in the first two series calibrators 33 and 100 were mixed up, and in the third series calibrators 3.33 and 10. There is a trend in the calibration slopes, so attention is needed!


Does the performance match the guidance’s expectations? Yes

Accuracy and precision are good. The trend in slope has to be followed up:

  • The internal standard might be unstable, or the stock solutions might have become concentrated through evaporation.
  • The lab procedures have to be improved: samples have to be labelled throughout the whole procedure (sample, extraction vial and autosampler vial), and the measurement sequence has to be verified.

Data set 3

Data set 3 is an example of simply bad method performance. There is a large overall scatter in the data.


Does the performance match the guidance’s expectations? No

The method has poor reproducibility, but the accuracy is acceptable when multiple measurements are averaged. It could perhaps be used for measurements where changes of 1:10 or more are expected and enough replicates are measured, but not for GxP bioanalysis. The following points could be evaluated:

  • A better internal standard, more similar to the analyte? How does the method perform with an external standard?
  • Sample homogeneity?
  • Is the extraction recovery constant?
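Why averaging replicates helps can be made concrete: for independent replicates with purely random scatter, the CV of their mean is the single-measurement CV divided by √n, so reaching a target CV needs n ≥ (CV / target)². This is a minimal sketch; the function name `replicates_needed` is hypothetical, and the 10% target is the rule-of-thumb value from above. Averaging does not remove a systematic bias or day-to-day effects.

```python
import math


def replicates_needed(single_cv_percent, target_cv_percent):
    """Smallest n so that CV / sqrt(n) of the mean does not exceed the target.

    Assumes independent replicates with purely random scatter;
    it does not help against systematic bias.
    """
    if single_cv_percent <= 0 or target_cv_percent <= 0:
        raise ValueError("CVs must be positive")
    return math.ceil((single_cv_percent / target_cv_percent) ** 2)


for cv in (10.0, 20.0, 30.0):
    n = replicates_needed(cv, 10.0)
    print(f"single CV {cv:.0f}%: {n} replicate(s) for a 10% CV of the mean")
```

The required number grows with the square of the scatter: a method with a 30% CV needs nine replicates to match what a 10% CV method delivers with a single measurement.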
