Diagnostics Development and Validation

So you have conducted a clinical study and used the data to train a machine learning classifier, perhaps embedded in a device, to predict the presence or absence of a disease, or of a risk factor for a disease, in future subjects. Are you now planning a validation (confirmatory) clinical trial for registrational purposes?

In our 10+ years of experience in this area, we have identified the major reasons why a validation trial fails to show the desired level of accuracy in predictions for new subjects. They are as follows:

Inadequate training: This is related to the training set sample size and to how much of the real-world heterogeneity the training data capture. Training of machine learning models for diagnostic purposes is often constrained by time and resources. There is also no straightforward way to estimate the required sample size without making assumptions that may be unrealistic, such as assumptions on prevalence or incidence, model complexity, and bias and variance. This makes it hard to ensure that adequate training has been achieved before starting the validation trial. Cross-validation often leads to over-estimated accuracy measures, which may be compounded by overfitting to the training data.
Population heterogeneity: Often, training data are collected from a small number of local sites, while the validation trial needs to be conducted across different geographical locations to establish generalizability and for registrational purposes. This increases the risk of failure in the validation trial - for example, if the feature distributions in the validation set show much larger heterogeneity than in the training set.
Inadequate testing before validation: Constraints such as time, resources and availability of a holdout set may lead to inadequate testing for robustness, generalizability and reproducibility. For classifiers that output a probability or score, this may also lead to sub-optimal threshold selection, which in turn may result in poor accuracy on the validation data.
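
To make the threshold-selection point concrete, here is a minimal sketch of choosing a cut-off for a score classifier on a holdout set. It assumes Youden's J (sensitivity + specificity - 1) as the criterion, which is only one of several possible choices. The function name `youden_threshold` and the toy data are hypothetical and are not taken from any specific project.

```python
def youden_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    scores: predicted probabilities or scores (higher = more likely diseased)
    labels: 1 for disease present, 0 for disease absent
    A subject is called positive when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative subjects")
    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:  # ties keep the lowest threshold
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Toy holdout data (illustrative only, not real study data)
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, sens, spec = youden_threshold(scores, labels)
print(threshold, sens, spec)
```

With a holdout set this small, the selected threshold is itself highly variable, which is one way sub-optimal thresholds carry over into the validation data.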

To mitigate these risks, we propose an adaptive approach to augmented training + validation, and we have supported several sponsors in applying it. An example of such an approach is depicted in the figure below:

An example of an adaptive augmented-training + validation trial design

flowchart LR
  A["Start Pilot-Validation<br/>(say with N=50)<br/>- Check Acc.<br/>- Th. Opt."] --> B{"Th. Opt. &<br/>Acc. Adequate?"}
  B -->|YES| C["Freeze Algorithm and threshold<br/>Re-calc. validation sample size<br/>based on final CV acc. from<br/>augmented training"]
  B -->|NO| D["Add pilot to<br/>training and<br/>re-train"]
  C --> H["Start Validation trial<br/>with pre-planned interim<br/>looks for early stopping<br/>and/or sample size<br/>re-estimation"]
  D --> E["Next Data Batch<br/>(say with N=20)<br/>- Check Acc.<br/>- Th. opt."]
  E -.-> F{"Th. Opt. &<br/>Acc. Adequate?"}
  F -->|YES| C
  F -.->|NO| G["Add last<br/>data batch<br/>to training &<br/>re-train"]
  subgraph AG["Augmented Training"]
    G -.-> E
    D
    E
    F
    G
  end
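
The "Re-calc. validation sample size" step in the figure could be sketched as follows. This is an illustration under stated assumptions, not the method used in any particular trial: it assumes a one-sided, one-sample test of a proportion against a fixed performance goal, using the normal approximation. The function name `validation_sample_size` and the numeric inputs are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def validation_sample_size(goal, expected, alpha=0.025, power=0.80):
    """Subjects needed to show that accuracy exceeds `goal`.

    One-sided, one-sample test of a proportion (normal approximation).
    goal:     performance goal to beat (null value), e.g. 0.80
    expected: accuracy anticipated in the trial, e.g. the final CV estimate
    """
    if not 0 < goal < expected < 1:
        raise ValueError("require 0 < goal < expected < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(goal * (1 - goal))
                 + z_power * sqrt(expected * (1 - expected)))
    return ceil((numerator / (expected - goal)) ** 2)


# Assumed inputs: goal of 0.80, CV accuracy of 0.90 after augmented training
n_optimistic = validation_sample_size(goal=0.80, expected=0.90)
# Same goal, with the CV estimate discounted to 0.87 to allow for optimism
n_discounted = validation_sample_size(goal=0.80, expected=0.87)
print(n_optimistic, n_discounted)
```

Because cross-validated accuracy tends to be optimistic, a discounted value of `expected` gives a larger and more conservative sample size. If the endpoint is sensitivity or specificity, the result counts diseased or non-diseased subjects only, so total enrollment also depends on prevalence.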

The above adaptive strategy can also be generalized into a seamless training + validation trial while following a learning curve (accuracy vs. training set size) to decide when to start the validation phase.
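
One way to read "following a learning curve" is as a simple decision rule applied after each training batch. The sketch below assumes a hypothetical rule: start the validation phase once the cross-validated accuracy has reached a target and has gained less than `min_gain` over the last `window` batches. The rule, the parameter values and the example curve are illustrative assumptions; in practice such a rule would need to be pre-specified.

```python
def ready_for_validation(curve, target_acc, min_gain=0.01, window=2):
    """Decide whether the learning curve has plateaued at an adequate level.

    curve: list of (training_set_size, cv_accuracy), in increasing size order
    Returns True when the latest accuracy meets `target_acc` and improved by
    less than `min_gain` compared with the point `window` batches earlier.
    """
    if len(curve) < window + 1:
        return False  # not enough points to judge a plateau
    latest = curve[-1][1]
    earlier = curve[-1 - window][1]
    plateaued = (latest - earlier) < min_gain
    return latest >= target_acc and plateaued


# Illustrative learning curve (made-up numbers, not study results)
curve = [(50, 0.78), (70, 0.84), (90, 0.88), (110, 0.885), (130, 0.887)]
still_learning = ready_for_validation(curve[:3], target_acc=0.85)
plateau_reached = ready_for_validation(curve, target_acc=0.85)
print(still_learning, plateau_reached)
```

The first call returns False because accuracy is still rising steeply at 90 subjects. The second returns True because the gain between 90 and 130 subjects is below the tolerance.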

Our statisticians and data scientists have experience both in setting up efficient machine learning workflows and in designing risk-mitigated validation studies. We can also support the regulatory approval of such adaptive study designs and, more generally, all statistical aspects of the training and validation of machine learning classifiers.
