Diagnostics Development and Validation

Seamless approaches for development and validation of diagnostic devices using machine learning can increase the chance of market authorization.

Have you conducted a clinical study and used the data to train a machine learning classifier, perhaps embedded in a device, that predicts the presence or absence of a disease (or of a risk factor for a disease) in future subjects? And are you now planning a confirmatory validation trial for registrational purposes?

Drawing on more than 10 years of experience in this area, we see the following as the major risks that a validation trial fails to show the desired level of predictive accuracy in new subjects:

To mitigate these risks, we propose an adaptive approach to augmented training and validation, and we have supported several sponsors in implementing it. The figure below shows an example:

An example of an adaptive augmented-training + validation trial design

flowchart LR
  A["<b>Start Pilot-Validation</b>
            <b>(say with N=50)</b>
         - Check Acc.
         - Th. Opt."] --> B(["Th. Opt. &
                             Acc. Adequate?"])
  B -->|YES| C["- Freeze Algorithm and threshold
                    - Re-calc. validation sample size
                      based on final CV acc. from
                      augmented training"]
  B -->|NO| D["Add pilot to
                   training and
                   <b>re-train</b>"]
  C --> H["<b>Start Validation trial</b>
                 with pre-planned interim
                 looks for early stopping
                 and/or sample size
                 re-estimation"]
  D --> E["Next Data Batch
                <b>(say with N=20)</b>
               - Check Acc.
               - Th. Opt."]
  E -.-> F(["Th. Opt. &
                Acc. Adequate?"])
  F -->|YES| C
  F -.->|NO| G["Add last 
                   data batch
                   to training &
                   <b>re-train</b>"]
    subgraph AG["<b>Augmented Training</b>"]
       G -.-> E
       D
       E
       F
       G
       end   

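The loop in the figure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design used in any particular trial: the classifier is reduced to a single score with a cutoff, "re-training" is simply re-choosing that cutoff, and the adequacy rule (batch accuracy of at least `target_acc`, and a batch-optimal cutoff within `max_shift` of the trained one) is one hypothetical reading of "Th. Opt. & Acc. Adequate?". All function and parameter names are invented for the example.

```python
import random


def accuracy(data, threshold):
    """Fraction of (score, label) pairs classified correctly at this cutoff."""
    return sum((score >= threshold) == bool(label) for score, label in data) / len(data)


def fit_threshold(data):
    """Toy stand-in for training: pick the observed score that maximizes accuracy."""
    candidates = sorted({score for score, _ in data})
    return max(candidates, key=lambda t: accuracy(data, t))


def adaptive_augmented_training(training, next_batch, target_acc=0.80, max_shift=0.25,
                                pilot_n=50, batch_n=20, max_looks=10):
    """Pilot-validate, then re-train on each inadequate batch until the model can be frozen."""
    training = list(training)
    threshold = fit_threshold(training)
    history = []
    for look in range(max_looks):
        # First look is the pilot (say N=50); later looks use smaller batches (say N=20).
        batch = next_batch(pilot_n if look == 0 else batch_n)
        acc = accuracy(batch, threshold)               # "Check Acc."
        shift = abs(fit_threshold(batch) - threshold)  # "Th. Opt." (assumed criterion)
        history.append((len(batch), acc, shift))
        if acc >= target_acc and shift <= max_shift:
            # Adequate: freeze the algorithm and threshold, move on to validation.
            return {"frozen": True, "threshold": threshold,
                    "n_training": len(training), "history": history}
        training += batch                              # add the batch to training
        threshold = fit_threshold(training)            # re-train
    return {"frozen": False, "threshold": threshold,
            "n_training": len(training), "history": history}


def simulated_source(rng, separation=2.0):
    """Hypothetical data source: diseased subjects (label 1) score higher on average."""
    def next_batch(n):
        labels = [rng.randint(0, 1) for _ in range(n)]
        return [(rng.gauss(separation * y, 1.0), y) for y in labels]
    return next_batch


if __name__ == "__main__":
    source = simulated_source(random.Random(1))
    result = adaptive_augmented_training(source(30), source)
    print(result["frozen"], result["n_training"], len(result["history"]))
```

Each batch is used for assessment before it is added to training, so the accuracy recorded at every look is an out-of-sample estimate for the model as it stood at that point.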

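The "Re-calc. validation sample size" step in the figure can be illustrated with a standard normal-approximation formula for a one-sided, one-sample test of accuracy against a performance goal. This is a sketch under assumptions: the figure does not state which test is used, the performance goal, significance level and power below are hypothetical defaults, and an exact binomial calculation may give a somewhat different number.

```python
from math import ceil, sqrt
from statistics import NormalDist


def validation_sample_size(expected_acc, performance_goal, alpha=0.025, power=0.90):
    """Subjects needed to show accuracy > performance_goal (one-sided test).

    expected_acc:     accuracy anticipated in validation, e.g. the final
                      cross-validated accuracy from augmented training.
    performance_goal: accuracy under the null hypothesis (assumed value).
    """
    if not 0 < performance_goal < expected_acc < 1:
        raise ValueError("need 0 < performance_goal < expected_acc < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    p0, p1 = performance_goal, expected_acc
    # Null and alternative standard deviations, each weighted by its quantile.
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


if __name__ == "__main__":
    for acc in (0.82, 0.85, 0.90):
        print(acc, validation_sample_size(acc, performance_goal=0.75))
```

The required sample size grows quickly as the expected accuracy approaches the performance goal, which is why a more reliable accuracy estimate from augmented training can materially change the size of the validation trial.
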
This adaptive strategy can also be generalized into a seamless training + validation trial, in which a learning curve (accuracy vs. training set size) is monitored to decide when to start the validation phase.
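
One way to turn a learning curve into a decision is a plateau rule: start validation once recent additions to the training set no longer change accuracy by much and the accuracy is high enough. The sketch below assumes that rule. The window length, minimum gain and target accuracy are hypothetical tuning values, and the curve data are made up for illustration.

```python
def ready_for_validation(curve, target_acc=0.80, min_gain=0.01, window=3):
    """Decide whether the learning curve has plateaued at an adequate level.

    curve: list of (n_training, cv_accuracy) pairs, ordered by n_training.
    Returns True when each of the last `window` steps changed accuracy by
    less than `min_gain` and the latest accuracy is at least `target_acc`.
    """
    if len(curve) < window + 1:
        return False  # not enough points to judge a plateau
    recent = [acc for _, acc in curve[-(window + 1):]]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    plateau = all(abs(gain) < min_gain for gain in gains)
    return plateau and recent[-1] >= target_acc


def first_ready_size(curve, **rule):
    """Smallest training set size at which the rule would have triggered, or None."""
    for end in range(1, len(curve) + 1):
        if ready_for_validation(curve[:end], **rule):
            return curve[end - 1][0]
    return None


if __name__ == "__main__":
    # Made-up learning curve: (training set size, cross-validated accuracy).
    curve = [(50, 0.70), (70, 0.76), (90, 0.80), (110, 0.82),
             (130, 0.825), (150, 0.828), (170, 0.83)]
    print(first_ready_size(curve))
```

In a registrational setting, a rule of this kind would normally be fixed in advance, so that the switch from training to validation cannot be influenced by the validation data.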

Our statisticians and data scientists have experience both in setting up efficient machine learning workflows and in designing risk-mitigated validation studies. We can also support you in obtaining regulatory approval of such adaptive study designs and, more generally, with all statistical aspects of training and validating machine learning classifiers.