
Machine Learning for Medical Devices: Best Practices Guideline


Quality Assurance (QA) is an essential process in Life Sciences organizations of any size. It ensures that products, services, and processes meet the applicable standards and regulations and satisfy the needs and expectations of customers and patients.

Machine learning for medical devices requires unique considerations due to its complexity and the iterative, data-driven nature of its development. For this reason, the US Food and Drug Administration (FDA), Health Canada, and the UK's Medicines and Healthcare products Regulatory Agency (MHRA) joined forces to develop Good Machine Learning Practice (GMLP). GMLP sets out ten guiding principles to promote safe, effective, and high-quality medical devices that use artificial intelligence and machine learning (AI/ML). This article discusses the ten guiding principles and the regulatory thought process behind them.


10 Guiding Principles for Machine Learning in Life Sciences

The International Medical Device Regulators Forum (IMDRF), international standards organizations, and other collaborative bodies may work together in the future to advance GMLP. With that in mind, let's go through each guiding principle to understand the areas of concern to the US, Canadian, and UK regulatory agencies.






1. Leverage Multi-Disciplinary Expertise throughout Product Life Cycle

According to GMLP, multi-disciplinary expertise is crucial because the end user is likely to be a patient. For example, the AI/ML engineer, cloud engineer, and other engineering experts involved in medical device development may be new to various clinical concepts. Therefore, they should leverage the expertise of life science researchers to understand the clinical benefits of the medical device and the potential risks to patient safety. Similarly, life science researchers may be new to the concepts used in machine learning algorithms, so they in turn should leverage the expertise of the AI/ML engineer, cloud engineer, and other engineers to maximize the benefit to patient health and safety.



2. Implement Good Software and Engineering Practice

For healthcare and life sciences applications, the data integrity, security, and privacy of the ML environment are paramount. It is strongly recommended that you protect your environment against unauthorized access, privilege escalation, and data exfiltration. This can be addressed by working with your cloud platform vendors to understand their architecture and service plans, and selecting the plan that meets your security and authentication requirements.

In the context of data security and integrity, GMLP expects medical device companies to implement good software engineering practices, data quality assurance, data management, and robust cybersecurity practices. These include methodical risk management and design processes that can appropriately capture and communicate design, implementation, and risk management decisions and rationale, as well as ensure data authenticity and integrity. These practices may be binding requirements under legislation such as 21 CFR Part 11, EU Annex 11, and the GDPR, or non-binding requirements under guidelines such as GMLP, GAMP 5 (second edition), and ISO/TR 24291:2021. In either case, implementing these practices protects the best interests of the manufacturer as well as the user.


3. Use Clinical Study Participants and Data Sets as Representatives of the Intended Patient Population

Clinical study outcomes vary across populations of different races, ethnicities, ages, and genders due to genetic variation. It is therefore important to use data sets whose sample variability and sample size are truly representative of the end-user patient population. Sample variability and size should also be adequately and separately addressed for the training and test data sets, so that results generalize to the population of interest. Applying sound statistical principles can help create data sets that are representative of the intended patient population.
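As a minimal, illustrative sketch (not part of the guideline), the idea of preserving subgroup proportions in both splits can be expressed as a stratified split. The record structure and the "age_group" attribute here are hypothetical:

```python
import random
from collections import defaultdict

def stratified_split(records, key, test_fraction=0.2, seed=42):
    """Split records so that each stratum (e.g. an age group) keeps
    roughly the same proportion in the training and test sets."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[r[key]].append(r)
    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Hypothetical patient records keyed by an "age_group" attribute
records = [{"id": i, "age_group": "18-40" if i % 2 else "65+"} for i in range(100)]
train, test = stratified_split(records, key="age_group")
```

In practice, dedicated library routines (and, more importantly, the study's statistical analysis plan) would drive this, but the principle is the same: every clinically relevant subgroup should be represented in both sets.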


4. Ensure Training Data Sets Are Independent of Test Sets

For an unbiased assessment of ML model performance, it is critical that the training and test data are independent of each other. Be sure to eliminate potential sources of dependence, such as the same patient, method of data acquisition, or site of data acquisition appearing in both sets.
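One common source of dependence is the same patient contributing records to both sets. A hedged sketch of splitting at the patient level rather than the record level (the "patient_id" and "scan" fields are hypothetical):

```python
import random

def split_by_patient(records, test_fraction=0.2, seed=0):
    """Assign whole patients to one split or the other, so no
    patient's data leaks between training and test sets."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test

# Hypothetical data set: several scans per patient
records = [{"patient_id": p, "scan": s} for p in range(20) for s in range(3)]
train, test = split_by_patient(records)
```

The same grouping idea applies to acquisition site or device: split on the grouping variable, not on individual records.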


5. Base Selected Reference Data Sets on Best Available Methods

Correlation does not always imply causation. The possibility of a strong correlation between clinically irrelevant data and a particular clinical output therefore cannot be ruled out. To avoid this, ensure that the reference data set contains well-characterized, clinically relevant data for the medical device's end use.

Additionally, it is important to understand the limitations of the reference data set. For example, if it contains clinical data from a specific age group, the limits on extrapolating the clinical output to other age groups must be comprehensively understood. In this context, the GMLP guideline recommends promoting and demonstrating model robustness and generalizability across the intended patient population.


6. Tailor Model Design to Available Data and Ensure It Reflects the Intended Use of the Device

Ensure that the chosen model design is fit for analysis of the available data and that it actively mitigates known risks such as overfitting, performance degradation, and security vulnerabilities. You should also ensure that the clinical benefits and risks of the product are well understood and that the model supports clinically meaningful performance testing. The model's performance should demonstrate that the product can safely and effectively achieve its intended use.

The model design should be robust to both global and local performance, and to uncertainty and variability in the device inputs, outputs, intended patient populations, and clinical use conditions.
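One widely used mitigation for overfitting is early stopping: halt training once performance on held-out validation data stops improving. A minimal, framework-agnostic sketch of the stopping rule (the per-epoch validation losses below are invented for illustration):

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch at which to stop training: when validation
    loss has not improved for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # validation loss degrading: likely overfitting
    return len(val_losses) - 1

# Hypothetical validation losses: improve, then degrade as the model overfits
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.61, 0.65]
stop = early_stopping_epoch(losses)
```

In a regulated setting, the choice of patience, validation data, and stopping criterion would itself be documented and justified as part of the design controls.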


7. Focus on the Performance of the Human-AI Team

If a human is in the loop when interpreting the model output, you should also consider possible variability in human interpretation. Model performance should be assessed for the human and the AI collectively, as a team, rather than for the AI model in isolation.


8. Use Testing to Demonstrate Device Performance During Clinically Relevant Conditions

Test plans should be developed and executed on the basis of sound statistical concepts, and the testing should assess the model's performance in terms of clinical relevance. The testing should be independent of the training data. Test performance should be evaluated across intended variability in measurement inputs, such as patient population, important subgroups, and clinical environment, and across potential confounding factors such as use by the human-AI team.
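Evaluating important subgroups means reporting metrics per subgroup rather than only in aggregate. A small illustrative sketch computing per-subgroup sensitivity (the record fields and subgroup labels are hypothetical):

```python
from collections import defaultdict

def sensitivity_by_subgroup(results):
    """Per-subgroup sensitivity (true-positive rate) from
    labelled predictions on an independent test set."""
    tp = defaultdict(int)  # true positives per subgroup
    fn = defaultdict(int)  # false negatives per subgroup
    for r in results:
        if r["label"] == 1:  # condition actually present
            if r["prediction"] == 1:
                tp[r["subgroup"]] += 1
            else:
                fn[r["subgroup"]] += 1
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(tp) | set(fn)}

# Hypothetical test results for two subgroups
results = [
    {"subgroup": "A", "label": 1, "prediction": 1},
    {"subgroup": "A", "label": 1, "prediction": 0},
    {"subgroup": "B", "label": 1, "prediction": 1},
    {"subgroup": "B", "label": 1, "prediction": 1},
]
rates = sensitivity_by_subgroup(results)
```

A gap between aggregate and subgroup performance (here, subgroup A at 0.5 versus B at 1.0) is exactly the kind of finding such testing is meant to surface.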


9. Provide Users with Clear, Essential Information

Clear and contextually relevant user information should be provided to the intended user (such as health care providers or patients). The information may include the following details:

    • The product's intended use and indications for use
    • Performance of the model for appropriate subgroups
    • Characteristics of the data used to train and test the model
    • Acceptable inputs
    • Known limitations
    • User interface interpretation
    • Clinical workflow integration of the model

In addition to the above aspects, it is also important to make users aware of device modifications, model updates from real-world performance monitoring, the basis for decision-making when available, and a means to communicate product concerns to the developer.


10. Monitor Deployed Models for Performance and Manage Re-training Risks

The safety and performance of deployed models should be improved periodically or continually by monitoring real-world use. Additionally, when models are re-trained after deployment, ensure appropriate controls are in place to manage risks of overfitting, unintended bias, or model degradation (for example, data set drift) that may impact safety and performance as the model is used by the human-AI team.
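Data set drift can be flagged by comparing the distribution of a feature in production against its training-time baseline. One common statistic is the Population Stability Index (PSI); a minimal sketch follows, with invented baseline and "live" samples for illustration (the rule of thumb that PSI above roughly 0.2 indicates notable drift is a convention, not a GMLP requirement):

```python
import math

def population_stability_index(expected, actual, bins=5):
    """PSI between two samples of a numeric feature.
    Values above ~0.2 are commonly treated as notable drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0
    def histogram(xs):
        counts = [0] * bins
        for x in xs:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        # Floor each proportion to avoid log(0) for empty bins
        return [max(c / len(xs), 1e-6) for c in counts]
    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [0.1 * i for i in range(100)]        # training-time distribution
live = [0.1 * i + 3.0 for i in range(100)]      # shifted production data
psi = population_stability_index(baseline, live)
```

In a deployed device, a drift alarm like this would feed into the quality system's change-control process rather than trigger automatic re-training.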


Final thoughts

As stated in GMLP, these best-practice guiding principles equip medical device manufacturers to anticipate risks to the safety and effectiveness of the model and to prepare contingency plans that avoid them proactively. The guiding principles also serve as mistake-proofing tools for developing and deploying machine learning models for medical devices. GMLP further expects manufacturers to go beyond these ten principles: to explore proven practices in other sectors, tailor them for use in medical technology and healthcare, and, where required, create new practices specific to the sector.



