FDA Would Like to Join You in the Sandbox When Developing AI-Enabled Devices


In the past three months, FDA has released two guidance documents related to artificial intelligence (AI)-enabled medical devices: (1) a final guidance titled Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions (PCCP Guidance, which we blogged about here), issued in December 2024; and (2) a draft guidance titled Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations (Draft AI Guidance, which we blogged about here), issued in February 2025.  Both guidance documents recommend data management practices for collecting data used to develop, tune, and test an AI model and to make changes to that model.  Data management practices encompass data collection, processing, storage, annotation, control, and use, and are an important means of identifying and mitigating bias in AI models, thereby helping ensure the integrity of the health data these models output.

We were struck by the level of detail expected by FDA for processes related to data management, especially for data collected and used early in development to train an initial AI model, which may occur before a manufacturer decides to move forward with device development under design controls.

While there is a natural tendency to speak of research and development (R&D) as a single activity, in practice there is often a line between the initial research performed in the “sandbox” to establish technological feasibility and the development work needed to bring the technology through testing, manufacturing, and market entry.  The former may not follow a rigorous and controlled process, but if technology developed in the sandbox shows promise, it moves from research to development, where a formal design controls process is followed to establish requirements, specifications, and processes for manufacturing and/or maintenance, and to conduct verification and validation testing.

For non-AI-enabled devices, early feasibility research may not directly affect the development process: the final, finished device can be fully developed, transferred to a controlled manufacturing environment, and tested under a design controls process.  For software incorporating AI models, however, FDA notes that the performance and behavior of AI systems rely heavily on the quality, diversity, and quantity of the data used to train and tune them.  As a result, FDA expects developers to have data management controls in place even before they know whether the technology will ever leave the sandbox.

Here, we describe the recommendations in FDA’s guidance documents for collecting and processing the data that will be used to train, tune, and test AI models, and what to include in a marketing submission for AI-enabled software.  Before training begins, a Data Collection Protocol (DCP) may be developed; a DCP is also specifically recommended as a section of each modification protocol within a PCCP.

The DCP should describe how data will be collected, including the inclusion and exclusion criteria for data.  The inclusion criteria may include elements such as, but not limited to, the patient’s age, weight, height, race, ethnicity, sex, and disease severity, consistent with the intended patient population for the final product, which may not be known in the early days in the sandbox.  Although bias may be difficult to eliminate completely, FDA recommends that manufacturers, as a starting point, ensure that the test data sufficiently represent the intended use (target) population of the medical device.  FDA notes that the use of data collected outside the U.S. (OUS) is another potential confounding factor to be considered in data collection: OUS data may introduce bias if the OUS population “does not reflect the U.S. population due to differences in demographics, practice of medicine, or standard of care.”  The DCP may also define the sources of the data (e.g., inpatient hospital, outpatient clinic), the date range for the data, and the location of the data collection sites (e.g., different geographical locations), along with any acquisition conditions (e.g., the data acquisition device).  The DCP should define whether data will be collected prospectively or retrospectively, and whether data will be sequentially acquired or randomly sampled.  Some disease conditions are less prevalent than others, and the DCP should describe any enrichment strategies used to ensure those subgroups are represented.  Where applicable, the DCP should follow the regulations governing human subject protections.  In the context of a PCCP, the DCP should also address when new data should be acquired and/or older data removed to ensure the datasets remain current with respect to acquisition technologies, clinical practices, changes in the patient population, and disease management.
A robust DCP can help ensure that the data used to train AI models are unbiased and representative, which promotes generalizability to the intended use population and avoids perpetuating biases or idiosyncrasies from the data itself.
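
To make the representativeness idea concrete, here is a minimal sketch of the kind of automated check a manufacturer might run against a DCP’s target population definition.  The field names, target proportions, and tolerance threshold are our own illustrative assumptions, not FDA recommendations.

```python
from collections import Counter

def subgroup_proportions(records, key):
    """Fraction of records in each subgroup for a given attribute."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def representativeness_gaps(records, key, target_mix, tolerance=0.05):
    """Return subgroups whose observed share deviates from the target
    population share (from the DCP) by more than `tolerance`."""
    observed = subgroup_proportions(records, key)
    gaps = {}
    for group, target_share in target_mix.items():
        observed_share = observed.get(group, 0.0)
        if abs(observed_share - target_share) > tolerance:
            gaps[group] = (observed_share, target_share)
    return gaps

# Example: a dataset that under-represents one sex relative to a
# hypothetical 50/50 target mix.
data = [{"sex": "F"}] * 30 + [{"sex": "M"}] * 70
gaps = representativeness_gaps(data, "sex", {"F": 0.5, "M": 0.5})
```

A check like this does not eliminate bias, but it flags subgroups whose observed share drifts from the DCP’s stated target so the deviation can be investigated and documented.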

The manufacturer should have defined processes in place to assess the quality of the data collected under the DCP, including processes to ensure data consistency, completeness, authenticity, transparency, and integrity.  If data are excluded because of data quality issues, the rationale and criteria for the exclusions should be documented in the DCP.  This is important because FDA will expect the data used for training to be representative of the type of data that could be encountered in clinical practice with the final product.  In addition, the manufacturer should define whether the process for checking data quality is manual or automated.  The DCP should address how missing data elements will be handled (e.g., if an image was obtained but patient demographic information is not available), when missing data are acceptable, and when data quality issues warrant an investigation before proceeding.
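
An automated completeness check of the kind described above might look like the following sketch.  The required field names are hypothetical; in practice they would come from the DCP, and any records excluded on this basis would need a documented rationale.

```python
REQUIRED_FIELDS = ["image_id", "age", "sex", "site"]

def completeness_report(records, required=REQUIRED_FIELDS):
    """Split records into complete vs. incomplete, recording which
    required fields are missing from each incomplete record."""
    complete, incomplete = [], []
    for r in records:
        missing = [f for f in required if r.get(f) in (None, "")]
        if missing:
            incomplete.append((r, missing))
        else:
            complete.append((r, missing))
    return complete, incomplete

records = [
    {"image_id": "img-001", "age": 63, "sex": "F", "site": "A"},
    {"image_id": "img-002", "age": None, "sex": "M", "site": "B"},  # missing age
]
complete, incomplete = completeness_report(records)
```

The output gives the manufacturer a record-by-record basis for deciding whether a missing element is acceptable or warrants investigation before the data are used.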

If the data collected will be annotated (e.g., adding labels or tags to raw data), as is done in semi-supervised or supervised machine learning, the annotation process and credentials of the annotators should be documented.

Another important element of the DCP is defining which data will be used for training, tuning, and testing the AI model, the independence of those datasets (e.g., sampled from completely different clinical sites), and any data cleaning or processing performed on the training or tuning data.  The manufacturer should have processes in place that define how the datasets will be stored and who will have access to each dataset, including controls to prevent unauthorized access and manipulation of the data.  The test data should be sequestered, not cleaned, and not used for development of the AI model, with a process in place to prevent unauthorized access.
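
One simple way to enforce the independence described above is to assign entire clinical sites, rather than individual records, to the training, tuning, or test datasets.  The deterministic hashing scheme below is our own illustration of that idea (the split fractions are assumptions), not a method from the guidance.

```python
import hashlib

def split_for_site(site_id, train=0.6, tune=0.2):
    """Deterministically map a clinical site to a dataset split by
    hashing its ID, so the assignment never changes between runs and
    records from one site never span splits."""
    h = int(hashlib.sha256(site_id.encode()).hexdigest(), 16) % 100 / 100
    if h < train:
        return "train"
    if h < train + tune:
        return "tune"
    return "test"

def partition(records):
    """Assign each record to a split based solely on its site."""
    splits = {"train": [], "tune": [], "test": []}
    for r in records:
        splits[split_for_site(r["site"])].append(r)
    return splits

records = [
    {"id": 1, "site": "X"},
    {"id": 2, "site": "X"},
    {"id": 3, "site": "Y"},
]
splits = partition(records)
```

Because the assignment depends only on the site ID, the test partition can be sequestered once and never revisited during model development, consistent with FDA’s recommendation that test data remain untouched.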

In order to evaluate the AI model’s output, manufacturers may need to establish a reference standard.  A reference standard is the “most suitable standard to define the true condition for each patient/case/record.”  A reference standard may be used during “training, tuning, testing or all three.”  When using a reference standard, the manufacturer should define how it will be determined and the uncertainty associated with that method.  For example, if clinical interpretation is the reference standard, the manufacturer should define the qualifications of the clinicians performing the interpretation, the number of clinicians, the data provided to them, and how their results will be combined and/or adjudicated.
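
As a concrete illustration of combining and adjudicating clinician reads, the sketch below applies one common rule: majority vote, with ties referred to a designated adjudicator.  The guidance asks manufacturers to define this process; the specific rule here is an assumption for illustration, not FDA’s recommendation.

```python
from collections import Counter

def combine_reads(reads, adjudicated=None):
    """Derive a single reference-standard label from multiple clinician
    reads by majority vote; a tie falls back to the adjudicated read."""
    counts = Counter(reads).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        if adjudicated is None:
            raise ValueError("tied reads require an adjudicated read")
        return adjudicated
    return counts[0][0]
```

Whatever rule is chosen, documenting it (and the residual uncertainty it leaves) is part of defining how the reference standard is determined.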

Because data used in the research sandbox can affect the final AI-enabled medical device, developing robust data management practices in the early stages of AI model development is important to avoid problems and costly rework later in development.  Doing so will help ensure a more generalizable model and a more seamless transition from the research sandbox into design controls and, ultimately, a future market authorization.
