Cancer Dataset


Request the Cancer Patient Dataset

The de-identified dataset of UF Health patients is easily accessible for rapid research. Click below and complete the form to request access. Once requested, an IDR research team member will follow up.

The Dataset

To expedite clinical data delivery, UF Health’s data experts are introducing ready-for-use, UF Institutional Review Board-approved patient record registries to help faculty and staff advance medical knowledge and the delivery of care. The cancer patient dataset has just been released, featuring details about more than 300,000 patients diagnosed with or suspected of having cancer at UF Health since Jan. 1, 2012. It is available for use by anyone within the UF and UF Health community.

The regularly updated, Institutional Review Board-approved patient data are readily accessible to researchers throughout UF and UF Health. Patients’ protected health information has been de-identified, and the dataset can be delivered quickly, bypassing the need for study-specific approval.

The project is managed by faculty and staff in the UF Health Integrated Data Repository, or IDR, program, part of the Clinical and Translational Science Institute, and UF Health Information Technology Services. 

The dataset is available by request, with a turnaround of 1-2 business days. It includes each patient’s health history, demographics, vitals, diagnoses, medications, laboratory results, hospital utilization and more.

The data are delivered in Observational Medical Outcomes Partnership, or OMOP, Common Data Model format. OMOP is a data standard used worldwide, making it easy for informaticists and data analysts on UF research teams to organize and analyze data to answer their unique research questions. The dataset is also delivered with a detailed data dictionary describing each data element available for analysis. The IDR team intends this initiative to pave the way for greater efficiency in the future through collaboration with a community of users and plans for future standardized IDR research datasets. They are committed to helping research teams become adept in using common data formats, like OMOP, to work together to rapidly answer translational research questions using real-world data.
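For analysts new to OMOP, the following is a minimal sketch of the kind of query the format supports. It assumes only the standard OMOP table and column names (person, condition_occurrence, person_id, condition_concept_id); the rows are synthetic, the condition concept IDs 1001 and 1002 are hypothetical placeholders, and nothing here reflects the actual contents or delivery mechanism of the UF Health dataset.

```python
import sqlite3

# Synthetic rows for illustration only; not drawn from the UF Health dataset.
# Table and column names follow the OMOP Common Data Model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
-- 8532 and 8507 are the standard OMOP gender concepts (female, male).
INSERT INTO person VALUES (1, 8532, 1950), (2, 8507, 1962), (3, 8532, 1971);
-- 1001 and 1002 are hypothetical placeholder condition concept IDs.
INSERT INTO condition_occurrence VALUES
    (10, 1, 1001, '2015-03-02'),
    (11, 1, 1002, '2016-07-19'),
    (12, 2, 1001, '2018-11-05');
""")


def count_patients_by_condition(conn):
    """Return {condition_concept_id: number of distinct patients}."""
    rows = conn.execute("""
        SELECT co.condition_concept_id, COUNT(DISTINCT p.person_id)
        FROM condition_occurrence AS co
        JOIN person AS p ON p.person_id = co.person_id
        GROUP BY co.condition_concept_id
        ORDER BY co.condition_concept_id
    """).fetchall()
    return dict(rows)


counts = count_patients_by_condition(conn)
print(counts)
```

Because every OMOP dataset shares these table and column names, a query written this way can in principle be reused across studies by changing only the concept IDs of interest.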

“Real-world data” are data relating to patient health status and/or the delivery of health care that are routinely collected from a variety of sources. These can include electronic health records, claims and billing activities, product and disease registries, patient-generated data and data from other sources that can inform on health status. For more information, visit