Secondary data are data originally collected for purposes other than the research question at hand. Sources include electronic medical records (EMRs), medical claims data, product or disease registries, and digital health technologies. Secondary data are often used in pharmacoepidemiological research to evaluate the safety and effectiveness of medicines in real-world settings.
Primary data are collected firsthand by researchers, directly from original sources, for a specific research purpose. Collection methods include experimental and observational studies, surveys, interviews, and focus groups. Unlike secondary data, which are compiled from existing sources, primary data are tailored to the needs of a research project, which helps ensure relevance and accuracy with respect to the study's objectives.
In a hybrid study design, secondary and primary data are combined to enhance the comprehensiveness, accuracy, and generalizability of study findings. This approach leverages the strengths of both data types; however, it is not free of limitations and operational challenges.
Benefits of Combining Different Types of Data
1. Enhanced Data:
- Primary Data Collection: Provides specific, tailored information relevant to the research question, ensuring high quality and relevance.
- Secondary Data Sources: Offer a larger volume of existing data, often encompassing a wider patient population and long-term outcomes, providing a broader context.
2. Improvement of the Overall Dataset:
- Combining data allows for cross-validation of findings, increasing the reliability and validity of the results. Primary data can fill in gaps or correct inaccuracies found in secondary data. Some registries might even offer the possibility to map the electronic case report form (eCRF) into the EMR system.
3. Cost and Time Efficiency:
- Using secondary data can significantly reduce the time and cost associated with data collection. This allows researchers to focus their resources on collecting additional primary data that are unavailable or incomplete in secondary sources.
4. Comprehensive Patient Profiles:
- Integrating data from multiple sources can create more complete patient profiles, capturing detailed clinical, demographic, and socioeconomic information.
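The gap-filling and cross-validation described in point 2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record layout, the field names (`weight_kg`, `smoker`), the patient identifiers, and the rule that a primary value takes precedence over a secondary one are all hypothetical and would be defined in the study protocol in practice.

```python
from __future__ import annotations

# Hypothetical per-patient records; field names and values are illustrative.
secondary = {  # e.g. extracted from an EMR
    "P001": {"weight_kg": 71.0, "smoker": None},   # smoking status missing
    "P002": {"weight_kg": 88.5, "smoker": "no"},
    "P003": {"weight_kg": 64.2, "smoker": "no"},   # no primary data collected
}
primary = {  # e.g. collected through a study eCRF
    "P001": {"weight_kg": 71.0, "smoker": "yes"},
    "P002": {"weight_kg": 84.0, "smoker": "no"},
}


def combine(primary_rec: dict, secondary_rec: dict) -> tuple[dict, list[str]]:
    """Merge the two records held for one patient.

    Assumed rule: the primary value wins when present, otherwise the
    secondary value is used. Fields where both sources hold a value and
    the values differ are returned so they can be reviewed manually.
    """
    merged, discrepancies = {}, []
    for field in sorted(set(primary_rec) | set(secondary_rec)):
        p = primary_rec.get(field)
        s = secondary_rec.get(field)
        merged[field] = p if p is not None else s
        if p is not None and s is not None and p != s:
            discrepancies.append(field)
    return merged, discrepancies


def combine_all(primary: dict, secondary: dict) -> dict:
    """Apply combine() to every patient found in either source."""
    return {
        pid: combine(primary.get(pid, {}), secondary.get(pid, {}))
        for pid in sorted(set(primary) | set(secondary))
    }


results = combine_all(primary, secondary)
for pid, (record, to_review) in results.items():
    print(pid, record, "review:", to_review)
```

In this toy data, the primary source fills the missing smoking status for P001, the secondary source alone supplies P003, and the conflicting weights for P002 are flagged rather than silently overwritten.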
Challenges and Considerations
1. Data Integration:
- Harmonization: Different types of data have varying formats, terminologies, and coding systems, requiring harmonization to ensure compatibility. See also my previous post on EMR-to-EDC conversion.
- Linkage: Accurately linking datasets at the patient level while maintaining confidentiality and adhering to data protection regulations is critical. This aspect is particularly challenging in rare disease research, where patient counts for some variables might fall below the minimum threshold required for disclosure.
2. Quality Control:
- Ensuring the quality and consistency of data across sources is essential. This includes addressing issues such as missing data, data inaccuracies, and inconsistencies.
3. Ethical and Legal Considerations:
- Compliance with ethical guidelines and legal requirements, particularly concerning patient privacy and consent, is key. Researchers must navigate regulations such as the General Data Protection Regulation (GDPR) when handling and integrating data.
4. Data Source Experience:
- Small disease registries might have been established by patient advocacy groups for different purposes, and might lack the internal resources needed to conduct complex studies.
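The harmonization, linkage, and disclosure-threshold points under Data Integration can be illustrated with a short Python sketch. Everything specific in it is an assumption made for the example: the local-to-standard code map, the shared secret used to derive a pseudonymous linkage key, the field names, and the minimum cell count of 5 are hypothetical. In a real study these would be set by the data holders and the applicable governance rules, and linkage is often performed by a trusted third party rather than by the researcher.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical map from local registry codes to a common vocabulary
# (ICD-10 codes are used here purely as an illustration).
CODE_MAP = {"DM2": "E11", "T2DM": "E11", "HTN": "I10"}
MIN_CELL_COUNT = 5                 # assumed disclosure threshold
SECRET = b"shared-study-secret"    # hypothetical key agreed between data holders


def harmonize(code: str) -> str:
    """Map a local diagnosis code to the common vocabulary, or flag it."""
    return CODE_MAP.get(code.strip().upper(), "UNMAPPED")


def pseudonymize(patient_id: str, secret: bytes) -> str:
    """Derive a keyed-hash linkage key so raw identifiers are never exchanged."""
    return hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link(primary_rows: list, secondary_rows: list) -> list:
    """Join rows on the linkage key; primary rows without a match are dropped."""
    by_key = {row["link_key"]: row for row in secondary_rows}
    linked = []
    for row in primary_rows:
        match = by_key.get(row["link_key"])
        if match is not None:
            linked.append({
                "link_key": row["link_key"],
                "diagnosis": harmonize(match["local_code"]),
                "qol_score": row["qol_score"],
            })
    return linked


def disclosable_counts(linked: list) -> dict:
    """Count patients per diagnosis, masking cells below the threshold."""
    counts = Counter(row["diagnosis"] for row in linked)
    return {dx: (n if n >= MIN_CELL_COUNT else f"<{MIN_CELL_COUNT}")
            for dx, n in counts.items()}


# Toy data: each data holder pseudonymizes its own identifiers before sharing.
local_codes = ["DM2", "T2DM", "dm2", "DM2", "T2DM", "DM2", "HTN", "HTN"]
secondary_rows = [
    {"link_key": pseudonymize(f"P{i:03d}", SECRET), "local_code": code}
    for i, code in enumerate(local_codes, start=1)
]
primary_rows = [
    {"link_key": pseudonymize(f"P{i:03d}", SECRET), "qol_score": 50 + i}
    for i in range(1, 10)  # P009 has no secondary record and will not link
]

linked = link(primary_rows, secondary_rows)
print(len(linked), disclosable_counts(linked))
```

The small hypertension group is reported as "<5" instead of its exact count, which mirrors the situation in rare disease research where some cells cannot be disclosed at all.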
Conclusion
Combining secondary and primary data can be a powerful approach in healthcare research, enabling more comprehensive, reliable, and cost-effective studies. By carefully addressing integration challenges and adhering to ethical standards, researchers can leverage the strengths of both data types to generate robust and meaningful results.