Data Curator

United States, California, South San Francisco

Rapid technological advances (e.g., next-generation sequencing technology and electronic medical records) are driving a dramatic increase in the volume and diversity of data (e.g., CRF data, samples, images, biomarker and omics data) being generated both internally and externally. These data are key to understanding disease subtypes and the drug responses of patients within specific molecular subgroups, allowing us to develop new medicines and companion diagnostic tests more effectively and efficiently.

The Data Curator conducts acquisition and disposition activities for biomedical data, which may be collected from various sources, in diverse formats, and through different modalities. The Data Curator supports the specification, acquisition, organization, checking and annotation of these data according to defined standards, ensuring that high-quality, quality-controlled (QC'd) data are cataloged and stored in a defined location or system. As a result, the data remain discoverable and usable, and are preserved over time.

Primary responsibilities:
  • Work with relevant internal staff to plan which data will be loaded and for what purpose, and define the required quality control steps.
  • Interface directly with electronic data providers, e.g., through involvement in RFPs (requests for proposal), input into SOWs (scopes of work), and definition of DTAs (data transfer agreements) and FFSs (file format specifications).
  • Manage the acquisition of electronically loaded data.
  • Understand the context underlying each data type targeted for acquisition to ensure that data quality is fit for purpose.
  • Assess loaded data for quality and conformance, perform discrepancy management and reconciliation, and take appropriate action with the data provider to resolve queries.
  • Work with appropriate internal staff to develop and implement data standards and processes that ensure data quality.
  • Track and record data. Enable controlled access to pre-processed analysis data sets (e.g., biomarker data that have already been integrated with a subset of the clinical database annotations).
  • Ensure that the nature and provenance of source data are clearly described, that analyses are reproducible, and that a list of contacts is maintained.
  • Help the organization understand the decision-making process for data access, and suggest best practices for informing data providers of how shared data are used.