Senior Biological Data Architect

About Rancho BioSciences, LLC

Rancho BioSciences is a fully remote, US-based provider of biomedical data curation and data science services for pharma and biotech, spanning drug discovery through translational research. Our teams of scientists, data engineers, and software experts deliver end-to-end solutions across data curation, management, mining, and analysis to help customers accelerate R&D. We partner long-term with blue-chip clients and emerging biotechs, bringing scientific rigor, quality, and a customer-first mindset to every engagement.

About the role

We are seeking a full-time contractor to serve as Senior Biological Data Architect and to design, harmonize, and govern complex biomedical data models on behalf of our pharmaceutical, academic, and institutional clients. The successful candidate will be an expert problem solver with deep expertise in conceptual, logical, and canonical data modeling for biomedical and scientific domains, including disease biology, genetics, translational research, and drug development. You will play a central role in client initiatives that deliver FAIR-aligned data products enabling rapid query and decision-making by R&D scientists.
We are a data curation company collaborating with some of the most renowned pharmaceutical organizations in the world. Our team of scientists, curators, computational biologists, data scientists, knowledge engineers, and solution developers is distributed across the country; we support talented people living where they choose and working collaboratively on projects that have real impact on human health.
Although the role is fully remote, candidates are expected to spend the majority of their working time overlapping with US East Coast or UK working hours.

What you'll do

Partner with scientific and technical stakeholders to elicit requirements and propose canonical data models that represent the full breadth of biomedical concepts relevant to target discovery, disease understanding, and translational research, along with the evidence and provenance that support them.
Design and lead source-to-canonical harmonization activities, covering vocabulary alignment, persistent identifier assignment, and lineage and provenance capture.
Define schemas, controlled vocabularies, identifier strategies, and ontology bindings in collaboration with knowledge engineering, curation, data engineering, and platform teams.
Design models that power data pipelines, APIs, knowledge graphs, analytical workflows, and downstream R&D query use cases.
Establish validation rules and data quality checks covering ontology term validation, range and cardinality checks, required-field enforcement, ID and label consistency, cross-field consistency, and provenance completeness.
Manage the full schema lifecycle: repository management (e.g., GitHub-based), semantic versioning, changelogs, tagged releases, data dictionaries, metadata catalogs, and downstream impact assessments.
Drive schema review, approval, and publication processes; identify modeling risks early, such as metadata gaps, ontology conflicts, source data quality issues, lineage gaps, and compatibility risks.
Lead modeling strategy spanning harmonization, pipeline validation, knowledge graphs, and FAIR data product delivery.
Translate ambiguous scientific requirements into clear, durable canonical models and make defensible, documented decisions on ontology reuse, extension, and mapping.
Design modular, reusable, future-proof models aligned with FAIR and enterprise standards, with consistent persistent identifier and provenance conventions across data assets.
Communicate strategies, trade-offs, and progress clearly to clients and internal teams.

Qualifications

Required:

PhD in Life Sciences (or equivalent demonstrated expertise) with first-hand experience working with biomedical or research data.
Strong conceptual, logical, and canonical data modeling experience for complex scientific or biomedical domains.
Hands-on experience with LinkML or an equivalent schema modeling framework, including defining classes, slots, ranges, identifiers, required fields, constraints, cardinality, descriptions, and ontology bindings.
Working knowledge of YAML-based schema authoring.
Solid grasp of FAIR principles (findability, accessibility, interoperability, reusability), including persistent identifiers, metadata standards, provenance, and schema versioning.
Experience with biomedical ontologies and controlled vocabularies, including familiarity with public ontology resources covering genes, diseases, phenotypes, anatomy, cell types, assays, units, and evidence.
Familiarity with semantic web technologies such as RDF, OWL, JSON-LD, SHACL, ShEx, and SPARQL, and with knowledge graph modeling.
Proven experience designing Entity Relationship Diagrams and Conceptual and Logical Data Models.
Experience with schema and model registries, data catalogs, metadata registries, and data dictionary management.
Proficiency in Python, R, or SQL for model conformance testing, ontology mapping, or data quality validation (notebook-based workflows a plus).
Experience with SDLC methodologies, unit and integration testing, and documentation practices.
AI awareness: comfortable evaluating how AI-driven curation and mapping tools can accelerate modeling, harmonization, and validation workflows.

Nice to Have:

Experience working with modern cloud data platforms and data lake environments such as Snowflake or Databricks.
Hands-on use of AI-powered coding assistants and established collaboration workflows that incorporate them into day-to-day modeling, documentation, or validation work.

The pay range for this role is:

70 - 90 USD per hour (United States)

Data Analytics

Remote (United States)
