Engineering | Remote / Glasgow (preferred), UK | Full-time

Data Engineer – Pathology Data Infrastructure

Build the infrastructure that fuels our foundation model training by managing terabytes of high-resolution medical imagery.

About the Role

TileBio is building a foundation model of pathology. We treat histology as a language of tissue and train models directly on raw, unlabelled whole slide images (WSIs) to learn the underlying structure of biology. Our system converts tissue into a sequence of learned tokens and trains transformer models to understand disease at scale. Data is our lifeblood, and we are looking for a Data Engineer to build the refinery.

At TileBio, "Data Engineering" isn't just about moving rows in a database; it’s about architecting the flow of terabytes of high-resolution medical imagery. You will build and maintain the infrastructure that fuels our foundation model training. You will own the lifecycle of a WSI from the moment it leaves a scanner or a partner’s server to the moment it becomes part of a versioned, reproducible training set for our AI team.

What You'll Do

  • Architect the ingestion and preprocessing of massive histopathology datasets (SVS, NDPI, iSyntax)
  • Build and refine automated, high-throughput pipelines for tiling, filtering, and metadata extraction
  • Implement rigorous dataset versioning so every training experiment is traceable and repeatable
  • Coordinate data movement between local high-performance storage, cluster environments (SLURM), and cloud buckets
  • Monitor and resolve data quality issues, ensuring our models learn from the highest-quality biological signals
  • Work closely with Deep Learning Engineers to optimise data loading and throughput for multi-GPU training

What We're Looking For

  • Strong Python skills with the ability to write clean, performant, and maintainable code
  • Hands-on experience managing large scientific, imaging, or unstructured datasets
  • Experience with storage architectures such as ZFS, object storage, or cloud-native solutions
  • Experience building and deploying automated data pipelines (e.g., Prefect, Dagster, Airflow, or custom-built)
  • Deep understanding of data versioning (e.g., DVC, LakeFS) and why it matters for ML

Nice to Have

  • Experience with WSI formats or medical imaging libraries (OpenSlide, CuCIM)
  • Experience with SLURM or other cluster-based scheduling systems
  • Familiarity with Azure or AWS environments for hybrid-cloud workflows
  • Strong systems thinking with the ability to design for "the next 100TB" today

What We Offer

  • Salary of £35,000 to £60,000 (depending on experience) and equity eligibility
  • Opportunity to build the core infrastructure that enables biological discovery at a global scale
  • A culture of optimising for impact, taking ownership, and delivering
  • An environment that challenges ideas, not people, and values speed, clarity, and rigour

To apply, email careers@tilebio.com with the subject "Data Engineer", and include your CV and a short note about why you're interested.