Data ML Engineer
Engineering
San Francisco, US
We’re building an intelligent matching engine that connects time-sensitive opportunities with relevant supply in real time. You’ll design the data systems and algorithms that power this matching layer, transforming unstructured input streams into structured, actionable insights. This is a greenfield project: you’ll own everything from data ingestion to model training and deployment, and help shape the foundation of a core system that drives automated decision-making at scale.
What you'll do
Design and implement data pipelines that ingest and process large volumes of semi-structured or unstructured data
Develop NLP-based extraction logic to transform messy text into structured, queryable data
Build robust systems for data validation, cleaning, and deduplication
Engineer and evaluate features for ranking and matching algorithms (e.g. temporal, geospatial, categorical, and preference-based)
Develop and deploy ML models that score and rank potential matches using multi-criteria optimisation
Collaborate closely with product and backend engineers to integrate these models into production workflows
Who you are
3+ years of experience in Python for data engineering, ML, or backend development
Strong foundations in feature engineering, ranking models, and ML evaluation metrics
Skilled at working with unstructured or noisy data from multiple sources
Familiar with NLP techniques, regex parsing, or similar text-processing methods
Experienced in designing end-to-end pipelines, from ingestion to model deployment
Independent, resourceful, and comfortable building production-grade systems from scratch
Bonus: experience with real-time data pipelines, multi-criteria optimisation, or supply-demand matching problems
What we offer
Competitive compensation
Ownership over technical decisions and architecture
Greenfield project: build the system your way, following best practices
Clear success metrics and a feedback loop on match quality
Opportunity to expand the role if match quality proves valuable
Professional development support for ML/data tools and conferences