Role Description
The AI Computer Vision Lead will be responsible for developing, upgrading, and owning the entire AI Vision Stack of Tara, our AI-powered cooking companion. This includes designing model architectures for different stages of the cooking process and ensuring scalability, accuracy, and real-world performance.
The role demands a deep understanding of computer vision, particularly as applied to dynamic, real-world, edge-based environments. You’ll define model architectures, guide data annotation, and set the foundation for scalable datasets and pipelines, establishing annotation standards and feedback loops that ensure high-quality training data.
In short, you’ll be tackling one of the hardest problems in frontier AI: bringing human-like visual understanding to the kitchen and redefining how people cook.
What You’ll Do
“Long story short - you’ll own everything related to the core AI Vision Stack.”
- Design and develop scalable model architectures for vision tasks in real-world cooking environments.
- Build and experiment across architectures - from CNNs to Transformers and hybrid approaches.
- Develop multi-head architectures optimized for both edge and cloud inference.
- Define compute and hardware requirements for efficient real-time performance on edge devices.
- Continuously optimize models for accuracy, latency and scalability.
- Rapidly prototype and iterate to identify best-performing model structures for specific cooking tasks.
- Implement sub-task-based training for higher modularity and accuracy in complex vision pipelines.
- Lead work on temporal action prediction to understand and anticipate cooking actions over time.
- Define best practices for annotation, data quality and dataset scalability.
- Collaborate closely with product and hardware teams to ensure seamless edge integration.
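To give a flavour of the temporal action prediction work mentioned above, here is a minimal, purely illustrative sketch. It assumes nothing about Tara’s actual models: a real system would use a sequence model (e.g. a temporal Transformer or TCN), but even a sliding-window majority vote shows why temporal context stabilises noisy per-frame vision outputs.

```python
from collections import Counter

def smooth_predictions(frame_labels, window=5):
    """Sliding-window majority vote over per-frame action labels.

    Illustrative only: demonstrates how temporal context corrects
    isolated misclassifications in a stream of frame-level outputs.
    """
    half = window // 2
    smoothed = []
    for i in range(len(frame_labels)):
        lo = max(0, i - half)
        hi = min(len(frame_labels), i + half + 1)
        counts = Counter(frame_labels[lo:hi])
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed

# A single misclassified frame ("stir") inside a run of "chop"
# is corrected by its temporal neighbours.
raw = ["chop", "chop", "stir", "chop", "chop", "pour", "pour", "pour"]
print(smooth_predictions(raw, window=3))
# → ['chop', 'chop', 'chop', 'chop', 'chop', 'pour', 'pour', 'pour']
```

In practice, a learned sequence model would also *anticipate* upcoming actions rather than just smooth past ones, which is what the role’s temporal prediction work targets.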