- Posted: 11 April 2025
- Location: Toronto
- Discipline: IT & Telecoms
- Reference: 3146285
Multimodal AI Research Engineer (PhD Required) | LLMs, VLMs & Robotics Integration | In-Office (Toronto) | $160K–$185K
Job description
The Company
Our client is a cutting-edge AI startup based in Toronto, focused on building next-generation embodied AI systems. Their mission is to create highly capable, reasoning-driven models that integrate vision, language, and action to power physical intelligence in real-world environments. They’re on a fast track to redefine what’s possible in cognitive robotics for the smart home industry.
The Role
They’re looking for a Multimodal AI Research Engineer with deep expertise in large language models (LLMs), vision-language models (VLMs), and embodied AI. This is a critical in-office role based in Toronto, where you’ll play a central part in developing the cognitive core of a physical AI agent. The ideal candidate holds a PhD in machine learning, computer science, or a related field, and brings hands-on experience building, optimizing, and integrating LLMs and VLMs into real-world systems. A GitHub or equivalent portfolio showcasing work in these areas is required.
Requirements
- PhD in Computer Science, Machine Learning, AI, or a related discipline.
- GitHub (or equivalent) portfolio showcasing LLM/VLM work is required.
- Extensive experience with LLMs and multimodal models (e.g., GPT, LLaMA/LLaVA, Gemini, Flamingo, CLIP, PaLM-E).
- Demonstrated ability to build and train large-scale models from scratch (not just fine-tune existing ones).
- Strong coding skills in Python and C++.
- Proficiency with PyTorch, TensorFlow, JAX, and Hugging Face Transformers.
- Experience in dataset creation, including annotation pipelines and synthetic data generation.
- Solid understanding of attention mechanisms, tokenization, and architecture scaling.
- Familiarity with vision, action, or robotics integrations is a major plus.
- Bonus: Experience with LangChain, LlamaIndex, RAG pipelines, vector databases (e.g., FAISS), and embodied AI systems (a minimal retrieval sketch follows this list).
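As a rough illustration of the retrieval component referenced in the bonus item above, here is a minimal, hypothetical FAISS sketch of the kind a RAG pipeline builds on. The embedding dimension, corpus, and query below are placeholders, not details of the client’s stack:

```python
import faiss
import numpy as np

dim = 384                                    # embedding size; depends on the encoder used (assumed here)
doc_embeddings = np.random.rand(10_000, dim).astype("float32")  # placeholder corpus embeddings

index = faiss.IndexFlatIP(dim)               # exact inner-product search
faiss.normalize_L2(doc_embeddings)           # normalise so inner product equals cosine similarity
index.add(doc_embeddings)

query = np.random.rand(1, dim).astype("float32")  # placeholder query embedding
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)         # top-5 most similar documents
print(ids[0], scores[0])
```

In a production pipeline, the random arrays would be replaced by embeddings from a real encoder, and an approximate index (e.g., IVF or HNSW) would typically stand in for the exact flat index at scale.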
Responsibilities
- Design and develop advanced LLMs, VLMs, and vision-language-action (VLA) models for real-world integration.
- Optimize transformer architectures for throughput, efficiency, and deployment on edge devices.
- Architect fine-tuning strategies using LoRA and adapter-based approaches (see the sketch after this list).
- Build structured multimodal datasets and oversee data curation.
- Compress and optimize models using quantization, pruning, and distillation for constrained environments (see the sketch after this list).
- Integrate AI models with perception and control stacks for embodied use cases.
- Evaluate model performance in physical and simulated environments; lead ablation studies.
- Collaborate closely with teams across robotics, perception, and embedded systems.
- Clearly communicate research findings to technical and non-technical stakeholders.
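As a hedged illustration of the LoRA responsibility above, the sketch below wraps a causal LM with low-rank adapters via Hugging Face PEFT. The model name and hyperparameters are placeholder assumptions, not the client’s configuration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; any causal LM checkpoint would work here.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_cfg = LoraConfig(
    r=8,                                   # low-rank dimension (assumed)
    lora_alpha=16,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()         # typically <1% of parameters are trainable
```

Similarly, for the compression responsibility, this is one minimal way to apply post-training dynamic quantization using PyTorch’s built-in tooling; the toy network stands in for a real model:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Quantize linear-layer weights to int8 for smaller, faster CPU inference.
qnet = torch.ao.quantization.quantize_dynamic(net, {nn.Linear}, dtype=torch.qint8)
```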
Why Apply?
This is a fantastic opportunity to join a cutting-edge AI startup in a key role that will let you shape the cognitive core of a physical AI agent while working with a creative, collaborative research team. If you’re passionate about multimodal and embodied AI, this role is for you!
To automatically receive notifications on new roles and market updates, follow our LinkedIn page: https://lnkd.in/gjWAJjt9