BM · CV · v2026.04
AUCKLAND · NZ // OPEN TO ROLES
Subject — Bob Ma
Dr Bob Ma · 马博

Perception,
on the edge.

Machine Learning Engineer · Computer Vision · Embedded AI ·
Vision–Language–Action — shipped on real devices.
Email
Phone
+64 21 110 8064
Based
Auckland, New Zealand
Status
Permanent Resident
§ 01 — SUMMARY

Who I am.

Senior Vision–Language–Action and Multimodal AI engineer with deep experience in real-time perception, multimodal reasoning, and embodied system deployment on edge devices. Proven track record designing VLA pipelines, optimising VLMs / multimodal LLMs for on-device inference, and integrating perception outputs with downstream action and control systems — from Ambarella SoCs to large-scale vehicle fleets.

§ 02 — EXPERIENCE

Where I've shipped.

NOW Mar 2024 —
Present ~2 yrs

Principal Machine Learning Engineer

Resideo Inc. (Honeywell Home) · NYSE:REZI
▸ Vision understanding & multimodal LLM inference on edge SoCs
  • Designed and optimised multimodal large-model inference (LLaMA3, GPT, YOLO, CLIP) on edge SoCs (Oclea) for real-time video understanding and mobile deployment.
  • Rewrote and customised backbone operators to support Cavalry (.cvimodel) conversion and hardware-accelerated inference on Ambarella CV25 / CV72.
  • Implemented model compression, quantisation and knowledge distillation so that large models reach SOTA inference performance under strict compute and memory constraints.
  • Built a general image/video understanding framework and contributed to the Oclea SDK, integrating video pipelines with multimodal LLM capabilities.
  • End-to-end ML: train → ONNX → Cavalry → on-chip → real-time inference → post-processing.
  • Perception-to-action pipelines: real-time triggers for tracking, sensor adjustment and motor actuation from visual scene interpretation.
  • Integrated with RESTful APIs and edge-cloud pipelines to coordinate inference results with backend decision systems.
  • Yocto-based embedded builds; cross-compilation and chip-specific SDK integrations; CI/CD for embedded reliability.
Multimodal LLMs C++ Python Ambarella CV25/CV72 Oclea Yocto ONNX PyTorch Azure CI/CD
Mar 2022 —
Mar 2024 2 yrs

Senior Machine Learning Engineer

EROAD Ltd · ASX:ERD
▸ ADAS perception, lane-level navigation & multi-sensor vehicle intelligence
  • Real-time perception pipelines for commercial-vehicle dashcams — YOLO/SSD object detection, tracking, road segmentation.
  • Lane-level navigation combining lane detection with vSLAM — spatial estimation of lane orientation and lane-centre waypoint generation.
  • Camera-to-SLAM extrinsic calibration — lane detections projected into global spatial coordinates for navigation-aware perception.
  • Multi-modal data processing: video + GPS + vehicle telemetry for robust scene understanding and trajectory analytics.
  • Road-condition detection and collision-warning systems deployed on edge devices.
  • Scalable AWS data infrastructure for fleet-scale sensor/video datasets (SageMaker + containerised workflows).
  • Edge deployment via C++, PyTorch and Android environments; ETL pipelines for continuous training & eval.
vSLAM ADAS C++ Python OpenCV GPS Fusion PyTorch AWS SageMaker Docker
Jun 2021 —
Feb 2022 9 mo

Machine Learning Engineer

StayinFront Ltd
▸ Computer-vision framework for grocery / shelf analysis
  • Designed a CV framework for grocery and supermarket analysis — stock forecasting and price-tag recognition.
  • Backend services and ML models detecting price tags / SKU info from shelf & packaging images.
  • AWS S3/EC2 integration, DynamoDB warehouse workflows, Python services.
Python Keras TensorFlow PySpark DynamoDB Flask AWS
Sep 2020 —
Apr 2021 8 mo

Machine Learning Engineer

BuildingEstimates.com
▸ ML for architectural plan detection
  • Building-information extraction & plan recognition — elevation / floor / roof plan analysis.
  • Framework for material analysis and quantity-survey information extraction.
  • Python + Flask on GCP for cloud image processing and plan management.
Python Pandas NumPy Flask GCP Kubernetes
Sep 2017 —
Sep 2019 2 yrs

Data Engineer

Woolworths Group · ASX:WOW · World 500
▸ Data warehouse, logistics platform, cross-border e-commerce payments
  • Enterprise-platform integration with SAP and third-party ERP systems, including payment interfaces.
  • Data virtualisation + user-facing interfaces in .NET; deployment records and reporting via SQL.
  • Supported secure data procedures, stock management and logistics/payment systems in Agile delivery.
C# Python Docker Oracle SAP MS SQL
Jun 2014 —
May 2017 3 yrs

Full Stack Developer

New Image International Ltd
▸ E-commerce platform & ERP system
  • Order-processing and customer-management workflows across e-commerce and ERP platforms.
  • Frontend / web features, UX design support, online payment-gateway integration.
  • Web, mobile, SEO/SEM and analytics across a full-stack role.
C#/.NET MS SQL HTML5 CSS3 React Native Alicloud
Dec 2010 —
Aug 2013 ~3 yrs

Software Developer

Huawei Symantec Technology Ltd · World 500
▸ Pattern recognition and detection systems
  • Industrial software: statistical process control, SOA, multithreaded Linux applications.
  • Radiation detection and electromagnetic-spectrum analytics — image & signal processing.
  • Code optimisation, version control, cross-team delivery across frontend and backend.
C/C++ Linux OpenCV GDB SVN
Oct 2007 —
Mar 2010 ~3 yrs

Software Developer

Topsec Network Security Co., Ltd. · 002212.SZ
▸ Intrusion-detection systems
  • Linux-based intrusion detection + network-attack monitoring; kernel-level packet analysis and filtering.
  • Drivers, network protocols, mutexes/semaphores, Linux debugging for system-level development.
  • DDoS detection & defence; QA-side deployment and debugging support.
C/C++ Linux GDB CMake Valgrind
§ 03 — TECHNICAL

Technical range.

VLA · Multimodal
  • Vision–Language–Action ▰▰▰▰▰
  • VLM encoders ▰▰▰▰▰
  • Multimodal LLM deploy ▰▰▰▰▰
  • Embodied AI ▰▰▰▰▱
  • Perception→action ▰▰▰▰▰
Perception · Vision
  • YOLO / SSD ▰▰▰▰▰
  • CLIP ▰▰▰▰▱
  • DeepSORT ▰▰▰▰▱
  • Visual SLAM ▰▰▰▰▰
  • Sensor fusion ▰▰▰▰▰
Edge · Systems
  • ONNX / TensorRT ▰▰▰▰▰
  • Quant · Distill ▰▰▰▰▰
  • Ambarella CV25/72 ▰▰▰▰▱
  • Oclea SDK ▰▰▰▰▱
  • Yocto Linux ▰▰▰▰▱
Languages · Cloud
  • Python 10y
  • C / C++ 8y
  • C# / .NET 3y
  • SQL
  • AWS · Azure · GCP ▰▰▰▰▱
§ 04 — EDUCATION

Research lineage.

Jan 2019 — Feb 2023

Doctor of Philosophy (PhD)

Auckland University of Technology · New Zealand
Research: Vision-driven Semantic SLAM for Scene Understanding and Autonomous Systems — visual perception, semantic mapping and robust localisation in dynamic environments.
Sep 2010 — May 2014

Doctoral Research / Prior Research Study

Northwestern Polytechnical University · China
Topic: Learning-augmented robust control for complex dynamic systems.
Sep 2007 — Jul 2010

Master of Science

Sichuan Normal University · China — Computer Software & Theory
Sep 2003 — Jul 2007

Bachelor of Engineering

Sichuan Normal University · China — Data Science & Electronic Commerce
§ 05 — PUBLICATIONS

Selected research.

2026 · ARXIV
Bodhi VLM: Privacy-Alignment Modeling for Hierarchical Visual Representations in Vision Backbones and VLM Encoders
Ma, Wu, Yan · arXiv:2603.13728
2026 · ARXIV
CT-to-X-ray Distillation Under Tiny Paired Cohorts — Evidence-Bounded Reproducible Pilot
Ma, Wu, Yan, Wei · arXiv:2603.29167
2026 · ARXIV
PPEDCRF — Privacy-Preserving Enhanced Dynamic CRF for Sequence Videos
Ma et al. · arXiv:2603.01593
2026 · ARXIV
TSDCRF — Balancing Privacy and Multi-Object Tracking via Time-Series CRF
Ma, Wu, Yan · arXiv:2603.13667
2026 · ARXIV
REAEDP — Entropy-Calibrated Differentially Private Data Release
Ma, Wu, Yan · arXiv:2603.13709
2024 · MTAP
Privacy-preserving word-embedding text classification via DBN-constructed privacy boundary
Multimedia Tools & Applications, 83(10): 30181–30206
2024 · ICN
JudPriNet — Video transition detection via semantic relationship and Monte Carlo sampling
Intelligent and Converged Networks, 5(2): 134–146
2021 · GLOBECOM
PPDTSA — Privacy-preserving deep transformation self-attention framework for object detection
IEEE Global Communications Conference
§ 06 — PATENTS · SERVICE

Recognised contributions.

PAT · 01
Telecom telephone-fraud prevention via voice-based semantic content analysis · CNIPA
PAT · 02
Access safety detection & isolation for virtualised users · CNIPA
PAT · 03
Multichannel network intrusion detection & defence · CNIPA
SW · ×3
Three software copyrights granted by the Copyright Protection Center of China
REVIEW
Reviewer — AAAI · VLDB · IEEE TCSVT · TETCI · Systems Journal · Access