Job Description
About Sun Group and DEC
Sun Group is behind Vietnam’s most iconic destinations, from Ba Na Hills to JW Marriott Phu Quoc, InterContinental Danang, and Sun World entertainment complexes. Beyond destinations, the Group has built a diverse ecosystem in Luxury Real Estate, Smart Infrastructure, Culture & Arts, and Aviation, shaping modern lifestyles across Vietnam.
At the core of this vision, the Digital Excellence Center (DEC) is Sun Group’s strategic hub for technology and innovation. DEC drives digital initiatives across tourism, hospitality, real estate, aviation, and entertainment, bridging Sun Group’s heritage of excellence with the future of digital-first experiences.
More than a transformation engine, DEC advances Vietnam’s digital journey through key initiatives in data and AI. With a culture of agility and innovation, DEC empowers people to push boundaries, achieve breakthroughs, and create lasting value for both the Group and society.
Job Summary
As an MLOps Engineer on the Platform team, you will build and run the ML platform that turns Sun Group’s AI ideas into production systems that are reliable, scalable, secure, and cost-efficient. Your work powers AI-first E-Commerce and Digital Experiences across Sun Group’s ecosystem.
You will orchestrate training and inference at scale on cloud and on-prem Kubernetes (CPU/GPU/edge), enabling recommendation, personalization, pricing, fraud/risk, LLM/RAG, and computer vision workloads. From CI/CT/CD and model registry to feature stores and low-latency serving, you make models observable, governable, and ready for millions of users and transactions.
You will partner closely with Backend, Data/AI, and Security teams to define SLAs/SLOs, harden model rollouts, ensure privacy and compliance, and streamline the path from experiment to business impact.
Responsibilities
- Build and operate AI/ML platforms on Kubernetes (cloud/on-prem; CPU/GPU) using Terraform/Ansible, Helm/Kustomize, secure secrets, and automated cluster lifecycle.
- Implement CI/CT/CD pipelines for ML with Airflow/Kubeflow/Argo; manage experiment tracking and model registry.
- Serve models at scale with KServe/Seldon/Triton; run batch/offline jobs; optimize latency and cost.
- Monitor data/model health and performance (metrics/logs/traces, drift/freshness dashboards, KPIs, SLOs) and lead incident response.
- Apply security and compliance by default.
- Contribute to runbooks/playbooks and Architecture Decision Records (ADRs), and continuously evolve platform best practices.
Qualifications
- Bachelor’s degree in Computer Science, IT, Electronics/Telecom, or related fields.
- 4+ years of experience in MLOps/DevOps/SRE or Data Platform, including at least 2 years operating ML workloads in production.
- Hands-on with Airflow, Kubeflow, MLflow and model registry practices.
- Experience serving models with KServe/Seldon/Triton; GPU scheduling and autoscaling.
- Proficient in Python and Shell; experienced in containerizing and troubleshooting GPU workloads; basic Go/Node knowledge is a plus.
- Proficiency in IaC (Terraform), config management (Ansible), containers (Docker), Helm/Kustomize, and CI/CD tools such as Azure DevOps/GitHub Actions/Jenkins/GitLab CI.
- Strong knowledge of Linux, networking (DNS, TLS, HTTP, TCP, load balancing), and platform security (RBAC, secrets, image scanning).
- Hands-on with observability stacks (Prometheus/Grafana/Datadog, ELK/OpenSearch, OpenTelemetry).
- Scripting/automation with Shell and one of Python/Go.
- Familiarity with message queues/streams (Kafka/RabbitMQ) and managed data services (PostgreSQL/MySQL/Redis) from an ops perspective.
- Experience operating 24/7 services, on-call rotations, and incident management.
- Nice-to-have: service mesh (Istio/Linkerd), API Gateway, cloud certifications.
Soft Skills
- Ownership mindset focused on reliability, security, and speed of delivery.
- Clear communicator, collaborative with cross-functional teams under time pressure.
- Systematic problem solver; data-driven, automation-first, comfortable with ambiguity; proactive in raising quality bars.
WHY YOU’LL LOVE WORKING HERE
- “An cư lạc nghiệp” (“settle down, then thrive at work”) policy: eligible employees receive full-value commercial apartments for long-term tenure.
- Premium SunCare Insurance for employees and their families (fully covered for employees).
- Annual comprehensive health check-up at Sun Group International Hospital.
- Complimentary access to theme parks, cable cars, cultural/symphonic events, plus exclusive discounts at Hotels & Resorts.
- Special offers across aviation, NCB banking services, and real estate products.
- A dynamic, innovation-driven working environment where talent and contributions are recognized and rewarded.