Open to Relocation • Based in London, United Kingdom

Manoj Panduraj

DevOps Engineer | Site Reliability Engineer | Cloud Engineer | Cloud Architect

Kubernetes • Terraform • AWS • GCP • CI/CD • Observability

Building reliable cloud platforms, scalable Kubernetes systems, and production-ready automation.

Professional Snapshot

GCP-focused DevOps/SRE engineer with 3+ years of experience supporting production cloud platforms and Kubernetes workloads. Strong in IaC, observability, cloud migration, and security-by-design.

3+ Years' Experience
30% Deployment Speed Improvement
25% Incident Response Time (MTTR) Reduction
20% Manual Effort Reduction

GKE / EKS

Production Kubernetes Management

AWS → GCP

Enterprise Migration Support

Terraform

Secure Infrastructure as Code

Splunk + NR

Observability & Incident Response

Professional Summary

I am a GCP-focused DevOps and SRE engineer with 3+ years of experience architecting, automating, and supporting production-grade cloud platforms across enterprise environments. My passion lies in building resilient, scalable, and secure infrastructure that empowers development teams and delivers seamless user experiences.

I have strong hands-on expertise in Kubernetes (GKE/EKS), Terraform (Infrastructure as Code), CI/CD automation, cloud networking, and production observability. I have contributed to large-scale AWS-to-GCP migration initiatives, ensuring smooth cutovers, operational stability, and high platform availability throughout critical transitions.

Beyond infrastructure engineering, I specialize in monitoring, logging, and reliability engineering using tools such as Splunk, New Relic, Prometheus, and Grafana. I focus heavily on proactive incident management, root cause analysis (RCA), performance optimisation, and reducing operational toil through automation.

My technical background also includes cloud security and governance practices such as IAM, RBAC, secrets management, and infrastructure hardening across cloud-native environments. I enjoy working on high-impact systems where reliability, scalability, and operational excellence are critical.

Work History

Customer Service Executive

John Lewis & Partners · Part-time

Jan 2026 - Present · London, UK
  • Assisted customers with product queries, in-store navigation, and service requests in a high-volume retail environment.
  • Upheld Partnership values through consistent service quality and team coordination across daily operations.

Platform Engineer (Contract)

Aramark UK (Client: Chessington World of Adventures & Merlin Entertainments)

Apr 2025 - Dec 2025 · London, UK
  • Managed cloud infrastructure and Kubernetes environments (GKE/EKS) for large-scale leisure resort operations.
  • Designed and optimised CI/CD pipelines and infrastructure automation using Terraform and GitHub Actions.
  • Supported high availability, reliability, and deployment efficiency across mission-critical production systems.
  • Collaborated across operations and technical teams to improve monitoring, automation, and operational workflows.

DevOps Engineer

Applied Cloud Computing (Client: Zee Entertainment Enterprises Ltd)

Dec 2023 - Jan 2025 · Bengaluru, India
  • Drove AWS to GCP migration for enterprise workloads, ensuring seamless cutovers and stable post-migration operations.
  • Orchestrated production Kubernetes clusters (GKE/EKS) for high-availability workloads, managing health checks, troubleshooting, and upgrade cycles.
  • Engineered automated CI/CD pipelines (GitHub Actions/Jenkins), boosting deployment efficiency by 30%.
  • Developed Terraform-based IaC modules for secure, repeatable provisioning across hybrid cloud environments.
  • Enhanced observability using Splunk and New Relic, optimizing alerts and dashboards to reduce MTTR by 25%.

Site Reliability Engineer

Randstad India Private Limited (Client: Zee Entertainment Enterprises Ltd)

May 2023 - Dec 2023 · Bengaluru, India
  • Maintained production GCP services and Kubernetes workloads, enhancing platform stability and release frequency.
  • Automated repetitive operational tasks, achieving a 20% reduction in manual toil.
  • Streamlined deployment workflows using Git-driven CI/CD best practices.
  • Refined monitoring and alerting via Splunk/New Relic for proactive incident detection.
  • Led incident investigations and RCAs, optimizing workloads for maximum efficiency and reliability.

Subject Matter Expert (Freelance)

Chegg India Private Limited

Apr 2022 - May 2023 · Remote
  • Provided computer science subject matter support to global learners, delivering clear and accurate solutions across core CS topics.
  • Maintained strong attention to detail and academic standards to support Chegg’s global education mission.
  • Ensured high-quality, timely delivery of solutions while managing workload independently in a freelance environment.
  • Maintained a flexible work schedule as a freelance contractor, accommodating student demands and deadlines.

Projects

Live Production Systems

InfraWatch - SRE Monitoring Platform (Grafana Based)

Production Observability • GitOps • Incident Response

Live

Platform Overview

Production-style observability platform built with Prometheus, Grafana, Loki, Promtail, Alertmanager, Slack, Docker, and GitHub Actions for real-time monitoring, centralized logging, alerting, GitOps deployment, and incident response automation across cloud and local systems.

Demo Access

Username: demo_user

Password: demo_user

Use these demo credentials to explore the live monitoring platform.

Prometheus • Grafana • Loki • Alertmanager • GitHub Actions

3D Portfolio Website

Cloud Visualization • Interactive UI • Frontend Engineering

Live

An immersive digital environment designed to bridge the gap between complex DevOps concepts and high-performance frontend engineering. Features a custom 3D engine and a reactive UI shell, hosted on Google Cloud Platform.

60 FPS

Performance

Optimized requestAnimationFrame loop & hardware-accelerated GSAP animations.

WebGL

Graphics

Direct GPU-accelerated 3D rendering for complex particle systems and effects.

< 2s

Load Time

Asset compression, lazy loading, and low-latency GCP hosting.

100%

Responsive

Mobile-first CSS and dynamic 3D camera resize listeners.

Three.js • WebGL • Tailwind CSS • GSAP • JavaScript

Shift-Ops - Payroll Tracker

Full-Stack • React + FastAPI • GCP Deployment

Live

Personal shift-tracking web app for two Aramark employees to log hours, calculate pay, and monitor weekly and pay-period targets. Replaces manual spreadsheets with a mobile-first React + FastAPI app that provides automatic break deductions, the Aramark payroll calendar, and CI/CD to GCP via GitHub Actions.

2

Users

Manoj & Jothesh - each user sees only their own shifts via user_id isolation on every API endpoint.

5

App Tabs

Log, Overview (donut chart), Shifts table, Salary (pay periods), Schedule (Aramark W02–W52 tax calendar).

£12.71

Hourly Rate

30 min break auto-deducted if shift > 6h. 2-week Aramark pay cycles anchored to Sat 14 Mar 2026.

GCP

Hosted

Nginx serves React build + proxies /api/ to FastAPI :8000. Systemd keeps backend alive. GitHub Actions CI/CD on push.

React 18 • Vite • FastAPI • Python • SQLite • Nginx • GCP VM • GitHub Actions
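
The Shift-Ops pay rules listed above (30-minute break deducted when a shift exceeds 6 hours, £12.71 hourly rate, 2-week pay cycles anchored to Sat 14 Mar 2026) can be illustrated with a minimal Python sketch. This is not the app's actual code: the function names are hypothetical, and it assumes the break is unpaid, that pay is rounded half-up to the nearest penny, and that the anchor date is the first day of a pay period.

```python
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE = Decimal("12.71")        # hourly rate stated in the project card
BREAK_THRESHOLD_HOURS = Decimal("6")  # break applies only when shift > 6h
BREAK_HOURS = Decimal("0.5")          # 30-minute break (assumed unpaid)
PERIOD_ANCHOR = date(2026, 3, 14)     # Saturday anchor (assumed to start a period)
PERIOD_LENGTH_DAYS = 14               # 2-week pay cycle


def paid_hours(shift_hours: Decimal) -> Decimal:
    """Return hours worked minus the break, if the shift is longer than 6h."""
    if shift_hours > BREAK_THRESHOLD_HOURS:
        return shift_hours - BREAK_HOURS
    return shift_hours


def shift_pay(shift_hours: Decimal) -> Decimal:
    """Return gross pay for one shift, rounded half-up to pence (assumption)."""
    gross = paid_hours(shift_hours) * HOURLY_RATE
    return gross.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def pay_period_start(day: date) -> date:
    """Return the first day of the 2-week pay period containing `day`."""
    # Floor division also handles dates before the anchor correctly.
    periods = (day - PERIOD_ANCHOR).days // PERIOD_LENGTH_DAYS
    return PERIOD_ANCHOR + timedelta(days=periods * PERIOD_LENGTH_DAYS)


if __name__ == "__main__":
    print(shift_pay(Decimal("8")))             # 8h shift -> 7.5 paid hours
    print(pay_period_start(date(2026, 3, 27)))
```

Decimal is used instead of float so that pay totals do not pick up binary rounding error when many shifts are summed.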

Engineering Repository & Lab Projects

Multi-Environment Kubernetes Deployment Platform

Designed a reusable deployment framework for development, staging, and production Kubernetes environments using Helm, Terraform, and GitHub Actions.

  • Standardized releases across environments with repeatable infrastructure provisioning.
  • Integrated CI/CD automation with secure secrets handling and rollout validation.
GitHub

Cloud Migration Readiness & Cutover Automation

Built a migration support workflow for AWS to GCP transitions covering provisioning, validation checklists, and post-cutover monitoring readiness.

  • Created Terraform templates for faster and safer cloud resource setup.
  • Defined rollback-aware deployment steps and smoke-test verification.
GitHub

Observability & Incident Response Dashboard Suite

Implemented operational dashboards and alert tuning workflows with Splunk, New Relic, and Cloud monitoring tools for production systems.

  • Centralized logs, metrics, and alert signals for faster diagnosis.
  • Reduced noise through alert tuning and prioritization logic.
GitHub

Secure Infrastructure as Code Modules

Developed modular Terraform components for cloud networking, IAM, and workload deployment with security-by-design principles.

  • Enabled repeatable provisioning across AWS and GCP environments.
  • Implemented automated policy checks (Sentinel/OPA) for IaC compliance.
GitHub
Flagship SRE Project

InfraWatch - SRE Monitoring Platform

A production-style observability platform built using Prometheus, Grafana, Loki, Promtail, Alertmanager, Slack, Docker, and GitHub Actions for real-time infrastructure monitoring, centralized logging, alerting, and incident response automation.

Metrics

GCP VM, MacBook, website, and Prometheus health monitoring.

Logs

Centralized Docker and system logs using Loki and Promtail.

Alerts

Alertmanager sends real-time Slack alerts and recovery notices.

Infrastructure Flow

MacBook (Node Exporter) → GCP VM (Docker Stack) → Prometheus (Metrics) → Grafana (Dashboards) → Slack (Alerts)

Grafana Dashboard Gallery

GCP VM Dashboard

CPU, memory, filesystem, load average, network traffic, SLA, and infrastructure health.

MacBook Monitoring

Local machine observability using secure SSH reverse tunneling.

Website Uptime Dashboard

Blackbox monitoring for uptime, latency, HTTP status, and SSL expiry.

Prometheus Monitoring

Target health, scrape duration, failures, and monitoring pipeline metrics.

Slack Incident Alerts

Firing Alerts

Real-time Slack alerts from Alertmanager.

Resolved Alerts

Full alert lifecycle with automatic recovery notifications.

Incident Response Workflow

1. Alert: Slack receives incident notification.
2. Metrics: Analyze Grafana dashboards.
3. Logs: Search Loki logs for RCA.
4. Fix: Resolve infrastructure/service issue.
5. Recovery: Alertmanager sends resolved notification.

Core Competencies

Platform Engineering

Designing scalable and secure cloud-native infrastructure.

SRE & Reliability

Production support, incident response, RCA, and MTTR reduction.

Infrastructure Automation

Building reusable and repeatable infrastructure provisioning.

CI/CD Delivery

Automating build, test, and deployment workflows.

Observability

Monitoring, logging, alerting, and incident analysis.

Cloud Operations

Managing and optimizing multi-cloud environments.

Security

Implementing IAM, access control, and platform governance.

Ways of Working

Agile practices, documentation, and cross-team collaboration.

Operational Excellence

Platform Engineering Pillars

Architecting production environments where reliability meets velocity. My approach centers on automation, security, and deep observability.

Automation & IaC

Eliminating manual toil through reusable Terraform modules and GitOps workflows. Ensuring repeatable, drift-aware infrastructure deployments.

Reliability & GKE

Managing production Kubernetes workloads with high-availability patterns. Optimizing cluster performance, autoscaling, and secure IAM/RBAC.

SRE & Observability

Reducing MTTR via Splunk and New Relic. Implementing proactive alerting, incident RCA, and data-driven platform optimizations.

Global Cloud Status

Uptime: 99.99% (Stable)
Zone: us-central1-a
Type: e2-medium
Security: IAM Secure
Status: Healthy

Technical Skills

Cloud
Google Cloud Platform • Amazon Web Services • Microsoft Azure
Kubernetes & Containers
Kubernetes • GKE • EKS • Docker • Helm • Ingress • RBAC
Infrastructure as Code
Terraform
CI/CD & GitOps
GitHub Actions • Jenkins • Argo CD • Git
Secrets Management
GCP Secret Manager • AWS Secrets Manager
Observability
Cloud Operations • AWS CloudWatch • New Relic • Splunk • Coralogix
Web / CDN
Akamai • AWS CloudFront • Google Cloud CDN
Service Mesh
Istio • Anthos Service Mesh
Databases
MySQL • PostgreSQL • MariaDB • DynamoDB • MongoDB • Redis • ScyllaDB • Cloud SQL • Spanner
Scripting
Python • Bash • Shell
Operating Systems
Linux • RHEL • Ubuntu • CentOS
Tools
Jira • Confluence • Microsoft Teams

Certifications

Google Cloud Associate Cloud Engineer

In Progress

Associate-level certification focused on Google Cloud infrastructure, deployment, monitoring, and operations.

Expected Completion: May 2026

Google Cloud Professional Cloud Architect

Planned

Advanced cloud architecture, scalability, reliability, and security design.

Planned Completion: End of Q2 2026

Certified Kubernetes Administrator (CKA)

Planned

Kubernetes cluster administration, troubleshooting, networking, and production orchestration expertise.

Planned Completion: End of Q2 2026

HashiCorp Terraform Associate

Planned

Infrastructure as Code (IaC), automation, provisioning, and cloud infrastructure management using Terraform.

Planned Completion: End of Q2 2026

Education

M.Sc in Data Science

University of Roehampton, London, United Kingdom

Jan 2025 - May 2026

B.Tech in Computer Science Engineering

Dayananda Sagar University, Bengaluru, India

Aug 2018 - Apr 2022

MSc Data Science - Projects

University of Roehampton, London  ·  Jan 2025 – May 2026

Dissertation · Feb – May 2026

Evolution of Last-Mile Delivery Efficiency & E-Commerce Logistics Performance

Python • Random Forest • scikit-learn • OLS Regression • seaborn

Deep Learning · Sep – Dec 2025

Explainable Credit Scoring Using Deep Learning & SHAP

TensorFlow • Keras • SHAP • XAI

Data Visualisation · Sep – Dec 2025

European Airport Traffic Data Visualisation

matplotlib • seaborn • Jupyter • pandas

Applications of DS · Sep – Dec 2025

Statistically Guided ML Pipeline for Diabetic Retinopathy Classification

Python • MATLAB • scikit-learn

Machine Learning · Jan – May 2025

Classification, Neural Networks & Heart Disease Prediction

CNN • LSTM • Naive Bayes • TensorFlow

Data Analytics · Jan – May 2025

Big Data Analytics for Lung Cancer Risk Prediction on GCP

Apache Spark • GCP • Power BI • Spark MLlib

Maths for DS · Jan – Apr 2025

Statistical Analysis Across Multiple Datasets

NumPy • scipy • Hypothesis Testing • Statistics

Open to new opportunities