Three disciplines, in full

Every pillar below mirrors the structure of my actual working notes — grouped into sub-categories, not flattened into a handful of headline bullets.

Cloud & Platform Engineering

Strategic Transformation & Capacity

  • Orchestrated a 30-month digital overhaul transitioning a 30-year-old institution from legacy on-prem to hybrid cloud, powering the "SmartTrader" launch (₹30 Cr+ client fee savings)
  • Architected "10x-ready" infrastructure scaling daily orders 10K → 700K+ (70×) within 10 months; migrated ~900 servers to AWS
  • Managed a ₹100 Cr+ technology investment budget, aligning roadmaps to business KPIs

High-Performance & Low-Latency Systems

  • Designed a hybrid ingestion engine (Direct Connect + Transit Gateway Multicast + GRE) for NSE/BSE/MCX feeds at 0.40 ms latency (~90% reduction)
  • Eliminated "thundering herd" login surges via WebSocket management on Amazon ECS, sustaining 99.99% uptime for hundreds of thousands of concurrent traders
  • Re-architected marketing/static sites to render via CDN for caching and performance
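
One common way to blunt a reconnect surge of the kind described above is to spread client retries with jittered exponential backoff. The sketch below is illustrative only: the function name `reconnect_delay` and the `base` and `cap` values are assumptions, not details of the production system.

```python
import random
from typing import Optional


def reconnect_delay(
    attempt: int,
    base: float = 0.5,   # hypothetical base delay in seconds
    cap: float = 30.0,   # hypothetical upper bound in seconds
    rng: Optional[random.Random] = None,
) -> float:
    """Return a 'full jitter' delay: uniform in [0, min(cap, base * 2**attempt)].

    Randomising the whole interval keeps clients that dropped at the same
    moment from retrying in lockstep.
    """
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


if __name__ == "__main__":
    seeded = random.Random(7)  # seeded only so the demo is repeatable
    for attempt in range(6):
        print(attempt, round(reconnect_delay(attempt, rng=seeded), 3))
```

A WebSocket client would sleep for `reconnect_delay(n)` before its n-th reconnect attempt, so simultaneous disconnects fan out over time instead of arriving as one spike.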

FinOps, Networking & Multi-Account

  • Implemented a FinOps framework and AWS MAP vendor negotiation for 29–40% annualized cost savings — FinOps Leader of the Year, 2025
  • Governed ~130 AWS accounts via Control Tower; ran a ~900-server estate at ~$1.2M TCO
  • Designed the hybrid network backbone — hub-and-spoke, Transit Gateway, Direct Connect, VPN tunnels
  • Earned AWS Well-Architected qualification ($6,000 in credits)

Security, Compliance & Reliability

  • Built an in-house DevSecOps program with IaC SecOps gates in CI/CD, cutting MTTR materially
  • Deployed CloudHSM-backed KMS, AWS WAF, and led EDR/XDR/MDR + SIEM/SOC adoption
  • Standardized IaC org-wide (Terraform, Pulumi, OpenTofu); deployed a self-hosted LGTM observability stack with AIOps-driven incident response
  • Key contributor to ISO 27001 and SEBI compliance audits
AWS · Transit Gateway · Direct Connect · Amazon ECS · Control Tower · CloudHSM · AWS WAF · Terraform · Pulumi · OpenTofu · LGTM · FinOps

AI / ML & Applied GenAI

Model Training, Fine-Tuning & Alignment

  • Built an end-to-end fine-tuning pipeline for open-weight models (Gemma 2, Qwen2.5, Llama) using LoRA/QLoRA on curated golden datasets
  • Engineered an "online golden-dataset" flywheel — production traffic → review → curated SFT/preference pairs → scheduled retraining
  • Applied preference-based alignment (DPO/RLHF/RLAIF) to steer models to domain tone and compliance constraints
  • Stood up an evaluation harness — golden-set scoring, LLM-as-judge, red-teaming — as a production promotion gate
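
A promotion gate of the kind described in the last bullet can be reduced to a small decision function. This is a minimal sketch under stated assumptions: the `EvalReport` fields, the `should_promote` name, and the thresholds are hypothetical and do not describe the actual harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalReport:
    golden_set_accuracy: float  # fraction of golden-set items scored correct
    judge_score: float          # mean LLM-as-judge score, normalised to 0..1
    red_team_failures: int      # red-team prompts that produced a violation


def should_promote(
    candidate: EvalReport,
    baseline: EvalReport,
    min_accuracy: float = 0.90,    # hypothetical absolute floor
    max_regression: float = 0.01,  # hypothetical allowed drop in judge score
) -> bool:
    """Return True only if the candidate clears every check."""
    if candidate.red_team_failures > 0:
        return False  # any red-team violation blocks promotion
    if candidate.golden_set_accuracy < min_accuracy:
        return False  # below the golden-set floor
    if candidate.judge_score < baseline.judge_score - max_regression:
        return False  # regressed against the currently deployed model
    return True


if __name__ == "__main__":
    live = EvalReport(golden_set_accuracy=0.93, judge_score=0.82, red_team_failures=0)
    new = EvalReport(golden_set_accuracy=0.95, judge_score=0.85, red_team_failures=0)
    print(should_promote(new, live))
```

Keeping the gate as a pure function of two reports makes it easy to run in CI and to audit why a given model was or was not promoted.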

Model Compression & Efficient Serving

  • Led model compression — quantization (INT8/INT4, GPTQ/AWQ/GGUF), distillation, pruning — to fit larger models onto smaller GPUs (L4/A10G vs. A100-class)
  • Optimized throughput with vLLM/TensorRT-LLM, paged KV-cache, continuous batching, and speculative decoding
  • Ran a right-sized GPU serving cluster on Amazon ECS with autoscaling across training and inference workloads

Self-Hosted Inference — Cost & Compliance

  • Replaced third-party API inference with fully self-hosted, in-VPC models to meet SEBI data-residency requirements
  • Centralized model access via a LiteLLM gateway — unified auth, routing, rate limiting, cost attribution, audit logging

Applied GenAI Products

  • Architected a proprietary GenAI Agent Platform on Amazon Bedrock (Claude 3.5) with MCP: 98.7% less context overhead and 60% of support workloads automated
  • Shipped a RAG assistant (LangChain/LangGraph); built an AIOps self-healing platform cutting incident RCA time ~92% (2 hrs → <10 min)
  • Automated compliance ops: AI voice bots replaced 38% of manual collection calls; QA audit coverage scaled 12% → 100%
  • Built a PyTorch signature-verification model and a GenAI portfolio analyzer; delivered predictive (Prophecy) and operational (MOTIF) ML platforms
Amazon Bedrock · Claude 3.5 · MCP · LangGraph · RAG · LoRA/QLoRA · DPO/RLHF · vLLM · TensorRT-LLM · LiteLLM · PyTorch · AIOps

Data Engineering

Lakehouse & Data Platform (0-to-1)

  • Established an enterprise Medallion Lakehouse (Bronze/Silver/Gold) on Apache Iceberg, AWS Glue, and S3, with ACID-compliant, time-travel auditing for SEBI compliance
  • Architected the open-source pipeline (DLT ingestion, dbt transformation, Parquet + Delta Lake with Lake Formation governance, Presto/Athena query), orchestrated by Airflow and self-hosted on Amazon EKS

Real-Time Data & Governance

  • Developed a fault-tolerant real-time reconciliation engine (microservices + ElastiCache) with zero tolerance for packet loss
  • Enforced data modeling, lineage, and auditability with dbt + Spark and OpenMetadata, meeting SEBI standards
  • Delivered a semantic layer (Cube.js) for embedded analytics, surfaced through Metabase for BI

Data Products

  • Led the build of Piper, a Customer Data Platform (Customer 360) — 50% improvement in customer engagement
  • Provided the data substrate for predictive (Prophecy) and operational (MOTIF) analytics platforms
Apache Iceberg · AWS Glue · Amazon S3 · dbt · DLT · Airflow · Cube.js · OpenMetadata · ElastiCache · Metabase