Three disciplines, in full
Every pillar below mirrors the structure of my actual working notes — grouped into sub-categories, not flattened into a handful of headline bullets.
Cloud & Platform Engineering
Strategic Transformation & Capacity
- Orchestrated a 30-month digital overhaul transitioning a 30-year-old institution from legacy on-prem to hybrid cloud, powering the "SmartTrader" launch (₹30 Cr+ client fee savings)
- Architected "10x-ready" infrastructure scaling daily orders 10K → 700K+ (70×) within 10 months; migrated ~900 servers to AWS
- Managed a ₹100 Cr+ technology investment budget, aligning roadmaps to business KPIs
High-Performance & Low-Latency Systems
- Designed a hybrid ingestion engine (Direct Connect + Transit Gateway Multicast + GRE) for NSE/BSE/MCX feeds at 0.40 ms latency (~90% reduction)
- Eliminated "thundering herd" login surges via WebSocket management on Amazon ECS, sustaining 99.99% uptime for hundreds of thousands of concurrent traders
- Re-architected marketing/static sites to render via CDN for caching and performance
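One common way to soften a "thundering herd" of reconnecting WebSocket clients is exponential backoff with full jitter, so that clients dropped at the same moment do not all retry at the same moment. The sketch below is illustrative only: the notes above do not say which mechanism was used, and the function name, `base`, and `cap` values are assumptions.

```python
import random
from typing import Optional


def reconnect_delay(
    attempt: int,
    base: float = 0.5,    # assumed first-retry ceiling, in seconds
    cap: float = 30.0,    # assumed upper bound on any single wait
    rng: Optional[random.Random] = None,
) -> float:
    """Return a full-jitter backoff delay for the given retry attempt.

    The ceiling doubles with every attempt (base * 2**attempt) until it
    reaches `cap`; the actual delay is drawn uniformly from [0, ceiling].
    """
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


if __name__ == "__main__":
    # Each client would sleep for its own random delay before reconnecting.
    for attempt in range(6):
        print(attempt, round(reconnect_delay(attempt), 3))
```

Because every client draws its own delay, a burst of simultaneous disconnects is spread across the backoff window instead of arriving as a single spike at the login tier.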
FinOps, Networking & Multi-Account
- Implemented a FinOps framework and negotiated AWS MAP (Migration Acceleration Program) terms, delivering 29–40% annualized cost savings; recognized as FinOps Leader of the Year, 2025
- Governed ~130 AWS accounts via Control Tower; ran a ~900-server estate at ~$1.2M TCO
- Designed the hybrid network backbone — hub-and-spoke, Transit Gateway, Direct Connect, VPN tunnels
- Earned AWS Well-Architected qualification ($6,000 in credits)
Security, Compliance & Reliability
- Built an in-house DevSecOps program with IaC SecOps gates in CI/CD, cutting MTTR materially
- Deployed CloudHSM-backed KMS and AWS WAF; led EDR/XDR/MDR and SIEM/SOC adoption
- Standardized IaC org-wide (Terraform, Pulumi, OpenTofu); deployed a self-hosted LGTM observability stack with AIOps-driven incident response
- Key contributor to ISO 27001 and SEBI compliance audits
AWS · Transit Gateway · Direct Connect · Amazon ECS · Control Tower · CloudHSM · AWS WAF · Terraform · Pulumi · OpenTofu · LGTM · FinOps
AI / ML & Applied GenAI
Model Training, Fine-Tuning & Alignment
- Built an end-to-end fine-tuning pipeline for open-weight models (Gemma 2, Qwen2.5, Llama) using LoRA/QLoRA on curated golden datasets
- Engineered an "online golden-dataset" flywheel — production traffic → review → curated SFT/preference pairs → scheduled retraining
- Applied preference-based alignment (DPO/RLHF/RLAIF) to steer models to domain tone and compliance constraints
- Stood up an evaluation harness — golden-set scoring, LLM-as-judge, red-teaming — as a production promotion gate
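A promotion gate of the kind described above can be reduced to a small, auditable check. The sketch below is a hypothetical illustration, not the actual harness: the `EvalReport` fields, the thresholds, and the rule "no regression against the currently deployed baseline" are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalReport:
    golden_accuracy: float   # fraction of golden-set items scored as correct
    judge_score: float       # mean LLM-as-judge rating, normalized to 0..1
    red_team_failures: int   # red-team prompts that produced disallowed output


def failed_checks(
    candidate: EvalReport,
    baseline: EvalReport,
    min_judge: float = 0.8,           # assumed threshold
    max_red_team_failures: int = 0,   # assumed threshold
) -> List[str]:
    """List the reasons a candidate model should not be promoted."""
    reasons = []
    if candidate.golden_accuracy < baseline.golden_accuracy:
        reasons.append("golden-set regression")
    if candidate.judge_score < min_judge:
        reasons.append("judge score below threshold")
    if candidate.red_team_failures > max_red_team_failures:
        reasons.append("red-team failures")
    return reasons


def can_promote(candidate: EvalReport, baseline: EvalReport) -> bool:
    """A candidate is promotable only when no check fails."""
    return not failed_checks(candidate, baseline)


if __name__ == "__main__":
    baseline = EvalReport(golden_accuracy=0.90, judge_score=0.85, red_team_failures=0)
    candidate = EvalReport(golden_accuracy=0.92, judge_score=0.88, red_team_failures=0)
    print(can_promote(candidate, baseline), failed_checks(candidate, baseline))
```

Returning the list of failed checks, rather than a bare boolean, keeps the reason for a blocked promotion visible in CI logs.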
Model Compression & Efficient Serving
- Led model compression — quantization (INT8/INT4, GPTQ/AWQ/GGUF), distillation, pruning — to fit larger models onto smaller GPUs (L4/A10G vs. A100-class)
- Optimized throughput with vLLM/TensorRT-LLM, paged KV-cache, continuous batching, and speculative decoding
- Ran a right-sized GPU serving cluster on Amazon ECS with autoscaling across training and inference workloads
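To show the arithmetic behind INT8 quantization, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is deliberately simpler than GPTQ or AWQ, which choose scales and rounding using calibration data; the function names are illustrative assumptions, not part of any library.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the signed range [-127, 127].

    One scale is chosen so the largest-magnitude weight maps to +/-127.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate floating-point weights."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0, 0.9]
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, scale, worst)
```

Each weight is stored in one byte instead of two (FP16) or four (FP32), at the cost of a rounding error of at most about half the scale per weight.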
Self-Hosted Inference — Cost & Compliance
- Replaced third-party API inference with fully self-hosted, in-VPC models to meet SEBI data-residency requirements
- Centralized model access via a LiteLLM gateway — unified auth, routing, rate limiting, cost attribution, audit logging
Applied GenAI Products
- Architected a proprietary GenAI Agent Platform on Amazon Bedrock (Claude 3.5) with MCP — 98.7% less context overhead, 60% of support workloads automated
- Shipped a RAG assistant (LangChain/LangGraph); built an AIOps self-healing platform cutting incident RCA time ~92% (2 hrs → <10 min)
- Automated compliance ops: AI voice bots replaced 38% of manual collection calls; QA audit coverage scaled 12% → 100%
- Built a PyTorch signature-verification model and a GenAI portfolio analyzer; delivered predictive (Prophecy) and operational (MOTIF) ML platforms
Amazon Bedrock · Claude 3.5 · MCP · LangGraph · RAG · LoRA/QLoRA · DPO/RLHF · vLLM · TensorRT-LLM · LiteLLM · PyTorch · AIOps
Data Engineering
Lakehouse & Data Platform (0-to-1)
- Established an enterprise Medallion Lakehouse (Bronze/Silver/Gold) on Apache Iceberg, AWS Glue, and S3, with ACID transactions and time-travel queries for SEBI audit requirements
- Architected the open-source pipeline (DLT ingestion, dbt transformation, Parquet + Delta Lake with Lake Formation governance, Presto/Athena query), orchestrated by Airflow and self-hosted on Amazon EKS
Real-Time Data & Governance
- Developed a fault-tolerant real-time reconciliation engine (microservices + ElastiCache) with zero tolerance for packet loss
- Enforced data modeling, lineage, and auditability with dbt + Spark and OpenMetadata, meeting SEBI standards
- Delivered a semantic layer (Cube.js) for embedded analytics, surfaced through Metabase for BI
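For the reconciliation engine above, "zero tolerance for packet loss" usually implies detecting gaps in message sequence numbers so the missing ranges can be re-requested. The sketch below shows that one step under stated assumptions: it assumes each feed message carries a monotonically increasing integer sequence number, and `find_gaps` is a hypothetical helper, not the production code.

```python
from typing import Iterable, List, Tuple


def find_gaps(sequence_numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges of missing sequence numbers.

    Duplicates and out-of-order arrival are tolerated: the input is
    de-duplicated and sorted before the scan.
    """
    gaps = []
    expected = None
    for seq in sorted(set(sequence_numbers)):
        if expected is not None and seq > expected:
            gaps.append((expected, seq - 1))
        expected = seq + 1
    return gaps


if __name__ == "__main__":
    received = [1, 2, 3, 7, 8, 8, 10]
    print(find_gaps(received))  # [(4, 6), (9, 9)]
```

In a real-time setting the same check would run incrementally per feed, with the last seen sequence number held in a shared cache so any service instance can resume after a failover.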
Data Products
- Led the build of Piper, a Customer Data Platform (Customer 360) — 50% improvement in customer engagement
- Provided the data substrate for predictive (Prophecy) and operational (MOTIF) analytics platforms
Apache Iceberg · AWS Glue · Amazon S3 · dbt · DLT · Airflow · Cube.js · OpenMetadata · ElastiCache · Metabase