Lead Software Engineer with 20+ years of experience architecting and implementing large-scale data solutions across cloud and on-premise environments. Proven expertise in developing high-performance distributed data pipelines and modernizing legacy systems. Successfully migrated systems to AWS Cloud and Snowflake using AWS services, Apache Spark, Python, and AI-powered tooling. Specialized in integrating generative AI and LLMs into data engineering processes to enhance automation and development efficiency.
Overview
7 years of professional experience
1 Certification
Work History
Vice President-Lead Software Engineer
JP Morgan Chase & Company
Wilmington
05.2023 - Current
Led a strategic Hadoop-to-AWS cloud migration, redesigning legacy pipelines into scalable, cost-efficient Spark-Java data pipelines on EKS and AWS Glue with Apache Iceberg.
Spearheaded Teradata to Snowflake data warehouse modernization, enabling high-performance analytics with native Snowflake capabilities.
Designed and developed data models, transitioned complex ETL workflows, and improved cost and query performance on Snowflake.
Refactored Spark-Java AWS ingestion, transformation, and provisioning data pipelines to PySpark, reducing data latency by 45% (illustrative PySpark sketch below).
Implemented optimized CI/CD pipelines for automated deployment, increasing release frequency for data ingestion and processing workflows.
Managed a team of 10 software engineers, mentoring junior developers and promoting a culture of technical excellence.
Partnered with cross-functional teams (Data Owners, Operations and Support, DevOps, Data Scientists, Business Analysts, and Gen AI/LLM teams) to identify data requirements and deliver innovative AI-driven data solutions.
Partnered with business teams to roll out AI-assisted reporting tools, laying the groundwork for Gen AI integration.
Incorporated AI into data engineering workflows, utilizing GitHub Copilot and Codeium for accelerated development of data wrangling and transformation logic.
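Below is a minimal PySpark sketch of the ingestion-transformation-provisioning pattern referenced in this role. It is illustrative only: the S3 paths, column names, and transformation steps are assumptions, not the production pipeline.

```python
# Illustrative PySpark sketch (assumed paths and columns, not production code).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("ingest-transform-provision")
    .getOrCreate()
)

# Ingest: read raw landing-zone files from S3 (hypothetical bucket/path).
raw = spark.read.parquet("s3://example-landing-zone/transactions/")

# Transform: basic cleansing and derivation, analogous to the Spark-Java
# logic refactored into PySpark.
curated = (
    raw.filter(F.col("amount").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["transaction_id"])
)

# Provision: write partitioned output for downstream consumers.
(curated.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-curated-zone/transactions/"))
```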
Senior Data Engineer
Cardlytics Inc.
Atlanta
12.2021 - 05.2023
Engineered an event-driven data quality validation framework for executing pre-validations and preparing various client data sets to ensure compliance with Cardlytics' data specifications.
Engineered data migration jobs for transferring historical and incremental data from a MySQL database to Redshift utilizing the AWS DMS service.
Spearheaded the design and implementation of logical and physical data models for diverse data sources on Snowflake and Redshift.
Developed a metadata collection framework for data ingestion, data quality validations, data normalization, and identification, storing collected data across event types and brands in Snowflake tables.
Executed data onboarding and implementation activities for new clients, including preparation and implementation of ingestion mapping, data normalization, identification, ETL, and profile load mapping for various POS, loyalty, email, and online customer datasets.
Designed daily, weekly, monthly, and annual data pipelines for diverse customer feeds sourced from FinTech data providers and market data aggregators.
Created Snowpipes for automated data loading from AWS S3 lake into Snowflake stage tables. Generated SQL transformers for pre-validated data, and stored transformed data in multiple fact and dimension tables in Snowflake.
Developed data copy pipelines utilizing stored procedures in JavaScript and Python for unloading data from Snowflake to AWS S3 for enhanced data processing.
Engineered an automated ETL process utilizing Snowpark Python APIs for a seamless Snowflake connection and efficient data computations, transformations, and data transfer into the AWS S3 lake (illustrative Snowpark sketch below).
Developed automation scripts for Snowflake data load status, Snowflake DB/Warehouse usage reporting, and Snowpipes monitoring, utilizing Snowpark Python API and Information Schemas/Catalog tables.
Constructed Airflow DAGs for designated data pipelines with AWS Managed Workflow for Airflow (MWAA).
Executed code deployment via the CI/CD pipeline using GitHub, Jenkins, and the Serverless framework in both UAT and Production environments.
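A minimal Snowpark Python sketch of the Snowflake-to-S3 unload pattern mentioned above. The connection parameters, table, columns, and stage name are assumed placeholders rather than the actual environment.

```python
# Illustrative Snowpark sketch (placeholder credentials, table, and stage names).
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

connection_parameters = {
    "account": "example_account",    # assumed placeholder values
    "user": "example_user",
    "password": "********",
    "warehouse": "ANALYTICS_WH",
    "database": "ANALYTICS_DB",
    "schema": "PUBLIC",
}

session = Session.builder.configs(connection_parameters).create()

# Read a pre-validated table and apply a lightweight transformation...
events = (
    session.table("CLIENT_EVENTS")
           .filter(col("IS_VALID") == True)
           .select("EVENT_ID", "BRAND", "EVENT_TYPE", "EVENT_TS", "AMOUNT")
)

# ...then unload the result to an external stage backed by the S3 data lake.
events.write.copy_into_location(
    "@S3_LAKE_STAGE/client_events/",
    file_format_type="parquet",
    header=True,
    overwrite=True,
)

session.close()
```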
Senior Data Engineer
Capital One
Plano
06.2019 - 10.2021
Transformed legacy ETL workflows into Databricks while constructing a unified data management platform on Delta Lake, addressing legacy data lake challenges such as efficient ACID transaction handling and integrated data versioning for simplified rollbacks.
Architected and implemented centralized, enterprise-level cloud infrastructure monitoring, alerting, compliance reporting, and dashboard solutions for all resources and applications hosted on AWS Cloud.
Acquired comprehensive raw data metrics from multiple API endpoints, including CloudWatch and Datadog, parsed and transformed API response data to derive actionable analytics, created data marts using Snowflake for efficient data storage and analysis, and loaded the enriched analytical data into S3 Lake for downstream usage.
Extracted job scheduling configuration and execution data metrics from various Oracle tables operating on AWS cloud, prepared and transformed data using Cloud ETL tool DMXpress and Python, and stored enriched data in Snowflake for advanced reporting and data visualization in ThoughtSpot.
Engineered Kafka consumer applications in Python to process PagerDuty event data from all partitions of Kafka topics (illustrative consumer sketch below).
Formulated complex SnowSQL queries to analyze data and produce analytics for interactive dashboards in ThoughtSpot.
Rehydrated EC2 and EMR servers with the latest AMI every 45 days using CloudFormation templates and Terraform in Jenkins, provisioned new servers through user scripts, updated crontab jobs, and decommissioned old instances following graceful shutdown procedures.
Coordinated multiple data pipelines and tasks by establishing intricate job and workflow dependencies using the open-source scheduler Arow, subsequently migrated to Apache Airflow.
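A minimal kafka-python sketch of the PagerDuty event consumer described above; the topic, broker, and group names are assumptions for illustration.

```python
# Illustrative Kafka consumer sketch (assumed topic, brokers, and group id).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "pagerduty-events",                       # assumed topic name
    bootstrap_servers=["broker1:9092"],       # assumed broker list
    group_id="pagerduty-event-processor",
    auto_offset_reset="earliest",
    enable_auto_commit=True,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# A single group member reads from all partitions assigned to it; adding
# members to the group spreads partitions across consumers automatically.
for message in consumer:
    event = message.value
    print(
        f"partition={message.partition} offset={message.offset} "
        f"incident={event.get('incident_id')} status={event.get('status')}"
    )
```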
Senior Data Engineer
Capital One
Plano
06.2018 - 05.2019
Executed the migration of on-premises HDFS datasets to the cloud utilizing Snowball, and created automation scripts for monitoring and reporting the data migration status.
Executed the migration of Hive tables into EMR Hive via DMXpress ETL extraction and enhanced data processing using PySpark for final storage in Snowflake and the S3 lake.
Executed migration of warehouse data from Teradata and operational data from Oracle into the Snowflake cloud platform.
Executed migration of on-premises data center to cloud using External File Gateway (EFG) for diverse external vendor inbound and outbound file transmissions.
Executed data quality checks, tokenization, encryption, and transformation on files transmitted via Cloud Gateway (EFG) while loading enriched data into Snowflake.
Engineered automation scripts in Python for monitoring daily AWS ingestion jobs, Snowflake load status, and post-load data validations (illustrative monitoring sketch below).
Executed rehydration of EMR clusters and EC2 instances for Financial Services applications every 45 days in alignment with the organization's infrastructure security policies and guidelines.
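A minimal sketch of the daily Snowflake load-status check described above, using the Snowflake Python connector and the INFORMATION_SCHEMA COPY_HISTORY table function. The account, target table, and alerting step are assumed placeholders.

```python
# Illustrative Snowflake load-status monitoring sketch (placeholder values).
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # assumed placeholder credentials
    user="example_user",
    password="********",
    warehouse="MONITORING_WH",
    database="ANALYTICS_DB",
    schema="PUBLIC",
)

query = """
    SELECT file_name, status, row_count, row_parsed, first_error_message
    FROM TABLE(information_schema.copy_history(
        table_name => 'DAILY_INGEST',
        start_time => DATEADD(hours, -24, CURRENT_TIMESTAMP())
    ))
"""

cur = conn.cursor()
failures = []
for file_name, status, rows, parsed, error in cur.execute(query):
    if status != "Loaded" or rows != parsed:
        failures.append((file_name, status, error))

# The production job fed these results into reporting/alerting; printed here.
for file_name, status, error in failures:
    print(f"LOAD ISSUE: {file_name} status={status} error={error}")

cur.close()
conn.close()
```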
Education
Bachelor of Technology
Bharathidasan University
Trichy, Tamil Nadu, India
04.2002
Skills
Agentic AI solutions
Generative AI and LLM engineering
Data engineering and architecture
Cloud migration and modeling
Cloud solution architecture
ETL processes and CI/CD
Data analytics and quality assurance
Technical leadership and problem-solving
Performance monitoring
Certification
AI Masterclass Completion from Freedom with AI
Machine Learning Level 1 from Super Data Science
Learning certificates from Coursiv covering AI tools such as ChatGPT, Jasper AI, and MidJourney
Recent GEN AI & LLM Experiences
Generative AI and LLM Experimentation (2024 - Present)
Self-directed learning and implementation across cutting-edge AI tools
Connected LLMs to collaborate using the fundamentals of Agentic AI, such as tools, structured outputs, and memory.
Built robust and repeatable agentic solutions with LangGraph.
Created autonomous agentic applications with CrewAI, including agents that write and execute code.
Rapidly built agentic products with the OpenAI Agents SDK.
Pioneered at the forefront of Agentic AI with AutoGen AgentChat and AutoGen Core.
Architected agentic solutions with proven, best-practice design patterns.
Unlocked the capabilities of open-source tools and resources enabled by Anthropic's Model Context Protocol (MCP).
Experimented with over 20 foundational models, including GPT-4, Claude, LLaMA, DeepSeek, Mistral, and Gemini.
Implemented state-of-the-art techniques such as Retrieval-Augmented Generation (RAG), QLoRA fine-tuning, and agents (illustrative RAG sketch below).
Developed proficiency with platforms like HuggingFace, LangChain, and Gradio.
Built advanced generative AI products, including a multi-modal customer support agent, using cutting-edge models and frameworks.
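A minimal Retrieval-Augmented Generation (RAG) sketch representative of the experimentation listed above. The embedding model, chat model, and documents are assumptions chosen for illustration; the actual work also used frameworks such as HuggingFace, LangChain, and Gradio.

```python
# Illustrative RAG sketch (assumed models and toy documents).
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

documents = [
    "Snowpipe loads files from a stage into Snowflake as they arrive.",
    "Apache Iceberg adds ACID table semantics on top of data lake storage.",
    "AWS MWAA is a managed Apache Airflow service.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (cosine similarity)."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q_vec
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer(question: str) -> str:
    """Ground the model's answer in the retrieved context."""
    context = "\n".join(retrieve(question))
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What does Snowpipe do?"))
```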