Available for new opportunities

Spoorthi Basu

Software Engineer · Distributed Systems · Data Engineering

I build distributed systems that move data at scale: real-time pipelines, high-throughput APIs, and the infrastructure that keeps them reliable.

Indianapolis, IN · Open to relocation

5+ Years of experience
100M+ Events processed per day
99.99% Uptime delivered
5K+ RPS APIs engineered

What I Build

Real-time Stream Processing

Apache Kafka + Flink pipelines ingesting hundreds of millions of events daily into schema-driven, analytics-ready Iceberg tables on S3.

Kafka · Flink · Iceberg · Avro

Distributed Systems

Fault-tolerant, multi-region Java microservices with circuit breakers, event partitioning, and 99.99% uptime across sustained production workloads.

Java · Resilience4j · Multi-region

High-Scale API Engineering

Spring Boot REST APIs handling 5K+ RPS with Redis cache sharding that cuts database load by 50%, built for reliability under sustained traffic.

Spring Boot · Redis · REST

Data Infrastructure

End-to-end data platforms: event modeling, schema consolidation, download APIs, and analytics infrastructure that scales to petabyte workloads.

AWS S3 · DynamoDB · Schema Design

About Me

I'm a Software Engineer with 5+ years building the infrastructure that makes data move: reliably, at scale, and in real time. At Genesys, I've engineered systems processing hundreds of millions of events per day, APIs that handle 5K+ requests per second, and testing infrastructure that went from zero to production-grade in weeks.

My work spans the full data path: from Kafka topics and Flink jobs to Iceberg tables and REST APIs that serve data to end users. I care deeply about correctness, fault tolerance, and the operational rigor that makes systems boring, which is exactly what production needs.

I recently published in InfoQ on schema proliferation in Kafka and Flink pipelines, and I'm an active contributor to Apache Flink CDC. At Genesys, I've mentored engineers, led cross-functional delivery of 10+ features, and held a 15-minute P0/P1 SLA.

Currently at Genesys
Software Engineer
Mar 2021 – Present
Core Stack
Java · Apache Kafka · Apache Flink · AWS · Redis · Spring Boot · Docker · Terraform

Work Experience

Software Engineer

Genesys · Cloud Contact Center Platform
Mar 2021 – Present
  • Scaled fault-tolerant Java/Kafka microservices to 10M+ events/day via partitioning and rebalancing, cutting latency by 25%.
  • Flattened company-wide Kafka streams via Flink into schema-driven, query-ready S3 datasets processing hundreds of millions of events daily; built metadata-backed download service for secure customer-facing analytics.
  • Engineered 5K+ RPS REST APIs (Spring Boot/Redis) with cache sharding, reducing DB load by 50%.
  • Built an org-wide integration testing framework from scratch (LocalStack/AWS), cutting test creation time to under 1 min for 5+ teams.
  • Delivered 99.99% uptime through multi-region deployment and circuit breakers (Resilience4j) for fault isolation.
  • Collaborated cross-functionally to deliver 10+ features, increasing daily active users by 25%.
  • Mentored junior engineer on distributed systems, halving onboarding time while maintaining 15-min P0/P1 SLA.

Software Engineer

Coding Minds, Inc · Ed-Tech Platform
Jul 2020 – Feb 2021
  • Developed academic system (React/Java/Node.js) deployed on Heroku, serving 500+ daily active users.
  • Built RESTful APIs (Java/Spring) with MySQL CRUD operations, achieving 95% test coverage.
  • Led full SDLC from requirements to deployment using Agile/Scrum, delivering 3 major releases.
  • Enhanced React performance via memoization and lazy loading, improving page load speed by 30%.
  • Implemented automated testing, catching 20+ critical bugs pre-production.

Selected Projects

Featured Project

Kafka Flink Schema Consolidation

Reference implementation for discriminator-based schema consolidation in Kafka and Flink pipelines. Collapses twelve schemas into one consolidated Avro record, enabling single filtered queries over Apache Iceberg on S3. Companion to the InfoQ article on schema proliferation.

Apache Flink · Apache Kafka · Apache Iceberg · Avro · Java
ConsolidatedRide.avsc
// Discriminator-based union field
{
  "name": "standardRideAttributes",
  "type": ["null", {
    "type": "record",
    "name": "StandardRideAttributes",
    "fields": [
      { "name": "vehicleClass",
        "type": "string" },
      { "name": "surgeMultiplier",
        "type": "double" }
    ]
  }],
  "default": null
}
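
To illustrate the idea behind the schema fragment above, here is a minimal Python sketch of discriminator-based consolidation. It is not the project's implementation. Only standardRideAttributes, vehicleClass, and surgeMultiplier come from the fragment; the rideId and rideType fields, the pooledRideAttributes block, and both helper functions are hypothetical names chosen for illustration.

```python
# Maps each discriminator value to the one attribute block it populates.
# Only "standardRideAttributes" appears in the schema fragment; the
# "POOLED" entry is a hypothetical second type added for illustration.
ATTRIBUTE_BLOCKS = {
    "STANDARD": "standardRideAttributes",
    "POOLED": "pooledRideAttributes",
}


def consolidate(ride_id, ride_type, attributes):
    """Wrap a type-specific payload in the single consolidated record shape."""
    if ride_type not in ATTRIBUTE_BLOCKS:
        raise ValueError(f"unknown ride type: {ride_type}")
    record = {"rideId": ride_id, "rideType": ride_type}
    # Every block starts as None, mirroring the ["null", record] unions.
    for block in ATTRIBUTE_BLOCKS.values():
        record[block] = None
    # Only the block selected by the discriminator is filled in.
    record[ATTRIBUTE_BLOCKS[ride_type]] = dict(attributes)
    return record


def filter_by_type(records, ride_type):
    """One filter on the discriminator instead of one query per schema."""
    return [r for r in records if r["rideType"] == ride_type]


records = [
    consolidate("r1", "STANDARD", {"vehicleClass": "sedan", "surgeMultiplier": 1.5}),
    consolidate("r2", "POOLED", {"seatCount": 2}),
]
standard = filter_by_type(records, "STANDARD")
```

Because every record has the same shape, a consumer can filter on the discriminator and then read only the matching attribute block. The other blocks stay null.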

Kafka Flink Audit Trail

Flink pipeline writing profile change events to an append-only Iceberg table. Historical state reconstructed at query time via SQL window functions.

Flink · Kafka · Iceberg · SQL
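
As a rough sketch of query-time reconstruction, the example below replays an append-only change log with a SQL window function. It assumes an in-memory SQLite database (3.25 or newer, for window function support) standing in for the Iceberg table. The profile_changes table, its columns, and the state_as_of helper are hypothetical and not taken from the project.

```python
import sqlite3

# Hypothetical append-only change log; rows are only inserted, never updated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile_changes ("
    "profile_id TEXT, field TEXT, new_value TEXT, changed_at INTEGER)"
)
conn.executemany(
    "INSERT INTO profile_changes VALUES (?, ?, ?, ?)",
    [
        ("p1", "email", "a@old.example", 100),
        ("p1", "city", "Pomona", 110),
        ("p1", "email", "a@new.example", 200),
        ("p2", "email", "b@example.com", 150),
    ],
)


def state_as_of(conn, as_of):
    """Return {(profile_id, field): value} as it stood at time `as_of`."""
    rows = conn.execute(
        """
        SELECT profile_id, field, new_value
        FROM (
            SELECT profile_id, field, new_value,
                   ROW_NUMBER() OVER (
                       PARTITION BY profile_id, field
                       ORDER BY changed_at DESC
                   ) AS rn
            FROM profile_changes
            WHERE changed_at <= ?
        )
        WHERE rn = 1
        """,
        (as_of,),
    ).fetchall()
    return {(pid, field): value for pid, field, value in rows}


snapshot_early = state_as_of(conn, 120)
snapshot_late = state_as_of(conn, 250)
```

The window ranks each profile field's changes from newest to oldest within the cutoff, so keeping rank 1 yields the value in effect at that time without ever mutating stored rows.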

Health Web

Clinic website deployed on AWS, enabling patients to find nearby doctors and book appointments. Java backend with MySQL via JDBC.

Java · JSP · MySQL · AWS

Ecommerce Order Processing

Order microservices using Spring Boot and JPA with PostgreSQL. Documented with Swagger, containerized with Docker, with logging via Log4j.

Spring Boot · PostgreSQL · Docker

Hair & Skin Segmentation

Deep autoencoder based on U-Net for hair/skin segmentation, built with Keras and tested on the CelebA dataset with data augmentation and transfer learning.

Keras · U-Net · NumPy

Safe Driving: Collision Prevention

Demo vehicle built on a Renesas microcontroller with ultrasonic sensors. Proximity alerts developed in CubeSuite++, with AWS storing the messages sent to nearby vehicles.

Embedded C · Renesas · AWS

Education & Credentials

Education

Master's in Computer Science
Cal Poly Pomona
GPA 3.66 · 2018 – 2020
Bachelor's in Computer Science
Dr. Ambedkar Institute of Technology
GPA 4.0 · 2014 – 2018

Publication

Schema Proliferation in Kafka and Flink Pipelines
InfoQ · 2026

Explores schema proliferation in event-driven systems and presents a discriminator-based consolidation approach for scalable Kafka and Flink pipelines.

Read on InfoQ

Open Source

Apache Flink CDC Contribution
apache/flink-cdc · 2026

Fixed duplicate record issues in Iceberg sinks by redesigning checkpoint commit behavior for correct sequencing and delete handling.

View PR #4360

Skills & Technologies

Streaming & Data

Apache Kafka · Apache Flink · Apache Iceberg · Avro · Schema Registry

Languages

Java · Python · JavaScript · SQL

Cloud & Infrastructure

AWS · Docker · Terraform · Kubernetes · CI/CD

Frameworks

Spring Boot · Spring Cloud · React · gRPC

Databases & Caching

DynamoDB · Redis · PostgreSQL · MySQL · S3

Testing & Observability

JUnit 5 · Mockito · Testcontainers · LocalStack · Sumo Logic · New Relic · Prometheus

Professional Activity

Presenter: Keys to Success

Cal Poly Pomona Graduate Student Welcome

  • Gave a talk to 50+ students on my journey in Computer Science
  • Addressed the transition from undergraduate to graduate school

Judge, Game Gala 2021

Coding Competition for Gamers

  • Evaluated 20+ K-12 developers on digital game projects
  • Provided feedback on code quality and game performance
  • Participated in selecting the competition winner