Available for new opportunities

Spoorthi Basu

Software Engineer · Distributed Systems · Real-Time Data Infrastructure

I build distributed systems that move data at scale. Real-time pipelines, high-throughput APIs, and the infrastructure that keeps them reliable.

InfoQ Published Author · Apache Flink CDC Contributor · 100M+ Events / Day


Indianapolis, IN · Open to relocation

What I Build

Real-time Stream Processing

Apache Kafka + Flink pipelines ingesting hundreds of millions of events daily into schema-driven, analytics-ready Iceberg tables on S3.

Kafka · Flink · Iceberg · Avro

Distributed Systems

Fault-tolerant, multi-region Java microservices built on circuit breakers and event partitioning, sustaining 99.99% uptime under production workloads.

Java · Resilience4j · Multi-region
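The circuit-breaker pattern behind these services can be illustrated with a small hand-rolled state machine. This is a minimal sketch, not the Resilience4j API and not the production code; the class name `SimpleCircuitBreaker`, its thresholds, and the caller-supplied clock are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical minimal circuit breaker (NOT the Resilience4j API).
// CLOSED: calls pass through and consecutive failures are counted.
// OPEN: calls fail fast to the fallback until the cool-down elapses.
// HALF_OPEN: one trial call decides whether to close or re-open.
public class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    public SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // nowMillis is passed in explicitly so the behavior is deterministic in tests.
    public synchronized <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < openMillis) {
                return fallback.get();       // still cooling down: fail fast
            }
            state = State.HALF_OPEN;         // cool-down over: allow one trial call
        }
        try {
            T result = action.get();
            state = State.CLOSED;            // any success closes the breaker
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            // A failed trial, or too many failures in a row, (re)opens the breaker.
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = nowMillis;
            }
            return fallback.get();
        }
    }

    public synchronized State state() { return state; }

    public static void main(String[] args) {
        SimpleCircuitBreaker cb = new SimpleCircuitBreaker(3, 1000);
        Supplier<String> failing = () -> { throw new RuntimeException("downstream error"); };
        for (int i = 0; i < 3; i++) cb.call(failing, () -> "fallback", 0);
        System.out.println(cb.state());                                   // OPEN
        System.out.println(cb.call(() -> "ok", () -> "fallback", 500));   // fallback
        System.out.println(cb.call(() -> "ok", () -> "fallback", 1500));  // ok
        System.out.println(cb.state());                                   // CLOSED
    }
}
```

Production libraries such as Resilience4j build on the same CLOSED/OPEN/HALF_OPEN idea but add sliding-window failure rates, metrics, and configurable half-open behavior.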

High-Scale API Engineering

Spring Boot REST APIs handling 5K+ RPS with Redis cache sharding that cuts database load by 50%, built for reliability under sustained traffic.

Spring Boot · Redis · REST
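The card does not say how keys are sharded across Redis nodes. One common scheme, assumed here for illustration only, is Redis Cluster's hash-slot routing: CRC16 (XMODEM) of the key modulo 16384, with `{...}` hash tags keeping related keys on the same slot. The class `CacheShardRouter` and its even, contiguous slot-to-shard split are hypothetical simplifications.

```java
import java.nio.charset.StandardCharsets;

// Sketch of Redis Cluster-style key routing (an assumption, not the production design).
public class CacheShardRouter {
    static final int SLOTS = 16384;

    // CRC-16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection.
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Hash tags: if the key contains a non-empty "{...}", only that part is hashed,
    // so "{user:42}:profile" and "{user:42}:orders" land in the same slot.
    static int keySlot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOTS;
    }

    // Hypothetical: split the slot space evenly into contiguous ranges, one per shard.
    static int shardFor(String key, int shardCount) {
        return (int) ((long) keySlot(key) * shardCount / SLOTS);
    }

    public static void main(String[] args) {
        System.out.println(keySlot("123456789"));  // 12739 (CRC16 check value 0x31C3)
        System.out.println(shardFor("{user:42}:profile", 3) == shardFor("{user:42}:orders", 3)); // true
    }
}
```

Routing on a stable hash means every API instance agrees on which shard owns a key without coordination, which is what lets cache reads spread evenly and take load off the database.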

Data Infrastructure

End-to-end data platforms: event modeling, schema consolidation, download APIs, and analytics infrastructure that scales to hundreds of millions of events per day.

AWS S3 · DynamoDB · Schema Design

About Me

I'm a Software Engineer with 5+ years building the infrastructure that makes data move: reliably, at scale, and in real time. At Genesys, I've engineered systems processing hundreds of millions of events per day, APIs that handle 5K+ requests per second, and testing infrastructure that went from zero to production-grade in weeks.

My work spans the full data path, from Kafka topics and Flink jobs to Iceberg tables and the REST APIs that serve data to end users. I care deeply about correctness, fault tolerance, and the operational rigor that makes systems boring, which is exactly what production needs.

I recently built the company's first end-to-end data validation framework, now adopted across three teams. I've also published in InfoQ on schema proliferation in Kafka and Flink pipelines, and I'm an active contributor to Apache Flink CDC.

Currently at

Genesys

Software Engineer

Mar 2021 · Present

Core Stack

Java · Apache Kafka · Apache Flink · Apache Iceberg · AWS · Redis · Spring Boot · Docker · Terraform

Work Experience

Software Engineer

Genesys · Cloud Contact Center Platform

Mar 2021 · Present

Built Flink pipelines turning company-wide Kafka streams into schema-driven Iceberg/S3 datasets at hundreds of millions of events/day.
Architected and built the company's first end-to-end data validation framework, catching data issues before they reach customers; adopted by 3 teams.
Scaled fault-tolerant Java/Kafka microservices to 10M+ events/day, cutting latency by 25%.
Engineered customer-facing REST APIs at 5K+ RPS with Redis cache sharding, halving DB load.
Stood up org-wide integration testing 0→1 (LocalStack/AWS), cutting test creation to <1 min for 5+ teams.
Delivered 99.99% uptime via multi-region deployment and circuit breakers (Resilience4j).
Drove 10+ customer-facing features with Product and QA, growing daily active users by 25%.
Mentored a junior engineer, halving onboarding time, and met a 15-minute SLA for P0/P1 incidents.

Software Engineer

Coding Minds, Inc · Ed-Tech Platform

Jul 2020 · Feb 2021

Developed an academic platform (React/Java/Node.js) deployed on Heroku, serving 500+ daily active users.
Built RESTful APIs (Java/Spring) with MySQL CRUD operations, achieving 95% test coverage.
Led the full SDLC from requirements to deployment using Agile/Scrum, delivering 3 major releases.
Enhanced React performance via memoization and lazy loading, improving page load speed by 30%.
Implemented automated testing, catching 20+ critical bugs pre-production.

Selected Projects

Featured Project

Kafka Flink Schema Consolidation

Reference implementation for discriminator-based schema consolidation in Kafka and Flink pipelines. Collapses twelve schemas into one consolidated Avro record, enabling single filtered queries over Apache Iceberg on S3. Companion to the InfoQ article on schema proliferation.

Apache Flink · Apache Kafka · Apache Iceberg · Avro · Java


ConsolidatedRide.avsc

// Discriminator-based union field
{
  "name": "standardRideAttributes",
  "type": ["null", {
    "type": "record",
    "name": "StandardRideAttributes",
    "fields": [
      { "name": "vehicleClass",
        "type": "string" },
      { "name": "surgeMultiplier",
        "type": "double" }
    ]
  }],
  "default": null
}
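The consolidation idea (one record type, a discriminator, and one nullable attribute block per variant) can be sketched in plain Java. This is an illustration under assumptions, not the reference implementation: the `rideType` discriminator, the `PoolRideAttributes` second variant, and the class names are hypothetical; only `StandardRideAttributes` and its two fields come from the schema excerpt above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ConsolidatedRideDemo {
    // Hypothetical discriminator values; the real project's variants may differ.
    enum RideType { STANDARD, POOL }

    record StandardRideAttributes(String vehicleClass, double surgeMultiplier) {}
    record PoolRideAttributes(int seatsBooked) {}  // hypothetical second variant

    // One consolidated shape: shared fields, a discriminator, and one nullable block per variant.
    record ConsolidatedRide(String rideId, RideType rideType,
                            StandardRideAttributes standardRideAttributes,
                            PoolRideAttributes poolRideAttributes) {
        ConsolidatedRide {
            // Invariant: exactly the block matching the discriminator is populated.
            boolean ok = switch (rideType) {
                case STANDARD -> standardRideAttributes != null && poolRideAttributes == null;
                case POOL -> poolRideAttributes != null && standardRideAttributes == null;
            };
            if (!ok) throw new IllegalArgumentException("attributes do not match rideType " + rideType);
        }
    }

    // A single filtered query over one stream of one type, instead of one stream per schema.
    static List<ConsolidatedRide> ofType(List<ConsolidatedRide> rides, RideType type) {
        return rides.stream().filter(r -> r.rideType() == type).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ConsolidatedRide> rides = List.of(
            new ConsolidatedRide("r1", RideType.STANDARD, new StandardRideAttributes("XL", 1.5), null),
            new ConsolidatedRide("r2", RideType.POOL, null, new PoolRideAttributes(2)),
            new ConsolidatedRide("r3", RideType.STANDARD, new StandardRideAttributes("Comfort", 1.0), null));
        System.out.println(ofType(rides, RideType.STANDARD).size());  // 2
    }
}
```

In an Iceberg table the same filter would be a predicate on the discriminator column, so queries that previously spanned several per-schema tables can read one table.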

Kafka Flink Audit Trail

Flink pipeline writing profile change events to an append-only Iceberg table. Historical state reconstructed at query time via SQL window functions.

Flink · Kafka · Iceberg · SQL
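Reconstructing state from an append-only change log amounts to "latest event per key as of a point in time." The sketch below does this in Java, with a comment showing an equivalent window-function query. The `ProfileChange` record and the table and column names (`profile_changes`, `profile_id`, `event_time`) are hypothetical, not taken from the project.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AuditTrailReplay {
    // Hypothetical change-event shape; the real table's columns may differ.
    record ProfileChange(String profileId, long eventTime, String email) {}

    // Java equivalent of:
    //   SELECT profile_id, event_time, email FROM (
    //     SELECT *, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY event_time DESC) AS rn
    //     FROM profile_changes
    //     WHERE event_time <= :as_of
    //   ) t WHERE rn = 1
    static Map<String, ProfileChange> stateAsOf(List<ProfileChange> events, long asOf) {
        Map<String, ProfileChange> latest = new HashMap<>();
        for (ProfileChange e : events) {
            if (e.eventTime() > asOf) continue;  // ignore changes after the point in time
            // Keep the newest event per profile (on a timestamp tie, the later one in the list wins).
            latest.merge(e.profileId(), e, (a, b) -> b.eventTime() >= a.eventTime() ? b : a);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<ProfileChange> events = List.of(
            new ProfileChange("p1", 1, "a@example.com"),
            new ProfileChange("p1", 5, "b@example.com"),
            new ProfileChange("p2", 3, "c@example.com"));
        System.out.println(stateAsOf(events, 4).get("p1").email());   // a@example.com
        System.out.println(stateAsOf(events, 10).get("p1").email());  // b@example.com
    }
}
```

Because the table is never updated in place, any past state can be recomputed by changing only the as-of timestamp.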

Health Web

Clinic website, deployed on AWS, that lets patients find nearby doctors and book appointments. Java backend with MySQL via JDBC.

Java · JSP · MySQL · AWS

Ecommerce Order Processing

Order-processing microservices built with Spring Boot and JPA on PostgreSQL, with Swagger API docs, Docker containerization, and Log4j logging.

Spring Boot · PostgreSQL · Docker

Hair & Skin Segmentation

U-Net-based deep autoencoder for hair and skin segmentation, built with Keras and evaluated on the CelebA dataset using data augmentation and transfer learning.

Keras · U-Net · NumPy

Safe Driving: Collision Prevention

Demo vehicle built on a Renesas microcontroller with ultrasonic sensors. Proximity alerts were implemented in CubeSuite++, and AWS stored the messages sent to nearby vehicles.

Embedded C · Renesas · AWS

Education & Credentials

Education

M.S. in Computer Science

Cal Poly Pomona

2018 – 2020 · GPA 3.66

B.S. in Computer Science

Dr. Ambedkar Institute of Technology

2014 – 2018 · GPA 4.0

Publication

Schema Proliferation in Kafka and Flink Pipelines

InfoQ · 2026

Explores schema proliferation in event-driven systems and presents a discriminator-based consolidation approach for scalable Kafka and Flink pipelines.

#1 Top Article in InfoQ's weekly Round-Up
550K+ monthly readers
Peer-reviewed


Open Source

Apache Flink CDC Contributor

apache/flink-cdc · 2026

Fixed silent data duplication in the Iceberg sink during in-checkpoint schema changes. Also fixed a JobManager OOM on large MySQL CDC tables caused by retained snapshot-split metadata.

Merged, shipping in Flink CDC 3.7
Streaming-data internals
Reviewed by a committer


Professional Activity

Presenter: Keys to Success

Cal Poly Pomona Graduate Student Welcome

Gave a talk to 50+ students on my journey in Computer Science
Addressed the transition from undergraduate to graduate school


Judge, Game Gala 2021

Coding Competition for Gamers

Evaluated 20+ K-12 developers on digital game projects
Provided feedback on code quality and game performance
Participated in selecting the competition winner

