TechXSherpa Scales an Email Sync Service to Thousands of Inboxes Using a Kafka-Powered Architecture

Every engineering team eventually faces that one project that forces you to rethink scale, throughput, resilience and observability — all at once.

For us at TechXSherpa, that challenge came in the form of building an Email Sync Platform — enabling our client’s users to connect their email accounts (like Gmail or Outlook) and have the messages and attachments automatically flow into internal Sendbird-powered chat groups. The goal: bring traditional email into the center of team collaboration, ensuring important conversations surface where work actually happens.

Sounds simple on paper. But behind the scenes, it’s a high-throughput, multi-tenant, fault-tolerant, extensible data pipeline.

From Idea to Architecture

Every great system begins as a simple idea — until scale and reliability start asking tough questions.

Our mission: “Let any platform user securely connect their email account and see their emails flow into the right place — intelligently and reliably.”

While Gmail (the first provider we integrated) offers push notifications, our internal architectural priorities led us to design a controlled pull-based sync mechanism. It gave us flexibility in scheduling, batching, and error handling — important levers for a system meant to scale horizontally across thousands of inboxes.

With such a scale of inboxes, unpredictable email volumes, and multiple external dependencies, we knew from day one this couldn’t be a monolith. Different concerns — authentication, message retrieval, processing, and dispatching — needed to evolve and scale independently.

So, we split the architecture into specialized microservices, each owning a distinct stage of the data pipeline.

The Services and Their Roles

What emerged was not just a set of services, but a coordinated pipeline — each stage responsible for transforming, enriching, or routing data in its own right.

01

OAuth Service — Securely connecting user inboxes to the platform

(Before any syncing can happen, the platform needs a secure, compliant way for users to connect their email accounts. That’s where the OAuth Service comes in — it handles all authentication flows and ensures we store and refresh tokens safely)

  • Manages user authentication and secure token storage using OAuth 2.0.
  • Handles token refresh transparently.
  • Uses a pluggable, Factory-based design — adding a new provider (e.g., Outlook) only requires implementing a new provider class.
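The pluggable design can be sketched as a minimal factory. This is an illustrative sketch, not the actual service code — the `MailProvider` interface, its methods, and `MailProviderFactory` are hypothetical names:

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical contract every mail provider implements.
interface MailProvider {
    String authorizeUrl(String userId);      // where to send the user for consent
    String refreshToken(String refreshToken); // exchange a refresh token for a new access token
}

class GmailProvider implements MailProvider {
    public String authorizeUrl(String userId) {
        return "https://accounts.google.com/o/oauth2/v2/auth?state=" + userId;
    }
    public String refreshToken(String refreshToken) {
        // A real implementation would call Google's token endpoint here.
        return "new-gmail-access-token";
    }
}

// Factory: adding Outlook later means registering one more supplier,
// with no changes to the existing flows.
class MailProviderFactory {
    private static final Map<String, Supplier<MailProvider>> REGISTRY =
            Map.of("gmail", GmailProvider::new);

    static MailProvider forName(String provider) {
        Supplier<MailProvider> s = REGISTRY.get(provider.toLowerCase());
        if (s == null) throw new IllegalArgumentException("Unknown provider: " + provider);
        return s.get();
    }
}
```

The key point is that callers depend only on the `MailProvider` contract; provider-specific quirks stay inside each class.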
02

Scheduler / Sync Orchestrator — Deciding when to sync

(Once users connect their inboxes, the platform needs a distributed, reliable way to trigger sync attempts for every active account. The Scheduler is a lightweight orchestrator that kicks off the pipeline)

  • Spring Boot service triggered every few minutes.
  • Scans all connected accounts (via the OAuth Service) and pushes sync requests to the Kafka topic mail-sync-requests.
  • Fully stateless — horizontally scalable.
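Conceptually, the Scheduler's job fits in a few lines. In this simplified sketch a `BlockingQueue` stands in for the mail-sync-requests Kafka topic and a hard-coded account list stands in for the OAuth Service lookup; class and method names are illustrative:

```java
import java.util.List;
import java.util.concurrent.*;

// Simplified stand-in for the Scheduler: an in-memory queue plays the
// role of the mail-sync-requests Kafka topic.
class SyncScheduler {
    private final BlockingQueue<String> mailSyncRequests = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    // One stateless tick: scan the connected accounts and enqueue one
    // sync request per account, keyed by userId.
    void tick(List<String> connectedAccounts) {
        connectedAccounts.forEach(mailSyncRequests::add);
    }

    void start(List<String> connectedAccounts, long periodMinutes) {
        timer.scheduleAtFixedRate(() -> tick(connectedAccounts),
                0, periodMinutes, TimeUnit.MINUTES);
    }

    BlockingQueue<String> queue() { return mailSyncRequests; }

    void stop() { timer.shutdownNow(); }
}
```

Because each tick only reads the account list and publishes, any number of scheduler replicas can run the same logic, which is what makes the real service horizontally scalable.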
03

Mail Sync Worker — Fetching and processing emails

(The workhorse of the pipeline: it consumes sync requests, pulls new messages from the provider, and turns them into normalized payloads ready for dispatch)

  • Consumes sync requests from mail-sync-requests, refreshing OAuth tokens when needed.
  • Fetches incremental messages, filters relevant emails, and uploads attachments to internal file storage when required.
  • Publishes normalized email payloads to the processed-emails topic, keyed by userId.
04

Dispatcher Service — Delivering messages to the right chat system

(The final stage: convert the processed email into a chat message and deliver it to the target communication platform)

  • Consumes processed emails from the processed-emails topic and formats them for chat.
  • Publishes messages and attachments into the mapped Sendbird channels that the client platform uses for its chat-based workflows.
  • Uses ExecutorService + CompletableFuture for controlled parallel fetching.
  • Built with the Factory + Strategy patterns — adding Slack, Teams, or any future chat platform requires only a new “Dispatcher Strategy” class.
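The Strategy side of the dispatcher can be sketched as follows — interface and class names here are hypothetical, and the Sendbird call is stubbed out:

```java
import java.util.Map;

// Hypothetical strategy contract: one implementation per chat platform.
interface DispatcherStrategy {
    String dispatch(String channelId, String messageBody);
}

class SendbirdDispatcher implements DispatcherStrategy {
    public String dispatch(String channelId, String body) {
        // A real implementation would call Sendbird's platform API here.
        return "sendbird:" + channelId + ":" + body;
    }
}

// Adding Slack or Teams means registering one new class here;
// existing strategies stay untouched.
class DispatcherFactory {
    private static final Map<String, DispatcherStrategy> STRATEGIES =
            Map.of("sendbird", new SendbirdDispatcher());

    static DispatcherStrategy forPlatform(String platform) {
        DispatcherStrategy d = STRATEGIES.get(platform);
        if (d == null) throw new IllegalArgumentException("No dispatcher for " + platform);
        return d;
    }
}
```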

All services share common DNA:

  • Stateless
  • Containerized
  • Horizontally scalable
  • Independently deployable
  • Leverage rate-limiting, retry, and circuit-breaker patterns

The End-to-End Pipeline (Simplified Flow)


      [ Scheduler (every X minutes) ]
        ↓
      [ Kafka Topic#1: mail-sync-requests ]
        → key = userId
        ↓
      [ Mail Sync Worker (Kafka Consumer + Producer) ]
        → Refresh OAuth token if needed
        → Fetch incremental messages
        → Filter relevant emails
        → For each email:
               a. Fetch full content
               b. Upload attachments to internal file storage when required
        → Publish to Topic#2: processed-emails
               key = userId
        ↓
      [ Dispatcher Service ]
        → Transform into chat-friendly format
        → Send to target chat platform (e.g., Sendbird)

Why Kafka? The Nervous System of our Platform

As the architecture took shape, one thing became obvious early on:
we needed a backbone strong enough to handle massive concurrency, asynchronous workflows, and reliable delivery across independently scaling services.

That backbone was Kafka.

Kafka sits at the heart of this system, acting as the central nervous system connecting all microservices. It ensures that each stage of the pipeline can operate at its own speed, scale independently, and recover gracefully when external systems slow down or fail.

It enables:

  1. Fan-out:

    Multiple consumers can process the same stream for different use cases.

  2. Partitioning for parallelism:

    Each user (partition key) gets isolated processing, preserving order within their own stream.

  3. Replay and recovery:

    Any topic can be replayed for auditing, backfills, or debugging.

  4. Decoupling:

    Producers and consumers scale independently.
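Point 2 is worth making concrete. Kafka's default partitioner hashes the record key (using murmur2) modulo the partition count, so every event for a given userId lands on the same partition and is consumed in order. The sketch below illustrates the idea with a plain `hashCode` as a simplification:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitioningDemo {
    // Simplified: Kafka's default partitioner uses murmur2 over the key
    // bytes; plain hashCode is used here just to show the mechanism.
    static int partitionFor(String key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions);
    }

    // Group events by the partition their key maps to. Within each
    // partition, the original (per-user) order is preserved.
    static Map<Integer, List<String>> route(List<String> userIds, int numPartitions) {
        return userIds.stream()
                .collect(Collectors.groupingBy(u -> partitionFor(u, numPartitions)));
    }
}
```

Because the mapping is deterministic, two sync events for the same user can never be processed out of order by two different consumers.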

To support the full workflow, we designed three core topics:

  • mail-sync-requests — the Scheduler publishes user-level sync triggers.
  • processed-emails — the Mail Worker publishes fully processed, normalized email payloads.
  • DLQs (mail-sync-dlt, mail-dispatcher-dlt) — capture failed events for safe reprocessing without blocking the main flow.

Designing for Scale and Resilience

Building the pipeline was the easy part. Making our solution resilient under real-world load — that’s where the engineering really began.

Scalability & Efficiency

The system had to cope with:

  • Thousands of mailboxes
  • Very high email volumes per mailbox
  • Multiple email and chat providers

To manage that scale:

  • Each microservice is stateless and horizontally scalable.
  • Kafka partitioning ensures parallelism across users while keeping each user’s inbox ordered — the best of both worlds.
  • CompletableFuture and ExecutorService provide controlled concurrency.
  • Smart Gmail queries with filters drastically reduce API calls.
  • Batch processing and caching minimize redundant requests.
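The controlled-concurrency point can be sketched with plain ExecutorService + CompletableFuture. Here `fetchContent` is a hypothetical stand-in for the real per-message Gmail fetch:

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.stream.Collectors;

class ParallelFetch {
    // Stand-in for the real Gmail content fetch.
    static String fetchContent(String messageId) {
        return "body-of-" + messageId;
    }

    // Fan the fetches out onto a bounded pool so one large inbox cannot
    // exhaust threads, then join the results in the original order.
    static List<String> fetchAll(List<String> ids, int maxConcurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrency);
        try {
            List<CompletableFuture<String>> futures = ids.stream()
                    .map(id -> CompletableFuture.supplyAsync(() -> fetchContent(id), pool))
                    .collect(Collectors.toList());
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

The bounded pool is the important design choice: concurrency is capped per worker instance, so throughput scales by adding instances rather than by letting one instance spawn unbounded threads.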

Resilience & Recoverability

Email APIs fail, tokens expire, and rate limits hit unexpectedly — we designed the system assuming failures are the norm, not the exception:

  • Retry & backoff for transient failures.

  • Rate limiting and adaptive throttling at runtime.

  • DLQs (Dead Letter Queues) for safe replay of failed messages — ensuring idempotency on reprocessing.

  • No single point of failure; each service can be scaled or restarted independently.
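The retry-and-backoff bullet looks roughly like this in plain Java. This is a minimal sketch — in production the system uses Resilience4j, which adds jitter, circuit breaking, and metrics on top of this basic loop:

```java
import java.util.concurrent.Callable;

class RetryWithBackoff {
    // Retry a task up to maxAttempts times, doubling the delay between
    // attempts (1x, 2x, 4x the base delay, ...). Rethrows the last
    // failure if all attempts are exhausted.
    static <T> T call(Callable<T> task, int maxAttempts, long baseDelayMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(baseDelayMillis * (1L << (attempt - 1)));
                }
            }
        }
        throw last;
    }
}
```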

Security & Data Integrity

Security was not an afterthought — it was a design constraint from day one. User trust is paramount when dealing with inboxes:

  • OAuth 2.0 flow with secure token encryption.

  • Secrets stored in managed vaults.

  • Deduplication logic prevents reprocessing of the same message.

  • Controlled data retention and cleanup policies.

Observability & Debuggability

We built deep visibility into every moving part:

  • CloudWatch metrics for service health and performance.

  • Rollbar for real-time error tracking.

  • Kibana for detailed logs and investigation.

  • SNS notifications for operational alerts.

  • Traceable message IDs across Kafka topics for full auditability.

  • Kafka UI for monitoring topic health, consumer lag, and message flow in real time.


Technology Stack

None of this would be possible without the right tools. The technologies we chose — and how we used them — made all the difference between a working system and a truly production-grade platform.

Backend (Microservices)

  • Java 21 + Spring Boot 3
  • Resilience4j (retries, rate-limiting, circuit-breaker)
  • ExecutorService + CompletableFuture for controlled concurrency
  • Kafka (AWS MSK) as the event backbone

Data & Storage

  • PostgreSQL for metadata, tokens, and sync state
  • Amazon S3 for attachment storage
  • Kafka DLQ topics for observability and recovery

Frontend & Integrations

  • ReactJS for the user-facing experience
  • Sendbird (and adapters via Strategy Pattern)
  • Gmail API + OAuth 2.0 for secure mailbox access

Infrastructure & DevOps

  • Docker + Kubernetes (EKS) for deployment and scaling
  • AWS core services: S3, RDS, Secrets Manager, VPC, API Gateway
  • Kafka UI & Kibana for observability
  • Rollbar for real-time error monitoring

Challenges Along the Way

No system that touches thousands of inboxes survives contact with reality unchanged — and ours was no exception.

As we moved from design to real-world execution, a set of challenges began shaping the architecture in important ways:

1. Multi-Provider Complexity

Integrating multiple email providers (and future chat providers) meant dealing with different:

  • Authentication flows
  • Data models
  • Quirks in APIs

This pushed us to enforce a strict extension-first architecture — new providers should plug in without modifying existing flows. That decision influenced our factory patterns, strategy patterns, and contract-based service boundaries.

2. Wildly Varied Email Formats

Emails in the wild aren’t predictable:

  • Different MIME structures
  • Nested attachments
  • Large inline images
  • Forwarded chains that look different across clients

Our parsing and attachment-handling logic had to get significantly smarter as these real-world payloads surfaced.

3. Rate Limits & API Behavior

Large-scale syncing with Gmail quickly taught us:

  • Rate limits can spike without warning
  • Token refreshes sometimes fail silently
  • Latency varies widely

This drove our investment in Resilience4j — retries, circuit breakers, and adaptive throttling became essential rather than optional.

4. Idempotency at Scale

With DLQs, replays, and multiple consumers, one rule became sacred:

Reprocessing must never cause duplicates or inconsistencies.

Designing idempotent pipelines (both in email fetch and in chat dispatch) required strict message-keying and controlled state transitions.
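The core of an idempotent consumer is a "have I seen this key before?" check ahead of any side effect. In this sketch a concurrent in-memory set stands in for the durable store (in production the seen-keys state would live in PostgreSQL or a cache, keyed by userId and messageId):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentConsumer {
    // In-memory stand-in for a durable (userId, messageId) seen-store.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private int dispatchCount = 0;

    // Returns true only on the first delivery of a given message key,
    // so DLQ replays and duplicate polls never dispatch twice.
    boolean process(String userId, String messageId) {
        if (!processed.add(userId + ":" + messageId)) {
            return false; // duplicate: already handled, skip side effects
        }
        dispatchCount++; // the real work (dispatch to chat) goes here
        return true;
    }

    int dispatched() { return dispatchCount; }
}
```

Set.add returning false on a duplicate is what makes the check-and-mark step atomic, which matters when multiple consumer threads may see the same replayed event.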


These challenges weren’t setbacks — they were architectural forcing functions. They pushed us to build a pipeline that is significantly more resilient, predictable, and future-ready. Today, adding new providers or integrations is no longer a redesign — it’s just an extension.

What’s Next

The next phase is where intelligence begins. From summarizing threads to detecting intent to automatically routing information — AI will turn email from a passive stream into an active signal.

Closing Thoughts

Building this platform was a masterclass in system design — where distributed processing, fault tolerance, and extensibility meet.

Kafka gave us the spine. Spring Boot and Resilience4j gave us the muscle. Patterns and clean design gave us longevity.

This project reminded us that scalability isn’t something you buy — it’s something you design.
