Backend (Microservices)
- Java 21 + Spring Boot 3
- Resilience4j (retries, rate-limiting, circuit-breaker)
- ExecutorService + CompletableFuture for controlled concurrency
- Kafka (AWS MSK) as the event backbone
Every engineering team eventually faces that one project that forces you to rethink scale, throughput, resilience, and observability — all at once.
For us at TechXSherpa, that challenge came in the form of building an Email Sync Platform — enabling our client’s users to connect their email accounts (like Gmail or Outlook) and have the messages and attachments automatically flow into internal Sendbird-powered chat groups. The goal: bring traditional email into the center of team collaboration, ensuring important conversations surface where work actually happens.
Sounds simple on paper. But behind the scenes, it’s a high-throughput, multi-tenant, fault-tolerant, extensible data pipeline.
Every great system begins as a simple idea — until scale and reliability start asking tough questions.
Our mission: “Let any platform user securely connect their email account and see their emails flow into the right place — intelligently and reliably.”
While Gmail (the first provider we integrated) offers push notifications, our internal architectural priorities led us to design a controlled pull-based sync mechanism. It gave us flexibility in scheduling, batching, and error handling — important levers for a system meant to scale horizontally across thousands of inboxes.
With such a scale of inboxes, unpredictable email volumes, and multiple external dependencies, we knew from day one this couldn’t be a monolith. Different concerns — authentication, message retrieval, processing, and dispatching — needed to evolve and scale independently.
So, we split the architecture into specialized microservices, each owning a distinct stage of the data pipeline.
What emerged was not just a set of services, but a coordinated pipeline — each stage responsible for transforming, enriching, or routing data in its own right.
(Before any syncing can happen, the platform needs a secure, compliant way for users to connect their email accounts. That’s where the OAuth Service comes in — it handles all authentication flows and ensures we store and refresh tokens safely)
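As a rough illustration, here is the shape of the refresh-before-use check inside the OAuth Service. It is a minimal sketch: StoredToken, TokenStore, and OAuthClient are hypothetical stand-ins, and the real service additionally encrypts tokens at rest.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical stand-ins for the vault-backed token storage and the
// provider-specific OAuth client.
record StoredToken(String accessToken, String refreshToken, Instant expiresAt) {}
interface TokenStore { StoredToken load(String userId); void save(String userId, StoredToken token); }
interface OAuthClient { StoredToken refresh(String refreshToken); }

public class TokenRefresher {

    private final TokenStore tokenStore;
    private final OAuthClient oauthClient;

    public TokenRefresher(TokenStore tokenStore, OAuthClient oauthClient) {
        this.tokenStore = tokenStore;
        this.oauthClient = oauthClient;
    }

    /** Returns a valid access token, refreshing proactively if it expires soon. */
    public String accessTokenFor(String userId) {
        StoredToken token = tokenStore.load(userId);
        // A safety margin ensures an in-flight API call never races an
        // expiring token.
        if (token.expiresAt().isBefore(Instant.now().plus(Duration.ofMinutes(5)))) {
            token = oauthClient.refresh(token.refreshToken());
            tokenStore.save(userId, token); // persisted encrypted
        }
        return token.accessToken();
    }
}
```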
(Once users connect their inboxes, the platform needs a distributed, reliable way to trigger sync attempts for every active account. The Scheduler is a lightweight orchestrator that kicks off the pipeline)
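A sketch of that trigger, assuming Spring scheduling and Spring Kafka (AccountRepository is an illustrative name): on every interval, publish one sync request per active account, keyed by userId so each user's events land on a single partition.

```java
import java.util.List;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Sketch of the Scheduler: enumerate active accounts and emit one sync
// request per user to Topic#1. Keying by userId preserves per-inbox
// ordering for every downstream consumer.
@Component
public class MailSyncScheduler {

    public interface AccountRepository { List<String> activeUserIds(); }

    private final KafkaTemplate<String, String> kafka;
    private final AccountRepository accounts; // illustrative repository

    public MailSyncScheduler(KafkaTemplate<String, String> kafka, AccountRepository accounts) {
        this.kafka = kafka;
        this.accounts = accounts;
    }

    @Scheduled(fixedDelayString = "${sync.interval-ms:300000}") // "every X minutes"
    public void triggerSyncs() {
        for (String userId : accounts.activeUserIds()) {
            kafka.send("mail-sync-requests", userId, "SYNC");
        }
    }
}
```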
(The final stage: convert the processed email into a chat message and deliver it to the target communication platform)
[ Scheduler (every X minutes) ]
↓
[ Kafka Topic#1: mail-sync-requests ]
→ key = userId
↓
[ Mail Sync Worker (Kafka Consumer + Producer) ]
→ Refresh OAuth token if needed
→ Fetch incremental messages
→ Filter relevant emails
→ For each email:
a. Fetch full content
b. Upload attachments to internal file storage when required
→ Publish to Topic#2: processed-emails
key = userId
↓
[ Dispatcher Service ]
→ Transform into chat-friendly format
→ Send to target chat platform (e.g., Sendbird)

As the architecture took shape, one thing became obvious early on: we needed a backbone strong enough to handle massive concurrency, asynchronous workflows, and reliable delivery across independently scaling services.
Kafka sits at the heart of this system, acting as the central nervous system connecting all microservices. It ensures that each stage of the pipeline can operate at its own speed, scale independently, and recover gracefully when external systems slow down or fail.
It enables:
- Multiple consumers processing the same stream for different use cases.
- Isolated per-user processing: each user (the partition key) keeps ordering within their own stream (see the consumer sketch below).
- Replay of any topic for auditing, backfills, or debugging.
- Independent scaling of producers and consumers.
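To make the per-user ordering concrete, here is a minimal Spring Kafka consumer for the Mail Sync Worker. The listener runs several threads, but because records are keyed by userId, each user's messages arrive in order on a single partition (the handler body is elided):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Sketch of the Mail Sync Worker's entry point. Partitioning by userId
// gives parallelism across users while keeping each user's stream ordered.
@Component
public class MailSyncListener {

    @KafkaListener(
            topics = "mail-sync-requests",
            groupId = "mail-sync-worker",
            concurrency = "6") // up to 6 threads, each owning distinct partitions
    public void onSyncRequest(ConsumerRecord<String, String> record) {
        String userId = record.key(); // partition key = userId
        // refresh tokens, pull incremental messages, publish to processed-emails...
    }
}
```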
Building the pipeline was the easy part. Making our solution resilient under real-world load — that’s where the engineering really began.
The scale we designed for:
- Thousands of mailboxes
- Large, unpredictable email volumes per mailbox
- Multiple email and chat providers
How we handled it:
- Each microservice is stateless and horizontally scalable.
- Kafka partitioning ensures parallelism across users while keeping each user’s inbox ordered — the best of both worlds.
- ExecutorService and CompletableFuture give us controlled concurrency (see the sketch after this list).
- Smart Gmail queries with filters drastically reduce API calls.
- Batch processing and caching minimize redundant requests.
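A condensed sketch of that concurrency model, with fetchFullMessage as a hypothetical stand-in for the Gmail content call: a fixed-size pool bounds how many provider fetches run at once per worker, while CompletableFuture fans out a batch and joins it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: fan out per-message fetches onto a bounded pool, then join.
public class BatchFetcher {

    // A fixed pool caps concurrent provider calls, protecting rate limits.
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    public List<String> fetchAll(List<String> messageIds) {
        List<CompletableFuture<String>> futures = messageIds.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> fetchFullMessage(id), pool))
                .toList();
        // Wait for the whole batch; join surfaces the first failure.
        return futures.stream().map(CompletableFuture::join).toList();
    }

    private String fetchFullMessage(String messageId) {
        return "raw-email-for-" + messageId; // placeholder for the real API call
    }
}
```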
Email APIs fail, tokens expire, and rate limits hit unexpectedly — we designed the system assuming failures are the norm, not the exception:
Retry & backoff for transient failures.
Rate limiting and adaptive throttling at runtime.
DLQs (Dead Letter Queues) for safe replay of failed messages — ensuring idempotency on reprocessing.
No single point of failure; each service can be scaled or restarted independently.
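To show how those policies compose, here is a minimal Resilience4j sketch wrapping a provider fetch. The thresholds are illustrative rather than our production values, and fetchMessages stands in for the real call:

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.decorators.Decorators;
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

// Sketch: compose retry + circuit breaker + rate limiter around one call.
public class ResilientFetch {

    public Supplier<List<String>> decorate(Supplier<List<String>> fetchMessages) {
        Retry retry = Retry.of("mailFetch", RetryConfig.custom()
                .maxAttempts(3)
                .intervalFunction(IntervalFunction.ofExponentialBackoff(500, 2.0))
                .build());

        CircuitBreaker breaker = CircuitBreaker.of("mailFetch", CircuitBreakerConfig.custom()
                .failureRateThreshold(50)                        // open at 50% failures
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .build());

        RateLimiter limiter = RateLimiter.of("mailFetch", RateLimiterConfig.custom()
                .limitForPeriod(10)                              // 10 calls...
                .limitRefreshPeriod(Duration.ofSeconds(1))       // ...per second
                .timeoutDuration(Duration.ofMillis(500))
                .build());

        return Decorators.ofSupplier(fetchMessages)
                .withRateLimiter(limiter)   // applied first, so innermost
                .withCircuitBreaker(breaker)
                .withRetry(retry)           // applied last, so retries wrap the chain
                .decorate();
    }
}
```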
Security was not an afterthought — it was a design constraint from day one. User trust is paramount when dealing with inboxes:
- OAuth 2.0 flows with secure token encryption.
- Secrets stored in managed vaults.
- Deduplication logic that prevents reprocessing of the same message.
- Controlled data retention and cleanup policies.
We built deep visibility into every moving part:
- CloudWatch metrics for service health and performance.
- Rollbar for real-time error tracking.
- Kibana for detailed logs and investigation.
- SNS notifications for operational alerts.
- Traceable message IDs across Kafka topics for full auditability (see the header sketch below).
- Kafka UI for monitoring topic health, consumer lag, and message flow in real time.
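One way to carry those message IDs is a Kafka record header, as in this small Spring Kafka sketch (the header name x-message-trace-id is a convention invented here, not a standard):

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.kafka.core.KafkaTemplate;

// Sketch: attach a trace ID header when publishing, so the same ID can be
// logged at every hop (sync worker, dispatcher) and searched in Kibana.
public class TracedPublisher {

    private final KafkaTemplate<String, String> kafka;

    public TracedPublisher(KafkaTemplate<String, String> kafka) {
        this.kafka = kafka;
    }

    public void publish(String topic, String userId, String payload) {
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, userId, payload);
        record.headers().add("x-message-trace-id",
                UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
        kafka.send(record);
    }
}
```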
None of this would be possible without the right tools. The technologies we chose — and how we used them — made all the difference between a working system and a truly production-grade platform.
No system that touches thousands of inboxes survives contact with reality unchanged — and ours was no exception.
As we moved from design to real-world execution, a set of challenges began shaping the architecture in important ways:
Integrating multiple email providers (and future chat providers) meant dealing with different APIs, authentication flows, payload formats, and rate-limit behaviors.
This pushed us to enforce a strict extension-first architecture — new providers should plug in without modifying existing flows. That decision influenced our factory patterns, strategy patterns, and contract-based service boundaries, as the sketch below illustrates.
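A stripped-down sketch of that boundary: a provider-agnostic contract plus a factory. EmailProviderClient and the Gmail/Outlook class names are invented for illustration.

```java
import java.util.List;
import java.util.Map;

// Sketch of the extension-first boundary: the pipeline only sees this
// contract; each provider plugs in behind it.
interface EmailProviderClient {
    /** Fetch messages changed since the given sync cursor. */
    List<String> fetchIncremental(String userId, String syncCursor);
}

// Illustrative strategy implementations, one per provider.
class GmailClient implements EmailProviderClient {
    public List<String> fetchIncremental(String userId, String syncCursor) {
        return List.of(); // Gmail-specific API calls live here
    }
}

class OutlookClient implements EmailProviderClient {
    public List<String> fetchIncremental(String userId, String syncCursor) {
        return List.of(); // Outlook-specific API calls live here
    }
}

// Factory: adding a provider means registering one entry, not editing flows.
class EmailProviderFactory {
    private final Map<String, EmailProviderClient> clients = Map.of(
            "gmail", new GmailClient(),
            "outlook", new OutlookClient());

    EmailProviderClient forProvider(String provider) {
        EmailProviderClient client = clients.get(provider);
        if (client == null) {
            throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        return client;
    }
}
```

With this shape, adding a new provider touches only the factory registration; the sync and dispatch flows never change.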
Emails in the wild aren’t predictable, and our parsing and attachment-handling logic had to get significantly smarter as these real-world payloads surfaced.
Large-scale syncing with Gmail quickly taught us that rate limits and transient failures are facts of life. This drove our investment in Resilience4j — retries, circuit breakers, and adaptive throttling became essential rather than optional.
With DLQs, replays, and multiple consumers, one rule became sacred:
Reprocessing must never cause duplicates or inconsistencies.
Designing idempotent pipelines (both in email fetch and in chat dispatch) required strict message keying and controlled state transitions (see the sketch below).
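For illustration, the dedup gate at the dispatch boundary, assuming an atomic set-if-absent store behind a hypothetical ProcessedStore interface (e.g. Redis SETNX or a unique database constraint):

```java
// Sketch of an idempotency gate: a message is dispatched only if its ID
// has never been marked processed. ProcessedStore is a hypothetical
// stand-in for an atomic set-if-absent store.
public class IdempotentDispatcher {

    public interface ProcessedStore {
        /** Atomically marks the ID; returns false if it was already present. */
        boolean markIfAbsent(String messageId);
    }

    private final ProcessedStore processed;

    public IdempotentDispatcher(ProcessedStore processed) {
        this.processed = processed;
    }

    public void dispatch(String messageId, Runnable sendToChat) {
        // Replays and DLQ retries hit this same gate, so duplicates
        // never reach the chat platform.
        if (processed.markIfAbsent(messageId)) {
            sendToChat.run();
        }
    }
}
```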
These challenges weren’t setbacks — they were architectural forcing functions. They pushed us to build a pipeline that is significantly more resilient, predictable, and future-ready. Today, adding new providers or integrations is no longer a redesign — it’s just an extension.
The next phase is where intelligence begins. From summarizing threads to detecting intent to automatically routing information — AI will turn email from a passive stream into an active signal.
Building this platform was a masterclass in system design — where distributed processing, fault tolerance, and extensibility meet.
Kafka gave us the spine. Spring Boot and Resilience4j gave us the muscle. Patterns and clean design gave us longevity.
This project reminded us that scalability isn’t something you buy — it’s something you design.