Unified Audit Logging & Activity Tracking Solution

In modern distributed systems, data changes come from many sources - API requests, background jobs, scheduled processes, event consumers, admin tools, and even automated scripts. As platforms scale, it becomes increasingly difficult to answer the most essential questions:

  • Who made a particular change?
  • What exactly changed in the data?
  • When did it happen, and in what sequence?
  • Why was the change triggered - by a user action, a job, an event, or a workflow?
  • Where in the system did the change originate?

Whether for compliance, security, debugging, support, or product analytics, businesses need a reliable way to trace data modifications across their ecosystem.

A well-designed audit logging system becomes the source of truth for understanding platform behavior. It reduces investigation time, strengthens governance, and ensures confidence that every meaningful change can be explained and trusted - without relying on developers to sprinkle logging code everywhere.

This case study describes how we solved exactly that challenge through a clean, modular, and scalable audit logging architecture.

Client Overview

The client is a rapidly scaling enterprise SaaS platform used by mid-market and large organizations. As their product footprint grew, user activities and system events became increasingly distributed across multiple microservices. This created gaps in traceability and slowed down support workflows.

They needed a unified audit logging solution because:

1. Activity signals were scattered across independent microservices with inconsistent formats.
2. Debugging and customer support lacked an end-to-end activity trail.
3. Compliance expectations required full visibility into user and system-driven changes.
4. Internal teams needed to search, filter, and visually explore activity logs quickly.
5. The solution had to be introduced with minimal disruption to existing services.
6. The system had to support multi-environment isolation and high-volume workloads.
💡 Our team was engaged to design a scalable, low-overhead audit platform that delivers clarity, accountability, and operational insight across the entire system.

A Closer Look at the Challenge

Fragmented Activity Signals

  • Different microservices emitted logs in different ways - sometimes through request logs, sometimes through job logs, sometimes not at all. There was no unified source of truth.

Minimal Performance Impact

  • The solution needed to work asynchronously, without slowing down production databases or microservices.

No Way to Reconstruct “Who Changed What”

Raw database changes lacked application context:

  • No user ID

  • No originating API request

  • No associated background job

  • No business process detail

Need for Complete DB Change Tracking

The platform needed full CUD (Create/Update/Delete) tracking with:

  • Before/after values

  • Table & column-level visibility

  • Time-based search

  • Object-level filtering

  • User-level filtering

  • Retention & cleanup policies

Configurable, Not Hardcoded

Each microservice needed the ability to define:

  • Which tables to track

  • Which fields to include or exclude

Multi-Environment Safety

  • The platform runs multiple environments (dev, staging, UAT, production), so the audit system had to ensure environment isolation.

Our Solution: A Modular, End-to-End Audit System

We designed a solution that automatically captures every meaningful data change, enriches it with business context, and exposes it through a fast, filterable Elasticsearch-backed API.

Below is the end-to-end system flow, followed by the module breakdown.

How the System Works (High-Level Flow)

01

A change happens in the application → A transaction context is created

Whenever an API request, background job, scheduled task, or integration workflow modifies data, it records a transaction context before committing to the database.
This captures details like:

  • who initiated the change
  • which service triggered it
  • why the change occurred

This ensures the system never loses the “intent”.
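
For concreteness, here is a minimal sketch of what this step can look like inside a service, assuming PostgreSQL and SQLAlchemy (the DSN, table layout, and field values below are illustrative, not the client's actual schema):

```python
# Minimal sketch: record the transaction context in the same DB transaction
# as the business change, so both share one transaction ID.
# Assumes PostgreSQL + SQLAlchemy; all names and values are illustrative.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://app:app@localhost/appdb")  # placeholder DSN

def update_order_status(order_id: int, new_status: str, user_id: str) -> None:
    with engine.begin() as conn:  # one transaction: context + change commit (or roll back) together
        # 1. Capture the intent: who, which service, and why, keyed by the DB transaction ID.
        conn.execute(
            text("""
                INSERT INTO transaction_contexts (tx_id, user_id, service_name, metadata, created_at)
                VALUES (txid_current(), :user_id, :service, CAST(:meta AS jsonb), now())
            """),
            {"user_id": user_id, "service": "order-service",
             "meta": '{"reason": "api_request", "endpoint": "PATCH /orders/{id}"}'},
        )
        # 2. Perform the actual business change.
        conn.execute(text("UPDATE orders SET status = :s WHERE id = :id"),
                     {"s": new_status, "id": order_id})
```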

02

A change is committed to the database → The change is automatically detected

Once the DB commit happens, a real-time change-data-capture (CDC) component observes the change and publishes a compact event describing details such as:

  • the table
  • the primary key
  • the operation (create/update/delete)
  • the before/after values

This requires no custom logging code inside any microservice - the system “sees” all changes automatically.
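
To make that concrete, a Debezium-style change event for an update might look roughly like the following (shown as a Python dict; the table and values are invented, and real events also carry schema metadata):

```python
# Illustrative shape of a CDC change event for an UPDATE on an "orders" table.
# Values are made up; real Debezium events also include a schema block.
cdc_event = {
    "before": {"id": 4821, "status": "pending", "amount": 120.00},
    "after":  {"id": 4821, "status": "shipped", "amount": 120.00},
    "source": {
        "connector": "postgresql",
        "db": "appdb",
        "schema": "public",
        "table": "orders",
        "txId": 998877,          # DB transaction ID, used later to look up the context
        "lsn": 339128721,        # commit position in the write-ahead log
        "ts_ms": 1718031022123,  # commit timestamp
    },
    "op": "u",                   # c = create, u = update, d = delete
    "ts_ms": 1718031022456,      # time the connector processed the event
}
```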

03

A raw change event is published by the CDC component → The Enricher attaches the missing context

A dedicated Enricher Service consumes the CDC event and looks up the corresponding transaction context.
It then merges both pieces into a single complete audit record capturing:

  • What changed
  • How it changed
  • Who made the change
  • Why it happened
  • When it occurred

If the context isn’t immediately available, the Enricher uses a safe retry mechanism with exponential backoff and fallbacks to handle delayed or missing data.
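
Conceptually, the Enricher's output is simply the raw change joined with its transaction context. A hypothetical enriched record could look like this (all field names are illustrative):

```python
# Hypothetical enriched audit record: CDC payload and transaction context
# merged into one searchable document. Field names are illustrative.
enriched_record = {
    "table_name": "orders",
    "record_id": "4821",
    "operation": "update",                                       # what / how
    "before": {"status": "pending"},
    "after": {"status": "shipped"},
    "diff": [{"field": "status", "old": "pending", "new": "shipped"}],
    "actor_user_id": "u_1042",                                   # who
    "service_name": "order-service",                             # where
    "metadata": {"reason": "api_request", "endpoint": "PATCH /orders/{id}"},  # why
    "event_time": "2024-06-10T14:50:22Z",                        # when the commit happened
    "context_time": "2024-06-10T14:50:21Z",                      # when the context was recorded
}
```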

04

A fully enriched audit record is ready → It is indexed in Elasticsearch

The enriched event is written to an Elasticsearch index built specifically for audit logs.
Index templates, mappings, and ILM policies ensure:

  • structured storage of all audit fields
  • high-performance search and filtering
  • automatic index rollover
  • long-term retention

05

The audit data is stored and indexed → Exposed via a clean, search-optimized Audit API

A simple Audit API queries the optimized Elasticsearch index and provides:

  • filters by table, object ID, user, service, operation
  • fast search across millions of records
  • before/after diff views
  • pagination and time-based filtering
  • CSV export for compliance teams

This API would power a future UI module for audit exploration.
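
As a rough sketch of the kind of query such an API runs underneath, assuming the official Elasticsearch Python client (the index name, field names, and connection details are illustrative):

```python
# Rough sketch of the search the Audit API performs under the hood.
# Index name, field names, and connection details are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_audit_logs(table, record_id, user_id=None, start=None, end=None, page=0, size=50):
    # Exact-match filters on keyword fields keep the query fast and cacheable.
    filters = [
        {"term": {"table_name": table}},
        {"term": {"record_id": record_id}},
    ]
    if user_id:
        filters.append({"term": {"actor_user_id": user_id}})
    if start or end:
        filters.append({"range": {"event_time": {"gte": start, "lte": end}}})

    return es.search(
        index="audit-logs-*",
        query={"bool": {"filter": filters}},
        sort=[{"event_time": "desc"}],  # newest changes first
        from_=page * size,              # simple pagination
        size=size,
    )
```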

Solution Architecture (Modules)

Module 1: Transaction Context Capture

This module records the “intent” behind every database change before it happens, without slowing down application logic.

What it does

Whenever an API request, background job, workflow engine, or integration process modifies data, it writes a small transaction context entry just before committing the database transaction. This ensures every DB change can later be traced back to a user, service, or system action.

Key Technical Details
  • Implemented as a lightweight write to a dedicated transaction_contexts table.
  • Captures:
    • user_id (or system identity)
    • service_name (derived from CDC connector metadata)
    • transaction_id (DB-level TX ID from CDC event)
    • created_by, created_at
    • optional metadata in a jsonb field (e.g., request UUID, job reference, workflow step)
  • Minimal code changes were required in existing microservices - this was added via a common library / shared pattern.

This makes the system self-observing and provides the critical link needed for “who” and “why”.
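
A sketch of the dedicated table behind this module, assuming PostgreSQL (column names mirror the list above; the exact types, defaults, and indexes are assumptions):

```python
# Illustrative PostgreSQL schema for the transaction context table, created
# via SQLAlchemy here only for convenience. Types, defaults, and indexes are
# assumptions, not the client's exact schema.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://app:app@localhost/appdb")  # placeholder DSN

with engine.begin() as conn:
    conn.execute(text("""
        CREATE TABLE IF NOT EXISTS transaction_contexts (
            id           BIGSERIAL PRIMARY KEY,
            tx_id        BIGINT      NOT NULL,  -- DB transaction ID, matched against the CDC event
            user_id      TEXT        NOT NULL,  -- or a system identity for jobs and scripts
            service_name TEXT        NOT NULL,  -- which microservice wrote the change
            metadata     JSONB,                 -- request UUID, job reference, workflow step, ...
            created_by   TEXT,
            created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
        )
    """))
    # Supports the Enricher's (tx_id + service_name) lookup described in Module 3.
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS idx_tx_ctx_lookup ON transaction_contexts (tx_id, service_name)"
    ))
```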

Module 2: Change Data Capture (CDC) & Stream Publishing

What it does

Any change to a configured table - insert, update, delete - is automatically detected and streamed out as an event containing before/after data.

Key Technical Details
  • Implemented using Debezium running in MSK Connect.
  • Each event contains:
    • table, schema, DB name
    • primary key fields
    • operation type (c/u/d)
    • before/after row state
    • DB transaction ID (tx_id)
    • Debezium metadata (source.connector, ts_ms, binlog/LSN position)
  • Per-table & per-column tracking rules allow fine-grained configuration (avoid noise).
  • CDC events are published to partitioned Kafka topics with:
    • high throughput
    • ordering guarantees per primary key
  • No change in application code is required - the database becomes the single source of truth.

Benefits
  • Completely asynchronous
  • Zero application performance impact
  • No code changes required inside business flows
  • Configurable to track only specific tables or fields

This satisfies the requirement to track all DB changes with minimal overhead.
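
For illustration, a Debezium PostgreSQL connector configuration with per-table and per-column rules might look roughly like this (shown as a Python dict; hostnames, the topic prefix, table names, and credentials are placeholders, and exact property names depend on the Debezium version in use):

```python
# Rough sketch of a Debezium connector configuration for MSK Connect.
# Placeholders throughout; per-table and per-column tracking rules are
# expressed through the include/exclude lists.
connector_config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "prod-db.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "${secret}",
    "database.dbname": "appdb",
    "topic.prefix": "audit.prod",                            # environment-scoped topic prefix
    "table.include.list": "public.orders,public.invoices",   # track only configured tables
    "column.exclude.list": "public.orders.internal_notes",   # drop noisy or sensitive columns
    "tombstones.on.delete": "false",
    "snapshot.mode": "initial",
}
```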

Module 3: Enricher Service - Context Binding & Robust Processing

What it does

The Enricher merges the raw DB change event with the corresponding transaction context. This produces a fully attributed and enriched audit record: who, what, where, when, why.

Key Technical Details
  • Stateless service consuming from CDC Kafka topics.
  • When a CDC event arrives:
    • Locate matching transaction_contexts record using (tx_id + service_name)
    • Combine raw change + context (metadata, request/job details, timestamps, entity details etc.) into a single enriched audit payload structure.
  • Includes advanced reliability features:
    • Retry with exponential backoff when context is not written yet
    • Uses idempotent enrichment to avoid duplicate ES writes
    • Built-in DLQ topic for events that cannot be enriched
  • Transformations include:
    • flattening before/after fields
    • computing derived fields (operation type, change summary)
    • attaching contextual metadata for search

This module ensures no change is ever orphaned or missing crucial attribution data.
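
A heavily simplified sketch of that enrichment loop, assuming kafka-python, the Elasticsearch Python client, and a direct lookup against the transaction_contexts table (topic names, the index name, retry limits, and the helper functions are illustrative):

```python
# Simplified Enricher loop: consume CDC events, look up the transaction
# context with exponential backoff, and index the merged record using a
# deterministic _id so reprocessing stays idempotent. All names are illustrative.
import json
import time

from elasticsearch import Elasticsearch
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("audit.prod.public.orders",
                         bootstrap_servers="kafka:9092",
                         value_deserializer=lambda v: json.loads(v))
dlq = KafkaProducer(bootstrap_servers="kafka:9092",
                    value_serializer=lambda v: json.dumps(v).encode())
es = Elasticsearch("http://localhost:9200")

def lookup_context(tx_id):
    """Fetch the matching transaction_contexts row (DB access omitted in this sketch)."""
    ...

def enrich(event, context):
    """Merge the CDC payload and transaction context into one audit document."""
    ...

for message in consumer:
    event = message.value
    source = event["source"]
    context, delay = None, 1
    for _ in range(5):                                 # retry with exponential backoff: 1s, 2s, 4s, ...
        context = lookup_context(source["txId"])
        if context:
            break
        time.sleep(delay)
        delay *= 2
    if context is None:
        dlq.send("audit.enricher.dlq", event)          # cannot attribute the change -> DLQ
        continue
    doc_id = f'{source["txId"]}-{source["table"]}-{source["lsn"]}'  # deterministic, duplicate-safe _id
    es.index(index="audit-logs", id=doc_id, document=enrich(event, context))
```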

Module 4: Elasticsearch Indexing, ILM & Search Foundation

What it does

Once enriched, the audit record is indexed into Elasticsearch so users can search, filter, and explore logs quickly.

Why Elasticsearch

It supports:

  • high-volume write ingestion
  • time-based queries
  • fast filtering
  • free-text search
  • faceted exploration
  • scalable retention policies

Exactly what an audit system needs.

Key Technical Details
  • A dedicated audit index with:
    • index templates
    • explicit mappings (keyword for exact matches, text for search, date fields, nested diff structures)
    • routing based on table or record ID for faster lookups
  • ILM (Index Lifecycle Management) enabled:
    • automatic index rollover (e.g., every 30 days or at a size threshold)
    • warm → cold → delete transitions based on retention policies
  • High-performance denormalized structure storing:
    • table_name, record_id, operation, actor_user_id
    • before/after values
    • full diff payload
    • service name, metadata
    • timestamps (event time, context time)
  • Query layer optimized for:
    • object-level drilldowns
    • time-based investigations
    • troubleshooting sequences (sorted by commit timestamp)
    • bulk CSV export for compliance
  • Serves as the backend for:
    • the Audit UI (search, filters, visual timeline)
    • admin support tools
    • automated compliance reports

This module provides the speed, searchability, and structure needed for effective audit exploration.
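
As a rough sketch of how this plumbing can be wired up with the Elasticsearch Python client (policy thresholds, index patterns, and mappings below are illustrative, not the exact production configuration):

```python
# Rough sketch of the audit index setup: an ILM policy for rollover and
# retention plus an index template with explicit mappings. Thresholds,
# patterns, and field names are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# ILM: roll over roughly every 30 days or at a size threshold, then age out.
es.ilm.put_lifecycle(name="audit-logs-policy", policy={
    "phases": {
        "hot":    {"actions": {"rollover": {"max_age": "30d", "max_primary_shard_size": "50gb"}}},
        "warm":   {"min_age": "30d",  "actions": {"shrink": {"number_of_shards": 1}}},
        "delete": {"min_age": "365d", "actions": {"delete": {}}},
    },
})

# Index template: keyword fields for exact filters, date fields for time-based
# queries, and a nested structure for the before/after diff payload.
es.indices.put_index_template(
    name="audit-logs-template",
    index_patterns=["audit-logs-*"],
    template={
        "settings": {
            "index.lifecycle.name": "audit-logs-policy",
            "index.lifecycle.rollover_alias": "audit-logs",
        },
        "mappings": {"properties": {
            "table_name":    {"type": "keyword"},
            "record_id":     {"type": "keyword"},
            "operation":     {"type": "keyword"},
            "actor_user_id": {"type": "keyword"},
            "service_name":  {"type": "keyword"},
            "event_time":    {"type": "date"},
            "context_time":  {"type": "date"},
            "diff":          {"type": "nested"},
            "metadata":      {"type": "object"},
        }},
    },
)
```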

Outcome & Business Benefits

With this unified audit logging foundation in place, the client is now well-positioned to achieve full visibility and governance across their platform. The system delivers strong technical guarantees while remaining lightweight for developers and non-disruptive to existing services.

1. Full Traceability Across the Entire Ecosystem

Every change can now be reconstructed with complete clarity:

  • Who performed or triggered the action
  • What changed (with before/after states)
  • When it happened
  • Why the change occurred (API call, job, workflow, system event)
  • Where it originated within the architecture

2. Observability Without Burden

The system captures audit trails without placing responsibility on individual microservices:

  • No manual logging by developers
  • No audit code inside service handlers
  • Production performance remains unaffected

3. Zero Blind Spots

Because CDC monitors every committed change at the database level, the system automatically captures actions triggered by:

  • background jobs
  • scheduled tasks
  • asynchronous event processors
  • admin tools
  • system scripts and maintenance tasks

Nothing is missed - even if there is no API involved.

4. Enterprise-Grade Searchability

Elasticsearch indexing and ILM policies enable:

  • high-volume storage (millions of audit events)
  • extremely fast search and filtering
  • flexible CSV export for compliance
  • long-term, cost-efficient retention

5. Architecture Built for Scale

The solution is designed to support growth without redesign:

  • modular components
  • multi-environment safety
  • easy onboarding of new microservices
  • extensible metadata model for future needs

Conclusion

The organization now has a future-ready audit logging foundation that ties together application behavior and database changes with clarity and consistency. The architecture provides:

  • minimal operational overhead
  • strong data consistency end-to-end
  • high reliability and fault tolerance
  • rich insights for compliance, support, product, and engineering teams

This project demonstrates our ability to deliver elegant, scalable, and intelligence-driven platforms that strengthen trust, transparency, and operational excellence across distributed systems.
