Databus provides a timeline-consistent stream of change-capture events for a database. It enables applications to watch a database and process its updates in near real time. Databus delivers a complete after-image of every new or changed record, as well as deletes, while preserving timeline consistency and transaction boundaries. Applications integrate with Databus rather than directly with the source database, and each integration is isolated from the others, which allows for parallel development and rapid innovation.
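To make "after-images, deletes, and transaction boundaries" concrete, here is a minimal Python sketch of how a consumer might apply such events to a local copy of the data. The `ChangeEvent` and `Replica` types, the `scn` field name, and the rule that all events from one transaction share a single sequence number are illustrative assumptions, not the Databus API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical event shape (assumption, not the Databus wire format).
    scn: int                     # sequence number of the committing transaction
    key: str                     # primary key of the changed record
    after_image: Optional[dict]  # full new row; None marks a delete


@dataclass
class Replica:
    rows: dict = field(default_factory=dict)
    last_scn: int = 0            # sequence number of the last applied transaction

    def apply_window(self, window: list) -> None:
        """Apply every event of one transaction, in commit (SCN) order."""
        scns = {ev.scn for ev in window}
        if len(scns) != 1:
            raise ValueError("a window must hold exactly one transaction")
        scn = scns.pop()
        if scn <= self.last_scn:
            raise ValueError(f"out-of-order window {scn} <= {self.last_scn}")
        # All validation happens before any mutation, so a rejected window
        # leaves the replica untouched (no half-applied transactions).
        for ev in window:
            if ev.after_image is None:
                self.rows.pop(ev.key, None)          # delete
            else:
                self.rows[ev.key] = dict(ev.after_image)  # full after-image replaces the row
        self.last_scn = scn
```

Because each event carries the whole new row rather than a diff, the consumer never needs to read the source database to reconstruct a record; it simply overwrites or deletes by key.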

How it works

Databus has four key parts:

  • a database connector to watch changes and maintain a clock or sequence value
  • an in-memory relay that keeps recent changes for efficient retrieval
  • a bootstrap service/database that enables long lookback queries (including from the beginning of time)
  • a client that provides a simple API to get changes since a point in time
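The split between the relay and the bootstrap service can be sketched as follows, assuming a bounded in-memory relay that evicts its oldest events and a bootstrap store that still holds the full history. `Relay`, `Bootstrap`, `fetch_changes`, and the use of positive integer sequence numbers are hypothetical; a real bootstrap service would more likely serve a compacted snapshot than replay every event.

```python
from collections import deque
from typing import Optional


class Relay:
    """Hypothetical bounded buffer of recent (scn, payload) events, oldest first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.events = deque()
        self.evicted_up_to = 0  # highest SCN evicted so far (0 = nothing evicted)

    def append(self, scn: int, payload) -> None:
        # The connector is assumed to append events in non-decreasing SCN order.
        if self.events and scn < self.events[-1][0]:
            raise ValueError("events must arrive in SCN order")
        self.events.append((scn, payload))
        if len(self.events) > self.capacity:
            old_scn, _ = self.events.popleft()
            self.evicted_up_to = old_scn

    def read_since(self, since_scn: int) -> Optional[list]:
        """Events with scn > since_scn, or None if some were already evicted."""
        if since_scn < self.evicted_up_to:
            return None
        return [(s, p) for s, p in self.events if s > since_scn]


class Bootstrap:
    """Stand-in for the bootstrap store: simply keeps every event ever seen."""

    def __init__(self):
        self.history = []

    def append(self, scn: int, payload) -> None:
        self.history.append((scn, payload))

    def read_since(self, since_scn: int) -> list:
        return [(s, p) for s, p in self.history if s > since_scn]


def fetch_changes(relay: Relay, bootstrap: Bootstrap, since_scn: int) -> list:
    events = relay.read_since(since_scn)
    if events is None:
        # The consumer has fallen behind the relay's window; serve from bootstrap.
        events = bootstrap.read_since(since_scn)
    return events
```

With a relay of capacity 3 holding sequences 3 to 5, a consumer asking for changes after sequence 3 is served from memory, while one asking from sequence 1 is routed to the bootstrap because event 2 has already been evicted.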

To use Databus, a consuming application maintains a high watermark and periodically uses the Databus client to request all changes since that point. Because each consuming application maintains its own high watermark, consumers are isolated from one another.
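A minimal sketch of the high-watermark pattern, assuming a `fetch(since)` callable that returns `(sequence, payload)` pairs in increasing order. `Consumer`, `poll_once`, `run`, and `fetch` are hypothetical names standing in for application code and the Databus client, not its actual API.

```python
import time
from typing import Callable


class Consumer:
    """Hypothetical consumer that owns its high watermark."""

    def __init__(self, name: str, fetch: Callable[[int], list], start_scn: int = 0):
        self.name = name
        self.fetch = fetch            # assumed: returns [(scn, payload), ...] with scn > since
        self.high_watermark = start_scn
        self.seen = []

    def handle(self, scn: int, payload) -> None:
        # Application-specific processing; here we just record the payload.
        self.seen.append(payload)

    def poll_once(self) -> int:
        events = self.fetch(self.high_watermark)
        for scn, payload in events:
            self.handle(scn, payload)
        if events:
            # Advance only after the whole batch is handled: a crash mid-batch
            # replays the batch (at-least-once) instead of skipping events.
            self.high_watermark = max(self.high_watermark, events[-1][0])
        return len(events)

    def run(self, polls: int, interval_s: float) -> None:
        # Periodic polling loop.
        for _ in range(polls):
            self.poll_once()
            time.sleep(interval_s)
```

For example, two consumers reading the same change log with `fetch = lambda since: [e for e in log if e[0] > since]`, one starting at 0 and one at 2, each advance independently; neither one's progress affects what the other receives. Since a batch may be replayed after a failure, handlers should be idempotent.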

How we use it

We use Databus extensively at LinkedIn to propagate updates from profile, connection, company, and many other databases. For example, when a member adds a position, the standardization service generates a canonical version of the company, which is then added to the profile and to the people search index. Connection and group updates are propagated into recommendation systems.

There are many other examples across the site, as our data is heavily interconnected!