Two services start the same batch job at the same time. Both read the same data, both process it, both write conflicting results. Your database row lock didn’t help because the services are on different JVMs. This is the distributed lock problem.

Why Database Locks Don’t Work Here

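A SELECT ... FOR UPDATE on a MySQL row holds its lock only until the transaction ends, and the transaction ends when its connection does. For a long-running job across services, that’s fragile: if the connection times out or drops mid-job, the lock is released while the work is still going. You need a shared coordination point, something every instance can talk to, whose locks don’t live and die with a single connection.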

Redis is that point. The pattern is SET key value NX PX 30000: SET the key only if it does Not eXist, with a 30-second expiry. One atomic command. If it returns OK, you hold the lock. If it returns nil, someone else does.

In Spring Boot:

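// instanceId: a value unique to this instance, so the lock records who holds it
Boolean acquired = redisTemplate.opsForValue()
    .setIfAbsent("job:nightly-sync", instanceId, 30, TimeUnit.SECONDS);
boolean locked = Boolean.TRUE.equals(acquired); // setIfAbsent may return null in a pipeline or transaction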

Never use SETNX followed by a separate EXPIRE. Two commands aren’t atomic. The process can die between them, leaving the lock held forever.

The Expiry Trap

Here’s the thing that bit me: the lock expires at 30 seconds, but what if your job takes 35? The lock disappears, another instance grabs it, and now two processes are running at once. The fix is a heartbeat thread that renews the expiry every 10 seconds while work is ongoing. That interval sits well inside the 30-second TTL, so one late renewal doesn’t cost you the lock.
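A heartbeat like that can be sketched with a ScheduledExecutorService. This is a minimal sketch, not the code from the job described in this post: the LockHeartbeat class and the renew callback are hypothetical names. The assumption is that renew extends the TTL only if this instance still owns the key (for example by comparing the stored value to instanceId before extending the expiry) and returns false otherwise.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Periodically renews a lock's expiry until closed or until a renewal fails. */
class LockHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lock-heartbeat");
            t.setDaemon(true); // never keep the JVM alive just for renewals
            return t;
        });
    private final AtomicBoolean lockLost = new AtomicBoolean(false);

    /**
     * @param renew hypothetical callback (an assumption of this sketch): extends
     *              the TTL only if this instance still owns the key, and returns
     *              false when it does not
     */
    LockHeartbeat(BooleanSupplier renew, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            boolean stillOwner;
            try {
                stillOwner = renew.getAsBoolean();
            } catch (RuntimeException e) {
                stillOwner = false; // design choice: an unreachable store counts as a lost lock
            }
            if (!stillOwner) {
                lockLost.set(true);
                // An exception thrown from a fixed-rate task suppresses all later runs.
                throw new IllegalStateException("lock lost, stopping heartbeat");
            }
        }, period, period, unit);
    }

    /** The job should check this between units of work and abort when it is true. */
    boolean isLockLost() {
        return lockLost.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The job would wrap its work in try-with-resources, for example new LockHeartbeat(renewFn, 10, TimeUnit.SECONDS), and check isLockLost() between batches. A heartbeat narrows the two-holders window but does not close it: a long GC pause or network partition can still outlast the TTL.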

Fencing tokens add another safety net: see Leader Election for how a monotonically increasing token lets downstream systems reject stale lock holders even after expiry.
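To make the fencing idea concrete, here is a minimal in-memory sketch of the downstream check. FencedStore and TokenIssuer are hypothetical names used only for illustration. In a Redis-based setup the token could plausibly come from an INCR on a counter key at acquisition time, but that is an assumption, not something this post prescribes.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hands out strictly increasing tokens, one per successful lock acquisition. */
class TokenIssuer {
    private final AtomicLong counter = new AtomicLong();

    long next() {
        return counter.incrementAndGet();
    }
}

/** Downstream resource that rejects writes carrying an older token than one it has seen. */
class FencedStore {
    private long highestToken = -1;
    private String value;

    /** @return true if the write was accepted, false if the token was stale */
    synchronized boolean write(long token, String newValue) {
        if (token < highestToken) {
            return false; // a newer lock holder has already written
        }
        highestToken = token;
        value = newValue;
        return true;
    }

    synchronized String read() {
        return value;
    }
}
```

If instance A acquires the lock with token 1, stalls past the expiry, and instance B then acquires it with token 2 and writes, A’s late write with token 1 is refused. The store stays consistent even though both instances believed they held the lock.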

At Oracle

We used a MySQL SELECT FOR UPDATE on a config row as a makeshift distributed lock for a batch job. It worked until one job ran longer than the DB connection timeout. The connection dropped, MySQL released the lock, a second instance started, both wrote conflicting config values to 2M+ rows. Recovery took most of a day. Switching to Redis SET NX PX with a 20-second TTL and a heartbeat renewal thread fixed it.

What I’m Learning

Redis distributed locks are simple to get right if you remember: one atomic command, a heartbeat for long jobs, and fencing tokens for downstream safety.

Have you hit the two-holders problem from lock expiry, or is this a theoretical concern in your systems?