Transcoding Pipelines: Processing Video at Scale
A user uploads a 4K video. Your system needs to produce: 4 resolution variants, 3 audio codec versions, thumbnails at 10-second intervals, and subtitle extraction. That’s not one job. That’s a directed acyclic graph of dependent tasks.
The Pipeline as a DAG#
Transcoding isn’t a linear process. Some steps depend on others. Some can run in parallel.
The 4 resolution transcodes can run in parallel, but they all depend on the split step completing first, and the merge step depends on all transcodes finishing. This is the classic MapReduce shape: split, process in parallel, combine.
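This fan-out/fan-in shape can be sketched with `CompletableFuture`. A minimal sketch: the stage bodies are placeholder strings standing in for real split/transcode/merge work, and the resolution list is an assumption, not a prescribed ladder.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class PipelineDag {
    // Illustrative resolution ladder; a real pipeline derives this from the source.
    static final List<String> RESOLUTIONS = List.of("1080p", "720p", "480p", "360p");

    public static List<String> run() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Stage 1: split. Everything downstream is gated on this future.
            CompletableFuture<String> split =
                CompletableFuture.supplyAsync(() -> "segments", pool);

            // Stage 2: fan out one transcode per resolution, each dependent on split.
            List<CompletableFuture<String>> transcodes = RESOLUTIONS.stream()
                .map(res -> split.thenApplyAsync(seg -> res + ":" + seg, pool))
                .collect(Collectors.toList());

            // Stage 3: merge waits for every transcode to finish before combining.
            CompletableFuture<List<String>> merge =
                CompletableFuture.allOf(transcodes.toArray(new CompletableFuture[0]))
                    .thenApply(v -> transcodes.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));

            return merge.join();
        } finally {
            pool.shutdown();
        }
    }
}
```

The dependency edges live in the `thenApplyAsync` and `allOf` calls: the framework, not the caller, enforces that no transcode starts before split finishes and the merge starts only after the last transcode.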
Segment-Level Parallelism#
A 2-hour video split into 4-second segments gives you 1,800 segments. Each segment can be transcoded independently. Distribute segments across a pool of workers. With 100 workers, you transcode 100 segments simultaneously.
public class TranscodeJob {
    private final String videoId;
    private final int segmentIndex;
    private final String targetResolution;
    // Collaborators (hypothetical interfaces) for segment storage, transcoding,
    // and progress tracking, injected via the constructor (omitted for brevity)
    private final SegmentStorage storage;
    private final Transcoder ffmpeg;
    private final CheckpointStore checkpointStore;

    public void execute() {
        // Fetch the source segment, transcode it, and write the result back
        byte[] segment = storage.getSegment(videoId, segmentIndex);
        byte[] transcoded = ffmpeg.transcode(segment, targetResolution);
        storage.putSegment(videoId, segmentIndex, targetResolution, transcoded);
        // Checkpoint progress so a crash costs at most this one segment
        checkpointStore.markCompleted(videoId, segmentIndex, targetResolution);
    }
}
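Dispatching 1,800 of these jobs across a 100-worker pool is then just a fixed thread pool. A sketch, with the actual transcode replaced by a counter so the fan-out mechanics stay visible:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SegmentDispatcher {
    // Submit one task per segment and wait for the pool to drain.
    public static int transcodeAll(int segmentCount, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < segmentCount; i++) {
            pool.submit(() -> {
                // Real pipeline: new TranscodeJob(...).execute()
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

With 100 workers, at most 100 segments are in flight at once; the remaining 1,700 simply queue inside the executor until a worker frees up.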
Handling Failures#
A worker crashes mid-transcode on segment 847 of 1,800. Without checkpointing, you'd restart the entire video. With per-segment checkpointing, you re-process only segment 847; the 846 already-completed segments are safe.
This is why the pipeline tracks state per segment, not per video. A simple status table works:
SELECT segment_index FROM transcode_progress
WHERE video_id = 'abc' AND resolution = '720p' AND status = 'COMPLETED';
Compare against the total segment count. The difference is your remaining work. Assign those segments to available workers.
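In code, that reconciliation is a set difference. A sketch, assuming the completed segment indices have already been fetched by a query like the one above:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class RemainingWork {
    // Given the completed segment indices (from the status table) and the
    // total segment count, return the segments still needing a worker.
    public static List<Integer> remaining(Set<Integer> completed, int totalSegments) {
        return IntStream.range(0, totalSegments)
            .filter(i -> !completed.contains(i))
            .boxed()
            .collect(Collectors.toList());
    }
}
```

On a clean resume this returns an empty list; after the crash scenario above it returns exactly the one missing segment, which is the whole point of per-segment granularity.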
Priority and Fairness#
Not all transcode jobs are equal. A viral video that just got 10,000 views needs its HD version now. A video uploaded by a new account can wait. Priority queues with starvation prevention ensure popular content gets transcoded fast without starving the rest.
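One common starvation-prevention technique is aging: a job's effective priority improves the longer it waits, so a backlog of low-priority uploads eventually outranks a stream of fresh hot jobs. A sketch, where `Job`, the `AGE_WEIGHT` constant, and the linear-scan `poll` are all illustrative choices, not a production design (a real queue would use an indexed structure):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AgedJobQueue {
    // Illustrative job record: lower basePriority means more urgent.
    record Job(String videoId, int basePriority, long enqueuedAtMs) {}

    // Assumed tuning constant: urgency gained per millisecond of waiting.
    static final double AGE_WEIGHT = 0.001;

    private final List<Job> jobs = new ArrayList<>();

    public void add(Job job) {
        jobs.add(job);
    }

    // Effective score = base priority minus an aging bonus, so a job that
    // has waited long enough beats a fresher, nominally hotter job.
    static double score(Job j, long nowMs) {
        return j.basePriority() - AGE_WEIGHT * (nowMs - j.enqueuedAtMs());
    }

    // Dequeue the job with the lowest effective score as of poll time.
    public Job poll(long nowMs) {
        jobs.sort(Comparator.comparingDouble(j -> score(j, nowMs)));
        return jobs.isEmpty() ? null : jobs.remove(0);
    }
}
```

The score is recomputed at poll time rather than enqueue time on purpose: an age bonus baked in at insertion would stop growing, and a static heap ordering would go stale as jobs wait.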
At Salesforce, the code generation pipeline had the same structure. 4,000+ service configurations needed code generated, each with multiple output targets (Java clients, REST stubs, validation rules). Each config/target combo was independent, like a video segment. We ran them in parallel across a worker pool. The DAG was: parse config -> validate -> generate each target -> merge results. Checkpointing per config meant a failed generation didn’t require reprocessing all 4,000. When we added this per-config checkpointing, full pipeline reruns dropped from the common case to the exception. The 80% reduction in review cycles came partly from this: reviewers only saw the changed configs, not the entire regenerated set.
What I’m Learning#
Transcoding pipelines are just distributed workflow execution. The principles are universal: model the work as a DAG, parallelize independent steps, checkpoint at the finest useful granularity, and handle priority. Whether you’re transcoding video, generating code, or running ETL, the architecture looks the same. The video domain just makes the parallelism obvious because segments are naturally independent units.
What’s the most complex pipeline you’ve built, and how did you handle partial failures?