How to Design a Job Scheduler

kapil sharma
4 min read · Dec 12, 2018

This is one of the trending system design interview questions. Here I describe, step by step, the discussion for cracking such design questions.

Basic Principle of a Scheduler

A scheduler accepts a job and triggers it at a specified time.
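As a minimal illustration of this principle, Python's standard `sched` module can trigger a callable at a specified absolute time (the job and its name here are made up for the example):

```python
import sched
import time

results = []

def job(name):
    # The work the scheduler triggers at the specified time.
    results.append(name)

# A scheduler that fires callables at absolute timestamps.
scheduler = sched.scheduler(time.time, time.sleep)

# Trigger the job a fraction of a second from now (priority breaks ties).
scheduler.enterabs(time.time() + 0.05, 1, job, argument=("backup",))
scheduler.run()  # blocks until all scheduled events have fired
```

A real job scheduler replaces the blocking `run()` loop with a long-lived service, but the core idea, a time-ordered queue of pending triggers, is the same.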

Design interview questions usually arrive as a one-line requirement. The candidate has to gather the complete requirements by asking questions.

In the discussion below, we have to find out the use cases of the scheduler, because the interviewer and the candidate might have different perceptions of the same question. So let's begin.

1. Candidate: Can we schedule a repetitive task with this scheduler?

Interviewer: Yes

2. Candidate: Do we need to persist the scheduled jobs for later reference? For example: how many jobs are scheduled, how many have already run successfully, etc.

Interviewer: Well, you need persistence, but not so much for tracking the above scenarios as for updating/editing already scheduled jobs.

3. Candidate: How about canceling scheduled jobs?

Interviewer: Yes.

4. Candidate: What is the maximum period a job can be scheduled for: 1 year, 2 years, etc.?

Interviewer: Max 10 years. We are not interested in scheduling any job more than 10 years into the future.

5. Candidate: Do we need a feature that checks certain criteria before triggering a job? Say, internet availability, or evaluating some expression.

Interviewer: We need it, but we can ignore it for the sake of simplicity.

6. Candidate: Do we need to provide a fail-and-reschedule mechanism? If a job triggered but could not complete its work due to some job-internal reason, can it be rescheduled?

Interviewer: No, leave it out.

Note: In the above scenario, if the interviewer had said yes, the candidate could then ask about a “max retry count”.

Use Cases

After the above discussion, we have found the use cases below:

1. Create jobs (one-shot, repetitive)
2. Read/Update/Delete jobs
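These four use cases can be sketched as a tiny in-memory store. All names here (`Job`, `JobStore`, the fields) are hypothetical; a real system would persist to a database, as the interviewer required:

```python
import itertools
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    run_at: float            # epoch seconds of the next trigger
    payload: str
    repeat_every: float = 0  # 0 means a one-shot job

class JobStore:
    """Hypothetical store covering create/read/update/delete of jobs."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def create(self, run_at, payload, repeat_every=0):
        job = Job(next(self._ids), run_at, payload, repeat_every)
        self._jobs[job.job_id] = job
        return job.job_id

    def read(self, job_id):
        return self._jobs.get(job_id)

    def update(self, job_id, **changes):
        job = self._jobs[job_id]
        for key, value in changes.items():
            setattr(job, key, value)

    def delete(self, job_id):
        # Deleting a pending job is how cancellation works in this sketch.
        self._jobs.pop(job_id, None)
```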

Finding the Constraints: Now we need to find constraints such as the load and space requirements of this system.

Candidate: How many jobs will be run on the system per day?

Interviewer: 1Million

Constraint 1: Now, if 1M requests come to our system per day, we need to handle 1M / (24 * 60 * 60) ≈ 12 requests/sec. We need to handle concurrency when accepting these requests.
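The arithmetic can be verified in a couple of lines:

```python
# 1 million job requests arriving over one day.
requests_per_day = 1_000_000
seconds_per_day = 24 * 60 * 60  # 86,400

requests_per_sec = requests_per_day / seconds_per_day  # ~11.6, i.e. ~12/sec
```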

Constraint 2: We also need to handle concurrency at execution time, i.e., running/triggering multiple jobs at the exact same time.
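One way to sketch both constraints on a single machine (an assumption; the class and method names are illustrative) is a lock-protected min-heap ordered by trigger time: request handlers push into it concurrently, and jobs due at the same moment run on separate worker threads:

```python
import heapq
import threading

class Dispatcher:
    """Sketch: a min-heap of (run_at, tie_breaker, job) guarded by a lock."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def submit(self, run_at, fn):
        # Called concurrently by request handlers; the lock serialises access.
        with self._lock:
            heapq.heappush(self._heap, (run_at, id(fn), fn))

    def run_due(self, now):
        # Pop every job whose trigger time has passed, then run each on its
        # own thread so jobs due at the same instant execute concurrently.
        due = []
        with self._lock:
            while self._heap and self._heap[0][0] <= now:
                due.append(heapq.heappop(self._heap)[2])
        threads = [threading.Thread(target=fn) for fn in due]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return len(due)
```

The `id(fn)` tie-breaker keeps the heap comparable when two jobs share the same trigger time; a production system would use a job id instead.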

How much data does the system need to store?

We need to store some data related to each job, such as integers, strings, tags, etc. For simplicity, I assume one job takes roughly 1 KB.

1 KB = 1 job

1 MB ≈ 1,024 jobs

1 GB ≈ 1M jobs (a single day's load)

365 GB ≈ 1 year

1 TB ≈ 3 years, so the 10-year maximum comes to roughly 3.6 TB.
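The same back-of-the-envelope estimate in code, using the rounding of 1M KB ≈ 1 GB:

```python
# Storage estimate: 1 KB per job, 1M jobs per day, so about 1 GB per day.
gb_per_day = 1
gb_per_year = gb_per_day * 365             # 365 GB per year
tb_for_10_years = gb_per_year * 10 / 1024  # ~3.6 TB for the 10-year maximum
```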

Abstract Design

Based on the information above, we do not need too many machines to hold the data. I would break the design into the following layers:

Application layer: serves the requests and shows UI details.

Data storage layer: acts like a big hash table, storing key-value mappings (the key would be the jobs organized by the dateTime they run, while the values would hold the details of those jobs). This enables easy search of historical and/or scheduled jobs.
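A toy version of that key-value mapping, assuming jobs are bucketed by the minute they are scheduled to run (the bucketing granularity and function names are my assumptions):

```python
from collections import defaultdict
from datetime import datetime

# Key: the minute a job is scheduled to run; value: that minute's job details.
# Bucketing by time keeps "what ran/runs at time T" lookups cheap.
jobs_by_time = defaultdict(list)

def schedule(run_at: datetime, details: dict):
    bucket = run_at.replace(second=0, microsecond=0)
    jobs_by_time[bucket].append(details)

def jobs_at(run_at: datetime):
    return jobs_by_time.get(run_at.replace(second=0, microsecond=0), [])
</antml>```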

The bottlenecks:

Traffic: 12 jobs/sec is not too challenging. If traffic spikes, we can use a load balancer to distribute the jobs across different servers for execution.

Data: At 3.6 TB, we need a hash table that can be queried easily, for fast access to the jobs the application has executed.

Scaling the abstract design

The nature of this job scheduler is that each job is in one of a few states: Pending, Failed, Success, Terminated. There is no business logic, and each job returns little data.
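Those states fit in a small enum. The transition table below is an assumption added for illustration (only Pending jobs move; the other states are terminal):

```python
from enum import Enum, auto

class JobState(Enum):
    PENDING = auto()
    FAILED = auto()
    SUCCESS = auto()
    TERMINATED = auto()

# Assumed legal transitions: a Pending job can finish, fail, or be
# terminated (cancelled); finished/failed/terminated jobs never move again.
TRANSITIONS = {
    JobState.PENDING: {JobState.SUCCESS, JobState.FAILED, JobState.TERMINATED},
    JobState.FAILED: set(),
    JobState.SUCCESS: set(),
    JobState.TERMINATED: set(),
}

def can_transition(src: JobState, dst: JobState) -> bool:
    return dst in TRANSITIONS[src]
```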

To handle the traffic, we can have an application server that handles 12 requests/sec, plus a backup in case it fails. In the future, we can use a load balancer to reduce the number of requests going to each server (assuming more than one server is in production). The advantages would be fewer requests per server, increased availability (in case one server fails), and better handling of spiky traffic.

For data storage, to store 3.6 TB of data we will need a few machines in the database tier. We can use a NoSQL or a SQL DB. Given that SQL has more widespread use and community support, which helps with troubleshooting, and is used by large firms at the moment, I would choose MySQL.

As the data grows, I would adopt the following strategies to handle it:

1) Create a unique index on the hash

2) Scale MySQL DB vertically by adding more memory

3) Partition the data by sharding

4) Use a master-slave replication strategy to spread reads, with master-master replication to ensure redundancy of data
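Sharding needs a deterministic shard key; one common sketch is hashing the job id (the shard count and function name here are hypothetical):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(job_id: str) -> int:
    # Hash the job id so jobs spread evenly across the database shards.
    # md5 is used only for its uniform spread here, not for security.
    digest = hashlib.md5(job_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

The same function must be used on every write and read path so a given job always maps to the same shard.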
