Designing a "domain" Scheduler
- Provide more context on the trade-offs between adaptive and fixed tick mechanisms so that I can understand the benefits and implementation complexity
- Does deadline scheduling use a heap queue?
- Is it common for schedulers to support more than one strategy concurrently?
- How to handle long-running tasks (e.g. tasks that take longer than a single tick)?
- In going between the description of a task and a scheduler, I can see situations where the boundary between the two is less than clear, especially when comparing long-running task strategies with schedulers. For example, in the "cooperative yielding" strategy for long-running tasks, it seems like the task would be wrapped in some periodic loop so that it can time out or be interrupted. It almost seems like a scheduler that administers an individual task, in addition to a master scheduler that administers multiple tasks.
- What is the difference between "circuit breaker pattern" and "backoff"?
- What parts of "task execution" are typically logged? Start, stop, and exception are top of mind.
- Latency is typically measured between two points. WRT observability metrics, which latency should be measured?
- How is scheduler lag measured?
- What is the difference between "dry-run" and "simulation" mode?
- Instead of a task state of "ready", is "pending" an acceptable and frequently used convention?
- Python is the likely language for implementing the scheduler. Are there highly recommended off-the-shelf solutions (e.g. pip-installable), or is it common to implement a scheduler from scratch as needed?
Resources
- Several chats happened prior to 4/6/2026.
- There is a particularly productive chat that started 4/6/2026
- Scheduler mechanics
Scheduler Mechanics
"Ticks" can represent a fixed interval or, in more advanced cases, an adaptive interval.
Fixed interval ticks are easier to reason about and, in general, recommended for an initial implementation. Fixed ticks can result in quantization issues (a task can only start on a tick boundary), but many problems tolerate this.
On the other hand, "adaptive tick" systems are an optimization that results in more complex scheduling logic, with the benefit of efficient idle behavior and higher timing precision with less CPU overhead. Adaptive systems are a bit harder to reason about and debug.
As such, it is generally recommended to start the scheduler implementation using fixed interval mechanics and only expand to more complex systems as needed.
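To make the trade-off concrete, here is a minimal sketch that simulates both approaches over the same set of due times. The function names, the millisecond units, and the sample due times are assumptions chosen for illustration; nothing here sleeps for real.

```python
def fixed_tick_run(due_times_ms, tick_ms, horizon_ms):
    """Wake on every tick; fire anything whose due time has passed."""
    pending = sorted(due_times_ms)
    wakeups, fired = 0, []
    for now in range(tick_ms, horizon_ms + 1, tick_ms):
        wakeups += 1  # the loop wakes even when nothing is due
        while pending and pending[0] <= now:
            due = pending.pop(0)
            fired.append((due, now))  # (requested time, actual time)
    return wakeups, fired


def adaptive_tick_run(due_times_ms, horizon_ms):
    """Sleep straight to the next due time; wake only when there is work."""
    wakeups, fired = 0, []
    for due in sorted(due_times_ms):
        if due > horizon_ms:
            break
        wakeups += 1
        fired.append((due, due))  # fires exactly when requested
    return wakeups, fired


due = [30, 250, 260]
fixed = fixed_tick_run(due, tick_ms=100, horizon_ms=1000)
adaptive = adaptive_tick_run(due, horizon_ms=1000)
```

With a 100 ms tick, the fixed loop wakes 10 times and the task due at 30 ms does not fire until 100 ms (quantization). The adaptive loop wakes 3 times and fires on time, but it needs to know the next due time at every step, which is where the extra complexity comes from.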
Task Types
- Periodic runs every N ticks, useful for polling sensors.
- One-shot runs once at a scheduled time.
- Conditional runs when a predicate (i.e. condition) is true
- Dependent runs after another task completes
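One possible way to express the four task types is a single "is this task due?" check. This is only a sketch: the dictionary keys (`kind`, `every`, `at`, `predicate`, `after`) and the `completed` set are hypothetical names, not an established convention.

```python
def is_due(task, tick, completed):
    """Return True if the task should run on this tick.

    `completed` is a set of names of tasks that have already finished.
    """
    kind = task["kind"]
    done = task["name"] in completed
    if kind == "periodic":
        return tick % task["every"] == 0          # every N ticks
    if kind == "one_shot":
        return tick >= task["at"] and not done    # once, at or after its time
    if kind == "conditional":
        return bool(task["predicate"]())          # when the predicate holds
    if kind == "dependent":
        return task["after"] in completed and not done
    raise ValueError(f"unknown task kind: {kind!r}")


poll = {"name": "poll", "kind": "periodic", "every": 5}
cal = {"name": "cal", "kind": "one_shot", "at": 3}
report = {"name": "report", "kind": "dependent", "after": "cal"}
```

For example, `is_due(poll, 10, set())` is true, and `is_due(report, 4, {"cal"})` becomes true only once `cal` is in the completed set.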
Task Model
class
Task States
- READY awaiting execution
- RUNNING execution in progress
- WAITING execution pending until a condition is true
- PAUSED execution disabled
- COMPLETED execution finished
- FAILED execution failed with exception
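A minimal sketch of how the states above could be modeled with an `Enum` and a small task class. The transition table is an assumption (for example, FAILED back to READY to allow a retry); the notes do not define which transitions are legal.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class TaskState(Enum):
    READY = auto()
    RUNNING = auto()
    WAITING = auto()
    PAUSED = auto()
    COMPLETED = auto()
    FAILED = auto()


# Assumed transition table; adjust to the real requirements.
ALLOWED = {
    TaskState.READY: {TaskState.RUNNING, TaskState.WAITING, TaskState.PAUSED},
    TaskState.RUNNING: {TaskState.COMPLETED, TaskState.FAILED, TaskState.READY},
    TaskState.WAITING: {TaskState.READY, TaskState.PAUSED},
    TaskState.PAUSED: {TaskState.READY},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: {TaskState.READY},  # retry path
}


@dataclass
class Task:
    name: str
    action: Callable[[], object]
    state: TaskState = TaskState.READY

    def transition(self, new_state: TaskState) -> None:
        """Move to new_state, rejecting transitions not in ALLOWED."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.name} -> {new_state.name} not allowed"
            )
        self.state = new_state
```

Keeping the transitions in one table makes "why is this task in this state?" easier to answer later, since every change goes through `transition`.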
Scheduling Strategies
- First come first serve (FCFS) uses a simple queue
- Priority based uses a priority sorted heap queue
- Deadline scheduling (TBR) good for time sensitive measurements
- Rate monotonic scheduling (RMS) gives higher priority to shorter interval tasks. Good for periodic sampling systems.
- Earliest Deadline First (EDF) assigns a dynamic priority based on deadline: the task with the nearest deadline runs first. Used when better scheduling is worth the added complexity.
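Priority-based and EDF ordering can both sit on top of the standard library's `heapq`, with the deadline (or priority number) as the sort key. The sketch below is one way to do it, not a claim about how any particular scheduler is built; the `EdfQueue` class and the task names are hypothetical.

```python
import heapq
import itertools


class EdfQueue:
    """Min-heap keyed on deadline; ties keep insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal deadlines

    def push(self, deadline, name):
        heapq.heappush(self._heap, (deadline, next(self._counter), name))

    def pop(self):
        """Remove and return (deadline, name) for the earliest deadline."""
        deadline, _, name = heapq.heappop(self._heap)
        return deadline, name

    def __len__(self):
        return len(self._heap)


q = EdfQueue()
q.push(50, "log")
q.push(10, "sample")
q.push(10, "calibrate")
q.push(30, "report")
order = [q.pop()[1] for _ in range(len(q))]
```

Here `order` comes out as sample, calibrate, report, log. Swapping the deadline for a fixed priority number gives the priority-based strategy with the same structure.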
Time Sources
Time can be abstracted in a way that it becomes a controlled variable instead of an external dependency. This is a powerful concept.
In such abstractions, time can be paused and resumed so that it doesn't advance while the system is paused. This opens possibilities, such as retaining (i.e. storing) scheduled offsets across a pause.
Such a strategy also allows time to be advanced manually during testing.
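A minimal sketch of such a clock, assuming a hypothetical `ManualClock` class whose time only moves when `advance` is called and which ignores `advance` while paused.

```python
class ManualClock:
    """A clock the caller controls; it never reads the wall clock."""

    def __init__(self, start=0):
        self._now = start
        self._paused = False

    def now(self):
        return self._now

    def advance(self, dt):
        """Move time forward by dt unless the clock is paused."""
        if dt < 0:
            raise ValueError("time cannot move backwards")
        if not self._paused:
            self._now += dt

    def pause(self):
        self._paused = True

    def resume(self):
        self._paused = False


clock = ManualClock()
clock.advance(5)
clock.pause()
clock.advance(10)   # ignored while paused
clock.resume()
clock.advance(2)
```

After this sequence `clock.now()` is 7, so a task scheduled "3 units from now" before the pause keeps the same offset afterwards. In production the same interface could wrap `time.monotonic()`.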
Execution Model
- Synchronous execution is the simplest model and runs tasks in the same thread as the scheduler. While it is the simplest to implement, it is risky since tasks can block the scheduling thread, which is often undesirable.
- Thread pool execution offers parallelism WRT the scheduler and other tasks, at the cost of needing thread safety (synchronization mechanisms or non-blocking strategies)
- Process Pool Execution is best for executing CPU heavy tasks like image processing
- Hybrid models afford a single-threaded scheduler and options for how to execute the scheduled task (e.g. thread pool).
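A rough sketch of the hybrid model using `concurrent.futures.ThreadPoolExecutor`: one scheduling thread submits due tasks to a pool and collects results. The helper name `run_due_tasks` and the sample tasks are assumptions. Waiting on every future here is only to keep the example deterministic; a real scheduler loop would more likely poll `future.done()` or attach callbacks so it never blocks.

```python
from concurrent.futures import ThreadPoolExecutor


def run_due_tasks(due_tasks, max_workers=2):
    """Submit (name, callable) pairs to a thread pool and collect outcomes."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in due_tasks}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result(timeout=5))
            except Exception as exc:  # task errors surface here, not in the pool
                results[name] = ("error", repr(exc))
    return results


def read_sensor():
    return 42


def broken_task():
    raise RuntimeError("sensor offline")


results = run_due_tasks([("read", read_sensor), ("boom", broken_task)])
```

An exception raised inside a worker is re-raised by `future.result()`, so the scheduler thread can mark the task FAILED without crashing itself.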
Data Flow
The system can be comprised of multiple queues. Try to think about "what is ready" to run versus "what should run".
- Ready Queue for tasks that are ready to execute
- Delayed Queue for tasks that are scheduled for future ticks
- Event Queue for tasks that are triggered by other events/tasks
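The "what should run" versus "what is ready" split can be sketched as a delayed heap that gets promoted into a ready queue on each tick. The `TaskQueues` class is hypothetical, and the event queue is left out to keep the example short.

```python
import heapq
from collections import deque


class TaskQueues:
    def __init__(self):
        self.delayed = []      # heap of (due_tick, seq, name): "should run later"
        self.ready = deque()   # FIFO of names: "ready to run now"
        self._seq = 0

    def schedule(self, name, due_tick):
        heapq.heappush(self.delayed, (due_tick, self._seq, name))
        self._seq += 1

    def promote(self, current_tick):
        """Move every task whose due tick has arrived into the ready queue."""
        while self.delayed and self.delayed[0][0] <= current_tick:
            _, _, name = heapq.heappop(self.delayed)
            self.ready.append(name)


queues = TaskQueues()
queues.schedule("a", due_tick=3)
queues.schedule("b", due_tick=1)
queues.schedule("c", due_tick=5)
queues.promote(current_tick=3)
```

After `promote(3)`, the ready queue holds b then a, and c stays in the delayed heap. An event queue could feed the same ready queue from a different trigger.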
Handling "Long-Running" tasks
A task might be considered "long-running" if there is a strong probability that it takes longer than a single "tick". When this happens, there are a few common strategies.
- Blocking: block the scheduler until the task finishes
- Cooperative: the task periodically yields control back to the scheduler
- External: run the task outside of the scheduler (e.g., in a thread or process)
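Cooperative yielding maps naturally onto Python generators: the task does one slice of work, then `yield`s, and the scheduler decides when to resume it. The sketch below is an illustration under that assumption; `long_task` and `run_round_robin` are made-up names.

```python
from collections import deque


def long_task(name, steps, log):
    """A task split into steps; each `yield` hands control to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def run_round_robin(tasks):
    """Give each unfinished task one step per tick; return ticks used."""
    queue = deque(tasks)
    ticks = 0
    while queue:
        ticks += 1
        for _ in range(len(queue)):
            task = queue.popleft()
            try:
                next(task)          # run one slice of work
                queue.append(task)  # not finished: back of the line
            except StopIteration:
                pass                # finished: drop it
    return ticks


log = []
ticks = run_round_robin([long_task("a", 2, log), long_task("b", 1, log)])
```

The log interleaves as a:0, b:0, a:1, so neither task starves the other. This also speaks to the boundary question: the generator is still just a task, and the scheduler stays the only thing that decides who runs next.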
Error Handling and Recovery
- Retry Policy
  - Immediate (abort, try again next tick)
  - Backoff (delay)
  - Max retry (N attempts)
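A small sketch combining backoff with a maximum attempt count. The exponential formula, the default values, and the injectable `sleep` parameter are assumptions for illustration. In a tick-based scheduler the delay would more likely become a new due tick rather than a blocking sleep.

```python
def backoff_delay(attempt, base=1.0, factor=2.0, cap=30.0):
    """Exponential backoff: base * factor**attempt, limited to cap."""
    return min(cap, base * factor ** attempt)


def run_with_retry(action, max_attempts=3, sleep=lambda seconds: None):
    """Call action until it succeeds or max_attempts is reached.

    Returns (result, delays_used). Re-raises the last error on giving up.
    """
    delays = []
    for attempt in range(max_attempts):
        try:
            return action(), delays
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts
            delay = backoff_delay(attempt)
            delays.append(delay)
            sleep(delay)  # injected so tests do not actually wait


calls = {"n": 0}


def flaky():
    """Hypothetical task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("device busy")
    return "ok"


result, delays = run_with_retry(flaky)
```

Here the task succeeds on the third attempt after delays of 1.0 and 2.0. An "immediate" policy is the same loop with a delay of zero.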
Logging
- Task execution (start, complete, error)
Metrics and Debug
- task execution time
- latency
- queue depth
- Why didn't a task run (state introspection)
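One possible set of definitions for these metrics, sketched as a record of three timestamps per run. Treating "lag" as start time minus scheduled time is an assumption, as are the names `RunRecord` and `why_not_run`.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    scheduled_at: float   # when the task was supposed to start
    started_at: float     # when it actually started
    finished_at: float    # when it finished

    @property
    def lag(self):
        """Scheduler lag: how late the task started."""
        return self.started_at - self.scheduled_at

    @property
    def execution_time(self):
        return self.finished_at - self.started_at

    @property
    def total_latency(self):
        """Scheduled time to completion."""
        return self.finished_at - self.scheduled_at


def why_not_run(state, due_tick, current_tick):
    """Give a human-readable reason a task has not run yet."""
    if state == "PAUSED":
        return "task is paused"
    if state == "WAITING":
        return "condition not yet true"
    if due_tick > current_tick:
        return f"not due for {due_tick - current_tick} more tick(s)"
    return "eligible: check queue depth and executor capacity"


record = RunRecord(scheduled_at=100, started_at=103, finished_at=110)
```

For this record the lag is 3, the execution time is 7, and the total latency is 10. Queue depth is just the length of the ready queue sampled once per tick.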
Determinism and Testing
- Dry run (as fast as possible?)
- Simulation mode (accelerated time?)
- Same inputs should produce the same outputs
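A sketch of what a determinism check could look like when time is just a tick counter: run the same schedule twice and compare the traces. The `simulate` function and the task names are hypothetical.

```python
def simulate(periods, ticks):
    """Replay periodic tasks over simulated ticks with no real waiting.

    `periods` maps task name -> interval in ticks. Returns (tick, name) pairs.
    """
    trace = []
    for tick in range(1, ticks + 1):
        for name in sorted(periods):       # fixed order within a tick
            if tick % periods[name] == 0:
                trace.append((tick, name))
    return trace


first = simulate({"temp": 2, "volt": 3}, ticks=6)
second = simulate({"temp": 2, "volt": 3}, ticks=6)
```

Both runs give the same trace, and six ticks execute instantly regardless of how long a tick is in real time. Sorting names within a tick is what removes ordering ambiguity when two tasks are due together.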
Other Patterns to Research
- Event Loop
- Reactor
- Actor
- Producer-Consumer
- State Machine
- Strategy
- Command
- Work Queue/Pipeline
Development Strategies
- Start simple
- Use a fixed tick based clock instead of wall-clock
- Use a priority queue for task execution
- Single-threaded scheduler
- Thread pool executor for task execution.
- Custom clock abstraction (for control, flexibility, testing)
- Design for pause/resume from day one
- Treat tasks as stateful objects
- Avoid asyncio unless the work is I/O bound, and avoid premature optimization. You will be impressed how far simple basics will take you.
Objectives
- Use Python's built-in sched module as an internal primitive to build out a fixed-interval scheduler.
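As a starting point, `sched.scheduler` accepts custom `timefunc` and `delayfunc` callables, so it can be driven by a fake clock. The sketch below assumes a hypothetical `FakeClock` and shows a periodic task that re-enters itself alongside a one-shot task.

```python
import sched


class FakeClock:
    """Time advances only when the scheduler asks to 'sleep'."""

    def __init__(self):
        self.now = 0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
scheduler = sched.scheduler(timefunc=clock.time, delayfunc=clock.sleep)
log = []


def periodic(name, interval, remaining):
    """Log a run, then re-schedule itself until `remaining` runs are done."""
    log.append((clock.time(), name))
    if remaining > 1:
        scheduler.enter(interval, 1, periodic,
                        argument=(name, interval, remaining - 1))


def one_shot(name):
    log.append((clock.time(), name))


scheduler.enter(10, 1, periodic, argument=("poll", 10, 3))
scheduler.enter(25, 1, one_shot, argument=("calibrate",))
scheduler.run()  # returns once the event queue is empty
```

The log shows poll at 10, 20, and 30 with calibrate at 25, and the whole run finishes instantly because the fake clock never really sleeps. Note that `sched` has no built-in periodic tasks, pause/resume, or task states, so those would still need to be layered on top.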
State of the Art
There may not be a good off-the-shelf scheduler for test applications. For specific hardware test applications, it is reportedly common to build custom schedulers to achieve what is needed. The recommendation is to build a small custom scheduler core and use standard Python primitives around it.
Try using
- concurrent.futures for worker threads/processes. It is pitched as a high-level interface for asynchronously executing callables via threads or processes
- asyncio for device interactions that are naturally async
- heapq for internal ready/due queues
- sched for interface inspiration, but not necessarily the module itself.
Think in layers
- scheduler core decides what is due, paused, and eligible for execution
- executor runs the work using thread/process workers
- clock controls time
- policy handles prioritizing tasks, handling deadlines, retry/backoff logic, cancellation and fairness/contention.
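To show how the layers might fit together, here is a deliberately tiny sketch with a tick clock, an inline executor, a policy function, and a core that ties them together. All class names are hypothetical, the policy is reduced to a sort key, and the choice to freeze the clock while paused is an assumption.

```python
import heapq


class TickClock:
    def __init__(self):
        self.tick = 0

    def advance(self):
        self.tick += 1
        return self.tick


class InlineExecutor:
    """Runs work in the calling thread; a pool could replace this."""

    def run(self, fn):
        return fn()


def by_priority(task):
    """Policy: lower number runs first among tasks due on the same tick."""
    return task["priority"]


class SchedulerCore:
    def __init__(self, clock, executor, policy):
        self.clock, self.executor, self.policy = clock, executor, policy
        self._due = []      # heap of (due_tick, seq, task)
        self._seq = 0
        self.paused = False

    def schedule(self, name, fn, due_tick, priority=0):
        task = {"name": name, "fn": fn, "priority": priority}
        heapq.heappush(self._due, (due_tick, self._seq, task))
        self._seq += 1

    def step(self):
        """Advance one tick and run everything due; no-op while paused."""
        if self.paused:
            return []
        now = self.clock.advance()
        due = []
        while self._due and self._due[0][0] <= now:
            due.append(heapq.heappop(self._due)[2])
        due.sort(key=self.policy)  # policy decides order; core decides what is due
        for task in due:
            self.executor.run(task["fn"])
        return [task["name"] for task in due]


core = SchedulerCore(TickClock(), InlineExecutor(), by_priority)
core.schedule("low", lambda: None, due_tick=1, priority=5)
core.schedule("high", lambda: None, due_tick=1, priority=1)
core.schedule("later", lambda: None, due_tick=2)
```

Calling `core.step()` runs high then low on tick 1. Setting `core.paused = True` makes the next `step()` do nothing, and clearing it lets "later" run on tick 2, which is the retained-offset behavior described under Time Sources.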
Expect 200–600 lines of scheduler core, not a giant framework and not a pile of ad hoc timers.