How We Architected a Professional-Grade Trading Engine for Indian Markets — and the Hard Decisions That Shaped It

Disclaimer
This article is a purely technical discussion of trading system architecture and software engineering decisions. ThWorks does not provide investment advice, trading signals, strategy recommendations, or any form of financial guidance.
As per SEBI regulations (Investment Advisers Regulations, 2013 and Research Analysts Regulations, 2014), only registered and licensed entities are permitted to offer investment advice or research recommendations in India. ThWorks is a technology company. We build fintech infrastructure and trading automation solutions exclusively for trading firms, brokerages, prop desks, and other licensed financial entities — not for individual retail investors.
Nothing in this article constitutes financial or investment advice. All architecture decisions described here are engineering choices made for a specific platform context.

The Stance We’re Taking

Before we get into architecture, here is the opinion that drove every decision in this build:

Risk systems are the most critical part of the platform. Most trading systems fail not because their strategies are wrong, but because their architecture doesn’t hold under market stress.

Bad execution infrastructure eats good strategies alive. A trailing stop that fires 200ms late on a volatile day, a reprice loop that silently stops retrying, a reconciliation gap that leaves a position unhedged overnight — these are not edge cases. They are the normal operating conditions of Indian equity derivatives markets, and they happen to every platform that isn’t specifically engineered to handle them.

We built this engine for trading firms that have already solved the strategy problem. They know what they want to trade. What they need is infrastructure that executes reliably, protects capital when things go wrong, and gives them a forensic trail when they need to explain exactly what happened.

This post walks through eight architecture decisions that shaped the platform — and the reasoning behind each one.

1. Risk Is Not a Layer. It’s a Cross-System Control Plane.

The standard framing of trading system architecture puts risk management as a distinct layer — something that sits between strategy and execution and checks positions before orders go out. That framing is wrong, or at least incomplete.

If risk only lives in one place, it can be bypassed. A timeout on the risk check, a race condition in the signal pipeline, a manual override that skips the layer entirely — any of these puts capital at risk. Real risk management has to intercept every stage of the pipeline, structurally, not procedurally.

Here is how this plays out in the platform we built:

At signal time: Before any entry order is placed, a kill switch check runs. If a kill switch is active — whether it was triggered manually by the trader, by a daily loss limit breach, or by an automated circuit breaker — the order never reaches the broker. This isn’t a warning. It’s a hard stop in code.

At order placement: NSE blocked MARKET orders and SL-M orders for index options in September 2021. Our engine enforces LIMIT-only order placement structurally: the order placement function only emits LIMIT orders. There is no flag to disable this, no escape hatch. A platform that lets you configure your way around a regulatory constraint isn’t safe; it’s a liability.

In the tick path: A risk monitor processes every incoming price tick with a hard latency constraint of under 5 milliseconds, with no database writes and no network I/O. Within that window, it evaluates trailing stop conditions, checks target levels, and computes SL-to-cost transitions. This is where strategy intent meets market reality in real time.

At the broker: We run a hybrid stop-loss model. A broker-side SL-LIMIT order sits as the safety net — it fires even if our engine goes down. Our engine-side monitor runs as the smart layer — it handles trailing stops, reprice decisions, and target exits. Two independent guards, either of which can save a position. The broker-side order is not a backup; it is a parallel system.

At the portfolio level: Kill switches are mode-scoped. A kill switch can target live trading only (paper and backtest continue), or it can kill all modes simultaneously. Combined mark-to-market stops apply across multi-leg strategy positions — one leg’s paper profit cannot mask another leg’s runaway loss. The portfolio view is not an afterthought.

The principle behind all of this: a safety check you can bypass is not a safety check. Risk controls have to be structural — woven into the path of execution itself, not bolted on top of it.
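
A minimal sketch of what "structural, not procedural" could look like on the entry path. Every name here (KillSwitchActive, LimitOrder, place_entry) is hypothetical and not taken from the platform's code. The point is only that the function has no order-type parameter and no bypass flag.

```python
from dataclasses import dataclass, field


class KillSwitchActive(Exception):
    """Raised when an entry is attempted while a kill switch is active."""


@dataclass(frozen=True)
class LimitOrder:
    symbol: str
    side: str          # "BUY" or "SELL"
    quantity: int
    price: float
    # init=False: the constructor does not accept an order type at all.
    order_type: str = field(default="LIMIT", init=False)


def place_entry(symbol: str, side: str, quantity: int, price: float,
                kill_switch_active: bool) -> LimitOrder:
    """Build an entry order; nothing here can select MARKET or SL-M."""
    if kill_switch_active:
        # Hard stop: the order is never constructed, let alone sent.
        raise KillSwitchActive(f"entry blocked for {symbol}")
    return LimitOrder(symbol, side, quantity, price)
```

Because the order type is not a constructor argument, a caller who wants a non-LIMIT order has to change the code rather than a setting.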

2. Strategies Are Rule-Based and Deterministic. Deliberately.

We get asked why we don’t incorporate ML-driven signals into the strategy engine. The answer is simple: ML adds model risk on top of market risk, and it destroys auditability.

When a trade goes wrong at 10:32 AM — and it will — the firm needs to reconstruct exactly what happened. What was the signal? What was the entry trigger? What were the stop-loss conditions? With a rule-based strategy, every one of these questions has a deterministic, queryable answer. With an ML model, the answer is “the model predicted X with probability Y given these features” — and that answer doesn’t satisfy a compliance team, a client, or a post-mortem investigation.

For execution engines serving professional trading firms, determinism and auditability are not optional features. They are requirements.

The strategies we’ve implemented are:

Opening Range Breakout (ORB): The underlying’s 1-minute OHLC data over the range window [entry_time, breakout_time] defines the range high and low. At the breakout time, broker SL-LIMIT trigger orders are placed at the range high (for long triggers) or range low (for short triggers). The engine doesn’t monitor for the breakout in real time — it delegates that to the broker’s own trigger mechanism, which is faster and more reliable.

Wait & Trade (WAT): At fire time, the strategy reads the option’s current LTP and places a broker SL-LIMIT trigger at a ±offset from that price — either in points or as a percentage. For positional trades, the trigger is re-armed each morning based on the new day’s LTP. The offset is stored on the trade row, so the re-arm calculation is always grounded in the original strategy intent.

Multi-leg strategies (straddles, strangles): A leg-exit policy governs what happens when one leg exits. CLOSE_ALL cascades an exit to all sibling legs when any one of them closes — preventing the dangerous situation where one leg exits on its stop loss while the other remains open as an unhedged directional bet. INDEPENDENT lets each leg manage itself. The default is CLOSE_ALL, because an unintended naked leg is a real-money risk, not an academic concern.
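
The leg-exit policy can be sketched as a small pure function. The policy names CLOSE_ALL and INDEPENDENT come from the description above; the Leg type and the legs_to_exit helper are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class LegExitPolicy(Enum):
    CLOSE_ALL = "CLOSE_ALL"
    INDEPENDENT = "INDEPENDENT"


@dataclass
class Leg:
    symbol: str
    is_open: bool = True


def legs_to_exit(legs: list[Leg], closed: Leg,
                 policy: LegExitPolicy = LegExitPolicy.CLOSE_ALL) -> list[Leg]:
    """Return the sibling legs that should be exited after `closed` exits."""
    if policy is LegExitPolicy.INDEPENDENT:
        return []  # each leg manages itself
    # CLOSE_ALL: cascade the exit to every sibling that is still open.
    return [leg for leg in legs if leg is not closed and leg.is_open]
```

Making CLOSE_ALL the default argument mirrors the default described above: the naked-leg outcome has to be opted into.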

Multi-strategy orchestration: The scheduler fires multiple strategies simultaneously. Each strategy runs with its own independent run state, its own legs, its own SL tracking, and its own kill switch scope. Running two strategies at once doesn’t require coordination between them — they are fully isolated at the data model level.

3. Live, Paper, and Backtest Are One Code Path

The architectural decision that most platforms get wrong — including expensive commercial platforms — is treating live, paper, and backtest as separate systems.

When live and paper use different code paths, the paper results don’t mean anything. When backtest uses a different order-routing layer than live, the backtest doesn’t replicate what would actually happen. You end up with three systems that diverge in subtle, critical ways, and you discover the divergence after a live loss.

Our approach: mode as a first-class parameter throughout the system.

The same engine code runs all three modes. The same strategy executor fires. The same reconciliation logic runs. The same risk monitor evaluates every tick. Mode is a property of a trade and a strategy run — not a property of the codebase.

We enforce this with a CI lint check that fails the build if any branch of the form if mode == "backtest" appears in business logic outside three specifically sanctioned files. The sanctioned files are the broker factory (which must dispatch to different broker clients per mode), the order tag formatter (which prefixes tags differently per mode), and the alert formatter (which prepends a [LIVE] or [PAPER] badge to Telegram messages). Everywhere else, mode is invisible.
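
A lint of this kind can be quite small. The sketch below is one possible shape, not the actual check: the sanctioned file names are placeholders (the article does not give the real paths), and the regular expression is a deliberately simple heuristic that would miss more indirect forms of mode branching.

```python
import re
from pathlib import Path

# Hypothetical allow-list; the real sanctioned paths are not given in the article.
SANCTIONED = {"broker_factory.py", "order_tag.py", "alert_format.py"}

# Matches lines such as `if mode == "backtest":` or `elif run.mode != 'live':`.
MODE_BRANCH = re.compile(r"\b(?:if|elif)\b[^\n#]*\bmode\s*(?:==|!=)\s*[\"']")


def find_violations(root: Path) -> list[tuple[str, int]]:
    """Return (relative path, line number) for mode branches outside the allow-list."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        if path.name in SANCTIONED:
            continue
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if MODE_BRANCH.search(line):
                hits.append((str(path.relative_to(root)), lineno))
    return hits
```

A CI step would call find_violations on the source root and exit non-zero when the returned list is not empty.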

The key technical decision that makes this work: SL and target trigger prices are anchored to the strategy’s intended entry price, not the actual fill price. If the strategy intended to enter at 150, the SL at 10% below is 135 — regardless of whether the actual fill came in at 148 or 152. The trigger ladder is identical across all three modes. Only the fill price differs — in backtest, that’s captured by a slippage model; in live, it’s captured by real market spread.

This means when you run a strategy in paper mode alongside live mode, you can compare results and know that any divergence is fill quality, not logic divergence.
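
The arithmetic behind that claim, as a small sketch. The function names are illustrative, and only the long side is shown.

```python
def sl_trigger(intended_entry: float, sl_pct: float) -> float:
    """Long-side stop trigger computed from the intended entry, never from the fill."""
    return intended_entry - intended_entry * sl_pct / 100.0


def pnl_if_stopped(fill_price: float, trigger: float) -> float:
    """Per-unit P&L for a long position that exits exactly at the trigger."""
    return trigger - fill_price


trigger = sl_trigger(150.0, 10.0)          # 135.0, whatever the fill was
loss_good_fill = pnl_if_stopped(148.0, trigger)   # -13.0
loss_bad_fill = pnl_if_stopped(152.0, trigger)    # -17.0
```

The trigger is the same 135 in both cases; the four-point difference in outcome comes entirely from the fill.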

For backtest specifically: 1-minute OHLC data from the broker’s historical API is the source. Each bar generates 4 synthetic ticks in a deterministic pattern (Open → Low → High → Close for bearish bars, Open → High → Low → Close for bullish bars). A slippage model approximates fill quality. Metrics produced: win rate, profit factor, maximum drawdown, expectancy, equity curve. The backtest determinism test in CI asserts that running the same strategy with the same parameters twice produces an identical result.
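
A sketch of the bar-to-tick expansion, following the ordering stated above. The Bar type and function name are illustrative. How a bar with close equal to open is handled is not specified in the text; treating it like a bullish bar is an assumption made here.

```python
from typing import NamedTuple


class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def synthetic_ticks(bar: Bar) -> list[float]:
    """Expand one OHLC bar into four ticks in a fixed, repeatable order."""
    if bar.close < bar.open:
        # Bearish bar: Open -> Low -> High -> Close, as described above.
        return [bar.open, bar.low, bar.high, bar.close]
    # Bullish bar (and, by assumption, a flat bar): Open -> High -> Low -> Close.
    return [bar.open, bar.high, bar.low, bar.close]
```

Since the function is pure, feeding the same bars twice yields the same tick sequence, which is the property a determinism test relies on.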

4. Execution Engineering: Where Most Platforms Are Hollow

The gap between a platform that “places orders” and a platform that executes reliably in production is wider than it looks. Here is where the real engineering lives.

LIMIT-only order placement: Not just a regulatory compliance checkbox — a structural property of the order placement layer. Every order that leaves the engine is a LIMIT order. The reprice loop handles the case where the limit doesn’t fill immediately.

The reprice loop: The entry LIMIT order is placed at LTP. If it is not filled within the next tick cycle, the limit price is recalculated and the order is modified. Each modify counts against a cap of 25 modifications per order (a limit that applies at some brokers). The loop also has a budget: a maximum percentage distance from the original intent price that the engine is willing to chase. When the budget is exhausted, the order is cancelled and an alert fires. Chasing a runaway price without a cap is how one order can blow through a position’s entire risk budget.
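
The control flow of such a loop might look like the sketch below, shown for a BUY entry only. The names and the shape of ltp_stream are assumptions: the stream stands in for the broker feed and fill check, which are not shown, and cancellation and alerting are left to the caller.

```python
from dataclasses import dataclass

MAX_MODIFIES = 25  # per-order modification cap cited in the text


@dataclass
class RepriceResult:
    filled: bool
    modifies_used: int
    last_price: float
    reason: str


def reprice(intent_price: float, ltp_stream, budget_pct: float,
            max_modifies: int = MAX_MODIFIES) -> RepriceResult:
    """Chase a BUY limit toward each new LTP until filled, capped, or out of budget.

    `ltp_stream` yields (ltp, filled) pairs, one per tick cycle.
    """
    ceiling = intent_price * (1 + budget_pct / 100.0)  # furthest price we will chase
    price, modifies = intent_price, 0
    for ltp, filled in ltp_stream:
        if filled:
            return RepriceResult(True, modifies, price, "FILLED")
        if ltp > ceiling:
            return RepriceResult(False, modifies, price, "BUDGET_EXHAUSTED")
        if modifies >= max_modifies:
            return RepriceResult(False, modifies, price, "MODIFY_CAP_REACHED")
        if ltp != price:
            price, modifies = ltp, modifies + 1  # one broker modify
    return RepriceResult(False, modifies, price, "STREAM_ENDED")
```

Any result other than FILLED is the caller's signal to cancel the working order and raise an alert.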

Circuit clamping: Before any SL-LIMIT order is placed, the engine fetches the symbol’s upper and lower circuit limits from the broker and clamps the SL price to a valid range. An order placed at a price outside the circuit range will be rejected by the exchange — but that rejection comes back as a hard error, sometimes without a clear error code. Clamping preemptively avoids the error and keeps the SL state machine clean.
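
The clamp itself is a one-liner. The sketch below assumes the circuit limits have already been fetched from the broker; the function name is illustrative.

```python
def clamp_to_circuit(price: float, lower: float, upper: float) -> float:
    """Force an SL price into the [lower, upper] circuit band before placement."""
    if lower > upper:
        raise ValueError(f"invalid circuit band: {lower} > {upper}")
    return min(max(price, lower), upper)
```

For example, an SL computed at 90 for a symbol with a lower circuit of 95 would be placed at 95 rather than rejected.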

Retry and failure recovery: Broker API calls that fail due to timeouts or transient errors retry with exponential backoff. Symbols that repeatedly return circuit-related rejections are added to a symbol-level pause registry — the tick path won’t attempt to place or modify orders for that symbol until the pause expires. This prevents the tick path from hammering a frozen symbol and eating into the modify budget.
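
Both mechanisms are simple to sketch. The class and function names, the base delay, and the growth factor below are assumptions for illustration. The clock is injectable so the registry can be tested without sleeping.

```python
import time


def backoff_delays(base: float = 0.5, factor: float = 2.0, attempts: int = 4) -> list[float]:
    """Delays (in seconds) to wait before each successive retry."""
    return [base * factor ** i for i in range(attempts)]


class SymbolPauseRegistry:
    """Tracks symbols the tick path should leave alone until a pause expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._until: dict[str, float] = {}

    def pause(self, symbol: str, seconds: float) -> None:
        self._until[symbol] = self._clock() + seconds

    def is_paused(self, symbol: str) -> bool:
        expiry = self._until.get(symbol)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._until[symbol]  # pause expired; forget the symbol
            return False
        return True
```

The tick path would check is_paused(symbol) before any place or modify call, so a frozen symbol costs a dictionary lookup rather than a broker request.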

Broker abstraction layer: Every broker integration — Zerodha, AliceBlue, Tradejini — exposes an identical client interface to the engine. A capabilities registry maps each broker to its specific constraints: the field used for order tags, the maximum tag length, the supported product types per exchange, whether the broker supports automated login. Engine code never branches on broker name. Adding a new broker integration is one new module tree and one new row in the capabilities registry.
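
One way to picture the capabilities registry is as a table of frozen records. The field names follow the constraints listed above, but the broker keys and every value below are placeholders, not real broker limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokerCapabilities:
    tag_field: str                # request field that carries the order tag
    max_tag_length: int
    product_types: dict           # exchange -> tuple of supported product types
    supports_auto_login: bool


# Placeholder rows; real values would come from each broker's documentation.
CAPABILITIES = {
    "broker_a": BrokerCapabilities("tag", 20, {"NFO": ("NRML", "MIS")}, True),
    "broker_b": BrokerCapabilities("remarks", 10, {"NFO": ("NRML",)}, False),
}


def tag_payload(broker: str, tag: str) -> dict:
    """Build the tag portion of an order request without branching on broker name."""
    caps = CAPABILITIES[broker]
    return {caps.tag_field: tag[:caps.max_tag_length]}
```

The engine reads constraints from the row; it never contains a line like if broker == "broker_a".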

Event-driven state machine: Every trade has a state (ENTRY_PENDING → OPEN → EXIT_PENDING → CLOSED) and every state transition writes an event row with a timestamp, the reason for the transition, and the prices involved. Reconciliation is the only mechanism that updates trade state from broker fills — there are no per-trade polling loops running in the background. The reconciliation job runs every 3 seconds and is the single source of truth for what has actually happened at the broker.
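
A compact sketch of the state machine with its event trail. The four states are taken from the text; the Trade class, the event fields, and the in-memory list standing in for the event table are illustrative assumptions. Only the linear path shown above is modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed transitions along the path ENTRY_PENDING -> OPEN -> EXIT_PENDING -> CLOSED.
ALLOWED = {
    "ENTRY_PENDING": {"OPEN"},
    "OPEN": {"EXIT_PENDING"},
    "EXIT_PENDING": {"CLOSED"},
    "CLOSED": set(),
}


@dataclass
class Trade:
    trade_id: int
    state: str = "ENTRY_PENDING"
    events: list = field(default_factory=list)  # stands in for the event table

    def transition(self, new_state: str, reason: str, price: float) -> None:
        """Move to `new_state` and append one event row; reject illegal jumps."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.events.append({
            "trade_id": self.trade_id,
            "from": self.state,
            "to": new_state,
            "reason": reason,
            "price": price,
            "at": datetime.now(timezone.utc),
        })
        self.state = new_state
```

In the design described above, only the reconciliation job would call transition in response to broker fills.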

5. The Data Layer Is an Audit Trail, Not a Log File

Most trading platforms treat their database as application state — a place to track current positions and write logs. We treat it as a forensic record.

Every state transition is an event row. When an entry order fills, an event row is written with the fill price, quantity, timestamp, and the reconciliation pass that detected it. When a kill switch is activated, an event row is written. When the reconciliation job detects that the broker has a fill our DB didn’t know about, an event row records both the detection and the DB update. The event table is append-only and never updated — only new rows are added.

Trade rows separate intent from reality. A trade row holds two kinds of data: what the strategy intended (entry price, stop-loss level, target, risk configuration) and what actually happened (average fill price, actual SL trigger price, exit price). The risk configuration is a snapshot taken at trade creation — if the strategy template is edited tomorrow, the open trade continues operating under the config that was in effect when it was created.

The database schema is UI-agnostic. No column holds a unit in its name (_points, _pct). No column holds composite strings that need parsing at read time. Risk configuration is stored as structured JSONB — {"type": "PERCENTAGE", "value": 10} — not as a string like "10%". Every UI (Telegram bot, web interface, REST API, CLI) translates from user input to this canonical structure before writing to the database. Parsing happens at the boundary, not inside the database.
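
Parsing at the boundary could look like the sketch below. The PERCENTAGE structure matches the example above; the POINTS type name, the accepted "pts" suffix, and the function name are assumptions.

```python
import json
import re

_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(%|pts)\s*", re.IGNORECASE)


def parse_risk_input(text: str) -> dict:
    """Turn user input such as '10%' or '25 pts' into the canonical structure."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        # Reject ambiguous input at the boundary instead of storing it.
        raise ValueError(f"unrecognised risk value: {text!r}")
    kind = "PERCENTAGE" if match.group(2) == "%" else "POINTS"
    return {"type": kind, "value": float(match.group(1))}


# What a UI layer would write to the JSONB column.
payload = json.dumps(parse_risk_input("10%"))
```

Input with no unit is rejected rather than guessed at, so nothing ambiguous reaches the database.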

The reconciliation feedback loop: Every 3 seconds, the reconciliation job runs six sequential passes against the broker’s live orderbook. Among the conditions these passes detect are entry fills, exit fills, orphan orders (broker orders without a matching trade row), zombie positions (trade rows marked OPEN but with no corresponding broker position), and manual exits (positions closed directly at the broker terminal without going through the engine). Broker reality continuously corrects engine state. This is the feedback loop that matters: not a strategy performance feedback loop, but a system correctness feedback loop.
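
Two of those detections, orphan orders and zombie positions, reduce to set differences between engine state and broker state. The sketch below uses plain dictionaries with hypothetical keys; the real job's data shapes are not described in the article.

```python
def find_orphans_and_zombies(db_trades: list[dict], broker_orders: list[dict],
                             broker_positions: list[dict]) -> tuple[list, list]:
    """Compare engine state with broker state.

    Returns (orphan order ids, zombie trade ids).
    """
    known_order_ids = {t["order_id"] for t in db_trades}
    # Orphan: the broker has an order that no trade row refers to.
    orphans = [o["order_id"] for o in broker_orders
               if o["order_id"] not in known_order_ids]

    held = {p["symbol"] for p in broker_positions if p["quantity"] != 0}
    # Zombie: the engine thinks the trade is OPEN but the broker holds nothing.
    zombies = [t["trade_id"] for t in db_trades
               if t["state"] == "OPEN" and t["symbol"] not in held]
    return orphans, zombies
```

Each detection would then be written as an event row before the trade state is corrected.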

6. Risk Controls Are Dynamic, Not Static Rules

A static risk rule is “exit if loss exceeds X.” A dynamic risk engine adjusts what X means as the position moves.

Trailing stop-loss: The SL level moves in the direction of the trade as the position profits. The trail reference is anchored to the strategy’s intended entry price — not the fill price — so the trail ladder is deterministic across all runs of the same strategy. Trail steps can be defined in absolute points or as a percentage of the entry price.
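
One common way to express a trail ladder, shown for a long position in absolute points. The step-and-trail scheme and all names here are assumptions; the article does not spell out the exact rule the engine uses.

```python
import math


def trailed_sl(intended_entry: float, initial_sl: float, best_price: float,
               step: float, trail: float) -> float:
    """Raise the SL by `trail` points for every full `step` points of favorable move.

    The ladder is measured from the intended entry, so it does not depend on the fill.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    steps = max(0, math.floor((best_price - intended_entry) / step))
    return initial_sl + steps * trail
```

With an intended entry of 150, an initial SL of 135, a step of 10 and a trail of 5, a best price of 171 gives two full steps and an SL of 145; a best price below 150 leaves the SL at 135.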

SL-to-cost transition: When a target level is hit, the engine moves the stop-loss to the entry price, locking in breakeven. From this point, the position is set to close at cost or better, barring slippage or a gap through the stop. This transition is automatic and doesn’t require manual intervention.

Combined MTM stops for multi-leg strategies: A straddle has two legs. If one leg is deeply profitable and the other is at its stop, a combined stop based on the total mark-to-market P&L of the strategy run prevents the profitable leg from subsidizing a runaway loss on the other. The stop is evaluated across the portfolio of legs, not per-leg.
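
A worked sketch of the combined check for a short straddle. The tuple layout and function names are illustrative.

```python
def combined_mtm(legs: list[tuple[str, float, float, int]]) -> float:
    """Total mark-to-market P&L across legs given as (side, entry, ltp, quantity)."""
    total = 0.0
    for side, entry, ltp, qty in legs:
        direction = 1 if side == "BUY" else -1
        total += direction * (ltp - entry) * qty
    return total


def combined_stop_hit(legs, max_loss: float) -> bool:
    """True when the strategy run as a whole has lost at least `max_loss`."""
    return combined_mtm(legs) <= -abs(max_loss)


# Short straddle: the call has run against us, the put is in profit.
straddle = [("SELL", 100.0, 180.0, 50),   # -4000
            ("SELL", 100.0, 40.0, 50)]    # +3000
```

Here the run is down 1000 overall, so a combined stop of 800 fires even though one leg shows a 3000 profit.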

Kill switch severity: SOFT severity blocks new entries but leaves existing positions running. HARD severity blocks entries and exits all open positions immediately. The severity is set per kill switch activation. A daily loss limit breach might trigger a SOFT kill; a broker connectivity outage might trigger HARD.
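
Mode scope and severity together can be modeled as two small predicates. The type and function names are assumptions, as is the use of lowercase mode strings.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    SOFT = "SOFT"   # block new entries, leave open positions running
    HARD = "HARD"   # block entries and exit all open positions


@dataclass(frozen=True)
class KillSwitch:
    severity: Severity
    modes: frozenset   # e.g. {"live"} or {"live", "paper", "backtest"}


def blocks_entry(active: list[KillSwitch], mode: str) -> bool:
    """Any active switch scoped to this mode blocks new entries."""
    return any(mode in ks.modes for ks in active)


def must_flatten(active: list[KillSwitch], mode: str) -> bool:
    """Only a HARD switch scoped to this mode forces exits."""
    return any(mode in ks.modes and ks.severity is Severity.HARD for ks in active)
```

A SOFT switch scoped to live trading blocks live entries while paper and backtest runs continue untouched.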

Order-level circuit breakers: Separate from position-level risk controls. The modify-count cap, the reprice budget cap, and the symbol-level pause registry are circuit breakers at the order execution layer. They prevent execution infrastructure problems from compounding into larger losses.

7. Reliability Is Designed In, Not Added After

Startup reconciliation: On every engine restart, before accepting any new signals, the engine runs a full reconciliation against the broker orderbook and rebuilds all open trade states. There is no manual “resync” step after a restart. This is the same reconciliation job that runs every 3 seconds during operation — not a separate recovery path.

No silent failures: Every operation either succeeds or logs and alerts. A broker API call that fails raises an exception that propagates to an alert. A reconciliation pass that can’t complete emits a warning. An SL placement that exhausts its retry budget triggers a Telegram message to the operator. There is no code path where an error is swallowed and execution continues as if nothing happened.

Alert tiers:

INFO → Telegram (strategy signal fired, position opened)
WARN → Telegram (tick data stale, retry in progress)
ERROR → Telegram + email (SL placement failed, reconciliation error)
CRITICAL → Telegram + email + external uptime monitor (kill switch triggered, process down, broker unreachable during market hours)
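
The tier table maps naturally onto a lookup. The channel identifiers below are placeholders, and escalating an unknown level to CRITICAL is an assumption made here in the spirit of "no silent failures".

```python
ROUTES = {
    "INFO": ("telegram",),
    "WARN": ("telegram",),
    "ERROR": ("telegram", "email"),
    "CRITICAL": ("telegram", "email", "uptime_monitor"),
}


def channels_for(level: str) -> tuple[str, ...]:
    """Return the channels an alert of this level should be sent to."""
    try:
        return ROUTES[level]
    except KeyError:
        # Unknown level: escalate rather than drop the alert silently.
        return ROUTES["CRITICAL"]
```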

Deploy discipline: No deployments during market hours (09:15–15:30 IST). No deployments on Friday after 15:00. Every release is git-tagged. A full Postgres dump runs before any schema migration. The recovery procedure from a failed deployment is documented and drilled before it’s needed.
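
The deployment windows can be enforced by a guard in the release script. The sketch below is one possible version: the function name is illustrative, it treats market hours as applying Monday to Friday, and it ignores exchange holidays, both of which are simplifying assumptions. IST has no daylight saving, so a fixed offset is enough.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), "IST")
MARKET_OPEN, MARKET_CLOSE = time(9, 15), time(15, 30)
FRIDAY_CUTOFF = time(15, 0)


def deploy_allowed(now: datetime) -> bool:
    """Check a timezone-aware timestamp against the deploy rules above."""
    local = now.astimezone(IST)
    t = local.time()
    if local.weekday() < 5 and MARKET_OPEN <= t <= MARKET_CLOSE:
        return False  # market hours on a weekday
    if local.weekday() == 4 and t >= FRIDAY_CUTOFF:
        return False  # Friday after 15:00
    return True
```

The caller should pass an aware datetime, for example datetime.now(timezone.utc), so the conversion to IST does not depend on the server's local timezone.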

8. Security Is a Property of the Data Model

Broker token encryption: Broker API tokens are encrypted at rest using Fernet symmetric encryption before being stored in the database. The encryption key is in the environment, not in the codebase. Token rotation happens automatically each day as part of the broker’s own session refresh cycle.

Secrets management: All secrets live in an .env file on the server — never in the repository. The repository contains .env.example with key names and no values. This is the simplest approach that works for a self-hosted, single-tenant deployment.

Multi-user isolation by design: Every database table has a user_id column from the first migration. Every query filters by user_id. In a single-tenant deployment, this column is always 1. In a multi-tenant deployment — which this architecture supports without a schema change — the same column becomes the isolation boundary. This is the cheapest investment in future capability: one extra column per table, every query already scoped correctly.

Audit trail as compliance record: The event table (described above) is the forensic record. If an exchange or regulator asks “what happened to this position on this date,” the answer is a query against the event table — complete with timestamps, prices, system actions, and the reconciliation passes that detected each state change. No manual reconstruction needed.

Closing

The question isn’t “does your strategy have edge?”

The question is: “Will your infrastructure hold when the market moves against you at 09:17 AM, your broker’s API returns a 429, the symbol hits its circuit limit, and your SL-LIMIT order gets rejected?”

Most platforms are not built to answer that question with confidence. That’s not a criticism — it’s a reflection of the difficulty. Building execution infrastructure that is correct under market stress requires solving a lot of unglamorous engineering problems: reprice loops, circuit clamping, reconciliation, event sourcing, mode parity, broker abstraction. None of these are the exciting parts of a trading platform. All of them are what separate production-grade infrastructure from a prototype.

ThWorks builds the infrastructure layer. The strategy is yours.

Building serious trading infrastructure for your firm? Talk to us at thworks.org.

ThWorks is a fintech technology company. We build execution engines, risk management systems, and trading automation infrastructure for trading firms and licensed financial entities. We do not provide investment advice, trading signals, or financial recommendations of any kind.