PLATFORM ▸ quantitative research & trading ▸ fabric · azure ml · ai foundry · gemini enterprise · neo4j aura

Predict which ASX stocks will go up. Cangler Quant Platform — engineered for point-in-time correctness.

An end-to-end quantitative platform that predicts ASX equity moves from market, announcement, macro, document-intelligence, and graph signals. Every table, feature, rule, source, and service is registered metadata; every join is point-in-time-safe by construction; every model promotion passes a multi-tier governance gate; and every deploy is artifact-authoritative across Microsoft Fabric, Azure ML, Azure AI Foundry, Gemini Enterprise Agent Platform, and Neo4j Aura.

Microsoft Fabric · Lakehouse · Spark · DataPipelines
Azure ML · FLAML · MLflow · metamodel · CPCV · DSR
Azure AI Foundry · Claude Opus 4.6
Gemini Enterprise Agent Platform · Gemini 3.1 · multimodal · embeddings
Neo4j Aura · graph · director · nearology
Fabric SQL DB · asx_metadata control plane
GitHub Actions · contract-aware CI


Isometric rendering of the Cangler Quant Platform showing the Data Estate (Norgate live data, ASX PDFs, alt data) feeding the Fabric Medallion Lakehouse (bronze, silver, gold), with the LLM Intelligence Core (Vertex AI Gemini parsing, Azure AI Claude) on the left, structural alpha edges (Cash Runway, Nearology, Director Network) above, Azure ML and Governance on the right (Machine Learning Factory, Metamodel, Base Model, DSR Governance Gate), and Live Market with TVaR and Copula Risk Overlays on the lower right feeding Execution & Risk. — fig. 00 — platform at a glance: data estate ▸ medallion ▸ llm intelligence ▸ alpha edges ▸ ml + governance ▸ execution & risk

00 — system of interest

A platform engineered to predict which ASX stocks will go up.

The Cangler Quant Platform is an end-to-end research, ingestion, feature engineering, LLM intelligence, ML training, and scoring system for the Australian equities market. Its purpose is concrete: produce calibrated, governance-gated predictions of which ASX-listed stocks are most likely to rise, within explicit holding-period and risk constraints.

It runs on Microsoft Fabric, Azure ML, Azure AI Foundry, Google's Gemini Enterprise Agent Platform (the evolution of Vertex AI), and Neo4j Aura — with a Python wheel (asx_quant) at the centre that owns every piece of reusable logic.

Vendor and public sources land in a Delta Lake medallion (bronze → silver → gold). From there, point-in-time-safe features and labels are produced, base models and a metamodel are trained in Azure ML, and signals reach the pre-trading layer only after a multi-tier statistical governance gate (Deflated Sharpe Ratio over CPCV simulation) certifies them. It is built as a platform first, not as a notebook scratchpad.

value stream / source-to-signal

01 register: source · table · feature · rule
02 ingest: bronze · per-source · immutable
03 conform: silver · point-in-time-safe · universe-agnostic
04 build: gold · features · labels · evidence
05 validate: contracts · data-quality rules · point-in-time evidence
06 train: base + metamodel · Azure ML · MLflow
07 score: backtest · governance evidence
08 promote: continuous integration · smoke · property-based tests · runtime proof

Fabric notebooks · Delta table contracts · DataPipelines · Validation rules · Feature contracts · Source definitions · Architecture docs · Observability configs

// counts reflect the live registry — table, feature, source, rule, service, pipeline metadata is YAML-owned and projected into the Fabric SQL control plane.

01 — engineering pillars

Four invariants the rest of the platform is built on.

Each pillar is enforced in code, not prose. CI fails closed when any of them are violated.

Wheel-first reusable logic

Reusable logic lives in the asx_quant Python wheel; Fabric notebooks are thin orchestration shells that bootstrap context, read metadata, call wheel APIs, and write through contract-aware writers.

▸ asx_quant.transforms.merge
▸ asx_quant.transforms.pit_join
▸ asx_quant.contracts.temporal
▸ asx_quant.harness.PipelineHarness

Metadata-driven everything

Tables, features, rules, sources, services, ML, LLM, observability, governance and pipelines are YAML-owned. Configs are compiled into contracts and projected into the asx_metadata SQL control plane.

▸ configs/tables/*.yaml — 269
▸ configs/features/*.yaml — 68
▸ configs/rules/*.yaml — 113
▸ configs/sources/*.yaml — 211

Point-in-time correctness

Point-in-time safety is a release-blocking invariant. Temporal contracts, centralised as-of joins, leakage scans, point-in-time evidence checks, and Hypothesis property-based tests catch regressions before runtime does.

▸ spine_join() · as_of_join()
▸ apply_temporal_visibility_filter()
▸ check_pit_evidence.py
▸ Fabric property-based test & smoke gates
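A minimal sketch of the as-of semantics described above, in plain Python rather than Spark. It assumes each event carries an `available_at` date (when the fact became knowable) and that the spine is a list of `(entity_id, as_of)` pairs. `Event`, `visibility_filter`, and `as_of_join` are hypothetical stand-ins, not the signatures of `apply_temporal_visibility_filter()` or `as_of_join()` in asx_quant.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    entity_id: str
    available_at: date  # first date the fact was knowable (hypothetical field)
    value: float


def visibility_filter(events, as_of):
    """Keep only events that were already knowable on `as_of`."""
    return [e for e in events if e.available_at <= as_of]


def as_of_join(spine, events):
    """Attach to each (entity_id, as_of) spine row the latest event value with
    available_at <= as_of, or None if nothing was knowable yet.

    Same-day visibility (<=) is an assumption; a stricter rule might require <.
    """
    index = {}
    for e in sorted(events, key=lambda e: e.available_at):
        times, values = index.setdefault(e.entity_id, ([], []))
        times.append(e.available_at)
        values.append(e.value)
    joined = []
    for entity_id, as_of in spine:
        times, values = index.get(entity_id, ([], []))
        i = bisect_right(times, as_of)  # number of events visible on as_of
        joined.append((entity_id, as_of, values[i - 1] if i else None))
    return joined
```

Because the lookup is keyed on `available_at` rather than an event's nominal date, a figure published after the trading date cannot leak into that row.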

Artifact-authoritative deploy

CI builds one canonical wheel artifact per commit. Fabric Environment, Azure ML, and the User Data Function pull the exact same build by SHA; smoke checks verify its identity end to end.

▸ deterministic build_wheel.sh
▸ verify_deployment() smoke
▸ Fabric git-sync ➜ Environment
▸ delegated User Data Function deploy from CI VM
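A sketch of the identity check behind "the exact same build by SHA", assuming each runtime can report the SHA-256 digest of the wheel it installed. `wheel_sha256` and `verify_wheel_identity` are hypothetical helpers, not the platform's `verify_deployment()`; the runtime names in the docstring are illustrative.

```python
import hashlib
from pathlib import Path


def wheel_sha256(path):
    """Stream a wheel through SHA-256 so large artifacts are never fully in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel_identity(ci_sha256, runtime_reports):
    """Fail closed unless every runtime reports exactly the CI-built digest.

    runtime_reports maps a runtime name (e.g. "fabric-env", "azure-ml", "udf")
    to the SHA-256 it says it installed; an empty mapping also fails.
    """
    mismatched = {name: sha for name, sha in runtime_reports.items() if sha != ci_sha256}
    if not runtime_reports or mismatched:
        raise RuntimeError(f"wheel identity mismatch: {mismatched or 'no reports'}")
```

Hashing once in CI and comparing digests downstream, rather than rebuilding per runtime, is what makes a mismatch detectable at all.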

02 — data architecture

Bronze ↦ Silver ↦ Gold, with universe-agnostic semantics.

Raw vendor feeds enter bronze, are conformed to point-in-time-safe shapes in silver, and produce universe-agnostic feature tables in gold. Universe filtering only happens at the Lake View boundary — never inside feature computation.

layer / 01

bronze.*

Raw, per-source, immutable. One table per source/version.

norgate_ohlcv_daily
asx_announcements
norgate_macro
rba_rates
fred_macro

layer / 02

silver.*

Cleaned, conformed, point-in-time-safe. Universe-agnostic.

silver_market_price_daily
silver_corp_action
silver_macro_*
silver_universe_clust_id
silver_news_event

layer / 03

gold.*

Feature tables, labels, evidence, model inputs.

feat_technical_daily
feat_norgate_daily
feat_announcement_daily
feat_llm_*
model_input_*_daily

Data pipeline diagram showing bronze ingestion, parsing and silver layer with Azure AI Foundry and Vertex AI, ML metamodel layer, and gold feature store with structural alpha edges. — fig. 01 — ingestion ▸ silver ▸ ml/metamodel ▸ alpha edges, with control plane & risk gate underneath

03 — solution architecture

Multi-cloud by design, single source of truth by discipline.

Microsoft Fabric is the analytical core. Azure ML trains and scores. Azure AI Foundry hosts reasoning routes. Google's Gemini Enterprise Agent Platform (the evolution of Vertex AI) handles multimodal extraction. Neo4j Aura holds the structural-alpha graph — director networks, nearology, corporate edges. The wheel ties them together; YAML configs and the Fabric SQL control plane govern them.

Cangler Quant Platform architecture diagram showing Microsoft Azure (Fabric Lakehouse, AI Foundry with Claude, Azure ML), Google Cloud (Vertex AI Gemini, GCS), CI/CD via GitHub Actions, Norgate VM source, observability surfaces, and structural alpha edges. — fig. 02a — system-of-record: clouds, runtimes, ingestion paths, observability, alpha edges

Architectural zones isometric diagram: Zone 1 Ingestion (Norgate OHLCV, ASX Filings, Alt and Macro Streams), Zone 2 Intelligence Core (Vertex AI Extraction, Azure AI Claude), Zone 3 Data Estate (Gold Feature Store with PIT Spine, Bronze Immutable), Zone 4 Alpha Edges (Cash Runway, Nearology, Attention, Foreign Listings, Brokers, Directors, Narrative), Zone 5 Orchestration (Metadata Control Plane), Zone 6 Machine Learning (Walk-Forward Folds, Base Models, Metamodel Stacking), Zone 7 Risk and Execution (Risk Governance and Guardrails, Live Execution and Telemetry, Escrow). — fig. 02b — same architecture as seven labelled zones: ingestion ▸ intelligence ▸ data estate ▸ alpha edges ▸ orchestration ▸ ml ▸ risk & execution

Azure primary

Microsoft Fabric

Lakehouse asx, Spark notebooks, DataPipelines, Environment, User Data Function source artifact, smoke + Hypothesis property-based test targets.

Azure ML

Azure ML

FLAML AutoML, 5-fold walk-forward, metamodel, scoring, backtest, MLflow workspace tracking.

Azure LLM

Azure AI Foundry

Claude Opus 4.6 reasoning, Anthropic-org capability, AI Services project — routed via the LLM control plane.

GCP agent platform

Gemini Enterprise Agent Platform

Google's Gemini Enterprise Agent Platform — the evolution of Vertex AI. Gemini 3.1 multimodal extraction, embeddings, Model Garden access, batch + sync routes, taxonomy-aware classification.

GCP staging

GCS & BigQuery

Object staging for PDFs and large multimodal payloads; GDELT ingestion via BigQuery with dry-run evidence.

Graph alpha edges

Neo4j Aura

Managed graph database for the structural-alpha edges: director networks, nearology, corporate / promoter graphs. Fail-closed activation gate; canary-evidence required before production use.

Azure runner

CI VM (Linux)

Always-on Standard_D8ads_v7 self-hosted runner for heavy Spark / SQL / property-based tests, Fabric smoke, ML wheel upload, and User Data Function deploy.

Source bridge

Norgate Premium

Windows 365 bridge VM that runs Norgate schema/capability polls and exposes a thin extract surface to Fabric.

04 — detailed system view

External sources, multi-cloud runtimes, control plane, execution, and CI/CD.

External sources feed an ingestion bridge into Microsoft Azure — Fabric Lakehouse, asx_metadata SQL Database, Azure Functions, Azure ML workspace, Azure AI Foundry, Key Vault, and the Linux CI VM. Google Cloud provides the Gemini Enterprise Agent Platform and BigQuery; Neo4j Aura provides graph analytics. The execution layer publishes daily trading signals, risk overlays, and key-performance-indicator feeds. Point-in-time safety, contracts, observability, and runtime proof span vertically across every layer.

Detailed system view showing Development & Agents (GitHub, Codex, Claude Code) at the top, External Sources, Ingestion Bridge, Microsoft Azure Cloud (Fabric Lakehouse, asx_metadata SQL DB, Functions, Azure ML workspace, Azure AI Foundry, Key Vault, AKS CI VM), Google Cloud (Gemini Enterprise Agent Platform, BigQuery), Neo4j Aura graph analytics, Execution Layer (daily trading, risk overlays, KPI feed), Engineering Excellence column (testability, contracts, observability), and a CI/CD swimlane along the bottom (deploy dev, deploy UDF, smoke, promote). — fig. 03a — detailed system: external sources ▸ azure ▸ gcp ▸ neo4j ▸ execution; engineering-excellence column + ci/cd swimlane

High-density isometric platform rendering: Ingestion (Norgate OHLCV feed, Norgate fundamentals data, Norgate dividend events, ASIC mandatory announcements PDFs, ASIC company trade reports, ASIC sub-registry shorts, alternative news streams, global macro data) feeding Microsoft Fabric OneLake bronze immutable storage; central Medallion Lakehouse with Silver Spine, Gold Feature Store, label classification, feat_technical, feat_fundamental; Dual-LLM Extraction engine (Vertex AI Gemini extraction, Azure AI Foundry Claude reasoning); Metadata Control Plane (Fabric SQL DB, YAML contracts, GitHub Actions CI/CD, observability logs, 3-DAG Orchestrator); Alpha Edges (Cash Runway, Nearology Graph, Narrative Churn, Foreign Supply, Media Content, Foreign Attention); ML Mega-Grid with Metamodel Stacking, Combinatorial Purged CV + walk-forward folds, base models; Copula Risk Engine (TVaR limits, emergency circuit breakers), Live Broker Telemetry, Broker Execution Adapter. — fig. 03b — full-density isometric: every named component on one canvas — ingestion ▸ dual-LLM extraction ▸ medallion ▸ control plane ▸ alpha edges ▸ ML mega-grid ▸ copula risk engine ▸ broker

05 — metadata & data quality

Data is governed before it lands. Schema, validation, partitioning, lineage, and alerts all live in metadata.

Cross-cutting invariants: contract = truth, point-in-time-safe, wheel-first, fail-closed, CI-gated, zero tech debt. Every persisted Delta table has a registered contract; every join routes through wheel-level helpers; every validation rule is YAML that compiles into the runtime rule registry; and any drift between the repo and the live state is caught by a CI parity check.

Data Quality, Contracts and Guardrails diagram. Cross-cutting invariants across the top: Contract = Truth, PIT-Safe, Wheel-First, Fail-Closed, CI Gated, Zero Tech Debt. Five labelled columns: (1) Declarative Metadata — Pipeline, ML, LLM, Observability, Glossary, Services, configs/YAML; (2) Compiled Contracts — semantic models, meta-entity types, semantic versioning; (3) Runtime Enforcement — Claude Agent over a Data Dictionary with mapping logic and language map; (4) CI / PBT / Runtime Proof — Parity, SQL, Temporal contracts, Notebook contract, PBT, Analyzer; (5) Observability and SQL Control Plane with integrated database. Footer reads: Operator Review → YAML Update → Re-Sync, with SCD2 sync. — fig. 05 — five-column metadata + DQ stack: declarative YAML ▸ compiled contracts ▸ runtime enforcement ▸ CI / PBT / runtime proof ▸ observability & SQL control plane.

column / 01

Declarative metadata

Every operational fact lives in YAML — sources, tables, features, rules, services, pipelines, observability, governance, ML, LLM, fixes, glossary terms. Tunable policy and runtime metadata belong in YAML; only mathematical mechanics stay in Python.

tables · 269
features · 68
rules · 113
sources · 211
services · 14
observability · 20

column / 02

Compiled contracts

asx_quant.contracts compiles YAML into the canonical persisted contract — schema, keys, partitioning, write mode, managed metadata, status, and notebook ownership. The compiled contract is the schema authority. The live Delta table is never allowed to become the authority.

▸ load_table_contract() · compiled_table_contract_columns()
▸ resolve_temporal_contract() for PIT expectations
▸ Gold features use two-file pattern: table + feature catalog

column / 03

Runtime enforcement (in the wheel)

All persistence routes through wheel-level helpers so notebooks can't reinvent the merge mechanics or skip safety checks. Centralisation prevents partition drift, duplicate keys, missed updates, and parallel-run conflicts — and makes every rule testable in CI.

▸ contract_write() — aligns DF to contract, derives partitions, fails on missing or unexpected columns
▸ merge_into() — contract-aware Delta MERGE with consistent partition handling
▸ scd2_upsert() — shared registry-style upsert
▸ spine_join() · as_of_join() · forward_fill_events() — point-in-time joins
▸ Pipeline harness resolves rules and writes results to SQL on every run
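To make "aligns DF to contract, derives partitions, fails on missing or unexpected columns" concrete, here is a minimal sketch over plain dict rows instead of Spark DataFrames. `TableContract`, `align_to_contract`, and the monthly `YYYY-MM` partition derived from a date column are assumptions for illustration, not the compiled contract format or `contract_write()` itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableContract:
    columns: tuple           # ordered columns the compiled contract declares
    keys: tuple              # primary-key columns; duplicates are rejected
    partition_column: str    # derived by the writer, never supplied by callers
    partition_source: str    # date column the partition value is derived from


def align_to_contract(rows, contract):
    """Return rows in contract column order, failing closed on any drift."""
    declared = set(contract.columns) - {contract.partition_column}
    seen = set()
    aligned = []
    for row in rows:
        present = set(row)
        missing, unexpected = declared - present, present - declared
        if missing or unexpected:
            raise ValueError(f"missing={sorted(missing)} unexpected={sorted(unexpected)}")
        key = tuple(row[k] for k in contract.keys)
        if key in seen:
            raise ValueError(f"duplicate key {key}")
        seen.add(key)
        out = dict(row)
        # The contract, not the notebook, decides the partition value.
        out[contract.partition_column] = row[contract.partition_source].strftime("%Y-%m")
        aligned.append({c: out[c] for c in contract.columns})
    return aligned
```

A row that tries to supply the partition column itself counts as "unexpected", which is the point: partitioning is derived in one place, so notebooks cannot drift.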

column / 04

CI / property-based tests / runtime proof

Layered enforcement — cheap static checks catch most regressions before Fabric even sees the change; property-based tests cover invariants; smoke and runtime proof verify the live state matches the deployed contract.

▸ check_contract_parity.py — YAML ↔ compiled ↔ runtime projections
▸ check_sql_ddl_parity.py — SQL DDL ↔ YAML
▸ check_yaml_crossrefs.py — reads_from resolves
▸ check_table_contract_drift.py — live SQL stays aligned
▸ Hypothesis property tests under tests/properties/
▸ fabric-pbt.yml — Fabric-side PBT via nb_pbt_fabric notebook
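A sketch of the kind of check `check_yaml_crossrefs.py` performs, assuming the YAML files have already been parsed into dicts keyed by config name and that `reads_from` is a list of config names. The function names and the exit-code convention are hypothetical.

```python
import sys


def unresolved_reads_from(configs):
    """Return (config, missing_ref) pairs for reads_from entries naming no known config.

    `configs` maps config name -> already-parsed YAML mapping (assumed shape).
    """
    known = set(configs)
    return [
        (name, ref)
        for name, cfg in sorted(configs.items())
        for ref in cfg.get("reads_from", [])
        if ref not in known
    ]


def main(configs):
    problems = unresolved_reads_from(configs)
    for name, ref in problems:
        print(f"{name}: reads_from '{ref}' does not resolve", file=sys.stderr)
    return 1 if problems else 0  # a non-zero exit code fails the CI job
```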

column / 05

Observability & SQL control plane

asx_metadata is the runtime ledger. Every pipeline run, every validation result, every freshness deadline lands here through SqlLogger. Views turn raw logs into operator surfaces.

▸ validation_log · pipeline_run_log · pipeline_execution_log
▸ freshness_sla with stale-data alerts
▸ Views: source health · output health · validation summary · pipeline SLA · stale alerts · failure alerts
▸ Registry sync (sync_configs_to_sql()) is SCD2 — full history of every config change
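A minimal Type-2 (SCD2) upsert sketch, assuming each registry entry is a key plus a comparable payload and that `valid_to = None` marks the current vintage. `Version` and `scd2_upsert` are hypothetical; they are not the shared `scd2_upsert()` or `sync_configs_to_sql()` implementations. A changed payload closes the current vintage and appends a new one; an unchanged payload writes nothing.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Version:
    key: str
    attrs: tuple                          # comparable payload, e.g. sorted config items
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None marks the current vintage


def scd2_upsert(history, key, attrs, now):
    """Close the current vintage and append a new one only when the payload changed.

    Older, already-closed vintages are returned untouched.
    """
    current = next((v for v in history if v.key == key and v.valid_to is None), None)
    if current is not None and current.attrs == attrs:
        return list(history)  # no change, no new vintage
    out = [replace(v, valid_to=now) if v is current else v for v in history]
    out.append(Version(key, attrs, valid_from=now))
    return out
```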

microsoft purview

Catalogue + DQ scans

Purview auto-discovers Delta tables in the Fabric workspace. asx_quant.governance.PurviewPublisher enriches what Purview already knows with glossary terms, column metadata, and lineage. Scheduled Purview DQ scans cover completeness, consistency, conformity, accuracy, and uniqueness. Advisory layer — not the runtime DQ source of truth.

delta time travel

Every table is a point-in-time replay

Every persisted table is a Delta table, which means every commit is recorded in a transaction log. Any prior state is queryable with VERSION AS OF or TIMESTAMP AS OF, and accidental writes can be undone with RESTORE TABLE. That gives the platform three properties for free:

▸ Reproducible training — every Azure ML run records the Delta version of its model-input table; backtests can re-fit from the exact bytes the original run saw, even months later.
▸ Auditable lineage — DESCRIBE HISTORY on any feature or model-input table returns who wrote it, with which notebook, when, and the operation; combined with the SQL control plane's correlation IDs this closes the loop from prediction back to commit SHA.
▸ Safe rollback — a bad write is one RESTORE command from being undone, with retention controlled by Delta VACUUM windows (extended beyond the 7-day default for governance-critical tables).

agent context

Agents inherit a working understanding of the platform

The metadata layer doesn't just describe what exists — it describes how things relate. Semantic models, mapping logic, and reasoning logic maps turn the registry into something an agent can navigate the way an architect would: which table feeds which feature, which rule governs which field, what a term like cash runway actually means, and where the implementation lives.

The effect is concrete. Implementation and research agents stop guessing. They propose changes grounded in the platform's actual structure, current state, and recent history, rather than fragmentary grep results — which is what makes agent-led work safe to merge.

// a formal ontology layer is being added via Microsoft Fabric IQ — typed entities, classes, and semantic relationships projected from the metadata registry, so agents reason against an explicit conceptual model rather than inferring it from SQL DDL.

06 — entity resolution

A stock is not its ticker. The platform tracks the company behind the symbol.

On the ASX, a company can rename, re-list, merge, or de-list, and the ticker it leaves behind can be reassigned to a completely unrelated business years later. If a backtest treats XYZ as one thing across history, the result is quietly wrong — and that wrongness is what builds models that look brilliant in development and lose money in production.

Entity resolution and survivorship-bias diagram. On the left, raw vendor data sources — primary market-data vendor, exchange announcements feed, securities-reference vendor — flow into an Entity Resolution Engine that produces canonical identities held in a date-bounded registry with one row per canonical identity per ticker lifetime. A ticker-reuse panel shows the same ASX code XYZ resolving to two different canonical entities — XYZ 2008–2014 (delisted) and XYZ 2019–present (active) — making clear that joining on ticker alone would mix two unrelated entities. On the right, a falsifying universe reconstruction restores the 2014 universe with delisted companies present, contrasted against the survivorship-biased view which has returns biased upward (missing losers), volatility biased downward (worst outcomes missing), and tail risk invisible. The result on the lower right is an accurate, verified backtest against a survivorship-bias-free historical universe. — fig. 06 — raw vendor IDs ▸ entity resolution engine ▸ date-bounded canonical registry ▸ ticker-reuse correctly split ▸ survivorship-bias-free historical universe.

why we do this

Survivorship bias is the biggest one — and it's invisible

Build a universe from today's listed ASX companies and everything that de-listed before today is silently missing: the takeovers, the wind-ups, the companies that went to zero. Backtest on that universe and your returns look great — because the losers were never there to lose money.

The platform ingests Norgate Premium, which carries full delisted history. Every company that ever traded on the ASX stays in the universe, marked with its delisted_date if it eventually exited. A validation rule blocks any release where the delisted symbols don't carry price data before their exit. You can't accidentally backtest on the survivors.
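A sketch of that release-blocking rule, assuming entities arrive as `{entity_id: delisted_date or None}` and prices as `(entity_id, trade_date)` pairs. The function names and shapes are hypothetical; the platform's actual rule is YAML-defined and compiled into its rule registry.

```python
def delisted_without_prices(entities, prices):
    """Return delisted entity_ids with no price row on or before their delisted_date.

    A delisted company with no pre-exit prices has effectively vanished from
    history, which is exactly the survivorship gap the rule guards against.
    """
    first_price = {}
    for entity_id, trade_date in prices:
        prev = first_price.get(entity_id)
        if prev is None or trade_date < prev:
            first_price[entity_id] = trade_date
    return sorted(
        eid
        for eid, delisted in entities.items()
        if delisted is not None
        and (eid not in first_price or first_price[eid] > delisted)
    )


def release_gate(entities, prices):
    """Block the release if any delisted entity lacks pre-exit price history."""
    failing = delisted_without_prices(entities, prices)
    if failing:
        raise RuntimeError(f"survivorship check failed for {failing}")
```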

Two smaller problems sit beside it. Ticker reuse — an old ASX code can be reassigned to a new company, and a naïve join on the symbol attaches the new company's fundamentals to the old company's prices. Cross-source identity drift — the same company appears under different IDs in Norgate, ASX, and ASIC filings, and a join that ignores that silently produces phantom "companies".

how we fix it

One canonical `entity_id` per company, point-in-time

Every table in silver and gold uses a single canonical key — entity_id — instead of the ticker symbol. The mapping from symbol to entity is date-bounded, so a reused ticker resolves to two different entities on either side of the reuse, and a renamed company keeps the same entity across the rename.

The mapping lives in silver.ref_entity_registry: a registered Delta table with full SCD2 history. New evidence writes new vintage rows; historical rows are never mutated. Delta time-travel can reproduce any prior state of the registry, so a model trained six months ago can be re-fit on the exact entity layout that existed at training time.
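A sketch of the date-bounded lookup, assuming `silver.ref_entity_registry` can be read as spans of `(ticker, entity_id, valid_from, valid_to)` with inclusive bounds. The `TickerSpan` and `resolve` names are hypothetical. The point is that the lookup key is `(ticker, date)`, never the ticker alone.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TickerSpan:
    ticker: str
    entity_id: str
    valid_from: date
    valid_to: Optional[date]  # None = still listed under this code


def resolve(registry, ticker, on):
    """Map (ticker, trading date) to the entity that held the code on that date.

    Returns None when the code was unassigned; raises if spans overlap, since
    one code can belong to at most one entity on any given day.
    """
    hits = [
        s.entity_id
        for s in registry
        if s.ticker == ticker
        and s.valid_from <= on
        and (s.valid_to is None or on <= s.valid_to)
    ]
    if len(hits) > 1:
        raise ValueError(f"overlapping spans for {ticker} on {on}: {hits}")
    return hits[0] if hits else None
```

With two spans for XYZ (say a hypothetical `ENT-001` for 2008 to 2014 and `ENT-002` from 2019), a 2010 row resolves to the first entity, a 2021 row to the second, and a 2016 row to `None`. A rename is the mirror case: two tickers, one `entity_id`.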

the joins that depend on it

And sector membership is point-in-time too

Once the entity layer is in place, every downstream join is date-bounded by construction: Norgate via the trading-date range, ASX via stable asx_issuer_id, and LLM-extracted filer names via a confidence-scored resolver.

Sector and cluster groupings get the same treatment. The Global Industry Classification Standard (GICS) is the sector taxonomy maintained by S&P and MSCI that tags every listed company as Financials, Materials, Healthcare, and so on. Norgate carries a GICS field, but it has no historical parameter — using it would silently apply today's sector to every past row (a second-order survivorship bias). Instead, the platform's own clust_id is computed inside silver and stays correct as-of the trading date.

07 — end-to-end flow

Fifteen named layers, each its own contract.

The layers run from cloud infrastructure through data collection, data cleaning, the AI feature laboratory, AI agent intelligence pipelines, structural alpha edges, cluster and network intelligence, the research intelligence engine, the virtual research committee, quantum compute, the base model army, and the super-model (metamodel), to risk management and safety, daily trading, and the central command centre for observability.

End-to-end pipeline diagram with 15 numbered layers spanning cloud infrastructure, data collection, data cleaning, AI feature laboratory, AI agent intelligence pipelines, structural alpha edges, cluster and network intelligence, research intelligence engine, virtual research committee, quantum computing layer, base model army, the super-model metamodel, risk management and safety, daily trading, and the central command centre. — fig. 04 — fifteen-layer reference pipeline; engineering surfaces highlighted

08 — governance & validation

A signal does not promote until it passes the governance gates.

The platform treats every backtested strategy as a hypothesis exposed to multiple-testing risk. A three-tier governance gate runs after training: a single-path walk-forward filter, a Combinatorial Purged Cross-Validation (CPCV) distribution with Deflated Sharpe Ratio correction, and a held-out out-of-sample anchor. Promotion requires all three tiers to pass — Deflated Sharpe Ratio ≥ 0.95 across the CPCV distribution is the statistical-significance gate, and the out-of-sample equity curve must land inside that distribution within one standard deviation.

tier 01 single-path walk-forward

Walk-forward filter

Eliminates obviously bad configurations quickly.

Sharpe ≥ 0.8
Max Drawdown ≥ −25%
Win Rate ≥ 43%
Profit Factor ≥ 1.3

Necessary but not sufficient. Single-path equity curves cannot detect overfitting at scale: by López de Prado's False Strategy Theorem, the expected best Sharpe ratio among N skill-free trials rises with N, so a 325-trial sweep can be expected to surface an apparently strong path purely by chance.
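A sketch of the Tier 01 filter, assuming all four metrics are computed from one series of per-period strategy returns and that Sharpe is annualised with 252 periods per year. The platform may define win rate and profit factor per trade rather than per period; the function names and that choice are assumptions.

```python
import math
import statistics


def tier1_metrics(returns, periods_per_year=252):
    """Sharpe, max drawdown, win rate and profit factor from per-period returns."""
    mean = statistics.fmean(returns)
    sd = statistics.stdev(returns)  # needs at least two returns
    sharpe = mean / sd * math.sqrt(periods_per_year) if sd > 0 else 0.0
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_dd = min(max_dd, equity / peak - 1.0)  # most negative drawdown seen
    gains = sum(r for r in returns if r > 0)
    losses = -sum(r for r in returns if r < 0)
    return {
        "sharpe": sharpe,
        "max_drawdown": max_dd,
        "win_rate": sum(r > 0 for r in returns) / len(returns),
        "profit_factor": gains / losses if losses > 0 else math.inf,
    }


TIER1_FLOORS = {"sharpe": 0.8, "max_drawdown": -0.25, "win_rate": 0.43, "profit_factor": 1.3}


def passes_tier1(metrics, floors=TIER1_FLOORS):
    """Every metric must be at or above its floor (drawdown is negative, so >= -25%)."""
    return all(metrics[name] >= floor for name, floor in floors.items())
```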

tier 02 CPCV · DSR · simulation

Combinatorial Purged CV + Deflated Sharpe

Statistical significance under multiple testing.

Median Sharpe ≥ 0.5
5th percentile > 0
DSR ≥ 0.95

A distribution of equity-curve paths is generated from combinations of held-out fold groups, with purging and embargo applied to every split. The Bailey & López de Prado (2014) DSR is computed on the resulting Sharpe distribution, correcting for trial count, sample length, skew, and kurtosis. Only signals that beat their own noise floor pass.
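The Tier 02 statistic can be sketched directly from Bailey & López de Prado (2014). The expected maximum Sharpe ratio of `n_trials` skill-free trials, `SR0`, becomes the benchmark, and the DSR is the probabilistic Sharpe ratio of the observed result against it. Input conventions are assumptions: `sr` is a per-period (non-annualised) Sharpe, `kurtosis` is raw (3 for a normal), and `sharpe_variance` is the variance of Sharpe ratios across trials.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sharpe_variance):
    """SR0: expected best Sharpe among n_trials skill-free trials (n_trials >= 2)."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    return math.sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr, n_obs, skew, kurtosis, n_trials, sharpe_variance):
    """Probability that the true Sharpe exceeds SR0, adjusting for sample
    length (n_obs), skewness and raw kurtosis of the returns."""
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr**2)
    return _STD_NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)
```

A strategy clears this part of Tier 02 only if `deflated_sharpe(...) >= 0.95`. Raising `n_trials` raises `SR0` and lowers the DSR for the same observed Sharpe, which is why the trial count has to cover every layer that was searched.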

tier 03 out-of-sample anchor

Out-of-sample holdout, anchored

Forward-looking confirmation of the CPCV finding.

Out-of-sample Sharpe ≥ 0.5
Out-of-sample vs CPCV within 1σ

A held-out anchored window the metamodel has never seen. The out-of-sample equity curve must land inside the CPCV distribution — divergence here is a hard fail, regardless of how good Tier 02 looked.
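A sketch of how Tiers 02 and 03 might compose with Tier 01, assuming "within 1σ" means within one standard deviation of the mean CPCV path Sharpe and that the 5th percentile is taken with `statistics.quantiles`. Function names and that interpretation are assumptions; only the thresholds come from the gates above.

```python
import statistics


def tier2_passes(cpcv_path_sharpes, dsr):
    """Median Sharpe >= 0.5, 5th-percentile Sharpe > 0, and DSR >= 0.95."""
    fifth = statistics.quantiles(cpcv_path_sharpes, n=20)[0]  # first of 19 cut points
    return statistics.median(cpcv_path_sharpes) >= 0.5 and fifth > 0 and dsr >= 0.95


def tier3_passes(oos_sharpe, cpcv_path_sharpes, floor=0.5, max_sigmas=1.0):
    """Out-of-sample Sharpe must clear the floor AND sit within max_sigmas
    standard deviations of the mean CPCV path Sharpe; divergence is a hard fail."""
    centre = statistics.fmean(cpcv_path_sharpes)
    spread = statistics.stdev(cpcv_path_sharpes)
    return oos_sharpe >= floor and abs(oos_sharpe - centre) <= max_sigmas * spread


def promote(tier1_ok, tier2_ok, tier3_ok):
    """No partial credit: every tier must pass."""
    return tier1_ok and tier2_ok and tier3_ok
```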

5-Fold Walk-Forward Cross-Validation visualisation: concentric arc rings labelled Fold 1 through Fold 5, with an out-of-sample slice highlighted; below, Training slice plus embargo gap feeds Base Model Training (FLAML); base model out-of-fold predictions (stationary, probabilistic) flow into Metamodel Orchestration with lambda, projection, and exclusion operators. Mathematical notation including lambda and sigma symbols overlays the scene. — fig. 05a — five-fold walk-forward with embargo gap ▸ base-model training (FLAML) ▸ out-of-fold predictions ▸ metamodel orchestration. The combinatorial enumeration of fold orderings is what drives the Tier 02 CPCV distribution above.

5-Fold Walk-Forward Cross-Validation timeline view: time-series equity panels arranged left-to-right showing Fold 1, Fold 2 with a 12-month training window, a 60-day embargo gap, Fold 3, and Fold 5 as the out-of-sample validation window. Below the panels, the training slice plus embargo gap feeds Base Model Training (FLAML); base-model out-of-fold predictions (stationary probabilities) flow into Metamodel Orchestration with Lambdamart LTR, formula-read-weighted Sharpe, and canonical-correlation-analysis operators. Mathematical symbols lambda, rho, sigma overlay the scene. — fig. 05b — same scheme on the time axis: 12-month training window, 60-day embargo gap, fold boundaries on the equity series, then the out-of-sample validation window. This is the temporal projection of fig. 05a.
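The splits behind these figures can be sketched as Combinatorial Purged CV over integer time indices. Assumptions: samples fall into `n_groups` contiguous groups, each label spans `horizon` steps forward, and `embargo` is a count of samples (mapping fig. 05b's 60-day gap onto it is left to the caller). `cpcv_splits` and `n_backtest_paths` are hypothetical names.

```python
from itertools import combinations
from math import comb


def cpcv_splits(n_samples, n_groups, k_test, horizon, embargo):
    """Yield (train_idx, test_idx) for every combination of k_test held-out groups.

    Assumes the label of sample i spans times [i, i + horizon]. A training sample
    is dropped if its label window overlaps any test label window (purging) or if
    it starts within `embargo` steps after a test block's labels end (embargo).
    """
    bounds = [(g * n_samples // n_groups, (g + 1) * n_samples // n_groups)
              for g in range(n_groups)]
    for test_groups in combinations(range(n_groups), k_test):
        blocks = [bounds[g] for g in test_groups]
        test = [i for lo, hi in blocks for i in range(lo, hi)]
        train = [
            i for i in range(n_samples)
            # Test labels cover [lo, hi - 1 + horizon]; this also excludes test rows.
            if not any(i + horizon >= lo and i <= hi - 1 + horizon + embargo
                       for lo, hi in blocks)
        ]
        yield train, test


def n_backtest_paths(n_groups, k_test):
    """Full paths CPCV can stitch together: (k/N) * C(N, k), which equals C(N-1, k-1)."""
    return comb(n_groups - 1, k_test - 1)
```

With 6 groups and 2 held out, this yields C(6, 2) = 15 splits that recombine into 5 full backtest paths, giving the Sharpe distribution that Tier 02 scores.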

Simulation and Risk Engine — DSR Verification Pathway, the canonical knowledge-graph fusion and reward-engineering loop. Four columns: Data Ingestion (bi-temporal knowledge-time state ingestion, Norgate PIT data, optional public registries, Stage 1 / Stage 2 curated data, gold feature store, geospatial coordinate reconciliation); Core Refinery and Entity Resolution precursor (ADR-008 canonical ER layer); Simulation Loop — the Heart (shadow modelling and chamber testing, ML training pipeline with FLAML AutoML DSR-verified, training-window policy net action selection, PIT KG node embeddings, trading orchestrator, derived-data non-reversibility shield, IBKR production simulator and microstructure engine with slippage estimator, spread modelling, portfolio constraints, strict ADV caps, portfolio and risk manager gate); Governance and DSR Enforcement (iterate on hyperparameters and re-train on fail, DSR ≥ 0.95 deflated Sharpe ratio gate, pass to deployment); Deployment and Execution post-gate (live capability sync, trading orchestrator with automated IBKR API TWAP/VWAP order parenting and ADV caps, derived-data non-reversibility shield). — fig. 05c — simulation & risk engine: DSR verification pathway. Data ingestion ▸ entity resolution ▸ simulation loop (shadow modelling, FLAML training, IBKR production simulator) ▸ DSR ≥ 0.95 gate ▸ deployment & execution.

// risk overlays applied at the execution boundary

TVaR · Tail Value-at-Risk monitored continuously; portfolio-level limits encoded in YAML governance configs.

Clayton copula · Joint lower-tail dependence on held positions; reject new entries while the joint crash probability exceeds its threshold.

Skew & kurtosis monitoring · Per-strategy higher-moment surveillance; auto-pause on regime-shift signatures.

Effective hypothesis count · DSR correction tracks the total trial count across base-model, metamodel, and backtest layers, so selection bias is accounted for end to end.
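A sketch of the two tail measures, assuming an empirical (historical) TVaR estimator and a Clayton copula fitted by inverting Kendall's tau, which is one common approach rather than necessarily the platform's. `allow_new_entry` and its `max_lower_tail` ceiling are hypothetical stand-ins for the YAML-configured limit.

```python
import math


def historical_tvar(returns, alpha=0.95):
    """Tail Value-at-Risk (expected shortfall): mean loss over the worst
    (1 - alpha) share of outcomes, reported as a positive number."""
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    tail = max(1, round((1 - alpha) * len(losses)))  # round() absorbs float noise
    return sum(losses[:tail]) / tail


def clayton_theta_from_tau(tau):
    """Invert Kendall's tau = theta / (theta + 2) for the Clayton copula (0 < tau < 1)."""
    return 2 * tau / (1 - tau)


def clayton_lower_tail_dependence(theta):
    """lambda_L = 2 ** (-1 / theta): limiting probability that one position is in
    its extreme lower tail given that the other already is."""
    return 2 ** (-1 / theta)


def allow_new_entry(tau_with_book, max_lower_tail=0.5):
    """Hypothetical gate: reject entries whose joint lower-tail dependence with
    the held book exceeds the configured ceiling."""
    if tau_with_book <= 0:
        return True  # no positive concordance, so no Clayton lower-tail dependence
    theta = clayton_theta_from_tau(tau_with_book)
    return clayton_lower_tail_dependence(theta) <= max_lower_tail
```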

09 — ai engineering operating model

Humans, agents, and CI as one delivery system.

AI agents are productive but untrusted executors. Prose instructions help — the real safety model is executable: contracts, tests, CI, manifests, CODEOWNERS, protected paths, and repeatable deployment workflows.

●

User

decides & gates

Scope & sequencing
Architecture trade-offs
Feature approval
Prod promotion

◆

Codex

reviews & specifies

Architecture review
Spec ownership
Risk & point-in-time review
Acceptance gates

▲

Claude Code

implements in scope

Mechanical implementation
Notebook + wheel edits
Tests & handoffs
Repo guardrail conformance

■

CI & branch protection

enforces deterministically

Contract / DDL parity
Repo rules & notebook contracts
Property-based tests, smoke, runtime proof
Deployment identity

Cross-harness agent handoff system (orchestration plane): the Claude Code Agent on the left and the Codex Agent on the right hand off work through Git (immutable refs, pinned commit SHAs), an Azure Service Bus wake signal, a Cosmos DB live-state and searchable-history store in Fabric, and human session notes in Obsidian on a Windows 365 VM. A concurrent-work model shows load-bearing cheap immutable commits alongside isolated test and build resources, with role symmetry giving each agent a finishing-turn handoff. CI gates act as the validation and done-arbiter via contract parity and point-in-time leakage checks. — fig. 07 — cross-harness agent handoff: git immutable refs ▸ service bus wake signal ▸ cosmos live state ▸ obsidian human note ▸ concurrent work model ▸ CI done-arbiter

// research intelligence loop

Claude Code requests synthesis. Gemini DeepMind returns research-grade definitions. The platform gets new features.

Implementation agents do not invent quantitative features in isolation. When Claude Code identifies a gap — a missing alpha vector, a new market-microstructure signal, a feature the architecture requires but doesn't yet implement — it delegates the research step over the Model Context Protocol to Gemini DeepMind, which has the deep-reasoning depth and the academic-corpus access needed to synthesise a candidate definition. Gemini returns the mathematical formulation, citations (market microstructure, canonical correlation analysis, learning-to-rank), and the small-cap-context constraints. Claude Code then implements the feature inside the platform's contracts, tests, and continuous-integration gates.

Both agents stay inside their lane. Claude Code does not invent novel quantitative theory; Gemini DeepMind does not touch production code. The Model Context Protocol is the contract between them.

Research intelligence loop: Claude Code (Implementation and Orchestration Agent) on the left, Gemini DeepMind (Deep Reasoning and Research Specialist) on the right, exchanging via the Model Context Protocol. Centre shows NEW FEATURE: ASX_ALPHA_VECTOR_v2 with mathematical formulae, time-decay weights, and accuracy probability. Arrows label Claude requesting synthesis and Gemini providing definition. Right side shows academic papers (market microstructure, CCA, learning-to-rank), ASX small-cap market context, liquidity maps, and information dispersion rates. Left side shows platform codebase panel. A person in an AR headset gestures at the scene. — fig. 06 — Claude Code ▸ MCP ▸ Gemini DeepMind ▸ research synthesis ▸ feature definition ▸ implementation in repo

// repo guardrails — turning project memory into executable checks

check_contract_parity.py · table / feature contracts ↔ runtime projections
check_sql_ddl_parity.py · YAML ↔ SQL DDL column parity
check_repo_rules.py · banned patterns, unsafe thresholds, fold edits
check_notebook_contracts.py · required Fabric notebook artifact shape
check_pit_evidence.py · point-in-time evidence presence on feature surfaces
check_pit_leakage_regression.py · regression scan for forward-leaking joins
check_yaml_crossrefs.py · reads_from & cross-config references
check_public_api_tests.py · new exported wheel APIs must have tests
check_docs_impact.py · contract / arch changes ↔ docs updates
check_handoff_template.py · required sections on agent handoffs
check_circular_deps.py · config dependency cycles
check_test_count.py · test count floor (no silent test deletion)
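To illustrate how a guardrail like these turns a rule into an executable check, here is a minimal sketch in the spirit of `check_circular_deps.py`. It assumes each config's `reads_from` list has already been loaded into a dict mapping config name to its dependencies; the real script's inputs and output format are not shown in this document. It uses an iterative three-colour depth-first search, so deep dependency chains cannot hit Python's recursion limit.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping, Sequence
from typing import Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished


def find_cycle(graph: Mapping[str, Sequence[str]]) -> Optional[list[str]]:
    """Return one dependency cycle (first node repeated at the end),
    or None if the reads_from graph is acyclic."""
    colour: dict[str, int] = {}
    for root in sorted(graph):
        if colour.get(root, WHITE) != WHITE:
            continue
        colour[root] = GREY
        path = [root]  # nodes currently on the DFS path, in order
        stack: list[tuple[str, Iterator[str]]] = [(root, iter(graph.get(root, ())))]
        while stack:
            node, deps = stack[-1]
            for dep in deps:
                state = colour.get(dep, WHITE)
                if state == GREY:
                    # dep is already on the path: the slice from dep is a cycle.
                    return path[path.index(dep):] + [dep]
                if state == WHITE:
                    colour[dep] = GREY
                    path.append(dep)
                    stack.append((dep, iter(graph.get(dep, ()))))
                    break  # descend; this frame's iterator resumes later
            else:
                # Every dependency of node is explored: mark it finished.
                colour[node] = BLACK
                path.pop()
                stack.pop()
    return None
```

A CI wrapper would exit non-zero when `find_cycle` returns a list and print the cycle, for example `silver.a -> gold.b -> silver.a`, so the failing edge is visible in the log.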

10 — delivery topology

From commit to live Fabric, one artifact at a time.

Each commit produces one canonical wheel. Fabric Environment, Azure ML, and the live User Data Function all consume that exact SHA-pinned artifact. Promotion to Prod is a deliberate human action through Fabric Deployment Pipelines, after which main is advanced to the promoted SHA for bookkeeping.
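The "exact SHA-pinned artifact" property can be checked mechanically. A minimal sketch, assuming each runtime can report the SHA-256 of the wheel it actually loaded (how each runtime exposes that digest, and the runtime names used, are assumptions here): hash the CI-built wheel once, then require every runtime's reported digest to match it.

```python
import hashlib
from collections.abc import Mapping
from pathlib import Path


def wheel_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the wheel through SHA-256 so large artifacts are never
    read into memory in one piece."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_runtimes(ci_digest: str, reported: Mapping[str, str]) -> list[str]:
    """Return the runtimes whose loaded wheel differs from the CI artifact.
    An empty list means every runtime consumes the same canonical wheel."""
    return sorted(
        name for name, d in reported.items() if d.lower() != ci_digest.lower()
    )
```

Comparing content digests rather than version strings also catches a wheel that was rebuilt under an unchanged version number.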

1. commit → push to dev

Feature branches are squash-merged into dev. Pre-commit hooks run Ruff, mypy, and project-specific local checks; CI is the merge gate.

2. CI: lint · type · contract · test · build

Lint, mypy, repo rules, contract / SQL / notebook parity, unit, Spark, and SQL integration tests, and Hypothesis property-based tests run first; CI then builds one canonical wheel artifact.

3. deploy-dev: Fabric git sync ▸ Environment upload

The Fabric workspace syncs from the source tree and binds the pipeline notebooks; the job then uploads the exact CI-built wheel to the Dev Environment and polls until publishing completes.

4. deploy-udf-dev: delegated User Data Function update

The Azure Linux self-hosted runner uses cached delegated Fabric tokens to call updateDefinition on the live User Data Function with the same wheel artifact: never re-built, never re-hashed.

5. fabric-smoke-test: pytest + notebook smoke

A smoke pytest run and a Fabric notebook smoke run verify build metadata, wheel identity, and key control-plane invariants on the live workspace.

6. Dev → Prod promotion (human-gated)

A user initiates promotion through Fabric Deployment Pipelines. After a successful promotion, main is advanced to the promoted SHA and an annotated release tag is created.
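Step 4 implies that the User Data Function payload is assembled from the CI wheel's bytes as-is. A minimal sketch of building such a request body, assuming the generic Fabric item-definition shape (`{"definition": {"parts": [...]}}` with base64 `InlineBase64` parts) used by Fabric's updateDefinition endpoints. The part paths are hypothetical placeholders, not the User Data Function's documented layout, and token handling and the HTTP call are deliberately left out.

```python
import base64
import hashlib


def definition_part(path: str, data: bytes) -> dict:
    """One item-definition part: raw bytes carried as inline base64."""
    return {
        "path": path,
        "payload": base64.b64encode(data).decode("ascii"),
        "payloadType": "InlineBase64",
    }


def build_update_body(
    wheel_name: str, wheel_bytes: bytes, expected_sha256: str, function_source: str
) -> dict:
    """Assemble an updateDefinition request body from the CI-built wheel.

    Refuses to proceed unless the bytes hash to the digest CI recorded,
    so a rebuilt or altered wheel is never shipped. Part paths below are
    hypothetical placeholders."""
    actual = hashlib.sha256(wheel_bytes).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"wheel digest {actual} != CI digest {expected_sha256}")
    parts = [
        definition_part("function_app.py", function_source.encode("utf-8")),  # hypothetical path
        definition_part(f"Libraries/{wheel_name}", wheel_bytes),  # hypothetical path
    ]
    return {"definition": {"parts": parts}}
```

The digest check does not re-hash the artifact into a new identity; it only confirms that the bytes being uploaded are the bytes CI produced.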

Delivery topology diagram. On the left, a Developer Zone where feature branches merge into dev through a merge gate after pre-commit hooks (Ruff, mypy, local checks). The CI pipeline runs lint, type, contract, and test stages and emits one canonical wheel artifact, identified by SHA. The Dev Deployment Zone consumes that exact artifact via Fabric git sync, environment upload to the Dev Environment, and a delegated update to the live User Data Function on the Azure Linux self-hosted runner using cached delegated Fabric tokens. A smoke-test stage runs Fabric pytest and notebook smoke checks to verify build metadata, wheel identity, and key control-plane invariants. The Prod Deployment Zone is reached through a human-gated promotion in Fabric Deployment Pipelines. A maintenance and bookkeeping panel on the right advances main to the promoted SHA and creates the annotated release tag. — fig. 10 — one canonical wheel propagates SHA-pinned from dev merge through CI, Fabric Environment, the live User Data Function, smoke tests, and the human-gated Prod promotion.

11 — contracts as agent boundaries

Six contract surfaces. Every line of platform code lives behind one of them.

contract / table

Delta schema, status, keys, partitioning, write mode, notebook ownership — declared in configs/tables/*.yaml, compiled, enforced at write.

contract / feature

Gold feature columns, lookbacks, warmups, point-in-time notes, reads_from dependencies, and per-column type/nullability.

contract / rule

Validation rule families, severity, check type, target scope — synced into the SQL control plane as the runtime rule registry.

contract / temporal

Point-in-time join shapes, as-of/spine semantics, write-time visibility windows, and evidence expectations for feature/label surfaces.

contract / notebook

Required Fabric artifact shape: .Notebook directory, notebook-content.py, notebook-settings.json, .platform.

contract / deployment

Canonical wheel identity, User Data Function source / wheel payload shape, smoke expectations, and exact-SHA artifact propagation across runtimes.
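The contract/temporal surface is the one that guards against look-ahead. A minimal sketch of an as-of join with a visibility rule, assuming each source row carries an `available_at` timestamp (when the platform could first have seen the value, such as an announcement's release time) and that a spine row at time `as_of` may only see rows with `available_at <= as_of`. The field names are illustrative, not the platform's contract keys.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    ticker: str
    available_at: datetime  # when the value became visible to the platform
    value: float


def as_of_join(
    spine: list[tuple[str, datetime]], observations: list[Observation]
) -> list[Optional[float]]:
    """For each (ticker, as_of) spine row, return the latest value with
    available_at <= as_of, or None if nothing was visible yet.

    Keying on availability time rather than event time is what keeps the
    join point-in-time safe: a value published after as_of is never used."""
    by_ticker: dict[str, list[Observation]] = defaultdict(list)
    for obs in observations:
        by_ticker[obs.ticker].append(obs)
    index: dict[str, tuple[list[datetime], list[float]]] = {}
    for ticker, rows in by_ticker.items():
        rows.sort(key=lambda o: o.available_at)
        index[ticker] = ([o.available_at for o in rows], [o.value for o in rows])

    out: list[Optional[float]] = []
    for ticker, as_of in spine:
        times, values = index.get(ticker, ([], []))
        pos = bisect_right(times, as_of)  # count of rows with available_at <= as_of
        out.append(values[pos - 1] if pos else None)
    return out
```

The inclusive `<=` follows the usual as-of convention; whether the platform's write-time visibility windows are inclusive or add a lag is set by its temporal contracts, not by this sketch.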

Data contract runtime enforcement and observability framework. On the left, six contract surfaces — contract/table, contract/feature, contract/rule, contract/temporal, contract/notebook, contract/deployment — declared as YAML and compiled by a central contract compilation engine into a canonical Python wheel. The compiled enforcement payload feeds a runtime validation point that checks table closure, rule registry sync, feature dependencies, and temporal contracts. Valid payloads flow to verified Delta Lake persistence and on to quant model inference under a zero-trust posture; invalid payloads fail closed and are logged and alerted. Above the validation point, an observability control plane backed by the SQL registry surfaces validation latency, contract compliance rate, and detected drift incidents. — fig. 11 — YAML-declared contracts compile into the wheel, enforce at the runtime validation point, and surface in the SQL observability control plane.
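The fail-closed behaviour in the figure can be sketched as a validator that raises before anything is written. This assumes a simplified, hypothetical in-memory form of a compiled table contract (column types, nullability, primary key); the real compiled payload and its field names are not shown in this document.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Column:
    name: str
    dtype: type
    nullable: bool = True


@dataclass(frozen=True)
class TableContract:  # hypothetical shape of a compiled table contract
    table: str
    columns: tuple[Column, ...]
    primary_key: tuple[str, ...]


class ContractViolation(Exception):
    """Raised instead of writing: the whole batch is rejected."""


def enforce(contract: TableContract, rows: list[dict[str, Any]]) -> None:
    """Fail closed: collect every violation, then raise if there are any,
    so a partially valid batch is never persisted."""
    errors: list[str] = []
    expected = {c.name for c in contract.columns}
    seen_keys: set[tuple[Any, ...]] = set()
    for i, row in enumerate(rows):
        extra, missing = set(row) - expected, expected - set(row)
        if extra or missing:
            errors.append(f"row {i}: extra={sorted(extra)} missing={sorted(missing)}")
            continue
        for col in contract.columns:
            value = row[col.name]
            if value is None:
                if not col.nullable:
                    errors.append(f"row {i}: {col.name} is null")
            elif not isinstance(value, col.dtype):
                errors.append(
                    f"row {i}: {col.name} is {type(value).__name__}, "
                    f"expected {col.dtype.__name__}"
                )
        key = tuple(row[k] for k in contract.primary_key)
        if key in seen_keys:
            errors.append(f"row {i}: duplicate key {key}")
        seen_keys.add(key)
    if errors:
        raise ContractViolation(f"{contract.table}: " + "; ".join(errors))
```

A caller would log and alert on `ContractViolation` and leave the target table untouched, rather than retrying with the rows that happened to pass.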

appendix — runtime readout

A snapshot of what is provisioned, registered, and deployed.

The Fabric capacity, the Azure ML workspace, the running virtual machines, the Google Cloud footprint, the YAML-declared platform inventory, and the size of the source tree, as read from the live control plane and the repo when the page was last built.

asx_quant · runtime readout · 2026-06-18

$ az fabric capacity show --name fabriccapacityaustraliaeast1
▸ sku           F8
▸ state         Active
▸ provisioning  Succeeded
▸ region        australiaeast

$ az ml workspace show --name asx-quant-ml
▸ provisioning  Succeeded
▸ identity      SystemAssigned

$ az vm list -d --query "[?powerState=='VM running']"
▸ ci runner      Standard_D16ads_v7 · 16 vCPU · australiaeast
▸ agent host     Standard_D16ads_v7 · 16 vCPU · australiaeast
▸ norgate bridge Standard_B4s_v2 · 4 vCPU · windows · australiaeast

$ gcloud project · asx-quant-platform
▸ gcs                              private bucket · WIF · key-vault-pinned
▸ gemini enterprise agent platform gemini-3.1 flash-lite + gemini-embedding-2
▸ vector search                    australia-southeast1 · time-sliced indices
▸ bigquery                         gdelt-bq.gdeltv2 · bounded · dry-run gated
▸ neo4j aura                       graph features · structural edges

$ registry counts · from configs/*.yaml
▸ tables        269
▸ features      70
▸ rules         112
▸ sources       210
▸ services      14
▸ pipelines     15

$ source tree · lines (excluding docs)
▸ wheel         162,799 lines · asx_quant/
▸ platform      47,808 lines · scripts, infra, azure_ml
▸ tests         157,859 lines · 5,925 tests
▸ notebooks     19,518 lines · 175 fabric notebooks
▸ yaml          253,920 lines · configs + generated
▸ sql           8,297 lines
▸ total         650,201 lines · all tracked

$ design surface · docs/
▸ architecture  30 docs
▸ adrs          22 decisions
▸ specs         55 specs

$ deployment identity
▸ wheel         artifact-authoritative · one canonical sha
▸ udf           delegated · sha-pinned
▸ smoke         build_info verified