ASX Quant Platform / engineering
live-verified · 2026-05-24
PLATFORM quantitative research & trading fabric · azure ml · ai foundry · gemini enterprise · neo4j aura

Predict which ASX stocks will go up. ASX Quant Platform — engineered for point-in-time correctness.

An end-to-end quantitative platform that predicts ASX equity moves from market, announcement, macro, document-intelligence, and graph signals. Every table, feature, rule, source, and service is registered metadata; every join is point-in-time-safe by construction; every model promotion passes a multi-tier governance gate; and every deploy is artifact-authoritative across Microsoft Fabric, Azure ML, Azure AI Foundry, Gemini Enterprise Agent Platform, and Neo4j Aura.

Microsoft Fabric · Lakehouse · Spark · DataPipelines
Azure ML · FLAML · MLflow · metamodel · CPCV · DSR
Azure AI Foundry · Claude Opus 4.6
Gemini Enterprise Agent Platform · Gemini 3.1 · multimodal · embeddings
Neo4j Aura · graph · director · nearology
Fabric SQL DB · asx_metadata control plane
GitHub Actions · contract-aware CI
Isometric rendering of the ASX Quant Platform showing the Data Estate (Norgate live data, ASX PDFs, alt data) feeding the Fabric Medallion Lakehouse (bronze, silver, gold), with the LLM Intelligence Core (Vertex AI Gemini parsing, Azure AI Claude) on the left, structural alpha edges (Cash Runway, Nearology, Director Network) above, Azure ML and Governance on the right (Machine Learning Factory, Metamodel, Base Model, DSR Governance Gate), and Live Market with TVaR and Copula Risk Overlays on the lower right feeding Execution & Risk.
fig. 00 — platform at a glance: data estate ▸ medallion ▸ llm intelligence ▸ alpha edges ▸ ml + governance ▸ execution & risk
§ 00 — system of interest

A platform engineered to predict which ASX stocks will go up.

The ASX Quant Platform is an end-to-end research, ingestion, feature engineering, LLM intelligence, ML training, and scoring system for the Australian equities market. Its purpose is concrete: produce calibrated, governance-gated predictions of which ASX-listed stocks are most likely to rise, within explicit holding-period and risk constraints.

It runs on Microsoft Fabric, Azure ML, Azure AI Foundry, Google's Gemini Enterprise Agent Platform (the evolution of Vertex AI), and Neo4j Aura — with a Python wheel (asx_quant) at the centre that owns every piece of reusable logic.

Vendor and public sources land in a Delta Lake medallion (bronze ↦ silver ↦ gold), where point-in-time-safe features and labels are produced. Base models and a metamodel are trained in Azure ML, and signals reach the pre-trading layer only after a multi-tier statistical governance gate (Deflated Sharpe Ratio over CPCV simulation) certifies them. It is built as a platform first, not as a notebook scratchpad.

value stream / source-to-signal
  1. register source · table · feature · rule
  2. ingest bronze · per-source · immutable
  3. conform silver · point-in-time-safe · universe-agnostic
  4. build gold · features · labels · evidence
  5. validate · contracts · data-quality rules · point-in-time evidence
  6. train base + metamodel · Azure ML · MLflow
  7. score · backtest · governance evidence
  8. promote on continuous-integration · smoke · property-based tests · runtime proof
Registry surfaces counted: Fabric notebooks · Delta table contracts · DataPipelines · Validation rules · Feature contracts · Source definitions · Architecture docs · Observability configs

// counts reflect the live registry — table, feature, source, rule, service, pipeline metadata is YAML-owned and projected into the Fabric SQL control plane.

§ 01 — engineering pillars

Four invariants the rest of the platform is built on.

Each pillar is enforced in code, not prose. CI fails closed when any of them is violated.

01

Wheel-first reusable logic

Reusable logic lives in the asx_quant Python wheel; Fabric notebooks are thin orchestration shells that bootstrap context, read metadata, call wheel APIs, and write through contract-aware writers.

  • asx_quant.transforms.merge
  • asx_quant.transforms.pit_join
  • asx_quant.contracts.temporal
  • asx_quant.harness.PipelineHarness
02

Metadata-driven everything

Tables, features, rules, sources, services, ML, LLM, observability, governance and pipelines are YAML-owned. Configs are compiled into contracts and projected into the asx_metadata SQL control plane.

  • configs/tables/*.yaml — 269
  • configs/features/*.yaml — 68
  • configs/rules/*.yaml — 113
  • configs/sources/*.yaml — 211
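
As a rough illustration of "configs are compiled into contracts", the sketch below compiles a table definition into a small contract object and checks one row against it. The config is shown as the dictionary a YAML parser would produce; the field names (`keys`, `columns`, `nullable`) and the helpers `compile_table_contract` and `violations` are assumptions made for this example, not the wheel's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    dtype: str
    nullable: bool = True


@dataclass(frozen=True)
class TableContract:
    table: str
    keys: tuple
    columns: tuple

    def violations(self, row: dict) -> list:
        """Return human-readable contract violations for one row."""
        problems = []
        known = {c.name for c in self.columns}
        for extra in sorted(set(row) - known):
            problems.append(f"unexpected column: {extra}")
        for col in self.columns:
            if col.name not in row:
                problems.append(f"missing column: {col.name}")
            elif row[col.name] is None and not col.nullable:
                problems.append(f"null in non-nullable column: {col.name}")
        return problems


def compile_table_contract(config: dict) -> TableContract:
    """Turn a parsed config (hypothetical shape) into an immutable contract."""
    columns = tuple(
        ColumnSpec(c["name"], c["type"], c.get("nullable", True))
        for c in config["columns"]
    )
    declared = {c.name for c in columns}
    undeclared_keys = [k for k in config["keys"] if k not in declared]
    if undeclared_keys:
        raise ValueError(f"keys not declared as columns: {undeclared_keys}")
    return TableContract(config["table"], tuple(config["keys"]), columns)


# Hypothetical config, as it might look after YAML parsing.
config = {
    "table": "silver_market_price_daily",
    "keys": ["ticker", "trade_date"],
    "columns": [
        {"name": "ticker", "type": "string", "nullable": False},
        {"name": "trade_date", "type": "date", "nullable": False},
        {"name": "close", "type": "double"},
    ],
}
contract = compile_table_contract(config)
bad_row = {"ticker": None, "trade_date": "2026-05-22", "volume": 100}
print(contract.violations(bad_row))
```

Compiling once and validating at write time keeps the YAML as the single place where a schema is declared; a contract-aware writer would refuse the row above rather than silently widening the table.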
03

Point-in-time correctness

Point-in-time safety is a release-blocking invariant. Temporal contracts, centralised as-of joins, leakage scans, point-in-time evidence checks, and Hypothesis property-based tests catch regressions before runtime does.

  • spine_join() · as_of_join()
  • apply_temporal_visibility_filter()
  • check_pit_evidence.py
  • Fabric property-based test & smoke gates
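
The idea behind a centralised as-of join can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the wheel's `as_of_join()`: the function name `as_of_join_sketch`, the `available_at` field, and the choice to treat a fact published exactly at the as-of time as visible are all assumptions of the example.

```python
from bisect import bisect_right
from datetime import date


def as_of_join_sketch(spine, facts):
    """For each (ticker, as_of) spine row, attach the latest fact whose
    available_at is <= as_of. Facts published later are never visible."""
    by_ticker = {}
    for fact in sorted(facts, key=lambda f: f["available_at"]):
        by_ticker.setdefault(fact["ticker"], []).append(fact)

    joined = []
    for row in spine:
        history = by_ticker.get(row["ticker"], [])
        times = [f["available_at"] for f in history]
        # Index of the first fact strictly after as_of; everything before it is visible.
        idx = bisect_right(times, row["as_of"])
        value = history[idx - 1]["value"] if idx else None
        joined.append({**row, "value": value})
    return joined


facts = [
    {"ticker": "BHP", "available_at": date(2026, 5, 20), "value": 2.0},
    {"ticker": "BHP", "available_at": date(2026, 5, 1), "value": 1.0},
]
spine = [
    {"ticker": "BHP", "as_of": date(2026, 4, 30)},  # nothing published yet
    {"ticker": "BHP", "as_of": date(2026, 5, 10)},  # sees only the 1 May fact
    {"ticker": "BHP", "as_of": date(2026, 5, 20)},  # sees the 20 May fact
]
for row in as_of_join_sketch(spine, facts):
    print(row["as_of"], row["value"])
```

Routing every feature join through one function like this is what makes leakage scans tractable: a forward-looking join has to bypass the helper, which a repo rule can detect.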
04

Artifact-authoritative deploy

CI builds one canonical wheel artifact per commit. Fabric Environment, Azure ML, and the User-Data-Function pull the same exact build by SHA; smoke checks verify identity end-to-end.

  • deterministic build_wheel.sh
  • verify_deployment() smoke
  • Fabric git-sync ➜ Environment
  • delegated User Data Function deploy from CI VM
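
A minimal sketch of an identity check across runtimes, assuming identity is established by a SHA-256 content hash of the wheel bytes (the platform may equally pin by commit SHA). The helper names and the runtime labels are hypothetical and are not the `verify_deployment()` smoke itself.

```python
import hashlib


def artifact_sha256(payload: bytes) -> str:
    """Content hash of a built artifact."""
    return hashlib.sha256(payload).hexdigest()


def check_runtime_identity(expected_sha: str, deployed: dict) -> dict:
    """Map each runtime name to True when its artifact matches the CI build."""
    return {
        runtime: artifact_sha256(blob) == expected_sha
        for runtime, blob in deployed.items()
    }


# Stand-in bytes; a real check would read the wheel each runtime actually serves.
ci_wheel = b"wheel bytes produced by CI (illustrative)"
expected = artifact_sha256(ci_wheel)
report = check_runtime_identity(
    expected,
    {
        "fabric_environment": ci_wheel,
        "azure_ml": ci_wheel,
        "user_data_function": ci_wheel + b" rebuilt",  # simulated drift
    },
)
print(report)
```

A smoke gate built on this would fail closed whenever any value in the report is false, which is what makes a rebuilt or re-hashed wheel visible immediately.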
§ 02 — data architecture

Bronze ↦ Silver ↦ Gold, with universe-agnostic semantics.

Raw vendor feeds enter bronze, are conformed to point-in-time-safe shapes in silver, and produce universe-agnostic feature tables in gold. Universe filtering only happens at the Lake View boundary — never inside feature computation.

layer / 01

bronze.*

Raw, per-source, immutable. One table per source/version.

  • norgate_ohlcv_daily
  • asx_announcements
  • norgate_macro
  • rba_rates
  • fred_macro
layer / 02

silver.*

Cleaned, conformed, point-in-time-safe. Universe-agnostic.

  • silver_market_price_daily
  • silver_corp_action
  • silver_macro_*
  • silver_universe_clust_id
  • silver_news_event
layer / 03

gold.*

Feature tables, labels, evidence, model inputs.

  • feat_technical_daily
  • feat_norgate_daily
  • feat_announcement_daily
  • feat_llm_*
  • model_input_*_daily
Data pipeline diagram showing bronze ingestion, parsing and silver layer with Azure AI Foundry and Vertex AI, ML metamodel layer, and gold feature store with structural alpha edges.
fig. 01 — ingestion ▸ silver ▸ ml/metamodel ▸ alpha edges, with control plane & risk gate underneath
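
The rule that universe filtering happens only at the view boundary can be pictured with a toy example: the feature is computed for every ticker, and membership is applied afterwards. The function names and the simple first-to-last return used as a "feature" are assumptions for illustration only.

```python
def compute_period_return(prices):
    """Universe-agnostic feature: first-to-last return for every ticker with data."""
    return {
        ticker: closes[-1] / closes[0] - 1.0
        for ticker, closes in prices.items()
        if len(closes) >= 2 and closes[0] != 0
    }


def universe_view(features, universe):
    """Boundary filter: restrict an already-computed feature table to a universe."""
    return {ticker: value for ticker, value in features.items() if ticker in universe}


prices = {"AAA": [10.0, 11.0], "BBB": [5.0, 4.0], "CCC": [2.0, 2.5]}
features = compute_period_return(prices)           # computed once, for all tickers
large_caps = universe_view(features, {"AAA"})      # one view
small_caps = universe_view(features, {"BBB", "CCC"})  # another view, same features
print(sorted(large_caps), sorted(small_caps))
```

Because the feature table never depends on membership, changing a universe definition does not force a feature rebuild and cannot leak membership information into the feature values.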
§ 03 — solution architecture

Multi-cloud by design, single source of truth by discipline.

Microsoft Fabric is the analytical core. Azure ML trains and scores. Azure AI Foundry hosts reasoning routes. Google's Gemini Enterprise Agent Platform (the evolution of Vertex AI) handles multimodal extraction. Neo4j Aura holds the structural-alpha graph — director networks, nearology, corporate edges. The wheel ties them together; YAML configs and the Fabric SQL control plane govern them.

ASX Quant Platform architecture diagram showing Microsoft Azure (Fabric Lakehouse, AI Foundry with Claude, Azure ML), Google Cloud (Vertex AI Gemini, GCS), CI/CD via GitHub Actions, Norgate VM source, observability surfaces, and structural alpha edges.
fig. 02a — system-of-record: clouds, runtimes, ingestion paths, observability, alpha edges
Architectural zones isometric diagram: Zone 1 Ingestion (Norgate OHLCV, ASX Filings, Alt and Macro Streams), Zone 2 Intelligence Core (Vertex AI Extraction, Azure AI Claude), Zone 3 Data Estate (Gold Feature Store with PIT Spine, Bronze Immutable), Zone 4 Alpha Edges (Cash Runway, Nearology, Attention, Foreign Listings, Brokers, Directors, Narrative), Zone 5 Orchestration (Metadata Control Plane), Zone 6 Machine Learning (Walk-Forward Folds, Base Models, Metamodel Stacking), Zone 7 Risk and Execution (Risk Governance and Guardrails, Live Execution and Telemetry, Escrow).
fig. 02b — same architecture as seven labelled zones: ingestion ▸ intelligence ▸ data estate ▸ alpha edges ▸ orchestration ▸ ml ▸ risk & execution
Azure · primary

Microsoft Fabric

Lakehouse asx, Spark notebooks, DataPipelines, Environment, User Data Function source artifact, smoke + Hypothesis property-based test targets.

Azure · ml

Azure ML

FLAML AutoML, 5-fold walk-forward, metamodel, scoring, backtest, MLflow workspace tracking.

Azure · llm

Azure AI Foundry

Claude Opus 4.6 reasoning, Anthropic-org capability, AI Services project — routed via the LLM control plane.

GCP · agent platform

Gemini Enterprise Agent Platform

Google's Gemini Enterprise Agent Platform — the evolution of Vertex AI. Gemini 3.1 multimodal extraction, embeddings, Model Garden access, batch + sync routes, taxonomy-aware classification.

GCP · staging

GCS & BigQuery

Object staging for PDFs and large multimodal payloads; GDELT ingestion via BigQuery with dry-run evidence.

Graph · alpha edges

Neo4j Aura

Managed graph database for the structural-alpha edges: director networks, nearology, corporate / promoter graphs. Fail-closed activation gate; canary-evidence required before production use.

Azure · runner

CI VM (Linux)

Always-on Standard_D8ads_v7 self-hosted runner for heavy Spark / SQL / property-based tests, Fabric smoke, ML wheel upload, and User Data Function deploy.

Source · bridge

Norgate Premium

Windows 365 bridge VM that runs Norgate schema/capability polls and exposes a thin extract surface to Fabric.

§ 04 — detailed system view

External sources, multi-cloud runtimes, control plane, execution, and CI/CD.

External sources feed an ingestion bridge into Microsoft Azure — Fabric Lakehouse, asx_metadata SQL Database, Azure Functions, Azure ML workspace, Azure AI Foundry, Key Vault, and the Linux CI VM. Google Cloud provides the Gemini Enterprise Agent Platform and BigQuery; Neo4j Aura provides graph analytics. The execution layer publishes daily trading signals, risk overlays, and key-performance-indicator feeds. Point-in-time safety, contracts, observability, and runtime proof span vertically across every layer. CI/CD runs along the bottom — deploy-dev, deploy-User-Data-Function, smoke, and promote.

Detailed system view showing Development & Agents (GitHub, Codex, Claude Code) at the top, External Sources, Ingestion Bridge, Microsoft Azure Cloud (Fabric Lakehouse, asx_metadata SQL DB, Functions, Azure ML workspace, Azure AI Foundry, Key Vault, AKS CI VM), Google Cloud (Gemini Enterprise Agent Platform, BigQuery), Neo4j Aura graph analytics, Execution Layer (daily trading, risk overlays, KPI feed), Engineering Excellence column (testability, contracts, observability), and a CI/CD swimlane along the bottom (deploy dev, deploy UDF, smoke, promote).
fig. 03a — detailed system: external sources ▸ azure ▸ gcp ▸ neo4j ▸ execution; engineering-excellence column + ci/cd swimlane
High-density isometric platform rendering: Ingestion (Norgate OHLCV feed, Norgate fundamentals data, Norgate dividend events, ASIC mandatory announcements PDFs, ASIC company trade reports, ASIC sub-registry shorts, alternative news streams, global macro data) feeding Microsoft Fabric OneLake bronze immutable storage; central Medallion Lakehouse with Silver Spine, Gold Feature Store, label classification, feat_technical, feat_fundamental; Dual-LLM Extraction engine (Vertex AI Gemini extraction, Azure AI Foundry Claude reasoning); Metadata Control Plane (Fabric SQL DB, YAML contracts, GitHub Actions CI/CD, observability logs, 3-DAG Orchestrator); Alpha Edges (Cash Runway, Nearology Graph, Narrative Churn, Foreign Supply, Media Content, Foreign Attention); ML Mega-Grid with Metamodel Stacking, Combinatorial Purged CV + walk-forward folds, base models; Copula Risk Engine (TVaR limits, emergency circuit breakers), Live Broker Telemetry, Broker Execution Adapter.
fig. 03b — full-density isometric: every named component on one canvas — ingestion ▸ dual-LLM extraction ▸ medallion ▸ control plane ▸ alpha edges ▸ ML mega-grid ▸ copula risk engine ▸ broker
§ 05 — end-to-end flow

Fifteen named layers, each its own contract.

The pipeline runs from cloud infrastructure through data collection, cleaning, AI feature laboratory, AI agent intelligence pipelines, structural alpha edges, cluster & network intelligence, research intelligence engine, virtual research committee, quantum compute, base model army, the super-model (metamodel), risk management & safety, and daily trading, to the central command centre for observability.

End-to-end pipeline diagram with 15 numbered layers spanning cloud infrastructure, data collection, data cleaning, AI feature laboratory, AI agent intelligence pipelines, structural alpha edges, cluster and network intelligence, research intelligence engine, virtual research committee, quantum computing layer, base model army, the super-model metamodel, risk management and safety, daily trading, and the central command centre.
fig. 04 — fifteen-layer reference pipeline; engineering surfaces highlighted
§ 06 — governance & validation

A signal does not promote until it passes the governance gates.

The platform treats every backtested strategy as a hypothesis exposed to multiple-testing risk. A three-tier governance gate runs after training: a single-path walk-forward filter, a Combinatorial Purged Cross-Validation (CPCV) distribution with Deflated Sharpe Ratio correction, and a held-out out-of-sample anchor. Promotion requires all three tiers to pass — Deflated Sharpe Ratio ≥ 0.95 across the CPCV distribution is the statistical-significance gate, and the out-of-sample equity curve must land inside that distribution within one standard deviation.

tier 01 single-path walk-forward

Walk-forward filter

Eliminates obviously bad configurations quickly.

  • Sharpe ≥ 0.8
  • Max Drawdown ≥ −25%
  • Win Rate ≥ 43%
  • Profit Factor ≥ 1.3

Necessary but not sufficient. Single-path equity curves cannot detect overfitting at scale: by López de Prado's False Strategy Theorem, the best result of a 325-trial sweep is expected to look strong purely by chance, even when no configuration has real skill.
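
The selection effect is easy to reproduce with a small simulation. The sketch below draws 325 return series with zero true edge and reports the best annualised Sharpe among them; the one-year sample length, 1% daily volatility, and the seed are arbitrary assumptions chosen for illustration.

```python
import math
import random
import statistics


def annualised_sharpe(returns, periods_per_year=252):
    """Sample Sharpe ratio of a return series, annualised by sqrt(time)."""
    return statistics.fmean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def best_null_sharpe(n_trials=325, n_days=252, seed=7):
    """Best Sharpe among n_trials strategies whose true Sharpe is exactly zero."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_trials):
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpes.append(annualised_sharpe(daily))
    return max(sharpes)


best = best_null_sharpe()
print(f"best Sharpe among 325 skill-free trials: {best:.2f}")
```

With one year of daily data, the winner of such a sweep typically clears the Tier 01 Sharpe cutoff of 0.8 comfortably, which is why a single-path filter cannot be the final word.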

tier 02 CPCV · DSR · simulation

Combinatorial Purged CV + Deflated Sharpe

Statistical significance under multiple testing.

  • Median Sharpe ≥ 0.5
  • 5th percentile > 0
  • DSR ≥ 0.95

N equity-curve paths are generated from combinations of held-out folds, with purging + embargo applied around each test block. The Bailey & López de Prado (2014) DSR is computed on the resulting Sharpe distribution, correcting for trial count, skew, and kurtosis. Only signals that beat their own noise floor pass.
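
For reference, the Deflated Sharpe Ratio can be computed with the standard library alone. The sketch below follows the published Bailey & López de Prado formulation under these assumptions: Sharpe values are per-period (not annualised), `kurt` is raw kurtosis (3 for a normal distribution), `sharpe_std` is the standard deviation of Sharpe estimates across trials, and `n_trials` is at least 2. The function names and example inputs are illustrative, not the platform's implementation.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(sharpe_std, n_trials):
    """Expected best Sharpe among n_trials skill-free trials (the noise floor)."""
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return sharpe_std * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe_ratio(sharpe, sharpe_std, n_trials, n_obs, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the noise floor, given
    sample length and the higher moments of the return series."""
    floor = expected_max_sharpe(sharpe_std, n_trials)
    denom = math.sqrt(1.0 - skew * sharpe + (kurt - 1.0) / 4.0 * sharpe ** 2)
    z = (sharpe - floor) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)


# Hypothetical inputs: daily Sharpe, 325 trials, about five years of daily data.
for daily_sharpe in (0.10, 0.12):
    dsr = deflated_sharpe_ratio(daily_sharpe, sharpe_std=0.02, n_trials=325, n_obs=1250)
    print(f"daily Sharpe {daily_sharpe:.2f} -> DSR {dsr:.3f} -> pass: {dsr >= 0.95}")
```

Note how the floor scales with the trial count: every additional configuration tried raises the Sharpe a candidate must reach before the 0.95 gate opens.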

tier 03 out-of-sample anchor

Out-of-sample holdout, anchored

Forward-looking confirmation of the CPCV finding.

  • Out-of-sample Sharpe ≥ 0.5
  • Out-of-sample vs CPCV within 1σ

A held-out anchored window the metamodel has never seen. The out-of-sample equity curve must land inside the CPCV distribution — divergence here is a hard fail, regardless of how good Tier 02 looked.
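
Taken together, the three tiers reduce to a small decision function. The sketch below uses the thresholds listed for each tier; the `GateEvidence` structure, the use of an inclusive 5th-percentile estimate, and the reading of "within 1σ" as the out-of-sample Sharpe lying within one sample standard deviation of the CPCV mean Sharpe are assumptions of this example.

```python
from dataclasses import dataclass
from statistics import mean, median, quantiles, stdev


@dataclass
class GateEvidence:
    wf_sharpe: float
    wf_max_drawdown: float   # fraction, e.g. -0.18 means -18%
    wf_win_rate: float
    wf_profit_factor: float
    cpcv_sharpes: list       # one Sharpe per CPCV path
    dsr: float
    oos_sharpe: float


def evaluate_gate(ev):
    """Return the pass/fail state of each tier plus the overall promote flag."""
    p5 = quantiles(ev.cpcv_sharpes, n=20, method="inclusive")[0]
    tiers = {
        "tier_01": (ev.wf_sharpe >= 0.8 and ev.wf_max_drawdown >= -0.25
                    and ev.wf_win_rate >= 0.43 and ev.wf_profit_factor >= 1.3),
        "tier_02": median(ev.cpcv_sharpes) >= 0.5 and p5 > 0 and ev.dsr >= 0.95,
        "tier_03": (ev.oos_sharpe >= 0.5
                    and abs(ev.oos_sharpe - mean(ev.cpcv_sharpes)) <= stdev(ev.cpcv_sharpes)),
    }
    tiers["promote"] = all(tiers.values())
    return tiers


paths = [0.4, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]  # hypothetical CPCV Sharpes
consistent = evaluate_gate(GateEvidence(1.1, -0.18, 0.47, 1.5, paths, 0.97, 0.7))
diverged = evaluate_gate(GateEvidence(1.1, -0.18, 0.47, 1.5, paths, 0.97, 1.3))
print(consistent)
print(diverged)
```

The second case fails even though its out-of-sample Sharpe is higher: a result far outside the CPCV distribution is treated as evidence that something differs between research and holdout, not as good news.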

5-Fold Walk-Forward Cross-Validation visualisation: concentric arc rings labelled Fold 1 through Fold 5, with an out-of-sample slice highlighted; below, Training slice plus embargo gap feeds Base Model Training (FLAML); base model out-of-fold predictions (stationary, probabilistic) flow into Metamodel Orchestration with lambda, projection, and exclusion operators. Mathematical notation including lambda and sigma symbols overlays the scene.
fig. 05 — five-fold walk-forward with embargo gap ▸ base-model training (FLAML) ▸ out-of-fold predictions ▸ metamodel orchestration. The combinatorial enumeration of fold orderings is what drives the Tier 02 CPCV distribution above.
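
A compact sketch of how combinatorial purged splits can be enumerated. The group count (6), test groups per split (2), and the fixed-width purge and embargo windows are assumptions for illustration; in practice the purge width follows from how far each label looks ahead, and the platform's fold configuration may differ.

```python
from itertools import combinations
from math import comb


def cpcv_splits(n_obs, n_groups=6, n_test_groups=2, purge=0, embargo=0):
    """Yield (train_idx, test_idx) for every combination of test groups."""
    bounds = [(g * n_obs // n_groups, (g + 1) * n_obs // n_groups) for g in range(n_groups)]
    for test_groups in combinations(range(n_groups), n_test_groups):
        test = [i for g in test_groups for i in range(*bounds[g])]
        blocked = set(test)
        for g in test_groups:
            start, stop = bounds[g]
            # Purge: drop training rows just before the test block (label overlap).
            blocked.update(range(max(0, start - purge), start))
            # Embargo: drop training rows just after the test block (serial correlation).
            blocked.update(range(stop, min(n_obs, stop + embargo)))
        train = [i for i in range(n_obs) if i not in blocked]
        yield train, test


def n_backtest_paths(n_groups, n_test_groups):
    """Number of full backtest paths that the splits can be recombined into."""
    return comb(n_groups, n_test_groups) * n_test_groups // n_groups


splits = list(cpcv_splits(n_obs=60, purge=1, embargo=2))
print(len(splits), "splits ->", n_backtest_paths(6, 2), "paths")
```

Each observation appears in the test set of several splits, so the out-of-fold predictions can be stitched into multiple complete paths, and it is the Sharpe of those paths that forms the Tier 02 distribution.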

// risk overlays applied at the execution boundary

TVaR: Tail Value-at-Risk monitored continuously; portfolio-level limits encoded in YAML governance configs.
Clayton copula: Joint lower-tail dependence on held positions; reject new entries while the joint crash probability exceeds threshold.
Skew & kurtosis monitoring: Per-strategy higher-moment surveillance; auto-pause on regime-shift signatures.
Effective hypothesis count: DSR correction tracks the total trial count across base-model, metamodel, and backtest layers; selection bias accounted for end-to-end.
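
Two of these overlays have compact closed forms. The sketch below shows a historical TVaR estimate and the bivariate Clayton copula evaluated on the diagonal, which gives the probability that two positions both fall below their own q-quantile. The confidence level, the dependence parameter `theta`, and the entry threshold are hypothetical values; the platform's real limits live in its governance configs.

```python
def tail_value_at_risk(returns, alpha=0.95):
    """Average loss over the worst (1 - alpha) share of observations."""
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    n_tail = max(1, round(len(losses) * (1 - alpha)))
    return sum(losses[:n_tail]) / n_tail


def clayton_joint_tail_prob(q, theta):
    """P(U <= q, V <= q) under a bivariate Clayton copula with theta > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return (2.0 * q ** -theta - 1.0) ** (-1.0 / theta)


def allow_new_entry(q, theta, max_joint_prob):
    """Reject new entries while the joint crash probability exceeds the limit."""
    return clayton_joint_tail_prob(q, theta) <= max_joint_prob


returns = [0.01] * 18 + [-0.04, -0.06]
print("TVaR(90%):", tail_value_at_risk(returns, alpha=0.90))

joint = clayton_joint_tail_prob(q=0.05, theta=2.0)
print("joint crash prob:", round(joint, 4), "vs independence:", 0.05 * 0.05)
print("entry allowed:", allow_new_entry(0.05, 2.0, max_joint_prob=0.01))
```

The comparison with the independence case is the point of the copula overlay: positions that look diversified on average can still crash together far more often than their marginal tail probabilities suggest.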
§ 07 — ai engineering operating model

Humans, agents, and CI as one delivery system.

AI agents are productive but untrusted executors. Prose instructions help — the real safety model is executable: contracts, tests, CI, manifests, CODEOWNERS, protected paths, and repeatable deployment workflows.

User

decides & gates

  • Scope & sequencing
  • Architecture trade-offs
  • Feature approval
  • Prod promotion

Codex

reviews & specifies

  • Architecture review
  • Spec ownership
  • Risk & point-in-time review
  • Acceptance gates

Claude Code

implements in scope

  • Mechanical implementation
  • Notebook + wheel edits
  • Tests & handoffs
  • Repo guardrail conformance

CI & branch protection

enforces deterministically

  • Contract / DDL parity
  • Repo rules & notebook contracts
  • Property-based tests, smoke, runtime proof
  • Deployment identity
// research intelligence loop

Claude Code requests synthesis. Gemini DeepMind returns research-grade definitions. The platform gets new features.

Implementation agents do not invent quantitative features in isolation. When Claude Code identifies a gap (a missing alpha vector, a new market-microstructure signal, a feature the architecture requires but doesn't yet implement), it delegates the research step over the Model Context Protocol to Gemini DeepMind, which has the reasoning depth and academic-corpus access needed to synthesise a candidate definition. Gemini returns the mathematical formulation, citations (market microstructure, canonical correlation analysis, learning-to-rank), and the small-cap-context constraints. Claude Code then implements the feature inside the platform's contracts, tests, and continuous-integration gates.

Both agents stay inside their lane. Claude Code does not invent novel quantitative theory; Gemini DeepMind does not touch production code. The Model Context Protocol is the contract between them.

Research intelligence loop: Claude Code (Implementation and Orchestration Agent) on the left, Gemini DeepMind (Deep Reasoning and Research Specialist) on the right, exchanging via the Model Context Protocol. Centre shows NEW FEATURE: ASX_ALPHA_VECTOR_v2 with mathematical formulae, time-decay weights, and accuracy probability. Arrows label Claude requesting synthesis and Gemini providing definition. Right side shows academic papers (market microstructure, CCA, learning-to-rank), ASX small-cap market context, liquidity maps, and information dispersion rates. Left side shows platform codebase panel. A person in an AR headset gestures at the scene.
fig. 06 — Claude Code ▸ MCP ▸ Gemini DeepMind ▸ research synthesis ▸ feature definition ▸ implementation in repo

// repo guardrails — turning project memory into executable checks

check_contract_parity.py: table / feature contracts ↔ runtime projections
check_sql_ddl_parity.py: YAML ↔ SQL DDL column parity
check_repo_rules.py: banned patterns, unsafe thresholds, fold edits
check_notebook_contracts.py: required Fabric notebook artifact shape
check_pit_evidence.py: point-in-time evidence presence on feature surfaces
check_pit_leakage_regression.py: regression scan for forward-leaking joins
check_yaml_crossrefs.py: reads_from & cross-config references
check_public_api_tests.py: new exported wheel APIs must have tests
check_docs_impact.py: contract / arch changes ↔ docs updates
check_handoff_template.py: required sections on agent handoffs
check_circular_deps.py: config dependency cycles
check_test_count.py: test count floor, no silent deletion
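
To give a flavour of what such a guardrail does, here is a minimal sketch of a dependency-cycle check over `reads_from` style references. It is an illustration of the technique (depth-first search with a visiting set), not the contents of check_circular_deps.py; the function name and the example dependency map are assumptions.

```python
def find_cycle(reads_from):
    """reads_from maps a config name to the names it depends on.
    Return one dependency cycle as a list of names, or None if the graph is acyclic."""
    unvisited, visiting, done = 0, 1, 2
    state = {}
    stack = []

    def visit(node):
        state[node] = visiting
        stack.append(node)
        for dep in reads_from.get(node, []):
            if state.get(dep, unvisited) == visiting:
                # dep is already on the current path: close the loop and report it.
                return stack[stack.index(dep):] + [dep]
            if state.get(dep, unvisited) == unvisited:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        state[node] = done
        return None

    for node in reads_from:
        if state.get(node, unvisited) == unvisited:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


healthy = {
    "feat_technical_daily": ["silver_market_price_daily"],
    "silver_market_price_daily": ["norgate_ohlcv_daily"],
}
broken = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(find_cycle(healthy))
print(find_cycle(broken))
```

A CI wrapper around a function like this would exit non-zero when a cycle is returned, turning a design rule into a merge-blocking check.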
§ 08 — delivery topology

From commit to live Fabric, one artifact at a time.

Each commit produces one canonical wheel. Fabric Environment, Azure ML, and the live User Data Function all consume that exact SHA-pinned artifact. Promotion to Prod is a deliberate human action through Fabric Deployment Pipelines, after which main bookkeeps the promoted SHA.

  1. commit → push to dev

    Feature branch → squash merge. Pre-commit hooks run Ruff, mypy, and project-specific local checks. CI is the merge gate.

  2. CI: lint · type · contract · test · build

    Lint, mypy, repo rules, contract / SQL / notebook parity, unit + Spark + SQL integration, Hypothesis property-based tests, then build one canonical wheel artifact.

  3. deploy-dev: Fabric git sync ▸ Environment upload

    Fabric updates from the source tree, binds pipeline notebooks, then uploads the exact CI-built wheel to the Dev Environment and polls until publish completes.

  4. deploy-udf-dev: delegated User Data Function update

    Azure Linux self-hosted runner uses cached delegated Fabric tokens to call updateDefinition on the live User Data Function with the same wheel artifact: never re-built, never re-hashed.

  5. fabric-smoke-test: pytest + notebook smoke

    Smoke pytest and a Fabric notebook smoke run verify build metadata, wheel identity, and key control-plane invariants on the live workspace.

  6. Dev → Prod promotion (human-gated)

    User-initiated through Fabric Deployment Pipelines. After successful promotion, main is advanced to the promoted SHA and an annotated release tag is created.

§ 09 — contracts as agent boundaries

Six contract surfaces. Every line of platform code lives behind one of them.

contract / table

Delta schema, status, keys, partitioning, write mode, notebook ownership — declared in configs/tables/*.yaml, compiled, enforced at write.

contract / feature

Gold feature columns, lookbacks, warmups, point-in-time notes, reads_from dependencies, and per-column type/nullability.

contract / rule

Validation rule families, severity, check type, target scope — synced into the SQL control plane as the runtime rule registry.

contract / temporal

Point-in-time join shapes, as-of/spine semantics, write-time visibility windows, and evidence expectations for feature/label surfaces.

contract / notebook

Required Fabric artifact shape: .Notebook directory, notebook-content.py, notebook-settings.json, .platform.

contract / deployment

Canonical wheel identity, User Data Function source / wheel payload shape, smoke expectations, and exact-SHA artifact propagation across runtimes.

asx_quant · runtime readout · 2026-05-24
$ az fabric capacity show --name fabriccapacityaustraliaeast1
▸ sku           F8
▸ state         Active
▸ provisioning  Succeeded
▸ region        australiaeast

$ az ml workspace show --name asx-quant-ml
▸ provisioning  Succeeded
▸ identity      SystemAssigned
▸ tracking      mlflow://asx-quant-ml

$ registry counts
▸ tables        269 (active 204)
▸ features      68 (active 57)
▸ sources       211 (active 173)
▸ rules         113

$ deployment identity
▸ wheel         artifact-authoritative
▸ udf           delegated · sha-pinned
▸ smoke         build_info verified