6 min read | SiftingIO Team

Why financial platforms need a clear market data methodology

A market data methodology explains where a price comes from and how it is validated. Here is why that documentation matters and how to read it in an API.

Picture two dashboards showing two different prices for the same asset at the same second. One reads a single venue directly. The other reads a consensus price aggregated across several venues. Both look authoritative. Only one of them tells you how it got the number. When a price drives an alert, a valuation, a treasury report, or a research model, the difference between those two dashboards is not cosmetic. It is the difference between a value you can defend and a value you have to trust on faith.

A market data methodology is the document that closes that gap. It describes where each price comes from, how bad inputs get filtered out, and what the platform does when venues disagree. For any financial platform, publishing that methodology is not paperwork. It is the part of the product that lets an analyst, an auditor, or a downstream developer reason about whether a number is correct. SiftingIO publishes its methodology openly on its Data Methodology page (/data-methodology), and the sections below explain what a good one contains and why it earns trust.

What a market data methodology actually documents

A raw price is easy to get. A price you can explain is harder. A clear methodology answers a short list of questions that every serious data consumer eventually asks.

The first question is provenance. Is this price a pass-through from one source, or a value formed from several independent inputs? A single-source number inherits every glitch of that source. If the feed freezes, prints a fat-finger trade, or goes thin during off hours, the price you receive is wrong and nothing in the response warns you.

The second question is validation. What happens to an obviously bad input before it reaches you? A methodology worth reading names its checks: staleness gateways that reject inputs that stopped moving, and hard outlier kills that drop prints far outside the cluster of agreeing venues.

The third question is arbitration. When venues legitimately disagree, and for fragmented and on-chain markets they often do, how does the platform decide on one published number?

SiftingIO answers these with a four-stage pipeline that can be summarized in plain terms. First, it validates each input against staleness and outlier checks. Second, it scores inputs using median absolute deviation and modified z-scores, a standard statistical method (Iglewicz and Hoaglin, 1993) for spotting values that do not belong. Third, it remembers how each venue has behaved over time through a per-venue reputation score, quarantines feeds that misbehave, and catches the frozen-feed case where a source keeps sending messages but the price never changes while the market clearly moves. Finally, it aggregates the survivors into one consensus value: a volume-and-reputation-weighted median across venues, not a simple average.
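To make the scoring stage concrete, here is a minimal Python sketch of the modified z-score from Iglewicz and Hoaglin. The constant 0.6745 and the 3.5 cutoff are the values recommended in that reference; the venue names, the sample prices, and the helper names `modified_z_scores` and `drop_outliers` are illustrative assumptions, and nothing here claims to match the thresholds SiftingIO actually runs.

```python
from statistics import median


def modified_z_scores(prices):
    """Modified z-score of each price: 0.6745 * (x - median) / MAD."""
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        # More than half the inputs are identical, so the score is undefined.
        # This sketch reports 0.0 for exact matches and infinity otherwise.
        return [0.0 if p == med else float("inf") for p in prices]
    return [0.6745 * (p - med) / mad for p in prices]


def drop_outliers(quotes, threshold=3.5):
    """Keep (venue, price) pairs whose absolute score is within threshold."""
    scores = modified_z_scores([price for _, price in quotes])
    return [q for q, s in zip(quotes, scores) if abs(s) <= threshold]


# Hypothetical inputs: four venues agree, one prints far from the cluster.
quotes = [
    ("venue_a", 64010.0),
    ("venue_b", 64012.5),
    ("venue_c", 64008.0),
    ("venue_d", 64011.0),
    ("venue_e", 61200.0),  # assumed fat-finger print
]
survivors = drop_outliers(quotes)
print([venue for venue, _ in survivors])
```

Because both the center and the spread are medians, the single bad print cannot inflate the yardstick used to judge it, which is the weakness of a classic mean-and-standard-deviation z-score.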

Why one published consensus price beats a raw feed

The choice of a median is the load-bearing decision, and it is worth understanding why. An average moves the moment any single input moves, so one bad print drags the output with it. A median has a 50 percent breakdown point. A majority of venues (for a weighted median, venues carrying a majority of the total weight) would have to err in the same direction, at the same time, before the published number goes wrong. That is a meaningfully higher bar than trusting any one source.
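The contrast between an average and a median is easy to see with numbers. The sketch below implements one common definition of a weighted median (the smallest value at which cumulative weight reaches half the total) and compares it with a weighted mean on the same inputs. The prices and weights are made up, and the way SiftingIO combines volume and reputation into a weight is not specified in this post, so treat `weights` as a placeholder.

```python
def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("need at least one value with positive weight")


# Hypothetical inputs: four venues near 100, one bad print at 150.
prices = [100.0, 100.2, 99.9, 100.1, 150.0]
weights = [3.0, 2.0, 2.0, 2.0, 1.0]  # placeholder volume-and-reputation weights

weighted_mean = sum(p * w for p, w in zip(prices, weights)) / sum(weights)
consensus = weighted_median(prices, weights)

print(f"weighted mean:   {weighted_mean:.2f}")
print(f"weighted median: {consensus:.2f}")
```

With these inputs the weighted mean is pulled to roughly 105.04 by the single bad print, while the weighted median stays at 100.0. The bad venue would need to hold at least half of the total weight before it could move the median on its own.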

For fragmented markets this matters even more than it does for a single deep venue. In on-chain and thinly traded markets there is often no single true price at all. Venues disagree, pools are shallow, and prints can be stale or manipulated. A cross-venue consensus is frequently closer to correct than any individual venue's number, because it is built to survive a minority of bad inputs rather than assume every input is clean.

Two honest boundaries keep this claim credible. The output is a synthetic reference value, not an official exchange-of-record print and not executable depth. Treat it as a reference and validation layer: use it to research, to model, to power dashboards and alerts, and to cross-check the price your own execution venue is showing so you can catch a stale or manipulated print. And no median-based method survives a majority of venues making the same coordinated error, so a good methodology promises resistance to a minority of bad feeds, never guaranteed correctness. A vendor that claims perfect data is telling you it has not thought about failure modes.

Reading the quality signals in the response

A methodology only builds trust if the API exposes its judgments instead of hiding them. Quality should be visible on every response, never silent. SiftingIO attaches three timestamps to each tick (source, ingest, and publish), a cross-venue consensus value, and an explicit quality flag that reads Normal, or Degraded when the pipeline has fallen back to the safest available source.

Start with a live snapshot and inspect what comes back:

curl -H "X-API-Key: $SIFTING_KEY" \
     "https://api.sifting.io/v1/last/quote/crypto/BTCUSD"

The payload carries the consensus bid and ask alongside the quality metadata (see /docs for the exact field names). The pattern to build around is simple: read the quality flag on every response, and compare the source and publish timestamps to see how fresh the underlying inputs were, not just when the server answered. A price can be recent at the server and stale at the source, and only the source timestamp reveals that.
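The pattern described above might look like the following in Python. This is a sketch under stated assumptions: the field names `quality`, `source_t`, and `publish_t`, the epoch-millisecond units, and the 2000 ms lag budget are all hypothetical stand-ins, so check /docs for the real names before relying on any of them.

```python
def assess_quote(payload, max_source_lag_ms=2000):
    """Decide whether a quote is safe to act on.

    Assumes hypothetical fields: 'quality' ("Normal" or "Degraded") and
    'source_t' / 'publish_t' as epoch milliseconds.
    """
    quality = payload.get("quality")
    # Lag between the venue's own timestamp and the moment of publication.
    source_lag_ms = payload["publish_t"] - payload["source_t"]
    return {
        "usable": quality == "Normal" and source_lag_ms <= max_source_lag_ms,
        "quality": quality,
        "source_lag_ms": source_lag_ms,
    }


# Hypothetical payloads standing in for a parsed JSON response.
fresh = assess_quote(
    {"quality": "Normal", "source_t": 1_700_000_000_000, "publish_t": 1_700_000_000_150}
)
stale_at_source = assess_quote(
    {"quality": "Normal", "source_t": 1_700_000_000_000, "publish_t": 1_700_000_009_000}
)
degraded = assess_quote(
    {"quality": "Degraded", "source_t": 1_700_000_000_000, "publish_t": 1_700_000_000_150}
)
print(fresh, stale_at_source, degraded, sep="\n")
```

The second payload is the case the paragraph warns about: the flag reads Normal and the server answered promptly, yet the underlying input is nine seconds old, so the check fails on lag alone.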

The same discipline applies to streaming. Connect to wss://stream.sifting.io/ws/v1?key=$SIFTING_KEY, then send {"op":"subscribe","product":"cex","symbols":["BTCUSD"]}. Each tick carries the same quality flag, so your handler can downgrade or annotate a value the moment it arrives Degraded rather than treating every tick as equally clean.
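A streaming handler can apply the same branch per tick. The sketch below leaves out the WebSocket connection itself and shows only the two pieces that are independent of any client library: building the subscribe message quoted above, and routing each incoming tick by its flag. The tick field name `quality` and the callback names are assumptions for illustration.

```python
import json


def subscribe_message(symbols, product="cex"):
    """Serialize the subscribe request shown in the text."""
    return json.dumps({"op": "subscribe", "product": product, "symbols": list(symbols)})


def handle_tick(raw, on_clean, on_degraded):
    """Route a raw JSON tick by its quality flag (hypothetical field name).

    Anything that is not explicitly "Normal" is treated as degraded, so a
    missing or unexpected flag fails closed rather than open.
    """
    tick = json.loads(raw)
    if tick.get("quality") == "Normal":
        on_clean(tick)
    else:
        on_degraded(tick)


# Usage with in-memory lists standing in for real downstream consumers.
clean, flagged = [], []
handle_tick('{"symbol": "BTCUSD", "quality": "Normal"}', clean.append, flagged.append)
handle_tick('{"symbol": "BTCUSD", "quality": "Degraded"}', clean.append, flagged.append)
handle_tick('{"symbol": "BTCUSD"}', clean.append, flagged.append)
print(subscribe_message(["BTCUSD"]))
print(len(clean), len(flagged))
```

Failing closed is a design choice: a tick with no recognizable flag lands in the degraded path, where it can be annotated or held back instead of silently driving an alert.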

Common pitfalls

A few mistakes show up repeatedly when teams first wire in a methodology-backed feed.

The first is ignoring the quality flag. Code that reads only the price and drops the metadata throws away the entire point of a documented pipeline. If a value arrives Degraded and your alert fires anyway, you have rebuilt the single-source problem the methodology was meant to solve. Branch on the flag.

The second is treating a reference price as an execution feed. The consensus value is a reference for research and validation, not an exchange-of-record print and not a promise of executable depth or sub-millisecond timing. Fair-price update cadence is tiered (1 Hz on Free up to 10 Hz on Ultra, with Enterprise near real time), and those are refresh rates for the reference value, not execution latency. Size your polling and your expectations to that cadence.

The third is misreading a 503 stale_snapshot error as an outage. It is not the server failing. It means the live data is older than the freshness threshold, and the body includes last_t and server_now so you can measure exactly how stale. That is the methodology protecting you from silently serving an old price. Handle it by backing off and retrying, not by disabling the check.
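One way to handle that response is to measure the staleness from the body and back off before retrying. The sketch below is a pure decision function, so it can sit behind any HTTP client. It assumes that `last_t` and `server_now` are epoch milliseconds and that the error code arrives in a field named `error`; both are guesses to be checked against /docs, and the backoff parameters are arbitrary.

```python
def staleness_ms(body):
    """How far the live data lags the server clock (assumes epoch ms)."""
    return body["server_now"] - body["last_t"]


def retry_delay_s(attempt, base_s=0.5, cap_s=30.0):
    """Exponential backoff: base * 2^attempt, capped at cap_s seconds."""
    return min(cap_s, base_s * (2 ** attempt))


def next_action(status, body, attempt):
    """Decide what the caller should do with a response."""
    if status == 200:
        return {"action": "use"}
    if status == 503 and body.get("error") == "stale_snapshot":
        # Not an outage: the data is older than the freshness threshold.
        return {
            "action": "retry",
            "delay_s": retry_delay_s(attempt),
            "stale_ms": staleness_ms(body),
        }
    return {"action": "fail"}


# Hypothetical 503 body carrying the two fields named in the text.
body = {"error": "stale_snapshot", "last_t": 1_700_000_000_000, "server_now": 1_700_000_004_500}
decision = next_action(503, body, attempt=2)
print(decision)
```

Logging `stale_ms` alongside each retry gives a record of how stale the snapshot was, which is more useful in a postmortem than a bare count of 503s.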

A platform that documents its methodology and exposes its quality signals lets you verify these behaviors yourself instead of taking a marketing claim on trust. Read the full method at /data-methodology, then decide.
