indielist (beta)

How indie game sales estimates actually work — the white-box Boxleiter method

7 min read for investors

Tags: sales-data, methodology, boxleiter, algorithm

Every game page on indielist shows a sales estimate. Click "How is this calculated?" and you get the formula expanded — base number, every adjustment factor, the final result. We call this the white-box Boxleiter method. This article is the long-form explainer of what's behind that expansion and why we built it that way.

The original Boxleiter idea

In 2014, indie developer Mike Boxleiter posted a back-of-the-envelope rule: multiply a Steam game's review count by ~50 and you get a usable estimate of unit sales. That multiplier is the "NB number"; in the original rule it was 50.

The intuition is simple — on Steam, a roughly stable fraction of buyers eventually leave a review. If you assume that fraction is constant, review count is a proxy for sales.
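As a minimal sketch of that single-multiplier rule (the function and constant names here are ours for illustration, not part of any published tool), the whole method fits in a few lines of TypeScript:

```typescript
// Original single-multiplier rule: units ≈ reviews × NB.
// NB_ORIGINAL and boxleiterEstimate are illustrative names.
const NB_ORIGINAL = 50;

function boxleiterEstimate(reviewCount: number, nb: number = NB_ORIGINAL): number {
  if (!Number.isFinite(reviewCount) || reviewCount < 0) {
    throw new RangeError("reviewCount must be a non-negative finite number");
  }
  return reviewCount * nb;
}

// A multiplier of 50 is the same as assuming 1 buyer in 50 (2%) leaves a review,
// so 2,000 reviews imply about 100,000 units.
const exampleUnits = boxleiterEstimate(2_000); // 100000
```

Everything that follows in this article is about what happens when that assumed 2% review rate does not hold.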

Why a single NB number is misleading

The fraction isn't constant. It varies systematically by:

  • Year. Steam's review prompt changed in 2018 and again in 2022; older games have higher review-per-sale ratios because users had more time to review.
  • Price. Cheap games get more impulse buys without reviews. Expensive games get more deliberate reviewers.
  • Quality / sentiment. Games with very high or very low positive ratings provoke more reviews per buyer than middling ones.
  • Studio scale. Solo-dev games tend to under-review (smaller audience overlap with reviewers); larger studios with marketing budgets tend to over-review.
  • Genre. Hyper-casual games barely get reviewed. Deep RPGs get reviewed heavily.

Collapsing all of these into a single multiplier gives you the famous ±60% error band SteamSpy used to publish. That's not useful for any decision worth making.

What we do instead — multi-factor NB

We start from NB_base = 50 and add or subtract per-factor adjustments. The adjustments are public and version-controlled — see src/lib/sales-estimate.ts in the indielist source.

For example, here's how Hades (~240,000 reviews, $25 launch, 2020 release, medium studio, action-RPG genre) gets computed:

  • base: +50
  • year_2020: +15 (review-per-sale was higher pre-2022)
  • price_$25: +5 (mid-priced games review steadily)
  • positive_98%: +10 (high enthusiasm = more reviews)
  • team_medium: +10 (Supergiant has marketing reach)
  • genre_RPG/Adventure: +10 (deep games get reviewed)

Final NB = 50 + 15 + 5 + 10 + 10 + 10 = 100. Median estimate = 240K × 100 = 24M units, with a confidence range of [median × 0.6, median × 1.4] = 14.4M – 33.6M.
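The same computation can be sketched in TypeScript. This is an illustration of the arithmetic above, not the contents of src/lib/sales-estimate.ts: the Adjustment and EstimateBreakdown types and the estimateSales function are names we are assuming for the example, and rounding the bounds to whole units is also our choice.

```typescript
// Hypothetical types; the real module may be organised differently.
interface Adjustment {
  factor: string; // label, e.g. "year_2020"
  delta: number;  // added to (or subtracted from) the base multiplier
}

interface EstimateBreakdown {
  nb: number;     // final multiplier after all adjustments
  lower: number;  // median × 0.6
  median: number; // reviews × nb
  upper: number;  // median × 1.4
}

const NB_BASE = 50;

function estimateSales(reviewCount: number, adjustments: Adjustment[]): EstimateBreakdown {
  // Additive model: start from the base and sum every per-factor delta.
  const nb = adjustments.reduce((sum, a) => sum + a.delta, NB_BASE);
  const median = reviewCount * nb;
  return {
    nb,
    lower: Math.round(median * 0.6),
    median,
    upper: Math.round(median * 1.4),
  };
}

// The Hades inputs listed above.
const hadesEstimate = estimateSales(240_000, [
  { factor: "year_2020", delta: 15 },
  { factor: "price_$25", delta: 5 },
  { factor: "positive_98%", delta: 10 },
  { factor: "team_medium", delta: 10 },
  { factor: "genre_RPG/Adventure", delta: 10 },
]);
// hadesEstimate.nb === 100, hadesEstimate.median === 24_000_000
```

Keeping each adjustment as a labelled record, rather than folding everything into one number, is what lets the page show every factor when you expand the formula.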

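Public sources put Hades at ~6M units across all platforms, with platforms other than Steam driving roughly half of that. So our Steam-only estimate of 24M is a large over-estimate: about four times the all-platform total, and roughly eight times the ~3M that Steam's share implies. A multi-platform correction cannot close that gap, because adding other platforms only pushes the number up. This is exactly the kind of failure mode the white-box exposes: you can see that the hand-tuned adjustments double the base multiplier from 50 to 100, and you can disagree with any of them. Refitting those adjustments against disclosed sales is the v1.1 work.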

Confidence ranges, not point estimates

Every estimate ships as a triple: [lower, median, upper]. The lower and upper bounds are median × 0.6 and median × 1.4. These bounds were calibrated against a basket of ~30 games where developers have publicly disclosed actual sales. For that basket, 80% of true values fell inside our range, so read the range as roughly an 80% interval, not a guarantee.
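To make the calibration claim concrete, here is a hedged sketch of how coverage against a disclosed-sales basket could be checked. The function names and the five sample rows are invented for illustration; they are not the real basket and not necessarily how the calibration was run.

```typescript
interface RangeEstimate {
  lower: number;
  median: number;
  upper: number;
}

// Build the [lower, median, upper] triple from a median estimate.
function toRange(median: number): RangeEstimate {
  return { lower: median * 0.6, median, upper: median * 1.4 };
}

interface BasketEntry {
  estimatedMedian: number; // our median estimate for the title
  disclosedUnits: number;  // what the developer publicly reported
}

// Fraction of basket titles whose disclosed sales fall inside the range.
function coverageRate(basket: BasketEntry[]): number {
  if (basket.length === 0) {
    throw new RangeError("basket must not be empty");
  }
  const hits = basket.filter((entry) => {
    const range = toRange(entry.estimatedMedian);
    return entry.disclosedUnits >= range.lower && entry.disclosedUnits <= range.upper;
  }).length;
  return hits / basket.length;
}

// Made-up numbers, purely to exercise the function.
const illustrativeBasket: BasketEntry[] = [
  { estimatedMedian: 100_000, disclosedUnits: 90_000 },
  { estimatedMedian: 200_000, disclosedUnits: 130_000 },
  { estimatedMedian: 50_000, disclosedUnits: 69_000 },
  { estimatedMedian: 1_000_000, disclosedUnits: 1_300_000 },
  { estimatedMedian: 400_000, disclosedUnits: 150_000 }, // falls below the lower bound
];

const illustrativeCoverage = coverageRate(illustrativeBasket); // 4 of 5 inside
```

A coverage figure like this only describes the basket it was measured on; titles that differ from the basket can land outside the range.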

Other tools (Gamalytic, VG Insights) ship a single number. We don't. A single number with no confidence interval is statistical malpractice.

What's still wrong

  • Free-to-play is broken. Reviews-per-buyer breaks down for F2P. We flag F2P games in the data and don't ship an estimate.
  • Bundles distort badly. A game heavily distributed via Humble Bundle has artificially low review counts because bundle buyers don't review at the same rate. We can't detect this from public data yet.
  • Single-platform. The estimate is Steam-only. For multi-platform titles you have to mentally adjust upward.
  • Pre-release and very recent games. Demos and very recent releases have noisy review counts. We won't show an estimate for games less than 30 days old.
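Two of these limitations, free-to-play and very recent releases, are simple enough to express as a gate in front of the estimator. The sketch below assumes a hypothetical GameInfo shape and treats a game with no release date as not eligible; neither detail is taken from the indielist source.

```typescript
// Hypothetical input shape for the gate.
interface GameInfo {
  isFreeToPlay: boolean;
  releaseDate: Date | null; // null means not yet released (our assumption)
}

type Eligibility =
  | { eligible: true }
  | { eligible: false; reason: string };

const MIN_AGE_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function checkEligibility(game: GameInfo, now: Date): Eligibility {
  if (game.isFreeToPlay) {
    return { eligible: false, reason: "free-to-play" };
  }
  if (game.releaseDate === null) {
    return { eligible: false, reason: "not released" };
  }
  const ageDays = (now.getTime() - game.releaseDate.getTime()) / MS_PER_DAY;
  if (ageDays < MIN_AGE_DAYS) {
    return { eligible: false, reason: "less than 30 days old" };
  }
  return { eligible: true };
}

// A paid game released two weeks before the check is held back.
const recentCheck = checkEligibility(
  { isFreeToPlay: false, releaseDate: new Date("2024-01-01T00:00:00Z") },
  new Date("2024-01-15T00:00:00Z"),
);
```

Bundle distortion and the Steam-only scope are not covered here, since, as noted above, they cannot be detected from public data or fixed with a simple flag.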

How this compares to the alternatives

Gamalytic uses a similar Boxleiter base layered with a proprietary regression. They publish point estimates with no formula, and their backtest claims accuracy to within ~30%. Our advantage is transparency: you can see the formula and disagree with our adjustments.

VG Insights (now Sensor Tower) doesn't disclose its method. It is used heavily by enterprise customers but is inaccessible to indies.

SteamSpy uses public-profile sampling. After Steam made profiles private by default in 2018, accuracy collapsed.

What's next for the algorithm

v1.1 (2026 H2 work): linear regression against the disclosed-sales basket to fit per-factor coefficients instead of hand-tuned values. v2.0 (2027): cross-validated bootstrap confidence intervals and multi-platform extrapolation.
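For a feel of what the bootstrap part might involve, here is a simplified sketch. It is a plain percentile bootstrap on the median ratio of disclosed to estimated sales, without the cross-validation step, and it is not the planned v2.0 implementation. The function names, the sample ratios, and the seeded generator (a small PRNG commonly known as mulberry32) are all choices made for this example.

```typescript
// Small seeded PRNG (mulberry32) so that resampling is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

function medianOf(values: number[]): number {
  const sorted = [...values].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Percentile bootstrap interval for the median of disclosed/estimated ratios.
function bootstrapMedianRatio(
  ratios: number[],
  resamples: number,
  rand: () => number,
  level: number = 0.8,
): [number, number] {
  if (ratios.length === 0 || resamples < 1) {
    throw new RangeError("need at least one ratio and one resample");
  }
  const medians: number[] = [];
  for (let i = 0; i < resamples; i++) {
    // Draw a sample of the same size, with replacement.
    const sample = ratios.map(() => ratios[Math.floor(rand() * ratios.length)]);
    medians.push(medianOf(sample));
  }
  medians.sort((x, y) => x - y);
  const tail = (1 - level) / 2;
  const lower = medians[Math.floor(tail * (resamples - 1))];
  const upper = medians[Math.ceil((1 - tail) * (resamples - 1))];
  return [lower, upper];
}

// Made-up ratios of disclosed units to our median estimate.
const sampleRatios = [0.5, 0.8, 1.0, 1.1, 1.6];
const ratioInterval = bootstrapMedianRatio(sampleRatios, 500, mulberry32(42));
```

Note that this gives an interval for the typical correction factor across the basket, which is narrower than a range meant to contain any individual game's true sales.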

Every version gets a new formula_version string and old versions are kept in sales_estimates_history. The white-box promise extends to history — you can always reproduce what an estimate looked like at any past point.
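A minimal sketch of what reproducing a past estimate could look like follows. The formula_version field and the sales_estimates_history name come from the paragraph above; the rest of the record shape, the sample rows, and the estimateAsOf helper are assumptions made for illustration.

```typescript
// Assumed shape of one row in sales_estimates_history.
interface EstimateRecord {
  gameId: string;
  formula_version: string;
  computedAt: string; // ISO 8601 UTC timestamp
  lower: number;
  median: number;
  upper: number;
}

// Return the latest record for a game computed at or before `asOf`.
function estimateAsOf(
  history: EstimateRecord[],
  gameId: string,
  asOf: Date,
): EstimateRecord | undefined {
  let best: EstimateRecord | undefined;
  for (const record of history) {
    if (record.gameId !== gameId) continue;
    const computed = Date.parse(record.computedAt);
    if (computed > asOf.getTime()) continue; // computed after the requested date
    if (best === undefined || computed > Date.parse(best.computedAt)) {
      best = record;
    }
  }
  return best;
}

// Made-up rows: the same game estimated under two formula versions.
const sampleHistory: EstimateRecord[] = [
  { gameId: "example-game", formula_version: "1.0", computedAt: "2026-01-10T00:00:00Z", lower: 600, median: 1000, upper: 1400 },
  { gameId: "example-game", formula_version: "1.1", computedAt: "2026-09-01T00:00:00Z", lower: 480, median: 800, upper: 1120 },
];

const midYear = estimateAsOf(sampleHistory, "example-game", new Date("2026-06-01T00:00:00Z"));
// midYear is the formula_version "1.0" record
```

Because old rows are kept rather than overwritten, a lookup like this returns the estimate exactly as it was published, including which formula produced it.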

See it in action

Pick any game and click "How is this calculated?": Stardew Valley, Hades, Manor Lords.