Interactive tutorial for Cheyette & Piantadosi (2020)
People identify small sets of objects (up to about 4) almost perfectly — subitizing — but show linearly increasing error for larger sets — scalar variability. These two patterns have traditionally been attributed to two separate cognitive systems.
This paper shows both patterns emerge from a single system that optimally represents quantity under a limited informational capacity. Small numbers are common and cheap to represent exactly; large numbers are rare and must be approximated.
The frequency of encountering a numerosity n follows a power law: P(n) ∝ 1/n². We encounter "1" roughly 100× more often than "10". Efficient representational systems should take advantage of this non-uniformity.
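This prior can be sketched numerically. A minimal example (the cutoff N_MAX and variable names are my choices, not the paper's code):

```python
import numpy as np

# Power-law prior over numerosities 1..N_MAX: P(n) ∝ 1/n², normalized.
# N_MAX is an arbitrary truncation point for illustration.
N_MAX = 100
ns = np.arange(1, N_MAX + 1)
prior = 1.0 / ns**2
prior /= prior.sum()

# The normalization cancels in ratios, so P(1)/P(10) = (10/1)² = 100.
ratio = prior[0] / prior[9]
print(round(ratio))  # → 100
```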
Imagine you see n objects and need to form an internal representation Q(k|n) — the probability you'd report each possible number k. The prior P(k) is shown in blue. Click and drag on the chart below to draw your representation Q in orange. The distribution is automatically normalized.
The KL divergence measures how many bits of information processing your Q requires relative to the prior. The squared error measures how far off your estimates would be on average. A good number system minimizes error while staying within an information budget.
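The two scores can be written down directly. A sketch, assuming Q(·|n) and the prior P are discrete distributions over k = 1..N_MAX (names are mine):

```python
import numpy as np

N_MAX = 100
ks = np.arange(1, N_MAX + 1)
prior = 1.0 / ks**2
prior /= prior.sum()

def kl_bits(q, p):
    """KL(Q ‖ P) in bits; terms with q == 0 contribute nothing."""
    mask = q > 0
    return np.sum(q[mask] * np.log2(q[mask] / p[mask]))

def expected_sq_error(q, n):
    """E_Q[(k − n)²] for true numerosity n."""
    return np.sum(q * (ks - n) ** 2)

# Example: a Q concentrated entirely on the true answer n = 3.
# Zero error, but exactness costs −log2 P(3) bits of KL.
q = np.zeros(N_MAX)
q[2] = 1.0
print(expected_sq_error(q, 3))  # → 0.0
print(kl_bits(q, prior))
```

The delta-function example makes the trade-off concrete: perfect accuracy is possible for any n, but its information cost −log2 P(n) grows with n because large numbers are rare under the prior.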
Can you match the best squared error? Draw a Q that minimizes error while keeping KL within your info budget. Click "Show optimal Q" to reveal the model's solution.
The model finds the Q that minimizes expected squared error subject to an information-processing bound:

Q*(·|n) = argmin_Q E_Q[(k − n)²]  subject to  KL(Q(·|n) ‖ P) ≤ B

The solution has the form Q(k|n) ∝ P(k) exp(−λₙ(k − n)²), where λₙ is chosen so the KL divergence equals the information bound B. Adjust the bound and the true number below.
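A minimal numerical sketch of this optimization, assuming the tilted-prior form Q(k|n) ∝ P(k)·exp(−λₙ(k−n)²) for the solution and solving for λₙ by bisection (function names and parameters are mine):

```python
import numpy as np

N_MAX = 100
ks = np.arange(1, N_MAX + 1)
prior = (1.0 / ks**2) / np.sum(1.0 / ks**2)

def q_given_lambda(n, lam):
    w = prior * np.exp(-lam * (ks - n) ** 2)
    return w / w.sum()

def kl_bits(q):
    mask = q > 0
    return np.sum(q[mask] * np.log2(q[mask] / prior[mask]))

def spread(q):
    """Std. deviation of the reported number k under Q."""
    m = np.sum(q * ks)
    return np.sqrt(np.sum(q * (ks - m) ** 2))

def optimal_q(n, bound_bits, lam_hi=50.0, iters=60):
    # KL grows monotonically with λ: λ = 0 gives Q = P (0 bits),
    # λ → ∞ gives a point mass at n, costing −log2 P(n) bits.
    if -np.log2(prior[n - 1]) <= bound_bits:
        return q_given_lambda(n, lam_hi)  # budget suffices for exactness
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bits(q_given_lambda(n, mid)) < bound_bits:
            lo = mid
        else:
            hi = mid
    return q_given_lambda(n, lo)

# With a ~4.5-bit budget: n = 3 is represented almost exactly,
# while n = 20 exceeds the budget and becomes fuzzy.
q3, q20 = optimal_q(3, 4.5), optimal_q(20, 4.5)
print(q3.argmax() + 1, spread(q3) < spread(q20))  # → 3 True
```

Bisection works here because KL(Q_λ ‖ P) increases monotonically in λ, so there is a unique λₙ hitting any feasible bound.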
Each curve shows Q(·|n) for a different true numerosity. At low information bounds, only very small numbers get exact representations. As the bound increases, exactness extends further.
The core finding is the transition: below the capacity bound, numbers are represented nearly exactly (subitizing); above it, representations become approximate and show scalar variability. Both regimes emerge from a single system. Fitting the model to human estimation data puts people's capacity at about 4.5 bits.
The model predicts underestimation of large numbers at low information, and a subitizing range (near-zero error) that expands with more information.
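The underestimation prediction follows from the prior's shape. A hedged sketch, assuming the tilted form Q(k|n) ∝ P(k)·exp(−λ(k−n)²): the decreasing 1/k² prior pulls probability mass toward smaller k, so the mean report sits below n when λ is finite.

```python
import numpy as np

# N_MAX and λ are illustrative choices, not fitted values.
N_MAX = 200
ks = np.arange(1, N_MAX + 1)
prior = (1.0 / ks**2) / np.sum(1.0 / ks**2)

def mean_estimate(n, lam):
    """Mean reported number E_Q[k] under the tilted-prior Q."""
    q = prior * np.exp(-lam * (ks - n) ** 2)
    q /= q.sum()
    return np.sum(q * ks)

# A weakly constrained (low-λ, i.e. low-information) view of 50 objects
# yields a mean report below 50.
print(mean_estimate(50, 0.01))
```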
Comparing multiple information bounds shows how the subitizing range expands and error decreases. This matches the experimental finding that longer exposure times yield better estimates.
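One coarse way to see the expansion numerically: treat a numerosity n as "exactly representable" when the cost of a point mass at n, −log2 P(n), fits within the bound. This threshold is my simplified proxy for the model's subitizing range, not the paper's fitted estimate.

```python
import numpy as np

N_MAX = 100
ks = np.arange(1, N_MAX + 1)
prior = (1.0 / ks**2) / np.sum(1.0 / ks**2)

def subitizing_range(bound_bits):
    """Largest n whose exact representation costs ≤ bound_bits."""
    exact = -np.log2(prior) <= bound_bits
    # exact is True up to a cutoff, then False; argmin finds the cutoff.
    return int(np.argmin(exact)) if not exact.all() else N_MAX

# The range of near-perfect representation grows with the bound.
for b in (2.0, 3.5, 4.5, 6.0):
    print(b, subitizing_range(b))
```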