Silicon Sampling Will Ruin Polling

Source: nytimes.com

TL;DR

The story at a glance

Leif Weatherby of NYU and Benjamin Recht of UC Berkeley argue in a guest essay that silicon sampling—using large language models to mimic human survey answers—is spreading fast among pollsters to cut costs amid falling response rates for phone and web polls. They spotlight a recent Axios article on maternal health policy that cited a simulated poll by AI firm Aaru showing majority trust in doctors, without initial disclosure that no humans were surveyed; Axios later added an editor's note. This comes as traditional polling struggles and AI tools promise quick, cheap alternatives, but the authors say the practice will worsen accuracy problems.

Key points

Details and context

Silicon sampling works by prompting large language models with personas (age, race, ideology and similar attributes) so that they produce answers mimicking real people. A foundational 2023 study showed GPT-3 could replicate patterns in American National Election Studies data, such as vote choices and attitudes, with high correlations (e.g., 0.90+ for some years). The method depends on this persona conditioning: without it, the model's output reflects the skewed demographics of its web training data rather than the target population.[[2]](https://www.cambridge.org/core/journals/political-analysis/article/out-of-one-many-using-language-models-to-simulate-human-samples/035D7C8A55B237942FB6DBAD7CAA4E49)
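The persona-conditioning idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the population shares are invented, `build_prompt` and `silicon_sample` are hypothetical helper names, and `fake_model` is a random stand-in for a real language model call. It is not the method used by the study or by any polling firm.

```python
import random
from collections import Counter

# Hypothetical target population shares (invented numbers, not real data).
TARGET_SHARES = {
    ("18-39", "liberal"): 0.25,
    ("18-39", "conservative"): 0.20,
    ("40+", "liberal"): 0.25,
    ("40+", "conservative"): 0.30,
}

def build_prompt(age_group, ideology, question):
    """Write a first-person persona backstory followed by the survey question."""
    return (
        f"I am in the {age_group} age group. Ideologically, I am {ideology}. "
        f"When asked '{question}', I answer:"
    )

def fake_model(prompt, rng):
    """Stand-in for a language model call; it ignores the prompt and answers at random."""
    return rng.choice(["yes", "no"])

def silicon_sample(question, n, model, seed=0):
    """Draw n personas in proportion to TARGET_SHARES and tally the model's answers."""
    rng = random.Random(seed)
    personas = list(TARGET_SHARES)
    weights = [TARGET_SHARES[p] for p in personas]
    drawn = rng.choices(personas, weights=weights, k=n)
    tally = Counter()
    for age_group, ideology in drawn:
        tally[model(build_prompt(age_group, ideology, question), rng)] += 1
    # Convert counts to shares of the simulated sample.
    return {answer: count / n for answer, count in tally.items()}

if __name__ == "__main__":
    print(silicon_sample("Do you trust your doctor?", 200, fake_model, seed=1))
```

Drawing personas in proportion to the target population is what keeps the simulated sample from mirroring the model's default demographics. No human is consulted at any step, which is the authors' central objection.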

The Axios incident highlights the risks: the story linked to Aaru's simulation claiming most people trust health providers, but a closer look showed that it had no human respondents. Traditional polling already suffers from nonresponse bias, since working-class and rural voices are harder to reach, yet AI inherits and magnifies prejudices from its internet sources, such as underplaying minority perspectives.

The authors contrast this with polling's role in democracy: a flood of cheap AI-generated polls could drown out quality surveys, letting partisan or profit-driven fakes shape narratives, much as a surge of right-leaning polls in 2022 distorted polling averages.

Key quotes

Why it matters

Silicon sampling could flood media and politics with fabricated "public opinion" that lacks real voices, undermining trust in data used for elections, policy, and reporting. For journalists, businesses, and voters, it means genuine polls will be harder to spot amid cheap fakes, potentially skewing decisions on health, campaigns, or markets. Watch for disclosure rules or regulations on AI polls, though firms may resist as costs drop further.