Sam Hames

[2111.15366] AI and the Everything in the Whole Wide World Benchmark

https://arxiv.org/abs/2111.15366

... we argue that benchmarks presented as measurements of progress towards general ability within vague tasks such as “visual understanding” or “language understanding” are as ineffective as the finite museum is at representing “everything in the whole wide world,” ...

Argues that machine learning's focus on a relatively small number of benchmarks is counterproductive: progress on arbitrarily chosen benchmarks gets conflated with general progress in the field.
