[2111.15366] AI and the Everything in the Whole Wide World Benchmark

... we argue that benchmarks presented as measurements of progress towards general ability within vague tasks such as “visual understanding” or “language understanding” are as ineffective as the finite museum is at representing “everything in the whole wide world,” ...

Argues that machine learning's focus on a relatively small number of benchmarks is counterproductive: progress on arbitrarily chosen benchmarks gets conflated with general progress in the field.

Details

Revised: 2022-03-05 07:15:44Z
Created: 2021-12-09 12:10:00Z
Edited: 2022-03-05 07:15:44Z

Sam Hames
