A recently discovered hole in Valve’s API allowed observers to derive extremely precise player totals for thousands of Steam games from publicly accessible data. While Valve has now closed this inadvertent data leak, Ars can still provide the data it revealed as a historical record of the aggregate popularity of a large portion of the Steam library.
The new data derivation method, as ably explained in a Medium post from The End Is Nigh developer Tyler Glaiel, centers on the percentage of players who have accomplished developer-defined Achievements associated with many games on the service. On the Steam web site, that data appears rounded to two decimal places. In the Steam API, however, the Achievement percentages were, until recently, provided to an extremely precise 16 decimal places.
This added precision means that many Achievement percentages can only result from specific ratios of whole numbers: the number of players who earned the Achievement divided by the game’s total player count, both of which must be whole numbers. With multiple Achievements to check against, it’s possible to find a common denominator that works for all the percentages with high reliability. This process allows for extremely accurate reverse engineering of the denominator, which represents the game’s total player base.
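To make the common-denominator idea concrete, here is a minimal Python sketch of one way such a search could work. It is not Glaiel's actual code: the function names, the brute-force scan, the tolerance value, and the sample player counts are all assumptions made up for illustration.

```python
def implies_whole_count(pct, total, rel_tol):
    """True if pct percent of `total` players is (almost) a whole number."""
    implied = pct / 100 * total
    nearest = round(implied)
    # Allow a small relative slack for floating point noise in the percentage.
    return abs(implied - nearest) <= rel_tol * max(nearest, 1)


def smallest_common_total(percentages, max_players, rel_tol=1e-7):
    """Return the smallest total that fits every percentage, or None."""
    for total in range(1, max_players + 1):
        if all(implies_whole_count(p, total, rel_tol) for p in percentages):
            return total
    return None


# Hypothetical game: these counts are invented, not real Steam data.
true_total = 48_213
earned_counts = [5, 1_204, 30_977]
percentages = [100 * count / true_total for count in earned_counts]

recovered = smallest_common_total(percentages, max_players=100_000)
print(recovered)
```

Note that any multiple of the recovered total would fit the same percentages, so a scan like this can only report the smallest consistent player count. Checking several Achievements at once is what makes that smallest candidate a reliable guess.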
As Glaiel points out, for instance, an Achievement earned by 0.012782207690179348 percent of players on his game translates precisely to 8 players out of 62,587 without any rounding necessary (once some vagaries of floating point representation are ironed out).
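Glaiel's example can be checked with a few lines of Python. This is only a sanity check under stated assumptions: the variable names are ours, and the small leftover gap between the two values is presumed to be the floating point artifact mentioned above.

```python
reported_percent = 0.012782207690179348  # API value quoted by Glaiel
earned = 8                               # players with the Achievement
total_players = 62_587                   # total player base

# Rebuild the percentage from the whole-number ratio.
reconstructed_percent = 100 * earned / total_players

# Relative gap between the API value and the reconstructed one.
relative_gap = abs(reconstructed_percent - reported_percent) / reported_percent

# Going the other way: the player count implied by the API value.
implied_count = reported_percent / 100 * total_players

# What the Steam website would have shown, rounded to two decimal places.
website_percent = round(reported_percent, 2)

print(reconstructed_percent, relative_gap, implied_count, website_percent)
```

The reconstructed and reported values agree to within roughly one part in ten million, while the two-decimal figure shown on the website would be far too coarse to pin down either whole number.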
By July 4, Valve had updated its Steam API to provide much less precision in its Achievement percentages, cutting off this new data source altogether. That move comes just months after Valve started protecting individual Steam usage data by default, cutting off the previous estimation method used by Steam Gauge and Steam Spy. Valve Head of Business Development Jan-Peter Ewert said the company is currently working on a “more accurate” way for users and developers to “get data out of Steam,” though apparently this kind of Achievement-derived data set wasn’t what he had in mind.