An Algorithm For An Automated Meritocracy

Published on March 16, 2016. 22 Comments

A Guest Post by Bennett Haselton

A 2006 study by Matthew Salganik and his co-researchers at Princeton suggests that a huge amount of effort is wasted in many different areas of human endeavor, and the resulting outcomes are far less than optimal — but that there is a simple algorithm that could fix both problems.

In Salganik’s experiment, users of a music-rating site were divided at random into eight artificial “worlds”. All of the users in all eight worlds had access to the same library of songs, which they could download or recommend to their peers, but they could only recommend songs to other users in the same world. Also, each user could view the number of times that a song had been downloaded, but only by other users in the same world.

The goal was to see whether certain songs could become popular in some worlds while languishing in others, despite the fact that all groups consisted of randomly assigned populations that all had equal access to the same songs. The experiment also attempted to measure the “merit” of individual songs by assigning some users to an “independent” group, where they could listen to songs and choose whether to download them, but without seeing the number of times the song had been downloaded by anyone else; the merit of the song was defined as the number of times that users in the independent group decided to download the song after listening to it. Experimenters looked at whether the merit of the song had any effect on the popularity levels it achieved in the eight other “worlds”.

The authors summed it up: “In general, the ‘best’ songs never do very badly, and the ‘worst’ songs never do extremely well, but almost any other result is possible.” They also noted that in the “social influence” worlds where users could see each other’s downloads, increasing download numbers had a snowball effect that widened the difference between the successful songs and the unsuccessful: “We found that all eight social influence worlds exhibit greater inequality — meaning popular songs are more popular and unpopular songs are less popular — than the world in which individuals make decisions independently.” Figures 3(A) and 3(C) in their paper show that the relationship between a song’s merit and its success in any given world — while not completely random — is tenuous.

As economists Richard Thaler and Cass Sunstein put it, in their book Nudge, when describing the Salganik study:

“In many domains people are tempted to think, after the fact, that an outcome was entirely predictable, and that the success of a musician, an actor, an author, or a politician was inevitable in light of his or her skills and characteristics. Beware of that temptation. Small interventions and even coincidences, at a key stage, can produce large variations in the outcome. Today’s hot singer is probably indistinguishable from dozens and even hundreds of equally talented performers whose names you’ve never heard. We can go further. Most of today’s governors are hard to distinguish from dozens or even hundreds of politicians whose candidacies badly fizzled.”

This squares intuitively with how we talk about success in entertainment and sometimes in politics, where an artist or a candidate gets a “big break” that leads to them becoming a star. The first-order effect of this randomness is, of course, that the songs (or politicians, or fads) which achieve breakout success are usually not the ones that are the “best” by any objective measure (for example, the songs that would have gotten the highest rating in the “independent” group), and thus consumers are not best served by the random process. The second-order effect is that most people know the amount of luck required to succeed in those industries — even extremely talented and extremely dedicated musicians often labor in obscurity for years before they achieve their own big break — and so most talented musicians and other artists don’t even bother trying to achieve breakout success in those fields. (By contrast, a person who would make a good doctor or a good programmer is rightly encouraged to go into those fields, because even though those professions don’t offer the same opportunities for stardom, they also don’t require a lot of luck.)

So much we know. But consider what would happen instead if Google (or Pandora or Spotify) implemented something like the following algorithm. (It would have to be implemented by a large company with a built-in audience. For reasons that will be obvious, it wouldn’t work on a small scale with a handful of users.)

Consider just the subset of users interested in a particular genre, like alt-rock. When an artist submits a new song to the system, the song is pushed out to a small random subset of those users. (The system is agnostic about how this is done — you can recruit volunteers to rate songs, you can pay them a modest fee to rate songs, or you can mix the songs in seamlessly with the music they’re already streaming and hope that some of them will rate the song afterwards.) Each user in this sample rates the song without seeing the ratings that have been given by other users in the sample, in the same way that Salganik’s experiment used a sample of blind ratings to determine the “objective merit” of a song. The sample doesn’t have to be large relative to the whole population; it just has to be large enough for the average rating to be statistically meaningful. If the average rating is high enough, the system pushes the song out to all other alt-rock fans in the system, which in this system we define as “success”.
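The steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `random_sample_vote`, the `rate` callback, the sample size of 20 and the threshold of 8.0 are all hypothetical choices, not details of any real service.

```python
import random
from statistics import mean

def random_sample_vote(song, fans, rate, sample_size=20, threshold=8.0, rng=None):
    """Blind-rate `song` with a random sample of `fans`.

    `rate(user, song)` is a hypothetical callback returning one user's rating.
    Returns (promoted, average_rating, audience).
    """
    rng = rng or random.Random()
    # The system picks the raters; the artist has no say in who votes.
    sample = rng.sample(fans, sample_size)
    # Each rater scores the song without seeing anyone else's rating.
    ratings = [rate(user, song) for user in sample]
    avg = mean(ratings)
    promoted = avg >= threshold
    # On success the song goes to every fan who was not already in the sample.
    sampled = set(sample)
    audience = [u for u in fans if u not in sampled] if promoted else []
    return promoted, avg, audience

# Toy usage: 20,000 fans and a stand-in rating function.
fans = list(range(20_000))
toy_rate = lambda user, song: 9.0 if song == "good" else 4.0
promoted, avg, audience = random_sample_vote("good", fans, toy_rate)
print(promoted, avg, len(audience))
```

A real deployment would need a rating function backed by actual listeners and a threshold tuned to the rating scale; both are left open here.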

It’s a simple algorithm, but consider how radically this would alter everything we think we know about what it takes to be “successful”. You don’t need to “network” and “build connections” and ask particular highly-connected users to help promote your song, because that won’t affect the rating. You can’t tell a mob of your Facebook friends to “come and vote for my song”, because only people in the random sample can cast votes. Generally, the conventional wisdom that “You need to get out there and hustle” — which is to say, engage in economically non-productive activity that doesn’t improve the product quality — is rendered useless. It’s a waste of time to do anything except focus on the actual quality of the song, insofar as it will be reflected in the average rating.

This eliminates the two problems listed at the beginning. The songs that get pushed out to the widest audience are the ones that provide the most value to users (as determined by the highest average rating), and highly talented content producers can get into the game and see their songs become popular without waiting for a lucky break. (Of course it also lets artists find out very quickly how their current output ranks against other artists; not everybody is the “highly talented content producer” that they think they are.)

This “random-sample-voting” system has other desirable properties:

– It preserves the average quality of everyone’s song feed. Suppose there are 20,000 alt-rock fans in the system, and suppose it takes 20 users to get a statistically meaningful rating of a new song. Then “bad” submissions will only waste the time of 20 users, while “good” submissions will get pushed out to all 20,000, so (assuming roughly as many good submissions as bad ones) the ratio of good-to-bad songs in the average person’s song feed would be 1000 to 1. In practice, users could also adjust their threshold depending on the minimum average rating that they want to listen to, so that even songs with a mediocre rating will get pushed out to some users, but top-rated songs get pushed out to many more. (This is the part that requires a large user base. If your user base is only 20 users, then every new submission wastes everyone’s time, and there’s no point.)

– It’s non-gameable. With a user base of 20,000, even if an artist tries to stuff the ballot box by signing up 1,000 fake accounts, those accounts still only make up about 5% of the users selected in the average voting sample. Ballot-stuffing is a weakness of most sites driven by user ratings — most of them make it relatively easy to create fake accounts on your own behalf to vote up your own content. (It’s also a reason that this system only works with a large built-in user base.)

– It’s scalable; the system works regardless of the number of users or the number of submissions, as long as the number of users (who are available to rate songs in a random sample) grows in proportion to the number of submissions. (If the system gets overwhelmed with too many low-quality submissions, you can always charge submitters a fee, which gets redistributed in part to the raters who are selected in each random sample. Hopefully this would cut down on the number of junk submissions, but even if it doesn’t, at least the raters will be adequately compensated for the time spent rating the junk.)

– It’s non-arbitrary. As long as the sample of raters is large enough, the average rating achieved by a song should be close to the average rating it would get from the population as a whole. There’s almost no luck required to achieve success (although, conversely, an artist who gets a bad rating can’t blame it on bad luck either). As an extension of this, since the feedback is rapid (there’s no reason you couldn’t get an average rating for your song in just a few minutes), an artist can tweak their song to address any criticisms and see if the average rating gets higher on re-submission.
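The numbers in the list above can be checked with a short calculation. The sketch below uses the post's figures (20,000 fans, 20 raters, 1,000 fake accounts); the function names are hypothetical, and the "fake majority" probability is an extra illustration that treats the fakes as 1,000 of the 20,000 accounts and assumes raters are drawn uniformly at random without replacement.

```python
from math import comb

FANS = 20_000    # alt-rock fans in the post's example
SAMPLE = 20      # raters per new submission
FAKES = 1_000    # sockpuppet accounts controlled by one artist

def feed_ratio(fans=FANS, sample=SAMPLE):
    """Listeners reached by a promoted song per listener reached by a rejected one."""
    return fans / sample

def expected_fake_share(fans=FANS, fakes=FAKES):
    """Average fraction of a random sample that is made up of fake accounts."""
    return fakes / fans

def prob_fake_majority(fans=FANS, fakes=FAKES, sample=SAMPLE):
    """Chance that more than half of one sample is fake (hypergeometric tail)."""
    need = sample // 2 + 1
    hits = sum(comb(fakes, k) * comb(fans - fakes, sample - k)
               for k in range(need, sample + 1))
    return hits / comb(fans, sample)

print(feed_ratio())           # 1000.0
print(expected_fake_share())  # 0.05
print(prob_fake_majority())
```

A fake majority is only one way to measure manipulation. A handful of fake top ratings can still nudge an average upward, which is one reason the margin around the promotion threshold matters.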

The algorithm could be applied to other types of content as well, such as:

– Abuse reports. Twitter’s abuse report problem is frequently in the news: They get too many abuse reports to review accurately, so that some unlucky people have their tweets removed or their accounts suspended for non-offenses, while most egregious harassers go unpunished. With a random-sample-voting system, users could sign up as volunteer “reviewers” of abusive tweets. If a tweet is flagged in an abuse report, with a specific citation of the terms of service clause that it violated (and the reporting party agrees for it to be shared with volunteer reviewers), it gets shared with a random sample of volunteers; if some threshold percentage of reviewers agree that it is abusive, then the complaint is upheld.

– Tutorial webpages and videos. From working with some entrepreneurs who have built very popular blogs and how-to websites or YouTube channels, I can tell you first-hand that everyone in the industry knows that success is not determined by quality of content; the content just has to be good enough, and the rest of the time is spent on optimizing pages for Google search results, negotiating links from higher-traffic sites — in short, hustling in ways that have no bearing on the product quality. With a random-sample-voting system, raters could rate a tutorial based on how well the directions worked for them, and the highest-rated tutorials could be released to a wider audience, with no “hustling” required from the content creator.

– Economic arguments! Some of Paul Krugman’s columns may be objectively “better” than some of Steve’s blog posts, but if we were to use a random-sample-voting system to determine every week whether Steve’s or Paul’s column would be pushed out to millions of New York Times readers, I’d like to think Steve would win some of the time. If the problem is that the average person isn’t qualified to review the arguments, then the random sample could be taken from among economics PhDs, or economics professors at accredited universities — the algorithm works even if the voting audience is limited by some criteria.

– Obama’s “We The People” website. Currently, the White House promises to respond to any petition which gets more than 100,000 signatures — however, the government can dismiss any petition by saying, quite validly, “Just because you were able to get a mob of 100,000 people to sign a petition, that just means you’re very good at hustling, or you got very lucky — it doesn’t mean there’s any merit to your idea.” But if an idea gets an extremely high average rating from a random sample of volunteers who have signed up to rate the submissions, it would at least be worth looking into why so many people support an idea that the government has not yet implemented. Again though, it would be worth having the idea reviewed by qualified experts — perhaps a random sample of economics PhDs could review the proposal alongside a random sample of regular citizens. If the two groups diverge widely in their ratings, that could mean either that (a) economics professors have lost their humanity or (b) regular citizens need some economic education, but at least the result would be interesting.

(I suspect the White House might be nervous that this system would actually work too well. Under their current system, it’s easy for them to dismiss a petition even if it crosses the 100,000 signature mark. But if a random sample survey shows that a change in economic policy is supported by over 80% of economics professors, it’s a lot harder to come up with an excuse for ignoring that.)

In one sense, applying this algorithm to any type of social-media site would be a radically new practice; on the other hand, scientists have been using the basic independent-random-sampling algorithm for centuries. Scientists use it because they care about eliminating arbitrariness and getting the objectively best answer to the question that they’re investigating; there’s no reason we can’t use the same algorithm in any other scenario where we care about the result. Especially if it would eliminate the economic waste associated with “hustling”, gaming the system, and waiting around for a lucky break.

22 Responses to “An Algorithm For An Automated Meritocracy”


1 John Hall
March 16, 2016 at 9:47 am

Interesting post. Of course, you don’t need just one level (20 votes then 20,000), you could do 20 votes, then 200 votes, then 2,000 votes, etc., with corresponding wider audiences.
2 Oli
March 16, 2016 at 11:19 am

Sometimes there’s a benefit to arbitrary coordination. E.g., teenage girls all like talking about Taylor Swift and UK men like talking about soccer.
3 nobody.really
March 16, 2016 at 11:28 am

(I suspect the White House might be nervous that this system would actually work too well. Under their current system, it’s easy for them to dismiss a petition even if it crosses the 100,000 signature mark. But if a random sample survey shows that a change in economic policy is supported by over 80% of economics professors, it’s a lot harder to come up with an excuse for ignoring that.)

1. We kinda have this system already with the IGM Forum.
http://www.igmchicago.org/igm-economic-experts-panel

2. Has Obama pushed policies that economics professors would disapprove of? Recall that professors – even econ professors – tend to be a pretty liberal bunch.

So, for example, the IGM Forum concluded that Obama’s stimulus plan was an appropriate way to respond to the economic contraction at the beginning of his first term, and that its benefits exceeded its costs. I don’t know that this consensus had any bearing on Congress’s ability to ignore the proposed policy change.
4 Bennett Haselton
March 16, 2016 at 1:48 pm

John @1 yeah I had considered that you could also implement two-level voting, to eliminate the chance that a low-quality song (or other piece of content) could squeak by in the first round due to a stroke of luck. If the first round of voting collects votes from 20 users, perhaps 1% of the time a song whose “real” average rating is about a 6 would actually get an average rating of 9. But then you could widen the audience to 200 voters, where the average rating from those voters would be much closer to the true average opinion of the whole population.

In practice I thought you usually wouldn’t need more than two levels. All you need is for the final round of voting to produce a rating that is close to the population average, and the average across 2,000 voters won’t be much more accurate than the average across 200 voters.
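To put rough numbers on the sample sizes in this comment, the standard error of an average rating shrinks with the square root of the number of voters. The sketch below assumes individual ratings have a standard deviation of 2 points on a 10-point scale; that figure is purely illustrative and does not come from the post or the comments.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean of n independent ratings with spread sigma."""
    return sigma / sqrt(n)

ASSUMED_SIGMA = 2.0  # hypothetical spread of individual ratings

for voters in (20, 200, 2000):
    print(voters, round(standard_error(ASSUMED_SIGMA, voters), 3))
# 20 0.447
# 200 0.141
# 2000 0.045
```

Under that assumption, going from 20 to 200 voters removes about 0.3 points of typical error, while going from 200 to 2,000 removes only about 0.1 more, which is the diminishing return the comment describes.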
5 Bennett Haselton
March 16, 2016 at 2:09 pm

Oli @2 I agree, I just think you can still do that once the system has produced the “winners”. If a song gets promoted to all users across the system because it scored highly in a true blind-rating random sample, you’ll always be able to find other people to talk about it, and the artist’s other songs.
6 Bennett Haselton
March 16, 2016 at 2:25 pm

nobody.really @3 That’s actually exactly the point that I made when I wrote an article for Slashdot.org arguing that the White House should use random-sample voting for the We The People site:

https://politics.slashdot.org/story/13/01/03/1833222/why-we-the-people-should-use-random-sample-voting

saying that the IGM Forum would be a good example of the results of *credentialed* random-sample voting.

I don’t know how economics professors on average would feel about most of Obama’s policies, but just googling “Obama tariff” for example, I found this:
https://www.washingtonpost.com/news/wonk/wp/2012/10/23/how-obamas-tire-tariffs-have-hurt-consumers/
which is pretty reminiscent of Landsburg’s arguments about how tariffs are a bad deal.
7 AMT buff
March 16, 2016 at 3:38 pm

This is part of why people in the entertainment industry tend to be progressives. They see that success in their industry is determined by chance and by personal contacts rather than merit. They incorrectly extrapolate that valid observation to the entire economy, concluding that nobody truly earns his success and that redistribution is the only equitable solution.
8 Advo
March 16, 2016 at 4:09 pm

>>>but if we were to use a random-sample-voting system to determine every week whether Steve’s or Paul’s column would be pushed out to millions of New York Times readers, I’d like to think Steve would win some of the time. <<<

As long as the subject doesn't involve information problems.
9 nobody.really
March 16, 2016 at 5:06 pm

I don’t know how economics professors on average would feel about most of Obama’s policies, but just googling “Obama tariff” for example, I found this:
https://www.washingtonpost.com/news/wonk/wp/2012/10/23/how-obamas-tire-tariffs-have-hurt-consumers/
which is pretty reminiscent of Landsburg’s arguments about how tariffs are a bad deal.

Oh. Good catch, Bennett Haselton.
10 Bennett Haselton
March 16, 2016 at 6:07 pm

AMT buff @ 7 thanks, I hadn’t thought of that.

If that’s the case, it may be that people in the entertainment industry overestimate the amount of luck involved in success generally, if you also include doctors and lawyers, because success in those fields doesn’t involve a lot of luck.

The strongest evidence that that’s the case is that med school and law school students are able to borrow money for their education pretty easily, because the lenders know that most of them will be able to pay it back without having to hope for a “big break”.

(By contrast, it’s much harder to get funding for a long-shot business venture, and some studies have found that only 3-5% of venture-capital-funded businesses even break even, much less go through the roof like Facebook.)

I’m glad that entertainers are progressive-minded, but maybe they could have their worldview broadened by hanging out with some doctors and others who made a good chunk of change without a lot of luck.

When any young people ask me for advice, I tell them: There’s more to life than money, but to the extent that you want to make good money, role-model the people who make hundreds of thousands of dollars, not millions of dollars. For exactly this reason.
11 David Levine
March 16, 2016 at 7:55 pm

The idea has merit. (Hey, one vote from an economics Ph.D.!)

I suggest a few minor extensions.

Focusing on 20 voters is unnecessary. Sometimes the 20 voters will identify a likely hit and sometimes they will identify a dog. For the almost-hits, a larger sample is called for. And for anything released “to the wild” as a probable non-hit that later gets a surprising number of downloads or likes, it can be re-rated by those not familiar with it.

Also categories such as “alt-rock” are unnecessarily broad. A machine learner can do much better by over-weighting votes of raters who have tastes similar to yours.

Users can choose how much they want to be surprised, so that everyone does not end up in an echo chamber of the music they listened to at age 19.
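One simple way to read the suggestion about over-weighting raters with similar tastes is a similarity-weighted average. The sketch below is an assumption-laden illustration, not a description of any real recommender: listening histories are hypothetical dicts mapping a song to +1 (liked) or -1 (disliked), similarity is plain cosine similarity over shared songs, and raters with opposite tastes are given zero weight.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two like/dislike histories over the songs they share."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[s] * b[s] for s in shared)
    norm_a = sqrt(sum(a[s] ** 2 for s in shared))
    norm_b = sqrt(sum(b[s] ** 2 for s in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def personalized_score(listener, rater_histories, rater_votes):
    """Average the raters' votes, weighted by taste similarity to `listener`."""
    weights = [max(cosine(listener, h), 0.0) for h in rater_histories]
    total = sum(weights)
    if total == 0:
        # No usable similarity information: fall back to the plain average.
        return sum(rater_votes) / len(rater_votes)
    return sum(w * v for w, v in zip(weights, rater_votes)) / total

me = {"song_a": 1, "song_b": -1}
raters = [{"song_a": 1, "song_b": -1}, {"song_a": -1, "song_b": 1}]
print(personalized_score(me, raters, [9, 3]))
```

Weighting like this trades away some of the transparency of a single public average, so a site could reasonably keep the unweighted rating as the promotion rule and use the weighted score only for ordering a listener's feed.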
12 Max
March 16, 2016 at 9:47 pm

This idea makes a lot of sense. To the extent that it works (defined as “gets people to listen to more music”) I wouldn’t be surprised if Spotify and others already do something analogous. Spotify’s “Discover” list for instance serves music that is new to the listener and I’ll bet it is used to A/B test new songs to see whether others similar to the user might like them.

Facebook does something similar with new posts: there’s not enough newsfeed space for its algorithm to show a new post to every friend of the poster, so a new post is shown to a sample of the users who could potentially see it, and if it receives high engagement (likes, shares, comments) Facebook will show that original post (and maybe future posts like it) to more of the user’s friends when they log in.

Or that’s my understanding. A lot of the “exactly what’s going on here” is also going to be somewhat shrouded in machine learning algorithms.
13 Harold
March 17, 2016 at 5:56 am

“They see that success in their industry is determined by chance and by personal contacts rather than merit.” I have a sneaking suspicion that the successful ones think everyone else depends on luck, but they got there themselves based purely on merit.

I have found the Salganik papers (there are a few on this theme) interesting. From the point of view of the representative agent, they are evidence that viewing the economy as made up of individuals fails to capture the system.

“More generally than auctions, it seems to be difficult to predict the behavior of systems with many interacting components, even with a reasonable understanding of these lower level components. That is, just as it is hard to reduce biology to the behavior of molecules, it is hard to reduce group behavior to individual psychology; in the words of Anderson (1972), ‘more is different.’… However, it does suggest that models of collective behavior that are based solely on the behavior of a ‘representative agent’ are seriously deficient.”

We understand that our arms are made of molecules, yet we do not model arm behaviour by modelling the molecules. Similarly, we know that economies are made of people acting, but we cannot model the economy by modelling individuals.
14 Luís Aguiar Conraria
March 17, 2016 at 8:10 am

Can we think of a similar process for scientific refereeing?
15 Harold
March 17, 2016 at 8:43 am

Slight correction: “but we cannot model the economy by modelling individuals” should be “but we cannot completely model the system by modelling individuals.” All models are wrong, but some are useful, according to Box and all right-thinking folk. Modelling based on individuals is useful, but we must acknowledge that it is not the whole story.
16 Bennett Haselton
March 18, 2016 at 2:02 pm

David @ 11 Not sure what you meant by “For the almost-hits, a larger sample is called for” — but of the several different possible interpretations, I think they’re all good ideas :)

For example, if most users have accepted the default threshold that they only want to have songs pushed to them that have an average rating of 8.5, then if someone creates a song that gets an 8.4 or an 8.6, there’s a great deal riding on whether the rating is accurate, so it could trigger an automatic “appeal” whereby it’s rated by a larger sample. On the other hand, if users have set their own thresholds to a wide variety of numbers, then there’s no uber-significant cutoff point, and no particular reason to trigger an appeal for a particular rating.
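The automatic “appeal” described here could be triggered whenever the sample average is too close to the cutoff to call. The sketch below is one possible rule, not a specification: the function name `needs_appeal` is hypothetical, the 8.5 threshold comes from the example above, and using 1.96 standard errors (roughly a 95% interval) as the margin is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def needs_appeal(ratings, threshold=8.5, z=1.96):
    """True if the sample average is within z standard errors of the threshold."""
    std_err = stdev(ratings) / sqrt(len(ratings))
    return abs(mean(ratings) - threshold) <= z * std_err

borderline = [8, 9] * 10          # averages exactly 8.5
clear_hit = [10] * 19 + [9]       # averages 9.95
print(needs_appeal(borderline))   # True
print(needs_appeal(clear_hit))    # False
```

Songs flagged this way would go to a larger second sample, while clear hits and clear misses keep their first-round result.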

Perhaps it would be simpler to let artists “appeal” a rating and ask to be re-rated by a larger sample, if the artist pays some sort of fee to compensate the raters. (It could simply be money, paid to the site and re-distributed to the raters, or it could be something else.)

Regarding the idea of re-rating anything that gets released “to the wild” and performs much better or worse than expected — the main thing I’d be cautious about here is to make sure it can’t be gamed. When you have “self-selected voting” — where users can go to the song download page, and download it, and rate it — someone could create lots of sockpuppet accounts to download and rate their own song. (Or in the more socially acceptable form of cheating, they could ask their friends or followers to download and rate their song.) Given the potential for cheating here, I’m not sure you gain anything by allowing re-rating by self-selected voters. If your initial random sample consists of 200 voters, their average rating is almost always going to be close to the population average.
17 Bennett Haselton
March 18, 2016 at 2:18 pm

Max @ 12 that brings up another point related to yours — the random-sample-voting algorithm can be completely *transparent* and still retain its benefits.

Even if Facebook and Spotify are implementing something like this behind the scenes, it’s unlikely that they’re going to publicize all of the details.

However, with random-sample-voting, a site that implements the algorithm can be completely transparent about every detail — the size of the audience that your song will get pushed out to based on the threshold rating that you achieve — and it still can’t be gamed; the only way to win is to create a song that gets a high rating from your intended audience. You could even reveal to the artist the specific users who had given them the initial rating, in case the artist wanted to ask them more specifically about what they liked or did not like. (Only a few diva artists would likely start tearing into the people that gave them an initial bad rating; but in any case, we’d have a block-private-message feature for that…)

It can also be completely transparent in the sense that it’s *simple*. A lot of algorithms are obfuscated from most users not just because they’re kept secret but because they’re so complicated that most users could not understand them anyway.
18 Nick
March 20, 2016 at 7:01 pm

If you haven’t seen it, you may find the paper “Optimal design for social learning,” by Yeon-Koo Che and Johannes Horner, of interest.

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2600931

The authors explore this idea formally, but they are mostly concerned with the trade-off between learning from user recommendations, and maintaining the quality of the recommendation system itself, since in their model, if users of the service know they’re being given random songs to listen to, they won’t waste their time listening.
19 Ben
March 23, 2016 at 2:46 pm

It’s wrong to say that promotion and marketing is an economically non-productive activity.

Sometimes the main barrier to making a sale is that the buyer is not aware that there is such a product or that such a product or service is even possible.

Marketing and sales solve the cost of information problem by making it the responsibility of the seller to get the word out.

If you have a better mousetrap, marketing lets people know that you do. It’s not like people read the Biannually Updated Product Catalogue of the Grand Soviet to see if there is a better mousetrap launched.

That’s the problem marketing solves.

Moreover, the fact that it is difficult and expensive makes it more effective evidence of the quality of the product: evidence that the seller thinks the product will be well received (or else they are wasting their money on marketing). A poor product marketed well will sell somewhat better than a poor product marketed badly, but the ROI on marketing of a good product is much higher.
20 David Zetland
March 27, 2016 at 6:49 am

I like this idea, but I am not sure “PhD economists” (I’m one) are the best judges on merit, etc.

That said, I see the idea as similar to my idea (not too original, but certainly spelled out here http://www.springerlink.com/content/2q80214867370564/) to put academic papers “up for auction” as a means of improving on matching and reducing crony-path-dependency.
21 David Zetland
March 27, 2016 at 6:52 am

ps/Your idea must *also* overcome the tendency (accidental or intentional) for google and the like to bend results to their corporate values. http://www.wired.com/2015/08/googles-search-algorithm-steal-presidency/
22 Bennett Haselton
March 30, 2016 at 1:59 pm

Ben @ 19, I think this is a good point and the full answer is more complicated.

I would argue if a consumer has already decided they’re going to buy something in category X, then it is not economically productive activity to use marketing and sales to try and steer them toward product X1, as opposed to a meritocratic sorting of the products, which may or may not lead the consumer to X1 or X2 or whichever product is objectively best.

However as you say, if marketing makes someone aware of a type of product that they weren’t even thinking about, that can create value where there wouldn’t have been any otherwise. But even in that case I would argue there are ways to do it that do not waste resources:
(1) Occasionally users can be exposed to stuff outside of what they explicitly signed up for (e.g. if you signed up to stream country music, then along with the new country songs where the system is using you as a voter to obtain your rating, you could occasionally be exposed to new rock songs or something else).
(2) For stuff really outside people’s zone of what they signed up for, you can pay them to be an initial random sampler. The further outside their comfort zone, the more you can pay them.

If the system has 1 million users, you can randomly select 100 people and pay them $1 each to, say, “try out this free web-based game for 60 seconds”. If those users give it a high average rating, then the system now knows that a lot of users enjoyed the game even though they *expressed no prior interest in web-based games*, and the system can recommend it to the rest of the 1 million users. The $100 initial marketing cost (1) is minuscule compared to what it would have cost to market to those 1 million users directly, and (2) does not consume any actual resources.

More generally, I think that most users have (a) a large category of things X that they wouldn’t mind being exposed to for free, and (b) an even *larger* category X’ of things that they would review in exchange for a fee. Thus the whole system could work without using any resources on “marketing” in the conventional sense, or without trying to game the system in any way other than coming up with the best possible product.