A Modified Algorithm for Evaluating Logical Arguments

A Guest Post

by

Bennett Haselton

In a previous guest post I argued that we should use a random-sample-voting algorithm in any kind of system that promotes certain types of content (songs, tutorials, ideas, etc.) above others. By tabulating the votes of a random sample of the user base, such a system would reward the content that objectively has the most merit (in the average opinion of the user population), instead of rewarding the content whose creators spent the most time promoting it, or figured out how to game the system, or happened to get lucky when an initial “critical mass” of users liked the content at the same time. (The original post describes why these weaknesses exist in other systems, and how the random-sample-voting system takes care of them.)

However, this system works less well in evaluating the merits of a rigorous argument, because an argument can be appealing (gathering a high percentage of up-votes in the random-sample-voting system) and still contain a fatal flaw. So I propose a modified system that would work better for evaluating arguments, by adding a “rebuttal takedown” feature.

Arguments are still voted up or down based on the votes of an initial random sample of voters, like songs. But anyone can post a rebuttal to an argument, focusing on what they believe is a flaw. (The rebuttal should focus on a specific step in the argument that is believed to be flawed, and should not just be a global rebuttal arguing in the opposite direction.) The original poster (OP) can try to incorporate the objection into their argument, but if the OP doesn’t concede the point, then the dispute goes to a “jury” of other users, also selected by the random-sample-voting method. (The “jury” that decides between the OP and the rebuttal can be made up of laypersons, or limited to qualified experts, depending on the context.) If the rebuttal wins, then the original post is “disqualified” for containing a fatal flaw that the OP declined to correct.
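To make the mechanics concrete, here is a minimal sketch of that flow in code. It is only an illustration of the process described above; the data structures, the jury size, and the voting hook are placeholders I am inventing for illustration, not part of any existing system:

    import random
    from dataclasses import dataclass, field
    from typing import Callable, List, Optional

    JURY_SIZE = 25  # placeholder sample size; the proposal leaves this parameter open

    @dataclass
    class Rebuttal:
        author: str
        target_step: int                 # the specific step of the argument being challenged
        text: str
        upheld: Optional[bool] = None    # None until conceded by the OP or decided by a jury

    @dataclass
    class Argument:
        author: str
        steps: List[str]                 # the argument, laid out step by step
        rebuttals: List[Rebuttal] = field(default_factory=list)
        disqualified: bool = False

    def resolve_rebuttal(arg: Argument, reb: Rebuttal, users: List[str],
                         op_concedes: bool,
                         vote: Callable[[str, Argument, Rebuttal], bool]) -> None:
        """The OP may concede and fix the flawed step; otherwise a randomly
        sampled jury votes on whether the identified step is in fact flawed."""
        arg.rebuttals.append(reb)
        if op_concedes:
            reb.upheld = True            # OP corrects the step; the argument stays eligible
            return
        jury = random.sample(users, min(JURY_SIZE, len(users)))
        votes_for = sum(1 for juror in jury if vote(juror, arg, reb))
        reb.upheld = votes_for > len(jury) / 2
        if reb.upheld:
            arg.disqualified = True      # one upheld, uncorrected flaw disqualifies the argument

Note that a single upheld, uncorrected rebuttal is enough to flip the argument to “disqualified”; that matches the assumption discussed below, that a sound argument can always be repaired rather than left with a standing flaw.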

This is an algorithmic distillation of something Professor Landsburg wrote in The Big Questions: “If you’re objecting to a logical argument, try asking yourself exactly which line in that argument you’re objecting to. If you can’t identify the locus of your disagreement, you’re probably just blathering.”

Before going into more detail, I want to make a fairly audacious claim: I believe that this algorithm is the optimal algorithm for any type of argument-rating or information-sorting problem. In cases where something like this algorithm is already implemented in practice, the implementation works insofar as it stays close to this ideal, and breaks down insofar as it deviates from this ideal.

In addition to the benefits of random-sample-voting outlined in the previous blog post, this algorithm relies on two further assumptions, which I believe are reasonable:

1) A good argument can be presented without any flaws (where a “flaw” is defined such that a majority of a random sample of peers agree that it is a flaw, so that a “rebuttal takedown” pointing out the flaw will win). That doesn’t mean that a valid argument has to be flawless on the first draft or else it’s worthless. But if the argument is valid, it should be possible to correct the flaws and the argument will still stand. On the other hand, an invalid argument (especially in rigorous subjects like mathematics) will often contain a subtle flaw, and if you try to correct the flaw in one place, that will introduce an inconsistency with another section of the argument, such that no matter how hard you try, you cannot fix the argument without a flaw existing somewhere. For this reason, a single flaw ought to be enough to disqualify an argument, if the OP can’t fix it. But if an argument is sound at its core and just happens to contain some minor errors in its presentation, then those errors can be fixed incrementally with feedback from the community.

2) When voting on whether “rebuttal takedown” invalidates an argument, we assume the jury of users will vote more honestly (less influenced by their own biases) if they are focused on a specific point of disagreement, than if they are asked to vote globally between two essays arguing opposite points of view. If each side of a debate presents a series of statements that vary between true, ambiguous, and blatantly false, there is a temptation for a voter to skew their perception towards the conclusion that they already want to believe in. That’s why I wouldn’t be very interested in a vote between the OP’s essay arguing one side, and a rebuttal that simply argues the opposite side from scratch. But when users are asked to focus on the correctness of a specific statement, or even a specific step in one’s reasoning, I believe they can be more clear-eyed. (For instance, I agree with most instances in which Politifact has rated some of Clinton’s statements “False” and Trump’s statements “True”, even though I still think Trump makes overwhelmingly more false statements of the two.) Moreover, if an argument is laid out rigorously and a reader disagrees with it, it is incumbent on them to find what they think is the flaw — and if they can’t find a flaw (at least, not one where a majority of peers would agree that it’s a flaw), an intellectually honest reader would be more inclined to think there’s some truth to the argument after all.

The modified system — random-sample-voting with rebuttal-takedowns — has a number of desirable properties:

1) The rigorous arguments that “win” in this system are the ones where no one has (yet) found a flaw.

2) If a reader does find a flaw, and the jury of their peers votes that it is indeed a flaw, then there is no need to get into a side debate about whether the flaw “really” undermines the whole argument, or is just incidental. If the OP won’t fix the flaw, then the argument is invalidated; if the OP (or someone else who wants to take up the mantle) really believes that the flaw is incidental, they should modify the argument to take out the flaw.

3) Under existing debate and forum systems, if one person makes an argument, and 100 other people make counter-arguments, it is not practical for a casual observer to know whether one of the counter-arguments is pointing out a fatal flaw in the OP’s argument, without reading through and analyzing all of them. But it is possible with this system. Consider the case of Argument A, which has been posted in a typical present-day discussion forum (not using this algorithm) and has 100 counter-arguments posted in response, none of which are valid (where a “valid” counter-argument is one that would win the vote in a rebuttal takedown). Meanwhile, Argument B has been posted in a discussion forum and has 100 counter-arguments posted, and one of them is valid (and the objection is fatal to Argument B, i.e., Argument B is not valid). To an observer, it would be cumbersome to go down the rabbit hole into every counter-argument posted to both arguments in order to find the one valid objection. And thus there is no way for a casual observer to know that Argument A is valid but Argument B is not. But with the new proposed system, a casual observer would be able to see that Argument A had not been defeated by any objections posted by users, whereas Argument B had been defeated. (It’s still not possible for a casual observer to know whether Argument A might someday be defeated by a rebuttal, and similarly, there is a time period where Argument B will look “valid” because nobody has posted a valid objection yet. But I believe that this is the best we can do algorithmically.)

Here are some example scenarios where I believe this algorithm would be optimal:

1) At the /r/lifehacks/ subreddit, users can submit simple but little-known techniques for solving a problem or otherwise improving your life. Users can browse the ideas sorted with the top-voted ideas listed first, and can vote ideas up or down or post comments (and the comments themselves can be voted up or down). This is obviously subject to the Salganik effect, where a coincidental flurry of initial upvotes can get an idea in front of more people and trigger a snowball effect of more upvotes, even if the idea is not any better than others that were submitted at the same time. On the day that I visited the subreddit, one of the highest-rated ideas was to save thousands of dollars on your lifetime mortgage payment by cutting the payment amount in half, and then paying every 2 weeks instead of once per month.

(Now, as for the idea itself: The reason this “works” is that “every two weeks” is slightly more often than “twice a month”: 26 half-payments per year add up to 13 full payments instead of 12, so this just amounts to paying one extra month’s payment toward your mortgage each year, which you can already do anyway if you want to — but you might not want to, if you think the markets are a better investment. So this “life hack” just uses a simple math error to disguise a mundane financial choice, which may or may not be a good one, depending on your mortgage interest rate and how the markets end up doing. And yet the idea got thousands of upvotes and sat for days at the top of the leaderboard.)
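To spell out the arithmetic, here is a tiny sketch; the $1,200 monthly payment is an arbitrary illustrative figure, not a number from the original post:

    monthly_payment = 1200.0                                  # illustrative figure only

    paid_per_year_monthly  = 12 * monthly_payment             # 14,400 per year
    paid_per_year_biweekly = 26 * (monthly_payment / 2)       # 15,600 per year

    # 26 half-payments are 13 full payments, so the "hack" is simply
    # paying one extra month's worth toward the principal each year.
    extra_per_year = paid_per_year_biweekly - paid_per_year_monthly   # 1,200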

Under a naive “random-sample-voting” system, without rebuttal takedowns, this idea might have gotten a high initial rating as well. However, in a system that allows rebuttal takedowns, some user would probably have posted the counter-argument outlined above, and the rebuttal would probably have been approved, which would have disqualified the parent idea.

2) Companies like Google receive so many security vulnerability reports from the public that occasionally a valid security report (including some submitted by friends of mine) can be lost in the shuffle.

Under a simple random-sample-voting system, incoming reports would be reviewed by a random subset of Google employees who are qualified to evaluate them, and the reports that get the most “upvotes” would get closer scrutiny. However, this could lead to errors if an incoming security report seems to depict a serious security flaw (thus getting a large number of upvotes), but the report is based on a subtle faulty assumption (for example, if the “exploit” depends on being able to run untrusted code on the user’s machine — this is usually not the case in the real world, and when it is, the attacker already has control of the user’s machine anyway, so the “security vulnerability” is moot).

With the rebuttal takedown feature, if an eagle-eyed Google employee happened to spot the faulty assumption, and their peers voted in favor of that “rebuttal”, then the security “vulnerability” would be disqualified, even if it received a high proportion of upvotes from the other employees who evaluated it.

3) In my original post about random-sample-voting, I had mentioned in passing that it could be used by Facebook or Reddit to handle abuse reports — if a user flags a post as violating the site’s Terms of Service, then rather than being reviewed by a company employee (which creates a bottleneck), the post could be reviewed by a random subset of the site’s users who had volunteered as abuse report mediators. Even in this context, there are scenarios where random-sample-voting would work better with a rebuttal-takedown feature.

Suppose a user posts a picture of Barack Obama with a Hitler mustache and uniform, and another user reports that post as a Terms of Service violation for being “racist”. I think that if this were put to a simple random-sample-voting jury, many users would agree with that assessment. But with rebuttal takedowns, a user can post a “counter-argument” to the original abuse report, essentially saying: “There is nothing in this post that references race. You can think it’s in poor taste to compare Obama to Hitler, but people can (and did) do the same to white politicians. It’s stupid, but it’s not racist.” Given a moment’s consideration, hopefully most people would agree with this response, and thus the “rebuttal” would invalidate the abuse report, which I would consider to be the correct result.

4) Academic journal debates. Academic journal review is one of the few systems that uses something like random-sample-voting to evaluate content — submissions are sent to a random subset of peers, anonymized from the author and from each other, who submit their “ratings”. But after a paper is published, another peer in the field might spot a flaw that largely invalidates the argument in the original paper. The academic review system is what I had in mind, when I said at the outset that some systems track very closely to the random-sample-voting-with-rebuttal-takedowns algorithm, and those systems are flawed only insofar as they deviate from the algorithm. In this case, the flaws are that (1) even if one paper has been “taken down” by a rebuttal from another, there is no “invalidating marker” applied to the original paper (other readers can still find it in archived journals, and the original author can still list it among their “published papers”); (2) there is no way for a reader to submit more minor improvements to the paper (correcting typos, or suggesting clarification of a difficult section of the paper); (3) after a paper is published, there is no sense of how many readers have read the paper without finding a flaw. In the algorithm I’ve been proposing, each time a person reads the argument and can’t find a flaw, they could mark it with a tentative pseudo-endorsement, signifying, “I read through this and I couldn’t find any mistakes.”

I first had a fully-formed version of this idea about 10 years ago, and ever since then it’s struck me how often I’ve run across a problem or a process that seemed like it could be optimized by some variation of this algorithm. In the same way that some economists believe that markets are almost universally applicable to problems of resource-allocation, I think this algorithm is almost as widely applicable to problems of information-sorting. And as Professor Landsburg has written before, there is no efficient marketplace to compensate people for good ideas and good arguments, which means that we probably need another system to bubble the best ones to the top — perhaps this algorithm is one way to do that.


10 Responses to “A Modified Algorithm for Evaluating Logical Arguments”


  1. entirelyuseless

    “Given a moment’s consideration, hopefully most people would agree with this response, and thus the “rebuttal” would invalidate the abuse report, which I would consider to be the correct result.”

    While I see what you’re getting at, I don’t have much hope that most people will respond in that way. It happens pretty frequently that I will make such a factual statement (like “this nowhere mentions race”), which is completely unarguable – no one can say, “oh look here, it did mention race” – and yet my comment is met with complete rejection, both on the part of the responding comments, and on the part of the votes, because it either is or appears to be opposed to the politics customary on that forum. As one example, I stated that a certain attitude would likely have negative effects on Christians, and was met with, “all you can see is disagreement with your religion” etc., despite the fact that I do not accept any religion, and I was simply stating an objective fact about the likely results of a certain attitude.

  2. Nick

    The motivation behind random-sample-voting, as I understood it, was to avoid situations where an initial burst of high votes leads to an information cascade in which everyone is sampling one song, so other deserving music isn’t sampled and never has a chance to get high votes.

    But what’s the motivation here for juries / rebuttals? In particular, why not just list all the rebuttals underneath an argument, let people decide on their own, and then use the random-sample-voting system?

    You mention one reason, which is that 100 separate rebuttals might be confusing for voters. In which case, why not extend the rationale behind random-sample-voting and only list some of the rebuttals? The platform itself could then, e.g., weigh down arguments which are less likely to be voted for after particular rebuttals are presented. The question of how to optimally do this seems interesting to me.

    The jury / disqualify system seems odd to me because one of the distinguishing characteristics of these systems is that participation by users is voluntary. There’s a tradeoff between the service the platform provides to the users (disseminating information) and the service the users provide to the platform (voting for content); if the scales are tipped too far in one direction, users won’t participate in the platform. The random-sample-voting system trades off these two things, providing some service to users in exchange for votes; the jury / disqualify system does not. Why should people participate in the jury?

  3. Bennett Haselton

    @1 — I wonder if the results of an actual rebuttal-vote in my system would be different from what you’ve observed on those forums, for a couple of reasons:

    1) Most notably, the rebuttals in my system would be voted on by a random sample of the *entire* population of the site (who had signed up to review abuse reports). So you wouldn’t have the issue of political biases of only those who frequent a particular forum.

    2) People are more inclined to reply to something if they disagree than if they agree. So for every person who missed the point and replied to your comment saying “You’re just upset because they disagreed with your religion”, there were probably some who read your comment and thought, “Well, he’s right, objectively speaking, X will probably have a negative impact on people who identify as Christians.” But they just didn’t have any reason to say anything. (I know that when I read a comment — on this blog or elsewhere — and I agree with what was said, I usually don’t feel the need to chime in.)

    3) I am assuming that people are more honest when they are voting on a highly specific question — “Did you mention race or not?” — than when just expressing their preferences between one side of the debate or the other.

    4) I hope that most users would hold *abuse reports* to a higher standard of intellectual honesty, and thus if someone has filed an actual abuse report with the hope of getting a post removed, and you respond that the abuse report is not valid, people might be more inclined to support you if you’re right. That’s not quite the same as just a back-and-forth in the course of a normal debate on a forum.

    I think #1 would be the factor that makes the most difference. The standard for a bona fide abuse report would be the same across the whole site, so voting on an abuse report would be done by a sample of all users.

  4. Bennett Haselton

    @2 Taking the “motivation” question first.

    I have been assuming that in all cases — whether rating songs, or voting on whether abuse reports are valid, or rating logical arguments, or acting as “jurors” to decide whether a rebuttal is correct — people would participate just because, well, they thought it was interesting.

    If you allow the assumption that people will rate songs or logical arguments because they find it interesting, it’s not clear to me why people would be less likely to vote on rebuttal “juries”, also because they find it interesting. Or do you think people are going to be more easily bored if they’re just voting on a yes/no question of whether a claim was right or not?

    In any case, if not enough people participate in voting “just for the hell of it”, we can create an incentive system for people to vote. Slashdot and Reddit give users karma points for posts, and you could award jurors points for voting or penalize them points for not voting.

    More importantly, the incentives can be used to prevent most disputes from coming to a vote in the first place, which I think helps to avoid the scenario you were talking about, where the site’s members get bogged down all the time in “jury summons” requests :) You can tell a user that if they post a rebuttal and the rebuttal gets voted down, they lose some of their own karma. On the other hand, if a user posts a rebuttal, and the author of the original post doesn’t concede the point and so the rebuttal goes to a vote, and the rebuttal wins, then the author loses points. (The author should always have a chance to concede first — you shouldn’t be “fined” because someone caught an accidental error in your post. You should, however, be fined if the error is pointed out and you refuse to fix it :) )
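    To illustrate, here is a minimal sketch of that incentive rule; the point values are placeholders of my own, since the proposal doesn’t specify any:

        REBUTTAL_PENALTY = 5    # placeholder point values, not part of the proposal
        AUTHOR_PENALTY   = 10

        def apply_karma(karma, rebuttal_author, op, op_conceded, rebuttal_upheld):
            """A failed rebuttal costs its author points; a rebuttal the OP refused
            to concede but which the jury upheld costs the OP points. Conceding and
            fixing the error before the vote costs nothing."""
            if op_conceded:
                return                                      # no fine for fixing an honest mistake
            if rebuttal_upheld:
                karma[op] -= AUTHOR_PENALTY                 # OP refused to fix a real flaw
            else:
                karma[rebuttal_author] -= REBUTTAL_PENALTY  # frivolous or incorrect rebuttal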

  5. Bennett Haselton

    @2 Now, as to the proposal of using random-sample-voting to choose the best rebuttals (instead of having any winning rebuttal disqualify the parent argument), I think the rebuttal-disqualifier system keeps things simpler without sacrificing the quality of the discussion.

    One problem with simply rating rebuttals is that it may not be effective against an argument that uses a “rolling trivial error” tactic — where each sentence contains one or two errors or misleading word choices, but in a conventional debate forum, you can’t pounce on each one individually because it just looks petty. Rightly or wrongly, if you respond to someone’s essay by nit-picking all the individual misleading word choices, I think it creates the impression to a third-party reader that you don’t have an effective rebuttal against their *main* point. But if you can’t respond to the individual petty errors, then the petty errors stand unchallenged, and their cumulative effect can sway the reader.

    So, consider instead your proposal where you allow “petty” rebuttals along with other types of rebuttals, and then voters vote on the best rebuttals overall. But how do they decide which rebuttals are “best” — if one user writes a rebuttal to a minor trivial point but the rebuttal is 100% right, and another user writes a rebuttal to a major point in the argument but the rebuttal is only 75% right, which rebuttal is “more important”? If the minor-point rebuttals get voted down, then a reader of the original argument will still be subject to the effect of the “rolling trivial error” tactic because those haven’t been rebutted. But if the minor-point rebuttals get voted to the top, the reader might not realize that the main point has been rebutted as well.

    Basically, if an argument contains a premise or a reasoning step that is incorrect (according to the votes of a clear majority of a random sample of users), I just don’t see the benefit of keeping the argument around so that people can compare other rebuttals that may only be mostly-correct. Why not just fix the original argument to take out the part that everybody is saying is wrong? (And if the original author refuses in a fit of pique to fix the error, the mantle can be taken up by someone else — the system can let a second author fork off a modified version of the argument, which attempts to fix the error.)

    Remember, even an argument that has been “defeated” by a rebuttal will still be viewable, and users can still view the other rebuttals that have been posted as well. In other words, the system will work more or less like what you’ve proposed, except with a penalty applied to arguments that are defeated (either a big “badge of shame” at the top saying “this argument has been defeated by rebuttal X”, and/or a sorting system on the site that displays “undefeated” arguments much more prominently than “defeated” ones). The idea is to incentivize authors as much as possible to write “undefeated” arguments, which means the main thrust of their argument has to be valid, and also to disincentivize them from using the “rolling trivial error” tactic — minor errors or misleading word choices — because those can be defeated with a rebuttal as well.
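    As a sketch of that last point, here is one possible reading of the sorting rule; the approval_rating field and the tie-breaking choice are my own assumptions, since all I’ve said above is that undefeated arguments should be displayed much more prominently:

        def display_order(arguments):
            """List undefeated arguments first; within each group, order by the
            approval rating from the initial random-sample vote."""
            return sorted(arguments, key=lambda a: (a.disqualified, -a.approval_rating))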

  6. Harold

    ” I am assuming that people are more honest when they are voting on a highly specific question — “Did you mention race or not?” — than when just expressing their preferences between one side of the debate or the other.”

    I am sure you have heard of “dog whistles”, where an argument is implied to certain groups. It may be difficult to assess whether these statements are racist (or whatever). So much is about context, tone, and previously understood positions.

  7. Harold

    Perhaps an example is in the next post. Did Henry ask for someone to kill Thomas Becket? Did Trump suggest people could shoot Hillary Clinton? He did not actually say that.

    I suppose we must put forward arguments for challenge.
    1) Trump suggested supporters of the second amendment should shoot Clinton in order to stop her.
    2) Trump suggested supporters of the second amendment should vote against Hillary Clinton.

  8. Nick

    @5, 4

    We can’t really decide these issues without some empirical evidence or a theoretical framework to work in. It’s an awfully strong claim to make though that this is the best such system when there are so many competing alternatives. I’ll just mention again that there is a large and growing economic literature about this very question, I think I posted a link to a recent paper in a comment I left on your last post.

  9. Johann S.

    Has this approach ever actually been tried out? How did it go? (Particularly interested in any publicly viewable deployments.)

  10. David Grayson

    Where is the actual definition of the algorithm? I don’t think you supplied one because I don’t see any mention of inputs, outputs, operations, pre-conditions, and post-conditions. It seems like you have an idea for how to make an algorithm though.
