GRTFL Update: The Challenge: Battle of the Exes Cometh
Happy Challengoliday! America’s fifth major sport, MTV’s The Challenge, returns to your television this evening, and we’ve decided to roll out a special GRTFL post just to get you prepared for this glorious event. I urge you to right now get a Challenge fantasy league together with your friends and hold a draft before this evening’s premiere. There is little more enjoyable in life than yelling, “C’mon, TJ, give her a, ‘You killed it!’ TJ. GIVE HER A ‘YOU KILLED IT!’” at your television. OK, there are actually a gabillion things more enjoyable in life than yelling at your TV about a reality show, but it’s pretty far up the list, so email your buddies, set the stakes and divvy up the alcopsychoholics. You won’t regret it.
Each season The Challenge adds a colon to its title that precedes the format gimmick du jour, and this season is The Challenge: Battle of the Exes. The idea is that each team is composed of a couple who was formerly romantically involved. The producers must feel like pairing exes together will bake in drama, but with this particular lot, extra drama need not be thrown in the oven; once their bellies soak up some spirits, the hook-ups, arguments, and fraudulent claims of coitus occur organically. I can’t wait. I really hope this is the season CT makes good on his claim to actually cannibalize a fellow castmate. I should add cannibalism to the scoring. Oh yeah, the scoring:
Challenge-Specific Scoring (in addition to the general rules):
- Making TJ say, “You killed it”: 25 points
- Winning elimination challenge: 10 points
- Winning final challenge: 50 points
- Leaving show due to injury: -50 points
- Announcing you are “in control of the game” (or something close to it): 5 points
- Slandering your ex’s sexual performance: 25 points
- Cannibalism: 100 points
- Inciting the arrival of a vehicle with a siren: 20 points (general rule added Season 2)
- Assault of an inanimate object: 10 points (general rule added Season 2)
- Cold sore possession: 15 points (general rule added Season 2)
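For the spreadsheet-averse, here is a minimal sketch of how a league might tally these points in Python. The event keys and the `tally` helper are invented labels for illustration, not anything official, and only the items listed above are included; the rest of the general rules are not reproduced here.

```python
# Point values copied from the Challenge-specific list above.
# The event keys are made-up labels used only for this illustration.
CHALLENGE_POINTS = {
    "you_killed_it": 25,
    "elimination_win": 10,
    "final_win": 50,
    "injury_exit": -50,
    "in_control_of_the_game": 5,
    "slandering_ex": 25,
    "cannibalism": 100,
    "siren_vehicle": 20,
    "inanimate_object_assault": 10,
    "cold_sore": 15,
}


def tally(events):
    """Sum the points for a list of event keys logged for one contestant."""
    return sum(CHALLENGE_POINTS[event] for event in events)


# A contestant who earns a "You killed it" and wins an elimination challenge.
print(tally(["you_killed_it", "elimination_win"]))  # 35
```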
We held our draft via electronic mail earlier this week and, like the show itself, it was full of joy, disappointment, insults, intoxication, and controversy. Here is where we ended up after about 30 hours and 40 emails:
1. House: CT and Diem
2. Lisanti: Johnny and Camila
3. Kang: Mark and Robin
4. Bill: Dunbar and Paula
5. Connor: Wes and Mandi
6. Jacoby: Leroy and Naomi
7. Jacoby: Vinny and Sarah
8. Connor: Abram and Cara Maria
9. Bill: Dustin and Heather
10. Kang: Aneesa and Rachel
11. Lisanti: Ty and Emily
12. House: Nate and Priscilla
13. House: Tyrie and Jasmine
That Tyrie and Jasmine slipped to last pick is shocking. The only possible explanation is that someone leaked their Wonderlic test scores. Anyway. I asked everyone to briefly summarize their feelings about their draft; below are the replies:
House: Jacoby asked for two sentences about my team; I prefer to write about the two most important assets on my team. Not CT’s ferocious competitiveness and not Jasmine’s wild … ummmmm … unpredictability. No sir. The assets that are going to deliver my currently scoreless fantasy-reality team from the depths of “Kang 2011” are Priscilla’s breasts. This is your time to shine, Priscilla. Make your mom-sister proud.
Lisanti: Having not watched The Challenge in years, I’m mostly flying blind, but I figured I couldn’t go wrong picking a team including a guy with “Bananas” in his name. I expect to dominate.
Kang: Aneesa and Robin, why can’t I quit you?
Bill: I’m happy with my team — I have two former porn stars (Dunbar and Dustin) and a proven emotional wreck of a lunatic (Paula). Dustin is a sleeper: My scouts tell me that, during a go-kart challenge on a Real World: Vegas episode, Dustin turned into an overcompetitive weirdo and sulked about the results for an hour. Imagine him after a tough Challenge? Throw in CT’s homophobia, Johnny Bananas’ sarcastic jokes about Dustin’s gay porn past and Tyrie and Ty hitting on Dustin’s cute girlfriend and I predict Mount Dustin will have multiple volcanic eruptions.
Connor: Jacoby and Simmons have argued for years now that The Challenge should become our fifth major sport. If that is the case, then Wes is Kansas City’s most successful athlete (think about that). He had to be on my team.
Jacoby: I don’t feel great about this draft. Selecting last was a tough break. The only thing I’m rooting for now is that Leroy and Naomi reignite their casual, open relationship from their season of the Real World, leading to multiple cold sore and coitus-denial points.
OK, buckle up, because now it’s about to get REAL weird. Around the Grantland office we have always tossed out the idea of applying advanced metrics to The Challenge, acted like we were going to actually do it, and then lost interest after realizing executing a task like that requires patience, time, and intelligence, all things we are short on. Then I received an email from a gentleman named Dan at MIT. In that email Dan was all, “Hey man, I tracked The Challenge back 10 years and attempted to create a stat like Hollinger’s PER for the contestants. Interested in having a look?” Um, yeah, Dan, color me interested. Below is Dan’s work. To be totally honest, it’s a tough read. I haven’t even read the whole thing myself, so feel free to skim it for the results. (For all I know Dan starts detailing bestiality in the third paragraph. If so, it’s been fun working here.) Without further ado, ladies and gentlemen, I present Dan from MIT’s advanced metric analysis of the last 10 years of The Challenge:
“As I mentioned to you over the phone, I have accumulated data, mainly from the Wikipedia pages, of the past 10 challenges for the 106 different people who have appeared over that span. The statistic I have been working on, inspired by John Hollinger’s PER, is similar in that it attempts to combine all of a player’s contributions into one number. This statistic eliminates some of the biases that are present in simple counting statistics like wins and losses. Still, the formula, like any other, has its faults, which I will discuss after the explanation of my methodology.
“To outline my methodology, I will start with a specific challenge and then show how to combine all the challenges to arrive at a player’s single number. First, I found the number of wins each player had in the non-elimination competitions. I then multiplied this number by the number of opponents they beat. In big team challenges, like Inferno 3, the number of wins will always be multiplied by 1 because they are beating 1 other team. In challenges like Fresh Meat or the Dual there is an ever-changing number of teams, so each win will be multiplied by the corresponding number of teams left when the win was achieved. I then adjusted this number for team size by dividing by the number of players left on the winning team. This helps to better define who gets credit for wins. For the big team challenges the size of one’s team is constantly changing and the larger a team, the bigger the divisor, and the less credit you get. For challenges like Fresh Meat or the Dual the divisor is constant as the team size never changes. These totals are then divided by the length of time before a player was eliminated to better compare players who don’t fully last throughout the challenge. These calculations result in a player’s initial raw score, and a final raw score is determined after factoring in elimination performance. To account for the elimination challenges I broke them into 3 scenarios.
“1. If the player lasted two or fewer competitions, his raw score is multiplied by .1+(winning % in eliminations). This is because it’s possible and likely, especially on big team challenges, for someone to win their first non-elimination competition and then lose immediately in an elimination. In this case the previous raw score would be artificially high, and the score would see them as being a good competitor when they most likely aren’t.
“2. If the player lasted more than two competitions, but fewer than a quarter of the competitions, their raw score is multiplied by .5+(winning % in eliminations). This makes it possible for someone who went 1-for-2 in eliminations to retain their entire raw score, but penalizes someone who lasts only a quarter of the game and gets eliminated without winning an elimination.
“3. If a player lasts more than a quarter of the competitions, their raw score gets added to a total that is equal to (# of elimination wins)*(raw score of winning a 50th-percentile non-elimination competition). If a player reaches this level, performance is no longer biased by small sample size and raw score does not need to be adjusted downward for elimination losses. Additionally, any elimination wins add to the player’s raw score and boost their overall ranking.
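One way to read the raw-score steps and the three elimination scenarios in code, offered as a rough sketch and not as Dan's actual implementation. The `SeasonRecord` fields, the function names, and the `median_win_value` parameter are all hypothetical. The sketch also assumes that "length of time" is counted in competitions, that a player with no elimination appearances has a 0% elimination win rate, and that lasting exactly a quarter of the competitions falls under scenario 2.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SeasonRecord:
    # Each non-elimination win: (opponent teams beaten, players on winning team).
    wins: List[Tuple[int, int]]
    competitions_lasted: int  # assumed unit for "length of time"
    elimination_wins: int
    elimination_losses: int


def initial_raw_score(rec: SeasonRecord) -> float:
    """Credit per win is opponents beaten / team size, then divided by time lasted."""
    if rec.competitions_lasted <= 0:
        return 0.0
    credit = sum(opponents / team_size for opponents, team_size in rec.wins)
    return credit / rec.competitions_lasted


def final_raw_score(rec: SeasonRecord, total_competitions: int,
                    median_win_value: float) -> float:
    """Apply the three elimination scenarios to the initial raw score.

    median_win_value stands in for "the raw score of winning a
    50th-percentile non-elimination competition" (assumed to be supplied).
    """
    raw = initial_raw_score(rec)
    played = rec.elimination_wins + rec.elimination_losses
    win_pct = rec.elimination_wins / played if played else 0.0

    if rec.competitions_lasted <= 2:                       # scenario 1
        return raw * (0.1 + win_pct)
    if rec.competitions_lasted <= total_competitions / 4:  # scenario 2
        return raw * (0.5 + win_pct)
    return raw + rec.elimination_wins * median_win_value   # scenario 3


# A pair who lasted 3 of 16 competitions and went 1-for-2 in eliminations.
short_run = SeasonRecord(wins=[(5, 2)], competitions_lasted=3,
                         elimination_wins=1, elimination_losses=1)
print(final_raw_score(short_run, total_competitions=16, median_win_value=0.05))
```

In that usage example the multiplier is 0.5 + 0.5 = 1.0, so the player keeps the full raw score, which matches the 1-for-2 case described in scenario 2.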
“Once the elimination challenges are factored into the raw score, each person is separated by gender and their raw scores are normalized on a scale from 1-10 within the gender. An important note is that raw scores are not added across multiple challenges and then normalized. This would cause problems in that raw score points are on a different scale for each challenge and there would be a bias relating to the number of overall challenges in which someone participated. Instead each challenge is individually normalized from 1-10 and a player’s final normalized score is an average of the challenges in which they participated. The final step in my calculation was to account for small sample bias that occurs when someone participates in only one or two challenges. If a player has only competed in one or two challenges, their normalized scores can be abnormally low or high, so I averaged them with the overall mean, assuming mean reversion. If they have competed in three or more challenges, no correction is made, on the assumption that the score accurately captures their ability.
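The per-challenge normalization and the small-sample correction could be sketched as follows. This is an illustration under stated assumptions, not Dan's code: the email does not say how the 1-10 scaling is done, so the sketch assumes linear min-max scaling, and it assumes "averaged with the overall mean" means a simple 50/50 average with the mean of all players' career averages. The function names are hypothetical, and each input dictionary is taken to hold one gender's final raw scores for one challenge.

```python
from statistics import mean
from typing import Dict, List


def normalize_1_to_10(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Linearly rescale one challenge's raw scores to the range 1-10 (assumed method)."""
    if not raw_scores:
        return {}
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        # Everyone tied: place them all at the midpoint of the scale.
        return {player: 5.5 for player in raw_scores}
    return {player: 1 + 9 * (score - lo) / (hi - lo)
            for player, score in raw_scores.items()}


def career_scores(challenges: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each player's per-challenge normalized scores.

    Players with only one or two challenges are averaged with the overall
    mean (assumed 50/50 weighting); three or more get no correction.
    """
    per_player: Dict[str, List[float]] = {}
    for challenge in challenges:
        for player, score in normalize_1_to_10(challenge).items():
            per_player.setdefault(player, []).append(score)
    if not per_player:
        return {}
    averages = {player: mean(scores) for player, scores in per_player.items()}
    overall = mean(list(averages.values()))
    return {
        player: avg if len(per_player[player]) >= 3 else (avg + overall) / 2
        for player, avg in averages.items()
    }


# Hypothetical raw scores for three challenges; "C" appears only once.
example = [
    {"A": 0.0, "B": 1.0, "C": 2.0},
    {"A": 0.0, "B": 2.0},
    {"A": 1.0, "B": 3.0},
]
print(career_scores(example))
```

Normalizing each challenge before averaging keeps a high-scoring format from dominating a career number, which is the reason the email gives for not summing raw scores across challenges.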
“The best possible score is a 10 and the worst is a 1. Both of these are virtually unattainable because it would have to result in the player being either the absolute best or worst every time they played for a minimum of 3 challenges. A point of note is that I developed an overwhelming majority of the above methodology without data mining.
“My result for the top 15 of each gender so far is as follows:
“The results are somewhat in line with the general consensus, but it does have flaws. Some of the flaws I think could still be fixed within this statistic or could be addressed in another measure that complements it. Some of the current flaws are that it:
- “Does not accurately capture players who perform well on losing teams in non-elimination competitions; similarly it does not distinguish those who perform poorly on winning teams
- “On a broader level, the amount of use from just pure wins/losses is relatively small for most players and analyzing each episode’s competition would allow for a more robust analysis
- “Does not account for strength of schedule. A win against CT is the same as a win against Mike. (I think an SoS modification is possible to implement on a simplified basis.)
- “Does not account for the competitions that are easier when you have more or fewer people on your team.
- “Hard to fully incorporate eliminations. Players like CT would accumulate a lot more points if they were thrown into more eliminations, but at the same time most ‘good’ players are thrown into relatively the same number. I think more than anything this affects the spread of the scores between ‘good’ and ‘bad’ players.
- “Does not incorporate performance trends. Evan, for instance, is the top ranked player based on the results of his 6 challenges as a whole but after watching last season, out-of-shape Evan may or may not suck.
- “This is probably the easiest problem to fix by developing some weighting system predicated on the existence of a trend.
- “Does not incorporate performance of those who had significant history prior to Fresh Meat 1; Darrell (although he has retired) is one that comes to mind.
- “It only incorporates athletic performance and leaves out the effects of fights and drinking. I don’t know how it would fit into this statistic but like you said, something else could be developed to ‘quantify’ this aspect of the show. (I imagine there is a huge bias in the content MTV airs.)
“That is all the current progress I have made so far, but I think I have the ability to refine both this statistic and develop others. My data right now is more or less binary in that I only recognize the winning team or individual and the rest are losers. By charting the results of everyone in a given competition it will be more accurate as I can rank the teams based on finish (10, 9, 8 … etc.). Getting more precise results from each of the specific episode competitions is ideal but I think the framework for other metrics like Strength of Schedule, Win Shares, Plus/Minus, etc. can be developed without it.
“I’m sorry this explanation was so long but I wanted to be as thorough as possible. I hope my logic and methodology were clear and easy to follow, and I am curious about any suggestions or modifications you have.”
I love Dan from MIT. We are going to sic our dude Dan and his metric magic on this season of The Challenge and perhaps other GRTFL-related shows in the future. But only if Dan actually exists. There is a part of me that thinks there is no Dan from MIT and this is just an elaborate hoax to see if I would publish anything The Challenge-related that found its way to my inbox. If that’s the case, then well played, hoaxer. Well. Played.
Check Grantland for GRTFL scoring results and write-ups fraught with hacky puns. Until then, enjoy your Challengoliday.