Odyssey Adventure Comparison System
About the comparison system
The Odyssey Adventure Comparison System solves the problem of inflated ratings for Adventures in Odyssey episodes on websites like AIOWiki by using a relative system. Because episodes are compared only against each other, rather than against someone's abstract notion of what an episode ought to be on a star-based scale, the rankings are protected against rating inflation or deflation.
How it works
When a user loads the Odyssey Adventure Comparison System, they are presented with two episodes. The user simply chooses the episode they think is better, and a new pair of episodes is presented. Once users have judged enough pairings, the statistics page shows how the episodes rank relative to one another.
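The voting loop described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the page does not say how pairs are chosen, so uniform random selection is assumed here, and the names `pick_pair`, `record_choice` and `Matchup`, along with the placeholder episode titles, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Matchup:
    """One user decision: which episode won and which lost (hypothetical record type)."""
    winner: str
    loser: str


def pick_pair(episodes: list[str], rng: random.Random) -> tuple[str, str]:
    """Choose two distinct episodes uniformly at random (an assumed pairing strategy)."""
    first, second = rng.sample(episodes, 2)
    return first, second


def record_choice(log: list[Matchup], pair: tuple[str, str], chosen: str) -> None:
    """Append the user's pick to the match log; the other episode counts as the loser."""
    a, b = pair
    if chosen not in pair:
        raise ValueError(f"{chosen!r} was not one of the presented episodes")
    loser = b if chosen == a else a
    log.append(Matchup(winner=chosen, loser=loser))


if __name__ == "__main__":
    episodes = ["Episode A", "Episode B", "Episode C", "Episode D"]  # placeholders
    rng = random.Random(42)
    log: list[Matchup] = []
    for _ in range(3):
        pair = pick_pair(episodes, rng)
        # Stand-in for the user's click: here we simply take the first episode shown.
        record_choice(log, pair, pair[0])
    print(log)
```

Each call to `record_choice` produces exactly one win and one loss, which is all the statistics below need.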
Understanding the ratings and statistics
Matches: The number of matchups this episode has been in against other episodes. More matches make a ranking more reliable, because it then reflects the choices of more people.
Wins: The number of matches the episode has won.
Win rate: The percentage of its matches that the episode has won. A higher win rate is better, and it can help identify stronger episodes when few matches have been played. However, it does not account for the quality of the episodes it was matched up against.
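As an illustration, the three counting statistics can be derived from a log of (winner, loser) pairs. This sketch assumes such a log exists; `summarize` is a hypothetical helper, and the episode names are placeholders.

```python
from collections import Counter


def summarize(log: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Compute matches, wins and win rate per episode from (winner, loser) pairs."""
    matches: Counter[str] = Counter()
    wins: Counter[str] = Counter()
    for winner, loser in log:
        # Every matchup counts as a match for both episodes, but a win for only one.
        matches[winner] += 1
        matches[loser] += 1
        wins[winner] += 1
    return {
        ep: {
            "matches": matches[ep],
            "wins": wins[ep],
            # Win rate ignores who the opponents were.
            "win_rate": wins[ep] / matches[ep],
        }
        for ep in matches
    }


if __name__ == "__main__":
    log = [("Episode A", "Episode B"), ("Episode A", "Episode C"), ("Episode C", "Episode B")]
    stats = summarize(log)
    print(stats["Episode A"])  # {'matches': 2, 'wins': 2, 'win_rate': 1.0}
```

Episode C ends up with a 50% win rate here, but the number alone cannot tell whether its one win came against a strong or a weak opponent.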
Elo: The centerpiece of the rating system and, given a large enough sample of matches, the most valuable tool for understanding an episode's degree of success. Elo is a simple system devised by Arpad Elo to compare chess players, and it adapts easily to any contest between two competitors. Whenever one competitor wins, the winner gains points and the loser gives up the same number, scaled by how unlikely the result was. A very strong player gains few points for beating a weak player, and the weak player loses few; the reverse holds when the weak player pulls off an upset. In this way the system accounts for the quality of the opposition. New episodes start at an Elo rating of 2000.
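A minimal sketch of a standard Elo update follows. The 2000 starting rating comes from this page; the 400-point logistic scale is the conventional chess formula, and the K-factor of 32 is an assumption, since the page does not say which values the system actually uses.

```python
DEFAULT_RATING = 2000.0  # starting Elo for new episodes, as stated on this page
K_FACTOR = 32.0          # assumption: the page does not publish its K-factor


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard 400-point logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(winner: float, loser: float, k: float = K_FACTOR) -> tuple[float, float]:
    """Transfer k * (1 - expected) points from the loser to the winner.

    The less likely the win, the larger the transfer, so upsets move ratings most.
    """
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


if __name__ == "__main__":
    # Two new episodes meet: an even match moves 16 points each way.
    print(update(DEFAULT_RATING, DEFAULT_RATING))  # (2016.0, 1984.0)
    # A 2400 favorite beating a 2000 episode gains little...
    print(round(update(2400.0, 2000.0)[0] - 2400.0, 2))  # 2.91
    # ...while the 2000 episode winning the upset gains a lot.
    print(round(update(2000.0, 2400.0)[0] - 2000.0, 2))  # 29.09
```

Because the winner's gain always equals the loser's loss, the total number of rating points in the pool stays constant as matches accumulate.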
Tier: An attempt to simplify episode comparison by adjusting Elo slightly by win rate and grouping the combined result into a general set of tiers. The points should correspond roughly to a traditional 1-to-10 'star' system. This is experimental and can't be expected to be as accurate or precise as the statistics it is derived from, but because it is deliberately broad, it discourages casual readers from reading too much into a statistically insignificant difference in Elo or win percentage.
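The page does not publish its tier formula, so the following is only a hypothetical illustration of the idea: nudge Elo by how far the win rate sits from 50%, then bucket the result into ten equal bands between two bounds. The function `tier_points`, the `weight` of 100 Elo points, and the choice of `lo` and `hi` (for instance, the lowest and highest Elo among all episodes) are all assumptions.

```python
def tier_points(elo: float, win_rate: float, lo: float, hi: float,
                weight: float = 100.0) -> int:
    """Map an Elo rating, nudged by win rate, onto a coarse 1-10 scale.

    Hypothetical: `weight`, `lo` and `hi` are illustrative parameters,
    not values taken from the comparison system.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Nudge Elo up or down by how far the win rate sits from an even 50%.
    adjusted = elo + weight * (win_rate - 0.5)
    # Position of the adjusted score within the [lo, hi] band, clamped to 0..1.
    fraction = min(max((adjusted - lo) / (hi - lo), 0.0), 1.0)
    # Ten equal-width buckets; the very top edge is folded into bucket 10.
    return min(int(fraction * 10) + 1, 10)


if __name__ == "__main__":
    print(tier_points(2000.0, 0.5, 1800.0, 2200.0))  # 6
    print(tier_points(2010.0, 0.6, 1800.0, 2200.0))  # 6
```

With bounds of 1800 and 2200, an episode at 2000 Elo with a 50% win rate and one at 2010 Elo with a 60% win rate land in the same tier, which is the intended effect: small differences stop looking meaningful.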