Unlocking Infinite Innovation

Elo Ranking System for Bittensor Subnet 17

The ultimate goal of every Bittensor subnet is to drive innovation, which should be reflected in the incentives (reward formula).

Subnet 17 provides a platform to democratize 3D content creation and incentivize miners to generate 3D content, prioritizing the throughput of generated content that passes the quality threshold.

Current reward mechanism formula:

reward = quantity * quality

Explanation

Quantity refers to the number of generated assets that pass the validation criteria over a defined observation window.

Quantity depends on three factors: the validator's capacity, generation time, and delivery speed. The last two are within the miners' control. Generation time is self-explanatory. Delivery speed is crucial for a good end-user experience, with results arriving promptly. We incentivize miners to deploy edge nodes with good connections to validators, shaving fractions of a second off each result delivery.

As for validator capacity, that has been our focus for the last couple of months. We've optimized the validation to handle 240 validations per minute (2x the initial load), with a theoretical maximum of 480 validations per minute (4x the initial load) on the same GPU. Scaling on multiple GPUs is also a configurable option.

Quality is a more challenging metric.

While we have success stories of artists using generated results, state-of-the-art text-to-3D models can't deliver finished end products and require some manual polishing. The ideal metric would be "how much work is needed to polish a result before using it," which is incredibly hard to implement.

We take a threshold approach, introducing a score value that results must reach to be accepted. Each result has three possible outcomes: denied, passed as high quality, or passed as medium quality.

High-quality results get a 1.0 quality score, while medium-quality results get a 0.75 quality score (subject to change). Medium-quality results are not intended for use; they exist to encourage new miners with positive feedback and are also needed for transition periods (more on this below). The final quality score is the exponential moving average (EMA) of a miner's per-result quality scores.
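As a minimal sketch of this bookkeeping in Python (the 1.0 and 0.75 tier scores come from the text above, but the smoothing factor and the choice to count denied results as 0.0 are illustrative assumptions):

```python
ALPHA = 0.1  # EMA smoothing factor (assumed, not the production value)

TIER_SCORES = {
    "denied": 0.0,   # assumption: denied results drag the EMA down
    "medium": 0.75,  # passed as medium quality (subject to change)
    "high": 1.0,     # passed as high quality
}

def update_quality_ema(ema: float, outcome: str) -> float:
    """Fold one validation outcome into the running quality EMA."""
    return ALPHA * TIER_SCORES[outcome] + (1 - ALPHA) * ema

def reward(quantity: int, quality_ema: float) -> float:
    """Current reward formula: reward = quantity * quality."""
    return quantity * quality_ema
```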

The fixed high and medium quality scores of 1.0 and 0.75 will soon be replaced with linear scoring for greater precision.
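One way linear scoring could look (the acceptance threshold and the mapping endpoints below are illustrative assumptions, not the planned design) is to map the raw validation score onto the current tier range:

```python
THRESHOLD = 0.6  # assumed acceptance threshold, not the real value

def linear_quality(validation_score: float) -> float:
    """Sketch of linear scoring for a raw validation score in [0, 1].

    Below the threshold the result is denied (0.0); above it, quality
    scales linearly from the current medium tier (0.75) up to 1.0.
    """
    if validation_score < THRESHOLD:
        return 0.0
    fraction = (validation_score - THRESHOLD) / (1.0 - THRESHOLD)
    return 0.75 + 0.25 * fraction
```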

To drive innovation, we are researching and implementing new ways of detecting defects. We closely monitor new developments in the market and increase quality thresholds with regular patches.

We must keep pace with development in the field to avoid starving the network of generations by setting the threshold too high, or demotivating miners from exploring new options by setting it too low.

With each update, we intend to push current high-quality results down to the mid-range, penalizing miners and motivating them to improve.

The Challenge

While the formula allows us to drive quality improvement incrementally and has no defined limits, a practical cap exists.

Infinitely scaling subnet throughput with synthetic traffic brings zero value; it amounts to pumping numbers for nothing. We already have a daily bandwidth of 5 million generations, with the ability to double it by changing a single constant.

We focus on increasing organic traffic (the demand) and maintaining a healthy proportion of organic/synthetic traffic.

New validations and validation threshold increments follow a certain cadence, which means we have periodic time windows when innovation occurs, followed by stability periods.

Proposal: Elo Ranking System

We want to add a factor to the reward formula that reflects a miner's average quality relative to other miners. It should be statistically sound and account for occasional bad generations. We propose miners' duels (clashes), where a pair of miners gets the same prompt and their ranks are updated based on the generated results.

If we change the terminology, replacing "miner's quality" with "player's skill" and "miners' duel" with "chess match," we arrive at a problem that has been worked on for the last 70 years: statistically sound ratings for chess players.

The most well-known system, used in chess federations since 1960, is the Elo system, designed by Professor Arpad Elo, an active participant in the USCF. Elo's central assumption was that each player's chess performance in each game is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, Elo assumed that the mean value of any given player's performances changes only slowly over time. Elo thought of a player's true skill as the mean of that player's performance random variable.

From Wikipedia: if player A has rating R_A and player B has rating R_B, the expected score of player A is

E_A = 1 / (1 + 10^((R_B - R_A) / 400))

and after a game, A's rating is updated as R'_A = R_A + K * (S_A - E_A), where S_A is the actual score (1 for a win, 0.5 for a draw, 0 for a loss) and K is the update factor.

This formula has proven its value over the years and across millions of matches, not only in chess but also in other board games, online games, and some athletic sports.
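A minimal Python sketch of those two formulas (the K-factor of 32 is a common default, not a value chosen for the subnet):

```python
K_FACTOR = 32  # common default; federations tune K by rating and game count

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float,
               score_a: float) -> tuple[float, float]:
    """Update both ratings after one duel.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw.
    """
    ea = expected_score(rating_a, rating_b)
    delta = K_FACTOR * (score_a - ea)
    return rating_a + delta, rating_b - delta
```

For two equally rated miners, a win moves the winner up by K/2 = 16 points and the loser down by the same amount; an upset win over a much stronger opponent moves the ratings further.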

The naive implementation of the Elo ranking system in our context would look like this:

reward = Elo * quality * quantity
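One wrinkle with the naive form: raw Elo ratings live on an arbitrary scale (commonly centered around 1500), so using them directly as a multiplier would dwarf the other factors. The normalization below, dividing by the network-wide mean, is only one hedged possibility; the actual scheme is an open design choice:

```python
def elo_factor(rating: float, all_ratings: list[float]) -> float:
    """Normalize a miner's Elo rating into a multiplier around 1.0.

    Dividing by the network mean is an assumed scheme: it leaves an
    average miner's reward unchanged and boosts above-average miners.
    """
    mean = sum(all_ratings) / len(all_ratings)
    return rating / mean

def reward_with_elo(rating: float, all_ratings: list[float],
                    quality: float, quantity: int) -> float:
    """Naive extended formula: reward = Elo * quality * quantity."""
    return elo_factor(rating, all_ratings) * quality * quantity
```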

Previously, if we had a new leader on the subnet, they would pull slightly ahead of the other miners, with the other miners following soon thereafter.

Now, assuming the subnet has reached a steady state, with all miners producing similar results and holding the same Elo rank, a new leader who emerges with superior results and wins every duel would see their Elo rank rise much more dramatically.

An additional positive side effect: this leader might even have longer generation times, but superior quality would still earn them more incentive, motivating others to reach the same quality and then optimize generation and delivery times.

Compensating for Elo's Deficiencies

Arpad Elo in 1960 didn't have access to modern data-analysis instruments or the vast statistics we have collected today. While the computational simplicity of the Elo system has proven to be one of its greatest assets, it has already been improved upon and adapted by different chess federations: the USCF and FIDE switched from the normal distribution to the logistic distribution and updated the constants based on collected statistics.

Arguably superior rating systems, Glicko and Glicko-2, were developed by Mark Glickman starting in 1995. While not used by chess federations, they are implemented in several online games (e.g., Counter-Strike: Global Offensive, Team Fortress 2, Dota 2, Guild Wars 2). Glickman's principal contribution to measurement is "ratings reliability," called RD for ratings deviation.

Given this, before incorporating the Elo (or Glicko) rating into our formula, we need to implement duels, collect statistics, find the probability distribution that our miners' "skill" follows, and determine the proper constants for the formulas.
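As an illustration of that preparatory step (the record layout and the use of scipy are assumptions of this sketch, not a committed design), duel outcomes could be logged and each miner's observed performance later fitted against candidate distributions:

```python
from dataclasses import dataclass
from scipy import stats

@dataclass
class DuelRecord:
    prompt: str
    miner_a: str
    miner_b: str
    score_a: float  # comparison score of A's asset for this prompt
    score_b: float  # comparison score of B's asset for this prompt

def best_fit_family(per_miner_scores: list[float]) -> str:
    """Compare normal vs. logistic fits on one miner's observed scores.

    Returns the family with the higher log-likelihood. A stand-in for
    the real analysis, which would also need to account for sample
    size and for skill drift over time.
    """
    candidates = {"normal": stats.norm, "logistic": stats.logistic}
    loglik = {}
    for name, dist in candidates.items():
        params = dist.fit(per_miner_scores)
        loglik[name] = dist.logpdf(per_miner_scores, *params).sum()
    return max(loglik, key=loglik.get)
```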

Additional Technological Challenges

Thorough results comparison might be computationally heavy. Fortunately, we don't have to do it for every single prompt; we just need a statistically sufficient number of duels per miner. There are also ways to add computationally heavy calculations with minimal effect on subnet throughput, starting with the cheapest and easiest option:

  • Pause the validator to perform comparison and rank update calculations (with the new gateway/hub implemented, this will not affect organic traffic)

  • Dedicate a separate GPU for comparison and rank updates

  • Offload calculations to other subnets

Conclusion

If implemented correctly, this system will unlock continuous, unlimited innovations in our subnet, driving consistent improvement in the quality and efficiency of 3D content generation.
