Predicting Opus 4.8's Time Horizon from its AECI
And a note on Mythos Preview early vs launch
I have recently released some posts predicting METR Time Horizon values for recent AI models based on their ‘AECI’ values, taken from Anthropic’s internal version of the Epoch Capabilities Index (which I now work on).
As Opus 4.8 was just released alongside new AECI values, I thought I would repeat the analysis with it included. In the system card Anthropic note:
To calculate Claude Opus 4.8’s point estimate, we used a smaller set of evaluations (n=11) than for previous launches (n=25)
I take this to mean that all the values on the new plot are calculated using only this smaller set of evaluations, since all of the values have changed slightly. We therefore need to recalculate the relationship between the updated AECI and time horizon.
Here is the new plot Anthropic include in the Opus 4.8 system card:
Here are my[1] extracted values for all of the data points:
Doing the same regression of ln(TH) on AECI as previously now gives the following results:
This brings the estimates down slightly compared to my previous results, giving:
Opus 4.7:
50% Time Horizon: 15.4 hours (down from 18.8)
80% Time Horizon: 2.2 hours (down from 2.6)
Opus 4.8:
50% Time Horizon: 20.0 hours
80% Time Horizon: 2.8 hours
Mythos Preview (launch):
50% Time Horizon: 33.8 hours (down from 40.3)
80% Time Horizon: 4.6 hours (down from 5.5)
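For readers who want to reproduce this kind of estimate, here is a minimal sketch of the regression of ln(TH) on AECI described above, assuming an ordinary least-squares fit. The function names `fit_log_linear` and `predict_th` are hypothetical, and the demo inputs are synthetic placeholders generated from an exact log-linear rule; they are not the values extracted from the system card plot.

```python
import math


def fit_log_linear(aeci, th_hours):
    """Fit ln(TH) = intercept + slope * AECI by ordinary least squares."""
    n = len(aeci)
    log_th = [math.log(t) for t in th_hours]
    mean_x = sum(aeci) / n
    mean_y = sum(log_th) / n
    # Closed-form OLS for a single predictor.
    sxx = sum((x - mean_x) ** 2 for x in aeci)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(aeci, log_th))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_th(aeci_value, slope, intercept):
    """Convert an AECI value into a predicted time horizon in hours."""
    return math.exp(intercept + slope * aeci_value)


# Synthetic placeholder data (assumption, NOT real AECI or METR values):
# time horizons that follow ln(TH) = -10 + 0.08 * AECI exactly.
demo_aeci = [100.0, 110.0, 120.0, 130.0, 140.0]
demo_th = [math.exp(-10.0 + 0.08 * x) for x in demo_aeci]

slope, intercept = fit_log_linear(demo_aeci, demo_th)
# AECI points needed for the predicted time horizon to double.
doubling_step = math.log(2) / slope
demo_prediction = predict_th(150.0, slope, intercept)

print(f"slope={slope:.4f}, intercept={intercept:.4f}")
print(f"AECI points per doubling: {doubling_step:.2f}")
print(f"predicted TH at AECI 150: {demo_prediction:.2f} hours")
```

Swapping in the real (AECI, time horizon) pairs for the placeholder lists gives the fitted line, and fitting the 50% and 80% time horizons separately gives one line for each.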
A note on Mythos Preview (early)
In my previous post predicting time horizon from the AECI, I estimated that Mythos Preview would have 50% and 80% time horizons of 40.3 and 5.5 hours respectively. METR then released results for an early pre-release version of Mythos Preview. Its 50% Time Horizon was above 16 hours, and so beyond their ability to estimate accurately, but its 80% time horizon was only 3 hours and 6 minutes, notably below the predicted value.
I think this led some people who had seen my analysis to conclude that Mythos Preview was less of a jump than anticipated. My guess is that this comes from conflating the early version of Mythos Preview from Feb/March, which METR tested, with the April 7th ‘launch’ version on which the AECI results were based.
AISI recently released an analysis showing a substantial gap between these ‘early’ and ‘launch’[2] versions of Mythos Preview on some of their cyber evals, and so it seems plausible to me that a similar gap would also exist for software engineering ability.
Unless and until METR release results for the April 7th ‘launch’ version of Mythos Preview we will not be able to assess whether the relationship between the AECI and time horizon still holds (although as noted at the time, its predicted value of 40.3 hours would be well above METR’s ability to estimate with their current TH1.1 task suite).
[1] Claude actually slaved away over extracting these for the longest yet.
[2] Called “Mythos Preview (new)” in their article.


