Solving AI Everest (10.7) with counting logic

Alternate title: solving the most epic extreme hardest most difficultest puzzle ever made

Here's a question that has been asked many times before: What is the hardest Sudoku puzzle?

To answer this question we need to be more specific. Computers can solve any Sudoku in fractions of a millisecond using highly optimised backtracking procedures. So how about: what is the hardest Sudoku puzzle for a human to solve? How do you determine a puzzle's difficulty, and how do you compare the difficulties of different puzzles? Perhaps you could assign a numerical value and rank them in a list - but how do you calculate this value?

It's complicated and would require a lot of research so first let's do the civilised thing and Google it.

Hmm.

This article will be split up into three sections. Skip to the end if you're only here to see the solution.

- 1. Measuring difficulty

- 2. Arto Inkala

- 3. Solving the puzzle

Section 1. Measuring difficulty

To devise a human-friendly rating system we must first think like a human.

The majority of grids you will encounter in newspapers and puzzle books can be solved purely by placing singles one after the other, using hidden or naked singles. When you begin to encounter puzzles harder than this, which do not succumb to the simplest applications of the Sudoku rules, the way you approach the puzzle fundamentally changes. Rather than being a game of assignment, it becomes a game of elimination, where you begin the solve with myriad possibilities and eliminate them one by one until only the truth remains.

The first techniques to eliminate candidates you're likely to learn are pointing/claiming candidates and naked/hidden subsets. These are enough to solve roughly 85% of all randomly generated puzzles. For the rest there's an ever-growing scale of techniques like Fish, XY-Wings, Unique Rectangles, 2-String Kites, W-Wings, etc. These techniques all share consistent and easily recognisable patterns, typically composed of bivalue cells and/or bilocal candidates. Incorporating these into your solving will defeat 96% of all randomly generated puzzles.

So, the rating system begins to take form: puzzles solvable with singles are easier than puzzles solvable with pointing candidates, which are easier than puzzles that solve with Fish, which are easier than puzzles that require a W-Wing, etc.

If you assign each individual technique a difficulty score you can rate puzzles based on the highest-scoring technique required by iteratively searching for each technique at every step of the solve, going from easiest to hardest, and keeping track of the highest-rated step across the entire solve.

This was the train of thought that lead to the development of SudokuExplainer (SE) in 2006. It was a fast, consistent, easy-to-use, open-source solver that quickly became the standard. There were a few competing rating systems but ultimately everyone was using the SE rating to compare their puzzles and the research around difficult puzzles shifted to a drive to optimise for SE rating. For the remainder of this section I'll keep up the pretence of designing a system from scratch but just know I'm actually describing SE's process. You can find a summary of SE's rating system and implemented techniques here.

This is all well and good and you could keep piling on the named techniques as long as you want but of course these aren't enough to solve every puzzle. Eventually you'll find a difficult puzzle where you run out of techniques and your progress grinds to a halt. What do you do in this situation?

Guessing and backtracking if it's wrong is the simple answer but how does this translate to a rating? Do you find the minimum amount of guesses required to solve the puzzle with a given set of techniques? That's the basis for the T&E(N) procedure which I will not go into here.

How about we evaluate what would happen if we made a certain guess and continued the solve looking at the resulting placements from hidden/naked singles, and if we determine it leads to a contradiction in the puzzle later on, the guess must be a mistake and so we can remove the candidate. Here's an example:

If r7c8 (marked blue) is 7, a stream of implications leads to 7 being eliminated from every position in row 1 (marked purple). Therefore r7c8 cannot be 7.

The "contradictions" according to SudokuExplainer stem from impossible states where the puzzle is no longer solvable by the basic Sudoku rules. The two types are a cell with no candidates, and a region that contains no instances of a digit.

If you find the shortest path to a contradiction this can be thought of as the shortest proof that a guess would be incorrect, then you can run this procedure for every candidate on the grid and find the shortest proof that any candidate is incorrect, and you can say that's the next step in the solve. In this manner the program can measure how difficult a puzzle is to bypass via guessing without actually having to do it. This property also correlates quite closely with how difficult a puzzle is to solve with chaining strategies as there is much overlap with the logical procedures.

The vast majority of puzzles, probably over 99.9%, will be solved by Forcing Chains using this method and only using the simple implication streams of hidden singles and naked singles. If you add more techniques to the contradiction-finding procedure you unlock even more puzzles. If you add all named techniques implemented in Sudoku Explainer, you're able to solve every single puzzle known before the year 2006. (This is foreshadowing for later).

I'm oversimplifying matters slightly as SudokuExplainer implements a variety of Forcing Chain algorithms and rates them all differently, with complicated weight modifiers that overlap with other methods. It's a bit of a mess. But at the end of the day, longer and more complex chains generally correspond to a higher rating. You can read more about the process here and here.

Readers who are familiar with modern Sudoku methods may notice I didn't mention AIC at all - this is because AIC hadn't yet caught on in 2006. SudokuExplainer is entirely Forcing-Chain based and thanks to its influence & widespread adoption it would be very difficult for a new rating system to catch on today. Maybe some day a replacement will pop up that implements modern methods and is as portable and effective as SudokuExplainer. Until then we're stuck with it.

I briefly mentioned that there were other rating systems being developed at the same time as SudokuExplainer, and that few of them caught on. Most failed because they were closed-source and the authors were unwilling to share their methodology and code. Some authors even refused to share their programs altogether. One such author is Arto Inkala.

Section 2. Arto Inkala

Arto Inkala is a Finnish mathematician and Sudoku enthusiast who seems to have cracked the code when it comes to getting his name plastered all over the news. Google his name and you'll find countless articles lauding his genius. He's been called a mathematical whizz, a brainbox, a wizard, and more. His claim to fame is a set of three Sudoku puzzles released in 2006, 2010 and 2012, which were each the self-proclaimed "hardest puzzle of all time". He generated these puzzles and rated them using his own program named AIroot. AI stands for Arto Inkala, by the way - he'll never let you forget his name.

On September 22nd 2006 Inkala posted a thread to the Sudoku players' forum announcing his new method. His method is actually very interesting and I wish he had documented it more completely but unfortunately his 6 forum posts and a handful of resources on his website are all we have. AIroot is a Forcing Chain method very similar to the procedure described above, and especially similar to Denis Berthier's modern concept of Whips.

Recall from earlier that SudokuExplainer uses two definitions of invalid states - a cell containing no candidates and a region containing no instances of a digit. AIroot uses the following three generalised contradictions:

N cells within a common region containing N-1 candidates

N regions containing a candidate N-1 times

Deadly pattern (Unique Rectangle & its extensions)

These are very powerful and in modern terms they correspond to ALS, Kraken Fish and UR Guardians respectively. It appears these constructs would be used to build forcing chains. This post provides the most detailed example of the method we have, with a full explanation of a single move using 6 links. These links branch out and he considers every constraint propagation step as a single "link", even if there are multiple different branches being extended at once.

Blue cells are a unique rectangle on {67}, purple cells are an extended unique rectangle on {89}. This example appears to show all the implications of r9c1 being 1. After quite a lengthy chain (including a grouped link in box 7 and a naked Triple of {345} in box 9 which I marked in purple), r2c2 ends up being 2 as it is the last remaining extended UR guardian. However the chain is incomplete. According to Inkala:

"So at the next step or links all possible eliminations will be made and if this lead to conflict somewhere in the puzzle the number can be eliminated.

I don't know, if this clarify anything, but atleast it is other example, how links continues."

Giving r9c2 the candidates {12} would complete the demonstration and show that r9c1 cannot be 1 as it would empty r9c2 of candidates. I can verify this with Xsudo, please forgive the messy diagram.

It is my opinion that this method is absolutely sound and offers many advantages over the more conventional Forcing Chain algorithms used today. You won't find any other programs that can handle the use of multiple extended URs at once for example, and the only solvers I am aware of that can handle both ALS and Kraken Fish in their chains are YZF and Xsudo.

Inkala provides some more examples of links with diagrams here but they're all size-1 and the eliminations are very simple - mostly achievable by pointing candidates, hidden singles and naked subsets, so there isn't much to glean from this pdf. He provides the ratings of numerous puzzles that were known to him in 2006 on this page.

Unfortunately this was the final forum post made by Inkala (more on this soon), but I've neglected to mention something: two of his posts contained puzzles.

The first is named "AI Killer Application" and rates at 9.9 SE, which put it among the most difficult puzzles known at the time. It requires Dynamic Contradiction Forcing Chains, the variety discussed earlier that has to use all the techniques known to SudokuExplainer in order to disprove a candidate assumption. The hardest known puzzle at the time was named Fluid Drive and also rated at 9.9 SE.

The second, included in Inkala's final post (although already published in multiple newspapers), was a bit more notable.

This puzzle is "AI Escargot" and it rates at 10.5 today - but at the time any Sudoku hobbyists who keyed the puzzle into SudokuExplainer were amazed to see it pop up an error message saying it could not solve the puzzle. This was the first puzzle ever discovered that could not be solved with Dynamic Contradiction Forcing Chains. You can read the initial reaction in this thread.

This post wasn't the first time the forums became aware of AI Escargot as it had already made its way to newspapers, being posted to the link above on October 29th. By the 31st Ocean had come out with a puzzle that rated at 10.6 and which also defeated SudokuExplainer. On November 3rd he posted a further 4 puzzles that rated at 10.6... the same day Inkala officially shared his AI Escargot with the forums. Many more SudokuExplainer-defeating puzzles followed and the difficulty steadily increased.

By this point AI Escargot was taking the international media by storm and his name was in every rag on the planet. Many forum members attempted to reach Inkala and share the harder puzzles they had created but he was unresponsive. There is an indication that he was aware of this explosion of new puzzles:

Unfortunately this note and the link to the forums on Inkala's website were soon removed. Inkala never returned to the forums and rarely replied to emails after this. He never shared his methodology, or his source code, or even his program. Many questioned why a professor of mathematics would abandon the principles of peer review and the free sharing of information.

On the forums the momentum kept up. A 10.7 was posted by Ocean on December 21st. A mere two weeks later, on the 3rd of January 2007, he posted a puzzle with an incredible rating of 11.3.

SudokuExplainer was patched in a hurry and a solution was devised for these puzzles: if Dynamic Forcing Chains are not enough, a layer of recursion is used. Forcing Chains were added to the list of techniques used within the Forcing Chain itself. This technique of nested Forcing Chains is very computationally expensive and in 2006 certain puzzles would take several hours to rate - but it worked. Also, one level of nesting appears to be the most you'll need, as to this day a puzzle has not been found that requires two levels.

Many of the efforts in the discussion thread gravitated towards particular patterns that were likely to yield difficult puzzles: diagonal patterns of givens with an empty box were a common theme. Users experimented with many different limits such as clue counts per chute etc... everyone was searching for a mythical "gold nugget" which would have all these properties at once and break the difficulty records yet again. On February the 7th Mike Metcalf found a puzzle which rated at 11.4 - but on the latest version of SudokuExplainer, v1.2.1, it rates at 11.8 thanks to an update in the weighting of Nested Forcing Chains. StrmCkr accidentally found a related puzzle based on an incorrect permutation of this puzzle which rated at 11.4 but was believed to be harder than Mike's at the time.

Finally, on November 8th 2007, tarek posted the Golden Nugget everyone had been hunting for. At the time it was rated 11.5 SE but today we know it's 11.9, the first to be discovered, and still the record for highest SE rating. Ironically despite being the Golden Nugget it didn't follow any of the rules established so far. The givens weren't fully diagonal, there were too many repeated digits in the chutes, and more. Perhaps people were searching for gold in the wrong places. I've attempted to solve Golden Nugget many times in the past and never felt as if I was making any progress in the slightest, even after a dozen eliminations, so I definitely feel it deserves its rating.

I have to end this tangent here as this is supposed to be Arto Inkala's section. What was he up to while all this was going on? After all he triggered this landslide with his 10.5 puzzle.

Inkala quietly worked behind the scenes, perfecting his solver and his search methods. In 2010 - four years after AI Escargot and three years after the pinnacle of SE rating was reached with Golden Nugget - he announced his new puzzle and sent it off to hundreds of news outlets around the world.

This puzzle was never named but I'm dubbing it "AI Efamol".

...Oh, it's only 10.6. It's still hard I suppose, certainly harder than AI Escargot, but nowhere near what the community had been finding over the previous few years. Of course this didn't bother any of the outlets reporting on the puzzle - they repeated the claim that it was the hardest Sudoku without any research or peer review. They even repeated the multi-paragraph spiel on Efamol, Inkala's snake-oil salesman corporate sponsor which hosted the puzzle on their website. They get to sell their dubiously-scientific products to people who fall for their fearmongering about dementia and aging, Inkala takes home his own tidy sum, and the news sites get their ad money. Everybody wins...

Needless to say nobody was impressed by this puzzle's difficulty and most felt its press coverage was ridiculous. This didn't stop Inkala from pulling the exact same stunt in 2012 (complete with his old friend Efamol) with his final puzzle, which he pinky promise swears is the hardest ever this time...

This is AI Everest - and while AI Escargot deserved its recognition, and AI Efamol was largely forgotten, AI Everest towers over them both in terms of the absolutely ridiculous media coverage it received and still receives. To this day it's almost synonymous with hard Sudoku puzzles. Google puts this puzzle on a massive pedestal and more recently AI programs such as ChatGPT have it on command as their go-to "difficult puzzle". It gets posted on the r/Sudoku Reddit every few days, mostly from people who got it from ChatGPT and think they discovered it themselves. I see it on Discord, on YouTube, everywhere. While writing this article I saw it posted on Sudoku.Coach twice and r/Sudoku once. I'm almost reluctant to publish this article as it would be like pouring water on the sun.

Half of these sources call it "AI Escargot" or "AI Etana" which are both names for the 2006 puzzle, but this one was also unnamed. I've seen it referred to colloquially as "AI Everest" because a bunch of the media coverage of the puzzle compared it to Mount Everest. Ironically this puzzle is the easiest of the lot as it falls victim to a technique known as MSLS/Multifish, which had been known to the forums for a number of years but was discovered too late to be integrated into SudokuExplainer.

Presumably AI Efamol and AI Everest were generated and rated using AIroot. Whatever happened to that program? It has been 19 years since the original announcement and still no further information on the system has been released.
Well, in mid-2022, Reddit user "lordlouckster" contacted Arto Inkala and asked for a download for AIroot. This was his response:

"I don't think I will release it. I use my generator to make puzzles for newspapers and releasing the program wouldn't probably help the sales. The code is also difficult to use without an interface and quite messy due to its researching nature."

On that note, let's get on with the reason most of you are here and solve this thing.

Section 3. Solving the puzzle

Before noting down any candidates let's investigate the assigned digits.

Specifically let's take a look at which digits are present in which rows and columns.

Row 1: 8

Row 2: 36
Row 3: 279

Row 4: 57

Row 5: 457

Row 6: 13

Row 7: 168

Row 8: 158

Row 9: 49

Column 1: 8

Column 2: 579

Column 3: 138

Column 4: 156

Column 5: 49

Column 6: 57

Column 7: 247

Column 8: 136

Column 9: 8

It may not be immediately clear but there is a pattern here: most of the regions either contain digits from the set {1368} or the set {24579}. The only exception is cell r8c4 which is a 5. Let's colour in the rows containing {24579} and the columns containing {1368} blue and purple respectively, and colour the overlapping cells in pink.

Now we will prove that the remaining unsolved purple cells must contain {24579} and the unsolved blue cells must contain {1368}. All we will need for this proof is the golden rule of Sudoku:

Every region of the puzzle must contain each digit 1-9 exactly once.

This can be multiplied for our current situation:

Every region of the puzzle must contain each set of N digits exactly N times.

And again:

Every M regions of the puzzle must contain each set of N digits exactly M*N times.

Let's count digits. Within the blue cells the digits {24579} are present 10 times. We know that within the rows r3459 there must be 4*5 = 20 instances of these digits, so there are 10 more unaccounted for. A maximum of 10 of these can be in the pink cells.

The pink cells are all contained within the 5 columns c13489. These columns must contain the digits {24579} 5*5 = 25 times. If there are a maximum of 10 of these digits in the pink cells then there must be a minimum of 25-10 = 15 of them in the purple cells.

There are 14 unsolved purple cells and another of them is already a 5, in total that makes 15 available cells... but wait - we know that at minimum 15 of these cells have to contain {24579}, and at most 15 can hold {24579}. This is an exact fit so there must be exactly 15 instances of {24579} within the purple cells. You can't have more because there simply isn't space for it. You can't have less because then the pink cells would have to contain more than 10 of them, then the blue cells would have to contain under 10 and that ship has already sailed.

Now that we know there are exactly 15 instances of the digits {24579} in the purple cells, via simple subtraction we know there are exactly 10 in the pink cells, and exactly 10 in the blue cells. It's time to add candidates to the grid.

Since there are 15 purple cells containing {24579}, and there are 15 purple cells that can contain {24579}, we know via the Pigeonhole Principle that these cells cannot contain anything else. Let's add those as candidates.

We've proven there are 10 instances of the digits {24579} within the blue cells, but there are already 10 placed there, so we're done. No more blue cells can contain {24579}.

The counting argument is complete so we can fill in the remaining candidates and begin the solve. At this stage the SE rating has been lowered to 8.9 and the puzzle is within reach of ALS-AIC.

Hidden Pair

Naked Triple

AIC: (6)r1c7 = (6-1)r1c2 = (1-8)r5c2 = r6c2 - r6c7 = (8)r4c7 => r4c7<>6

AHS-AIC: (1)r4c9 = r4c1 - r3c1 = (1-3)r3c6 = r3c9 - (16)(r3c9 = r45c9) => r4c9<>249

Kraken Finned X-Wing: (3)r5c1 = c14/r49b5 - (3=6)r4c5 - (6=1)r4c9 - (1=269)r5c389 => r5c1<>269

ALS-AIC: (8)r5c4 = (8-3)r9c4 = r45c4 - (3=6)r4c5 - (6=1)r4c9 - (1=269)r5c389 => r5c4<>29

Kraken X-Wing: (6)r8c56 = r8c2 - r16/c27 = r6c56 - (6=3)r4c5 - (3=8)r5c4 - (18)(r9c4 = r9c56) => r9c56<>6

Naked Triple

Naked Quad

Finned Swordfish: (6)r359/c139b4 => r4c13<>6

AHS-ALS-AIC: (6)r1c7 = (6-5)r6c7 = r7c7 - (56)(r7c1 = r39c1) - (1)r3c1 = r45c1 - (1=86)r56c2 => r1c2<>6

Naked Pair

Naked Triple

Naked Triple

AIC: (2)r6c1 = r45c3 - r1c3 = r1c6 - (2=8)r2c6 - r2c5 = (8)r6c5 => r6c5<>2

STTE

Section 4. Closing thoughts

So there you have it. The so-called hardest puzzle ever, solved with a little logical trickery followed by some chaining. That's not to say it was easy - it's still a difficult puzzle, the cleanup after the counting argument took me half an hour, and I do respect Inkala's AIroot process. Perhaps if Inkala had been more open with it and released the source code to his program we would all be sharing AI ratings instead of SE ratings... then again, probably not...

I hope this article clears up some of the confusion around the "hardest puzzle" debate, especially when it comes to Inkala's puzzles. The truth is AI Escargot was the hardest puzzle for a matter of days and after that none of his other puzzles were even close to the hardest. He's also not a total fraud as some in the Sudoku community like to call him, his ideas originally had merit but in the end he chose personal profit over the collaborative advancement of Sudoku logic and chose to conceal the truth about how hard his later puzzles really are. Rags like the Daily Mail print whatever the British and American governments tell them to without a second thought so it's not surprising they wouldn't fact check a story about something as trivial as a Sudoku puzzle.

AI Escargot and AI Efamol do not succumb to this counting logic, in fact the solutions are quite complicated and utilise Almost-SK-Loops. AI Efamol is easier than AI Escargot as its SK-Loop has 1 Kraken cover set whereas Escargot's has 3.

It's worth noting certain puzzles that are highly-rated by SudokuExplainer can be easily solved with modern techniques, such as the 11.9 Loki, serving as a reminder not to treat a single-number difficulty rating as gospel. Sudoku is too complex for that to ever be enough and people are constantly improving solving techniques. I believe some day we will have logical solutions to every puzzle, even the absurd 11.9s, without having to resort to huge Forcing Chains.

Credit is due to:

- Everyone involved in the linked sources scattered throughout this post.

- Big thanks go out to "borescoper" whose thread on the Chinese Baidu forums helped me understand MSLS as a counting argument, and gave me the intuition on how to actually spot these big patterns.

- Thank you strmckr for fact-checking and reviewing this article.

- Thanks also to PhoenyxWinter.

Search This Blog

billablog