MathJax

Tuesday, July 30, 2019

Project Euler - Problem 15 - Solution Explained

Lattice paths

Starting in the top left corner of a 2×2 grid, and only being able to move to the right and down, there are exactly 6 routes to the bottom right corner.

How many such routes are there through a 20×20 grid?

The Permutation Solution (Brute Force)

This problem states that the only two directions we can travel, starting from the origin, are right and down. If we were to transcribe the traversal in plain english, the six paths that we can travel in a 2 x 2 grid are as follows:

  1. Right, Right, Down, Down
  2. Right, Down, Right, Down
  3. Right, Down, Down, Right
  4. Down, Right, Right, Down
  5. Down, Right, Down, Right
  6. Down, Down, Right, Right

There are a few observations we can make about the transcribed lists above:

  • Every list has $2$ down turns and $2$ right turns.
  • Every list has $2+2$ items total.
  • No two lists are repeated exactly. The primary difference between them is the order in which the turns appear.

We can generalize our observations to state: To determine the number of paths in any $g\times g$ grid, we need only to find the number of distinct permutations of a list that has a total of $2g$ items, contains $g$ right turns, and contains $g$ down turns

The intuition behind the algorithm I used to find all of the permutations can be found here. In fact, the function I use in this solution named nextPermutation is borrowed from the same source.

In the solution below, we are manually constructing an array of "rights" and "downs." Than we are manually permuting the list in lexicographical order while we increment a counter. After we run through all distinct permutations, the counter is then logged to the console.

DISCLAIMER: If you actually try to run the code, it will take a very, very long time to reach the solution. I recommend altering the gridSize variable to 4, it will log 70 to the console (which is the correct answer).

const gridSize = 20;
const route = [];
let count = 0;

/**
 * Permutes array in lexicographical order.
 * @param {number[]}
 * @returns {boolean} If false, no other permutations remain.
 * @author Project Nayuki
 * @see https://www.nayuki.io/page/next-lexicographical-permutation-algorithm
 */
function nextPermutation(array) {
 // Find non-increasing suffix
 var i = array.length - 1;
 while (i > 0 && array[i - 1] >= array[i])
  i--;
 if (i <= 0)
  return false;
 
 // Find successor to pivot
 var j = array.length - 1;
 while (array[j] <= array[i - 1])
  j--;
 var temp = array[i - 1];
 array[i - 1] = array[j];
 array[j] = temp;
 
 // Reverse suffix
 j = array.length - 1;
 while (i < j) {
  temp = array[i];
  array[i] = array[j];
  array[j] = temp;
  i++;
  j--;
 }
 return true;
}

for (let i = 0; i < gridSize; i += 1) {
  route.push(0); // 0 represents "right."
}

for (let i = 0; i < gridSize; i += 1) {
  route.push(1); // 1 represents "down."
}

count += 1;

while(nextPermutation(route)) {
  count += 1;
}

console.log(count);

It should be clear to everyone, by running the code above, that the permutation approach is not the correct answer. There is, however, a way to solve this problem by analyzing it mathematically.

The Combination Math Solution

If we start to think in terms of 'combination' instead of 'permutation', a mathematical solution opens itself to us. The following (binomial coefficient) equation can help us:

$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$

The intuition behind the equation can be gleamed here:

Let's think again, about the transcribed list of routes of a 2 x 2 grid (in the previous section).

If we began with an empty list with 4 empty slots:

<empty>, <empty>, <empty>, <empty>

Let's ask ourselves, "How many ways can we insert 2 right turns into such a list?" The answer happens to be 6. You can insert them in the following ways:

  1. Right, Right, <empty>, <empty>
  2. Right, <empty>, Right, <empty>
  3. Right, <empty>, <empty>, Right
  4. <empty>, Right, Right, <empty>
  5. <empty>, Right, <empty>, Right
  6. <empty>, <empty>, Right, Right

Notice how this looks a lot like the lists in the previous section. We can omit the down turns and still arrive at the correct answer because we can assume the remaining empty slots represent down turns. This means we need to choose 2 slots out of 4 possible options; in other words, $k = 2$ and $n = 4$. Let's apply our understanding to the binomial coefficient equation:

$$\binom{4}{2} = \frac{4!}{2!(4-2)!} = \frac{4\cdot 3\cdot 2\cdot 1}{2\cdot 1\cdot (2\cdot 1)} = \frac{\not 4\cdot 3\cdot 2\cdot 1}{\not 4} = 6$$

We can generalize this example to assume that $k = g$ and $n = 2g$. We should also process the binomial coefficient equation in such a way that it can be coded efficiently. We can use the Multiplicative formula (Note: The $\underline{k}$ exponent in the numerator of the first expression is a falling factorial):

$$\binom {n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n(n-1)(n-2)\cdots (n-(k-1))}{k(k-1)(k-2)\cdots 1}=\prod_{i=1}^k \frac{n+1-i}{i}$$

We can implement the above solution, like so:

const gridSize = 20;

function choose(n, k){
  let product = 1;
  for (let i = 1; i <= k; i += 1) {
    product *= (n + 1 - i) / i;
  }
  return product;
}

console.log(choose(2 * gridSize, gridSize));

Tuesday, July 16, 2019

Project Euler - Problem 14 - Solution Explained

Longest Collatz sequence
The following iterative sequence is defined for the set of positive integers:
nn/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.

The Naive Approach

The first thought I had when I read this problem was "This is easy." Of course, I could easily use the brute force solution:

const limit = 1000000;
let maxChain = 0;
let answer;

function Collatz(num) {
  let count = 0;
  while(num != 1) {
    if (num % 2 === 0) {
      num /= 2;
      count += 1;
    } else {
      num = 3 * num + 1;
      count += 1;
    }
  }
  count += 1;
  return count; 
}

for (let i = 1; i < limit; i++) {
  const chain = Collatz(i);
  if (chain > maxChain) {
    maxChain = chain;
    answer = i;
  }
}

console.log(answer);

The second thought I had was: "It can't possibly be this easy..."

The more I stared at the question, the more opportunities I found for optimization.

The Actual Approach

Optimization #1: Constraining the Bounds

Let's look at line #20 in the naive approach. Do we really need to search through almost 1 million positive integers to find our solution? The Collatz Conjecture states that the Collatz sequence for any positive integer $x$, after some number of steps, will eventually reach $1$. In other words:

$$x\to \cdots \to 1$$

Consider for a moment the Collatz sequence for the positive integer $2x$. We know for a fact that $2x$ must, by definition, be even. After all, it can be evenly divided by 2. From the question, we know that the handling of an even number is $n\to n/2$. Since we are assuming $n = 2x$, the Collatz sequence for $2x$ looks like this:

$$2x\to x\to \cdots \to 1$$

We can infer from this thought experiment that $\text{Collatz}(2x) = \text{Collatz}(x) + 1$, which means that $\text{Collatz}(2x) > \text{Collatz}(x)$ regardless of the value of $x$.

How does this fact help us? Look again at line #1 of the naive approach. If we assume that our answer is under our limit variable, than the lowest value that our solution could possibly be is limit / 2, since any value under that has a corresponding $2x$ above limit / 2.

Here is an example: For this problem, $\text{limit} = 1000000$ and $\text{limit}/2 = 500000$. That means that $\text{Collatz}(300000)$ could not possibly be the answer, because $\text{Collatz}(600000)$ is sure to be greater. It makes no sense for us to bother trying to calculate any number below $\text{limit}/2$.

We can then change line #20 of the Naive Approach to the following:

for (let i = Math.floor(limit / 2); i < limit; i++) {

Optimization #2: Double-stepping during Odd Number Handling

Let's look back at lines #12 & #13 in the naive approach. The problem states that if $n$ is odd, $n\to 3n+1$. It's interesting to note that, within the context of $n$ being odd, $3n+1$ will always yield an even number. To prove this, let's think about a hypothetical number $e$ which represents any even natural number. This means that $e-1$ represents any odd natural number. Let's now assume that $n=e-1$ and run it through our odd number handling equation.

$$3\times n+1$$ $$3\times (e-1)+1$$ $$3e-2$$ $$e+ e+ e - 2$$

We can see that when any odd number is run through the $3n+1$ equation, it reduces to a term for which every component is divisible by 2. Since, in our implementation, only odd numbers are run through $3n+1$ we can assume that it will produce an even number and apply the even number handling equation immediately (thereby saving an iteration). This means we change line #12 and #13 to read:

num = (3 * num + 1) / 2
count += 2;

Optimization #3: Re-using Past Work

If you think about the problem just a little bit, a glaring realization will pop up. You will realize that the naive approach is repeating work it has already done. Let's list the Collatz sequences for 4 and 5:

$$4\to 2\to 1$$ $$5\to 16\to 8\to ( 4\to 2\to 1 )$$

Once the Collatz sequence for 5 gets to 4, all the naive approach does is repeated. This is happening for almost 1 million integers. We can save all this time if we could store the Collatz chain length for all numbers that we previously calculated and reference it when calculating a new number.

Javascript's array object is a perfect data-structure to use to accomplish this. The index of the array represents the number to be checked and the value stored represents the length of Collatz chain for that index. To help conceptualize what that might look like, here is the contents of the array in question for numbers $1\to 10$.


Index Value
0 0
1 1
2 2
3 8
4 3
5 6
6 9
7 17
8 4
9 20
10 7

With each iteration, we can check if we've calculated the length of the Collatz chain of that number before, and if so, we can add that chain to the one we've calculated so far... and we're done! We can move on to the next number.

There is just one caveat. There are some chains that produce numbers in excess of 3 million. An array with that many indexed values may be a bit much for our computer to handle. It would make sense for us to focus on storing the chain lengths of those numbers that are below our limit variable, since those will be referenced most often.

To implement this optimization, we are going to add a variable named _cache in the global scope that we initialize to the value of [0, 1]. The 0th index will never be used, but it's important to include the 1st index by default. The _cache variable can be passed into the function as an arguement by altering line #5 of the Naive Approach like so:

function Collatz(num, cache = _cache) {

The while-loop will still terminate once num variable reaches 1 if we alter the beginning of the while-loop to look like this:

while(true) {
  if (cache[num]) {
    count += cache[num];
    break;
  }
  // ...
}

Optimization #4: Deducing Future Work

We saw from the previous section that every Collatz sequence contains a sub-sequence of another number. We saw that it's possible for us to cache the Collatz sequence length in an array. What I will try to suggest in this section is that we should not only cache the Collatz length of the number we are evaluating, but we can cache the Collatz length of all of the other numbers in the sequence in a way that would save many iterations of our while-loop (on line #7 of the Naive approach).

Let's look at some of the sub-sequences in the Collatz sequence for 13:

$$13\to (40\to 20\to 10\to 5\to 16\to 8\to 4\to 2\to 1)$$ $$13\to 40\to (20\to 10\to 5\to 16\to 8\to 4\to 2\to 1)$$ $$13\to 40\to 20\to (10\to 5\to 16\to 8\to 4\to 2\to 1)$$ $$13\to 40\to 20\to 10\to (5\to 16\to 8\to 4\to 2\to 1)$$

Notice that the above sequence also contains the sequence for 40. Assuming we are iterating from $1\to \text{limit}$, than when we arrive at 13, we will not yet have evaluated 40. When we actually do get to 40, we will have to evaluate 40 from scratch (assuming we are using the naive approach). This is unnecessary, since we have already processed it. In fact, it can be calculated mathematically. We can see from the above example that $\text{Collatz}(40) = \text{Collatz}(13) - 1$. In fact, every number in the sequence can be calculated this way:


Sequence       Calculation       Collatz Length
13 $\text{Collatz}(13)-0$ 10
40 $\text{Collatz}(13)-1$ 9
20 $\text{Collatz}(13)-2$ 8
10 $\text{Collatz}(13)-3$ 7
5 $\text{Collatz}(13)-4$ 6
16 $\text{Collatz}(13)-5$ 5
8 $\text{Collatz}(13)-6$ 4
4 $\text{Collatz}(13)-7$ 3
2 $\text{Collatz}(13)-8$ 2
1 $\text{Collatz}(13)-9$ 1

We can see from the table above, that every number in the sequence can be calculated mathematically. We need only 3 pieces of information:

  • The first number in the sub-sequence. So we know which index to cache the sub-sequence length. We can use the num variable from within the while-loop to access that information.
  • The collatz length of the parent sequence. This is calculated and stored in the count variable after the while-loop (line #17 in the Naive approach).
  • The position of the first sub-sequence number in the parent sequence. We can use count variable from within the while-loop because it is updated as the while-loop progresses.

It's important for us to restrict which numbers are calculated in this way. Firstly, in the Collatz sequence for 13, the numbers 40 and 20 represent future work but numbers below 16 have already been calculated and cached so we need not repeat that work. Secondly (and most importantly), we must restrict numbers that are so large that they become troublesome to store. Remember from the previous section that we are storing these numbers in an array. Some sequences produce numbers that are in excess of 3 million. Having an array that stores that many items will absolutely cause difficulties. In my solution, I've restricted storage to numbers under our limit variable.

To implement this optimization, we need to add a variable within our Collatz function. We will call it stack and intialize it to an empty array. The stack array will hold all the future work that we will calculate mathematically after the while loop is concluded.

We can now extend the if-statement from the previous section to store the information we need. We can then apply the mathematics once the while loop concludes:

while(true) {
  if (cache[num]) {
    // ...
  } else if (num < limit) {
    stack.push({
      num,
      position: count
    })
  }
  // ...
}

for (let entry of stack) {
  let {num, position} = entry;
  cache[num] = count - position;
}

Putting it all together!

const _cache = [0, 1];

const limit = 1000000;
let maxChain = 0;
let answer;

function Collatz(num, cache = _cache) {
  let count = 0;
  const stack = [];

  while(true) {
    if (cache[num]) {
      count += cache[num];
      break;
    } else if (num < limit) {
      stack.push({
        num, 
        position: count,
      });
    }
    if (num % 2 === 0) {
      num /= 2;
      count += 1;
    } else {
      num = (3 * num + 1) / 2;
      count += 2;
    }
  }

  for (let entry of stack) {
    const {num, position} = entry;
    cache[num] = count - position;
  }

  return count;
}

for (let i = Math.floor(limit / 2); i < limit; i++) {
  const chain = Collatz(i);
  if (chain > maxChain) {
    maxChain = chain;
    answer = i;
  }
}

console.log(answer);