Thursday, January 19, 2012

Project Euler in R: Problem 24

I had previously posted solutions in R to Project Euler problem 23 and problem 22.  This is the next problem from Project Euler. The statement of problem 24 is as follows.
A permutation is an ordered arrangement of objects. For example, 3124 is one possible permutation of the digits 1, 2, 3 and 4. If all of the permutations are listed numerically or alphabetically, we call it lexicographic order. The lexicographic permutations of 0, 1 and 2 are: 012   021   102   120   201   210
What is the millionth lexicographic permutation of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9?
This is a very interesting problem and it can be solved analytically. The key insight is that n distinct digits can be arranged in n! permutations.

Here's the logic you could use. If the permutations of the numbers 0 through 9 are arranged in lexicographic order,
  • The first 9! permutations, #1 to #362,880 are numbers starting with 0, in order from 0123456789 to 0987654321
  • The next 9! permutations, #362,881 to #725,760, are numbers from 1023456789 to 1987654320
  • The next 9! permutations, #725,761 to #1,088,640  are numbers from 2013456789 to 2987654310.  The millionth permutation is contained in this set.  Thus we know that the millionth number must start with a 2.  In fact, it is the 274,240th = (1000000 - 725760) number contained in this set.
  • Our problem has now reduced to finding the 274,240th lexicographic permutation of the numbers 0, 1, 3, 4, 5, 6, 7, 8 and 9 - the nine digits excluding 2.  We can repeat the above process to get each successive digit.
Essentially we're writing one million as the following sum
1000000 = (a1 x 9!) + (a2 x 8!) + ... + (a10 x 1)
As shown above, the index a1 is 2.  That is, only two times 9! numbers fit perfectly within 1,000,000.  So the first digit of the millionth number is the third digit in 0123456789.  The index a2 is 6, so the second digit of the millionth number is the seventh digit of the remaining numbers 013456789, which is 7.  The remaining digits can be found similarly. While finding the digits you have to be careful to ignore the digits that have been previously used. This is a permutation, so each digit appears exactly once.

The process is tedious, so here is some R code that will make it easier.  This generalizes the solution to find the nth permutation of the numbers 0 to 9.
findperm <- function (n) {
  # Find the indices a1 - a10
  indices = c()
  for (i in 9:0) {
    index = 0
    while (factorial (i) < n) {
      n = n - factorial (i)
      index = index + 1
    indices = c(indices, index)
  # Use the indices to find the nth permutation
  input = 0:9                         # The list of numbers to permute
  output = c()
  for (j in 1:10) {
    output[j] = input[indices[j] + 1] # Move digit to output
    input = input[-(indices[j] + 1)]  # Remove the assigned digits from input
  return (output)
That was straight-forward and a fun illustration of R's capabilities.

But you can do better. The strength of R lies in the wealth of user defined libraries and functions that can ease the implementation of tricky computation.  For a simpler solution, try the lexicographicPermutation function in the library R.basic. Here is the entire solution.

The first solution illustrates that even tricky computation can be coded easily in R and the code is pretty readable.  The second solution suggests that before writing your own code, it is a good idea to look for existing libraries that can help. Statistical computations can get complex and tricky. You can write your own k-means clustering algorithm but it is difficult getting all the edge cases correct. Rather than writing your own routine, look around and see if it has been written already. An expert does a better job writing such routines. And you can leverage their code and their expertise.