A Little Hacking Run: Permutations

Yesterday, I gave an overview of a little for-fun project I threw together the day before. Today, we'll look start looking at the code.

The first problem I chose to tackle was generating all of the possible "words" from a given string of letters. For those who (like me) didn't study a lot of math -- each unique arrangement is called a permutation, and for those who (like me) haven't memorized formulas and algorithims associated with permutations, I'll provide some background.

Permutations and Combinations involve the rearranging of items in a list. With permutations, order matters (i.e. 'ab' and 'ba' are different permutations) while for combinations order is irrelevant (i.e. 'ab' and 'ba' are considerd the same thing). Since for example, 'den' and 'end' are different words (and both are likely to be found in the anagram game) my solution needed to calculate the permutations of the string. So, I brushed up on permutations.

The first thing I wanted to know was "How many permuations are there for each jumble?" It turns out that this is simply the factorial (assuming no duplicate letters) of the number of letters in the string. Since the jumbles in the anagram game are six letters long, that meant there were 720 possible permuations of that string. That's a pretty small number, so I decided that a simple, brute-force algorithim for calculating the permuations would be sufficient for my needs. However, there were two problems with this approach that I had to account for:

  1. I needed to calculate the 3, 4 and 5 letter permutations as well as the 6 letter permutations (i.e. there are really 870 permutations)
  2. I didn't know any algorthims (even simple brute-force ones) for calcuating permutations.

I decided to solve the second issue first, since I figured I could solve for the smaller ones once I figured out to generate any permutations.

Besides, I figured the answer was just a Google search away.

The first problem I ran into was one of volume -- most of my early searches inundated me with CS syllabi and freshman requests for someone else to tell them how to solve this problem (I'm willing to bet there was a strong correlation between the two types of links). So, I took a break from finding the answer to see if I could work it out for myself.

Swinging over to Dr. Math's site, I refreshed my memory on permutations, and got to work. I knew that each letter had to spend time in each position while every other letter took its turn in every other spot. Unfortunately, my muse took a coffee break, and things kept getting ugly really fast. My approaches all had problems like this one (I actually finished later out of stubborness):

class String
  def rotate!(index = 0)
    return self if self.length == index
    self[0, self.length] = self[0, index] + self[index + 1, self.length - 1] << self[index,1]    

def per(permutations, original, s)
  s.length.times do 
    permutations << original.rotate!(original.length - s.length)
    per(permutations, original, s[1,s.length])

$permutations = Array.new
per($permutations, "abc", "abc")
puts $permutations.uniq

$permutations = Array.new
per($permutations, "abcd", "abcd")
puts $permutations.uniq

Here I take advantage of Ruby's ability to add behavior to a built-in class at run-time (one of my favorite features of Ruby to hack around with) to add a rotate in place method that takes the index of where to start rotating, picks the first value off, appends it to the part that is shifting to the left, and adds that to the part of the string not being rotated.

Then a recursive method that takes a collection to store our permutations in, the orginal string, and the part of the string we are interested in at the moment.

This works, but its a) ugly -- especially the recursive method, and b) produces duplicates (hence the use of the uniq method on Array), so I went back to Google.

This time I found a page describing an algorithim that finds no duplicates (and was not recursive) written in SNOBOL. I know next to nothing about SNOBOL, but I gave a go at trying to translate the example into Ruby. After an hour or two I gave up. I'm not sure whether there was a bug was in my translation or in the example, but I couldn't get all of the permutations using that algorithim. So it was back to Google, where this time I stumbled on Florian Frank's permutation library. The Ruby Application Archive, who'd have thought to look there?

I actually, wasn't too upset, because I learned a few things trying to implement my own permutation generator (despite the result), and only wasted a few hours (as if spending more time hacking around in a language of your choice is really a bad thing).

Florian's library is simplicity itself to use (It comes with good examples in its RDoc, too). Essentially, you provide it with size of your data set(e.g. 6 in this program) and it generates a collection of objects describing its permutations. You can then project their description of the resulting permutation onto your data set.

So after all of the aforementioned thrashing, a quick gem call ('gem install -r permutation'), and a couple lines of code and I had all 6-letter permutations of my jumbled string! Here's all of the code needed to do this: require 'rubygems' require_gem 'permutation'

perm = Permutation.new(6) $words = perm.map { |p| p.project($jumble) }

First create a permutation of the correct size (6 is the length of our jumbled string). Then call map on the resulting collection, so that each permutation object can 'project' itself ('project' is a method on the permutation object) into the string, resulting in an array of 'words' -- one for each possible permutation.

Actually, the library provides a shortcut for this that's probably better for my purposes (if not shorter) since it works with the string directly (but I didn't notice it until later): perm = Permutation.for($jumble) $words = perm.map { |p| p.project }

And we get the same result.

One other thing I didn't notice until later, was that Florian provides a reference (its in the rdoc manual) to where he got the algorithim: Steven Skiena's The Algorithm Design Manual -- this would've been nice to have before I started to reinvent the wheel (and I may pick up a copy anyway). :-)

So step one complete. What's next?

At this point the program could only generate 6-letter permutations. I also needed the 3, 4 and 5 letter permutations. But I'd turned the corner, and it was time to start looking at how to filter for the valid English words. To that I figured I'd need a dictionary (or a lot more time and patience).

Next time, I'll talk about how I added the 3,4 and 5 letter permutations to the list, and how I filtered it for valid English words.