James Michael Hare

...hare-brained ideas from the realm of software development...
posts - 164 , comments - 1434 , trackbacks - 0

My Links

News

Welcome to my blog! I'm a Sr. Software Development Engineer in the Seattle area, who has been performing C++/C#/Java development for over 20 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

Blogs I Read

Follow BlkRabbitCoder on Twitter

Tag Cloud

Archives

Post Categories

.NET

CSharp

Little Wonders

Little Wonders

vNext

Little Puzzlers–List all anagrams for a word

I like to keep my brain sharp by working on programming puzzlers. On off weeks I'm going to start posting programming puzzlers I've collected over the years. Hopefully you'll find them as entertaining as I do.

The Problem

Given a file of all valid words in the English language, write an algorithm that would be suitable for a web page that will list all valid English words that are complete anagrams of the word entered by the user on the page.

That is, if the user visits the page and types in POST, the web page should quickly process the word and display that the valid anagrams are STOP and POTS.

Note that we are only looking for single-word anagrams, not anagram phrases. 

Hints

Remember the use case, the algorithm proposed should be suitable for a web application and thus should be performant and not hog the CPU. 

The longest word in the English language (at least that I’ve found so far) is pneumonoultramicroscopicsilicovolcanoconiosis, which is 44 characters long. 

You need only consider words in the dictionary, any word not in the dictionary can be considered “not a valid word” for this exercise.

Spoiler Alert!

Fair Warning: there may be discussion of the problem and potential solutions posted in the comments below. Read at your own risk.

Print | posted on Tuesday, July 28, 2015 1:43 AM | Filed Under [ My Blog C# Software .NET Little Puzzlers Technology ]

Feedback

Gravatar

# re: Little Puzzlers–List all anagrams for a word

Pneumonoultramicroscopicsilicovolcanokoniosis is 45 characters long!
7/28/2015 11:30 AM | AF
Gravatar

# re: Little Puzzlers–List all anagrams for a word

AF:

Bah! You're right, miscounted.
7/28/2015 12:52 PM | James Michael Hare
Gravatar

# re: Little Puzzlers–List all anagrams for a word

Counted? That's what JS consoles are for :D http://puu.sh/jgGeb/2be714606d.png

Also, for people interested, I found this:
https://raw.githubusercontent.com/eneko/data-repository/master/data/words.txt
7/28/2015 6:09 PM | nelson
Gravatar

# re: Little Puzzlers–List all anagrams for a word

http://pastebin.com/HLdr8U0C

and got a list of over 300k words here https://github.com/docdis/english-words

7/28/2015 6:50 PM | Keith Nicholas
Gravatar

# re: Little Puzzlers–List all anagrams for a word

Here's what I came up with. Turns out it's basically the same as Keith's, but it seems that mine uses a bit less memory (~ 10MB on my machine, same word list). Likely that's due to mine requiring fewer hash buckets in my dictionary, since the way I "hash" the words will lead to more collisions. Of course, the trade off is that mine would be less performant. Such is the way of things, however :p

https://gist.github.com/nelsonlaquet/6e1c1112abd9b4870cc0
7/28/2015 8:59 PM | nelson
Gravatar

# re: Little Puzzlers–List all anagrams for a word

If speed is of the essence, then it's all about having the data prepared properly.

I'd use that word file just to create a dictionary for use in this project. it would map a list of anagrams for each set of letter, to those letter in alphabetic order, e.g.

[OPST] = "POST, POTS, STOP"

Making the actual algorithm just cleaning, uppercasing, and sorting the input word, and then looking it up in the dictionary.

The question of how this dictionary is stored is still open based on other criteria.... Can the entire dictionary sit in memory? If the word list is only 10MB, then even if formatted into the dictionary it jumps to 20MB, that's probably still small enough to stay in memory.

If not, then we'd need a simple way to subdivide the list. Perhaps separate files based on word length, or the first of the sorted letters.
7/29/2015 1:27 PM | James Curran
Gravatar

# re: Little Puzzlers–List all anagrams for a word

Mine is basically the same as Keith's, but using MultiValueDictionary (from https://www.nuget.org/packages/Microsoft.Experimental.Collections)

https://gist.github.com/thomaslevesque/d16c2141fccfb06c78f6
7/30/2015 7:19 AM | Thomas Levesque
Gravatar

# re: Little Puzzlers–List all anagrams for a word

Funny, I implemented this algorithm, in C, as part of my WScrabble IRC bot : http://wiz0u.free.fr/wscrabble/index.php
(sources available, not my most elegant code, but works fast !)
8/3/2015 8:57 AM | Wizou
Gravatar

# re: Little Puzzlers–List all anagrams for a word

Here's my implementation: http://pastebin.com/fr3ADcbB

Uses the same mechanism. Takes all words in the list, sorts the letters and builds a lookup table from them.
8/5/2015 5:19 AM | Jimmy Jam
Gravatar

# re: Little Puzzlers–List all anagrams for a word

Jimmy Jam: You could initialize the dictionary with a multiplier for it's capacity to trade memory for speed. I went down from 905ms to 777ms with a multiplier of 4, resulting in a heap of 43MB compared to the original 34MB.
8/18/2015 7:57 AM | Mike van Leeuwen
Gravatar

# re: Little Puzzlers–List all anagrams for a word

Not the full solution but this is how I created the lookup list:

var wordsLookup = File.ReadAllLines(@"words2.txt")
.ToLookup(w => string.Concat(w.ToLower().OrderBy(c => c)), w => w);

then to find a word convert it to a key:
var key = string.Concat(wordToFind.ToLower().OrderBy(c => c));
9/3/2015 7:39 AM | Darren
Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 

Powered by: