Skip to content

The “trie” data structure

2013 July 15
by Richard Harrington

Every Monday night at Hacker School we have a lecture of some kind, usually by one of the residents. Last Monday night it was a super-interesting one by our resident for the last two weeks, Peter Norvig, about natural language processing. Turns out with a few simple lines of Python and a large enough corpus of words (like a book), you can do an amazing amount of things, such as correct bad spelling mistakes and determinewhichwordsshouldbeseparatedwhenthey’rewrittenwithoutspaces.

On a mostly unrelated note, after I got home from the lecture I took a look at some code that two of my fellow Hacker Schoolers Doron Roberts-Kedes and David Lichtenberg had put together that day: an implementation of the “trie” data structure in JavaScript. It’s a variation on a basic key-value store, where each key is represented by a path in a tree. Each node on the tree has pointers to zero or more other nodes, and each traversal of the tree consists of an ordered series of these pointers. Usually the pointers are letters, and thus each traversal forms a string.

“Trie” is short for “retrieval,” and the person who first came up with it pronounced it like “tree,” but people since then have pronounced it like “try” in order to distinguish it from other kinds of trees. For more fun facts and a nice diagram, see the Wikipedia article.

Each node can optionally hold a value as well, but I’m told by fellow Hacker Schooler Sunil Abraham that one of the most obvious uses of this kind of data structure is for auto-completion. If someone has already typed in “at”, and the only other words in your corpus that they might be searching for that begin with those two letters are “ate” and “atelier”, you can start your search very efficiently for words to put into a drop-down menu, starting at the “at” node in the trie.

I got kind of obsessive about it and stayed up far too late, but it was fun being able to do a code review with Doron and David the next morning, learning about data structures from them and teaching them about JavaScript idioms. That’s the best of Hacker School.

The feature set of my final version of the code is kind of a hybrid of what I thought a trie was supposed to be for (an efficient key-value store), what Doron and Michael originally focused on (a way to count the frequencies of words), and what Sunil told me it’s mainly used for (auto-completion). Here it is:

(function(exports) {
 
    var splitWord = function(word, emptyCb, nonEmptyCb) {
        if (word.length === 0) {
            return emptyCb();
        }
        else {
            return nonEmptyCb(word[0], word.substring(1));
        }
    }
 
    function Trie(){
        // will be set with properties "val" and/or "children"
    }
 
    Trie.prototype = {
 
        walk: function(cb) {
            function recur(node, prefix) {
                cb(prefix, node.val);
                for (var letterKey in node.children) {
                    recur(node.children[letterKey], prefix + letterKey);
                }
            }
            recur(this, "");
        },
 
        getNode: function(word) {
            var self = this;
            return splitWord(word, function() {
                return self;
            }, function(firstLetter, restOfWord) {
                var children = self.children;
                return children && children[firstLetter] && children[firstLetter].getNode(restOfWord);
            });
        },
 
        setNode: function(word, cb) {
            var self = this;
            splitWord(word, function() {
                if (cb) {
                    cb(self);
                }
            }, function(firstLetter, restOfWord) {
                self.children = self.children || {};
                self.children[firstLetter] = self.children[firstLetter] || new Trie();
                self.children[firstLetter].setNode(restOfWord, cb);
            });
        },
 
        setNodes: function(words, cb) {
            var self = this;
            words.forEach(function(word) {
                self.setNode(word, cb);
            });
        },
 
        getValue: function(word) {
            var node = this.getNode(word);
            return node && node.val;
        },
 
        setValue: function(word, val) {
            this.setNode(word, function(node) {
                node.val = val;
            });
        },
 
        setValues: function(obj) {
            for (var key in obj) {
                this.setValue(key, obj[key]);
            }
        },
 
        getKeyValuePairsFromPrefix: function(prefix) {
            var subTrie = this.getNode(prefix);
            var results = {};
            subTrie.walk(function(word, val) {
                if (val !== undefined) {
                    results[prefix + word] = val;
                }
            });
            return results;
        },
 
        getSuffixesFromPrefix: function(prefix) {
            var subTrie = this.getNode(prefix);
            var results = [];
            subTrie.walk(function(suffix, val) {
                if (val !== undefined) {
                    results.push(suffix);
                }
            });
            return results;
        },
 
        getKeysFromPrefix: function(prefix) {
            return this.getSuffixesFromPrefix(prefix).map(function(suffix) {
                return prefix + suffix;
            });
        },
 
        inspect: function(){
            return this.getKeyValuePairsFromPrefix("");
        },
 
        inspectKeys: function() {
            return this.getKeysFromPrefix("");
        }
    };
 
    // Now for WordCountStore
 
    var addOrIncrementVal = function(node) {
        node.val = (node.val? node.val : 0) + 1;
    };
 
    var WordCountStore = function(){
        this.trie = new Trie();
    }
 
    WordCountStore.prototype = {
 
        insertWord: function(word) {
            this.trie.setNode(word, addOrIncrementVal);
        },
 
        insertWords: function(words) {
            this.trie.setNodes(words, addOrIncrementVal);
        },
 
        getWords: function() {
            return this.trie.inspectKeys();
        },
 
        getWordCounts: function() {
            return this.trie.inspect();
        }
    };
 
    // And for an auto-completer, the basic functionality would just be to get a list using
    // the "getKeysFromPrefix" method from Trie. Other stuff could be added,
    // but it's not currently implemented as a separate class.
 
    exports.Trie = Trie;
    exports.WordCountStore = WordCountStore;
 
}(typeof exports === 'undefined' ? this : exports));

(This code can also be found on github.)

Leave a Reply

Note: You may use basic HTML in your comments. Your email address will not be published.

Subscribe to this comment feed via RSS