Skip to content

Hacker School, Weeks 4, 5 and 6

2013 July 14
by Richard Harrington

tl;dr It’s been a bit of a whirlwind these past few weeks, but I’m back on track with my Robotwar project.

Before the end of the 4th week of Hacker School, I went to Ottawa, Ontario for five days to perform in a comedy I wrote with my friend Chris Kauffman, at a theater festival there (see harringtonkauffman.com). It was a lot of fun and I was able to reconnect with some old friends from the Canadian fringe theater circuit, a part of my life that seems very far away this summer. But I had to spend more time preparing for it than I expected, since I hadn’t done the show in two years.

I returned on Monday, July 1, and the 4th of July was the following Thursday, so my 5th week of Hacker School ended up being only two days long.

Because of all this, I got a little off my stride in terms of the big project I was theoretically going to work on over the course of the Hacker School batch: the reverse engineering of the Robotwar game. I started to feel a bit adrift again, as I had in the beginning of June. I wasn’t sure I was appropriately equipped to thrive in the unstructured environment of Hacker School, and I was nervous that I wouldn’t accomplish a tenth of what I’d hoped to accomplish. By the end of the 5th week, I had only barely gotten started on Robotwar.

But late in the 6th week (which just ended), I got my second wind with the help of a long pairing session with one of the facilitators, Zach Allaun, who helped me tease apart the various steps in the compilation process for the Robotwar language, making each one more sensible.

The end result of the compilation step (which itself is only the first of many things I have to do to make a functioning game) will be to turn this:

SCAN
     AIM+5 TO AIM
     AIM TO RADAR
LOOP
     IF RADAR < 0 GOSUB FIRE
     GOTO SCAN
FIRE
     0-RADAR TO SHOT
     ENDSUB

into this:

SCAN
0      ,      AIM
1      +      5
2      TO     AIM
3      ,      AIM
4      TO     RADAR
LOOP
5      IF     RADAR
6      <      0
7      GOSUB  FIRE
8      GOTO   SCAN
FIRE
9      ,      0
10     -      RADAR
11     TO     SHOT
12     ENDSUB

And that begins with lexing. For those of you (like myself before this summer) who have never heard of the term "lex," it means "lexical analysis." It's the first step in the process of translating human-written source code into machine-readable code, and it involves separating the strings of source code into words and other meaningful chunks. It's also the most notoriously difficult and tedious step, because a lot of syntactic forms look obvious to us but are baffling to a computer.

This Robotwar language is actually pretty simple in that respect -- the only tricky part is determining whether "-" characters are binary subtraction operator or unary minus sign operators.

So here's the version of the lexer I had before I worked with Zach. It separates strings into tokens and also adds an index property to each token, which will be useful information later when we add error reporting to the program. Note the nasty minus sign logic, which checks to see whether the previous token is a numeric value (either a number literal or a register name).

Also, note the use of the core.match module to do pattern matching. Since it's not binding any variables in the matches, one could argue core.match is overkill here, but I feel it does prevent me from having to use a series of suspiciously similar "if" expressions -- that is, if we're in the middle of parsing a token do one thing, otherwise do something else -- for each of the cases.

(use '[clojure.core.match :only (match)])
 
(def registers (set (concat (map #(-> % char str) (range (int \A) (inc (int \Z))))
                            ["AIM" "SHOT" "RADAR" "DAMAGE" "SPEEDX" "SPEEDY" "RANDOM" "INDEX"])))
 
(def operators #{\= \< \> \# \+ \- \* \/})
 
(defn digit?
  [ch]
  (re-find #"\d" (str ch)))
 
(defn conj-with-metadata 
  [coll s n] 
    (conj coll {:token s, :pos n}))
 
(defn lex-line
  [initial-line]
  (loop
    [line initial-line
     partial-token []
     saved-pos 0
     result []]
    (let [close-partial-token (fn [] (conj-with-metadata result (apply str partial-token) saved-pos))
          current-pos (- (count initial-line) (count line))
          previous-token (:token (last result) "")
          parsing-token? (not (empty? partial-token))
          ; a binary operator is any character on the operator list,
          ; unless it's a minus sign that is actually a unary operator
          ; because the token before it could not be a number
          binary-op? #(and (operators %)
                           (or (not= % \-)
                               (digit? (last previous-token))
                               (registers previous-token)))
          head (first line)
          tail (rest line)]
      (match [head parsing-token?]
        [(:or \; nil) true ]          (close-partial-token)
        [(:or \; nil) false]          result
        [(:or \space \t) true ]       (recur tail [] nil (close-partial-token))
        [(:or \space \t) false]       (recur tail [] nil result)
        ; if it's an operator of any kind and we're currently parsing a token, 
        ; close the partial token and recur on the same character.
        [(_ :guard operators) true]   (recur line [] nil (close-partial-token))
        ; we don't have to check here if the binary operator is false because
        ; if it weren't, it would have been caught by the less restrictive line above.
        [(_ :guard binary-op?) _]     (recur tail 
                                             [] 
                                             nil 
                                             (conj-with-metadata result (str head) current-pos))
        [_ true ]                     (recur tail (conj partial-token head) saved-pos result)
        [_ false]                     (recur tail (conj partial-token head) current-pos result)))))
 
(lex-line "IF RADAR < -5 GOTO SHOOT   ; some comments")
; -> [{:token "IF", :pos 0} 
;     {:token "RADAR", :pos 3} 
;     {:token "<", :pos 9} 
;     {:token "-5", :pos 11} 
;     {:token "GOTO", :pos 14} 
;     {:token "SHOOT", :pos 19}]

I was thinking of having one parsing step after the lexing step in the above code, and then a compiling step down to the Robotwar virtual machine code. But after this week I realized that I should have as many steps as I need to make it easy, and I also realized that there's no reason for this first step to have any logic in it at all. I ended up separating a couple of things out into their own functions: stripping out the comments (which happens before the lexing step now) and dealing with the ambiguity of the "-" character (which happens after the lexing step). The minus sign issue turns out to be pretty easy to do in the next step, the parsing step, once we've isolated all the operators and words in the lexing step.

Last but not least, I jettisoned core.match for the time being, since I realized that by getting rid of that minus sign nonsense, I was down to a level of simplicity where I could use regular expressions (it's possible I could have used them before, but I suspect not, and I wasn't quite enough of a regexp master to attempt that one). Here's the new code for the lexer, which is drastically shorter while only sacrificing a tiny bit of functionality that is better off elsewhere in the program anyway:

(def operators "-+*/=#<>")
 
(defn re-seq-with-pos
  "like re-seq, but returns a sequence of 2-element sequences
  in the form [match pos], where pos is the index of match in s"
  [re s]
  (let [matches (re-seq re s)
        indexes (reduce (fn [idxs match]
                          (conj idxs (.indexOf s match (inc (or (last idxs) -1)))))
                        []
                        matches)]
    (map list matches indexes)))
 
(defn build-lex-metadata 
  [s n] 
  {:token-str s, :pos n})
 
(def lex-re (re-pattern (str "[" operators "]|[^" operators "\\s]+")))
 
(defn lex-line
  [line]
  (map #(apply build-lex-metadata %) 
       (re-seq-with-pos lex-re line)))
 
(lex-line "IF RADAR < -5 GOTO SHOOT")
; -> ({:token-str "IF", :pos 0} 
;     {:token-str "RADAR", :pos 3} 
;     {:token-str "<", :pos 9} 
;     {:token-str "-", :pos 11} 
;     {:token-str "5", :pos 12} 
;     {:token-str "GOTO", :pos 14} 
;     {:token-str "SHOOT", :pos 19})

All that remains now is parsing, compiling to the virtual machine object code, writing an interpreter for it, and making a graphics engine to show robots fighting!

Now that I feel a little more grounded and solidly positioned to make some headway this coming week, I look over the last few weeks and I can see that I actually learned a great deal helping others on their projects, learning Clojure (particularly by doing problems on 4clojure.com), implementing a "trie" data structure in JavaScript, having a thousand fascinating conversations about programming, and slowly coalescing ideas for another semi-major project I might work on if I have the time: abstracting out the generic parts of my tic-tac-toe-playing AI, so that you could pass in a game description and have it play any game -- tic-tac-toe, connect four, checkers, chess, whatever. (More on that in a later blog post.)

Last but not least, the above code can be found on github in the following locations:

the main repository
the version that was current when I wrote this blog post
the version from the first code block above

Leave a Reply

Note: You may use basic HTML in your comments. Your email address will not be published.

Subscribe to this comment feed via RSS