Monday, February 24, 2014

Using regex to parse siteswap strings

A siteswap is a string that defines a juggling pattern. Explanation of siteswap is out of scope for this post, I'm simply providing additional documentation on how the gunswap Siteswap object parses its input. I think the Juggling Lab documentation has a great page explaining siteswap notation: http://jugglinglab.sourceforge.net/html/ssnotation.html. However, I am not going to consider any of the special extensions they use (at least for now).

Breaking a siteswap down by beats

Every siteswap can be broken down as instructions for a series of beats. Some siteswaps are very simple to break down: 531 is a 5 toss followed by a 3 followed by a 1. However, consider the siteswap (4,4)(4x,4x) which is a relatively long string, but only contains instructions for 2 beats. Each hand throws a 4, then each hand throws a crossing 4. Things can get even more complicated if you consider multiple jugglers, like the siteswap <[33]|(4x,4x)> which only contains instructions for 1 beat: juggler 0 is doing a multiplexed 3 ball cascade and juggler 2 is doing a synchronous 4-ball crossing fountain.

We need to develop a regex that can split any siteswap string into its component beats. So let's start simple.

Vanilla siteswap

TOSS = \d|[a-o]

First consider some simple vanilla siteswaps like 3, 55500, and b97531 (ok maybe this one isn't simple...). All of these siteswaps match the regular expression meaning they can be a digit (\d) or (|) any character "a" through "o" ([a-o]). Note that "a" through "o" really just represent the numbers 10 through 24. We stop at "o" because later on we'll see that "p" is going to be used for something other than representing a toss height.

Multiplexes

MULTIPLEX = \[(\d|[a-g])+\]

Multiplex notation allows us to throw multiple props with the same hand on the same beat. This is handled using brackets. So the siteswap 33[33] is a 4 ball version of the 3 ball cascade where the third throw of the pattern is done with 2 balls following the same path. Let's break down this regex:
  • \[ 
    • starts with a bracket (the bracket needs to be escaped)
  • (\d|[a-g])+ 
    • contains at least 1 of our valid toss expressions
  • \] 
    • ends with a bracket
Synchronous

SYNC = \((TOSS|MULTIPLEX),(TOSS|MULTIPLEX)\)

Now things start to get a little complicated. Consider a synchronous siteswap (4x,4x). This is a synchronous 4-ball fountain where every throw crosses to the other hand. In synchronous siteswap the left side of the comma is the left hand's instructions and the right side is the right hand's instructions. The instructions can be a vanilla toss or a multiplex. Thus, we check for an open and close parentheses with a comma separated vanilla toss or multiplex pattern in between.

You may have noticed that we aren't handling the "x" in (4x,4x). We need to update our TOSS regex to handle that. It is now going to be (\d|[a-g])x?. Note that the question mark means we're looking for 1 or 0 "x" characters following our valid toss pattern. Also note that the "x" character isn't necessarily limited to synchronous siteswap, though its use in vanilla siteswaps is somewhat limited (mostly for transition throws into synchronous).

A valid beat

BEAT = (TOSS|MULTIPLEX|SYNC)

We now have enough regexes to identify valid beats within a pattern, and that's a relatively simple pattern itself. It's simply checking for a vanilla toss, a multiplex, or a sync toss.

This allows us to break even complicated siteswaps into their component beats. Consider the siteswap 5(4,4)[33]. This would be split into: toss a 5, then toss a (4,4), then toss a [33]. While technically not a valid siteswap (we'll see that level of validation later on) it still has a valid siteswap format.

Passing

PASS = <BEAT(\|BEAT)+>

Consider the passing siteswap <3p|3p><3|3> which has two jugglers both running a 3 ball cascade and passing on every other beat. In passing patterns each beat is enclosed between "<" and ">" characters, with each juggler's instructions separated by a "|" character. Each juggler's instructions must be a valid beat, thus we can construct a pass regex solely from the BEAT regex.

Now we need to make another extension to our original vanilla siteswap regex, but only for passing patterns. If we have a 2 juggler passing pattern then we need to look for a "p" appended to any vanilla toss:

(\d|[a-g])x?p?

If there are three or more jugglers then we need a way of determining which juggler the pass is targeted at. This will be denoted by the number immediately following the "p":

(\d|[a-g])x?(p[1-numJugglers])?

Putting it all together

^(PASS)+|(BEAT)+$

The final regex checks to see if you have an entire string composed of any number of passing beats or non-passing beats. That's it! Recall that this does not guarantee the validity of a siteswap, it just ensures the string is a valid format. Later on we will use the concept of state diagrams to determine if a siteswap is valid.

No comments:

Post a Comment