Nearly Free Pattern Matching
After introducing a recursive dispatch system to ZynAddSubFX via librtosc one question that kept bugging me was ``what is the optimal means of performing the pattern recognition?''. While getting to one of the approximately six million possible addresses was reasonably quick I couldn’t help but ponder what the best method was without resorting to any manual code generation for each set of patterns. The first idea was to use the values of the first four characters to split the possible choices into a few classes, which did yield some performance gains, but as there are a number of longer prefixes in the source, this technique’s overhead often made it’s gains moot. Next I experimented by splitting the space of choices via a series of boolean comparisons at differing indices and splitting values. This mostly work, and it did produce some benefits in the matching, however the time to compute a set of parameters optimal only in the greedy sense, it took a sizable amount of time on the order of a second, which isn’t really tolerable at the startup of a program in my opinion. The number could be cached, but that would be quite hacky in the end or just overbuilt for what amounts to be a fun optimization.
After talking with some peers on the problem, the term `perfect minimal hashing' came out. Perfect hashing is a fairly intuitive idea where each hash uniquely defines an element when you know all possible elements in advance. Getting a hash that satisfies this property for the very small sets that each dispatch table has is something that could be accomplished with fairly naive algorithms, but getting this hash function to be minimal for a set of possible keys is something that at least to me seems to be considerably less intuitive.
Luckily enough there’s a tool for just this problem and if you’re running a gnu based unix system you might even already have it installed. 'gperf' generates approximately-minimal perfect hashing C code. For a given input it will generate C source which include a few tables for lookup and some (mostly) optional expansions of these tables into inlined expressions. 'gperf' generates hash functions of the format:
hash(input): length(input) + assoc[offset[1] + input[index[1]]] + assoc[offset[2] + input[index[2]]] + assoc[offset[3] + input[index[3]]] ...
By generating an optimal set of offsets, indicies, and an association table it is possible to uniquely assign each element a hash. While the implementation within librtosc is somewhat less optimal than the implementation within gperf, the same basic ideas hold true.
First, if a set of unique patterns are going to be matched, there exists a set of indices that forms a tuple which is unique to each element. In the worst case the tuple is composed of all characters, but it is frequently much smaller. Secondly, it must be possible to translate each tuple to a unique multiset by adding some offset to each tuple element. If the second statement appears non-obvious, consider adding 128 multiplied by position, which without loss encodes the position of each tuple element (for 7 bit ascii). In practice we shall see that this can be done with significantly smaller values. Lastly, if each multiset is unique, then there must be some way to use a table based lookup to sum the multiset elements to define a unique hash.
Below is one example of the output of gperf:
/* C code produced by gperf version 3.0.4 */
/* Command-line: gperf -7 addr-example */
/* Computed positions: -k'2,4' */
#if !((' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \
&& ('%' == 37) && ('&' == 38) && ('\'' == 39) && ('(' == 40) \
&& (')' == 41) && ('*' == 42) && ('+' == 43) && (',' == 44) \
&& ('-' == 45) && ('.' == 46) && ('/' == 47) && ('0' == 48) \
&& ('1' == 49) && ('2' == 50) && ('3' == 51) && ('4' == 52) \
&& ('5' == 53) && ('6' == 54) && ('7' == 55) && ('8' == 56) \
&& ('9' == 57) && (':' == 58) && (';' == 59) && ('<' == 60) \
&& ('=' == 61) && ('>' == 62) && ('?' == 63) && ('A' == 65) \
&& ('B' == 66) && ('C' == 67) && ('D' == 68) && ('E' == 69) \
&& ('F' == 70) && ('G' == 71) && ('H' == 72) && ('I' == 73) \
&& ('J' == 74) && ('K' == 75) && ('L' == 76) && ('M' == 77) \
&& ('N' == 78) && ('O' == 79) && ('P' == 80) && ('Q' == 81) \
&& ('R' == 82) && ('S' == 83) && ('T' == 84) && ('U' == 85) \
&& ('V' == 86) && ('W' == 87) && ('X' == 88) && ('Y' == 89) \
&& ('Z' == 90) && ('[' == 91) && ('\\' == 92) && (']' == 93) \
&& ('^' == 94) && ('_' == 95) && ('a' == 97) && ('b' == 98) \
&& ('c' == 99) && ('d' == 100) && ('e' == 101) && ('f' == 102) \
&& ('g' == 103) && ('h' == 104) && ('i' == 105) && ('j' == 106) \
&& ('k' == 107) && ('l' == 108) && ('m' == 109) && ('n' == 110) \
&& ('o' == 111) && ('p' == 112) && ('q' == 113) && ('r' == 114) \
&& ('s' == 115) && ('t' == 116) && ('u' == 117) && ('v' == 118) \
&& ('w' == 119) && ('x' == 120) && ('y' == 121) && ('z' == 122) \
&& ('{' == 123) && ('|' == 124) && ('}' == 125) && ('~' == 126))
/* The character set is not based on ISO-646. */
error "gperf generated tables don't work with this execution character set..."
#endif
#define TOTAL_KEYWORDS 55
#define MIN_WORD_LENGTH 4
#define MAX_WORD_LENGTH 25
#define MIN_HASH_VALUE 7
#define MAX_HASH_VALUE 75
/* maximum key range = 69, duplicates = 0 */
#ifdef __GNUC__
__inline
#else
#ifdef __cplusplus
inline
#endif
#endif
static unsigned int
hash (str, len)
register const char *str;
register unsigned int len;
{
static unsigned char asso_values[] =
{
76, 76, 76, 76, 76, 76, 76, 76, 76, 76,
76, 76, 76, 76, 76, 76, 76, 76, 76, 76,
76, 76, 76, 76, 76, 76, 76, 76, 76, 76,
76, 76, 76, 76, 76, 76, 76, 76, 76, 76,
76, 76, 76, 76, 76, 30, 76, 76, 76, 76,
76, 76, 76, 76, 76, 76, 76, 76, 76, 76,
76, 76, 76, 76, 76, 5, 76, 20, 5, 36,
0, 76, 76, 76, 76, 76, 5, 10, 76, 76,
20, 76, 76, 76, 76, 76, 0, 76, 76, 76,
76, 76, 76, 76, 76, 76, 76, 25, 0, 5,
76, 30, 0, 76, 76, 50, 76, 76, 20, 5,
0, 35, 25, 30, 0, 0, 0, 20, 76, 76,
60, 20, 76, 76, 76, 76, 76, 76
};
return len + asso_values[(unsigned char)str[3]] + asso_values[(unsigned char)str[1]];
}
#ifdef __GNUC__
__inline
#if defined __GNUC_STDC_INLINE__ || defined __GNUC_GNU_INLINE__
__attribute__ ((__gnu_inline__))
#endif
#endif
const char *
in_word_set (str, len)
register const char *str;
register unsigned int len;
{
static const char * wordlist[] =
{
"", "", "", "", "", "", "",
"Enabled",
"PFMVoice",
"PFMVolume",
"Presonance",
"Unison_size",
"PDetune",
"PFMVolumeDamp",
"PFMDetune",
"Unison_vibratto",
"PDetuneType",
"AmpLfo/",
"PFMDetuneType",
"Unison_invert_phase",
"Unison_stereo_spread",
"Unison_vibratto_speed",
"PFMFreqEnvelopeEnabled",
"Unison_frequency_spread",
"PFMVelocityScaleFunction",
"FMFreqEnvelope/",
"PFMAmpEnvelopeEnabled",
"PVolume",
"PPanning",
"FMAmpEnvelope/",
"",
"PDelay",
"PVolumeminus",
"Pfilterbypass",
"PFilterEnabled",
"PFMCoarseDetune",
"octave",
"PFilterLfoEnabled",
"FreqLfo/",
"Pextoscil",
"",
"PextFMoscil",
"PFilterEnvelopeEnabled",
"FreqEnvelope/",
"PAmpLfoEnabled",
"PFreqLfoEnabled",
"PFMEnabled",
"coarsedetune",
"PFMoscilphase",
"PAmpEnvelopeEnabled",
"PFreqEnvelopeEnabled",
"Poscilphase",
"VoiceFilter/",
"AmpEnvelope/",
"Type",
"PAmpVelocityScaleFunction",
"oscil/",
"",
"PCoarseDetune",
"",
"FilterLfo/",
"detunevalue",
"", "", "",
"FilterEnvelope/",
"", "", "", "",
"Pfixedfreq",
"",
"PfixedfreqET",
"", "",
"mod-oscil/"
};
if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
{
register int key = hash (str, len);
if (key <= MAX_HASH_VALUE && key >= 0)
{
register const char *s = wordlist[key];
if (*str == *s && !strcmp (str + 1, s + 1))
return s;
}
}
return 0;
}
Looking beyond the boilerplate, the assoc_values is the lookup table needed in the final hash function and the offsets and indicies have been inlined. In this case only two characters were needed to uniquely identify one of any of the words in the ``wordlist''. To understand the exact details, the process is almost entirely contained within gperf’s search.cc source, though it is somewhat hard to read as specifying the mathematical constraints in C/C++ isn’t elegant by any means. My reimplementation certainly does not find the solution quite as quickly, but it does appear to work. If you’re interested in implementing it and have gotten lost with gperf, then librtosc or the scrap notes used in the implementation might be useful.
Now let’s consider this code run on the performance testcase in rtosc. First a modification of gperf’s hashing function was done which is to remove the offset code, which wasn’t needed on the tested keyword sets, but it might be added back in later. The test case consisted of 54 possible choices with an average of around 14 characters for each possible pattern. The overall cost of each dispatch call on my development machine per dispatch changed from 1127 ns to 150 ns after the introduction of the perfect hashing. On my secondary machine this change was 437ns to 37.5ns. At this point the pattern matching itself contributes such a small portion of the time for dispatch, that optimizing it further doesn’t really improve things. Not completely free, but being able to dispatch somewhere between 6.7 million and 27 million messages per second isn’t doing too bad.