Ragel State Charts

The Ragel State Machine Compiler is one kick-ass piece of software by Adrian Thurston that I’ve been using for about a year now. What Ragel does is use a mixture of C/C++/D/Java code and a state machine specification to produce a functioning state machine. Using Ragel you can process text for network protocols, program logic, programming languages, parsers, etc.

NOTE: This essay is old and just here for historical purposes.

Specifying Servers With Ragel

What I’m going to cover in this essay is some of the wicked weird ways I’ve been using Ragel to create rock solid network servers. I’m going slightly outside the bounds of Adrian’s original intent, but he designed it to be flexible enough that Ragel just handles the new work easily. I’d say using Ragel has significantly cut down on the amount of work I do to design new protocols, and I’m dying to revisit old servers I’ve written.

The best thing you’ll get from this essay is how state machines can make your networking code very bullet proof. Specifying your operations in a state machine has always been a great way to validate correct operation, but in the past it’s been a major pain in the ass. Using Ragel and a few little tricks I’ve been able to easily specify some very stable software with minimal development effort.

Why Ragel Is Cool

This isn’t anything new as there’s been many software packages that do this, but what makes Ragel different is you can:

  1. inject actions at any point in the machine’s transitions
  2. use code to alter the machine’s state on the fly
  3. flexibly mix between regex specification and state chart
    specifications
  4. produce a variety of machine styles like table, flat, or goto
    machines
  5. get varying degrees of control over the state minimization
  6. easily mesh it with whatever IO or character access code you have
  7. and it’s fast as hell

All of this sounds super fancy, but it’s really based on solid research done on state machines over the years. Adrian simply took all the best research and built a sweet little domain language for state machines. My favorite feature is how he cleanly merged the two styles of specification: regex and state chart. Regex style machines are common in text parsing, but state charts are better for specifying program logic. Having the power to use both makes for some damn good results.

What many people don’t realize when they first look at Ragel is that its flexibility at specifying regular state machines means that it breaks a cardinal rule of lexical analysis: A regex cannot parse nested structures.

I won’t get into why, but try it yourself sometime on an XML document. Use some fancy regex to pull out certain tags in the document then mix up the structure. You’ll find yourself able to do it, but every little change requires a tweak. Eventually you get this monstrosity that can’t be maintained. It’s just better to realize a regex is bad at this and mix regex analysis with parsing.

Ragel however gives you an insane amount of flexibility at combining various complex machines so that you can get damn close to processing hierarchical structures. It’s not complete enough to fully replace a LALR grammar generator, but it’s advanced enough to handle almost all the nasty hand coded lexers out there. Ragel produces efficient machines from fairly complex specifications and outputs fast as hell code of varying styles.

Ragel In Mongrel

Ragel is the software behind Mongrel’s speed and correct HTTP processing. With very little effort I was able to use Ragel to create a full HTTP processor that can process the protocol at insane speeds. From a developer point of view Ragel let me crank out a fully functioning web server in about a week. In fact, the number of changes to the parser in Mongrel only accounts for 4.2% of the total revisions.

Specifying the protocol with a clear state machine also made Mongrel more secure than other servers. Simply being more explicit about what is valid HTTP means that most of the security attacks that worked on Apache were rejected outright when tried on Mongrel. In fact, we’d put Mongrel behind Apache and watch Mongrel reject about 80% of the attacks while Apache let them right on through.

I thought, if using Ragel on the protocol parsing made it more secure by default, then what would happen if I used it on other parts of the server? What if I created a similar state machine for the socket connection state or the server’s internal logic? Would that make things more secure as well? How hard is it to specify things this way?

This document is my first cut at answering these questions so other people can try them out and see if it works for them.

Enter Utu

Very quickly, Utu is my latest freaky experiment in writing better client/server software. It’s the culmination of years of experience implementing various protocols and inventing my own. What I’m trying to do is simultaneously try out a weird ass idea and use that idea as a catalyst for doing an alternative protocol design and implementation.

You don’t need to know too much about Utu really, but you need to understand a small bit to see why using a state machine is a great way to approach the problem.

Utu is a messaging server that has built-in cryptography and sender pays enforcement. Imagine an SMTP server that was always encrypted, could identify all senders and receivers by public keys, and could force senders to “pay” to send based on the opinion of receivers. In Utu this is called “hate calculations”. However, since Utu is just a generic messaging server with a simple hub/spoke design, it can do SMTP, HTTP, or IM style communications.

The goal for Utu is to fight the griefers of the Internet with hate. As you communicate with people over Utu you can tell the Hub you hate them. The amount you hate them turns into a throttling hate calculation they have to do before they can continue talking to anyone.

Let’s take an IRC channel as the best example of where Utu will help. Typically IRC channels are either chaotic free-for-alls or Nazi controlled kingdoms. It’s also common that the operators of a heavily controlled IRC channel don’t follow their own rules.

In Utu, a channel wouldn’t have operators. Instead, when a griefer showed up to cause mayhem, all the participants would be able to “pay the hate tax”. Once members pay to hate the griefer the Hub then starts throttling that person at the average paid level. The more people hate them the more they cook their CPU.

A fictional Utu conversation in this case might look like this:

1:00 < griefer > Ruby on Rails sucks!
1:01 * joe452 hates griefer at level 24
1:01 * frankyboy hates griefer at level 26
1:01 * argc12 hates griefer at level 32
1:10 * griefer's hate is now 27.3
1:11 < joe452 > that should shut him up
1:20 < griefer > Damn guys, it takes me 10 minutes to send a message.  You suck!
1:21 * griefer leaves #rubyonrails

The idea is that joe452, frankyboy, and argc12 all payed to hate griefer by performing the same calculation he’d be penalized with. This payment is a one time thing, and after they do that the Hub turns around and throttles griefer permanently at 28 (rounded up) whenever he talks to that room.

State Machines In Utu

Utu already uses Ragel to parse its simple wire protocol (another document on that coming soon). The problem was if the hate calculations are to work they have to be maintained across socket connections. If griefer above is able to say something, disconnect, and then reconnect to avoid doing the hate calculation then it won’t work.

What’s needed is to maintain the state of griefer’s connection between socket reconnects. If he was forced to pay some hate, and he disconnects, then when he comes back his socket is immediately put into the “Hated” state.

I originally coded this by hand and it was nasty. On a hunch I redid the same logic using a Ragel state chart and simplified it tremendously. The new state machine is just starting out but already I’ve got the following advantages:

* I can design and test the logic of a Hub Connection without actually having any network running.
* The new machine is explicitly specified so people trying to get around it can be booted.
* It’s a hell of a lot easier to maintain and understand.
* I can actually use this as the specification for an Utu server. A single state machine diagram and a Ragel specification works much better than pages and pages of RFC documentation.

The rest of this document describes how I did it so that you can try it in your own servers.

State Machines

When I show people this stuff they look at me like I’m crazy. Here’s a very recent chat with a bunch of high speed Ruby coders to demonstrate:

  21:14 < zedas> http://pastie.caboo.se/29404  check this out
  21:14 < zedas> that's the state machine for controlling an utu connection, with the hate 
                 calculating process included
  21:14 < zedas> then there's a unit test that basically just feeds various combos of events 
                 (UEv_*) to test that the logic all works
  21:15 < zedas> and then i can validate it with the graph ragel produces at the top
  21:15 < zedas> and what's really cool is this syntax is very close to CSP syntax
  21:16 < technoweenie> zooom
  21:16 < technoweenie> right over my head :)

Dammit this stuff is not hard, it’s just that nobody learns about state machines unless they take an obscure compiler design class. Very few universities teach state machines as a method of specifying the logic for a program. Even fewer cover Hierachical State Machines or State Charts.

To rectify this I’m gonna give you a crash course in state machines that will hopefully help you understand enough to get what’s going on. Skip this if you know about FSM already.

State Machines Are Classes (kinda)

I think the best way to understand state machines is to see how they might map to an Object Oriented language like Ruby. Fundamentally all state machines deal with State, Events, Transitions, and Actions. These can be mapped to OO code like this:

  • State – instance variables or data.
  • Events – public methods or messages other people can send.
  • Actions – private/protected methods or internal logic when something
    happens.
  • Transitions – changes to instance variables to move to a new state.

Take a look at this simple Ruby class below as an example:

001  class HelloMachine
002    def initialize
003       @message = "Hello World"
004     end
005 
006     def say_hello
007       if @message then write_message else write_error end
008     end
009 
010     def change_message(msg)
011       @message = msg
012       if !@message then write_error end
013     end
014 
015     private
016     def write_message; puts @message; end
017 
018     def write_error; puts "No hello available."; end
019   end

Each part of this class loosely matches what is in a state machine:

  1. message is the class’s current state. It can be something to print or nil.
  2. Changes to message amounts to a transition, that’s what change message does.
  3. say_hello and change_message are both events that cause something to happen.
  4. change_message changes the internal state (message) while say_hello changes the screen state by printing something.
  5. The if-statement in say_hello handles the two states of nil or not-nil.
  6. The private methods write_message and write_error are actions for the machine. They’re called on “transitions” or to perform activity when an event happens.

This is how most code these days is written in the OO world, but even this little class has a problem: it’s not explicit what message can be. It's just assumed that people won't be stupid and set it to anything other than nil or a String. Ruby's puts is pretty powerful, but if you want to really make sure that it's used properly you'd have to add extra if-statements to change\_message so thatmessage can’t be in an invalid state.

State Machines Are Explicit

What a state machine formalizes is mathematically how these four pieces (State, Events, Actions, and Transitions) fit together and then defines explicitly what is allowed. Rather than just letting `message be anything, a state machine defines a set of allowed values. That’s it. Rather than allowing any event a state machine defines a limited set of events. Transitions are also restricted to what’s defined in the state machine so that it’s clear how a machine moves from one state to another and what causes that move. Actions are quite as well formalized since they’re more of a practical addition to make state machines useful for processing.

What annoys most programmers about state machines (apart from having to learn tons of math) is that they require this explicit thinking and covering all the possibilities. When you code up some OO you just code. As you build it up you start adding more restrictions, cover the code in unit tests, and then assume it works. With a state machine you sit down and define all the possibilities, and then the state machine compiler you use makes sure it is always in this domain.

As a quick taste, I can take the above HelloMachine and rewrite it like this in Ragel:

001  Hello = (
002     start: ( 
003       say_hello @write_message -> start |
004       set_message_string @change_message -> start |
005       set_message_nil @write_error -> no_message 
006     ),
007 
008     no_message: ( 
009       set_message_string @change_message -> start |
010       set_message_nil -> no_message |
011       say_hello @write_error -> no_message
012     )
013   ) >initialize;

In this example we setup a new kind of event set_message_string and set_message_nil to indicate that you’re only allowed to set it in two ways. There’s also two basic states start and no_message. The > symbols mean transition to that state, and `write_error means run the write_error action before you transition.

Now, obviously this isn’t much smaller than your original code, and you’ve gotta understand the new syntax to get it, but take a look at the following diagram generated by Ragel from this machine (click for larger image):

HelloMachine

That’s a fairly nice concise diagram for what is going on. The terminology might be a little foreign, but right away you can see when each action is called, what causes the transitions to the different states (states are in the circles), and how loops are handled. Trying to get a similar description from a larger set of source code would be impossible without some serious tools and it wouldn’t even be close to as accurate.

Using State Machines In Protocols

When you design a server there’s several places where you have to control determinism to make sure things stay sane:

  1. lexical elements (lexemes).
  2. syntactic structure (grammar)
  3. semantic meaning and analysis (logic)

Typically you see these terms used in the design of compilers but not really in designing servers. The thing is clients and servers do much of the same work that compilers do, it just has to do it dynamically and with malicious user inputs from untrusted sources. Why not leverage the decades of theory and practice in compiler design to make servers more robust?

With this in mind we can see using a typical lexer for lexical elements and a parser generator for the grammar should be easy. If you design your protocol right you can get away with just using Ragel for both of those as well, but some protocols will need a real parser generator. I recommend “Lemon”:http://www.hwaci.com/sw/lemon/ if you need one that’s good with memory and error handling.

For semantic meaning we’re going to use the state chart style of machine specification for Ragel. This way of specifying a state machine uses labels and transitions inside nested state machines to make processing happen. You’ve already seen this style with the Ragel version of the HelloMachine we had above.

Utu Connections and Hubs

Our server will use two state machines to manage its operations. One will be for the Hub and will be very simple for now. Its job is simply dealing with new network connections, handling the start-up process, and stopping connections that are dead.

hub connection

The Hub manages a set of Connection machines that manage the state of each connected socket. The Connection receives events from the socket and other input points and, depending on what needs to be done, makes transitions around the machine. The Connection does the majority of the processing and is only alive while the socket is connected.

Utu Hub State

When the Utu server starts up it needs to create a Hub that’s listening on a socket so it can service connections. Part of this process involves loading a key either from a file or dynamically generating it. Once the cryptography is ready to go and the Hub is listening on the socket the Hub just has to keep processing clients as they open and close.

First important thing I do in “the hub.rl”:hub.rl file (after some C boilerplate and action definitions) is to list the events we’re interested in:

001    listen='L';
002    gen_key='K';
003    gen_key_done='k';
004    key_file_load='F';
005    conn_open='O';
006    conn_close='C';
007    done='D';

Normally Ragel operates on characters, but I can just as easily use these as events to the machine by defining similar enum definition in a C .h file. For now we’ve just got a bunch of characters that stand for each event that’s allowed.

After we say what events are possible, I specify the state machine, transitions, and actions to call:

001  Hub = (
002    start: (
003      gen_key -> GenKeys
004    ),
005
006    GenKeys: (
007      gen_key -> GenKeys |
008      gen_key_done -> CryptoReady |
009      key_file_load -> CryptoReady
010    ),
011
012    CryptoReady: (
013      listen -> Listening
014    ),
015
016    Listening: (
017      conn_open `open\> Listening |
018 conn*close `close -> Listening |
019      done -> final
020    )
021
022  ) >begin %finish `!error;
023
024 main := ( Hub %{ printf ("\\n"); } )\*;

This is not really enough logic to warrant a full state machine approach, but the Hub will start to pick up more complex decisions and logic as Utu progresses. Remember I already tried to do just the above in hand written form and it was disgusting. Hopefully this will be much better.

Let’s walk through this line by line so that you can start to understand how a Ragel state chart is specified and what each bit of syntax means. The Hub’s machine is simpler than the Connection’s so we’ll be able to cover the various features and understand it. First up, here’s the diagram from the above machine for you to take a quick look at (click for a larger version):

image

  • 001: This starts off the Hub machine. The closing parenthesis is on 020.
  • 002: The start: is a label. We’re basically telling Ragel where the Hub machine should start its life.
  • 003: This is the first event > state transition we’re making. We’re saying that when the genkey_ event happens, transition to the GenKeys state. In this case GenKeys: is on line 006.
  • 004: A very important point is that each of these labeled states must have a ‘,’ character separating them. Ragel won’t report an error but the machine won’t work right without it.
  • 006: Here’s our state for handling the key generation process. There’s three events that this state can accept.
  • 007: The first event we handle is another genkey_ so that we can report status during the lengthy key generation process. That’s why this transitions> GenKeys again.
  • 008: At the end of 007 there was a | (pipe, OR symbol) that separated genkey_ event transitions from this one, genkeydone.
  • 009: Just like genkeydone event the keyfileloaded event causes the machine to transition toCryptoReady_ indicating it’s done.
  • 012014: These lines are just like start: except that we’re waiting for the server’s socket to be ready for listening. Once we get the listen event we transition> Listening.
  • 016: Our Listening state that we’ll now loop continuously inside until we get a done event.
  • 017-019: Here we process the Connection open/close events, and the done event for when the hub shuts down. The new thing to notice here are the open_ and _close statements. These say to run the actions named “open” and “close” and are how you make the machine do things for the program.
  • 022: This line ends the Hub machine and then appends a few new actions. Ragel uses a funky but very cool syntax for actions that we’ll cover in a second.
  • 024: The last line simply tells Ragel that the main machine to run is Hub and a small action with the %{ .. } syntax prints out a newline when the Hub exits.

Take a look at the machine, the hub.rl file and the hub state diagram and bounce between them trying to trace how the actions move when different character events are handed to the machine. Using the above description of the syntax you should be able to use the diagram to sort out how the Hub works. If not, try picking an event and then matching it to lines in the diagram.

Actions In Ragel

Ragel’s unique feature is that you can embed an “action” anywhere in the state machine, and at the same time control what stage of the machine should run the action. Ragel uses a wide array of symbols prefixed to the action name to indicate when it should be run. Let’s take our machine above and some of the actions:

  • >begin on line 022 means run the begin action on entry to the Hub machine.
  • %finish on line 022 means run the finish action when the Hub is finally exited.
    *`!error means run the error action at any point where there’s an error.
  • The ‘' symbol also means on each transition that's not an entry or exit. So on 017 and 018 where we doopen and `close we’re basically saying to run those for that event and then transition.

The Ragel manual has a ton more of these kinds of specifiers and the syntax can get very advanced, especially once you start adding error terms and such. For now just know in a state chart machine when you want to run an action put it after the event and prefixed with an @ just like on lines 017 and 018 above.

Generating The Code

If you take a look at the hub.rl file you’ll see how this is all worked into the middle of some C code. When we want to generate this file into C code we run this command:

  $ ragel hub.rl | rlcodegen -o hub.c

The resulting hub.c file isn’t much to look at but it compiles and then you can start running it. We’ll show how it’s run and unit tested in the next section.

When you want to get a diagram like the ones I’m showing you run this command:

  $ ragel hub.rl | rlcodegen -V -p -o hub.dot
  $ dot -Tpng hub.dot -o hub.png

To make the above work you have to install the Graphviz diagram system and make sure dot is in your path.

How the ragel and rlcodegen tools work is that ragel reads your state machine specification and then outputs XML on stdout (you can save it to a file). This gets passed to rlcodegen which reads it and produces code of different styles and for different languages. For example I can have rlcodegen produce a goto style machine (they’re fast) for Java with this:

  $ ragel -J ajavathing.rl | rlcodegen -o -G2 ajavathing.java

You can also control other options so read the Ragel PDF for your version (down about mid page).

Unit Testing The Hub

A state machine like this has some interesting testing properties. First we can actually unit test the logic for the hub without connecting it to any other part of the system until we’re ready. Second we can actually fuzz the machine to see how it reacts to invalid inputs. Finally, testing the Hub involves feeding it varying streams of events, so we can try out different usage scenarios really easily.

Here’s a sample using the CUT test suite where we just test a simple run:

001  void __CUT__UtuHub_Hub_state()
002   {
003     HubState fsm;
004   
005     const char simple_events[] = {UEv_GEN_KEY, UEv_GEN_KEY_DONE,
006     UEv_KEY_FILE_LOAD, UEv_LISTEN, UEv_DONE, 0};
007   
008     hub_exec(&fsm, simple_events, 1, 1);
009   }

Notice on line 005 we setup an array of a bunch of C constants. Each of those is set to the characters in our events section above, so we’re basically feeding characters to the HubState fsm. The hubexec_ function used on 008 above looks like this:

001  int hub_exec(HubState *fsm, const char *events, int initialize, int finalize)
002   {
003     if(initialize) ASSERT(HubState_init(fsm), "failed to init HubState");
004   
005     while(*events) {
006       if(HubState_exec(fsm, *events) == -1) return 0;
007       events++;
008     }
009   
010     if(finalize) ASSERT_EQUALS(HubState_finish(fsm), 1, "failed to finish machine");
011   
012     return 1;
013   }

The hubexec_ method cuts down on repetition in the test cases (which we’ll demonstrate more with the Connection machine later). What it does is:

  • 003: Initialize the Hub with HubStateinit but only if asked.
    * 005-006: Process each event until it hits a 0 indicating the last
    event. It uses HubState
    exec to do this.
  • 010: Finishes the HubState fsm if asked to using HubState_finish.

Using this setup we can test out fairly complex usage scenarios with just arrays of events and some function calls to hubexec_ to make them happen.

Utu Connection State

The Connection machine (in the connection.rl file) is setup almost identically but is more extensive. There’s no new syntax in the file, so we’ll just show the events it handles and the machine and cover the new stuff.

001    open='O';
002    key_present='P';
003    key_reject='p';
004    key_accept='a';
005    peer_fail='T';
006    member_fail='M';
007    member_register='m';
008    close='C';
009    svc_recv='s';
010    msg_sent='e';
011    msg_queued='Q';
012    msg_recv='R';
013    hate_apply='H';
014    hate_challenge='h';
015    hate_paid='b';
016    hate_valid='V';
017    hate_invalid='v';

Nothing new here other than a larger number of events to deal with. Notice that I did reuse some of the events from the Hub machine so that the C code is easier to maintain. In an ideal situation the Ragel code wouldn’t even have these specified and instead they’d just come from C.

Here’s the Connection’s state machine specification, which is twice the size of the Hub’s specification, but doesn’t have anything new except for a single nested machine used to handle hatred:

001  Connection = (
002     start: ( open @open -> Accepting ),
003 
004     Accepting: ( key_present @key_check -> KeyCheck ),
005 
006     KeyCheck: (
007       key_reject @establish_failed -> Aborting |
008       key_accept @tune -> Tuning
009     ),
010 
011     Tuning: (
012       peer_fail @establish_failed -> Aborting |
013       member_fail @establish_failed -> Aborting |
014       member_register @established -> Idle
015     ),
016 
017     Idle: (
018       msg_recv @recv -> Delivering |
019       msg_queued @queued -> Sending |
020       svc_recv @service -> Servicing |
021       hate_apply @hate_apply -> Hated |
022       close @close -> final
023     ),
024 
025     Hated: (
026         start: ( hate_challenge @hate_challenge ->  Awaiting),
027         Awaiting: ( hate_paid @hate_paid -> Validating ),
028         Validating: ( 
029           hate_valid @hate_valid | hate_invalid @hate_invalid 
030         ) -> final
031     ) -> Idle,
032 
033     Aborting: ( close @aborted -> final ),
034 
035     Delivering: ( msg_queued @delivered -> Idle ),
036 
037     Sending: ( msg_sent @sent -> Idle ),
038 
039     Servicing: ( msg_queued @delivered -> Idle )
040 
041   ) >begin %finish;
042 
043   main := ( Connection %{ printf("\n"); } )*;

Which is diagrammed by Ragel as (click to enlarge):

image:ConnectionState.png

There’s nothing new in the Connection state, it’s just more complex. Each event triggers an action and then transitions to a new state in the Connection machine.

The main difference is we use a nested machine for the Hated label on line 025. Just like Connection is a machine with a bunch of labels in it, each label can also point at a machine with labels in it. Hated is such a nested machine and it handles the process of starting a hate calculation (012), paying it (027), validating the hate (029), and then going back to normal processing (031).

The big thing with nested machines is that you can only transition to labels in that machine or to final. This is why the whole Hated state machine has a transition to -> Idle on line 031.

Another thing you should see is I didn’t include the !error on the Connection machine like in the Hub machine previously. The reason is this makes the diagram more difficult to understand since it produces those "DEF/error" lines in "the Hub Diagram":HubState.png which are really annoying initially. What I do is add the!error action after I’ve worked out the logic more completely.

Unit Testing Connection State

Testing the Connection machine is just like the testing we did with the Hub machine:

001  void __CUT__UtuHub_Connection_state()
002  {
003    ConnectionState fsm;
004  
005    const char simple_events[] = {UEv_OPEN, 
006      UEv_KEY_PRESENT, UEv_KEY_ACCEPT,
007      UEv_MEMBER_REGISTER, UEv_CLOSE, 0};
008  
009    ASSERT(conn_exec(&fsm, simple_events, 1, 1), "simple events failed");
010  
011    const char default_setup[] = {UEv_OPEN, 
012      UEv_KEY_PRESENT, UEv_KEY_ACCEPT,
013      UEv_MEMBER_REGISTER, 0};
014  
015    const char hate_paid_valid[] = {UEv_HATE_APPLY, UEv_HATE_CHALLENGE, 
016      UEv_HATE_PAID, UEv_HATE_VALID, 
017      UEv_CLOSE, 0};
018  
019    const char hate_paid_invalid[] = {UEv_HATE_APPLY, UEv_HATE_CHALLENGE, 
020      UEv_HATE_PAID, UEv_HATE_VALID, 
021      UEv_CLOSE, 0};
022  
023    ASSERT(conn_exec(&fsm, default_setup, 1, 0), "default setup failed");
024    ASSERT(conn_exec(&fsm, hate_paid_valid, 0, 1), "hate paid valid failed");
025    ASSERT(conn_exec(&fsm, default_setup, 1, 0), "default setup failed");
026    ASSERT(conn_exec(&fsm, hate_paid_invalid, 0, 1), "hate paid invalid failed");
027  }

The connexec_ function is almost identical to the hubexec_ function and it’s just used to reduce repetition in the test code. In this test we’re trying out different interactions like a simple test then a few tests based on hate.

Otherwise there’s nothing more to the whole thing than just running different lists of events through the machines and making sure they are still operating.

When you start to compare the specification with the diagram you should be able to tell where the Hated machine starts working, how it’s transitioned into, and follow the path of other events. Ragel uses numbers for the states, but with a machine this small it’s not hard to figure out what’s what.

Next Steps

With my new found knowledge of Ragel state charts I’m starting to work out the Utu machines and testing their logic more extensively. The next step is to obviously make them do something useful. What will happen is based on this diagram you saw previously:

image

You can see the Connection gets events from the Hub and the sockets it’s managing. The events from the Hub are combinations of hateapply_ for transferring hate between people over the connections and open/close events for when the Hub needs to stop a Connection.

Events from the socket will come from two little threads that parse data off the wire into messages for the machine, or send out messages that are generated from the machine.

The follow-up to this article explain how these state machines are actually used in practice to make the server operate, and how well they do when compared with hand coding. I’ll explore things like whether it reduced the code compared to the previous version, how changes were handled, how robust the new version is, and probably see what fuzzing the machine does to it.

Stay tuned.