Boost.Spirit: take two

In a previous post I shared my first experience with Boost.Spirit by parsing a greatly simplified material definition. Since then, I've learned a few new things about Spirit, and I would like to share them by rewriting that example. Previously, we had a Material struct that looked like this:

// A (very) simple material type
struct Material
{
    std::string name;
    std::string shaderProgramName;
    std::vector<std::string> textureNames;
};

// Make our material type usable as a Fusion tuple
BOOST_FUSION_ADAPT_STRUCT(
    Material,
    (std::string, name)
    (std::string, shaderProgramName)
    (std::vector<std::string>, textureNames) )

a material definition script that looked like this:

 NAME sample_material
 SHADER some_shader_name

 TEXTURES
 {
     texture1
     texture2
     texture3
 }

and a Spirit grammar that looked like this:

template< typename Iterator >
struct MaterialDefinition : boost::spirit::qi::grammar< Iterator, Material() >
{
    MaterialDefinition()
        : MaterialDefinition::base_type( start )
    {
        using namespace boost::spirit::qi;

        start = &(requiredTermCount) >> (name ^ shader ^ textureList);
        name = *space >> lit("NAME") >> +space >> value >> *space;
        shader = *space >> lit("SHADER") >> +space >> value >> *space;
        textureList = *space >> lit("TEXTURES") >> *space >> lit('{') >> *space >> (value % +space) >> *space >> lit('}') >> *space;
        value = +char_("a-zA-Z_0-9");
        requiredTerm = name | shader | textureList;
        requiredTermCount = requiredTerm >> requiredTerm >> requiredTerm;
        space = lit(' ') | lit('\n') | lit('\t');
    }

    boost::spirit::qi::rule< Iterator, Material() > start;
    boost::spirit::qi::rule< Iterator, std::string() > name;
    boost::spirit::qi::rule< Iterator, std::string() > shader;
    boost::spirit::qi::rule< Iterator, std::vector<std::string>() > textureList;
    boost::spirit::qi::rule< Iterator, std::string() > value;
    boost::spirit::qi::rule< Iterator > requiredTerm;
    boost::spirit::qi::rule< Iterator > requiredTermCount;
    boost::spirit::qi::rule< Iterator > space;
};

This grammar is quite verbose and can be made more compact, declarative, and extensible.

First of all, it explicitly skips whitespace by defining a rule to represent whitespace and then using that rule to manually skip whitespace in the other rules. In my earlier haste to get something working, I neglected to take advantage of Spirit's built-in skipping functionality.

Second, the rules can be rewritten in a more declarative fashion now that whitespace isn't a concern.

Third, the grammar relies on a kludge to get the "start" rule to behave as expected. The "start" rule uses the permutation operator to express that we want a "name", a "shader", and a "textureList" in any order. By itself, however, this permutation will also match input that contains just a "name" and a "shader", or just a "textureList" and a "name". It doesn't force all three to be present for a successful match, which is what we want. So, working with what I knew at the time, I defined the rules "requiredTerm" and "requiredTermCount". The "start" rule first attempts to match "requiredTermCount" through the & (and-predicate) operator, which checks for a match without consuming any input, and then runs the permutation over the same text.

What if we wanted to add another property to a material, such as color? We would of course define a grammar rule named "color", but we would also have to add this rule to "requiredTerm" and update "requiredTermCount". Obviously this isn't desirable. What we would like is a way for the "start" rule to simply say "match these rules in any order and only allow the parsing to pass if they are all present".

Here is the updated Material struct and parser. I will explain how it addresses each of these issues:

struct Material
{
    // These are held as optionals to simplify making them all
    // required during parsing
    boost::optional< std::string > name;
    boost::optional< std::string > shaderProgramName;
    boost::optional< std::vector<std::string> > textureNames;
};

BOOST_FUSION_ADAPT_STRUCT(
    Material,
    (boost::optional< std::string >, name)
    (boost::optional< std::string >, shaderProgramName)
    (boost::optional< std::vector<std::string> >, textureNames) )

template< typename Iterator >
struct MaterialDefinition : boost::spirit::qi::grammar< Iterator, Material(), boost::spirit::ascii::space_type >
{
    MaterialDefinition()
        : MaterialDefinition::base_type( start )
    {
        using namespace boost::spirit::qi;
        using boost::spirit::qi::_0;

        start %= (name ^ shader ^ textureList)[_pass = NoMissingElements(_0)];
        name = lit("NAME") >> quotedString;
        shader = lit("SHADER") >> quotedString;
        textureList = lit("TEXTURES") >> lit('{') >> (quotedString % lit(',')) >> lit('}');
        quotedString = lexeme['"' >> +(char_ - '"') >> '"'];
    }

    boost::spirit::qi::rule< Iterator, Material(),                 boost::spirit::ascii::space_type > start;
    boost::spirit::qi::rule< Iterator, std::string(),              boost::spirit::ascii::space_type > name;
    boost::spirit::qi::rule< Iterator, std::string(),              boost::spirit::ascii::space_type > shader;
    boost::spirit::qi::rule< Iterator, std::vector<std::string>(), boost::spirit::ascii::space_type > textureList;
    boost::spirit::qi::rule< Iterator, std::string(),              boost::spirit::ascii::space_type > quotedString;
};

I have also slightly modified the material definition script format:

 NAME "sample_material"
 SHADER "some_shader_name"

 TEXTURES
 {
     "texture1",
     "texture2",
     "texture3"
 }

Notice the addition of boost::spirit::ascii::space_type in the grammar definition and rules? This additional template parameter tells Spirit which skip parser to use; here it is one that matches ASCII whitespace. Skipping now happens automatically between the elements of each rule, and the "quotedString" rule wraps its contents in lexeme[] so that characters inside the quotes are not skipped. You can clearly see how this simplifies the rule definitions and makes them more closely mimic the data they are meant to parse.

Now, what is this [_pass = NoMissingElements(_0)] business in the "start" rule? On the surface, it appears to be what we want: a simple way to force all permutation arguments to be present for a successful parse. But how does it work? Well, it takes advantage of the fact that the permutation operator's attribute type is a Fusion tuple of boost::optional objects (indeed, their "optionalness" is the problem, remember?). By passing this tuple to a Boost.Phoenix function that returns true only if every boost::optional object it contains holds a value, we can determine whether the parse should succeed. This solution doesn't have any of the extensibility issues present in the previous solution.

This is also why the Material struct now stores its members as boost::optional objects. You see, Spirit will automatically convert the aforementioned attribute type of the permutation operator to the attribute type of the "start" rule. This conversion happens BEFORE "NoMissingElements" is called. This means we have to make sure ourselves that "NoMissingElements" is still called with a Fusion tuple of boost::optional objects, which is what the adapted Material struct now provides; otherwise the result is a compile error. I am still interested in a solution for this issue that doesn't involve altering the Material struct.

All that remains is to show the implementation for "NoMissingElements":

namespace detail
{

    class NoMissingElementsImpl
    {
    private:
        struct IsOptionalInitialized
        {
            template< typename T >
            bool operator()( const T& t ) const
            {
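                // Simply exploits boost::optional<T>'s convertibility
                // to bool. The cast is spelled out because newer Boost
                // releases make that conversion explicit.
                return static_cast<bool>( t );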
            }
        };

    public:
        template< typename T >
        struct result
        {
            typedef bool type;
        };

        template< typename T >
        bool operator()( const T& t ) const
        {
            return boost::fusion::all( t, IsOptionalInitialized() );
        }
    };

} // end namespace detail

const boost::phoenix::function<detail::NoMissingElementsImpl> NoMissingElements = boost::phoenix::function<detail::NoMissingElementsImpl>();