Writing functional code in C++ IV – Algebraic datatypes - Luca Bolognese

Writing functional code in C++ IV – Algebraic datatypes

Luca -

☕ 5 min. read

And here comes the guilt bit. I have the strong sus­pi­cion (but not cer­tainty) that what I am do­ing here can be done with tem­plates, but did­n’t take the time to do it. With that out of the way, let’s go.

Code for this post is here. Thanks to Steve Bower and Andy Sawyer for re­view­ing it.

Algebraic datatypes (discriminated unions in F#) are a pow­er­ful con­cept in func­tional pro­gram­ming. They are the main way to rep­re­sent type vari­a­tion in your pro­gram. Very roughly, where ob­ject ori­en­ta­tion uses de­riva­tion, func­tional pro­gram­ming uses al­ge­braic datatypes. An en­tire book could be writ­ten on the the­ory of this, but the goal of this post is to see how we can map them to C++ with­out loos­ing C++ness.

When talk­ing about this with C++ pro­gram­mers, they al­ways point me to boost vari­ant. That does­n’t do it for me for sev­eral rea­sons.

First of all, boost vari­ants rep­re­sent one of a fixed col­lec­tion of types. Algebraic datatypes rep­re­sent one of a fixed col­lec­tion of named types. That means that a sim­ple thing like the code be­low can­not be rep­re­sented as vari­ant:

type LivingEntity =
| Person of string  // name
| Dog of string     // name

Ok, ok maybe you could rep­re­sent it by typifing’ things us­ing boost strong type­def, but things get ugly syn­tac­ti­cally. Moreover, a lot of time the name is all you care about …

type Switch = On | Off

Are we go­ing to strong type­def for such a sim­ple thing? Oh boy. Even ac­cept­ing the syn­tac­tic weight of all this, there are other is­sues in play. Discriminated unions are used ex­ten­sively in func­tional pro­grams. So you want a nice syn­tax to work with them Something like the be­low F# syn­tax:

let print living =
    match living with
    | Person(name) -> printfn "I'm a per named %s" name
    | Dog(name)    -> printfn "I'm a dog named %s" name

Which could be made even sweeter by us­ing the function’ key­word as be­low:

let print = function
    | Person(name) -> printfn "I'm a per named %s" name
    | Dog(name)    -> printfn "I'm a dog named %s" name

In boost vari­ant, you ei­ther use the get func­tions or you write a vis­i­tor func­tion. In the first case you are prob­a­bly go­ing to write a chain of if’ state­ments or a switch’ state­ment. Both are con­fus­ing and come with plenty of syn­tac­tic weight. I don’t re­ally want to write a vis­i­tor like the one be­low for each match’ in my code. The magic is gone.

class times_two_visitor
    : public <a href="http://www.boost.org/doc/libs/1_49_0/doc/html/boost/static_visitor.html">boost::static_visitor</a><>
{
public:
    void operator()(int & i) const
    {
        i *= 2;
    }
    void operator()(std::string & str) const
    {
        str += str;
    }
};

Ok, so boost vari­ant does­n’t re­ally work for this. Remember that our over­ar­ch­ing goal was to stay close to C++. The lan­guage it­self has some­thing that comes pretty close to what we want in the form of a union, or bet­ter a tagged union. Again, the types are not named, but maybe we can work that in.

It turns out that Jared here did all the hard work. The gen­eral idea is to use macros to hide the con­struc­tion of a tagged union with meth­ods to test the type and re­turn the con­tained value.

For ex­am­ple this code:

DU_DECLARE(LivingEntity)
    DU_VALUE(Person,    string)
    DU_VALUE(Dog,       string)
DU_END

Becomes some­thing like:

struct LivingEntity {
    private:
        LivingEntity() {}
        unsigned int m_kind;
    public:
        static LivingEntity Person(const string& value) {
            LivingEntity unionValue;
            unionValue.m_kind = 19;
            unionValue.m_Person = value;
            return unionValue; }
        bool IsPerson() const {
            return m_kind == 19;
        }
        const string& GetPerson() const {
            (void)( (!!(m_kind == 19)) || (_wassert(L"m_kind == __LINE__", L"c:discriminated_union.cpp", 19), 0) );
            return m_Person; }
        string GetPerson() {
            (void)( (!!(m_kind == 19)) || (_wassert(L"m_kind == __LINE__", L"c:discriminated_union.cpp", 19), 0) );
            return m_Person; }
   private:
string m_Person;

You can see the out­line of a tagged union (i.e. m_kind) with a con­struc­tor for each type (i.e. Person) and meth­ods to test for at type and re­turn its value. You can also see the stor­age for the value (i.e. m_Per­son).

The only thing in DU_DECLARE that is dif­fer­ent from Jared’s so­lu­tion is the type­def be­low that al­lows not re­peat­ing the LivingEntity name in each DU_VALUE.

#define DU_DECLARE(name)                        \
    struct name {                               \
    private:                                    \
        typedef name unionName;                 \
        name() {}                               \
        unsigned int m_kind;                    \
    public:
#define DU_VALUE(entryName, entryType)                                                                      \
        static unionName entryName(const entryType& value) {                                                \
            unionName unionValue;                                                                           \
            unionValue.m_kind = __LINE__;                                                                   \
            unionValue.m_##entryName = value;                                                               \
            return unionValue;  }                                                                           \
        bool Is##entryName() const { return m_kind == __LINE__;}                                            \
        const entryType& Get##entryName() const { assert(m_kind == __LINE__); return m_##entryName; }       \
        entryType Get##entryName() { assert(m_kind == __LINE__); return m_##entryName; }                    \
    private:                                                                                                \
        entryType m_##entryName;                                                                            \
    public:

With all of that at your dis­posal it be­comes easy to write:

    auto entity = LivingEntity::Dog("Bob");
    DU_MATCH(entity)
        DU_CASE(Dog,   BOOST_CHECK_EQUAL(value, "Bob");)
        DU_CASE(Person,BOOST_CHECK(false);)
    DU_MATCH_END

There are some beau­ti­ful things about this. First of all, the con­struc­tion of any of such types is su­per sim­ple. You even get in­tel­lisense!

Moreover the value’ vari­able con­tains what­ever was passed in the con­struc­tor for the ob­ject. So this is se­man­ti­cally equiv­a­lent, if not syn­tac­ti­cally, to the match state­ment in F#.

Obviously the code part is not lim­ited to a sin­gle in­struc­tion:

    DU_MATCH(entity)
        DU_CASE(Dog,
            cout << "I should be here";
            BOOST_CHECK_EQUAL(value, "Bob");
        )
        DU_CASE(Person,
            BOOST_CHECK(false);
        )
    DU_MATCH_END

And for those of you ad­dicted to braces, I ven­ture:

    DU_MATCH(entity)
        DU_CASE(Dog,
        {
            cout << "I should be here";
            BOOST_CHECK_EQUAL(value, "Bob");
        })
        DU_CASE(Person,
        {
            BOOST_CHECK(false);
        })
    DU_MATCH_END

They all work with the same macro de­f­i­n­i­tion. They ex­pand to some­thing along the line of:

    if(false) {}
        else if(entity.IsDog()) {
            auto value = entity.GetDog();
            BOOST_CHECK_EQUAL(value, "Bob");
        }
        else if(entity.IsPerson()) {
            auto value = entity.GetPerson();
            BOOST_CHECK(false);
        }
        else {
            throw match_exception();
        }

I’ve not reached the pin­na­cle of macro nam­ing mas­ter­ing with this one. Making them low­er­case and risk­ing a bit more on the con­flict side would make the syn­tax much more palat­able. I call it, as it is, not too bad.

The last else’ clause as­sures you then if you add a new type to the dis­crim­i­nated union and for­get to up­date one of the MATCH clauses at least you get a run time er­ror. That is not good. Functional lan­guages would give you a com­pile time er­ror, which is much bet­ter. Maybe with ju­di­cious use of tem­plates you can bring the er­ror at com­pile time.

The macros are triv­ial:

class match_exception: std::exception {};
#define DU_MATCH(unionName) { auto du_match_var = unionName; if(false) {}
#define DU_CASE_TAG(entry, ...)                        \
    else if(du_match_var.Is##entry()) {                \
        __VA_ARGS__                                    \
    }
#define DU_CASE(entry, ...)                            \
    else if(du_match_var.Is##entry()) {                \
        auto value = du_match_var.Get##entry();        \
        __VA_ARGS__                                    \
    }
#define DU_DEFAULT(...)                                \
    else if(true) { __VA_ARGS__}
#define DU_MATCH_END else {throw new match_exception();} }

Let’s now go back to our ini­tial goal and see how far off we are. We were try­ing to do the fol­low­ing:

type LivingEntity =
| Person of string
| Dog of string
let print = function
    | Person(name) -> printfn "I'm a per named %s" name
    | Dog(name)    -> printfn "I'm a dog named %s" name

And here is what we ended up with:

DU_DECLARE(LivingEntity)
    DU_VALUE(Person,    string)
    DU_VALUE(Dog,        string)
DU_END
auto print(LivingEntity en) -> void {
    DU_MATCH(entity)
        DU_CASE(Dog,    cout << "I'm a dog named " << value;)
        DU_CASE(Person, cout << "I'm a per named " << value;)
    DU_MATCH_END
}

In our Switch case:

type Switch = On | Off

You get the good look­ing :

DU_DECLARE(Switch)
    DU_FLAG(On)
    DU_FLAG(Off)
DU_END

And along the way we lost com­pile time type safety in the very com­mon case of adding new types to the dis­crim­i­nated union. That’s bad.

Also some of you would strongly dis­like the (ab)use of macros. As for me, it looks work­able.

4 Comments

Comments

Cl&#233;ment

2012-07-21T11:27:20Z

Amusing solution! I'd like to expend on the first few lines of your article though:
"Very roughly, where object orientation uses derivation, functional programming uses algebraic datatypes. "
Expanding on this, you could implement your LivinEntity type like this:

#include
using namespace std;
class LivingEntity {
public:
string name;
virtual void print() = 0;
LivingEntity(string name) {
this->name = name;
}
class Person;
class Dog;
};
class LivingEntity::Person : LivingEntity {
public:
Person(string name) : LivingEntity(name) {}
void print() {
cout << "I am a per named " << name << endl;
}
};
class LivingEntity::Dog : LivingEntity {
public:
Dog(string name) : LivingEntity(name) {}
void print() {
cout << "I am a dog named " << name << endl;
}
};
int main()
{
LivingEntity::Dog a_dog = LivingEntity::Dog("a dog");
LivingEntity::Person a_person = LivingEntity::Person("a person");
a_dog.print();
a_person.print();
return 0;
}

Also, why do you use else if(true) { __VA_ARGS__} instead of just else { __VA_ARGS__} in your macros?
Thanks for the nice article!

Hi Clement,
Yes, you can simulate algebraic datatypes with inheritance and type testing, albeit more verbosely and less efficiently. The if(true) thing is just a bug. Probably at some point in the coding the condition was a bit more interesting.

Christoph Senjak

2017-02-02T03:30:51Z

Does this also work with recursive datatypes?

0 Webmentions

These are webmentions via the IndieWeb and webmention.io.