Hacking on Structured Bindings

23 Jun 2018
Using the new structured binding feature in your API design

Introduction

In case you have not seen it before, the structured binding declaration introduced in C++17 lets you declare and initialize multiple variables of different types from one return value. For example, the awkward pair return from map::insert becomes nice!

C++
auto [it,added] = sales.insert({orderID,order});
if (!added) { ⋯ // deal with existing orderID being updated instead

It is also handy when iterating over a map.

C++
for (const auto& [key,value] : sales) { ⋯

You can use it where existing code used a pair, and being able to decompose the pair into different named variables is handier than using .first and .second throughout the block. That is for existing constructs (going back to C++98) where pair was needed but awkward.
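As a compilable illustration of both map uses, here is a minimal sketch. The names sales_map, add_order and summarize are made up for this example and are not part of the article's test file.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical order book for this sketch: order ID -> description.
using sales_map = std::map<int, std::string>;

// Inserts an order only if the ID is new. Returns true when it was added.
bool add_order(sales_map& sales, int order_id, const std::string& order)
{
    // insert returns pair<iterator,bool>; bind both halves to names.
    auto [it, added] = sales.insert({order_id, order});
    // Added or not, `it` refers to the entry stored under order_id.
    assert(it->first == order_id);
    return added;
}

// Builds "id=description;" for every entry, using named key and value.
std::string summarize(const sales_map& sales)
{
    std::string out;
    for (const auto& [id, description] : sales) {
        out += std::to_string(id) + "=" + description + ";";
    }
    return out;
}
```

Note that insert leaves an existing entry untouched, so a false result from add_order means the stored order was not replaced.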

But, it is not just for taming old awkward API details. Now that it exists, you can purposefully use it in your library or function design.

Exploration Test Case

To explore what is possible, how it works, and how you can make it do what you want, I will use the example of complex numbers.

What is Inside the Structure

C++
using ℂ = std::complex<double>;
ℂ z1 = sin(1.0+2.1i);
cout << z1 << '\n';

// this does not work with std::complex
auto [a,b] = z1;

The structured binding does not work on this type, because the data members are private (and an implementation might not even store them as ordinary members, using compiler intrinsics instead). But the standard guarantees what the internal layout actually is, in order to make it compatible with the C language ABI. N4659 §29.5 ¶4 states:

If z is an lvalue expression of type cv complex<T> then:
— the expression reinterpret_cast<cv T(&)[2]>(z) shall be well-formed,
reinterpret_cast<cv T(&)[2]>(z)[0] shall designate the real part of z, and
reinterpret_cast<cv T(&)[2]>(z)[1] shall designate the imaginary part of z.

It is very unusual indeed for the standard to say that you can reinterpret_cast to do type punning. Normally, it says that doing so (for your own unions or data formats) is undefined behavior! But, here it is specifically allowed.

C++
auto [a,b] = reinterpret_cast<double(&)[2]>(z1);
cout << a << ", " << b << '\n';

Using the cast, we can indeed destructure the complex into the components that are actually stored inside the structure. This is a bit rude to use in open code, so make it a function instead:

C++
template <typename T>
using complex_array_ref_t = T(&)[2];

template<typename T>
complex_array_ref_t<T> cartesian (complex<T>& z)
{
    return reinterpret_cast<T(&)[2]>(z);
}

    ⋮

auto [re,im] = cartesian(z1);
cout << re << ", " << im << '\n';
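Because cartesian returns a reference to the storage inside the complex value, binding with auto& gives names that alias that storage, so writing through them changes the original. The sketch below repeats cartesian so that it stands alone; scale_in_place is a hypothetical helper added only to show the aliasing.

```cpp
#include <cassert>
#include <complex>

template <typename T>
using complex_array_ref_t = T(&)[2];

// View a complex value as the two-element array the standard says it is.
template <typename T>
complex_array_ref_t<T> cartesian(std::complex<T>& z)
{
    return reinterpret_cast<T(&)[2]>(z);
}

// Hypothetical helper: multiply both parts by k, writing through the bindings.
template <typename T>
void scale_in_place(std::complex<T>& z, T k)
{
    auto& [re, im] = cartesian(z);   // auto& : no copy, the names alias z
    assert(&re == &cartesian(z)[0]); // re really is z's first element
    re *= k;
    im *= k;
}
```

With plain auto instead of auto&, the two-element array would be copied first and the bindings would name the copy, leaving z unchanged.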

Getters that are not Actually Data Members

The above code works because the structure really does contain those values as data members, and we are just telling the compiler to alias the declared names against the existing storage. But what if that’s not the case? Here is a more general approach:

C++
template<typename T>
auto cartesian_3 (const complex<T>& z)
{
    return std::make_pair (z.real(), z.imag());
}

As we saw with the std::map uses, returning a pair (or tuple of any size) can simply be used with structured binding to produce multiple return values.

Any time you want to return multiple results from one operation, you don’t have to be shy about it. Just return a tuple! Notice that with auto return type, I did not even have to list the individual types.
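The same idea extends past two values. As a sketch, a hypothetical min_max_mean function (not from the article) can hand back three results in a std::tuple, again without spelling out the element types:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <tuple>
#include <vector>

// Hypothetical helper returning three results at once.
// Precondition: values is not empty.
auto min_max_mean(const std::vector<double>& values)
{
    assert(!values.empty());
    // minmax_element itself returns a pair, so it can be unpacked too.
    const auto [lo, hi] = std::minmax_element(values.begin(), values.end());
    const double sum = std::accumulate(values.begin(), values.end(), 0.0);
    // Class template argument deduction yields tuple<double,double,double>.
    return std::tuple{*lo, *hi, sum / static_cast<double>(values.size())};
}
```

A caller would then write something like auto [lo, hi, mean] = min_max_mean(samples); where samples is whatever vector is at hand.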

There is more than one way to decompose a complex value into two scalars: rectangular, as we have seen, and polar. Since there is more than one way to express it, it is reasonable that the user state the type of representation rather than having it be fully automatic. A user of the library may want one or the other, or worse yet, guess what the multiple returns are and get it wrong.

C++
auto [a,b] = z1;  // is this polar or rectangular?
auto [re,im] = cartesian(z1);  // clear what is wanted

Continuing on that theme, it makes sense to label the individual return values. Rather than a pair with meaningless .first and .second members, you can return a struct that has meaningful names, and a handy place to put comments on the definition of what is being returned.

C++
template <typename T>
struct pole_pair
{
    T magnitude;    // never negative
    T phase_angle;  // radians, interval [−π through π]
};

You might now think of writing a function that takes a complex value and returns a pole_pair, just like cartesian_3 did with a pair.

But, consider that the way to create an instance of type pole_pair is with its constructor. Why have a function that returns a value of this struct, when the same function-call syntax can just directly make one in-place?

C++
// added to the struct body
pole_pair (const std::complex<T>& z) : magnitude{abs(z)}, phase_angle{arg(z)} {}

Use it like so:

C++
auto [r,θ] = pole_pair(z1);
cout << r << ", " << θ << " (" << θ*180/3.14159 << " degrees)\n";

Note that this is guaranteed not to re-copy the pole_pair structure. Even with all optimizations disabled, you will see that a local variable is created, the members initialized by the code in the constructor, and the local names r and θ refer directly to the fields of the local struct. This is one of those zero-cost abstractions that C++ is known for.
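Assembled into one piece, the struct and its converting constructor might look like the sketch below, assuming the constructor carries the struct's own name. phase_degrees is a hypothetical helper added for illustration, and it computes π as std::acos(-1.0) rather than using a literal.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

template <typename T>
struct pole_pair
{
    T magnitude;    // never negative
    T phase_angle;  // radians, in [-pi, pi]

    // Constructing from a complex value performs the conversion.
    pole_pair(const std::complex<T>& z)
        : magnitude{std::abs(z)}, phase_angle{std::arg(z)} {}
};

// Hypothetical helper: the phase of z expressed in degrees.
double phase_degrees(const std::complex<double>& z)
{
    // T is deduced from z by C++17 class template argument deduction.
    auto [r, theta] = pole_pair(z);
    assert(r >= 0.0);
    return theta * 180.0 / std::acos(-1.0);
}
```

Adding the constructor does not interfere with the binding, because structured binding on a class only requires that all non-static data members be public.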

Built-in Binding Ability

In general, if you want to provide something that callers can use to perform structured binding, you can return a reference to a plain C array (a function cannot return an array by value), a plain ol’ data struct, a std::pair, or a std::tuple.
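The following sketch exercises each of the built-in forms in one place. The point struct and the function name are made up for the example, and std::array is included because it is tuple-like as well.

```cpp
#include <array>
#include <cassert>
#include <tuple>
#include <utility>

struct point { int x; int y; };  // hypothetical plain struct

int sum_of_all_forms()
{
    int raw[2] = {1, 2};
    auto [a0, a1] = raw;                         // plain array, copied element by element
    raw[0] = 100;                                // changing the original afterwards...
    assert(a0 == 1);                             // ...does not affect the copy

    auto [px, py] = point{3, 4};                 // public data members, in declaration order
    auto [first, second] = std::pair{5, 6};      // std::pair
    auto [t0, t1, t2] = std::tuple{7, 8, 9};     // std::tuple
    auto [s0, s1] = std::array<int, 2>{10, 11};  // std::array

    return a0 + a1 + px + py + first + second + t0 + t1 + t2 + s0 + s1;
}
```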

Full Customization

At the beginning, we showed that std::complex does not work with structured bindings because its data members are not public. It is easy to use a named function to get multiple values, but consider now how to make it work as-is, with nothing special at the caller’s site.

Here, we will discuss the magic to enable this.

Consider this test class. It has no real purpose, but just serves to illustrate the issues and test the solutions.

C++
class fred_t {
    string name;
    int age;
    double raw_score;
public:
    fred_t (string_view s, int age, double score) : name{s}, age{age}, raw_score{score} {}
    string get_name() const { return name; }
    int get_days() const { return 365*age; }
    double get_score() const
        {
        int a = std::min(age,50);
        return raw_score * 50/a;
        }
};

The accessible properties are not stored directly in the class, but are computed. So, it resembles the polar form of the complex numbers example, not the cartesian version. No public/private casting tricks will reveal the desired values directly.

Yet, I want to enable the use of structured binding without needing to wrap it in another call, like we did with pole_pair above.

C++
cout << "Fred 1  ";
fred_t fred { "Fred", 26, 80.7 };
auto [name,days,score] = fred;
cout << name << ", " << days << ", " << score << '\n';

How to do this can be determined from the specification in the standard (n4659 C++17) §11.5 ¶3.

I’ll start by adding a generic get function as a member of the class. This could be provided as a member or a non-member, whichever is easier. Making it a non-member, assuming the information is all available publicly by other means already, has the advantage of not modifying the class itself. However, a member implementation could take shortcuts and avoid copying sometimes, as I’ll go over later.

C++
class fred_t {
       ⋮
    template <size_t I>
    auto get() const {
        if constexpr (I==0) return get_name();
        else if constexpr (I==1)  return get_days();
        else if constexpr (I==2)  return get_score();
        else static_assert (I>=0 && I<3);
    }
       ⋮
};

Writing this in C++17 is much, much easier than it was previously! The if constexpr allows all the different indexes to be implemented in a single function, even though they return different types. The use of auto for the return type adjusts itself automatically and you don’t have to worry about complex metaprogramming.

There is another hurdle to getting this to work, though. Unfortunately, the customization point requires specializing templates in the std namespace.

You need to enable the structured binding support and tell it how many values there are, by specializing std::tuple_size.

Then, you need to specify the type of each item by specializing std::tuple_element<I,T> and giving it a member named type.

C++
namespace std {

template<>
struct tuple_size<fred_t> : public integral_constant<size_t, 3> {};

// already inside namespace std, so the name is not qualified again
template<size_t I>
struct tuple_element<I, fred_t> {
    using type = decltype (declval<fred_t>().get<I>());
};

}
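Putting the pieces together, a self-contained sketch of the whole customization could look like this. It follows the article's fred_t, with a few choices of its own: names are fully qualified instead of relying on a using-directive, the static_assert sits at the top of get, the specializations use struct, and an assert guards against a non-positive age.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <tuple>
#include <type_traits>
#include <utility>

class fred_t {
    std::string name;
    int age;
    double raw_score;
public:
    fred_t(std::string_view s, int age, double score)
        : name{s}, age{age}, raw_score{score} { assert(age > 0); }

    std::string get_name() const { return name; }
    int get_days() const { return 365 * age; }
    double get_score() const { return raw_score * 50 / std::min(age, 50); }

    // One templated accessor; each index may return a different type.
    template <std::size_t I>
    auto get() const {
        static_assert(I < 3, "fred_t has exactly three bindable values");
        if constexpr (I == 0) return get_name();
        else if constexpr (I == 1) return get_days();
        else return get_score();
    }
};

namespace std {

// How many values a structured binding of fred_t produces.
template <>
struct tuple_size<fred_t> : integral_constant<size_t, 3> {};

// The type of each value, taken from what get<I>() returns.
template <size_t I>
struct tuple_element<I, fred_t> {
    using type = decltype(declval<fred_t>().get<I>());
};

}
```

With these in place, auto [name,days,score] = fred; compiles with nothing special at the caller's site.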

Some Necessary Details

It is important to understand the lifetime of the object. Even with the custom-get form, the inner workings are still pretty much the same.

C++
auto [name,days,score] = fred;

will be transformed into the generated code:

C++
auto __e = fred;

First, the binding names list is replaced by a regular variable. The name is internal to the implementation but you can follow along if I call it something. Note that all the decorations around the auto (const, &) will still be there, and the initializer is still there. It only changes out what’s in the square brackets for a regular variable — all the other details and nuances about the declaration stay the same.

Then, it declares your binding variables:

C++
std::tuple_element<0,fred_t>::type&& name = __e.get<0>();
std::tuple_element<1,fred_t>::type&& days = __e.get<1>();
std::tuple_element<2,fred_t>::type&& score = __e.get<2>();

By understanding what code is generated, you can immediately understand the details of the lifetimes involved. In this case, the instance fred is copied. __e is created by calling the copy constructor.

Since that is not what was wanted here, you can address that by writing it as:

C++
const auto& [name,days,score] = fred;

Now, the generated code declares...

C++
const auto& __e = fred;

...which simply aliases the original and does not duplicate it.
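One way to observe the difference is to count copy-constructor calls. The sketch below uses a hypothetical struct named tracked with public members (so the built-in binding applies); the same reasoning about the hidden variable holds for the custom-get form.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type that counts how often it is copied.
struct tracked {
    std::string label;
    int value;

    static inline int copies = 0;

    tracked(std::string l, int v) : label{std::move(l)}, value{v} {}
    tracked(const tracked& other)
        : label{other.label}, value{other.value} { ++copies; }
};

// Number of copies made when binding by value.
int copies_made_by_value(const tracked& t)
{
    const int before = tracked::copies;
    auto [label, value] = t;         // hidden variable is a copy of t
    assert(label == t.label && value == t.value);
    return tracked::copies - before;
}

// Number of copies made when binding by const reference.
int copies_made_by_reference(const tracked& t)
{
    const int before = tracked::copies;
    const auto& [label, value] = t;  // hidden variable is a reference to t
    assert(&label == &t.label && &value == &t.value);
    return tracked::copies - before;
}
```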

In both cases, however, the variables name, days, and score are references. Since all the get functions return values, you will wind up with a reference bound to the temporary, whose lifetime is extended to that of the reference.

For primitive types like int and double, the optimizer can take great liberties and get rid of all the complexity.

But name is a std::string. get<0>() constructs a string for the return value. Is that making extra copies? Again, the compiler just saves the return value anonymously and makes name an alias to it, so there is no extra copy when compared with writing something like:

C++
auto name { fred.get_name() };

What’s interesting is that the get functions could be written to return references to internal data. You normally don’t want to do this with your accessors, but for the normal automatic structured bindings that is exactly what happens to the data members. The hidden variable __e has the same lifetime as the named variables, and it is keeping the object alive.

So let’s update get<0> to return a reference to the internal field instead of a copy like get_name.

C++
template <size_t I>
decltype(auto) get() const {
    if constexpr (I==0) return static_cast<const string&>(name);
    ⋮
}

Note that the return type is now decltype(auto) rather than just auto, so that it preserves the value category of the return statement. That is, it keeps the &.

Now the generated code:

C++
std::tuple_element<0,fred_t>::type& name = __e.get<0>();

ends up binding the alias name to the __e.name data member directly.
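To check that claim, the sketch below uses a hypothetical two-field class, player_t, cut down from fred_t for brevity. The name_address member exists only so the example can compare addresses; it is not something the binding protocol needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

class player_t {
    std::string name;
    int age;
public:
    player_t(std::string n, int a) : name{std::move(n)}, age{a} {}

    template <std::size_t I>
    decltype(auto) get() const {
        static_assert(I < 2);
        // The cast makes the returned expression an lvalue of const string,
        // so decltype(auto) deduces const std::string&.
        if constexpr (I == 0) return static_cast<const std::string&>(name);
        else return 365 * age;  // computed, so returned by value
    }

    // Only here so the example can see what the binding refers to.
    const std::string* name_address() const { return &name; }
};

namespace std {
template <> struct tuple_size<player_t> : integral_constant<size_t, 2> {};
template <size_t I> struct tuple_element<I, player_t> {
    using type = decltype(declval<player_t>().get<I>());
};
}

static_assert(std::is_same_v<std::tuple_element_t<0, player_t>, const std::string&>);
static_assert(std::is_same_v<std::tuple_element_t<1, player_t>, int>);

// True when the bound name is the object's own data member.
bool binding_aliases_member(const player_t& p)
{
    const auto& [name, days] = p;
    assert(days == p.get<1>());
    return &name == p.name_address();
}
```

Here the binding is taken by const auto&, so the hidden variable refers to p itself and name ends up naming p's own member.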

As a reminder, look at what happens when you perform structured binding on a temporary value:

C++
auto [name,days,score] = get_player_stats();

The generated variable __e will hold the return value from the function call. If you wrote const auto& instead, it would still be OK because the temporary’s lifetime is extended to that of the variable, as usual.
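The lifetime extension can be observed with a type that counts its live instances. In this sketch the stats struct and the body of get_player_stats are invented for the example; only the function name comes from the snippet above.

```cpp
#include <cassert>

// Hypothetical type that counts how many instances are currently alive.
struct stats {
    int wins;
    int losses;

    static inline int alive = 0;

    stats(int w, int l) : wins{w}, losses{l} { ++alive; }
    stats(const stats& other) : wins{other.wins}, losses{other.losses} { ++alive; }
    ~stats() { --alive; }
};

stats get_player_stats() { return stats{3, 1}; }

// Returns the number of live stats objects while the bindings are in scope.
int alive_while_bound()
{
    // The hidden variable is a const reference bound to the temporary,
    // which therefore lives until the end of this function.
    const auto& [wins, losses] = get_player_stats();
    assert(wins == 3 && losses == 1);
    return stats::alive;
}
```

The count is one while the bindings are in scope and drops back to zero once the function returns.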

Conclusion

Structured binding can be used to return whatever you want. It is simple to just return a plain struct or a tuple with the expectation that it will be used that way, but the custom get interface offers even more possibilities including avoiding extra copies of accessor function results.

The key to making it do what you want is understanding the code it generates. Look at it that way, and you already know all the rules about lifetimes and tradeoffs for efficiency.

The Example Code

All the code snippets are from a full test file which you can find on GitHub.

Addendum

You can use θ (theta) as a variable name?! Certainly! It is a letter, after all. People in Greece can spell identifiers and write comments using regular words, and not be forced to use Roman symbols.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

