Monday, January 16, 2012

DotW #1

Dunce of the Week #1
If you unfamilar with "Dunce of the Week", please read my post explaining what DotW is and why I chose "dunce".

Please forgive me for the formatting of the code.  I'm still trying to figure out how to get code to be formatted correctly on Blogger.

I want to thank reddit user caballist for his feedback on this item and for teaching me a couple things.  I have incorporated much of his feedback when I updated this post.
Unsafe Usage of Iterators

Q1) The following function (modified from its original version) was flagged by CppCheck as using iterators in an unsafe manner. Can you identify what the problem with the code is?
void ErrorMgr::Clear()
{ 
    /* Clear only the clearable errors and alarms */
    std::set<AppError>::iterator it;
    {
        mutexLock autoLock(m_mutexActiveAlarms);
        for (it = m_setOfActiveAlarms.begin(); it != m_setOfActiveAlarms.end(); it++)
        {
            if (it->GetClearable())
            {
                m_setOfActiveAlarms.erase(it);
            }
        }
    }
    {
        mutexLock autoLock(m_mutexActiveErrors);
        for (it = m_setOfActiveErrors.begin(); it != m_setOfActiveErrors.end(); it++)
        {
            if (it->GetClearable())
            {
                m_setOfActiveErrors.erase(it);
            }
        }
    }
}
The important part of the definition for AppError is as follows:
        class AppError


{
    // ...
public:
    bool IsClearable() const;
    // ...
};

Q2) How can the code be fixed to use iterators in a safe manner.

Q3) Let us assume that the there is a need other places in our product to remove elements from sets based on the value stored in the sets. What can we do to help us from repeating the same mistake while (hopefully) at the same time making the code easier to read?



Answers

A1) The call to erase removes the iterator from the set. The iterator no longer points to any valid item – especially not one in the set.   (The iterator has been “invalidated” to use the term of the C++ standards committee.)   To then increment the iterator (via the last term in the for loop statement) will result in undefined behavior because the iterator is not valid.   The iterator may accidentally point to the “next” element in the set, but it does not have to.   (I suppose it did this with the STL library we were using with our product or the code never would have passed our tests.)

A2) There are two simple ways to fixing the problem (each block has a different solution):





void ErrorMgr::Clear()
{
    // Clear only the clearable errors and alarms
    std::set<AppError>::iterator it, itEnd;
    // std::erase returns the next valid iterator in the set

    {   
        mutexLock autoLock(m_mutexActiveAlarms);
        it = m_setOfActiveAlarms.begin();
        itEnd = m_setOfActiveAlarms.end();
        while (it != itEnd)
        {
            if (it->IsClearable())
            {
                it = m_setOfActiveAlarms.erase(it);
            }
            else
            {
                ++it;
            }


}
    }
    /* Use a temporary iterator (testIt) to store the old value of the loop iterator.
       Increment the loop iterator, then perform erase operations on the temporary
       iterator if necessary. */
    {
        mutexLock autoLock(m_mutexActiveErrors);
        std::set<AppError>::iterator testIt;


it = m_setOfActiveErrors.begin();
itEnd = m_setOfActiveErrors.end();
while (it != itEnd)
        {
            testIt = it;
            ++it;
            if (testIt->IsClearable())
            {
                m_setOfActiveErrors.erase(testIt);
            }
        }
    }
}

It is my opinion that the second strategy is preferable because:
  • it avoids the post-increment operator which I am trying to discourage our team from using (especially in iterators where it can lead to a performance hit).
  • there is no else clause in the second strategy which in my opinion makes the logic of the algorithm easier to follow.

A3) Well if we’re going to have the same problem elsewhere in the product, we should aim to remove duplicate code. We want to pull out the algorithm and put it somewhere where it can be used again. We don’t know what comparator member function we may use elsewhere so we will have to pass that into the algorithm somehow.
I am going to follow the spirit of the standard algorithms (found in <algorithm>) as much as I can for my solution and name this algorithm set_erase_if which will require that I pass in a reference to the set and a pointer to a set element member function. The resulting code will read:



void ErrorMgr::Clear()
{
    // Clear only the clearable errors and alarms

    {
        mutexLock autoLock(m_mutexActiveAlarms);
        set_erase_if(m_setOfActiveAlarms, &AppError::IsClearable);
    }
    {
        mutexLock autoLock(m_mutexActiveErrors);
        set_erase_if(m_setOfActiveErrors, &AppError::IsClearable);
    }


}

The problem now is to write the code for this algorithm cleanly so that it can be reused. Well, let us start off by factoring out the common code. We can look later at how we can make it more generic. The result of simply factoring out the common code is:
void set_erase_if(std::set<AppError>& theSet, bool (AppError::*compareMemFn)() const)
{
    std::set<AppError>::iterator it = theSet.begin();
    std::set<AppError>::iterator itEnd = theSet.end();
    std::set<AppError>::iterator itTest;
    while (it != itEnd)
    {
        itTest = it;
        ++it;
        if (((*itTest).*(compareMemFn))())
        {
            theSet.erase(itTest);
        }
    }
}
This code compiles and testing (you are writing tests for you code, right?) shows that it does what we expect it to do. But in its current state it only works for std::set<AppError>. We can fix this by templatizing the key value of the set. This leads us to the following code:
template<class Key>
void set_erase_if(std::set<Key>& theSet, bool (Key::*compareMemFn)() const)
{
    typename std::set<Key>::iterator it = theSet.begin();
    typename std::set<Key>::iterator itEnd = theSet.end();
    typename std::set<Key>::iterator itTest;

    while (it != itEnd)
    {
        itTest = it;
        ++it;
        if (((*itTest).*(compareMemFn))())
        {
            theSet.erase(itTest);
        }
    }
}

We had to add ‘typename’ to the beginning of the declarations for the iterator because ‘iterator’ is now an inner type of a class used as a template parameter. Without the keyword, gcc will fail to properly compile the code. 
We test and find out that everything still works. J We could stop here. But since we’re creating a new algorithm, we should ask ourselves to see if we can make it more generic. Here are some questions that pop to my mind:
  • The algorithm runs over the entire set. What if I wanted to only cover a range?
  • Why are we using a member function of type bool (Key::*memberFn)() const? Does the member function have to be ‘const’? How can we make it more generic?
  • We limit ourselves to std::set.  Is it possible to make this code generic enough to work with other associative containers?
I will tackle these one at a time.  First off, let’s figure out what we’d need to cover any range for a set. This new function can be a function overload – keeping the same name as the already defined function and offering new functionality while at the same time still allowing users to use the previously defined function.
The STL library (which I am trying to mimic) uses start and end iterators to define ranges. Most algorithms are generic enough that you only have to pass in the iterators, not a reference to the container on which you are operating.   Unfortunately for us, ‘erase’ is a member function of the set, not the iterator so we have to pass in a reference to set. With this is mind, we can construct the following function:
template<class Key>
void set_erase_if(std::set<Key>&                          theSet,
                  const typename std::set<Key>::iterator& rangeBegin,
                  const typename std::set<Key>::iterator& rangeEnd,
                  bool (Key::*                            compareMemFn)() const)
{
    typename std::set<Key>::iterator itTest;
    typename std::set<Key>::iterator it = rangeBegin;

    while (it != rangeEnd)
    {
        itTest = it;
        ++it;        if (((*itTest).*(compareMemFn))())
        {
            theSet.erase(itTest);
        }
    }
}

This compiles and tests we write for this pass. But now we realize we have duplicate code again and that the first function we wrote (the one without the range iterators) can be re-written to use the above function.
template<class Key>
void set_erase_if(std::set<Key>& theSet, bool (Key::*compareMemFn)() const)
{
    set_erase_if(theSet, theSet.begin(), theSet.end(), compareMemFn);
}

Nice and succinct. Now our logic is in one place. And we can remove elements by using either one of the following lines: 
set_erase_if(m_setOfActiveErrors, &AppError::IsClearable);

set_erase_if(m_setOfActiveErrors, m_setOfActiveErrors.begin(), m_setOfActiveErrors.end(),
             &AppError::IsClearable);

So on to the next improvement – making the algorithm work so it doesn’t have to rely on such a specific member function signature. A common way to make algorithms like this more generic is to pass in a functor instead of a specific member function. A common method for binding member functions (which can have different signatures) to functors is to use std::mem_fun defined in <functional>. The resulting functor’s call operator takes as its first argument a pointer to the class type. We need to know this because it will affect how we write our test condition.   After some tinkering around we come up with a solution:

template<class Key, class RemoveComparisonFunctor>
void set_erase_if(std::set<Key>&                          theSet,
                  const typename std::set<Key>::iterator& rangeBegin,
                  const typename std::set<Key>::iterator& rangeEnd,
                  const RemoveComparisonFunctor&          compare)
{
    typename std::set<Key>::iterator itTest;
    typename std::set<Key>::iterator it = rangeBegin;
  
    while (it != rangeEnd)
    {
        itTest = it;
        ++it;
        if (compare(&*itTest))
        {
            theSet.erase(itTest);
        }
    }
}

template<class Key, class RemoveComparisonFunctor>
void set_erase_if(std::set<Key >& theSet, const RemoveComparisonFunctor& compare)
{
    set_erase_if(theSet, theSet.begin(), theSet.end(), compare);
}

We can now remove elements by using one of the following statements:
set_erase_if(m_setOfActiveErrors, &AppError::IsClearable);

set_erase_if(m_setOfActiveErrors, m_setOfActiveErrors.begin(), m_setOfActiveErrors.end(),
             &AppError::IsClearable);

set_erase_if(m_setOfActiveErrors, mem_fun(&AppError::IsClearable));

set_erase_if(m_setOfActiveErrors, m_setOfActiveErrors.begin(), m_setOfActiveErrors.end(),
             mem_fun(&AppError::IsClearable));

// Let’s pretend there’s a free function whose prototype is:
//   bool isClearable(KasError* error);
set_erase_if(m_setOfActiveErrors, isClearable);

set_erase_if(m_setOfActiveErrors, m_setOfActiveErrors.begin(), m_setOfActiveErrors.end(),
             isClearable);

Now can we get rid of anything? The answer is yes. The non-range based functions are extremely similar. It turns out that the newer one (with the comparison functor) will forward correctly to the member function range-based overload. We could also remove the member function range-based overload too since mem_fun will do its work for us.   And I am currently debating whether or not to keep it.  Ideally, I would remove it (since we can use mem_fun), but many people on the development team on which I work do not use <algorithm> at all.  The additional call to mem_fun may tip them toward avoiding the use of algorithms.  For now I will keep it the member function ranged-based overload, but I will be discussing the issue with the members of the team to see how they want to proceed.  Afterall, it is important to work with others and developing a technically correct solution will not be helpful if you can't get the people you work with to use it.  I will update this section once the team comes up with a consensus.

OK. Now on to the last point. We want to make the functions handle different associative containers.   It is pretty obvious that we can make the code generic enough to work with std::multiset, std::unordered_set (for those who have access to that C++11 container), or other set-like containers.  The other major associative continer types are the map-like containers (std::map, std::multimap, std::unordered_map, etc...).  The map-like containers present us with some pretty large complications:
  • Which iterator member should we use -- key or value (first or second)?
  • How does the user easily specify which one they want to use?
This would be much easier to deal with if we had access to C++ lambdas, but we don't have access to those compilers yet.  For this reason, I will restrict the algorithm to work with set-like containers.  It has been suggested to remove "set" from the name of the algorithm if we are going to make it work with different types of containers.  However, we may want to develop erase_if algorithms in the future for map-like containers which will have a different interface.  For this reason, I am going to keep "set" in the name to denote that this particular algorithm works with set-like containers.  I (and the team on which I work) may re-evaluate this when we finally have access to compilers that support C++ lamdas. 
Anyway, back to generalizing the code to work with different set-like containers.  We can accomplish that by substituting std::set<Key> with a different template class.  The resulting code is:
template<class SetLikeContainer>
void set_erase_if(SetLikeContainer&                          theSet,
                  const typename SetLikeContainer::iterator& rangeBegin,
                  const typename SetLikeContainer::iterator& rangeEnd,
                  bool (SetLikeContainer::key_type::*        compareMemFn)() const)
{
    typename SetLikeContainer::iterator itTest;
    typename SetLikeContainer::iterator it = rangeBegin;
    while (it != rangeEnd)
    {
        itTest = it;
        ++it;
        if (((*itTest).*(compareMemFn))())
        {
            theSet.erase(itTest);
        }
    }
}
template<class SetLikeContainer, class RemoveComparisonFunctor>
void set_erase_if(SetLikeContainer&                          theSet,
                  const typename SetLikeContainer::iterator& rangeBegin,
                  const typename SetLikeContainer::iterator& rangeEnd,
                  const RemoveComparisonFunctor&             compare)
{
    typename SetLikeContainer::iterator itTest;
    typename SetLikeContainer::iterator it = rangeBegin;

    while (it != rangeEnd)
    {
        itTest = it;
        ++it;
        if (compare(&*itTest))
        {
            theSet.erase(itTest);
        }
    }
}
template<class SetLikeContainer, class RemoveComparisonFunctor>
void set_erase_if(SetLikeContainertheSet, const RemoveComparisonFunctor& compare)
{
    set_erase_if(theSet, theSet.begin(), theSet.end(), compare);
}

That is about as generic as I can make this algorithm. The code is pretty short and clear too. And the resulting code for the function we were originally analyzing is now more clear. (Go ahead and compare it again to the original at the top of the page.  And yes, this identical to what we wrote at the beginning of A3; I just put it here to make comparing easier.)




void ErrorMgr::Clear()
{
    // Clear only the clearable errors and alarms

    {
        mutexLock autoLock(m_mutexActiveAlarms);
        set_erase_if(m_setOfActiveAlarms, &AppError::IsClearable);
    }
    {
        mutexLock autoLock(m_mutexActiveErrors);
        set_erase_if(m_setOfActiveErrors, &AppError::IsClearable);
    }

}
Thank you for taking the time to finish this rather long first installment of DotW.

3 comments:

  1. std::set::erase doesn't return the next iterator because its iterators do not get invalidated from modifying the std::set.

    http://en.cppreference.com/w/cpp/container/set/erase

    ReplyDelete
    Replies
    1. How interesting. The Dinkumware and MS STL (which is licensed from Dinkumware) has std::set::erase returning an iterator.

      http://www.dinkumware.com/manuals/?manual=compleat&page=set.html#set::erase

      http://msdn.microsoft.com/en-us/library/8h4a3515(v=vs.100).aspx

      That's interesting that they deviated from the standard.

      Delete
  2. Nice read. We have a copy of Scott Meyers' Effective STL in our shop that is frequently used for "Uh, how do I get rid of all those elements in that std::map without blowing everything up again?".

    ReplyDelete