Approximate Nearest Neighbors (NN) Libraries in high-dimensions | FLANN vs CGAL vs BOOST


In this article, I will share with you my experience with FLANN, CGAL and BOOST libraries in (approximate) Nearest Neighbor (NN) Searching.

First thing to do is installing the library. BOOST was the easiest since it didn’t need compilation for our purposes. FLANN was easy too, with specific installation steps found in the manual. CGAL, was the hardest, but remember CGAL is huge (it covers many things in computational geometry).

For installing the first two, just reffer to their website. For CGAL do that too, but this installation for CGAL in Linux may be helpful too (it has steps for installing Linux too).

Once you install the library, comes the documentation and how clean the interface is. FLANN was the best, since it is small, robust and even if something isn’t clear in the documentation, you can easily find it in the code files. One disadvantage, is that the dataset must (at least from my perspective) be stored in a 2D array, but I use C++ for this purpose, so an std::vector would be nice.

CGAL, has good documentation, but you need to get familiarize with its concepts. Once you do that, you will be ok, but takes more effort than in FLANN.

BOOST was a failure. Remember, that we are exploring these libraries for high dimensions. It provides interface only for 2 and 3 dimensions. Moreover, you can not have your code accept any dimension the use wants (!), but you must limit the user to choose only some pre-defined (!) dimensions.

Just to define a point in d dimensions, I would need to write a recursive function, style that reminds me of Prolog. Moreover, the point had all the values initialized to the same value, not so realistic scenario. Just take a look at the code:

template <std::size_t D, std::size_t N>
struct fill
    template <typename Point>
    static void apply(Point& p, typename bg::coordinate_type<Point>::type const& v)
        bg::set<D>(p, v);
        fill<D + 1, N>::apply(p, v);
template <std::size_t N>
struct fill<N, N>
    template <typename Point>
    static void apply(Point&, typename bg::coordinate_type<Point>::type const&) {}

We should also compile it with a special option. Just to use 100 and 10000 dimensions, I had to write this: “c++ -I ../ bla.cpp -std=c++0x -ftemplate-depth-170001 -o bla”. That’s the reason you can’t have the user choose its dimension (because the depth of the compilation is limited). Further effort made me say: “Use BOOST only in 2D or 3D, not in higher dimensions.”

However, the main criterion is performance. It was clear, that FLANN outperformed CGAL in higher dimensions, since in experiments performed in 17D, CGAL already started loosing its efficieny over linear search. I tested FLANN on 100 and 10000 dimensions and the speedup was significant (but you find – usually – an approximation of the true NN).

CONCLUSION: FLANN wins CGAL and BOOST in high dimensions.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s