Bag-of-Words code from my PhD research is now online

License - Download - Documentation - Associated publications

This program allows my Bag-of-Words scheme to be used from any other programming language (e.g. Python, C#, Java) via simple file-based I/O. C++ source code for the Bag-of-Words library is available here.


License: my code is free for any use. It includes code based on Edward Rosten's FAST code, which has a BSD license. Please email me and let me know if you find it useful, find any bugs, or can't get something working.


Download the Windows executable and required libraries (OpenCV). Unzip all files to the same folder and run the executable. Configure by editing the included config file.

I should be able to build for Linux if that would be useful to anyone.


1) Run the executable. It will create a folder ./IO

2) Copy some images to the IO folder. Filenames should be of the form 123.jpg or 2345.bmp, where the number will be the image's id in the database. Don't repeat ids. These images will be deleted once they have been added.

3) Create a blank file called 'recluster' (no quotes) and copy it to the IO dir. This will [re]create the bag-of-words dictionary.

4) To query the db, create a file called '123' (or whatever the id is) and put it in the IO dir. A file called 'matches' will appear (in the main folder). This contains a list of id,match_strength pairs. All images in the db are returned. The first few are the ones appearing most similar. Match strength is the term frequency-inverse document frequency (TF-IDF) score (a bit arbitrary).

5) To get feature matches (correspondences) between a pair of images (say 23 and 24) save a file called 'bb23,24' in the IO dir. A file called 'correspondences' will appear in the main folder. I will upgrade this to discard outliers at some stage...

Repeat steps 2, 3 and 4 as needed. 'quit' will quit.
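
The steps above can be scripted from another language. Here is a minimal Python sketch of how that might look. It is not part of the released code, and it rests on a few assumptions: that command files such as 'recluster' and '123' can be empty, that 'matches' holds one id,match_strength pair per line, and that polling for the output file is good enough. The helper names (add_images, send_command, wait_for_file, parse_matches, query) are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def add_images(io_dir, image_paths):
    """Copy images named <id>.jpg or <id>.bmp into the IO folder (step 2)."""
    io_dir = Path(io_dir)
    for src in map(Path, image_paths):
        # The numeric part of the filename becomes the image id in the database.
        if not src.stem.isdigit():
            raise ValueError(f"filename must be a numeric id: {src.name}")
        shutil.copy(src, io_dir / src.name)


def send_command(io_dir, name):
    """Create an empty command file, e.g. 'recluster', '123' or 'bb23,24'."""
    (Path(io_dir) / name).touch()


def wait_for_file(path, timeout=60.0, poll=0.1):
    """Poll until the file exists, then return its text.

    Assumption: the file is complete once it is visible. If the program
    writes it slowly, a short extra delay or a retry may be needed.
    """
    path = Path(path)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            return path.read_text()
        time.sleep(poll)
    raise TimeoutError(f"{path} did not appear within {timeout} s")


def parse_matches(text):
    """Parse id,match_strength pairs (assumed to be one pair per line)."""
    matches = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        image_id, strength = line.split(",")
        matches.append((int(image_id), float(strength)))
    return matches


def query(main_dir, image_id, timeout=60.0):
    """Query the db for one image id and return the parsed matches (step 4)."""
    main_dir = Path(main_dir)
    matches_file = main_dir / "matches"
    matches_file.unlink(missing_ok=True)  # avoid reading a stale result
    send_command(main_dir / "IO", str(image_id))
    return parse_matches(wait_for_file(matches_file, timeout))


# Example usage, with the executable already running in main_dir:
#   add_images("main_dir/IO", ["123.jpg", "124.jpg"])
#   send_command("main_dir/IO", "recluster")
#   results = query("main_dir", 123)
```

The same pattern should carry over to the 'correspondences' file from step 5, but its layout is not described here, so the sketch does not try to parse it.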


The config file 'BoWSLAM/DefaultPatch.cfg' will be used by default. Otherwise, pass in the filename of a config file as a command line argument.
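
If the program is started from a script, the config file can be passed the same way. A rough Python sketch, assuming the executable takes the config filename as its only argument and should run from its own folder; the executable path is supplied by the caller, and build_command and launch are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def build_command(exe_path, config_path=None):
    """Return the command line: the executable plus an optional config file."""
    cmd = [str(exe_path)]
    if config_path is not None:
        # With no argument, the program falls back to its default config.
        cmd.append(str(config_path))
    return cmd


def launch(exe_path, config_path=None):
    """Start the program in its own folder (assumed) and return the process."""
    exe_path = Path(exe_path).resolve()
    return subprocess.Popen(build_command(exe_path, config_path),
                            cwd=exe_path.parent)
```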

Config documentation will be coming soon, check out

