
Build/Compile OpenCV v3.3 on Windows with CUDA 8.0, Intel MKL+TBB and python bindings

 

 
Because the pre-built Windows libraries available for OpenCV v3.3 do not include the CUDA modules, I have included the build instructions, which are almost identical to those for OpenCV v3.2, below for anyone who is interested. If you just need the Windows libraries then see Download OpenCV 3.3 with Cuda 8.0.

The guide below details instructions on compiling the 64-bit version of the OpenCV v3.3 shared libraries with Visual Studio 2013 (it will also work with Visual Studio 2015 if selected in CMake), CUDA 8.0, support for both the Intel Math Kernel Library (MKL) and Intel Threading Building Blocks (TBB), and bindings to allow you to call OpenCV functions from within Python.

Before continuing there are a few things to be aware of:

  1. The procedure outlined only works for Visual Studio 2013 and 2015 and will not work for Visual Studio 2017 because this is not supported by the CUDA 8.0 Toolkit.
  2. You cannot call the CUDA modules from within python. The python bindings only allow you to call the standard OpenCV routines.
  3. If you have built OpenCV with CUDA support, then to use those libraries and/or redistribute applications built with them on any machine without the CUDA toolkit installed, you will need to redistribute the following DLLs from your
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin 

    directory to those machines:

    • cudart64_80.dll
    • nppc64_80.dll
    • nppi64_80.dll
    • npps64_80.dll
    • cublas64_80.dll
    • cufft64_80.dll
  4. The latest version of Intel TBB uses a shared library, therefore if you build with Intel TBB you need to add
    C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\redist\intel64_win\tbb\vc_mt 

    to your path variable, and make sure you redistribute that dll with any of your applications.
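
The redistribution step in item 3 can be scripted. The following is a minimal Python sketch, assuming the default CUDA 8.0 install path quoted above; the function name collect_cuda_dlls and the destination folder are hypothetical names chosen for illustration, not part of any toolkit.

```python
import shutil
from pathlib import Path

# DLL names taken from the list above (CUDA 8.0, 64 bit).
CUDA_DLLS = [
    "cudart64_80.dll",
    "nppc64_80.dll",
    "nppi64_80.dll",
    "npps64_80.dll",
    "cublas64_80.dll",
    "cufft64_80.dll",
]

# Default toolkit location quoted in this guide (an assumption; adjust if yours differs).
DEFAULT_CUDA_BIN = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin")


def collect_cuda_dlls(cuda_bin, dest):
    """Copy the CUDA DLLs into dest and return the names that were not found."""
    cuda_bin, dest = Path(cuda_bin), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in CUDA_DLLS:
        src = cuda_bin / name
        if src.is_file():
            shutil.copy2(src, dest / name)
        else:
            missing.append(name)
    return missing
```

Calling collect_cuda_dlls(DEFAULT_CUDA_BIN, "redist") would copy whatever it finds into a redist folder next to your application and return the names of any DLLs it could not locate.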

 

Prerequisites

Assuming you already have a compatible version of Visual Studio (2013 or 2015) installed, there are a couple of additional components you need to download before you can get started. You first need to:

  • Download the source files, available on GitHub. Either clone the git repo, making sure to check out the 3.3.0 tag, or download this archive containing all the source files.
  • Install CMake – Version 3.9.5 is used in the guide.
  • Install The CUDA 8.0 Toolkit (v8.0.61) and Patch2.
  • Optional – Install both the Intel MKL and TBB by registering for community licensing and downloading them for free. MKL version 2018.0.124 and TBB version 2018.0.124 are used in this guide; I cannot guarantee that other versions will work correctly.
  • Optional – Install the 64-bit version of Anaconda2 and/or Anaconda3 to use OpenCV with Python 2 and/or Python 3, making sure to tick “Register Anaconda as my default Python ..”
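
Before moving on, it can help to confirm that the prerequisites are where the rest of this guide expects them. The sketch below is a hedged example: the paths are the default install locations used later in this guide and may differ on your machine, and missing_prerequisites is a hypothetical helper name.

```python
from pathlib import Path

# Default install locations used in this guide (assumptions; yours may differ).
EXPECTED = {
    "CMake": r"C:\Program Files\CMake\bin\cmake.exe",
    "CUDA 8.0 Toolkit": r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin",
    "Intel TBB (optional)": r"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\bin\tbbvars.bat",
    "Intel MKL (optional)": r"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl",
}


def missing_prerequisites(expected=EXPECTED):
    """Return the names of components whose expected path does not exist."""
    return [name for name, path in expected.items() if not Path(path).exists()]
```

An empty list from missing_prerequisites() only means the files exist at those paths; it does not check versions.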
     

     

 

Generating OpenCV Visual Studio solution files with CMake

In this section we are going to generate the Visual Studio solution files with CMake. There are two ways to do this: from the command prompt or with the CMake GUI. Generating solution files from the command prompt is both quicker and easier; however, using the GUI lets you more easily see and change the available configuration options. My advice would be to use the command prompt if you just want to compile OpenCV with CUDA, and the GUI if you want to add extra configuration options to your build. Once you have decided, proceed with the guide that applies to you:

Building OpenCV 3.3 with CUDA 8.0 from the command prompt (cmd)
  1. Open up the command prompt (windows key + r, then type cmd and press enter) and enter
    "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\bin\tbbvars.bat" intel64

    to temporarily set the environment variables for locating your TBB installation.

  2. Then choose your configuration from below and copy it to the command prompt, where PATH_TO_BUILD_DIR is the location where you wish to build OpenCV and PATH_TO_SOURCE_DIR is the location of the OpenCV source files. To build with Visual Studio 2015 instead of 2013, replace -G"Visual Studio 12 2013 Win64" with -G"Visual Studio 14 2015 Win64":
    • OpenCV 3.3 with CUDA 8.0
      "C:\Program Files\CMake\bin\cmake.exe" -B"PATH_TO_BUILD_DIR" -H"PATH_TO_SOURCE_DIR" -G"Visual Studio 12 2013 Win64" -DBUILD_opencv_world=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON 
    • OpenCV 3.3 with CUDA 8.0 and MKL multi-threaded with TBB
      "C:\Program Files\CMake\bin\cmake.exe" -B"PATH_TO_BUILD_DIR" -H"PATH_TO_SOURCE_DIR" -G"Visual Studio 12 2013 Win64" -DBUILD_opencv_world=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON -DWITH_MKL=ON -DMKL_USE_MULTITHREAD=ON -DMKL_WITH_TBB=ON
    • OpenCV 3.3 with CUDA 8.0, MKL multi-threaded with TBB, and OpenCV itself built with TBB
      "C:\Program Files\CMake\bin\cmake.exe" -B"PATH_TO_BUILD_DIR" -H"PATH_TO_SOURCE_DIR" -G"Visual Studio 12 2013 Win64" -DBUILD_opencv_world=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON -DWITH_MKL=ON -DMKL_USE_MULTITHREAD=ON -DMKL_WITH_TBB=ON -DWITH_TBB=ON
  3. Your solution file should now be in your PATH_TO_BUILD_DIR directory. Open it in Visual Studio and select your configuration.

    Note: If you are building with python bindings then you will need to build in Release mode unless you have the python debug libraries.

  4. Click Solution Explorer, expand CMakeTargets, right click on INSTALL and click Build.
     

    This will both build the library and copy the necessary redistributable parts to the install directory, PATH_TO_BUILD_DIR/install in this example. Additionally, if you build the Python bindings, then the cv2.pyd and/or cv2.cp36-win_amd64.pyd shared libs will have been copied to your Anaconda2[3]\Lib\site-packages\ directory. All that is then required is to add the directory containing opencv_world330.dll (and tbb.dll if you have built with Intel TBB) to your path environment variable.

    If everything was successful, congratulations, you now have OpenCV v3.3 built with CUDA 8.0.
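
If you rebuild often, the three command lines in step 2 can be generated rather than copied by hand. The following is a sketch only: cmake_args is a hypothetical helper, the CMake path is the default location used above, and the optional cuda_arch_bin argument simply passes OpenCV's CUDA_ARCH_BIN option through for anyone who wants to target a single compute capability.

```python
# Generator names for the two Visual Studio versions supported by CUDA 8.0.
GENERATORS = {
    2013: "Visual Studio 12 2013 Win64",
    2015: "Visual Studio 14 2015 Win64",
}


def cmake_args(source_dir, build_dir, vs_year=2013, mkl=False, tbb=False,
               cuda_arch_bin=None):
    """Return the cmake argument list for one of the configurations above."""
    args = [
        r"C:\Program Files\CMake\bin\cmake.exe",  # default install path (assumption)
        "-B" + build_dir,
        "-H" + source_dir,
        "-G" + GENERATORS[vs_year],
        "-DBUILD_opencv_world=ON",
        "-DCUDA_FAST_MATH=ON",
        "-DWITH_CUBLAS=ON",
    ]
    if mkl:
        # MKL multi-threaded with TBB.
        args += ["-DWITH_MKL=ON", "-DMKL_USE_MULTITHREAD=ON", "-DMKL_WITH_TBB=ON"]
    if tbb:
        # OpenCV's own parallel code paths built with TBB.
        args.append("-DWITH_TBB=ON")
    if cuda_arch_bin:
        # e.g. "6.1" to build for a single compute capability.
        args.append("-DCUDA_ARCH_BIN=" + cuda_arch_bin)
    return args
```

The returned list can be passed to subprocess.run(..., check=True) from a command prompt in which tbbvars.bat has already been run, so that CMake inherits the TBB environment variables.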

Building OpenCV 3.3 with CUDA 8.0 with the CMake GUI
    1. Fire up CMake. If you want OpenCV to use TBB, then open up the command prompt (windows key + r, then type cmd and press enter) and enter
      "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\bin\tbbvars.bat" intel64

      to temporarily set the environment variables for locating your TBB installation, and

      "C:\Program Files\CMake\bin\cmake-gui"

      to launch CMake using those variables. Otherwise you can just start CMake normally.

    2. Making sure that the Grouped checkbox is ticked, select the location of the source files, downloaded from GitHub, and the location where the build will take place, E:/opencv/ and E:/build/opencv/vs2013/x64/cuda_mkl/ in this example.
       

       
    3. Skip if you are not building with MKL. We want MKL to use TBB but unfortunately the CMake script does not correctly locate the Intel MKL and TBB libraries when using the GUI. The following is an inelegant hack of the script to get MKL to use TBB.

      Open up OPENCV_SOURCE/cmake/OpenCVFindMKL.cmake (where OPENCV_SOURCE is E:/opencv/ in this example) in your favorite text editor and amend line 44 to activate MKL_WITH_TBB as

      OCV_OPTION(MKL_WITH_TBB "Use MKL with TBB multithreading" ON)#ON IF WITH_TBB)

      then comment out lines 55 and 63 so that the MKL libraries can be located

      #if(WITH_MKL AND NOT mkl_root_paths)
        if(WIN32)
            set(ProgramFilesx86 "ProgramFiles(x86)")
            list(APPEND mkl_root_paths $ENV{${ProgramFilesx86}}/IntelSWTools/compilers_and_libraries/windows/mkl)
        endif()
        if(UNIX)
          list(APPEND mkl_root_paths "/opt/intel/mkl")
        endif()
      #endif()
    4. Click the Configure button and select Visual Studio 2013 Win64 (32 bit CUDA support is limited). This may take a while as CMake will download ffmpeg and the Intel Integrated Performance Primitives for Image processing and Computer Vision (IPP-ICV).
       

       
    5. Skip if you are not building with MKL. If MKL and TBB are installed correctly, and you have modified the OpenCVFindMKL.cmake as above, the path to these should have been picked up in CMake, and MKL_WITH_TBB should have been selected, as below.
       

       
      Verify your output resembles that shown below.
       

       
    6. Skip this if you are not building with TBB. Expand the WITH group and tick WITH_TBB,
       

       
      then press configure and confirm that CMake has picked up the locations of your TBB installation
       

       
      and shows the correct parallel framework.
       

       
    7. Expand the BUILD group and tick BUILD_opencv_world (builds to a single dll).
    8. Expand the CUDA tab. CUDA_TOOLKIT_ROOT_DIR should point to your CUDA 8.0 toolkit installation. If you have more than one version of the toolkit installed and CMake has picked a different one, simply change the path to point to CUDA 8.0.

      The default CUDA_ARCH_BIN option is to build binary code for all architectures from 2.0 to 6.1 (Fermi to Pascal). This setting results in a long build time (~3.5 hours on an i7), but the binaries produced will run on all supported devices. If you only want to run OpenCV on a specific device, then enter only the compute capability of that device here. Remember that the libraries produced are not guaranteed to run on any device of a different major compute version from the ones entered; see the CUDA C Programming Guide for details.

      If you are comfortable with the implications, you can also enable CUDA_FAST_MATH, which will enable the --use_fast_math compiler option; again see the CUDA C Programming Guide for details.
       

       

    9. Expand WITH and enable WITH_CUBLAS to enable the CUDA Basic Linear Algebra Subroutines (cuBLAS).
       

       
    10. Skip if you are not including the Python bindings. If you have installed only one version of Anaconda, then CMake should pick up its location (as long as you ticked “Register Anaconda as my default Python” on installation) and should already have ticked the correct build option (BUILD_opencv_python2[3]). However, if you are building for both Python 2 and 3, you may have to manually enter the locations for Anaconda3 as below.
       

       
      Then once you press configure again, both build options will be selected.
       

       
    11. Press Configure again; your CUDA options should resemble those below.
       

       
      There should be no warning messages in red displayed in the configuration window. If there are, then the Visual Studio solution may be generated but it will probably fail to build.

      Note: Versions of CMake more recent than v3.7.1 may give warnings resembling the below:
       

       
      These can be safely ignored.

    12. Press Generate and wait until the bottom of the window indicates success.
       

       
    13. Press Open Project (not available in older versions of CMake, for those just locate and open the Visual Studio solution file) to open up the solution in Visual Studio.
       

       
    14. Note: If you are building with python bindings then you will need to build in Release mode unless you have the python debug libraries.

      Click Solution Explorer, expand CMakeTargets, right click on INSTALL and click Build.
       

       
      This will both build the library and copy the necessary redistributable parts to the install directory, E:/build/opencv/vs2013/x64/cuda_mkl/install in this example. Additionally, if you build the Python bindings, then the cv2.pyd and/or cv2.cp36-win_amd64.pyd shared libs will have been copied to your Anaconda2[3]\Lib\site-packages\ directory. All that is then required is to add the directory containing opencv_world330.dll (and tbb.dll if you have built with Intel TBB) to your path environment variable.

      If everything was successful, congratulations, you now have OpenCV v3.3 built with CUDA 8.0.

    15. NOTE: If you change or remove any options after pressing Configure a second time, the build may fail; it is best to delete the build directory and start again. This may seem overcautious, but it is preferable to waiting an hour for the build to fail and then starting again.
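
Once the INSTALL target has finished, a quick way to confirm that the Python bindings can find opencv_world330.dll is to put its directory on PATH for the current process and attempt the import. This is a minimal sketch with hypothetical helper names (prepend_to_path, try_import_cv2); it assumes the Python 2.7/3.6 interpreters used in this guide, where Windows resolves the DLLs an extension module depends on through PATH.

```python
import importlib
import os


def prepend_to_path(directory, env=os.environ):
    """Put directory at the front of PATH for this process only."""
    env["PATH"] = directory + os.pathsep + env.get("PATH", "")
    return env["PATH"]


def try_import_cv2(dll_dir):
    """Return the cv2 version string, or None if the import fails."""
    prepend_to_path(dll_dir)
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError as exc:
        # Typically a "DLL load failed" message when a dependency is not found.
        print("import cv2 failed:", exc)
        return None
    return cv2.__version__
```

For the example build above you would call try_import_cv2(r"E:\build\opencv\vs2013\x64\cuda_mkl\install\x64\vc12\bin"). Note that from Python 3.8 onwards Windows no longer searches PATH for these dependencies, and os.add_dll_directory is the replacement.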

29 thoughts on “Build/Compile OpenCV v3.3 on Windows with CUDA 8.0, Intel MKL+TBB and python bindings”

  1. I did all the steps and it got correctly installed with MKL and CUDA.
    Thank you for that.
    Now I want to import it into a python program.
    What do I import?

    1. I have updated the guide to include building the python bindings.

      If OpenCV has been built with the Python bindings, then on your build machine the cv2.pyd and/or cv2.cp36-win_amd64.pyd shared libs should have been copied to the Anaconda2[3]\Lib\site-packages\ directory. If not, you need to copy them to that directory on the machine you are using. They should be located in the build\lib directory, e.g. E:/build/opencv/vs2013/x64/cuda_mkl/lib/.

      Therefore, to use OpenCV with Python, just fire up Anaconda Prompt and navigate to the directory containing opencv_world330.dll, e.g. E:/build/opencv/vs2013/x64/cuda_mkl/install/x64/vc12/bin. Start the Python interpreter (type: python), then in the interpreter type import cv2. If this succeeds, you can use the Python OpenCV bindings, and you can then add the location of opencv_world330.dll to your system path.

      That said, I am pretty sure that there are no python bindings to the CUDA functions (https://stackoverflow.com/questions/42125084/accessing-opencv-cuda-functions-from-python-no-pycuda).

      Depending on what algorithms you want to accelerate in python, you may be able to use pytorch (if you have conda it can easily be installed with: conda install -c peterjc123 pytorch=0.1.12).

  2. Thanks for the detailed reply. I will try it out.
    But I am having some problem with the build phase in Visual Studio. It has been going on for hours and it’s stuck at 22%. It’s working with all the CUDA libraries (matmul.h, add.h). And I also get a lot of warnings about deprecated architectures (sm-20). Any chance I can speed up the build? For now, I deleted the directory and am starting again from the CMake step.
    P.S. I am new to OpenCV and CUDA .

  3. It takes approximately 3.5 hours on a modern Intel i7. The CUDA compiler performs a significant amount of optimization while compiling, hence the wait. Warnings regarding sm-20 are fine; as long as you are not getting any errors I would keep waiting.

  4. Hi James

    I am getting the following errors on building OpenCV.
    Severity Code Description Project File Line Suppression State
    Error C2535 ‘std::tuple &std::tuple::operator =(const std::tuple &)’: member function already defined or declared (compiling source file C:\Users\Anuvrat Tiku\Desktop\opencv\sources\modules\cudawarping\perf\perf_warping.cpp) opencv_perf_cudawarping C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\tuple 756

    Severity Code Description Project File Line Suppression State
    Error C2382 ‘std::tuple::operator =’: redefinition; different exception specifications (compiling source file C:\Users\Anuvrat Tiku\Desktop\opencv\sources\modules\cudawarping\perf\perf_warping.cpp) opencv_perf_cudawarping C:\Users\Anuvrat Tiku\Desktop\opencv\sources\modules\ts\include\opencv2\ts\cuda_perf.hpp 73

    Severity Code Description Project File Line Suppression State
    Error C2610 ‘std::tuple::tuple(const std::tuple &)’: is not a special member function which can be defaulted (compiling source file C:\Users\Anuvrat Tiku\Desktop\opencv\sources\modules\cudawarping\perf\perf_warping.cpp) opencv_perf_cudawarping C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\tuple 607

    and 5 more like this, 8 in total. There is no solution online to stop these errors.
    Can you help

    Thanks
    Anuvrat

    1. Hi, this looks like a historic bug with OpenCV, are you compiling version 3.3? Which version of VS2015 are you using, update 3?

  5. Hey James,

    Yes, it is the update 3 for VS community 2015.
    Now the build has 28 errors. Will this affect OpenCV in any way ?

    1. If the errors are just in the performance tests, then the OpenCV libs should have already compiled correctly. Check the bin folder; if opencv_world330.dll is present, then you should be able to ignore the errors.
      Are you certain that you have checked out the 3.3.0 tag and that you are not building an earlier version of OpenCV?

  6. I downloaded the 3.3.0 executable from GitHub. The project is still building, and I can’t find opencv_world330.dll in the path. If I build again without the performance tests, would the errors go away? Is there any catch if I build without performance tests?

    1. If opencv_world330.dll is missing from your bin\Release folder, do you have any executables in there? Is opencv_world330.lib in your lib\Release folder?
      I cannot comment on removing the performance tests, because I am unable to recreate your issue on either of the two machines I have tried a fresh build on. If you can successfully build the OpenCV world lib and dll, then I would expect that you can ignore the errors with the performance tests; however, without recreating the issue on my machine I cannot test this to make sure.

  7. Hey, when compiling with Anaconda, Visual Studio looks for python35_d.lib (the debugging library). To the best of my knowledge, the debugging lib is either incredibly hard to build, or it just doesn’t exist. What am I doing wrong? Can I point it to the “official” python35_d.lib and go without issue?

    1. Hi, from memory I did not have any problems building in Debug or Release with python bindings, however I am unable to check at the moment because I don’t have access to my build machine. In cmake under the PYTHON3 drop down, was the location C:\Anaconda3\libs\python35.lib or equivalent? The OpenCv CUDA module is not supported in python, are you sure you need to build with python bindings?

        1. I will have a look later on. Is visual studio still looking for python35_d.lib when you build in Release mode?
          Which OpenCv CUDA routines are you looking to use? If it is mainly matrix operations, filtering etc. then you could use pytorch. If it is HOG, GMM, Haar cascades etc. then OpenCv is probably the way to go.

        2. Apologies, I had not built in Release since I included python. As you pointed out you will have to build in release unless you have the debug libraries.

          Do you need a debug build? The OpenCv Release build has debug symbols by default. It may be easier to build your project in Release and just disable optimization in the opencv_world project.

  8. Hi, I compiled opencv + mkl, but there is no change in speed of matrix multiplication(cv::gemm). could you please give me some instructions about using opencv for fast matrix multiplication?

    1. Hi, what are you comparing your compiled build with, the default binaries from OpenCV? I noticed a significant speed-up in matrix operations when I built with MKL. I will see if I can find the results of the performance tests and let you know what to expect.

      1. Thank you for your response.
        I was comparing them with my own build without MKL. The reason for no speed up was that for matrices with size smaller than HAL_GEMM_SMALL_MATRIX_THRESH (=100) opencv is implementing its own gemm function and my test Mat size was 50*100000 (it is my size of work).
        Now I have speed up with matrices bigger than 100*100. But still it is 2~3 times slower than numpy. I looked at task manager and found that numpy is using all cpu threads. then I enabled MKL_with_tbb but there is again no change and it is using one thread of my cpu. should I enable multi-threading of MKL explicitly?

        1. Hi, I can confirm I am experiencing the same slowdown as you are when using cv2.gemm() instead of np.dot(), I work in c++ and had not previously compared with python. I am not sure what is causing this but from your observation and as both implementations (numpy and opencv) should be using Intel MKL it would point to a threading issue. I will investigate, if you find a solution please let me know.

          I don’t think TBB will have any effect because from what I have read MKL uses OpenMP for multi threading. From the documentation

          1. AFAIK numpy is using OpenBLAS, which I couldn’t compile OpenCV with. OpenBLAS uses multithreading for matrix calculations, so the speed is much higher than MKL without multithreading.
            If MKL is using OpenMP for multithreading, what is the reason we use TBB instead? I wonder whether checking with_openMP solves the problem?

          2. Hi, ignore my previous comment regarding OpenMP, I had misinterpreted the Intel documentation. To get the MKL libs to use TBB you need to make additional modifications to the OpenCVFindMKL.cmake script before you press configure for the first time. I have updated the instructions, let me know if this solves your issue.

            On testing in python I now get almost identical results from cv2.gemm() and np.dot().

            My version of numpy installed through conda is using MKL, you can check yours by running
            np.__config__.show().

  9. Hi James,
    Thanks for the tutorial!
    I have a question concerning TBB: I’ve installed MKL (in the default path), and also decompressed TBB (it’s just an archive, not an installer) in another folder. I’ve adapted OpenCVFindMKL.cmake as instructed and ticked MKL -> MKL_WITH_TBB.
    Do I also need to tick WITH -> WITH_TBB then specify TBB_ENV_INCLUDE, TBB_ENV_LIB and TBB_ENV_LIB_DEBUG according to where I decompressed TBB, or is what comes with MKL sufficient?

    1. Hi, I installed the Intel TBB binaries from the Intel website, not from https://www.threadingbuildingblocks.org/. I am pretty sure that you only tick WITH_TBB if you want to build TBB from source which I have not done. I will try to dig out the OpenCv performance test results from including TBB in this way to see what the benefit is.

      1. Thank you! I didn’t think of getting the TBB binaries from the Intel website, I’ll do another pass with cmake once I’ve installed them to see what Cmake reports. As for WITH_TBB, I remember that when I compiled OpenCV 3.2 many months ago, I ticked WITH_TBB but didn’t build TBB from source, instead I pointed Cmake to where TBB (from threadingbuildingblocks, not from Intel) was decompressed and I was able to get everything to work (didn’t do any speed tests though).
        But if you have some performance test results on hand, I’d be happy to know. 🙂
        In the meantime, I’ve launched an OpenCV 3.3 build MKL_WITH_TBB + WITH_TBB (decompressed from threadingbuildingblocks) to see what happens.
        It’s still not clear to me if MKL_WITH_TBB impacts only the MKL part or also other parts of OpenCV that might benefit from TBB.

        1. Hi, I had not noticed that Intel had changed their TBB installation, I have amended the instructions above to allow OpenCv to be built with the 2018 version of Intel TBB. I will share some performance comparison results when I have them.

          I was incorrect with what I previously told you, enabling:
          MKL_WITH_TBB, (if you amend the CMake script as I mention above) will only impact MKL.
          WITH_TBB should (I am still testing) enable multi threaded parts of OpenCv to run, and it should work with the 2018 libraries downloaded from Intel.

  10. Thank you very much!
    new modification on openCVFindMKL worked for me and now numpy and opencv have identical performance!
    I have another request from you. MKL and openblas have similar performance. but openblas is free and mkl is not for commercial use (am I right?) if you could write a similar instructions for building opencv+openblas I would be so thankful of you. I have similar issues with compiling opencv + openblas. it seems that openCVFindopenBLAS is not working too.

    1. Hi, from the Intel documentation

      Performance Libraries – free for all, registration required, no royalties, no restrictions on company or project size, current versions of libraries, no Intel Premier Support access.

      I would infer that you can use MKL commercially; you just don’t get any support.

  11. Hi James, Thanks for this comprehensive guide. Although I have yet to be able to build it and keep getting this error for opencv_world CMake Error at cmake/OpenCVUtils.cmake:945 (target_compile_definitions):
    Cannot specify compile definitions for target “opencv_world” which is not
    built by this project.
    I was compiling using CUDA 9.0 and VS2013 (VS2015 didn’t work). I tried using VS2017 and CUDA 8.0 too (various combinations), but the same error occurs. Do you know how I can rectify this problem, or whether it’s fine not to compile opencv_world? (The error doesn’t occur if that is unchecked.) I’ll be using this for my Python programme (Anaconda3 used). Thanks a lot again!!

    1. Hi, please see things to be aware of.

      CUDA 9.0 and/or VS 2017 are not supported by OpenCV 3.3. Even if you can get it to compile, none of the features of CUDA 9.0 (cooperative groups etc.) will have been implemented, so I doubt there is any advantage over CUDA 8.0. If you want to use Python, then CUDA is also not supported, so it would be best to disable the CUDA modules.
