Build/Compile OpenCV v3.3 on Windows with CUDA 8.0, Intel MKL+TBB and python bindings


OpenCV 3.4, which is compatible with CUDA 9.1 and Visual Studio 2017, was released on 23/12/2017; go to Building OpenCV 3.4 on Windows with CUDA 9.1, Intel MKL+TBB for the updated guide.

Because the pre-built Windows libraries available for OpenCV v3.3 do not include the CUDA modules, I have included the build instructions below (almost identical to those for OpenCV v3.2) for anyone who is interested. If you just need the Windows libraries, then see Download OpenCV 3.3 with CUDA 8.0.

The guide below details instructions on compiling the 64 bit version of OpenCV v3.3 shared libraries with Visual Studio 2013 (will also work with Visual Studio 2015 if selected in CMake), CUDA 8.0, support for both the Intel Math Kernel Libraries (MKL) and Intel Threaded Building Blocks (TBB), and bindings to allow you to call OpenCV functions from within python.

Before continuing there are a few things to be aware of:

  1. The procedure outlined only works for Visual Studio 2013 and 2015; it will not work for Visual Studio 2017 because that compiler is not supported by the CUDA 8.0 Toolkit.
  2. You cannot call the CUDA modules from within python. The python bindings only allow you to call the standard OpenCV routines.
  3. If you have built OpenCV with CUDA support, then to use those libraries, and/or redistribute applications built with them, on any machines without the CUDA toolkit installed, you will need to redistribute the following DLLs from your
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin 

    directory to those machines:

    • cudart64_80.dll
    • nppc64_80.dll
    • nppi64_80.dll
    • npps64_80.dll
    • cublas64_80.dll
    • cufft64_80.dll
  4. The latest version of Intel TBB uses a shared library, so if you build with Intel TBB you need to add
    C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\redist\intel64_win\tbb\vc_mt 

    to your path variable, and make sure you redistribute that dll with any of your applications.
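Before importing cv2 or running an application built against these libraries, it can save some head-scratching to confirm that the directories above are actually on your PATH. A minimal Python sketch (the helper name dir_on_path is mine, stdlib only):

```python
import os

def dir_on_path(directory):
    """Return True if `directory` appears as an entry on the PATH variable."""
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(os.path.normcase(os.path.normpath(e)) == wanted
               for e in entries if e)

# Hypothetical check for the CUDA bin directory mentioned above:
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin"
print("CUDA bin on PATH:", dir_on_path(cuda_bin))
```

The same check applies to the Intel TBB redist directory.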



Assuming you already have a compatible version of Visual Studio (2013 or 2015) installed, there are a couple of additional components you need to download before you can get started. You first need to:

  • Download the source files, available on GitHub. Either clone the git repo, making sure to check out the 3.3.0 tag, or download this archive containing all the source files.
  • Install CMake – Version 3.9.5 is used in the guide.
  • Install The CUDA 8.0 Toolkit (v8.0.61) and Patch2.
  • Optional – Install both the Intel MKL and TBB by registering for community licensing and downloading for free. MKL version 2018.0.124 and TBB version 2018.0.124 are used in this guide; I cannot guarantee that other versions will work correctly.
  • Optional – Install the 64 bit version of Anaconda2 and/or Anaconda3 to use OpenCV with Python 2 and/or Python 3, making sure to tick “Register Anaconda as my default Python ..”



Generating OpenCV Visual Studio solution files with CMake

In the next section we are going to generate the Visual Studio solution files with CMake. There are two ways to do this: from the command prompt, or with the CMake GUI. Generating solution files from the command prompt is both quicker and easier; however, using the GUI enables you to more easily see and change the available configuration options. My advice would be to use the command prompt if you just want to compile OpenCV with CUDA, and the GUI if you want to add extra configuration options to your build. Once you have decided, proceed with the guide that applies to you:

Building OpenCV 3.3 with CUDA 8.0 from the command prompt (cmd)
  1. Open up the command prompt (windows key + r, then type cmd and press enter) and enter
    "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\bin\tbbvars.bat" intel64

    to temporarily set the environmental variables for locating your TBB installation.

  2. Then choose your configuration from below and copy it to the command prompt, where PATH_TO_BUILD_DIR is the location where you wish to build OpenCV and PATH_TO_SOURCE_DIR is the location of the OpenCV source files. To build with Visual Studio 2015 instead of 2013, replace -G"Visual Studio 12 2013 Win64" with -G"Visual Studio 14 2015 Win64":
    • OpenCV 3.3 with CUDA 8.0
      "C:\Program Files\CMake\bin\cmake.exe" -B"PATH_TO_BUILD_DIR" -H"PATH_TO_SOURCE_DIR" -G"Visual Studio 12 2013 Win64" -DBUILD_opencv_world=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON 
    • OpenCV 3.3 with CUDA 8.0 and MKL multi-threaded with TBB
      "C:\Program Files\CMake\bin\cmake.exe" -B"PATH_TO_BUILD_DIR" -H"PATH_TO_SOURCE_DIR" -G"Visual Studio 12 2013 Win64" -DBUILD_opencv_world=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON -DWITH_MKL=ON -DMKL_USE_MULTITHREAD=ON -DMKL_WITH_TBB=ON
    • OpenCV 3.3 with CUDA 8.0, MKL multi-threaded with TBB and TBB
      "C:\Program Files\CMake\bin\cmake.exe" -B"PATH_TO_BUILD_DIR" -H"PATH_TO_SOURCE_DIR" -G"Visual Studio 12 2013 Win64" -DBUILD_opencv_world=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON -DWITH_MKL=ON -DMKL_USE_MULTITHREAD=ON -DMKL_WITH_TBB=ON -DWITH_TBB=ON
  3. Your solution file should now be in your PATH_TO_BUILD_DIR directory, open it in Visual Studio and select your Configuration.

    Note: If you are building with python bindings then you will need to build in Release mode unless you have the python debug libraries.

  4. Click Solution Explorer, expand CMakeTargets, right click on INSTALL and click Build.

    This will both build the library and copy the necessary redistributable parts to the install directory, PATH_TO_BUILD_DIR/install in this example. Additionally, if you built the python bindings, the cv2.pyd and/or cv2.cp36-win_amd64.pyd shared libs will have been copied to your Anaconda2[3]\Lib\site-packages\ directory; all that is required is to add the directory containing opencv_world330.dll (and tbb.dll if you have built with Intel TBB) to your path environment variable.

    If everything was successful, congratulations, you now have OpenCV v3.3 built with CUDA 8.0.
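Because the python bindings do not expose the CUDA modules, the easiest way to confirm from python that your build was compiled with CUDA is to inspect cv2.getBuildInformation(). A sketch, assuming the report's usual "NVIDIA CUDA: YES (...)" line format (the sample string below is illustrative, not output from a real build):

```python
def built_with_cuda(build_info):
    """Scan a cv2.getBuildInformation() style report for an enabled CUDA entry."""
    for line in build_info.splitlines():
        key, _, value = line.partition(":")
        if "CUDA" in key and value.strip().startswith("YES"):
            return True
    return False

# Illustrative excerpt in the style of a real report (not actual output):
sample = """
  NVIDIA CUDA:                 YES (ver 8.0, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:           20 30 35 37 50 52 60 61
"""
print(built_with_cuda(sample))
# On a real build: built_with_cuda(cv2.getBuildInformation())
```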

Building OpenCV 3.3 with CUDA 8.0 with the CMake GUI
    1. Fire up CMake. If you want OpenCV to use TBB, then open up the command prompt (windows key + r, then type cmd and press enter) and enter
      "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\bin\tbbvars.bat" intel64

      to temporarily set the environmental variables for locating your TBB installation, and

      "C:\Program Files\CMake\bin\cmake-gui"

      to launch CMake using those variables. Otherwise you can just start CMake normally.

    2. Making sure that the Grouped checkbox is ticked, select the location of the source files, downloaded from GitHub, and the location where the build will take place, E:/opencv/ and E:/build/opencv/vs2013/x64/cuda_mkl/ in this example.

    3. Skip if you are not building with MKL. We want MKL to use TBB but unfortunately the CMake script does not correctly locate the Intel MKL and TBB libraries when using the GUI. The following is an inelegant hack of the script to get MKL to use TBB.

      Open up OPENCV_SOURCE/cmake/OpenCVFindMKL.cmake (where OPENCV_SOURCE is E:/opencv/ in this example) in your favorite text editor and amend line 44 to activate MKL_WITH_TBB as

      OCV_OPTION(MKL_WITH_TBB "Use MKL with TBB multithreading" ON)#ON IF WITH_TBB)

      then comment out lines 55 and 63 so that the MKL libraries can be located

      #if(WITH_MKL AND NOT mkl_root_paths)
        set(ProgramFilesx86 "ProgramFiles(x86)")
        list(APPEND mkl_root_paths $ENV{${ProgramFilesx86}}/IntelSWTools/compilers_and_libraries/windows/mkl)
        list(APPEND mkl_root_paths "/opt/intel/mkl")
      #endif()
    4. Click the Configure button and select Visual Studio 2013 Win64 (32 bit CUDA support is limited). This may take a while as CMake will download ffmpeg and the Intel Integrated Performance Primitives for Image processing and Computer Vision (IPP-ICV).

    5. Skip if you are not building with MKL. If MKL and TBB are installed correctly, and you have modified the OpenCVFindMKL.cmake as above, the path to these should have been picked up in CMake, and MKL_WITH_TBB should have been selected, as below.

      Verify your output resembles that shown below.

    6. Skip this if you are not building with TBB. Expand the WITH group and tick WITH_TBB,

      then press configure and confirm that CMake has picked up the locations of your TBB installation

      and shows the correct parallel framework.

    7. Expand the BUILD group and tick BUILD_opencv_world (builds to a single dll).
    8. Expand the CUDA tab; the CUDA_TOOLKIT_ROOT_DIR should point to your CUDA 8.0 toolkit installation. If you have more than one version of the toolkit installed and CMake has picked a different one, simply change the path to point to CUDA 8.0.

      The default CUDA_ARCH_BIN option is to build microcode for all architectures from 2.0-6.1 (Fermi to Pascal). This setting results in a long build time (~3.5 hours on an i7) but the binaries produced will run on all supported devices. If you only want to execute OpenCV on a specific device, then only enter the compute capability of that device here; remember that the produced libraries are not guaranteed to run on any devices of a different major compute version to the ones entered, see the CUDA C Programming Guide for details.
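To make the compatibility rule concrete, here is a rough sketch (my own simplification of the rule in the CUDA C Programming Guide; the generation table lists only the common desktop compute capabilities):

```python
# Desktop compute capabilities covered by the default CUDA_ARCH_BIN of 2.0-6.1
# (a simplified table; Tegra parts such as 3.2 and 5.3 are omitted).
ARCH_GENERATIONS = {
    "Fermi":   ["2.0", "2.1"],
    "Kepler":  ["3.0", "3.5", "3.7"],
    "Maxwell": ["5.0", "5.2"],
    "Pascal":  ["6.0", "6.1"],
}

def cubin_runs_on(built_for, device_cc):
    """Simplified binary-compatibility rule: a cubin built for X.Y runs on a
    device of compute capability X.Z only when Z >= Y (same major version)."""
    b_major, b_minor = (int(x) for x in built_for.split("."))
    d_major, d_minor = (int(x) for x in device_cc.split("."))
    return b_major == d_major and d_minor >= b_minor

print(cubin_runs_on("6.0", "6.1"))  # Pascal cubin on a newer Pascal device
print(cubin_runs_on("5.2", "6.1"))  # Maxwell cubin on a Pascal device
```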

      If you are comfortable with the implications, you can also enable CUDA_FAST_MATH, which will enable the --use_fast_math compiler option; again see the CUDA C Programming Guide for details.


    9. Expand WITH and enable WITH_CUBLAS to enable the CUDA Basic Linear Algebra Subroutines (cuBLAS).

    10. Skip if you are not including the Python bindings. If you have installed only one version of Anaconda, then CMake should pick up its location (as long as you ticked “Register Anaconda as my default Python” on installation) and the correct build option (BUILD_opencv_python2[3]) should already be ticked. However, if you are building for both Python 2 and 3, you may have to manually enter the locations for Anaconda3 as below.

      Then once you press configure again, both build options will be selected.

    11. Press Configure again, your CUDA options should resemble the below.

      There should be no warning messages in red displayed in the configuration window. If there are, then the Visual Studio solution may still be generated, but it will probably fail to build.

      Note: Versions of CMake more recent than v3.7.1 may give warnings resembling the below:

      These can be safely ignored.

    12. Press Generate and wait until the bottom of the window indicates success.

    13. Press Open Project (not available in older versions of CMake, for those just locate and open the Visual Studio solution file) to open up the solution in Visual Studio.

    14. Note: If you are building with python bindings then you will need to build in Release mode unless you have the python debug libraries.

      Click Solution Explorer, expand CMakeTargets, right click on INSTALL and click Build.

      This will both build the library and copy the necessary redistributable parts to the install directory, E:/build/opencv/vs2013/x64/cuda_mkl/install in this example. Additionally, if you built the python bindings, the cv2.pyd and/or cv2.cp36-win_amd64.pyd shared libs will have been copied to your Anaconda2[3]\Lib\site-packages\ directory; all that is required is to add the directory containing opencv_world330.dll (and tbb.dll if you have built with Intel TBB) to your path environment variable.

      If everything was successful, congratulations, you now have OpenCV v3.3 built with CUDA 8.0.

    15. NOTE: If you change or remove any options after pressing Configure a second time, the build may fail; it is best to remove the build directory and start again. This may seem overcautious, but it is preferable to waiting an hour for the build to fail and then starting again.
    Copyright secured by Digiprove © 2020 James Bowley

48 thoughts on “Build/Compile OpenCV v3.3 on Windows with CUDA 8.0, Intel MKL+TBB and python bindings”

  1. I did all the steps and it got correctly installed with MKL and CUDA.
    Thank you for that.
    Now I want to import it into a python program.
    What do I import?

    1. I have updated the guide to include building the python bindings.

      If OpenCV has been built with the python bindings, then on your build machine the cv2.pyd and/or cv2.cp36-win_amd64.pyd shared libs should have been copied to the Anaconda2[3]\Lib\site-packages\ directory. If not, you need to copy them to that directory on the machine you are using. They should be located in the build\lib directory, e.g. E:/build/opencv/vs2013/x64/cuda_mkl/lib/.

      Therefore to use OpenCV with python, just fire up Anaconda Prompt and navigate to the directory containing opencv_world330.dll, e.g. E:/build/opencv/vs2013/x64/cuda_mkl/install/x64/vc12/bin. Start the python interpreter (type: python), then in the interpreter type import cv2. If this is successful, then you can use python’s OpenCV bindings. If that works, then add the location of opencv_world330.dll to your system path.

      That said, I am pretty sure that there are no python bindings to the CUDA functions.

      Depending on what algorithms you want to accelerate in python, you may be able to use pytorch (if you have conda it can easily be installed with: conda install -c peterjc123 pytorch=0.1.12).

  2. Thanks for the detailed reply. I will try it out.
    But I am having some problem with the build phase in Visual Studio. It has been going on for hours and it’s stuck at 22%. It’s working with all the CUDA libraries (matmul.h, add.h). And I also get a lot of warnings about deprecated architectures (sm-20). Any chance I can speed up the build? For now, I deleted the directory and am starting again from the CMake step.
    P.S. I am new to OpenCV and CUDA .

  3. It takes approximately 3.5 hours on a modern intel i7, the CUDA compiler performs a significant amount of optimization while compiling, hence the wait. Warnings regarding sm-20 are fine, as long as you are not getting any errors I would keep waiting.

  4. Hi James

    I am getting the following errors on building OpenCV.
    Severity Code Description Project File Line Suppression State
    Error C2535 ‘std::tuple &std::tuple::operator =(const std::tuple &)’: member function already defined or declared (compiling source file C:\Users\Anuvrat Tiku\Desktop\opencv\sources\modules\cudawarping\perf\perf_warping.cpp) opencv_perf_cudawarping C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\tuple 756

    Severity Code Description Project File Line Suppression State
    Error C2382 ‘std::tuple::operator =’: redefinition; different exception specifications (compiling source file C:\Users\Anuvrat Tiku\Desktop\opencv\sources\modules\cudawarping\perf\perf_warping.cpp) opencv_perf_cudawarping C:\Users\Anuvrat Tiku\Desktop\opencv\sources\modules\ts\include\opencv2\ts\cuda_perf.hpp 73

    Severity Code Description Project File Line Suppression State
    Error C2610 ‘std::tuple::tuple(const std::tuple &)’: is not a special member function which can be defaulted (compiling source file C:\Users\Anuvrat Tiku\Desktop\opencv\sources\modules\cudawarping\perf\perf_warping.cpp) opencv_perf_cudawarping C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include\tuple 607

    and 5 more like this, 8 in total. There is no solution online to stop these errors.
    Can you help


    1. Hi, this looks like a historic bug with OpenCV, are you compiling version 3.3? Which version of VS2015 are you using, update 3?

  5. Hey James,

    Yes, it is the update 3 for VS community 2015.
    Now the build has 28 errors. Will this affect OpenCV in any way ?

    1. If the errors are just in the performance tests, then the OpenCv libs should have already compiled correctly. Check the bin folder, if opencv_world330.dll is present then you should be able to ignore the warnings.
      Are you certain that you have checked out the 3.3.0 tag and you are not building an earlier version of OpenCV?

  6. I downloaded the 3.3.0 executable from Github. The project is still building, and I can’t find opencv_world330.dll in the path. If I build again without the performance tests, would the errors go away? Is there any catch if I build without performance tests?

    1. If opencv_world330.dll is missing from your bin\Release folder, do you have any executables in there? Is opencv_world330.lib in your lib\Release folder?
      I cannot comment on removing the performance tests, because I am unable to recreate your issue on either of the two machines I have tried a fresh build on. If you can successfully build the OpenCV world lib and dll, then I would expect that you can ignore the errors with the performance tests; however, without recreating the issue on my machine I cannot test this to make sure.

  7. Hey, when compiling with anaconda, visual studio looks for python35_d.lib (the debugging library). To the best of my knowledge, the debugging lib is either incredibly hard to build, or it just doesn’t exist. What am I doing wrong? Can I point it to the “official” python35_d.lib and go without issue?

    1. Hi, from memory I did not have any problems building in Debug or Release with python bindings, however I am unable to check at the moment because I don’t have access to my build machine. In cmake under the PYTHON3 drop down, was the location C:\Anaconda3\libs\python35.lib or equivalent? The OpenCv CUDA module is not supported in python, are you sure you need to build with python bindings?

        1. I will have a look later on. Is visual studio still looking for python35_d.lib when you build in Release mode?
          Which OpenCv CUDA routines are you looking to use? If it is mainly matrix operations, filtering etc. then you could use pytorch. If it is HOG, GMM, Haar cascades etc. then OpenCv is probably the way to go.

        2. Apologies, I had not built in Release since I included python. As you pointed out you will have to build in release unless you have the debug libraries.

          Do you need a debug build? The OpenCv Release build has debug symbols by default. It may be easier to build your project in Release and just disable optimization in the opencv_world project.

  8. Hi, I compiled opencv + mkl, but there is no change in speed of matrix multiplication(cv::gemm). could you please give me some instructions about using opencv for fast matrix multiplication?

    1. Hi, what are you comparing your compiled build with, the default binaries from OpenCV? I noticed a significant speed up in matrix operations when I built with MKL. I will see if I can find the results of the performance tests and let you know what to expect.

      1. Thank you for your response.
        I was comparing them with my own build without MKL. The reason for no speed up was that for matrices with size smaller than HAL_GEMM_SMALL_MATRIX_THRESH (=100) opencv is implementing its own gemm function and my test Mat size was 50*100000 (it is my size of work).
        Now I have speed up with matrices bigger than 100*100. But still it is 2~3 times slower than numpy. I looked at task manager and found that numpy is using all cpu threads. then I enabled MKL_with_tbb but there is again no change and it is using one thread of my cpu. should I enable multi-threading of MKL explicitly?
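The behaviour described in this comment can be sketched as a simple dispatch rule (a simplification inferred from the comment, not the exact condition in OpenCV's source):

```python
# Threshold reported in the comment above for OpenCV 3.3.
HAL_GEMM_SMALL_MATRIX_THRESH = 100

def dispatched_to_blas(rows_a, cols_a, cols_b,
                       thresh=HAL_GEMM_SMALL_MATRIX_THRESH):
    """Sketch of the dispatch: multiplies whose smallest dimension is at or
    below the threshold stay on OpenCV's own gemm; larger ones are handed to
    the BLAS backend (MKL when built in)."""
    return min(rows_a, cols_a, cols_b) > thresh

print(dispatched_to_blas(50, 100000, 100000))  # the commenter's 50x100000 case
print(dispatched_to_blas(512, 512, 512))       # a large square multiply
```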

        1. Hi, I can confirm I am experiencing the same slowdown as you when using cv2.gemm(); I work in C++ and had not previously compared with python. I am not sure what is causing this, but from your observation, and as both implementations (numpy and opencv) should be using Intel MKL, it would point to a threading issue. I will investigate; if you find a solution please let me know.

          I don’t think TBB will have any effect because from what I have read MKL uses OpenMP for multi threading. From the documentation

          1. AFAIK numpy is using OpenBLAS, which I couldn’t compile opencv with. OpenBLAS uses multithreading for matrix calculations, so the speed is much higher than MKL without multithreading.
            If MKL is using OpenMP for multithreading, then what is the reason we use TBB instead of it? Wondering if checking WITH_OPENMP solves the problem?

          2. Hi, ignore my previous comment regarding OpenMP, I had misinterpreted the Intel documentation. To get the MKL libs to use TBB you need to make additional modifications to the OpenCVFindMKL.cmake script before you press configure for the first time. I have updated the instructions, let me know if this solves your issue.

            On testing in python I now get almost identical results from cv2.gemm() and

            My version of numpy installed through conda is using MKL, you can check yours by running
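The command referred to above is elided; one common way (not necessarily the one the author meant) to see which BLAS/LAPACK backend numpy was built against:

```python
import numpy

# An MKL-linked numpy lists mkl-related libraries in this report;
# an OpenBLAS-linked one lists openblas instead.
numpy.show_config()
```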

  9. Hi James,
    Thanks for the tutorial!
    I have a question concerning TBB: I’ve installed MKL (in the default path), and also decompressed TBB (it’s just an archive, not an installer) in another folder. I’ve adapted OpenCVFindMKL.cmake as instructed and ticked MKL -> MKL_WITH_TBB.
    Do I also need to tick WITH -> WITH_TBB then specify TBB_ENV_INCLUDE, TBB_ENV_LIB and TBB_ENV_LIB_DEBUG according to where I decompressed TBB, or is what comes with MKL sufficient?

    1. Hi, I installed the Intel TBB binaries from the Intel website, not from threadingbuildingblocks. I am pretty sure that you only tick WITH_TBB if you want to build TBB from source, which I have not done. I will try to dig out the OpenCV performance test results from including TBB in this way to see what the benefit is.

      1. Thank you! I didn’t think of getting the TBB binaries from the Intel website, I’ll do another pass with cmake once I’ve installed them to see what Cmake reports. As for WITH_TBB, I remember that when I compiled OpenCV 3.2 many months ago, I ticked WITH_TBB but didn’t build TBB from source, instead I pointed Cmake to where TBB (from threadingbuildingblocks, not from Intel) was decompressed and I was able to get everything to work (didn’t do any speed tests though).
        But if you have some performance test results on hand, I’d be happy to know. 🙂
        In the meantime, I’ve launched an OpenCV 3.3 build MKL_WITH_TBB + WITH_TBB (decompressed from threadingbuildingblocks) to see what happens.
        It’s still not clear to me if MKL_WITH_TBB impacts only the MKL part or also other parts of OpenCV that might benefit from TBB.

        1. Hi, I had not noticed that Intel had changed their TBB installation; I have amended the instructions above to allow OpenCV to be built with the 2018 version of Intel TBB. I will share some performance comparison results when I have them.

          I was incorrect with what I previously told you, enabling:
          MKL_WITH_TBB, (if you amend the CMake script as I mention above) will only impact MKL.
          WITH_TBB should (I am still testing) enable multi-threaded parts of OpenCV to run, and it should work with the 2018 libraries downloaded from Intel.

  10. Thank you very much!
    The new modification to OpenCVFindMKL worked for me, and now numpy and opencv have identical performance!
    I have another request for you. MKL and OpenBLAS have similar performance, but OpenBLAS is free and MKL is not for commercial use (am I right?). If you could write similar instructions for building opencv+OpenBLAS I would be so thankful. I have similar issues with compiling opencv + OpenBLAS; it seems that OpenCVFindOpenBLAS is not working either.

    1. Hi, from the Intel documentation

      Performance Libraries – free for all, registration required, no royalties, no restrictions on company or project size, current versions of libraries, no Intel Premier Support access.

      This would imply that you can use MKL for commercial use, you just don’t get any support.

  11. Hi James, Thanks for this comprehensive guide. Although I have yet to be able to build it and keep getting this error for opencv_world CMake Error at cmake/OpenCVUtils.cmake:945 (target_compile_definitions):
    Cannot specify compile definitions for target “opencv_world” which is not
    built by this project.
    I was compiling using CUDA 9.0 and VS2013 (VS2015 didn’t work). I tried using VS2017 and CUDA 8.0 too (various combinations), but the same error occurs. Do you know how I can rectify this problem? Or is it fine not to compile opencv_world? (The error doesn’t occur if that is unchecked.) I’ll be using this for my python programme (anaconda3 used). Thanks a lot again!!

    1. Hi, please see things to be aware of.

      CUDA 9.0 and/or VS 2017 are not supported by OpenCV 3.3; even if you can get it to compile, none of the features of CUDA 9.0 (cooperative groups etc.) will have been implemented, so I doubt there is any advantage over CUDA 8.0. If you want to use python, then CUDA is also not supported, so it would be best to disable the CUDA modules.

  12. Hello James,
    Congratulations on your detailed tutorial, it’s better and more updated than the official building tutorial.
    I downloaded your compiled binaries OpenCV 3.3 x64, VS2013 with CUDA, MKL(TBB), TBB and python bindings.
    When I try to run this example:
    I’ve got the following error:

    Maybe my hardware is at fault. I’m using an Intel i7 5500U CPU with a 920M Nvidia GPU.

    Any advice or troubleshooting I could try?


      1. Hello James,
        Thanks for clarifying, it seems that indeed what you point is the origin of the problem. I will need to use 7.5.
        It’s a shame that bugs like these invade OpenCV. If I gain more experience programming, I would love to send them a patch.
        But in this case, it seems that Nvidia removed a used function without a replacement.


  13. Hi, James,
    Thank you for your detailed guide! It saves me a lot of time on setting my environment up!
    However, I hit a problem when compiling OpenCV 3.3 in VS2015 Update 3, (CUDA + MKL +TBB, CMake 3.9.5, WIndows 10 x64):

    1>—— Build started: Project: opencv_core, Configuration: Release x64 ——
    1> : error LNK2038: mismatch detected for ‘RuntimeLibrary’: value ‘MT_StaticRelease’ doesn’t match value ‘MD_DynamicRelease’ in algorithm.obj
    1> Creating library C:/OpenCV/opencv/build/lib/Release/opencv_core330.lib and object C:/OpenCV/opencv/build/lib/Release/opencv_core330.exp
    1>LINK : warning LNK4098: defaultlib ‘LIBCMT’ conflicts with use of other libs; use /NODEFAULTLIB:library
    1>C:\OpenCV\opencv\build\bin\Release\opencv_core330.dll : fatal error LNK1319: 1 mismatches detected
    ========== Build: 0 succeeded, 1 failed, 10 up-to-date, 0 skipped ==========

    It seems that my CUDA binary was compiled with MT, but other binaries were compiled with MD. Or it might be a configuration issue in CMake, where I should force generating code for MD. Any hints on what I should do to fix it?

    Thank you!


    1. Hi Gary,
      I have not compiled OpenCV on Windows 10 for a while; however, I have never had any problems with conflicting CRT libraries. It appears that you are not building opencv_world330.dll; what other configuration changes have you made from those in the guide?
      I would try deleting the build directory and generating the Visual Studio solution files from the command prompt, using the default options given in the guide before customizing the build options.

      1. Hi, James,
        Thank you so much for the hints! Yes, cleaning the build folder DOES work!!

        For other changes what I made in CMake are:
        1. including opencv_contrib by setting OPENCV_EXTRA_MODULES_PATH to C:/OpenCV/opencv_contrib/modules
        2. Uncheck BUILD_PERF_TESTS and BUILD_TESTS
        3. Specify the value of CUDA_ARCH_BIN and CUDA_ARCH_PTX for my GPU
        Those should have no impact on the build.

        The reason I didn’t build opencv_world330.dll was because it caused problems (even without opencv_contrib) when configuring OpenCV in CMake, with the following error:
        Processing WORLD modules…
        module opencv_cudev…
        module opencv_core…
        CMake Error at cmake/OpenCVUtils.cmake:945 (target_compile_definitions):
        Cannot specify compile definitions for target “opencv_world” which is not
        built by this project.
        Call Stack (most recent call first):
        modules/core/CMakeLists.txt:67 (ocv_target_compile_definitions)
        modules/world/CMakeLists.txt:13 (include)
        modules/world/CMakeLists.txt:32 (include_one_module)

        Have you seen it before?

        One more tip from my side:
        When building the whole solution, I got the error below:
        Error C1083 Cannot open include file: ‘dynlink_nvcuvid.h’: No such file or directory (compiling source file C:\OpenCV\opencv\modules\cudacodec\src\cuvid_video_source.cpp) opencv_cudacodec c:\opencv\opencv\modules\cudacodec\src\precomp.hpp 59

        which was because the OpenCV source I downloaded was written for CUDA 9. The problem was gone after making a change in precomp.hpp.

        Again, Thank you for your great tip!


  14. Hi~ James!
    Thank you for your excellent guide. I have read all your blogs about opencv and CUDA.
    Now I am starting to build opencv with CUDA following your blog. I am building the opencv master (as of 2018/06/16) with CUDA 8.0 installed, but I encountered some troubles.
    When I build in VS2015, it threw many errors, and there is only ffmpeg_xxx.dll in my lib directory, and nothing in my bin directory except the cmake files.
    I must have faced some system problems and I don’t know how to deal with them.
    Could you do me a favor and help build the opencv master with CUDA 8.0? I would very much appreciate it. 🙂

  15. I met an error when building solution:
    16> Building NVCC (Device) object modules/world/CMakeFiles/cuda_compile.dir/__/core/src/cuda/Debug/
    16> nvcc fatal : Could not set up the environment for Microsoft Visual Studio using ‘C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/../../VC/bin/amd64/vcvars64.bat’
    16> CMake Error at (message):
    16> Error generating
    16> D:/Workspace/Building/build_opencv_cuda/opencv-3.3.0/build/x64/cuda_mkl/modules/world/CMakeFiles/cuda_compile.dir/__/core/src/cuda/Debug/
    16>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): error MSB6006: “cmd.exe” exited with code 1.

    Absolutely, I installed CUDA 8.0.
    Help me!

  16. Hello James,

    I return with a question about the Commit charge in a program using OpenCV with CUDA filters. My issue is that even very simple programs have a very high Commit charge as displayed by Resource Monitor in Windows. Now I’ve been able to figure out that this Committed memory isn’t actually used, it’s just that the system needs to be able to reserve that much memory space (either in RAM or in the Page File). The problem is that on the application on which I am working, several instances of an exe are supposed to run simultaneously, and all that committed memory adds up; and the application is ideally supposed to run even on systems with not a lot of RAM and without a Page file.
    On systems with enough RAM, I don’t have any issues. On systems with insufficient RAM but with a large enough Page File, it’s slow at first, until Windows figures out what pages need to be kept in RAM and what in the Page File, and then it’s fast (there is a lot of unused committed memory that just sits comfortably in the Page File until the programs end). On systems with insufficient RAM + Page File, the application components cannot start.

    Here is an example code to replicate the issue:

    #include <iostream>
    #include <opencv2/opencv.hpp>
    #include <opencv2/cudafilters.hpp>

    int main(int argc, char** argv)
    {
        std::cout << cv::getBuildInformation() << std::endl;
        cv::Mat image = cv::Mat::ones(cv::Size(32, 32), CV_8UC1) * 128;
        cv::Ptr<cv::cuda::Filter> pFilter;
        pFilter = cv::cuda::createBoxFilter(CV_8UC1, CV_8UC1, cv::Size(5, 5)); //(!!!)
        cv::namedWindow("Input", cv::WINDOW_AUTOSIZE);
        cv::imshow("Input", image);
        return 0;
    }

    This takes 718196 KB of Commit charge. If you comment out the (!!!) line, it drops to below 1000 KB.
    I’ve tried this with several builds of OpenCV 3.3.0 and 3.4.2 with CUDA v8.0. I’ve tried disabling opencv_world but it doesn’t help.

    Do you know if there is any way to fix this?

    1. My first guess would be that the CUDA function is using a lot of pinned host memory to increase the speed of transfers between the host and the device; as to why so much for such a simple function, I am not 100% sure. That said, on investigation of the source (line 191->128) it appears the function is using the NPP libraries to do the filtering. Therefore my educated guess would be that this library pins a lot of host memory when it is initialized. If I were you I would check the NPP documentation to see if my suspicion is correct and if there is any optimization/flag etc. to prevent this.

      1. Thank you, that sounds like an interesting lead. I’ll do some digging and come back when I find out anything new.

      2. I just had another thought: pinned host memory is never supposed to go in the page file. However, in my tests I found out that even if I don’t have enough RAM but the Page File is big enough, I can still run my software, because Windows will put the unused pages in the Page File. I do more or less the same sequence of image processing operations for each image: the first image takes a long time (1-2 minutes), because Windows is swapping a lot. The second image takes 1-2 seconds!
        This makes me think that it’s not pinned host memory, but I’ll have a look at the NPP doc nonetheless.

        1. True, I was thinking that the pinned memory allocation would push everything else into the page file, but of course that would not work on a system with insufficient RAM, and you are not getting any exceptions, just a system slowdown.

          It is equally possible that NPP is just allocating a lot of unpinned memory on the host in advance to speed things up, it may even try to pin memory and fall back on just allocating unpinned memory. Either way having a quick look at the docs has to be worth a shot.

  17. Update:
    Even the following code also uses up a lot of Commit charge (~700 MB) when linking with opencv_world (3.4.2):

    int main(int argc, char** argv)
    {
        std::cout << cv::getBuildInformation() << std::endl;
        cv::Mat image = cv::Mat::ones(cv::Size(32, 32), CV_8UC1) * 128;
        cv::namedWindow("Input", cv::WINDOW_AUTOSIZE);
        cv::imshow("Input", image);
        return 0;
    }

    However, using the separate libs and linking to opencv_core342.lib;opencv_highgui342.lib shows a normal Commit size (8 MB).

    More tests: from now on with separate DLLs. In my linker I have: opencv_core342.lib;opencv_highgui342.lib;opencv_cudafilters342.lib;%(AdditionalDependencies)

    Test 1:

    int main(int argc, char** argv)
    {
        std::cout << cv::getBuildInformation() << std::endl;
        cv::Mat image = cv::Mat::ones(cv::Size(32, 32), CV_8UC1) * 128;
        cv::cuda::GpuMat dMat; //comment 1
        //dMat.create(cv::Size(10, 10), CV_8UC1); //comment 2
        //cv::Ptr<cv::cuda::Filter> pFilter; //comment 3
        //pFilter = cv::cuda::createBoxFilter(CV_8UC1, CV_8UC1, cv::Size(5, 5)); //comment 4
        cv::namedWindow("Input", cv::WINDOW_AUTOSIZE);
        cv::imshow("Input", image);
        return 0;
    }
    Just declaring a GpuMat keeps the Commit at 8MB (ok).

    Test 2 (uncomment the “comment 2” line): calling the create function increases Commit to 108 MB.
    Test 3 (uncomment the “comment 3” line as well): also 108 MB.
    Test 4 (uncomment the “comment 4” line as well): 1068 MB !!!

    I haven’t been able to find any info in the NPP docs. The createBoxFilter call is particularly surprising, since it isn’t really calling any NPP functions, it’s just initializing a function pointer towards the correct NPP function…

    1. Ah sorry, I thought we were talking about memory allocated by the program itself, not memory required to load the dll’s. It is very likely your problem is a combination of factors: the OpenCV CUDA dll’s are pretty big, especially if you have compiled them with CUDA_ARCH_BIN 2-6.1, and using a single NPP function may load that dll as well, and possibly allocate some additional host memory as discussed above. I suspect that the 1068 MB is just the OpenCV dll being loaded following the first call which requires that dll. To be sure, you can check this in Process Explorer, Visual Studio or similar, in the modules pane, where you should be able to see the size of the dll’s which are loaded by your program when it is running. If it is the OpenCV dll, try compiling just for your architecture to see if it significantly reduces the size of that dll.
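      As a rough illustration (assuming PowerShell is available, and `my_app` is a placeholder for your process name), the loaded modules and their sizes can also be listed from the command line rather than from Process Explorer:

      ```powershell
      # List the modules loaded by a running process, largest first,
      # to see whether a big OpenCV/CUDA dll accounts for the memory.
      Get-Process my_app |
          Select-Object -ExpandProperty Modules |
          Sort-Object ModuleMemorySize -Descending |
          Select-Object -First 10 ModuleName,
              @{Name = 'SizeMB'; Expression = {[math]::Round($_.ModuleMemorySize / 1MB, 1)}}
      ```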

      1. Ah, okay, so it is the size of the dll itself that’s the issue here. I think I see the solution for my case: build opencv without opencv_world, and link my application to the separate libs. Since only a recent part of the application requires GPUs, older PCs without GPUs on which I will deploy it will not need to load the huge dlls, while newer PCs with GPUs will have enough RAM anyway.

        Now I have a question about the target CUDA architectures when building OpenCV. I’m doing development and preliminary testing on a laptop with a Quadro M1000M (compute capability 5.0). Sometimes it will also run on Quadro M2200 (5.2) and GeForce 1070 and 1080 (6.1). Deployment is done on Quadro P4000 (6.1), and speed is critical only for deployment. What should I use as OpenCV build flags to minimize the dll size, while being able to run on all the aforementioned GPUs, while having optimal speed on the Quadro P4000?

        1. I don’t really know what would be best without compiling a few versions myself and testing. I would refer to this before making your decision.

          My assumption would be that the optimum solution would be to target each of the architectures you mention separately, by compiling a binary for each compute capability. However, having several versions of OpenCV compiled may not be the most convenient thing for you. In that case I would try compiling with PTX for compute capability 5.0; however, this will mean that there will be a delay the first time you run your code on a new device, while the JIT compiler produces the binary for your specific GPU, which again may not be ideal.
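          To make the two options concrete, here is a sketch of the corresponding CMake configuration flags (the generator and source path are placeholders for your own setup; `CUDA_ARCH_BIN` and `CUDA_ARCH_PTX` are the cache variables OpenCV’s build uses to select the target architectures):

          ```bat
          :: Option 1: fat binary for every listed GPU - larger dll, no JIT delay
          cmake -G "Visual Studio 12 2013 Win64" -DWITH_CUDA=ON ^
              -DCUDA_ARCH_BIN="5.0 5.2 6.1" -DCUDA_ARCH_PTX="" ..\opencv-3.3.0

          :: Option 2: PTX only - smallest dll, JIT compiled on first run per device
          cmake -G "Visual Studio 12 2013 Win64" -DWITH_CUDA=ON ^
              -DCUDA_ARCH_BIN="" -DCUDA_ARCH_PTX="5.0" ..\opencv-3.3.0
          ```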

  18. James,

    Do you know if this method automatically makes FFMPEG compile with CUDA enabled? I’m not sure if there’s a way to find out. I’m trying to accelerate video decoding by using the GPU.

    1. Hi, the quick answer is no. The ffmpeg binary is not compiled locally; it is downloaded during the cmake configuration stage. Even if the ffmpeg binary could be built locally, Nvidia’s hardware video decoding has moved away from CUDA (NVCUVID) to dedicated hardware decoding (NVDEC), see this GTC presentation on Nvidia Video Technologies.

      Depending on your requirements, you may find that directly using ffmpeg with nvdec is sufficient.
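      For example (a sketch only: `input.mp4` is a placeholder, and the NVDEC options require an ffmpeg build configured with Nvidia support):

      ```shell
      # Decode on the GPU via NVDEC and discard the frames, as a quick decode test
      ffmpeg -hwaccel nvdec -i input.mp4 -f null -

      # Older builds expose the Nvidia decoders as *_cuvid instead, e.g. for H.264
      ffmpeg -c:v h264_cuvid -i input.mp4 -f null -
      ```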
