A wide variety of statistical models may be applied to semantic segmentation tasks. Let us now illustrate their impact on the computation of the label maps. For this purpose we use the synthetic Green Field data set, with 3 classes described by two features. If we quantize each feature to 8 bits, we can map the whole data set into a 2-dimensional 256 x 256 feature space. Accumulating the sample points in this representation yields the empirical probability densities. For visualization we mark the densities belonging to the different classes with three different colors: red, green and blue (see Figure below).
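The accumulation step above can be sketched as follows. This is a hypothetical illustration, not code from the DGM library: per-class 256 x 256 histograms are filled with 8-bit-quantized feature pairs, and normalizing a histogram by the number of samples gives the empirical class-conditional density.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr int kBins    = 256;   // 8-bit quantization of each feature
constexpr int kClasses = 3;     // Green Field data set has 3 classes

// Per-class 2D histograms over the 256 x 256 feature space.
struct FeatureHistograms {
    // counts[c][f1][f2] = number of class-c samples with features (f1, f2)
    std::array<std::array<std::array<unsigned, kBins>, kBins>, kClasses> counts{};

    void accumulate(int cls, unsigned char f1, unsigned char f2) {
        ++counts[cls][f1][f2];
    }

    // Empirical density P(f1, f2 | class), given the total number of
    // class-cls samples that were accumulated.
    double density(int cls, unsigned char f1, unsigned char f2,
                   std::size_t nSamples) const {
        return nSamples ? static_cast<double>(counts[cls][f1][f2]) / nSamples
                        : 0.0;
    }
};
```

Rendering each class's histogram into one color channel (red, green, blue) then produces exactly the visualization described above.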
Posts by Sergey Kosov:
Project X Hyperlapse Stabilization is a technology that removes annoying shaky motion from videos and creates smooth, stabilized time lapses from image sequences. Video stabilization is an important video enhancement technology that is in great demand in timelapse and hyperlapse photography. We develop a practical and robust approach to video stabilization that produces stabilized videos with good visual quality.
The release includes one new classifier (k-nearest neighbors) and now also supports efficient computation on the GPU. Moreover, the library now ships DGMConfig.cmake files, so it can be easily integrated into other projects with CMake.
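With the shipped DGMConfig.cmake files, a consumer project can pull in the library via find_package. A minimal sketch of such a CMakeLists.txt is shown below; the exported target name (written here as DGM::DGM) is an assumption and may differ in the actual package files.

```cmake
# Hypothetical consumer project; the exact exported target name
# provided by DGMConfig.cmake is assumed here as DGM::DGM.
cmake_minimum_required(VERSION 3.5)
project(my_segmentation_app)

find_package(DGM REQUIRED)            # locates DGMConfig.cmake

add_executable(my_app main.cpp)
target_link_libraries(my_app DGM::DGM)
```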
The next version of the DGM library will make use of fast parallel computing on the Graphics Processing Units (GPUs) of DirectX-capable graphics cards. The current version of the DGM library supports the Parallel Patterns Library (PPL), which allows for parallel multi-core computing on the Central Processing Unit (CPU). The Accelerated Massive Parallelism (AMP) library takes advantage of the data-parallel hardware that is commonly present as a GPU on a discrete graphics card. In fact, the internal mechanisms of the AMP library may decide where to execute the C++ code: on the CPU or on the GPU.
AMP vs. PPL
We performed one benchmark, using the task of sparse-coding dictionary learning. The bottleneck of this function is the algorithm calculating the matrix product. By parallelizing this algorithm, we parallelize 90%-95% of the processing inside the dictionary-learning function. Our implementation of the matrix product may be found here.
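The row-wise decomposition behind that bottleneck can be sketched as below. This is a portable illustration using std::thread, not the actual DGM implementation (which uses PPL and AMP): the output rows are split into chunks and each chunk is computed by its own thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Row-parallel matrix product C = A * B.
// A is n x k, B is k x m; rows of C are distributed over nThreads threads.
Mat parallelMatMul(const Mat& A, const Mat& B,
                   unsigned nThreads = std::thread::hardware_concurrency())
{
    const std::size_t n = A.size();
    const std::size_t k = B.size();
    const std::size_t m = B.empty() ? 0 : B[0].size();
    Mat C(n, std::vector<double>(m, 0.0));
    if (!nThreads) nThreads = 1;

    // Each worker computes a contiguous block of output rows.
    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            for (std::size_t p = 0; p < k; ++p)      // i-p-j order: cache-friendly
                for (std::size_t j = 0; j < m; ++j)
                    C[i][j] += A[i][p] * B[p][j];
    };

    std::vector<std::thread> pool;
    const std::size_t chunk = (n + nThreads - 1) / nThreads;
    for (std::size_t t = 0; t < nThreads && t * chunk < n; ++t)
        pool.emplace_back(worker, t * chunk, std::min(n, (t + 1) * chunk));
    for (auto& th : pool)
        th.join();
    return C;
}
```

Because the row blocks are disjoint, no synchronization is needed beyond the final join; this is the same embarrassingly parallel structure that PPL's and AMP's parallel loops exploit.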
Our test was performed on two systems:
- Intel® Core™ i7-4820K @ 3.70 GHz + NVIDIA GeForce GTX 780
- Intel® Xeon® X5450 @ 3.00 GHz + NVIDIA GeForce 210
The speed-up when using the PPL and AMP libraries on the first system may be seen in the following figure:
Here, 1.00 corresponds to the time needed for 2 iterations of the training algorithm (2180 seconds). Accordingly, for AMP | GPU this time is 15.21 times smaller (142 seconds).
The speed-up when using the PPL and AMP libraries on the second system may be seen in the figure:
Here, the AMP library decided to run the code on the CPU instead of the GPU. This led to a significant performance drop. Again, 1.00 corresponds to the time needed for 2 iterations of the training algorithm (4437 seconds on the X5450). Parallel computing on the CPU with the PPL library took 3.65 times less time, while parallel computing on the CPU with the AMP library took even more time than the sequential run: 4819 seconds.
Instead of a conclusion
On systems with powerful graphics cards, please build the DGM library with the ENABLE_AMP option in CMake. Otherwise, please use only the ENABLE_PPL option.
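For example, the configure step could look like this (a sketch; the build-directory layout is assumed):

```shell
# Machine with a capable discrete GPU: enable the AMP code paths
cmake -DENABLE_AMP=ON ..

# Machine without a suitable GPU: stick to multi-core CPU parallelism
cmake -DENABLE_PPL=ON ..
```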