Rough Rice (Chart 4) has reached a 3-year low (Chart 5) in terms of the short positions of commercials. This observation is based on the most recent Commitments of Traders (CoT) report and indicates that the price is more likely to rise. The net positions of the commercials also support this conclusion: they have reached a bullish area compared to the last 52 weeks (Chart 6). Considering these two facts, rising prices are likely in the next few weeks or months. Here are the relevant charts:
This news item was generated with the help of the DGM library.
The post Weekly CoT News appeared first on Project X Research.
We are glad to present the next big release of DGM, v.1.6.0, which wraps up the v.1.5.x line with further improvements and bug fixes. This is the first cross-platform release: from now on, the DGM library is also available for macOS. The binaries built for macOS High Sierra (OS X 10.13) are now available for download. See the changelog for details.
DGM is a C++ library which extends the popular OpenCV library by implementing various tasks in probabilistic graphical models with Conditional Random Fields. In particular, the DGM learning units include:

The library also ships with advanced feature extraction and visualization modules. The demo code can be run directly after installation and may serve as a base for user projects.
The post Conditional Random Fields say Hello to macOS appeared first on Project X Research.
The VaihingenDL dataset contains aerial images of the village of Vaihingen in Germany, together with corresponding digital surface models (DSMs) and two ground-truth images: one for the base layer and one for the occlusion layer.
Base layer                                 Occlusion layer

class 0    Road                            class 0    Void
class 1    Traffic island (asphalt)        class 1    Tree
class 2    Sidewalk                        class 2    Car
class 3    House                           class 3    Bridge
class 4    Grass
class 5    Agriculture
class 6    Water
class 7    Sealed
class 8    Traffic island (vegetation)
class 9    Beach
class 10   Railway
The VaihingenDL dataset can be used to test image segmentation, feature extraction, classification approaches, etc., especially for occluded areas. Since it provides two layers of reference labels, the occluded areas of the scenes are also covered with ground-truth labels. The two label layers can be used with the multi-layer CRF classification framework, which is part of the Direct Graphical Models library.
Copyright of the images in the VaihingenDL dataset fully belongs to their owners. In no event shall the owners be liable for any incidents or damages caused by the direct or indirect usage of the images. The dataset may only be used for non-commercial research and/or educational purposes.
The VaihingenDL dataset can be downloaded from this link: VaihingenDL.rar (61MB). If you use the dataset in your publications, please cite it using this BibTeX file.
The post Vaihingen Double Layer Dataset (VaihingenDL) is released appeared first on Project X Research.
The EMDS dataset contains environmental microorganism (EM) images downloaded from the Internet, together with corresponding binary ground-truth images. In total there are 21 classes of EMs. Each class is represented by 20 EM images with corresponding binary ground-truth bitmaps. In the ground-truth images, EMs are marked in white (value: 255) and the background in black (value: 0).
class 1   Actinophrys   class 8    Paramecium    class 15   Keratella Quadrala
class 2   Arcella       class 9    Rotifera      class 16   Euglena
class 3   Aspidisca     class 10   Vorticella    class 17   Gymnodinium
class 4   Codosiga      class 11   Noctiluca     class 18   Gonyaulax
class 5   Colpoda       class 12   Ceratium      class 19   Phacus
class 6   Epistylis     class 13   Stentor       class 20   Stylongchia
class 7   Euglypha      class 14   Siprostomum   class 21   Synchaeta
The EMDS dataset can be used to test image segmentation, feature extraction, classification approaches, etc. Copyright of the images in the dataset fully belongs to their owners. In no event shall the owners be liable for any incidents or damages caused by the direct or indirect usage of the dataset. The dataset may only be used for non-commercial research and/or educational purposes.
The EMDS dataset can be downloaded from this link: EMDS4.rar (110MB). If you use the dataset in your publications, please cite it using this BibTeX file.
The post Environmental Microorganism Dataset (EMDS) has been just released appeared first on Project X Research.
The K-nearest neighbours classifier (KNN) is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. Thus, the KNN approach is among the simplest of all discriminative approaches, yet this classifier is still especially effective for low-dimensional feature spaces. However, applying the KNN model in practice is problematic because of its slow performance for large datasets represented in high-dimensional feature spaces and for large numbers of neighbours K. In this article we address exactly this problem of the KNN model.
The input of the KNN algorithm consists of the K closest training samples in the feature space, and the output is a class label l. An observation (or testing sample) y is classified by a majority vote of its neighbours: the observation is labelled with the class most common among its K nearest neighbours (see figure below, centre). In the case K = 1, the class of the single nearest neighbour is simply assigned to the observation y.
In order to estimate the potentials, we consider the class of every neighbour as a vote for the most likely class of the observation. If K_l of the neighbours have class l, we can define the probability of the association potentials as (see figure above, right):

p(x = l | y) = K_l / K
It can be useful to weight the contributions of the neighbours, so that nearer neighbours contribute more to the vote than more distant ones. For example, a common weighting scheme gives each neighbour a weight of 1 / r, where r is the distance to that neighbour. For our weighting scheme we modify this idea as follows: let r be the Euclidean distance from the test sample to the nearest training sample in feature space, and r_i the Euclidean distance to the i-th found neighbour. Then we can rewrite the previous equation with a weighting coefficient:

p(x = l | y) = (1 / K) · Σ_i (r / r_i)² · 1_l,

where 1_l equals 1 if the class of the i-th training sample is l and 0 otherwise.
The search algorithm usually aims to find exactly K nearest neighbours. However, distant neighbours hardly affect the probability p(x = l | y). For example, the nearest neighbour with r_i = r contributes the value 1 / K to the probability, while a neighbour twice as distant from the testing sample (r_i = 2r) contributes only 1 / (4K). For optimization purposes we therefore stop the search once the distance from the test sample to the next nearest neighbour exceeds 2r. Thus, only K′ ≤ K neighbours in the area enclosed between two spheroids of radii r and 2r are considered (see figure below) and weighted according to the equation p(x = l | y) = K_l / K′.
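The classification rule above can be sketched compactly. The snippet below is a minimal illustration, not DGM's actual implementation: it uses a linear scan instead of a KD-tree, and the (r / r_i)² weight is our reading of the weighting scheme described above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One labelled training sample in feature space.
struct Sample {
    std::vector<float> features;
    int label;
};

static float euclidean(const std::vector<float> &a, const std::vector<float> &b) {
    float sum = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Weighted K-nearest-neighbours vote with the r .. 2r cut-off described above:
// neighbours further than twice the distance r to the nearest sample are
// discarded, and every kept neighbour i votes with weight (r / r_i)^2.
// Assumes a non-empty training set; returns the winning class label.
int classifyKNN(const std::vector<Sample> &train,
                const std::vector<float> &query,
                size_t K, int nClasses) {
    // Distances to all training samples (linear scan for clarity).
    std::vector<std::pair<float, int>> dist;  // (r_i, label)
    for (const Sample &s : train)
        dist.emplace_back(euclidean(s.features, query), s.label);
    const size_t kEff = std::min(K, dist.size());
    std::partial_sort(dist.begin(), dist.begin() + kEff, dist.end());

    const float r = dist.front().first;  // distance to the nearest neighbour
    std::vector<float> votes(nClasses, 0.0f);
    for (size_t i = 0; i < kEff; ++i) {
        const float ri = dist[i].first;
        if (r > 0.0f && ri > 2.0f * r) break;  // outside the 2r spheroid: stop
        // Degenerate case r == 0 (exact match) falls back to a plain vote.
        const float w = (r > 0.0f) ? (r / ri) * (r / ri) : 1.0f;
        votes[dist[i].second] += w;
    }
    return static_cast<int>(std::max_element(votes.begin(), votes.end()) -
                            votes.begin());
}
```

For example, with training samples {0.0, 0.1} of class 0 and {1.0} of class 1, a query at 0.05 keeps only the two class-0 neighbours (the third lies outside the 2r spheroid) and is labelled 0.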
The neighbors are taken from a set of objects for which the class is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. A peculiarity of the KNN algorithm is that it is sensitive to the local structure of the data.
Our implementation of the KNN model in the DGM C++ library is based on the KD-tree data structure, which is used to store points in k-dimensional space. The leaves of the KD-tree store feature vectors with their corresponding ground truth, and every such feature vector is stored in one and only one leaf. Tree nodes correspond to axis-oriented splits of the space. Each split divides the space and the dataset into two distinct parts. Subsequent splits from the root node down to one of the leaves remove parts of the dataset until only a small part of it (a single feature vector) is left.
KD-trees allow efficient "K nearest neighbours of a query point" searches. For a fixed number of dimensions k and a dataset of N training samples, the time complexity of building a KD-tree is O(N · log N), and finding the K nearest neighbours is close to O(K · log N). However, the efficiency decreases as the dimensionality k grows, and in high-dimensional spaces KD-trees give no performance advantage over a naive O(N) linear search.
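A toy version of this data structure can illustrate both operations. The sketch below is not DGM's implementation: it builds a KD-tree by median splits and performs an exact 1-nearest-neighbour search with the standard splitting-plane pruning.

```cpp
#include <algorithm>
#include <limits>
#include <memory>
#include <vector>

using Point = std::vector<float>;

// One node of a toy KD-tree: an axis-aligned split at a stored point.
struct KDNode {
    Point point;
    int axis;
    std::unique_ptr<KDNode> left, right;
};

static float sqDist(const Point &a, const Point &b) {
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) { float d = a[i] - b[i]; s += d * d; }
    return s;
}

// Build the tree by recursively splitting the point set at the median
// along one coordinate axis per level (the axis cycles with the depth).
std::unique_ptr<KDNode> buildKD(std::vector<Point> pts, int depth = 0) {
    if (pts.empty()) return nullptr;
    const int axis = depth % static_cast<int>(pts[0].size());
    const size_t mid = pts.size() / 2;
    std::nth_element(pts.begin(), pts.begin() + mid, pts.end(),
                     [axis](const Point &a, const Point &b) { return a[axis] < b[axis]; });
    auto node = std::make_unique<KDNode>();
    node->point = pts[mid];
    node->axis = axis;
    node->left = buildKD({pts.begin(), pts.begin() + mid}, depth + 1);
    node->right = buildKD({pts.begin() + mid + 1, pts.end()}, depth + 1);
    return node;
}

// Exact 1-nearest-neighbour search: descend towards the query first, then
// unwind and visit the far subtree only if the splitting plane is closer
// than the best distance found so far.
static void nearestKD(const KDNode *node, const Point &q, Point &best, float &bestSq) {
    if (!node) return;
    const float d = sqDist(node->point, q);
    if (d < bestSq) { bestSq = d; best = node->point; }
    const float delta = q[node->axis] - node->point[node->axis];
    const KDNode *near = delta <= 0 ? node->left.get() : node->right.get();
    const KDNode *far  = delta <= 0 ? node->right.get() : node->left.get();
    nearestKD(near, q, best, bestSq);
    if (delta * delta < bestSq)  // the far side may still hide a closer point
        nearestKD(far, q, best, bestSq);
}

Point nearest(const KDNode *root, const Point &q) {
    Point best;
    float bestSq = std::numeric_limits<float>::max();
    nearestKD(root, q, best, bestSq);
    return best;
}
```

Extending the search to K neighbours replaces the single best candidate with a bounded max-heap of the K best distances, which is the variant the complexity figures above refer to.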
In order to evaluate the performance of our KNN model, we perform a number of experiments: the 2r-KNN, 4r-KNN, 8r-KNN, 16r-KNN and 32r-KNN models, where only the nearest neighbours enclosed between two spheroids of radii r and 2r (4r, 8r, 16r and 32r, respectively) are taken into account. In the ∞r-KNN experiment all K neighbours were considered. Finally, the KNN experiment uses the OpenCV implementation of KNN (CvKNN), based on linear search. The overall accuracies and timings for all 7 experiments are given in the table below:
                 2r-KNN     4r-KNN     8r-KNN     16r-KNN    32r-KNN    ∞r-KNN     CvKNN

Training:        4659 sec   4659 sec   4659 sec   4659 sec   4659 sec   4659 sec   102 sec
Classification:  8.3 sec    22.2 sec   52.8 sec   97.2 sec   134.9 sec  216.1 sec  45.3 sec
Accuracy:        81.39 %    81.65 %    81.97 %    82.11 %    82.33 %    82.42 %    82.36 %
Accuracies and timings for an Intel® Core™ i7-4820K CPU at 3.70 GHz, required for training on 1016 scenes and classification of 1 scene.
Our 2r-KNN model gives almost the same overall accuracy as the reference KNN model, but needs almost 5.5 times less classification time. The training time of the xr-KNN models, which includes building the KD-tree, takes 78 minutes, which is much slower than the 1.7 minutes for KNN training. However, in practical applications training is performed only once and can be done offline, whereas the classification time is more critical for the performance of the whole classification engine. In the table above we can also observe an almost linear increase of the classification time as the outer spheroid radius grows to 4r, 8r, etc. The figure below shows the classification results for the experiments 2r-KNN – ∞r-KNN.
The post Efficient K-Nearest Neighbours appeared first on Project X Research.
As we can observe from the figure above, the generative models (Bayes and Gaussian mixtures) try to reproduce the original distributions. In order to do this precisely, a method would need to remember all the samples from the Green Field dataset – 160'000 parameters. Or, in general, restricting ourselves to 8-bit features, a method would need to remember k·256^m values, where k is the number of categories and m is the number of features. The main idea of the generative models is to rebuild the original distribution using far fewer parameters and thereby generalize the model to samples that were not observed during training. The Bayes model approximates the distribution using only k·256·m parameters, and the Gaussian mixture model – k·G·m·(m+1) parameters, where G is the number of Gaussians in the mixture.
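These parameter counts are easy to check. In the snippet below the values k = 8 classes, m = 2 features and G = 16 Gaussians are hypothetical, chosen only to make the comparison concrete:

```cpp
#include <cstdint>

// Number of values a model must store for k classes and m 8-bit features.

// Full joint histogram: k * 256^m values.
std::uint64_t naiveCount(std::uint64_t k, std::uint64_t m) {
    std::uint64_t bins = 1;
    for (std::uint64_t i = 0; i < m; ++i) bins *= 256;
    return k * bins;
}

// Bayes model: one 256-bin histogram per class and feature, k * 256 * m.
std::uint64_t bayesCount(std::uint64_t k, std::uint64_t m) {
    return k * 256 * m;
}

// Gaussian mixture model: k * G * m * (m + 1) parameters.
std::uint64_t gmmCount(std::uint64_t k, std::uint64_t G, std::uint64_t m) {
    return k * G * m * (m + 1);
}
```

With these hypothetical values the full joint histogram needs 524'288 values, the Bayes model 4'096 and the Gaussian mixture model 768, which illustrates the compression the generative approximations achieve.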
As opposed to the generative models, the discriminative models (neural networks, random forests, support vector machines and k-nearest neighbours) do not approximate the original distributions, but provide direct predictions for all testing samples. This grants the discriminative models more generalization power: in the areas where hardly any training samples were observed (bottom-left and top-right corners of the initial distribution image in the figure), all the generative models show black areas with almost zero potentials, while all the discriminative models show a high confidence about the class labels for these areas.
The post Training Statistical Models appeared first on Project X Research.
Brent oil, gasoline and wheat (SR) have reached a 3-year high in terms of the short positions of commercials. This is an extreme, and it indicates that the price is more likely to fall. This observation is supported by the net positions of the commercials, which have reached a bearish area compared to the last 52 weeks. Taking these two facts together, falling prices are likely in the next few weeks or even months.
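One common way to make "a bearish area compared to the last 52 weeks" precise is a COT-index-style normalization of the commercials' net positions over the trailing window. The sketch below is an assumption about the methodology, not something the post specifies:

```cpp
#include <algorithm>
#include <vector>

// Normalizes the latest weekly net position of the commercials against
// its 52-week range: 0 marks the most bearish reading of the window,
// 100 the most bullish one. Assumes a non-empty window.
double cotIndex(const std::vector<double> &weeklyNet) {
    const double lo = *std::min_element(weeklyNet.begin(), weeklyNet.end());
    const double hi = *std::max_element(weeklyNet.begin(), weeklyNet.end());
    if (hi == lo) return 50.0;  // flat window: call it neutral
    return 100.0 * (weeklyNet.back() - lo) / (hi - lo);
}
```

Readings near 0 would then correspond to the bearish extreme of the window, and readings near 100 to the bullish one.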
This news item was generated using the DGM library.
The post Time to Sell Brent Oil, Gasoline or Wheat appeared first on Project X Research.
Platinum has reached a 3-year low in terms of the short positions of commercials. This is an extreme, and it indicates that the price is more likely to rise. This observation is supported by the net positions of the commercials, which have reached a bullish area compared to the last 52 weeks. Taking these two facts together, rising prices are likely in the next few weeks or even months.
The post Time to buy platinum! appeared first on Project X Research.
The DGM library v.1.5.3 has been released and from now on uses unit testing based on the Google Test framework.
The post DGM library v.1.5.3 has been just released appeared first on Project X Research.
The post DGM library v.1.5.2 has been just released appeared first on Project X Research.