Wednesday, 11 January 2017

Machine Learning for Trading (Part 4) - Decision Trees

Decision tree learning uses a decision tree as a predictive model which maps observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).

It is one of the predictive modelling approaches used in statistics, data mining and machine learning.

Decision trees where the target variable can take a finite set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees.

Tools are available (for example, the scikit-learn Python library) to generate decision trees efficiently.
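As a rough illustration of what that looks like with scikit-learn, the sketch below fits a small classification tree and prints its rules. The four observations, the feature names indicator_a and indicator_b, and the buy/sell labels are all made up for the example; only DecisionTreeClassifier and export_text come from the library.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up observations: each row is [indicator_a, indicator_b].
X = [[20, 0.1], [25, 0.9], [75, 0.2], [80, 0.8]]
# Made-up class labels, one per row.
y = ["buy", "buy", "sell", "sell"]

# Fit a classification tree; random_state fixes tie-breaking between
# equally good splits so the result is repeatable.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# Print the learned tests as indented text.
print(export_text(tree, feature_names=["indicator_a", "indicator_b"]))

# Classify a new, unseen observation.
prediction = tree.predict([[30, 0.5]])[0]
print(prediction)
```

With data this small the tree only needs a single test on indicator_a, because that column alone separates the two classes.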

The trees are generally binary trees, i.e. each node has 2 branches. A "decision" at each node causes the search to proceed down either the left or right "branch", and so onto the next node, etc. until a "leaf" node is reached.

The decisions are basically tests about the data set. For example, suppose we collected data from various technical indicators: RSI, STOCHASTICS, MVA, etc. A tree might be constructed something like the diagram below...


The above example is a classification tree, i.e. the leaf nodes are classes such as buy/sell/flat. It's also possible to construct trees whose leaf nodes hold a numeric value, e.g. a prediction of the price move. These are called regression trees.

By the way, the above tree is just for illustration purposes.

The tree doesn't have to be balanced (i.e. there is no need to have an equal number of nodes on both sub-branches), criteria can appear multiple times, not all criteria need to be used, etc. It's actually a very flexible approach, since the data can be very heterogeneous (numerical values, classes, yes/no, etc.)
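To make this concrete, here is a minimal hand-written sketch of such a tree and of the walk from the root down to a leaf. The indicator names, thresholds and labels are invented for illustration and are not a trading rule. Note that the tree is unbalanced, tests rsi twice, and never looks at mva.

```python
# A hand-written, purely illustrative tree. Internal nodes are dicts holding a
# test (feature <= threshold); leaves are plain strings (the class label).
# Indicator names and thresholds are invented for the example.
TREE = {
    "feature": "rsi", "threshold": 30,
    "left": "buy",                      # rsi <= 30: leaf reached immediately
    "right": {                          # rsi > 30: keep testing
        "feature": "stochastic", "threshold": 80,
        "left": "flat",
        "right": {
            "feature": "rsi", "threshold": 70,   # rsi is tested a second time
            "left": "flat",
            "right": "sell",
        },
    },
}

def classify(node, observation):
    """Walk from the root to a leaf, going left when the test passes."""
    while isinstance(node, dict):
        passed = observation[node["feature"]] <= node["threshold"]
        node = node["left"] if passed else node["right"]
    return node

# The "mva" value is present in the data but never used by this tree.
print(classify(TREE, {"rsi": 25, "stochastic": 50, "mva": 1.2}))
print(classify(TREE, {"rsi": 50, "stochastic": 40, "mva": 1.2}))
print(classify(TREE, {"rsi": 75, "stochastic": 90, "mva": 1.2}))
```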

The most efficient way to build the tree is to select, at each step, the criterion that provides the most information. For example, suppose we were using a decision tree to guess a playing card; a good first question is "is the card red?", as this instantly splits the possibilities in half.
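One common way to put a number on "most information" is entropy, measured in bits. The sketch below is an illustration for the playing-card case only, assuming a standard 52-card deck with every card equally likely; the function names are made up for the example. It compares how much three different yes/no questions narrow things down.

```python
import math

def entropy(n_outcomes):
    """Entropy in bits of n equally likely outcomes."""
    return math.log2(n_outcomes) if n_outcomes > 0 else 0.0

def information_gain(total, yes_count):
    """Expected bits gained by a yes/no question that is true
    for yes_count out of total equally likely cards."""
    no_count = total - yes_count
    remaining = ((yes_count / total) * entropy(yes_count)
                 + (no_count / total) * entropy(no_count))
    return entropy(total) - remaining

DECK = 52
gain_red = information_gain(DECK, 26)     # "is the card red?"
gain_spade = information_gain(DECK, 13)   # "is the card a spade?"
gain_single = information_gain(DECK, 1)   # "is it the ace of spades?"

print(gain_red, gain_spade, gain_single)
```

The even split gains about one bit, which is the most a single yes/no question can give; the more lopsided the split, the less is learned on average.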

Decision trees are prone to over-fitting, but this can be mitigated by using ensemble methods (e.g. training several different trees and taking a majority vote).
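The voting step itself is simple, as the sketch below shows. The three "trees" here are hypothetical one-test rules written by hand just to have something to vote with; in practice each tree would be trained, typically on a different sample of the data.

```python
from collections import Counter

# Three hypothetical, deliberately simple "trees" (one test each).
def tree_a(obs):
    return "buy" if obs["rsi"] <= 30 else "sell"

def tree_b(obs):
    return "buy" if obs["stochastic"] <= 20 else "sell"

def tree_c(obs):
    return "buy" if obs["price"] > obs["mva"] else "sell"

def majority_vote(trees, obs):
    """Ask every tree for its label and return the most common one."""
    votes = Counter(tree(obs) for tree in trees)
    return votes.most_common(1)[0][0]

# Made-up observation: tree_a and tree_c say "buy", tree_b says "sell".
obs = {"rsi": 28, "stochastic": 35, "price": 101.0, "mva": 100.0}
decision = majority_vote([tree_a, tree_b, tree_c], obs)
print(decision)
```

Using an odd number of trees avoids ties when there are only two classes.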

The process of generating a decision tree is also potentially useful in itself. For example, we can throw a heap of technical indicator data at the tree generation process, and see which indicators are the most important (i.e. the ones tested earliest, nearest the root). This information may be useful for further studies.
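A rough sketch of that idea with scikit-learn follows. The data is synthetic and the indicator names are placeholders: the label is constructed to depend only on the first column, so we know in advance which feature should come out on top. Note that scikit-learn's feature_importances_ measures how much each feature reduces impurity across the whole tree, which is related to, but not the same as, how early it is tested.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
names = ["rsi", "stochastic", "mva"]   # hypothetical indicator names

# Synthetic data: 500 rows of uniform random "indicator" values in [0, 1).
X = rng.random((500, 3))
# By construction the label depends only on the first column.
y = np.where(X[:, 0] > 0.5, "sell", "buy")

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Importances sum to 1; larger means the feature did more of the splitting work.
ranked = sorted(zip(names, tree.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")

# tree_.feature[0] is the index of the feature tested at the root node.
root_feature = names[tree.tree_.feature[0]]
print("root test uses:", root_feature)
```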
