INFORMATION GAIN IN DECISION TREES THROUGH THE GINI INDEX

Decision trees are supervised machine learning algorithms well suited to both classification and regression problems. They are constructed by repeatedly breaking the training data down into subsets whose output variables belong to the same class, applying a particular splitting condition at each node. Training thus organizes the information gained from the data into a hierarchical structure. The knowledge is held and displayed by this structure in such a way that it can easily be understood, even by non-experts.



While building a decision tree, each node focuses on identifying the attribute, and a split condition on that attribute, that minimizes the mixing of class labels. Perfectly homogeneous subsets are usually impossible to achieve, so the aim is to produce relatively pure subsets at each split.


In this blog, we will discuss the concepts of information gain, gain ratio, and the Gini index.


The classification process begins at the root node of the decision tree and expands by applying a splitting condition at each non-leaf node, dividing the dataset into progressively more homogeneous subsets.
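To make this concrete, here is a minimal sketch of growing such a tree with scikit-learn; the library choice and the bundled Iris dataset are our assumptions for illustration, not something this process requires:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree so the learned splitting conditions stay readable.
iris = load_iris()
clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Print the hierarchy of splitting conditions, one per non-leaf node.
print(export_text(clf, feature_names=list(iris.feature_names)))

The printed rules are exactly the hierarchical, human-readable structure described above.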


What is Information Gain in a Decision Tree?


Information gain is based on information theory.

Information gain is used to determine the best features/attributes, the ones that render the maximum information about a class. From the root node down to the leaf nodes, it follows the concept of entropy, aiming to decrease the level of entropy at each split. Information gain computes the difference between the entropy before a split and the (size-weighted) entropy after it, which quantifies how much the split reduces the impurity of the class elements.


Information Gain = Entropy before splitting - Entropy after splitting
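As a rough illustration, this formula can be coded in a few lines of Python. The entropy and information_gain helpers and the toy labels below are hypothetical, written for this post rather than taken from any library:

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy before the split minus the size-weighted entropy after it."""
    total = len(parent)
    after = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - after

parent = ["yes"] * 5 + ["no"] * 5
left = ["yes", "yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no", "no"]
print(information_gain(parent, [left, right]))  # about 0.278 bits

Note that the "entropy after splitting" is the average entropy of the child subsets weighted by their sizes, which is what the code computes.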


What is Gain Ratio?

Gain Ratio, or Uncertainty Coefficient, was proposed by John Ross Quinlan to normalize the information gain of an attribute against how much entropy that attribute's split itself holds. The formula for gain ratio is given below:


Gain Ratio = Information Gain / Split Information (the entropy of the split on that attribute)
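Continuing the sketch above (and reusing its entropy and information_gain helpers, which are our own hypothetical names), the normalization can be expressed as:

def split_info(children):
    """Entropy of the split itself: how evenly the attribute partitions the data."""
    total = sum(len(c) for c in children)
    return -sum((len(c) / total) * math.log2(len(c) / total) for c in children)

def gain_ratio(parent, children):
    """Information gain normalized by the split's own entropy, as in C4.5."""
    return information_gain(parent, children) / split_info(children)

print(gain_ratio(parent, [left, right]))  # 0.278 / 1.0 here, since the split is even

Dividing by the split's own entropy penalizes attributes that fragment the data into many small subsets, which plain information gain tends to favor.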


What is Gini Index?


The Gini index, also called the Gini impurity (a variation of the Gini coefficient), computes the probability that a specific element would be wrongly classified if it were chosen at random. Since it treats each outcome as either a “success” or a “failure”, it conducts binary splitting only, and it works on categorical variables.


The Gini index ranges from 0 to 1.


Here, 0 depicts purity: only one class exists, or all the elements belong to a single class. A Gini index of 1 signifies that the elements are randomly distributed across many classes. A value of 0.5 denotes elements uniformly distributed over two classes.
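A minimal sketch (the gini helper is ours, written for illustration) reproduces these boundary values:

from collections import Counter

def gini(labels):
    """Gini impurity: the chance a randomly drawn element is misclassified."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

print(gini(["a"] * 6))             # 0.0  -> pure node: a single class
print(gini(["a", "b"] * 3))        # 0.5  -> two classes, uniformly mixed
print(gini(["a", "b", "c", "d"]))  # 0.75 -> approaches 1 as classes multiply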


Conclusion


Because of their visual representation and interpretability, decision trees are often observed to be very easy to understand. They can also handle high-dimensional data with good accuracy. I hope this article has helped your understanding of the basics of decision trees in the context of entropy, information gain, gain ratio, and the Gini index.
