Abstract:
As the demand for more effective and efficient
strategies for mining massive amounts of data
grows, practitioners and researchers are trying to
develop scalable machine learning algorithms and
strategies that turn mountains of data into
nuggets. High dimensionality significantly
increases memory and storage requirements and
computational costs.
Therefore, reducing dimensionality can substantially
improve learning performance. Feature selection, a
data preprocessing technique, is an effective and
efficient way to enhance data mining, data analytics,
and machine learning. Most feature selection
algorithms focus on eliminating irrelevant features.
However, removing only irrelevant features is not
enough to obtain the best insights and patterns.
Not only irrelevant
features but also redundant features can degrade
learning performance. Feature selection methods
that can eliminate both irrelevant and redundant
features are in demand for high-dimensional data
analytics. To address this problem, this work
presents a feature selection method based on
information gain.