dc.description.abstract |
Car insurance companies face a major challenge in dealing with insurance claims,
which are prone to fraud and increasing in volume. This makes it difficult for insurers
to classify claims during the review process. To address this issue, this study develops
four car insurance claim prediction classifiers using Random Forest (RF) and Logistic
Regression (LR) on a car insurance claim dataset, and compares which method and which
attributes are more suitable for car insurance companies. First, the proposed system
builds a feature selection model with the Variance Threshold Selector method to select
the important attributes and examine their impact on the accuracy of the car insurance
claim prediction classifiers. The dataset is randomly split into a training set (80%) and
a testing set (20%). For the two classifiers that use all attributes, the training set is
used directly to build the LR and RF classifiers. For the two classifiers that use feature
selection, the system creates new training and testing sets by removing low-variance
attributes with the Variance Threshold Selector method, and then builds an LR classifier
and an RF classifier on these reduced datasets. To choose the number of attributes and
identify the important ones, the system evaluates attribute counts of 30, 32, 34, 36, 38,
40 and 42, repeating each experiment 10 times because the training and testing sets are
split randomly. Finally, the system compares the classifiers using two metrics: accuracy
and F-score. RF classifiers, both with and without feature selection, are more suitable
for the proposed system than LR classifiers. Among the attribute counts tested, the
classifiers based on 38 and 40 attributes perform best, and the classifier based on 42
attributes is the second best. |
en_US |
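The evaluation protocol described in the abstract can be sketched as follows. This is a minimal, standard-library illustration, not the authors' implementation. It assumes the following:

- Low-variance attributes are removed by comparing each column's population variance against a threshold.
- The threshold is fitted on the training split only and then applied to both splits.
- Results are averaged over 10 random 80/20 splits.

The `fit_predict` hook and all function names here are hypothetical. In practice the hook would wrap an RF or LR model. In scikit-learn, for example, it could be `lambda Xtr, ytr, Xte: RandomForestClassifier().fit(Xtr, ytr).predict(Xte)`, and `VarianceThreshold`, `train_test_split`, `accuracy_score` and `f1_score` cover the remaining steps.

```python
import random
from statistics import pvariance


def variance_threshold_select(rows, threshold=0.0):
    """Return indices of columns whose population variance exceeds `threshold`.

    Columns that barely vary across samples carry little information and are
    dropped (assumption: the same criterion as a variance-threshold selector).
    """
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if pvariance([row[j] for row in rows]) > threshold]


def project(rows, keep):
    """Keep only the selected column indices in every row."""
    return [[row[j] for j in keep] for row in rows]


def split_80_20(X, y, seed):
    """Shuffle row indices with a fixed seed and split 80% train / 20% test."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_score(y_true, y_pred, positive=1):
    """Binary F1 score (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def repeated_evaluation(X, y, fit_predict, runs=10, threshold=None):
    """Average accuracy and F-score over `runs` random 80/20 splits.

    `fit_predict(X_train, y_train, X_test)` is a hypothetical caller-supplied
    hook that trains a classifier (e.g. RF or LR) and returns test predictions.
    If `threshold` is given, low-variance columns are chosen from the training
    split only and the same columns are removed from the testing split.
    """
    accs, f_scores = [], []
    for seed in range(runs):
        X_tr, y_tr, X_te, y_te = split_80_20(X, y, seed)
        if threshold is not None:
            keep = variance_threshold_select(X_tr, threshold)
            X_tr, X_te = project(X_tr, keep), project(X_te, keep)
        y_hat = fit_predict(X_tr, y_tr, X_te)
        accs.append(accuracy(y_te, y_hat))
        f_scores.append(f_score(y_te, y_hat))
    return sum(accs) / runs, sum(f_scores) / runs
```

Under this assumption, different attribute counts (such as the 30 to 42 attributes compared in the abstract) would be obtained by varying `threshold`. Fitting the selector on the training split alone keeps information from the testing split out of the feature selection step.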