An intelligent hybrid model for identity document classification
Digitization, i.e., the process of converting information into a digital format, may provide various opportunities (e.g., increase in productivity, disaster recovery, and environmentally friendly solutions) and challenges for businesses. In this context, one of the main challenges would be to accurately classify numerous scanned documents uploaded every day by customers as usual business processes. For example, processes in banking (e.g., applying for loans) or the Government Registry of BDM (Births, Deaths, and Marriages) applications may involve uploading several documents such as a driver’s license and passport. There are not many studies available to address the challenge as an application of image classification. Although some studies are available which used various methods, a more accurate model is still required. The current study has proposed a robust fusion model to define the type of identity documents accurately. The proposed approach is based on two different methods in which images are classified based on their visual features and text features. A novel model based on statistics and regression has been proposed to calculate the confidence level for the feature-based classifier. A fuzzy-mean fusion model has been proposed to combine the classifier results based on their confidence score. The proposed approach has been implemented using Python and experimentally validated on synthetic and real-world datasets. The performance of the proposed model is evaluated using the Receiver Operating Characteristic (ROC) curve analysis.