Existing methods cannot effectively detect approximately duplicate records in massive data sets. To address this, an algorithm based on entropy feature selection grouping clustering (FSGC) is proposed. The basic idea is to construct an entropy metric based on the similarity between objects, use it to evaluate the importance of each property, and thereby obtain a subset of key properties. The data set is then split into smaller data sets according to the key properties, and approximately duplicate records are identified within each of them using density-based spatial clustering of applications with noise (DBSCAN). Theoretical analysis and experimental results show that the method achieves high identification precision and detection efficiency, and that it can effectively solve the problem of identifying approximately duplicate records in massive data sets.
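
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; every concrete choice in it is an assumption made for illustration: the string similarity (`difflib` ratio), the entropy formula (binary entropy summed over pairwise similarities), the importance criterion (the property whose removal raises the entropy the most is taken as the key), grouping by exact key value, the hand-written DBSCAN, and the hypothetical names `select_key`, `find_duplicates`, `eps`, and `min_pts`.

```python
import math
from collections import defaultdict
from difflib import SequenceMatcher


def sim(a, b):
    """String similarity in [0, 1]; difflib's ratio is an assumed stand-in."""
    return SequenceMatcher(None, a, b).ratio()


def record_sim(r1, r2, fields):
    """Mean similarity of two records over the given fields."""
    return sum(sim(r1[f], r2[f]) for f in fields) / len(fields)


def entropy(records, fields):
    """Sum of binary entropies of all pairwise record similarities (assumed form)."""
    total = 0.0
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            s = record_sim(records[i], records[j], fields)
            s = min(max(s, 1e-9), 1 - 1e-9)  # keep log() finite
            total -= s * math.log(s) + (1 - s) * math.log(1 - s)
    return total


def select_key(records, fields):
    """Assumed criterion: the key is the field whose removal raises entropy most."""
    def entropy_without(f):
        return entropy(records, [g for g in fields if g != f])
    return max(fields, key=entropy_without)


def dbscan(points, dist, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


def find_duplicates(records, fields, key, eps=0.2, min_pts=2):
    """Group records by exact key value, then cluster each group with DBSCAN."""
    others = [f for f in fields if f != key]  # assumes at least one non-key field
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec[key]].append(idx)

    def dist(a, b):
        return 1.0 - record_sim(records[a], records[b], others)

    clusters = []
    for members in groups.values():
        labels = dbscan(members, dist, eps, min_pts)
        by_label = defaultdict(list)
        for idx, label in zip(members, labels):
            if label != -1:
                by_label[label].append(idx)
        clusters.extend(sorted(c) for c in by_label.values())
    return sorted(clusters)


# Made-up sample records, for illustration only.
records = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jon Smith", "city": "Boston"},
    {"name": "Mary Jones", "city": "Boston"},
    {"name": "Alice Brown", "city": "Denver"},
    {"name": "Alice Browne", "city": "Denver"},
]
fields = ["name", "city"]
# The key is fixed by hand here so that the demonstration is predictable.
print(find_duplicates(records, fields, key="city"))  # [[0, 1], [3, 4]]
```

In a full run the key would come from `select_key(records, fields)` rather than being fixed by hand. Grouping first means that pairwise distances are computed only inside each small group, which is where the efficiency gain described above would be expected to arise.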