Hadoop itself is not well suited to processing massive numbers of small files. Moreover, current data de-duplication methods rely mainly on the binary characteristics of a file, so they cannot recognize the same song after signal processing, nor can they meet the requirements of online processing of massive data. This paper presents a de-duplication storage architecture for massive collections of MP3 files based on acoustic fingerprints. By combining the acoustic characteristics of music files with the meta-information of MP3 files, and by using index-based de-duplication, online merging, and NAF, the architecture effectively solves the memory bottleneck caused by too many small files while also providing a better de-duplication effect. Offline merging and the replica placement module continually optimize storage according to the operating conditions of the system. Experimental results show that the architecture achieves a good balance among performance, de-duplication rate, manageability, and scalability.
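
The idea of index-based de-duplication combined with online merging of small files can be illustrated with a minimal sketch. It is not the architecture described in this paper: the class `DedupStore`, its methods, and the `container_limit` parameter are hypothetical names introduced for illustration, the containers are held in memory rather than in HDFS, and the acoustic fingerprint is assumed to be computed elsewhere and passed in as an opaque string.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """Where a stored song lives inside a merged container (hypothetical layout)."""
    container_id: int
    offset: int
    length: int


@dataclass
class DedupStore:
    """Toy de-duplicating store: one index entry per fingerprint, many songs per container."""
    container_limit: int = 64 * 1024 * 1024  # assumed size cap per merged container
    index: dict = field(default_factory=dict)  # fingerprint -> Location
    containers: list = field(default_factory=lambda: [bytearray()])

    def put(self, fingerprint, data):
        """Store data unless its fingerprint is already indexed.

        Returns (location, stored), where stored is False for a duplicate.
        """
        known = self.index.get(fingerprint)
        if known is not None:
            # Same fingerprint: treated as the same song even if the bytes differ.
            return known, False

        current = self.containers[-1]
        # Start a new container when the current one would overflow.
        if current and len(current) + len(data) > self.container_limit:
            current = bytearray()
            self.containers.append(current)

        location = Location(len(self.containers) - 1, len(current), len(data))
        current.extend(data)  # merge the small file into the large container
        self.index[fingerprint] = location
        return location, True

    def get(self, fingerprint):
        """Read a song back from its container by fingerprint."""
        loc = self.index[fingerprint]
        buf = self.containers[loc.container_id]
        return bytes(buf[loc.offset:loc.offset + loc.length])
```

In this sketch, two encodings of the same song that share a fingerprint occupy storage only once, and many small files share a few large containers, which is the property that relieves per-file metadata pressure in a system such as Hadoop.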