As transistor budgets and interconnect wire delays grow, the tiled multi-core processor has become a new direction for multi-core processor design. To study this new type of processor thoroughly and explore its design space, this paper designs and implements a user-level performance simulator for the tiled chip multiprocessor (CMP) architecture. The simulator adopts a directory-based cache coherence protocol and a store-and-forward Network-on-Chip, with the Godson-2 CPU as the processing core model, and models in detail the out-of-order handling of requests and responses, conflicts between requests, and their timing characteristics. By running sequential or parallel workloads, the simulator can be used to evaluate important performance characteristics of the tiled CMP architecture, and thus provides a fast, flexible, and efficient platform for multi-core processor architecture design.
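
To give a feel for the timing behaviour a store-and-forward Network-on-Chip implies, the following is a minimal sketch and not the simulator's actual model. It assumes a 2D mesh of tiles with row-major tile numbering, dimension-ordered (XY) routing, one flit per cycle per link, and no contention; the function names and the `router_delay` parameter are hypothetical and chosen only for illustration.

```python
def tile_coords(tile_id, mesh_width):
    """Map a row-major tile id to (x, y) mesh coordinates (assumed layout)."""
    return tile_id % mesh_width, tile_id // mesh_width


def hop_count(src, dst, mesh_width):
    """Number of links traversed under XY routing (Manhattan distance)."""
    sx, sy = tile_coords(src, mesh_width)
    dx, dy = tile_coords(dst, mesh_width)
    return abs(sx - dx) + abs(sy - dy)


def store_and_forward_latency(src, dst, mesh_width, packet_flits, router_delay=1):
    """Contention-free latency in cycles for one packet.

    In store-and-forward switching a router must receive the whole packet
    before forwarding it, so every hop pays the full serialization time
    (packet_flits cycles) plus the assumed per-router delay.
    """
    hops = hop_count(src, dst, mesh_width)
    return hops * (router_delay + packet_flits)


if __name__ == "__main__":
    # Corner-to-corner transfer of a 5-flit packet on a hypothetical 4x4 mesh.
    print(store_and_forward_latency(0, 15, mesh_width=4, packet_flits=5))
```

Under these assumptions latency scales with the product of hop count and packet length, which is one reason a detailed timing model of requests and responses matters for tiled designs.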