Writing test scores of the New HSK, like those of any other language proficiency test, are especially vulnerable to criticism on grounds of reliability. This study collected writing samples (n = 89) from two mock writing tests of the New HSK and analyzed them from the perspective of Generalizability Theory. Variance components were estimated for the effects of items, raters, and rating speeds, and the Phi coefficient was estimated for various settings of the test. The major findings are as follows: (a) under the current test setting, the Phi coefficients of the three item types, in descending order, are those for ordering inner-sentence components, writing based on given keywords, and writing based on a given photo; (b) to keep Phi at or above .80 for each item type, the number of ordering items needs to increase to 20, while the other two item types need to increase by 2 and 3 items, respectively; (c) with the current allocation of items across item types, if the composite writing score is calculated with weights proportional to the raw scores, the Phi coefficient for the writing test only marginally reaches .74. The study explored various approaches to reaching a Phi coefficient of at least .85 at relatively low cost (for details, see Section 3.3.2 of this paper), using the Solver function of Microsoft Excel; (d) the study found no significant effect of rating speed. However, this conclusion is limited to the two speeds investigated: each rater's comfortable speed, and a speed at which each rater felt slightly rushed but remained confident about the quality of his or her ratings. The effect of rating speed needs to be investigated with more rigorous designs. The authors also call for more attention to reliability issues in writing tests.
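To illustrate how a decision (D) study projects Phi as the number of items grows, as in finding (b), the sketch below assumes a fully crossed persons × items × raters (p × i × r) design and omits the rating-speed facet. The variance components are hypothetical values chosen only for illustration; they are not the estimates reported in this study. The names `VarianceComponents`, `phi`, and `min_items_for_phi` are likewise illustrative. Phi is computed as the universe-score (person) variance divided by itself plus the absolute error variance, where every non-person component is divided by the number of item and rater conditions it is averaged over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VarianceComponents:
    """G-study variance components for a fully crossed p x i x r design.

    All values used below are hypothetical, for illustration only.
    """
    p: float    # persons (universe-score variance)
    i: float    # items
    r: float    # raters
    pi: float   # person x item interaction
    pr: float   # person x rater interaction
    ir: float   # item x rater interaction
    pir: float  # person x item x rater interaction, confounded with residual error


def absolute_error_variance(vc: VarianceComponents, n_i: int, n_r: int) -> float:
    """sigma^2(Delta): all non-person components, each divided by the
    number of conditions of the facets it involves (D-study sample sizes)."""
    return (
        vc.i / n_i
        + vc.r / n_r
        + vc.pi / n_i
        + vc.pr / n_r
        + vc.ir / (n_i * n_r)
        + vc.pir / (n_i * n_r)
    )


def phi(vc: VarianceComponents, n_i: int, n_r: int) -> float:
    """Dependability (Phi) coefficient for n_i items and n_r raters."""
    return vc.p / (vc.p + absolute_error_variance(vc, n_i, n_r))


def min_items_for_phi(vc: VarianceComponents, target: float, n_r: int,
                      max_items: int = 100) -> Optional[int]:
    """Smallest number of items reaching the target Phi with n_r raters,
    or None if the target is not reached within max_items."""
    for n_i in range(1, max_items + 1):
        if phi(vc, n_i, n_r) >= target:
            return n_i
    return None


# Hypothetical components (not the study's estimates).
vc = VarianceComponents(p=0.50, i=0.05, r=0.02, pi=0.30, pr=0.03, ir=0.01, pir=0.40)

if __name__ == "__main__":
    for n_items in (1, 5, 10, 20):
        print(n_items, round(phi(vc, n_items, n_r=2), 3))
    print("items needed for Phi >= .80:", min_items_for_phi(vc, target=0.80, n_r=2))
```

A linear search over item counts is used because Phi increases monotonically with the number of items in this design, so the first count that meets the target is the minimum. Extending the sketch to a weighted composite across item types, as in finding (c), would require multivariate G-theory covariance components, which this sketch does not model.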