Friday, 23 September 2011

HSK Proficiency and Literacy

I have talked before about my interest in pursuing the HSK (Hanyu Shuiping Kaoshi, the standardised Chinese proficiency test) as a long-term goal. The idea of a ranked certificate showcasing my Chinese ability appealed to me as a way to set a definite milestone along the practically endless road of learning Chinese characters. I will probably never be able to say I speak Chinese fluently, but if I pass HSK level 3 or 4, I will at least have that to hang my hat on.

Recently, though, I decided to test just how much reading ability that level of mastery would actually get me. Using the character lists I found here, I coded up a little program that takes a web page and highlights the characters included up to a given HSK level.
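A minimal sketch of that kind of highlighter is below. It is not the original program: it works on plain text rather than fetching and parsing a web page, and the `HSK_LEVELS` table and `highlight` function are hypothetical, with a handful of placeholder entries standing in for the real character lists.

```python
import html

# Hypothetical lookup table: character -> lowest HSK level at which it appears.
# These few entries are illustrative placeholders, not the official lists.
HSK_LEVELS = {"中": 1, "国": 1, "的": 1, "路": 2}


def highlight(text: str, max_level: int) -> str:
    """Return HTML in which every character at or below max_level is wrapped
    in a <span> whose class records its level (e.g. class="hsk2"); all other
    characters, including unlisted hanzi, stay as plain escaped text."""
    parts = []
    for ch in text:
        escaped = html.escape(ch)
        level = HSK_LEVELS.get(ch)
        if level is not None and level <= max_level:
            parts.append(f'<span class="hsk{level}">{escaped}</span>')
        else:
            parts.append(escaped)
    return "".join(parts)


print(highlight("中国的铁路", 1))
# <span class="hsk1">中</span><span class="hsk1">国</span><span class="hsk1">的</span>铁路
```

Tagging each character with its level as a class, rather than a fixed colour, leaves the choice of shading to a stylesheet, so the same output can be viewed at any level cut-off.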

The resulting program was actually pretty interesting to play around with. Trying it on different websites with different types of content let me see how much I would be able to read after learning a given number of characters. For example, here is a section from the Chinese Wikipedia article on railroads (chosen as an example of a page with fairly straightforward content), with only the level 1 vocabulary (176 characters) highlighted in pink:

That should make it fairly clear that at HSK level 1, one remains quite illiterate (as I can testify from experience!). Now here is the same text with HSK levels 1-4 highlighted (1067 characters, the most I ever expect to learn). In this and the following image, the shades of pink get progressively lighter according to the level of the character (1, 2, 3, or 4 in this case):
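One way to get progressively lighter shades is to blend a base pink linearly toward white as the level rises. The sketch below assumes the characters carry hypothetical CSS classes `hsk1`, `hsk2`, and so on, and that the highlighting is done through text colour (so unrecognised characters stay black); the base colour and the 0.75 cap on the blend are arbitrary illustrative choices.

```python
def shade(level: int, max_level: int, base=(255, 20, 147)) -> str:
    """Return a CSS hex colour for an HSK level: level 1 gets the full base
    pink, and each higher level is blended further toward white."""
    # Blend fraction: 0.0 for level 1, rising linearly to 0.75 at max_level,
    # so even the highest level stays visibly pink rather than pure white.
    t = 0.75 * (level - 1) / max(max_level - 1, 1)
    r, g, b = (round(c + (255 - c) * t) for c in base)
    return f"#{r:02x}{g:02x}{b:02x}"


def stylesheet(max_level: int) -> str:
    """Build one CSS rule per level for elements classed hsk1, hsk2, ..."""
    return "\n".join(
        f".hsk{lvl} {{ color: {shade(lvl, max_level)}; }}"
        for lvl in range(1, max_level + 1)
    )


print(stylesheet(4))
```

Because the blend is computed from `max_level`, the same function covers both the four-level and six-level views.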

Finally, here is what a reader who has completed all six HSK levels (2631 characters in total) would know. Again, the lightest characters are those of the highest level; the black ones are those that a reader would still not recognise even after learning the entire HSK list:

It is said that at level 4 one knows enough characters to recognise 90% of those in typical Chinese text, and that at level 6 the figure rises to 98%. Viewing texts this way puts those numbers in more practical terms: a level 4 reader can get through a text, but only with many trips to the dictionary, which makes anything longer than half a page quite a chore.
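Percentages like these count every occurrence of a character, which is why they sound more comfortable than the highlighted pages look: in a 500-character passage, 90% coverage still leaves about 50 unrecognised characters, roughly one in every ten. The sketch below computes both the token coverage and the number of distinct unknown characters, assuming a hypothetical `levels` dictionary mapping each character to its HSK level.

```python
def is_hanzi(ch: str) -> bool:
    # Basic CJK Unified Ideographs block only; rarer extension blocks are ignored.
    return "\u4e00" <= ch <= "\u9fff"


def coverage(text: str, levels: dict, max_level: int) -> tuple:
    """Return (fraction of hanzi occurrences known at max_level,
    number of distinct hanzi not known at max_level).

    Occurrences are counted with repetition, so a common character such as
    的 counts once per appearance. Returns (0.0, 0) if the text has no hanzi."""
    hanzi = [ch for ch in text if is_hanzi(ch)]
    if not hanzi:
        return 0.0, 0

    def known(ch: str) -> bool:
        level = levels.get(ch)  # characters missing from the table count as unknown
        return level is not None and level <= max_level

    known_count = sum(1 for ch in hanzi if known(ch))
    unknown_types = {ch for ch in hanzi if not known(ch)}
    return known_count / len(hanzi), len(unknown_types)


# Placeholder levels, for illustration only.
sample_levels = {"中": 1, "国": 1, "的": 1, "路": 2}
print(coverage("中国的铁路", sample_levels, 1))  # (0.6, 2)
```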

At level 6, reading is much more fluid, though by no means perfect. Dictionary trips are nonetheless rare enough that one should be able to read real texts, even long ones, when motivated enough to do so. In the example above, learning 轨 (gui3, 'rail') alone would eliminate half of the black characters remaining in the text.
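A single character can have that much effect because it repeats: in a text about railways, 轨 turns up again and again. Ranking the unrecognised characters by how often they occur shows which one is most worth learning next. This is a sketch assuming a hypothetical `levels` dictionary from character to HSK level; the sample string and its level are placeholders.

```python
from collections import Counter


def unknown_by_frequency(text: str, levels: dict, max_level: int) -> list:
    """List the hanzi not known at max_level, most frequent first, as
    (character, count) pairs."""

    def is_unknown(ch: str) -> bool:
        level = levels.get(ch)  # characters missing from the table count as unknown
        return level is None or level > max_level

    counts = Counter(
        ch for ch in text if "\u4e00" <= ch <= "\u9fff" and is_unknown(ch)
    )
    return counts.most_common()


ranked = unknown_by_frequency("轨道，轨距，轨", {"道": 1}, 6)
print(ranked)  # [('轨', 3), ('距', 1)]
# Learning the top character removes 3 of the 4 unknown occurrences here.
```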

This is probably why the HSK only tests up to this level: once one has attained this level of literacy, the remaining 1500-odd characters that an adult Chinese person knows can be picked up in the wild, in the course of immersing oneself in the Chinese language, rather than through further classroom learning.

I thought I would share these findings because they offer a useful visual illustration, even for someone who cannot read any Chinese, of what knowing a certain number of characters actually gets you.

Posted by jon at 7:30 AM in Languages 
For our aim is not that the voice be exercised, but that it exercise us.