How many Chinese characters does an ordinary person need to know to master Chinese? A multi-angle look at character literacy
How many characters does the average Chinese person know?
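Translation summary
Foreigners get a headache the moment they hear the rumor that learning Chinese means mastering 50,000 characters! So how many characters does the average Chinese person, as a native speaker, actually know?
Translated text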
Tony Yau
Some foreigner friends of mine think that Chinese characters are "impossible" to master. What? 50000 characters?
什么玩意儿?五万个汉字?我的一些外国朋友认为掌握那么多的汉字是“根本不可能”完成的任务。
The truth is, as other answers have stated, there are only 3500 common characters. As an average Chinese speaker, I had mastered all common Chinese characters before I graduated from primary school, in terms of writing and reading. (You can't imagine how hard those Hong Kong students compete with each other in their examinations...)
事实是,正如其他回答所说的那样,常用汉字只有3500个左右。作为一个普通的中文使用者,我在小学毕业前就已经掌握了全部常用汉字,无论是读还是写。(你可能无法想象香港(特区)的学生在考试中彼此竞争有多激烈……)
I have seldom used a Chinese dictionary since I was 12. I have seldom met any unknown Chinese characters since I was 12.
从12岁起,我就很少再使用汉语词典了;从12岁起,我也几乎没再遇到过不认识的汉字。
Now as you have noticed, the learning curve between English and Chinese is quite different. In my humble opinion:
现在你应该注意到了,英语和中文的学习曲线非常不同。依我浅见:
PS: A foreign friend on Facebook just asked me: English has ten times as many common words, so are 3500 characters really enough for daily communication among Chinese speakers? I am not a linguist, but my simple (and shallow) answer is that it is more than enough. Unlike English, Chinese binds characters together to form words. For instance: horse (馬) + car (車) = carriage (馬車), fire (火) + car (車) = train (火車), etc. We use "old" characters to form "new" words. That is basically how modern Chinese works.
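The compounding idea in the PS can be sketched in a few lines of Python. This is only an illustration: the first two words come from the answer itself, while the other entries and the helper name `distinct_characters` are assumptions added for this sketch.

```python
# Toy word list. 馬車 and 火車 are the answer's own examples;
# the remaining compounds are common words added for illustration.
COMPOUNDS = {
    "馬車": "carriage",    # horse + car
    "火車": "train",       # fire + car
    "火山": "volcano",     # fire + mountain
    "山水": "landscape",   # mountain + water
    "水車": "waterwheel",  # water + car
    "山火": "wildfire",    # mountain + fire
}


def distinct_characters(words):
    """Return the set of individual characters used across all words."""
    return {ch for word in words for ch in word}


chars = distinct_characters(COMPOUNDS)
print(f"{len(COMPOUNDS)} words are built from only {len(chars)} characters")
```

Even in this tiny sample the number of words is larger than the number of characters, and the gap widens quickly as more characters become available for pairing.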
Translated comments
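I hope my answer won't upset anyone. I say this up front because I have to confess: as a Chinese person who uses characters every day, I think the Chinese character system "underperforms". Please bear with me and read my answer through.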
How many characters does one need to learn to read efficiently? I agree with everyone on the number: statistically, 2500 characters cover 97% of the characters in all reading material. It’s pretty simple and straightforward. But that number does little to explain why the Chinese character sucks and why the Chinese themselves keep debating the reform of their writing system. My analysis starts with the following question: why did our ancestors create over 50000 characters, not counting the 40000 variant characters, if 2500 characters can almost do the job? Let me first present a figure that summarizes a piece of the history of Chinese characters.
一个人到底需要掌握多少汉字才能高效阅读?我同意大家的说法:统计上,2500个汉字就能覆盖97%的阅读材料中的字符。这个数字看起来很简单直接。但它可能并不能解释为什么汉字“表现欠佳”,以及为什么中国人自己一直在争论是否要改革书写系统。我的分析从以下问题开始:如果2500个字几乎就能应付一切,那我们的祖先为什么要创造出超过五万个汉字(还不包括四万个异体字)呢?下面这张图总结了一部分汉字历史:
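If you look at the data, you will see that the total number of characters rose slowly for more than 3300 years and then, within just 100 years, fell sharply to 3500 in order to record the spoken language, tracing a "/\" shape. If you take the time to understand what this madness means, you probably won't blame me for "bashing" Chinese characters.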
The figure above is based on the oracle bone scripts discovered and the major Chinese dictionaries compiled over the past 3500 years. It basically tells a story most people don’t know: that the Old Chinese character is the worst invention, second to none, of the Chinese civilization.
这张图基于甲骨文的发现和过去3500年主要汉语词典的数据。它讲述了一个大多数人不知道的故事:未改进前的远古汉字可能是历史上“最糟”的发明,没有之一。
Around 1400 BC, about the same time the Greeks invented their alphabet, the Chinese had their oracle bone scripts, the beginning of today’s characters. By the time of Confucius (~500 BC), about 100 years before Plato’s, the total number of Chinese characters, excluding synonyms, was about 4500, estimated from the data. Remember, this means there were only about the same number of words, because for most characters it was one character for one word. The consequence? Confucius’ work, after a total of 40 years of thinking, teaching, traveling and debating, is reflected in the great Analects, a book of less than 16000 characters with a total of 1355 different characters (words). In comparison, Plato’s work, which has now all been translated into Chinese, has about 3,500,000 characters. Even downsized by 10 times, it is still 20 times more than the Analects. Does it tell us anything about the characters?
公元前1400年左右,希腊人发明了字母表的同时,中国人也有了甲骨文——也就是今天汉字的起源。到孔子时代(约公元前500年,比柏拉图早约100年),不计同义字,汉字总数估计约为4500个。要知道,那时基本是一字一词。结果呢?孔子花了40年思考、教学、游历、辩论,其思想结晶《论语》全书不到16000字,仅用了1355个不同的字(词)。相比之下,柏拉图的著作如今全部译成中文后约有35万字——即使按比例缩减十倍,仍是《论语》的20倍。这难道不能说明些什么吗?
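The comparison above can be checked with rough arithmetic. This sketch assumes the figure for Plato's translated works is meant to be 3,500,000 characters (the number as printed is garbled, and this is the reading under which the "20 times" claim holds), and it takes the Analects figures of 16,000 characters and 1,355 distinct characters as quoted.

```latex
\[
\frac{3{,}500{,}000 / 10}{16{,}000} = \frac{350{,}000}{16{,}000} \approx 21.9,
\qquad
\frac{16{,}000}{1{,}355} \approx 11.8
\]
```

Under that assumption, one tenth of the Plato corpus is still about 22 times the length of the Analects, and each distinct character in the Analects is used about 12 times on average.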
What about Mohism, the only school of the Chinese classics at Confucius time that had developed knowledge of logics, physics, and other natural sciences? There were only about 10000 characters of his work related to the above subjects, and nobody continued to work on those ideas. Why? Nobody knows exactly but there is a truth going with it: Until the first century, there was no such a character representing being (to be) exactly. Can anyone imagine how to develop scientific thinking and formal logic without such words?
再说墨家——孔子时代唯一发展出逻辑学、物理学等自然科学的学派。相关著作总共只有约一万字,之后无人继续研究。为什么?没人确切知道,但有一个事实相伴而生:直到公元一世纪,汉语中还没有一个字能准确表达“存在”(to be)这个概念。你能想象没有“是”这类词,如何发展科学思维和形式逻辑吗?
In the past 1500 years Chinese scholars have spent countless time to read and study classics along with the growth of the number of Chinese characters till 20th century. The 13 Classics, regarded as the core work of Confucianism, were composed of about 650,000 characters in a total, but received all kinds of analysis and critiques composed of more than 320,000,000 characters, reaching a ratio of 1:400~500. Why did they do so? Because there were too many wrong characters and words and sentences without definition and notes. Why? Because people using ideographic writing initially didn’t need to define anything just like that an artist never gives definition on his work. When one was educated thru reading character written essays his mind would be inevitably wired up that way.
过去1500年里,中国学者花费无数时间研读经典,伴随汉字数量持续增长直至20世纪。儒家核心典籍“十三经”共约65万字,而历代对它们的注释、评析却多达3.2亿字,比例高达1:400~500。他们为何这样做?因为古文中错字、歧义句、无定义词太多。为什么?因为早期使用表意文字的人并不需要明确定义——就像艺术家从不给自己的作品下定义一样。长期阅读这种文字,人的思维方式也会被潜移默化地塑造。
To make the story shorter, let me jump to 1900 the time China started its modernization. It was the time that the Chinese began to understand that writing could be done in two ways: to write what one says, or to write what one means. The Chinese had been doing it in the second way without knowing there was another way. It was a shock, and many educated started to realize how bad the characters had dragged them down. “Let’s write what we say with our hands!” cried out so many Chinese. Of course, many disagreed and criticized those advocates as “traitors” for they wanted to change what their ancestors had created. It continues to this day.
为节省篇幅,我直接跳到1900年——中国开始现代化的时代。那时中国人首次意识到,写作有两种方式:一是“写你说的话”,二是“写你想表达的意思”。中国人一直采用后者,却不知还有前者。这一发现令人震惊,许多知识分子意识到汉字严重拖累了国家发展。“让我们用手写下我们说的话!”无数人高呼。当然,也有很多人反对,指责这些倡导者是“背叛祖先”的叛徒。这场争论至今未息。
But, the “New Culture Movement” was welcomed from bottom up and eventually became a top down reform that classic writing was discarded in school education and Chinese characters were for the first time used as sound characters to record what one was saying, however, with thousands of characters. It’s still difficult but a lot easier than the old way of writing. Different phonetic systems had been created to help standardize the pronunciations of the characters until Pinyin was officially chosen as the only one. Now, everyone uses it to type in computer, and it has made our writing so much easier, and programing too. Chinese literature has taken a huge surge since the reform.
但“新文化运动”自下而上兴起,最终成为自上而下的改革:文言文从学校主流教育体系中被逐出,汉字首次被当作记音符号来记录口语——尽管仍需掌握数千个字。虽然仍难,但比过去容易多了。各种拼音方案相继出现,最终“拼音”被官方选定为唯一标准。如今人人都用拼音打字,写作和编程都变得轻松许多,中国文学也因此迎来大爆发。
I can give a number of issues with using characters. The discussion on characters is only in adults who have been trained with the characters and become used to them after years of effort made to adapt to the system. If we let kids try to learn characters and alphabets, and comparing the merits and drawbacks of each, we will know the answer much better. Unfortunately they are not asked and even not allowed to complain.
我能举出很多汉字使用中的问题。关于汉字的讨论,往往只发生在那些经过多年训练、已习惯这套系统的成年人之间。如果我们让孩子们同时学习汉字和字母文字,并比较各自的优劣,答案会更清晰。可惜,他们既没被问过,也不被允许抱怨。
If a people do not have a writing system, they can’t develop their culture continuously generation after generation; If a people have a writing system that can accurately record what they say, they have a good chance to continue culture development; and, If a people have a poor writing system, they are still better off than having nothing but less efficient. In the past 3500 years, how much more extra work have the Chinese spent on fighting with the difficulty of learning and understanding character writing comparing with those who have written in alphabetic systems? How potentially is their thinking twisted even with their hard work thru using it? These are the questions way more important than the topic question.
一个民族若没有文字系统,就无法代代延续文化;若有能准确记录口语的文字系统,就有很大机会持续发展文化;而若有套低效的文字系统,虽比没有强,但效率低下。在过去3500年里,中国人为了克服汉字学习和理解的困难,比使用字母文字的民族多付出了多少额外努力?他们的思维方式又在多大程度上因这套系统而变形?这些问题远比“认识多少字”重要得多。
The last but not the least: my answer is NOT trying to discourage anyone to learn Chinese but a reminder to those who believe in the superiority of Chinese characters. Good luck on your learning characters!
最后强调:我的回答并非劝退任何人学习中文,而是提醒那些盲目相信“汉字优越论”的人。祝你学习顺利!
Chun Kai Lau According to the list of most frequently used characters on 粵語審音配詞字庫, the frequency of use of a particular character drops below 100 around the 30xxth most frequently used characters. So I'd assume an average Chinese person has a good grasp of 3000 something characters due to constant exposures.
根据“粤语审音配词字库”提供的高频字表,大约到第3000多个常用字时,单字使用频率已降至100次以下。因此我推测,普通中国人因长期接触,大概能熟练掌握3000多个汉字。
While it's tempting to say you'd have no problem reading Chinese newspaper after mastering the 3000 something characters, the reality is much more intricate than that.
虽然有人以为掌握这3000多字就能无障碍阅读中文报纸,但现实要复杂得多。
Many often incorrectly assume an equivalence between English words and Chinese characters since each Chinese character intrinsically conveys meanings, as opposed to the Roman letters which are strictly building blocks of words. But the truth is, in modern Chinese, seldom are we using monosyllabic words only(words consisting of a single character), rather there're plentiful of polysyllabic words(words consisting of multiple characters) in use. So Chinese characters are somewhere in between letters and words, leaning more towards letters indeed.
很多人错误地将英文单词与汉字等同,认为每个汉字自带完整含义,不像拉丁字母只是构词部件。但事实上,现代汉语极少单独使用单音节词(单字词),大量使用的是多音节词(多字词)。因此,汉字其实介于字母和单词之间,更偏向字母。
Chi Xu Here I have a file named “List of Common Chinese characters by frequency”, which sorts the most common 3500 Chinese characters by their occurrences in multiple sources and areas. Based on their data these 3500 characters would cover 99.82% of all the texts.
我有一份名为《现代汉语常用字频度表》的文件,按多个来源和地区的使用频率排序了最常见的3500个汉字。数据显示,这3500字可覆盖99.82%的文本。
If you look at the list I mentioned above, I can try to translate some characters at different frequencies so that people will have a general idea about the 3500 characters.
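如果你看一下我上面提到的列表,我可以尝试以不同的频率翻译一些字符,以便人们对这3500个字符有一个大致的了解。

Rank   Character   Meaning
------ ----------- ------------------------------------------------------------
50     作          do; become
98     定          fix; definitely (一定); appointment (约定)
99     见          see
100    两          two (colloquial); unit of weight (50 g)
499    古          ancient
500    远          far
999    录          record
1000   港          port, harbor
1498   泼          splash
1499   祸          disaster; car accident (车祸)
1500   刊          publish
1998   刮          blow (of wind); scrape
2000   锡          tin (metal)
2498   芹          celery
2499   姥          maternal grandmother
2500   馋          greedy for food
2996   缕          wisp (of smoke, thought, etc.)
2997   屉          drawer; steamer tray
2998   砚          inkstone
2999   楔          wedge
3000   腻          greasy; fed up
3496   捌          formal numeral for "eight"
3497   檩          roof beam
3498   柒          formal numeral for "seven"
3499   瞭          watch from a height (瞭望)
3500   啰          long-winded (啰嗦); underling (喽啰)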
To me the transition from high usage to low usage happens around the 2500th character. After all, mastering 3500 characters would be enough for daily use. Here is the link to the list: http://www.doc88.com/p-002207023112.html
对我而言,高频到低频的转折点大约在第2500字左右。掌握3500字足以应对日常生活。附链接。
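Coverage figures like the "3500 characters cover 99.82% of all texts" quoted above come from counting character frequencies in a corpus. The sketch below is a minimal illustration of that calculation, not the method behind the cited list; the function names and the toy sample string are assumptions made for this example.

```python
from collections import Counter


def coverage_by_rank(text):
    """Cumulative share of the text covered by the top-k most frequent characters.

    Whitespace is ignored; everything else (including punctuation) is counted.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    running = 0
    coverage = []
    for _, n in counts.most_common():
        running += n
        coverage.append(running / total)
    return coverage


def chars_needed(text, target):
    """Smallest k such that the k most frequent characters cover `target` of the text."""
    for k, share in enumerate(coverage_by_rank(text), start=1):
        if share >= target:
            return k
    return None


# Toy corpus: 的 x4, 一 x3, 是 x2, 不 x1 (10 characters in total).
sample = "的的的的一一一是是不"
print(chars_needed(sample, 0.85))
```

Run on a real corpus, `coverage_by_rank` would give the kind of curve the answers describe: a steep rise over the first few hundred characters and a long flat tail after a few thousand.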
Anonymous The problem is that it is very hard to define what qualifies as ‘knowing’ a character. One can know the meaning and not the pronunciation or vice versa. Most characters also have multiple and frequently, unrelated meanings. Chinese ‘words’ are formed from combinations of characters based on one of the meanings of each character. So how many of these pairings must one know to be said to know a character, or should these terms be counted as multiple words instead of one despite sharing the same character?
问题在于,“认识一个汉字”很难定义。你可能知道意思但不知道读音,或反之。大多数汉字还有多个甚至毫不相关的含义。中文“词”由汉字组合而成,每个字取其中一个义项。那么,要掌握多少种组合才算“认识”这个字?还是说这些应算作多个词?
Yvonne Liu I just really want to add that there's Classical Chinese which is way more concise (and therefore complicated) in both grammar and characters. This is why in the past Chinese people used to study for like ten years before they could actually write poems or articles. The current Chinese (白话文) is less concise and grammatically more similar to oral Chinese, and the characters are simplified (in mainland China) to make the language more accessible to people. Though recent Chinese literature is all in modern Chinese 白话文 (and many novels in the past like Four Classics 四大名著 are somewhere in the middle), you can't deny that the most valuable parts of Chinese literature in the past, such as historical texts, poems and essays, were all written in Classical Chinese. If you go a bit more in depth of the Chinese language and literature, those things are there and will always be there to study. The question asks for 'average Chinese people'. Classical Chinese is an important part of middle school and high school Chinese syllabus, and every Chinese high school graduate should know it to some extent. In that you can safely add another one to two thousand words to the 'average words' one needs to know.
我想补充一点:还有文言文,其语法和用字极为简练(也因此更难)。过去中国人需苦读十年才能写诗作文。如今的白话文更接近口语,大陆还推行简化字以普及教育。尽管当代文学全是白话文(四大名著等古典小说介于两者之间),但中国最珍贵的文献——史书、诗词、散文——皆用文言文写成。深入学习中文必然涉及这些内容。问题问的是“普通人”。文言文是中学语文必修内容,高中毕业生应有一定基础。因此,可安全地在3500字基础上再加1000–2000字。
All above points are just my understanding of the language. Please correct me if I'm wrong :)
以上仅为个人理解,欢迎指正。
I am going to summarize below the ideas of some of the answers to the question I mentioned above. According to the requirements of compulsory education in China, those who have finished middle school should know at least 3800 characters.
我将在下面总结一下我上面提到的问题的一些答案的观点。根据中国义务教育的要求,初中毕业生至少应该掌握3800个汉字。
An answerer designed a website for people to estimate the size of their vocabulary, to which the link is http://hanzi.sjz.io . The test isn’t necessarily accurate but it provides a reference to some extent. For instance, my vocabulary is said to be somewhere between 6500 and 7500.
一位答题者设计了一个网站,供人们估算自己的词汇量,附网址。这个测试不一定准确,但可以作为参考。例如,我的词汇量被测出来大约在6500到7500之间。
The conclusion based on this information and my bold guess is that an average person in China knows roughly 3500 to 4500 characters.
根据这些信息以及我的大胆猜测,可以得出结论:中国人的平均认知水平大约在3500到4500个汉字之间。
Terry蔡琳Thatcher Waltz
These days, the question isn't how many characters a person "knows" (what does that mean anyway, know them to write them by hand from memory? to type them correctly in context? know them to recognize them in print? In handwriting? In context or in isolation?) Chinese teaching needs to catch up with the reality that most native-speaking young people living in-country today come up blank on handwriting many, many characters, and usually refer to a cell phone when necessary. So it makes little sense to continue basing our classes for non-native speakers on the expectation that they will perfectly memorize and reproduce thousands of characters by hand.
如今,问题不再在于一个人“认识”多少汉字(这究竟意味着什么?是能凭记忆手写?还是能在语境中正确打字?抑或能识别印刷体?手写体?语境理解还是孤立理解?)。汉语教学需要正视这样一个现实:如今大多数生活在中国的以汉语为母语的年轻人,很多汉字都无法手写,必要时通常需要借助手机。因此,继续以期望非母语者能够完美记忆并手写数千个汉字为教学基础,是毫无意义的。
软糖魔法
Most people know about 4,000 Chinese characters. People with high education know 5,500~6500 Chinese characters. Probably less than 1% of people know more than 6,500 Chinese characters. If you want to test Chinese literacy, you can download the software I made. Chinese characters ranked 6500~13000 appear very infrequently in modern books, and only lovers of ancient languages will know them. Chinese characters ranked after 13,000 only appear sporadically in ancient texts, and I am afraid that ancient poets would not dare to say that they knew all of them. Some people say that ordinary people know 8000 Chinese characters, this is impossible.
大多数人认识约4000字。高学历者认识5500–6500字。不到1%的人认识超6500字。 第6500–13000位的字在现代书籍中极少见,仅古文爱好者知晓。 13000位之后的字仅零星见于古籍,恐怕古代诗人也不敢说全认识。有人说普通人认识8000个汉字,这不太可能。
Zier Liu A Chinese native speaker who graduated from ordinary high school has about 4000-5000 literacy. For native Chinese speakers, there are about 3,500 Chinese characters commonly used in daily life. The mastery of vocabulary varies from person to person. However, there are more than 50,000 words in the "Commonly Used Words List of Modern Chinese" published by the Ministry of Education.
一个普通高中毕业的汉语母语者大约掌握4000-5000个汉字。对于汉语母语者来说,日常生活中常用的汉字大约有3500个。词汇量的掌握程度因人而异。然而,教育部出版的《现代汉语常用词汇表》收录了超过5万个词汇。
You may be curious about these statistics. Why does Chinese require only a small vocabulary for daily communication? If you have read the Chinese and English translations of the same novel, you will find that the Chinese version is obviously much thinner than the English one. This is mainly due to a major feature of Chinese: conciseness. The reason is that a single Chinese word can carry more meaning than an English one.
你或许会对这个统计数据感到好奇。为什么汉语日常交流只需要少量词汇?如果你读过同一部小说的中英文译本,你会发现中文版明显比英文版简洁得多。这主要是因为汉语的一个主要特点——简洁。原因在于,一个汉语词汇可以表达比英语词汇更丰富的含义。
As the information carried by a single word is richer, the total number of words needed is naturally less, so it is more concise. The average amount of information per unit vocabulary is greater in Chinese than in English.
由于单个词语所承载的信息更丰富,所需的词语总数自然更少,因此语言也更简洁。汉语单位词汇的平均信息量比英语更大。
Anonymous
Um, as for the exact number, I'd say maybe several thousand, depending on educational background, but you don't need that many to read or speak the language. Chinese words are made up of combinations of characters, and the characters themselves are usually made up of parts called radicals. What you should remember is just the meanings of those radicals; then you can guess the meaning from experience.
嗯,具体数字因教育背景而异,但不必太多即可读写。中文词由字组合而成,字又由偏旁构成。记住偏旁含义,结合经验即可推测字义。
IMHO, it's quite usual for Chinese to guess meaning of a character or a word (though sometimes we get wrong) and what we need is just brief knowledge about how are word and characters made.
中国人常能推测字词含义(有时会错),关键在于了解构字组词规律。
Yuzu I will say the average number of Chinese characters that we know are between 2000-3000, but even with 1000 characters you can have a pretty fluent conversation or writing. My dad knows around 3000-4000 but he said that it's pretty hard not to forget, so you have to continuously write Chinese characters to not forget.
我想说,我们平均认识的汉字数量在2000到3000个之间,但即使只掌握1000个汉字,也能进行相当流利的对话或写作。我爸爸认识3000到4000个汉字,但他告诉我汉字很容易忘记,所以必须不断地书写才能避免遗忘。
From all the people I know, they know all around 2000-3000 characters, so yeah, this is pretty much the average number of Chinese characters that Chinese people knows, even though there is thousands and thousands characters in the Chinese language, you can have a very fluent Chinese already with 2000 characters.
据我所知,我认识的所有人大概都认识 2000-3000 个汉字,所以,是的,这差不多是中国人平均掌握的汉字数量了。虽然汉语有成千上万个汉字,但只要掌握 2000 个汉字,你就可以说一口流利的中文了。
Patrick Di This is the most popular dictionary for searching letters.
这是最流行的查“字母”字典。
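This is an English-English dictionary.
This is the most popular comprehensive dictionary that supports looking up words by word; there are other editions too, of course.
So Chinese characters are far fewer than Chinese words, but far more numerous than Latin letters. Which one does the asker mean: the average number of characters known, or of words known? I'm not sure.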
Andrew Yao I was told by my parents that grade school level students should have a 1,000 word vocabulary. "Word" as defined here are strokes and radicals that would form a single "block", "space" or "unit". The English equivalent would be the things that consist of letters in the alphabet, separated by spaces. Even at that level, they should able to pick up a newspaper, look at it, and be able to derive thoughts and concepts in their heads like any literate person would of any other written language. Alas, I'm not fluent and experienced enough to break it down into "Chinese characters", but that hopefully, that should still provide more insight.
父母告诉我,小学生应掌握约1000个“词”——这里“词”是指由笔画和偏旁组成的单个“块”“空格”或“单位”,相当于英文中由字母组成、以空格分隔的单位。即使如此,小学生们也应能读懂报纸并理解内容,如同任何其他识字者阅读母语文字一样。 可惜,我的汉语水平还不够,无法将这种“块”分解成单个“汉字”(意思类似把“啦”分成“口”+“拉”:,答主可能是个海外华人),但我希望这仍能为您提供一些新的见解。