Learning vocabulary in a new language
Saturday, March 29, 2008
There's a really good study on how much vocabulary you need to know in a language to understand most of it. Every language student should bookmark this page:
http://nflrc.hawaii.edu/RFL/April2005/chujo/chujo.html
English is the sample language here but the point is the same for other languages as well. Take a look at how much of a language's written material you can understand simply by learning the most frequent words first:
Table 1: Coverage and Standard Deviation with Varying Vocabulary Size
[Text Length = 1,000 / Sample Size = 4 / Iteration = 1,000]
Vocabulary Size | Coverage (%) | SD |
---|---|---|
100 | 53.1 | 1.60 |
200 | 60.1 | 1.63 |
300 | 63.9 | 1.67 |
400 | 66.8 | 1.69 |
500 | 69.4 | 1.68 |
600 | 71.2 | 1.68 |
700 | 72.9 | 1.60 |
800 | 74.2 | 1.66 |
900 | 75.5 | 1.62 |
1,000 | 76.8 | 1.61 |
2,000 | 84.2 | 1.35 |
3,000 | 87.9 | 1.23 |
4,000 | 90.4 | 1.08 |
5,000 | 92.0 | 1.00 |
6,000 | 93.1 | 0.87 |
7,000 | 94.0 | 0.77 |
8,000 | 94.7 | 0.77 |
9,000 | 95.2 | 0.69 |
10,000 | 95.7 | 0.72 |
11,000 | 96.0 | 0.61 |
12,000 | 96.3 | 0.58 |
13,000 | 96.6 | 0.55 |
14,000 | 96.9 | 0.51 |
That means that with the 100 most frequent words you can already read about half of the words in a text, with 900 words you can read 75%, and once you've reached 4,000 words you can understand 90%. There's a certain point you get to in a language where you start to be able to grasp the meaning of words you don't know simply from context. I don't remember at what level this begins, but I think it comes after a few thousand words. Once a person reaches this level, the only way to get fluent is through massive amounts of material, which means just chilling and reading for hours and hours a day.
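If you want to see what "coverage" means in practice, here is a minimal sketch in Python. It is not the method used in the study, just the basic idea: rank the words of some reference text by frequency, keep the top N, and check what percentage of the running words in another text fall inside that list. The function names (`tokenize`, `coverage`) and the toy sentences are my own assumptions for illustration, and the tokenizer is deliberately crude (lowercase ASCII letters and apostrophes only).

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(reference_tokens, target_tokens, vocab_size):
    """Percent of target tokens found among the vocab_size most
    frequent words of the reference text."""
    ranked = Counter(reference_tokens).most_common(vocab_size)
    top_words = {word for word, _ in ranked}
    if not target_tokens:
        return 0.0
    known = sum(1 for token in target_tokens if token in top_words)
    return 100.0 * known / len(target_tokens)


# Toy data (hypothetical); a real test needs a large reference corpus.
reference = tokenize("the cat sat on the mat and the dog sat on the rug")
target = tokenize("the dog sat on the mat")

for size in (1, 3, 8):
    print(size, round(coverage(reference, target, size), 1))
```

With a real frequency list and a long enough text, the numbers should climb quickly at first and then flatten out, which is the same shape as the table above.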
See the Wikipedia page on collocation for more on this subject. Only reading and hearing reams and reams of material will make you aware that a person is tall but mountains are high, that coffee is weak as opposed to thick, and how to use all the other words that might seem acceptable to a non-native speaker but really are not.
This is one reason why I like Ecclesiastes. It has about 3,000 words in total, and some 950 or so individual words to know once you understand the whole thing, plus lots of repetition and context. Going by the table above, anyone who has memorized the whole thing should be able to understand roughly 75% of the language, assuming most of those words are among the common ones. Not bad for a single book.
(Of course, that requires full memorization, not just skimming through a few times. That's the hard part.)
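Counting total words against individual words for any book is easy to do yourself. The sketch below is only an approximation under my own assumptions: `word_stats` is a made-up helper name, the sample line is a stand-in for a full text, and every inflected form is counted as a separate word (so "vanity" and "vanities" count as two), which will give a higher number than counting by dictionary headword.

```python
from collections import Counter
import re


def word_stats(text):
    """Return (total words, distinct words, per-word counts) for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return len(tokens), len(counts), counts


# Stand-in sample; replace with the full text of the book you are studying.
sample = "vanity of vanities, all is vanity"
total, distinct, counts = word_stats(sample)

print("total words:", total)
print("distinct words:", distinct)
# Average number of times each distinct word is repeated.
print("repetition:", round(total / distinct, 2))
```

The higher the repetition figure, the more often each word comes back in context, which is what makes a short text good for memorizing.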
3 comments:
I have added the link you provided to a Wikipedia article:
http://en.wikipedia.org/wiki/Keyword_%28linguistics%29
Cool, that's a good link. Bob Petry first provided it on Auxlang a few months back and I found it again the other day by chance.