The speaker shapes the language and the language shapes the speaker

Language changes. And an individual’s use of language changes, as they develop from child to teenager to adult. How might these two kinds of development be related?

Researchers at Stanford University have been looking at the question. They took as their data two discussion websites (both for the discussion of beer, though they assure us the topic wasn’t critical). A nice thing about these websites was that the researchers could download everything: all the discussions, with every contribution labeled with who made it. On both websites most contributions were in English, but, like any community, the community defined by each website developed a distinctive vocabulary and way of expressing itself. Each is a society in miniature. Both had been going for over ten years, and the researchers could look at how this shared language changed over time. They could also look at each individual’s language, from their first post, their ‘birth’ in the community, to their last, their ‘death’. Both communities had several thousand contributors, some long-lived and some short-lived, some talkative, contributing several hundred posts, others meek, contributing just once or only a few times.

The core method for the research was to identify the recurring expressions in each community, observing their rise and fall over the lifetime of the community, and also to observe the expressions used by each individual. The researchers could then plot how the two sets of patterns interacted. An example is the term used in one of the communities to refer to a beer’s smell. Two options were to use the word aroma, or the letter S (short for smell). The data show what the norm was at any point in time, how it changed, and which users used which option.
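To make the idea concrete, here is a minimal sketch in Python of how one might count the two smell labels, both across the community year by year and for each user. It is a toy illustration only, not the researchers’ actual pipeline: the posts, the user names, the `Label: text` layout and the function names `variant_of`, `community_share` and `user_usage` are all invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (user, year, review text). Not taken from the study.
POSTS = [
    ("ann", 2003, "Aroma: malty and sweet"),
    ("bob", 2003, "Aroma: light citrus"),
    ("ann", 2005, "Aroma: roasted coffee"),
    ("cat", 2005, "S: piney hops"),
    ("bob", 2007, "S: caramel"),
    ("cat", 2007, "S: grapefruit"),
]

VARIANTS = ("aroma", "s")  # the two competing labels for a beer's smell


def variant_of(text):
    """Return the smell label a post uses, or None if it uses neither."""
    label = text.split(":", 1)[0].strip().lower()
    return label if label in VARIANTS else None


def community_share(posts):
    """For each year, the fraction of labelled posts using each variant."""
    counts = defaultdict(Counter)
    for _user, year, text in posts:
        variant = variant_of(text)
        if variant is not None:
            counts[year][variant] += 1
    return {
        year: {v: c[v] / sum(c.values()) for v in VARIANTS}
        for year, c in sorted(counts.items())
    }


def user_usage(posts):
    """For each user, how many times they used each variant."""
    counts = defaultdict(Counter)
    for user, _year, text in posts:
        variant = variant_of(text)
        if variant is not None:
            counts[user][variant] += 1
    return {user: dict(c) for user, c in counts.items()}


if __name__ == "__main__":
    print(community_share(POSTS))
    print(user_usage(POSTS))
```

In this made-up data the community norm drifts from aroma to S, and the two tables together show who moved with the norm (bob) and who stayed with the older label (ann).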

When we join a new community, we do not know its distinctive vocabulary: if we want to become a fully integrated member of the community, we will need to learn and adopt it. Thus, new members’ early contributions were not ‘on the nail’ of the community’s norms, but over time they learnt the community’s vocabulary and fitted in better. This was their community ‘childhood’.

It often happened that, once they had fully learnt the language of the community, they started playing with it, contributing new expressions and being quick to adopt others’ innovations. So at this phase they were contributing to language change. This was their adolescence.

And then, typically a third of the way through their lifetime in the community, they tended to stick. They carried on using the terms that they and ‘their generation’ had coined, but were less inclined to adopt new terms, or to innovate themselves. This was their adulthood.

Strange to say, the division was into these three stages, with the peak of creativity one third of the way through a lifetime, whether the lifetime was one year long or ten.
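Comparing a one-year member with a ten-year member means putting both lifetimes on the same scale. A minimal sketch of that arithmetic, assuming lifetimes are measured in months between first and last post (the function name `life_fraction` and the choice of unit are assumptions for illustration, not details from the paper):

```python
from fractions import Fraction


def life_fraction(first_post, last_post, now):
    """Position of `now` within a member's lifetime: 0 at first post, 1 at last.

    All three arguments are month counts on a shared timeline (an assumption
    made for this sketch; any consistent unit would do).
    """
    span = last_post - first_post
    if span <= 0:
        raise ValueError("lifetime must span more than one point in time")
    return Fraction(now - first_post, span)


# A member active for one year, seen four months in ...
short_lived = life_fraction(first_post=0, last_post=12, now=4)
# ... and a member active for ten years, seen forty months in.
long_lived = life_fraction(first_post=0, last_post=120, now=40)

if __name__ == "__main__":
    print(short_lived, long_lived)
```

On this scale both members sit at the same point, one third of the way through, which is where the study found the turn from adolescence to adulthood.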

For the full paper, see

No country for old members: User lifecycle and linguistic change in online communities. Cristian Danescu-Niculescu-Mizil, Robert West, Dan Jurafsky, Jure Leskovec, Christopher Potts. Proceedings of WWW Conference, 2013. (Winner of best paper award.)


  1. Interesting paper, thanks for sharing. I just started exploring Slovene specialized forums from the perspective of knowledge transfer and it seems that there is a clear distinction between roles, e.g. novice/lay vs. semi-expert vs. expert/moderator, and these roles seem to correlate with language use, esp. the use of terminology and “specialized neologisms” such as S in the beer community. Apparently the time dimension should also be considered, as roles may change over time.

  2. That article is fascinating. I couldn’t help but look at it from a personal perspective. It’s interesting that our language doesn’t change much amongst our oldest friends, but we adapt in new “communities” and adopt new language, often driven by younger generations.
    Adam – wishing you all the very best with your treatment. Alison

  3. This interesting study looks like an ideal candidate to use in class for students to replicate. Unfortunately, the data have been removed from the SNAP archive. The difficulty of getting hold of data used in earlier studies illustrates the copyright challenges that language research faces.

    1. Fascinating that here ‘data’ is plural. It is a plural Latin word but mostly we say stadiums not stadia. In French ‘data’ is plural (donnees) and it is difficult to translate into English the singular form (donnee). How big a ‘piece’ of data is a donnee? One byte? One word (when I was young that was two bytes)? We cannot use ‘datum’ as that means something else entirely. (Does it not?). We nowadays have something known as ‘big data’. That does not look like a plural to me. I think I have heard Adam being very tolerant of changing usage (e.g. ‘fewer’ supplanted by ‘less’) and I am becoming more tolerant by the minute. I’m off to Google ‘corpus’ now. (A sentence that was probably meaningless a few years ago, not only because there was no Google then). I suspect that ‘corpus’ means the sum total of what people say rather than the sum total of what people ought to say (i.e. the OED).

  4. Thanks for this – I agree fascinating data, especially if you are a beer lover (I’m not but I know some people who are…). I think of my first non-work-related online community, Mumsnet, and well remember the feeling of being a learner of a second language … but I gradually acclimatised to it. Looking forward to more of your discoveries.

  5. I think this is fascinating – the way members passed through the entire lifecycle regardless of whether they were longtime subscribers or only joined for a while. And that they stopped being so creative and open to new ideas after one third of their subscription life had passed. But we’re not like that, are we? I hope not!

  6. Would it be a fair extrapolation to say one can predict how quickly people get bored with an on-line community by how long they remain responsive to developments in its tone and vocabulary? The longer one stays tuned-in to the tone, the longer one will stay active in the community, in a ratio of approximately 1:2. And – though this doesn’t seem to be covered by the research – presumably the process is repeatable (indefinitely?), and an individual can be simultaneously a child, an adolescent, an adult, and senile in different on-line communities. But at that point the parallel to language acquisition in the off-line community might start to feel a bit strained?

  7. Thanks for this paper which is interesting and related to my interest in how one progresses in a discipline, especially during teenage years when one starts as a novice.

  8. I’m interested in how much the adoption of a new specialised language is intended to deliberately signal your membership of a community. I think that in many cases your ideas would be quite as clear without the specialised language but your membership and status might be in doubt. I have just done a teacher training course and I do wonder how much of our new language is simply intended to make us sound like teachers. It’s psychologically understandable, but how much of one’s belief in the ability of a professional could be just down to them ‘sounding right’? What are the little tropes that you’d expect to find from a computational linguist, for example? :)

