Thursday 3 October 2024

Not my first Rodeo No 6: Corpus Schmorpus...


A shameless reposting of 13-year-old content again, but much updated:

One question that people ask when you say you write dictionaries is: "Well, how do you decide which words to put in?" Good question, with lots of different kinds of answers.

In the past you could just copy your competitors (only joking 😉) or, like Murray on the huge Oxford English Dictionary, you could get hundreds of 'spotters' to send in words on cards.

But one of the good answers, and the one I would have given from when I started in 1988, is "you use a corpus". Corpus is a good old academic term for a (usually large) body of written evidence that you can search. In biblical terms, a concordance of the Bible would show you each word used in it and where and how it is used.
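To make the idea of a concordance concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is only an illustration under simple assumptions (plain text, exact word-form matching, a fixed window of characters either side), not how any real corpus tool works; the function name `concordance` and the sample sentence are hypothetical, chosen for illustration.

```python
import re

def concordance(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one keyword-in-context (KWIC) line per occurrence of `keyword`."""
    lines = []
    # \b word boundaries stop 'bite' matching inside 'biter'; case is ignored.
    pattern = rf"\b{re.escape(keyword)}\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        # Right-align the left context so every keyword lines up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text, for illustration only.
sample = "The dog will bite. A mosquito bite itches. The biter was bit."
for line in concordance(sample, "bite"):
    print(line)
```

Lining the keyword up in a single column is what lets the eye run down the page and spot patterns in the neighbouring words, which is exactly what the paper filing-cabinet concordances did.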

For writing dictionaries you need a good-sized sample of your target language (millions of words at least) so that you can spot rarer words and have evidence for them.

In the past (before Joan Clarke, IBM and Berners-Lee) a corpus could be on paper, printed out as a huge set of concordances. I have worked with one: I think each million words filled an entire filing cabinet. But now we have not just billions but many trillions of words that are, in principle, searchable. So the evidence is gob-smackingly large.

And if you search for the word 'gobsmackingly', I promise you, you will never run out of examples.

In fact, back in the early days of the internet, when Netscape would tell you how many examples it had found, we lexicographers would bet on the number and even suggest things that would only come up once, such as rare idioms like 'black as Newgate's knocker' or 'God willing an' the crick don't rise'. Try them now.

So what do you do with all this evidence? Well, you can see *how* the word is used, which to my mind is the most important thing you need to know. Apart from its meaning, which the corpus can't in fact show you directly but which is usually easy to work out, you can see whether the word is informal or technical or very vanilla. By looking at the words around it you can see clusters of different meanings. 'Bite' will show you dogs and mosquitoes and even clutches, all biting. With a bit of care you will also find the word 'bug' there, almost always four positions after the verb (answers on a postcard 🙂). And now I am terribly tempted to search for the word 'vanilla'.
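Spotting a word that turns up a fixed number of positions after the verb is a positional collocation count. A rough sketch, assuming a hand-made list of the forms of 'bite' and a few invented sentences, might look like this. Real corpus tools are far more sophisticated (working from lemmatised, grammatically tagged text rather than fixed offsets); `collocates_at` and `BITE_FORMS` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical, hand-made list of forms of 'bite'; a real tool would lemmatise.
BITE_FORMS = {"bite", "bites", "biting", "bit", "bitten"}

def tokens(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates_at(text: str, node_forms: set[str], offset: int) -> Counter:
    """Count the words found exactly `offset` tokens after any form of the node word."""
    toks = tokens(text)
    counts: Counter = Counter()
    for i, tok in enumerate(toks):
        j = i + offset
        # Only count when the target position actually exists in the text.
        if tok in node_forms and 0 <= j < len(toks):
            counts[toks[j]] += 1
    return counts

# Invented sample sentences, for illustration only.
sample = ("She was bitten by the travel bug. He got bitten by the gardening bug. "
          "The dog bit the postman.")
print(collocates_at(sample, BITE_FORMS, 4).most_common(3))  # [('bug', 2)]
```

On a few sentences this proves nothing, of course; the point is that over millions of words a count like this makes a regular pattern jump out.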

Since our first online fumblings (and I'm now terribly tempted to search for the word 'fumblings': you can see how lexicographers get distracted), the technology has improved massively, and corpus searching underlies many of the miracles of things such as Google Translate. A late colleague of mine, Adam Kilgarriff, did more than anyone else before his premature death to bring computer technology to the lexicographer's aid. You can find out more about his work, the Sketch Engine corpus tool, here:

https://www.sketchengine.eu/

Compared to Murray in his scriptorium (see below) we are truly blessed.




Thursday 1 November 2012

Issues around Lingo Bingo




Now the concept of 'Lingo Bingo' is, I am sure, well understood by those who have attended team-building sessions, monthly departmental meetings, etc.

It is a harmless and interesting pastime in which you have to predict several of the meaningless 'lingo' items that will come up during the meeting, using competition as your incentive. You write several likely pieces of 'lingo' on a card. As each one comes up in the meeting you cross it off your card. The first person to fill their card is the winner.

Sharp-eyed readers will see that I have tee-ed (teed?) up one of those lingo items in the title of this piece, because for me one of the most egregious pieces of social sciences lingo is the phrase 'issues around X'. Just stop and think for a moment: What does it mean? How close around something does an issue need to be for it to be an 'issue around'? By 'around' do we mean that it has any kind of causal relationship with it or that it simply co-occurs? Are coffee and biscuits an 'issue around' business meetings? If not, how can you prove this?

But my purpose today is not to diss the peccadilloes of social scientists or marketing managers. Instead I would like to move us towards a theory of lingo (did you see what I did there?).

Can we define lingo just as the necessary tackle and trade of a particular activity? Is it lingo for a software developer to refer to an API? Surely this is merely a useful acronym to speed up communication and make it more precise. Surely this is no crime? Admittedly, it does carry the sub-text of 'I can talk knowledgeably about APIs', but you'd sort of expect that as a bare minimum from a software developer. But what if they use the phrase 'non-trivial' in a context that does not involve programming per se? Where does lingo for convenience turn into lingo for effect?

I would argue that this is precisely where 'issues around' is so sinful a usage. Without adding any more content whatsoever it is intended to convey to you that the person speaking knows the lingo of the social sciences field and that you should therefore respect them for that. It is not used for the precision of its meaning but rather for the 'feel' that it gives to what is being said. And like all such language it is - I would contend - the enemy of communication.

So come on, what can you do to take us closer to a theory of lingo? Give your favourite examples and say why they are lingo.





Thursday 26 January 2012

What's a hater? It's someone who hates on you

I had seen the word 'hater' cropping up a lot. It was in rap videos, on Urban Dictionary (http://www.urbandictionary.com/), and in Facebook comments. But you could tell by the feel of it that it didn't just mean someone who hated.

So I asked my stepson what a hater was. He said "It's someone who hates on you" (with that 'duh' tone of voice that makes it clear that this fact was self-evident). Damn. I thought I was getting to the meaning of 'hater', and all I do is lose touch with the meaning of the verb 'hate'.

So, my back-of-an-envelope definition for 'hater' would be 'someone who dislikes and criticises a particular person or thing'. It's not the same as just 'someone who hates X' because I could hate asparagus or milk but I don't think that makes me a hater.

And what about 'hate on someone'? As a middle-aged Brit I have to say that's not in my personal lexicon at all. I'd be interested to hear from US-based readers about how normal/abnormal the phrase 'hate on' sounds to them.

Wednesday 12 October 2011

Jobs that begin with the letter 'L'

Such as 'lexicographer', were one of the subjects on our local radio station today, so I spent ten minutes talking to Lesley Dolphin, host of the BBC Radio Suffolk afternoon show, about the life of a lexicographer. There is a link here:

http://www.bbc.co.uk/iplayer/episode/p00kl2rj/Lesley_Dolphin_11_10_2011/

(the interview runs from about 0:55:21 to about 1:05:00)


Because this is on the BBC's 'Listen Again' iPlayer facility, I think it will expire on Oct 18th 2011 (one week after the show was aired), and it is probably only available in the UK.


Here are some of the points that came up:

  • Yes, at some point someone did write every entry that you read in the dictionary. At some point there was a blank screen that was filled in by a fallible human to the best of their abilities. And therefore it is occasionally wrong.
  • There are lots of ways to decide what to write in a dictionary but, because we now have a lot of evidence (a corpus), you would have to work very hard to persuade me that you would write a better dictionary without looking at a corpus or other genuine, real-person language output.
  • Language changes and always has done. English is no exception (and maybe changes more than some others).
  • You don't need to patrol language. Just record it. Language is perfectly capable of protecting itself.

Friday 9 September 2011

English phrases borrowed from Chinese - kind of

What I'm talking about are not the obvious Chinese-sounding words like tofu and Feng Shui (which are indeed borrowed or imported from Chinese; hands up those who know that Feng Shui means 'Wind + Water').

What I'm interested in now is how many other English phrases and idioms are derived from Chinese by translation of the original Chinese words. There was, after all, a lot of contact between England and China from about 1800 onwards, and there has been continuous contact via the entrepôt of Hong Kong.

One example I had heard of was the phrase a 'look-see' (as in "I think I might just wander over there for a quick look-see"). This is apparently well attested as being a translation (or 'calque' for the more linguistically minded) of the Chinese phrase 看见 (kàn jiàn), which means 'look-see'.

However, I was very surprised to realise that the idiom 'to lose face' is also taken directly from Chinese. It seemed so comfortably English that I had never suspected it was an interloper, but it seems that it was taken in the 19th century from the much older Chinese expressions regarding face. The usual Chinese phrase for 'lose face' is 丢面子 (diū miàn zi). The concept of 'face' is much richer in the Chinese language than in English and you can, for example, make a conscious effort to 'give someone face' by treating them as important or worthy in front of others.

If anyone knows of or suspects other loans of this type I'd be glad to hear.

Monday 15 August 2011

duìbuqǐ 对不起! - apologies for absence for the last three weeks

(duìbuqǐ is the normal Chinese word for "sorry!", by the way. It's the one you use when you bump into someone in a queue.)

We've just come back from three weeks' holiday in South-West China. I should perhaps have mentioned this on here in advance. But it does mean that I have come back with lots more ideas about China, and particularly about how it looks to someone working with language. I will post some of these things over the next few weeks, but in the meantime here is the link to a 'Blipfoto' journal that Caroline posted while we were there. She's put 21 photos there, one for each day, of things that really caught the eye. The first one is for July 23rd (here: http://www.blipfoto.com/entry/1309168) and they go through to Aug 13th. I'm particularly keen (for obvious reasons) on the nice picture of a calligrapher's shop on Aug 11th. More later.

Monday 18 July 2011

Chinese Sayings No.5


杯盘狼藉 (bēi pán láng jí)

To find out what this idiom means, why don't you just copy the Chinese characters, go to Google, select 'Image search' and see what comes up? (This is a sneaky trick of the bilingual lexicographer, or even of a language-curious non-lexicographer who wants to find out the meaning of something in a foreign language.)


The 'literal' meaning of this idiom is something like "cups and plates scattered all over the place" (although the Chinese word bēi can stand for glasses too, so this is not just an unruly tea party).

The idiomatic meaning is 'the scene after the feast', i.e. a sign that everybody has just had a very good evening. In an idiomatic English reading of it, you'd think of it as "what the kitchen looked like the morning after the party". But in China most carousing is done in restaurants, so the mess would be cleared away within the hour.