Conundrum Affixes - A Statistical Approach
Posted: Thu Aug 06, 2009 10:00 pm
by Simon Myers
For many who play Countdown, a common heuristic used to help solve conundrums is affix matching: in particular, looking for common prefixes (UN-, OVER-) and suffixes (-ING, -IEST).
I wanted to see to what extent this was useful and whether conventional wisdom is accurate in this case. Using the set of Apterous conundrums (thanks Charlie) I wrote a program that looked at the number of conundrums that contain a given set of letters (say I, N, G) and compared this with the number of conundrums that began/ended with those letters (-ING in this case). So in effect it measures how often the letters are a genuine affix rather than a decoy: each row below gives the percentage of conundrums containing those letters that actually begin (or end) with them.
I decided to post this here rather than in the Apterous subforum because I think the findings should have some use with respect to the show, with the caveat that heat game conundrums tend to have more common affixes than the whole Apterous set; finals (and CofC) game conundrums have fewer.
PREFIXES
73% of 30 - COMM
61% of 184 - EX
42% of 55 - COMP
41% of 173 - OVER
41% of 66 - SQ
33% of 1269 - RES
28% of 149 - QU
24% of 96 - APP
23% of 1308 - UN
22% of 183 - UNDER
21% of 196 - SUB
20% of 227 - FOR
18% of 564 - CON
16% of 437 - PRO
15% of 144 - IMM
14% of 605 - DIS
12% of 199 - SUP
11% of 412 - OUT
10% of 3144 - RE
09% of 693 - PRE
07% of 342 - INTER
07% of 498 - MIS
Also of interest might be the single-letter "prefixes" J (45% of 121), F (37% of 944), P (34% of 1727), W (33% of 652) and B (32% of 1296). Interestingly the letter N is the first letter of only 3% of the 4317 conundrums in which it appears. L, E, I, K, and O all sit around the 7% mark.
SUFFIXES
94% of 33 - FULLY
87% of 30 - OLOGY
76% of 1474 - ING
75% of 75 - IZED
72% of 53 - OUSLY
68% of 481 - NESS
62% of 798 - LY
58% of 60 - IFIED
50% of 2244 - ED
50% of 42 - WORK
48% of 48 - BOARD
44% of 315 - ABLE
42% of 206 - IZE
42% of 78 - ABLY
29% of 252 - LESS
27% of 812 - IEST
24% of 471 - TION
22% of 97 - IZER
21% of 482 - ATED
20% of 405 - IVE
14% of 3144 - ER
14% of 498 - ISM
12% of 1450 - ATE
10% of 396 - SION
Powerful single-letter "suffixes" include Y (70% of 1286), G (51% of 2220), D (49% of 2719) and E (22% of 5803). Avoid U (0.08% of 2384), V (0.1% of 715), and I (0.3% of 5145).
The next step would probably be an attempt at identifying letter modifiers to a prefix or suffix (e.g. perhaps the presence of the letter F increases the chance of an -ING ending to 85%, etc).
Another idea I've had is to take, for example, the 24% of words that contain I, N, G but do not end -ING. Perhaps there's a useful set of suffixes to check for ING-decoys. Some research into a blended strategy (if one tries both -ABLY and -ABLE what are the chances of success?) could also prove fruitful. But that's for another day.
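For anyone who wants to try the letter-modifier idea, here is a rough Python sketch of how it could be measured. It is not the program used for the tables above; `suffix_rate` and `contains_letters` are made-up names, and `words` is assumed to be a plain list of upper-case conundrum solutions.

```python
from collections import Counter


def contains_letters(word, letters):
    """True if `word` contains every letter of `letters`, respecting repeats."""
    need = Counter(letters)
    have = Counter(word)
    return all(have[ch] >= n for ch, n in need.items())


def suffix_rate(words, suffix, modifier=""):
    """Share of words holding the suffix letters plus `modifier` that end in `suffix`.

    Returns (rate, pool_size); rate is None when no word qualifies.
    """
    # The modifier is counted on top of the suffix letters, so a modifier of
    # "G" with suffix "ING" asks for words containing two Gs.
    pool = [w for w in words if contains_letters(w, suffix + modifier)]
    if not pool:
        return None, 0
    hits = sum(w.endswith(suffix) for w in pool)
    return hits / len(pool), len(pool)
```

Comparing `suffix_rate(words, "ING")` with `suffix_rate(words, "ING", "F")` would then show whether the extra letter makes the -ING ending more or less likely. The pool size is returned as well, since a high percentage of a tiny pool means very little.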
Re: Conundrum Affixes - A Statistical Approach
Posted: Thu Aug 06, 2009 10:07 pm
by Jon Corby
Brilliant work. That's really interesting.
Re: Conundrum Affixes - A Statistical Approach
Posted: Fri Aug 07, 2009 1:31 am
by Jon O'Neill
This is excellent. Well done!
Re: Conundrum Affixes - A Statistical Approach
Posted: Fri Aug 07, 2009 9:17 am
by Charlie Reams
Nice work. Some surprising finds at the top, although I wonder how many of these patterns good conundrumists would be subconsciously aware of. Questions about methodology:
* What minimum (if any) did you set on how many words each set of letters had to appear in?
* Did you do some kind of prefix elimination? (e.g. I'd expect SQU- to make the list if SQ- did.)
Re: Conundrum Affixes - A Statistical Approach
Posted: Fri Aug 07, 2009 11:09 am
by Kevin Thurlow
That is interesting... So when I got "Neighbour", it was doubly interesting as it starts with "N" and the ING is mixed up... Perhaps someone who's more alert than me at the moment can say what ended with "V"?
Re: Conundrum Affixes - A Statistical Approach
Posted: Fri Aug 07, 2009 11:24 am
by Dinos Sfyris
Kevin Thurlow wrote:That is interesting... So when I got "Neighbour", it was doubly interesting as it starts with "N" and the ING is mixed up... Perhaps someone who's more alert than me at the moment can say what ended with "V"?
Off the top of my head LEITMOTIV
Re: Conundrum Affixes - A Statistical Approach
Posted: Fri Aug 07, 2009 1:30 pm
by Simon Myers
Charlie Reams wrote:
* What minimum (if any) did you set on how many words each set of letters had to appear in?
In the program itself I set no minimum. I had it go through the whole list once and record all "affixes" of up to 5 letters. So for COMPUTING, say, it recorded C, CO, COM, COMP, COMPU and G, NG, ING, TING, UTING (or incremented the counter for those that had already been seen before). Once I had my full list of these I then went through the whole list of affixes and for each conundrum in turn incremented a counter for words that included the letters of the affix in any place. So in the end I had a list of affixes, number of conundrums with that affix, and number of conundrums that contained the letters of the affix in any position.
My point is that all the raw data is available. For reporting I limited it to those affixes that have appeared 20 times or more in the set, which means they must represent at least 0.25% of all conundrums. If I didn't do this you would have some stuff like COMMU- at 100% of 7, MIDWI- at 100% of 4 and -ZZLED at 75% of 4.
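In case it helps anyone reproduce this, the two passes described above could look roughly like the Python sketch below. This is a reconstruction from the description, not the original program; the function names and the `(kind, affix)` key layout are assumptions, and `words` is assumed to be a list of upper-case solutions.

```python
from collections import Counter


def contains_letters(word, letters):
    """True if `word` contains every letter of `letters`, respecting repeats."""
    need = Counter(letters)
    have = Counter(word)
    return all(have[ch] >= n for ch, n in need.items())


def affix_stats(words, min_count=20, max_len=5):
    """Map (kind, affix) -> (hits, containing) for affixes seen at least min_count times.

    hits       = words that actually begin/end with the affix
    containing = words holding the affix letters in any position
    """
    # Pass 1: record every prefix and suffix of up to max_len letters.
    hits = Counter()
    for w in words:
        for k in range(1, min(max_len, len(w)) + 1):
            hits[("prefix", w[:k])] += 1
            hits[("suffix", w[-k:])] += 1

    # Pass 2: for each affix that passes the reporting threshold, count
    # the words that contain its letters anywhere.
    stats = {}
    for (kind, affix), n in hits.items():
        if n < min_count:
            continue
        containing = sum(contains_letters(w, affix) for w in words)
        stats[(kind, affix)] = (n, containing)
    return stats
```

On the toy list `["COMPUTING", "GROUNDING", "NEIGHBOUR", "UNGUARDED"]` with `min_count=1`, the entry for `("suffix", "ING")` comes out as `(2, 3)`: two words end in -ING and three contain I, N and G. The percentage shown in the tables is then `hits / containing`.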
Charlie Reams wrote:
* Did you do some kind of prefix elimination? (e.g. I'd expect SQU- to make the list if SQ- did.)
Yes I did this by hand afterwards. The stats for SQU- and SQ- are identical as you might have assumed. There are lots of other things like -KING (34% of 101) and -LOGY (37% of 73) that make the list (209 prefixes and 249 suffixes occur at least 20 times in the set) but their inclusion above would probably obscure the useful results. There were no real surprises that I could see anyway, such as -NDING having a stronger correlation than -ING (it doesn't), so it would just make things confusing.
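The identical-stats case (SQU- versus SQ-) is the one part of that by-hand step that could be automated. A minimal Python sketch, assuming the stats are held in a dict that maps a hypothetical `(kind, affix)` key to a `(hits, containing)` pair; deciding that something like -KING is merely uninteresting would still be a judgement call.

```python
def drop_redundant(stats):
    """Remove affixes whose counts exactly match a shorter affix they extend."""
    kept = {}
    for (kind, affix), counts in stats.items():
        redundant = False
        for k in range(1, len(affix)):
            # A prefix is shortened from the right, a suffix from the left.
            shorter = affix[:k] if kind == "prefix" else affix[-k:]
            if stats.get((kind, shorter)) == counts:
                redundant = True
                break
        if not redundant:
            kept[(kind, affix)] = counts
    return kept


# Made-up counts, for illustration only.
example = {
    ("prefix", "SQ"): (27, 66),
    ("prefix", "SQU"): (27, 66),   # same as SQ, so dropped
    ("suffix", "ING"): (30, 40),
    ("suffix", "KING"): (5, 12),   # differs from ING, so kept
}
filtered = drop_redundant(example)
```

Here `filtered` keeps SQ-, -ING and -KING but not SQU-.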
For those that are interested, I've uploaded the CSV file with the raw data here.
Re: Conundrum Affixes - A Statistical Approach
Posted: Fri Aug 07, 2009 8:58 pm
by Jason Larsen
That's very helpful, Clive!
Thank you!
Re: Conundrum Affixes - A Statistical Approach
Posted: Fri Aug 07, 2009 11:16 pm
by Gavin Chipper
Jason Larsen wrote:That's very helpful, Clive!
Thank you!
Normally when you do that, it's at least in response to some post in the same thread! Now we have to search the whole forum for the relevant post!
Re: Conundrum Affixes - A Statistical Approach
Posted: Sat Aug 08, 2009 2:10 am
by Jason Larsen
Gavin, I knew what I was talking about!
Re: Conundrum Affixes - A Statistical Approach
Posted: Sat Aug 08, 2009 6:13 pm
by Kirk Bevins
Jason Larsen wrote:Gavin, I knew what I was talking about!
But nobody else does, which is the idea of a forum.
Re: Conundrum Affixes - A Statistical Approach
Posted: Sat Aug 08, 2009 6:31 pm
by Shaun Hegarty
I notice a lot of bio- conundrums; perhaps that could be included. Overall, though, an interesting set of statistics.
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 2:31 am
by Jason Larsen
Really?
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 2:33 am
by Simon Myers
Shaun Hegarty wrote:I notice a lot of bio- conundrums; perhaps that could be included. Overall, though, an interesting set of statistics.
4% of 257.
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 6:44 pm
by Simon Myers
As an accompaniment to the regular conundrums, here are the hyper conundrums:
PREFIXES
55% of 119 - EX
42% of 57 - COMM
39% of 186 - OVER
27% of 1133 - UN
22% of 98 - COMP
22% of 90 - EXT
21% of 108 - SUPER
21% of 205 - UNDER
21% of 586 - DIS
20% of 113 - HYP
20% of 1265 - CO
19% of 170 - SUB
17% of 131 - APP
16% of 893 - PR
16% of 833 - CON
15% of 137 - MICRO
15% of 198 - SUP
14% of 1205 - DE
14% of 332 - COM
14% of 201 - DISC
11% of 175 - DEMO
11% of 254 - IMP
11% of 565 - PRO
11% of 2207 - RE
11% of 226 - UNP
11% of 317 - TRANS
11% of 186 - CONC
10% of 365 - CONS
Letters P (29% of 1270), W (26% of 160), D (26% of 1398), and F (25% of 511) are quite handy; N (2% of 3153), L (3% of 2403) and G (6% of 1510) are not.
SUFFIXES
95% of 38 - LESSNESS
90% of 110 - IZING
86% of 37 - FULLNESS
77% of 26 - FULLY
76% of 33 - OLOGICAL
75% of 61 - ABILITY
73% of 71 - IZATION
72% of 50 - OUSNESS
71% of 118 - IZED
69% of 913 - LY
68% of 1157 - ING
67% of 36 - ISHNESS
66% of 140 - OUSLY
65% of 163 - ICALLY
64% of 56 - TIVELY
62% of 300 - ALLY
56% of 46 - OLOGIST
55% of 479 - NESS
53% of 105 - VELY
53% of 49 - LOGICAL
44% of 314 - ABLE
41% of 1205 - ED
40% of 700 - ATION
34% of 158 - ABLY
33% of 1012 - TION
17% of 236 - TORY
Letters worth looking at include Y (81% of 1144), G (52% of 1510) and D (37% of 1398) [next best is E with 16% of 3319]. Avoid I (0.1% of 3566), A (0.6% of 1924), X (0.7% of 130) and F (1% of 511).
A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 6:52 pm
by Paul Howe
Simon Myers wrote:
A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
PORTMANTEAUX?
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 6:55 pm
by Simon Myers
Paul Howe wrote:Simon Myers wrote:
A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
PORTMANTEAUX?
Indeed. Should've posted when I knew you weren't lurking around here Paul. Try the 4 that end in I, which are more challenging I think.
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 7:01 pm
by Paul Howe
Simon Myers wrote:Paul Howe wrote:Simon Myers wrote:
A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
PORTMANTEAUX?
Indeed. Should've posted when I knew you weren't lurking around here Paul. Try the 4 that end in I, which are more challenging I think.
Ha, I think it's the first time I've logged in this weekend so you were quite unlucky!
I is harder, the only word that comes to mind atm is CARAVANSERAI? Sadly this will now be at the back of my mind for the rest of the evening.
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 7:07 pm
by Simon Myers
Paul Howe wrote:
I is harder, the only word that comes to mind atm is CARAVANSERAI?
No, that's not one of the four. I suppose you could also consider yourself unlucky; falling foul of Charlie's somewhat arbitrary judgement in selecting fair conundrums.
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 7:09 pm
by Charlie Reams
Simon Myers wrote:Paul Howe wrote:
I is harder, the only word that comes to mind atm is CARAVANSERAI?
No, that's not one of the four. I suppose you could also consider yourself unlucky; falling foul of Charlie's somewhat arbitrary judgement in selecting fair conundrums.
It was far from arbitrary, I selected exactly the conundrums I thought Paul Howe wouldn't be able to guess.
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 8:37 pm
by Paul Howe
Charlie Reams wrote:Simon Myers wrote:Paul Howe wrote:
I is harder, the only word that comes to mind atm is CARAVANSERAI?
No, that's not one of the four. I suppose you could also consider yourself unlucky; falling foul of Charlie's somewhat arbitrary judgement in selecting fair conundrums.
It was far from arbitrary, I selected exactly the conundrums I thought Paul Howe wouldn't be able to guess.
You did a good job!
On further reflection, -US to -I plurals look to be good candidates, so I'm going for:
STREPTOCOCCI (vague memories of being stumped by this on a hypernundrum attack)
and, less confidently,
STRATOCUMULI and CUMULOSTRATI, which at least have some google hits but could easily be cases of latinus malapropis
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 9:11 pm
by Phil Reynolds
ELECTROPHORI?
Re: Conundrum Affixes - A Statistical Approach
Posted: Sun Aug 09, 2009 9:53 pm
by Simon Myers
Paul Howe wrote:
STREPTOCOCCI
Yes.
Paul Howe wrote:
STRATOCUMULI
Yes.
Paul Howe wrote:
CUMULOSTRATI
No.
Phil Reynolds wrote:ELECTROPHORI
No.
Re: Conundrum Affixes - A Statistical Approach
Posted: Mon Aug 10, 2009 11:13 am
by Kevin Thurlow
Thanks Dinos
(for LEITMOTIV)
Re: Conundrum Affixes - A Statistical Approach
Posted: Tue Oct 20, 2009 7:25 pm
by Paul Howe
Simon Myers wrote:Paul Howe wrote:Simon Myers wrote:
A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
PORTMANTEAUX?
Indeed. Should've posted when I knew you weren't lurking around here Paul. Try the 4 that end in I, which are more challenging I think.
Right, I've been thinking about this non-stop for the last two months and still don't know the answer. Time to spill the beans, Myers.
Re: Conundrum Affixes - A Statistical Approach
Posted: Wed Oct 21, 2009 3:45 am
by Simon Myers
Paul Howe wrote:Right, I've been thinking about this non-stop for the last two months and still don't know the answer. Time to spill the beans, Myers.
Ah yes. In all fairness the two you didn't get were nigh on impossible. They are:
APPARATCHIKI
GASTROCNEMII
Re: Conundrum Affixes - A Statistical Approach
Posted: Sat Jan 02, 2010 5:14 pm
by Simon Myers
I've finally got around to doing the newly added Apterous conundrums. There are only about 1100 of these so I've lowered the threshold for inclusion to 6 instead of 20. Due to the smaller sample there are a few differences between this and the main list. When the two lists are combined, the main stats change very little (the -ING figure changes by about 1%, for example).
PREFIXES
55% of 11 - QUA
43% of 35 - EX
42% of 26 - OVER
40% of 20 - QU
36% of 165 - UN
25% of 73 - PRO
22% of 55 - OUT
22% of 23 - UNDER
18% of 33 - SUB
14% of 85 - DIS
10% of 462 - RE
SUFFIXES
80% of 10 - IZED
80% of 10 - ISHLY
72% of 11 - WOOD
71% of 160 - ING
63% of 11 - INGLY
61% of 54 - ABLE
56% of 39 - NESS
55% of 42 - IZE
53% of 17 - ALLY
47% of 43 - LESS
38% of 24 - FUL
33% of 18 - TORY
32% of 19 - BIRD
28% of 104 - IEST
27% of 462 - ER
22% of 64 - IVE
21% of 161 - EST
20% of 75 - ISM
16% of 58 - TION
15% of 61 - MAN
13% of 161 - IST
12% of 73 - ISH
12% of 52 - OUS
There are a whole bunch of 9-letter words that are conundrum-valid (no anagrams, not plurals), around 3500, so sometime soon I might do the stats on those to see if there are any inherent biases in Charlie's conundrum selections.