Calling statisticians

Charlie Reams · Post by **Charlie Reams** » Sat Nov 29, 2008 3:28 pm

Do any of our resident statisticians have a smart guess for what sort of distribution this data might be drawn from? I've clipped off the long tail but it approaches zero pretty steadily.

Paul Howe · Post by **Paul Howe** » Sat Nov 29, 2008 3:55 pm

Log-normal maybe?

They kind of look similar. I'm not too strong on stats so that's about the most insight I can offer.

Ben Wilson · Post by **Ben Wilson** » Sat Nov 29, 2008 4:03 pm

Does kinda look like a skewed normal to me too, but my stats are so rusty it's unreal.

Charlie Reams · Post by **Charlie Reams** » Sat Nov 29, 2008 4:06 pm

Log-normal seems very plausible based on the source. It's the data on how long it takes people to solve conundrums on Apterous, if you're interested. I'm doing something interesting with this data which I'll share at some point.

Paul Howe · Post by **Paul Howe** » Sat Nov 29, 2008 4:14 pm

Just had an idea that it might be an Erlang distribution, but you'd expect that to have a flatter peak given the length of the tail, and I can't see any reason that conundrum times would generate Erlang data now that's been revealed as the source.

Charlie Reams · Post by **Charlie Reams** » Sat Nov 29, 2008 4:22 pm

It does look a bit Erlangy (in fact now you've said that I realise that's what was making it look familiar in the first place) but I know that human reaction times are distributed log-normal so it seems possible that other brain activities would be similar. I'll do some tests and find out.

Kai Laddiman · Post by **Kai Laddiman** » Sat Nov 29, 2008 4:43 pm

My ranking on Apterous before and after I cheated?

Frank Rodolf · Post by **Frank Rodolf** » Sun Nov 30, 2008 12:40 pm

Charlie Reams wrote:It does look a bit Erlangy (in fact now you've said that I realise that's what was making it look familiar in the first place) but I know that human reaction times are distributed log-normal so it seems possible that other brain activities would be similar. I'll do some tests and find out.

And today's Daily Duel was one of those tests?

Kirk Bevins · Post by **Kirk Bevins** » Sun Nov 30, 2008 2:05 pm

This is the best off-topic thread yet - love the curiosity of Charlie and love the responses.

Gavin Chipper · Post by **Gavin Chipper** » Sun Nov 30, 2008 2:27 pm

Frank Rodolf wrote:
Charlie Reams wrote:It does look a bit Erlangy (in fact now you've said that I realise that's what was making it look familiar in the first place) but I know that human reaction times are distributed log-normal so it seems possible that other brain activities would be similar. I'll do some tests and find out.
And today's Daily Duel was one of those tests?

Would that work? Without any competition from any opposition, people are more likely to check and double check their answers. Unless Charlie has done that thing that was talked about where only the fastest gets the points. I'll do the duel now...

Charlie Reams · Post by **Charlie Reams** » Sun Nov 30, 2008 2:58 pm

Frank Rodolf wrote:
Charlie Reams wrote:It does look a bit Erlangy (in fact now you've said that I realise that's what was making it look familiar in the first place) but I know that human reaction times are distributed log-normal so it seems possible that other brain activities would be similar. I'll do some tests and find out.
And today's Daily Duel was one of those tests?

You overestimate my organisation. That duel was lined up ages ago. I just meant statistical tests on the existing data.

Howard Somerset · Post by **Howard Somerset** » Sun Nov 30, 2008 8:07 pm

It has a vague likeness to a Poisson distribution with a mean of around 3 to 5, though it doesn't tail off quite quickly enough. See the mean 4 example here.

Michael Wallace · Post by **Michael Wallace** » Sun Nov 30, 2008 8:31 pm

My first thought was a gamma, but log-normal looks about right too (depending on the parameters, obviously). If I wasn't in the middle of playing computer games I might think about the actual problem to try and decide which distributions are most appropriate.

Also, these data, not this data, n00b

Charlie Reams · Post by **Charlie Reams** » Sun Nov 30, 2008 11:39 pm

Michael Wallace wrote:My first thought was a gamma, but log-normal looks about right too (depending on the parameters, obviously). If I wasn't in the middle of playing computer games I might think about the actual problem to try and decide which distributions are most appropriate.

Log-normal fits the data fairly well, but I'm still open to better suggestions. If anyone wants the raw data to play with then let me know.
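For anyone wanting to try the log-normal versus gamma comparison themselves, here is a minimal sketch using only the Python standard library. It is not the analysis used in this thread: the data are synthetic stand-ins (the real Apterous times are not reproduced here), the function names are made up for illustration, and the gamma fit uses the method of moments rather than maximum likelihood, so treat the comparison as rough.

```python
import math
import random
import statistics


def fit_lognormal(times):
    """Maximum-likelihood log-normal fit: mean and population std of log-times."""
    logs = [math.log(t) for t in times]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def loglik_lognormal(times, mu, sigma):
    """Total log-likelihood of the times under LogNormal(mu, sigma)."""
    return sum(
        -math.log(t * sigma * math.sqrt(2 * math.pi))
        - (math.log(t) - mu) ** 2 / (2 * sigma ** 2)
        for t in times
    )


def fit_gamma_moments(times):
    """Method-of-moments gamma fit (an approximation, not the MLE).

    Returns shape k and scale theta, using mean = k*theta, var = k*theta^2.
    """
    mean = statistics.fmean(times)
    var = statistics.pvariance(times, mean)
    return mean ** 2 / var, var / mean


def loglik_gamma(times, k, theta):
    """Total log-likelihood of the times under Gamma(shape=k, scale=theta)."""
    return sum(
        (k - 1) * math.log(t) - t / theta - math.lgamma(k) - k * math.log(theta)
        for t in times
    )


# Synthetic solving times in seconds: an assumption standing in for real data.
rng = random.Random(0)
times = [rng.lognormvariate(1.2, 0.6) for _ in range(5000)]

mu, sigma = fit_lognormal(times)
k, theta = fit_gamma_moments(times)
ll_logn = loglik_lognormal(times, mu, sigma)
ll_gamma = loglik_gamma(times, k, theta)

print(f"log-normal: mu={mu:.3f}, sigma={sigma:.3f}, log-lik={ll_logn:.1f}")
print(f"gamma:      k={k:.3f}, theta={theta:.3f}, log-lik={ll_gamma:.1f}")
```

Both models have two parameters, so the higher log-likelihood is a fair first indication of the better fit. Here the synthetic data really are log-normal, so that model should win; on real data the answer could go either way.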

Michael Wallace wrote:Also, these data, not this data, n00b

I'll start saying "these data" when you start saying "one panino please".

Michael Wallace · Post by **Michael Wallace** » Mon Dec 01, 2008 12:41 am

Charlie Reams wrote:I'll start saying "these data" when you start saying "one panino please".

The wife and I make a point of saying pannino, not pannini, so nyer.

(not that I can remember ever asking for a pannino (or pannini))

Kirk Bevins · Post by **Kirk Bevins** » Mon Dec 01, 2008 1:14 am

Michael Wallace wrote:
Charlie Reams wrote:I'll start saying "these data" when you start saying "one panino please".
The wife and I make a point of saying pannino, not pannini, so nyer.

(not that I can remember ever asking for a pannino (or pannini))

Please try and spell them correctly. I always ask "do you do panini?" which sounds a bit odd and they then say "yes, we have bacon paninis, or cheese paninis". "I'll have a bacon panino please". I then had one woman say "sorry?" and I just said "a bacon one please" out of semi-embarrassment. Why should I get embarrassed by being correct?

Michael Wallace · Post by **Michael Wallace** » Mon Dec 01, 2008 1:26 am

Kirk Bevins wrote:
Michael Wallace wrote:The wife and I make a point of saying pannino, not pannini, so nyer.

(not that I can remember ever asking for a pannino (or pannini))
Please try and spell them correctly.

Weird - I thought it was panini and the wife corrected me, and then I (somehow) thought that the forum spellchecker agreed with him, but clearly my eye was playing tricks on me.

Basically it wasn't my fault >_>

Ben Hunter · Post by **Ben Hunter** » Mon Dec 01, 2008 2:12 am

Kirk Bevins wrote:
Michael Wallace wrote:
Charlie Reams wrote:I'll start saying "these data" when you start saying "one panino please".
The wife and I make a point of saying pannino, not pannini, so nyer.

(not that I can remember ever asking for a pannino (or pannini))
Please try and spell them correctly. I always ask "do you do panini?" which sounds a bit odd and they then say "yes, we have bacon paninis, or cheese paninis". "I'll have a bacon panino please". I then had one woman say "sorry?" and I just said "a bacon one please" out of semi-embarrassment. Why should I get embarrassed by being correct?

Correctness is a matter of context when it comes to language, though I'll probably use 'panino' in future, purely as a pretext for charming banter with attractive sandwich shop girls.

Michael Wallace · Post by **Michael Wallace** » Mon Dec 01, 2008 11:04 am

Ben Hunter wrote:Correctness is a matter of context when it comes to language, though I'll probably use 'panino' in future, purely as a pretext for charming banter with attractive sandwich shop girls.

I don't know about anyone else, but I for one am certainly interested to find out whether your panino exploits get you anywhere...

Jon Corby · Post by **Jon Corby** » Mon Dec 01, 2008 11:24 am

It looks like my pyjama bottoms in the morning

Charlie Reams · Post by **Charlie Reams** » Mon Dec 01, 2008 11:46 am

Ben Hunter wrote: Correctness is a matter of context when it comes to language, though I'll probably use 'panino' in future, purely as a pretext for charming banter with attractive sandwich shop girls.

I actually did this last time I was in Clowns, a cafe in Cambridge which is run by Italians. The ASSG (attractive sandwich shop girl) said "ohh, very good Italian" and smiled at me. It wasn't quite the full sex I was expecting, but still rewarding.

Michael Wallace · Post by **Michael Wallace** » Mon Dec 01, 2008 3:51 pm

So I was thinking about this on the tube this morning. My main thoughts were about what factors are going to affect solving time, and then once you have these you can try and fit a model.

The two most obvious ones are player ability and conundrum difficulty. The first is easy to factor into our model, thanks to ratings (give or take the various problems with the system), the second one less so. I don't know how many conundrums have been given in multiple games, but that's one option for trying to assess their difficulty. Another might be some statistic for each conundrum on how often the word is used in English (although that's probably not easily available).

There are obviously going to be heaps of other things that influence the solving time, such as whether it's crucial (I would imagine people might be trying less hard if they've already won), or if the conundrum is needed to make a game a particularly good score. I doubt the second has much of an influence, and I'm not really convinced the first would either. There are probably other factors too, though.

But yeah, I'd start with data on the first two, assuming there's some extra information available to assess the conundrum difficulty, and then stick them into a model, maybe Time ~ Gamma(a,b) where a and b are functions of those factors. More interesting though would probably be using these data to get an assessment of the difficulty of conundrums, which is probably easier to do anyway.
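As a rough illustration of the "assess the difficulty of conundrums" idea, here is a small Python sketch. It swaps in a much cruder technique than the Gamma model suggested above: each player's own mean log-time serves as a stand-in for ability, and a conundrum's score is the average amount by which players were slower or faster than their own norm on it. The function name, the `(player, conundrum, seconds)` record layout and the example data are all hypothetical.

```python
import math
from collections import defaultdict


def conundrum_difficulty(solves):
    """Score each conundrum by mean log-time residual.

    solves: iterable of (player, conundrum, seconds) tuples (hypothetical layout).
    Returns {conundrum: score}; positive means slower than players' own norm.
    """
    solves = list(solves)

    # Crude ability proxy: each player's mean log solving time.
    logs_by_player = defaultdict(list)
    for player, _, seconds in solves:
        logs_by_player[player].append(math.log(seconds))
    player_mean = {p: sum(v) / len(v) for p, v in logs_by_player.items()}

    # Residual of each solve relative to that player's mean, grouped by conundrum.
    residuals = defaultdict(list)
    for player, conundrum, seconds in solves:
        residuals[conundrum].append(math.log(seconds) - player_mean[player])

    return {c: sum(r) / len(r) for c, r in residuals.items()}


# Made-up example: both players take four times as long on the second conundrum.
example = [
    ("fast_player", "EASYWORDS", 2.0),
    ("fast_player", "HARDWORDS", 8.0),
    ("slow_player", "EASYWORDS", 4.0),
    ("slow_player", "HARDWORDS", 16.0),
]
scores = conundrum_difficulty(example)
print(scores)
```

Working in log-time means a slow player and a fast player who both take four times as long on a hard conundrum contribute the same residual. This only covers solves that were actually completed, and it ignores factors such as whether the conundrum was crucial.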

Charlie Reams · Post by **Charlie Reams** » Mon Dec 01, 2008 3:54 pm

Michael Wallace wrote: But yeah, I'd start with data on the first two, assuming there's some extra information available to assess the conundrum difficulty, and then stick them into a model, maybe Time ~ Gamma(a,b) where a and b are functions of those factors. More interesting though would probably be using these data to get an assessment of the difficulty of conundrums, which is probably easier to do anyway.

That's exactly what I'm doing, although it's harder than it sounds because, with over 8000 conundrums, the data for any given conundrum is pretty sparse. There are some other complications too, which I'll share when I write up the results some time next week.
