Brainstorm for a metric for the "interestingness" of a hit

Moderators: avij, Phaseolus, Crazy Bob, Fons

Math Murderer
Euro-Master
Posts: 1062
Joined: Fri Nov 17, 2006 1:19 pm
Location: Erfurt, Germany

Brainstorm for a metric for the "interestingness" of a hit

Post by Math Murderer »

From this thread:
Nerzhul wrote:What EBT needs is a good metric for the "interestingness" of a hit. It should be as simple as possible, transparent, fair, universal and comprehensible. A score by which all hits can be ranked and filtered and whose outcome roughly represents the order a normal, unbiased user would apply.
First attempt:
Hit ratio location 1 * Hit ratio location 2 * ... * Hit ratio location n / Instances of hits between n locations

Examples:
Helmbrechts-Helmbrechts hit: 64,68 * 64,68 / 1433 = 2,92
Helsinki-Tampere hit: 43,36 * 37,72 / 1546 = 1,06
Berlin-Ljubljana hit: 241,93 * 87,56 / 9 = 2.353,71
Zaragoza-Bari hit: 2523,24 * 1926 / 1 = 4.859.760,24
Frankfurt-Groningen-Leuven hit: 204,58 * 106,89 * 103,85 / 1 = 2.270.945,71
Klagenfurt-Vienna-Vienna-Vienna hit: 101,63 * 103,96 * 103,96 * 103,96 / 1 = 114.188.071,44

Criticism: Ridiculously biased towards hits involving places with poor hit ratios and towards multiple hits. The formula doesn't take distance into account, so the Helsinki-Tampere hit comes out as LESS interesting than a local Helmbrechts hit because Finland has better hit ratios. However, a different pair of Finnish cities would probably have a better score (fewer hits between them), so the formula, while needing some work, doesn't appear to be completely broken. It might be a start to build on...
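
For anyone who wants to play with this first attempt, here is a minimal Python sketch of it. The function name `interestingness_v1` and its argument names are made up for illustration; the hit ratios and hit counts are the ones from the examples above.

```python
from math import prod


def interestingness_v1(hit_ratios, hits_between):
    """First attempt: product of all hit ratios / hits between the locations."""
    if hits_between < 1:
        raise ValueError("hits_between must be at least 1")
    return prod(hit_ratios) / hits_between


# Examples from the post: hit ratios of the locations, then hits between them
print(round(interestingness_v1([64.68, 64.68], 1433), 2))  # Helmbrechts-Helmbrechts
print(round(interestingness_v1([43.36, 37.72], 1546), 2))  # Helsinki-Tampere
print(round(interestingness_v1([241.93, 87.56], 9), 2))    # Berlin-Ljubljana
```

Because the hit ratios are multiplied, every extra location in a multiple hit scales the score by roughly another factor of 100, which is where the bias towards multiple hits comes from.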
||\\ .. //|| ||\\ .. //|| There is nothing right in my left brain.
||. \\// .|| ||. \\// .|| There is nothing left in my right brain.

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by Math Murderer »

Second attempt:
sqrt (Hit ratio location 1 * Hit ratio location 2 * ... * Hit ratio location n) / ln (1 + Instances of hits between n locations)

Examples:
Helmbrechts-Helmbrechts hit: sqrt (64,68 * 64,68) / ln (1 + 1433) = 64,68 / 7,27 = 8,90
Helsinki-Tampere hit: sqrt (43,36 * 37,72) / ln (1 + 1546) = 40,44 / 7,34 = 5,51
Berlin-Ljubljana hit: sqrt (241,93 * 87,56) / ln (1 + 9) = 145,55 / 2,30 = 63,28
Zaragoza-Bari hit: sqrt (2523,24 * 1926) / ln (1 + 1) = 2.204,49 / 0,69 = 3.194,91
Frankfurt-Groningen-Leuven hit: sqrt (204,58 * 106,89 * 103,85) / ln (1 + 1) = 1.506,97 / 0,69 = 2.184,01
Klagenfurt-Vienna-Vienna-Vienna hit: sqrt (101,63 * 103,96 * 103,96 * 103,96) / ln (1 + 1) = 10.685,88 / 0,69 = 15.486,78

Criticism: Needlessly complicates the previous formula, but it does have the benefit of bunching the scores closer together. The hypothetical quadruple Austrian hit is now "only" about 2800 times more interesting than the regular Finnish hit, and not 107 million times as was the case with the previous formula. Still doesn't take distances into account, so this is definitely not something to build on. Garbage, but since it's a brainstorm this does get posted. :)
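
The second attempt can be sketched the same way. Again, `interestingness_v2` and its argument names are only illustrative; the inputs are taken from the examples above.

```python
from math import log, prod, sqrt


def interestingness_v2(hit_ratios, hits_between):
    """Second attempt: sqrt(product of hit ratios) / ln(1 + hits between)."""
    if hits_between < 1:
        raise ValueError("hits_between must be at least 1")
    return sqrt(prod(hit_ratios)) / log(1 + hits_between)


print(round(interestingness_v2([64.68, 64.68], 1433), 2))  # Helmbrechts-Helmbrechts
print(round(interestingness_v2([43.36, 37.72], 1546), 2))  # Helsinki-Tampere
print(round(interestingness_v2([2523.24, 1926], 1), 2))    # Zaragoza-Bari
```

With an unrounded logarithm the first-ever hits come out slightly lower than the figures above (about 3180 instead of 3194.91 for Zaragoza-Bari), because those divide by 0.69 rather than ln 2 ≈ 0.6931. The ranking is unaffected.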
Jes
Euro-Master
Posts: 4932
Joined: Thu May 24, 2007 9:38 pm
Location: Away from home (once again)

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by Jes »

Shall I give you a hand with this? (If not, I will delete this post :oops: )

What do you think about the following formula (Sqr means square root)?

((1+number_of_country_borders_crossed) * Hit_ratio1 * Hit_ratio2...) * Sqr(Total_distance(Km)) / hits_between_locations^2

Edit: This gives me a huge range of numbers, I think it might be improved this way:

(1+number_of_country_borders_crossed) * Sqr(Total_distance(Km) * Hit_ratio1 * Hit_ratio2...) / hits_between_locations^2

Examples:
Emskirchen-Leende: (1+1)*Sqr(415*94.9*182.35)/1^2 = 5360
Wien - St. Andrä-Wördern: (1+0)*Sqr(17*129.03*103.96)/24^2 = 0.83
Portorož - Oulu: (1+4)*Sqr(2287*104.62*48.14)/1^2 = 16969
Düsseldorf - Didam - Terrassa: (1+4)*Sqr(1284*146.45*93.79*770.88)/(3+1)^2 = 36438

Ok, it gives us big numbers, but we can divide results by 1000 and then:

05.360
00.001
16.969
36.438

Not that far... :|
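
A small Python sketch of the edited formula, in case someone wants to try other hits. The function and parameter names are invented for this sketch, and `Sqr` is taken to be the square root, as the worked examples imply.

```python
from math import prod, sqrt


def interestingness_jes(borders_crossed, distance_km, hit_ratios, hits_between):
    """(1 + borders) * sqrt(distance * product of hit ratios) / hits_between^2."""
    if hits_between < 1:
        raise ValueError("hits_between must be at least 1")
    root = sqrt(distance_km * prod(hit_ratios))
    return (1 + borders_crossed) * root / hits_between ** 2


print(round(interestingness_jes(1, 415, [94.9, 182.35], 1)))    # Emskirchen-Leende
print(round(interestingness_jes(4, 2287, [104.62, 48.14], 1)))  # Portorož - Oulu
# Triple hit: the post uses (3+1) as the number of hits between the locations
print(round(interestingness_jes(4, 1284, [146.45, 93.79, 770.88], 4)))
```

Note how strongly the squared denominator acts: going from 1 to 4 hits between the locations already divides the score by 16.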

BTW: I love your signature :D
Jes Speaks English, French, Spanish, Tokpisin and Esperanto. (Currently learning Swahili).
Don't fear perfection, you'll never reach it! (by Salvador Dali)
my EBT: http://es.eurobilltracker.com/profile/?user=121292 coins and banknote collector. :)

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by Math Murderer »

Jes wrote:Edit: This gives me a huge range of numbers, I think it might be improved this way:

(1+number_of_country_borders_crossed) * Sqr(Total_distance(Km) * Hit_ratio1 * Hit_ratio2...) / hits_between_locations^2

Examples:
Emskirchen-Leende: (1+1)*Sqr(415*94.9*182.35)/1^2 = 5360
Wien - St. Andrä-Wördern: (1+0)*Sqr(17*129.03*103.96)/24^2 = 0.83
Portorož - Oulu: (1+4)*Sqr(2287*104.62*48.14)/1^2 = 16969
Düsseldorf - Didam - Terrassa: (1+4)*Sqr(1284*146.45*93.79*770.88)/(3+1)^2 = 36438

Ok, it gives us big numbers, but we can divide results by 1000 and then:

05.360
00.001
16.969
36.438

Not that far... :|
I love the range of results that you're getting with these examples, whether it's in the original value or after it's divided by 1000, and the fact that this formula does take the hit distance into account. But I can't help but bring up the extremely common examples of my post:
Helmbrechts-Helmbrechts hit: (1+0) * Sqr(1 * 64,68 * 64,68) / 1433^2 = 0,0000315 (that's assigning a 1 Km distance and NOT divided by 1000 :!: )
Helsinki-Tampere hit: (1+0) * Sqr(164 * 43,36 * 37,72) / 1546^2 = 0,000217 (not divided by 1000)
So this formula seems to punish common hits even more than mine.

I also have two other questions for this formula, and both can be addressed by comparing one example that you posted to a slightly modified version. I'm talking about the Düsseldorf - Didam - Terrassa triple hit. Let's assume that the hit would have been Düsseldorf - Terrassa - Didam instead of what it was.
The first question would be: How many borders does this fictional hit cross? If you draw a straight line between Düsseldorf and Terrassa I think you would cross from Germany to Belgium, then back to Germany, then to Luxembourg, then to France and finally to Spain. That means either two border crossings (Germany -> France -> Spain) if you go for simplicity and possible actual bill movement or five border crossings if you want actual geographic accuracy.
The second question would be: Should the score for the fictional Düsseldorf - Terrassa - Didam hit be different from the score for the actual Düsseldorf - Didam - Terrassa triple hit? I'm asking this because the fictional hit would have a significantly higher score (more borders crossed, greater total distance and fewer hits between locations).
Jes wrote:BTW: I love your signature :D
Thanks, and I love your avatar. I was lucky enough to live for eight months in Guatemala, the beautiful country where quetzals come from. :D

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by Jes »

Math Murderer wrote:
I love the range of results that you're getting with these examples, whether it's in the original value or after it's divided by 1000, and the fact that this formula does take the hit distance into account. But I can't help but bring up the extremely common examples of my post:
Helmbrechts-Helmbrechts hit: (1+0) * Sqr(1 * 64,68 * 64,68) / 1433^2 = 0,0000315 (that's assigning a 1 Km distance and NOT divided by 1000 :!: )
Helsinki-Tampere hit: (1+0) * Sqr(164 * 43,36 * 37,72) / 1546^2 = 0,000217 (not divided by 1000)
So this formula seems to punish common hits even more than mine.
Yes, true. Perhaps the "hits_between_locations^2" is too much... We could replace it with just "hits_between_locations" (no exponentiation), although that would give us some huge numbers.
Math Murderer wrote:I also have two other questions for this formula, and both can be addressed by comparing one example that you posted to a slightly modified version. I'm talking about the Düsseldorf - Didam - Terrassa triple hit. Let's assume that the hit would have been Düsseldorf - Terrassa - Didam instead of what it was.
The first question would be: How many borders does this fictional hit cross? If you draw a straight line between Düsseldorf and Terrassa I think you would cross from Germany to Belgium, then back to Germany, then to Luxembourg, then to France and finally to Spain. That means either two border crossings (Germany -> France -> Spain) if you go for simplicity and possible actual bill movement or five border crossings if you want actual geographic accuracy.
The second question would be: Should the score for the fictional Düsseldorf - Terrassa - Didam hit be different from the score for the actual Düsseldorf - Didam - Terrassa triple hit? I'm asking this because the fictional hit would have a significantly higher score (more borders crossed, greater total distance and fewer hits between locations).
I understand your point. Perhaps it would be better to replace "number_of_country_borders_crossed" with "number_of_countries_involved". :?: That would be easier to compute, and it still treats international hits as more interesting than national ones (as they usually are).

What I tried to say with the formula is that:

1) The longer the distance, the more interesting the hit is (not always, but often). That is why I included it within the "Sqr".
2) The higher the hit ratio, the more interesting as well (as getting a hit in such places is supposed to be "uncommon").
3) The more hits between those locations, the less interesting the hit is. There are a lot of hits between Tampere and Helsinki, thus one more hit is not something very special.

A negative point in this respect (IMO) is that whenever you get a new hit between 2 locations, the interestingness score of every hit involving those cities changes.
Well, it makes sense if you assume that the first hit between two locations would attract more attention than the second one, and the second more than the third, etc.
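
To see how much the exponent on "hits_between_locations" matters, and how the score of the same hit drifts as more hits accumulate between two places, here is a rough Python sketch. The `score` function and its `exponent` parameter are my own illustration, not part of the posted formula; the Helsinki-Tampere inputs are the ones quoted above.

```python
from math import prod, sqrt


def score(distance_km, hit_ratios, hits_between, exponent, borders_crossed=0):
    """Jes's formula with a configurable exponent on the hit count."""
    root = sqrt(distance_km * prod(hit_ratios))
    return (1 + borders_crossed) * root / hits_between ** exponent


# Helsinki-Tampere as the number of hits between the two cities grows
for hits in (1, 10, 100, 1546):
    squared = score(164, [43.36, 37.72], hits, exponent=2)
    linear = score(164, [43.36, 37.72], hits, exponent=1)
    print(f"{hits:>5} hits: squared={squared:.6f} linear={linear:.4f}")
```

With the squared denominator the 1546th hit scores about 0.0002, while the linear version gives about 0.34, so dropping the exponent keeps common hits on a more readable scale.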
Math Murderer wrote:
Jes wrote:BTW: I love your signature :D
Thanks, and I love your avatar. I was lucky enough to live for eight months in Guatemala, the beautiful country where quetzals come from. :D
8O Wow!! Thank you very much! I love quetzals. That is why I have this one as my avatar :D How lovely to have lived in Guatemala! :D
Last edited by Jes on Wed Sep 09, 2009 8:21 pm, edited 1 time in total.

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by Jes »

So, let me try a new formula. This time I replaced the square root of the product of the hit ratios with their arithmetic mean. I think it may be better that way, as we give less weight to these factors and more to the others.

number_of_countries_involved * Sqr(Total_distance(Km)) * Mean_of_Hit_ratios / hits_between_locations

When I said: "Mean_of_Hit_ratios" I meant: (Hit_ratio_1 + Hit_ratio_2 + ... Hit_ratio_n) / n

For instance:

Helmbrechts-Helmbrechts (1km) : 1 * 1 * (64,68 + 64,68) / 2 * (1 / 1433) = 0.045
Helsinki-Tampere: 1 * Sqrt(164) * (43,36 + 37,72) / 2 *(1 / 1546) = 0.33
Berlin - Ljubljana: 2 * Sqrt(720) * (242.54 + 87.52) / 2 *(1/9) = 984
Zaragoza - Bari: 2 * Sqrt(1485) * (2527.88+1926)/2 * (1) = 171633
Emskirchen-Leende: 2 * Sqrt(415) * (94.9+182.35)/2 * (1) = 5648
Wien - St. Andrä-Wördern:1*Sqrt(17)* (129.03+103.96) / 2 * (1/24) = 20
Düsseldorf - Didam - Terrassa: 3*Sqrt(1284)* (146.45+93.79+770.88) / 3 * 1/(3+1) = 9058

Well... it may not be the best, but at least it is an idea. :|
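
For reference, a minimal Python sketch of this version. The function and parameter names are assumptions made for the sketch; the inputs are the ones from the examples above.

```python
from math import sqrt
from statistics import mean


def interestingness_mean(countries, distance_km, hit_ratios, hits_between):
    """countries * sqrt(distance) * mean(hit ratios) / hits between locations."""
    if hits_between < 1:
        raise ValueError("hits_between must be at least 1")
    return countries * sqrt(distance_km) * mean(hit_ratios) / hits_between


print(round(interestingness_mean(2, 720, [242.54, 87.52], 9)))   # Berlin - Ljubljana
print(round(interestingness_mean(2, 415, [94.9, 182.35], 1)))    # Emskirchen-Leende
# Triple hit: (3+1) hits between the locations, as in the post
print(round(interestingness_mean(3, 1284, [146.45, 93.79, 770.88], 4)))
```

Since the mean does not grow with the number of locations, a multiple hit no longer gets an automatic boost of a factor of 100 per extra entry, unlike the product-based versions.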

Main disadvantage that I am aware of: it doesn't take into account the number of times that the bill has been tracked, which is usually very interesting (especially when the note is tracked in different cities, i.e. not only in Wien four times or Helsinki three times...). To some extent this is covered by "number_of_countries_involved".

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by Math Murderer »

Jes wrote:number_of_countries_involved * Sqr(Total_distance(Km)) * Mean_of_Hit_ratios / hits_between_locations

When I said: "Mean_of_Hit_ratios" I meant: (Hit_ratio_1 + Hit_ratio_2 + ... Hit_ratio_n) / n

Main disadvantage that I am aware of: it doesn't take into account the number of times that the bill has been tracked; which usually is a very interesting issue. (Specially when the note is tracked in different cities, ie: not only in Wien four times or Helsinki 3 times...) in some way, it is solved with the "number_of_countries_involved"
I like this one. A lot. The disadvantage you mention could be easily fixed by multiplying the score by a factor depending on the number of times the bill has been tracked. The factor could be (n - 1)^2, (n - 1)^3 or even something higher.

So if we take (n - 1)^3 our formula would be:

(n - 1)^3 * number_of_countries_involved * Sqr(Total_distance(Km)) * Mean_of_Hit_ratios / hits_between_locations

Helmbrechts - Helmbrechts (1km) : (2 - 1)^3 * 1 * 1 * (64,68 + 64,68) / 2 * (1 / 1433) = 0.045
Helsinki - Tampere: (2 - 1)^3 * 1 * Sqrt(164) * (43,36 + 37,72) / 2 *(1 / 1546) = 0.33
Berlin - Ljubljana: (2 - 1)^3 * 2 * Sqrt(720) * (242.54 + 87.52) / 2 *(1/9) = 984
Zaragoza - Bari: (2 - 1)^3 * 2 * Sqrt(1485) * (2527.88+1926)/2 * (1) = 171633
Emskirchen - Leende: (2 - 1)^3 * 2 * Sqrt(415) * (94.9+182.35)/2 * (1) = 5648
Wien - St. Andrä-Wördern: (2 - 1)^3 * 1 * Sqrt(17)* (129.03+103.96) / 2 * (1/24) = 20
Düsseldorf - Didam - Terrassa: (3 - 1)^3 * 3 * Sqrt(1284)* (146.45+93.79+770.88) / 3 * 1/(3+1) = 72464

Opinions?
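
To compare the candidate exponents, here is a small Python sketch. `track_multiplier` and `interestingness` are hypothetical names; the formula is the one given above with the exponent left configurable.

```python
from math import sqrt
from statistics import mean


def track_multiplier(n_tracks, exponent=3):
    """(n - 1)^exponent; equals 1 for an ordinary two-entry hit."""
    if n_tracks < 2:
        raise ValueError("a hit needs at least two entries")
    return (n_tracks - 1) ** exponent


def interestingness(n_tracks, countries, distance_km, hit_ratios,
                    hits_between, exponent=3):
    base = countries * sqrt(distance_km) * mean(hit_ratios) / hits_between
    return track_multiplier(n_tracks, exponent) * base


# Multiplier for double to quintuple hits with exponent 2 and exponent 3
for n in range(2, 6):
    print(n, track_multiplier(n, 2), track_multiplier(n, 3))

# Düsseldorf - Didam - Terrassa with the cubic multiplier
print(round(interestingness(3, 3, 1284, [146.45, 93.79, 770.88], 4)))
```

Ordinary hits keep their old score under either exponent, so the choice only decides how far triples, quadruples and quintuples are pushed up the ranking.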
claudio vda
Euro-Master
Posts: 9110
Joined: Wed Nov 07, 2007 2:35 pm
Location: Bremen (DE) + Pisa (IT)

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by claudio vda »

Personally I don't like the idea of "borders crossed" or "countries involved" at all, because that way very short hits such as Rome - Vatican or Gorizia - Nova Gorica are overvalued. But I read all the attempts and I think there are many good ideas.
At the moment I am studying hard for an exam that I have on 15/09, so I can't do much beyond reading your work, but it is good work! :P

I think Math Murderer's first idea (to involve the hit ratio of a location and the number of hits between the two locations) is good, but it has a small problem: when a hit comes not from a hot spot but from a little village just outside it, the coefficients explode.

What do you think about a mix between hit ratio / instances and days / km? Maybe using a log if there are problems with numbers that are too small or too big...

I know that it is not fair to suggest without proposing, but I am very, very busy with my exam... I am sorry :oops: On 15 September I will have a try too :oops:
My statistics on EBTCHECK (Latest update 11.03.2024)

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by Jes »

Sure: adding the factor "number_of_tracks" is also interesting, and I guess it would give us better results.

Thinking about what claudio vda and Math Murderer said, I suggest the following formula:

I = n * Sqrt(D * T) * MHR / (Hab + Hbc + ... + H(n-1)n)

Where:
  • I = Interestingness factor
  • n = Number of tracks for that particular note
  • MHR = Arithmetic mean of Hit Ratios of locations involved
  • Hab = Number of Hits between location "a" and location "b"
  • D = Distance in Kilometres
  • T = Time in months (ie: number of days / 30)
Not sure, but it is an idea.

I took the time factor in months, as it gives us a more reasonable range of numbers (since it is inside a Sqrt, numbers tend towards 1). For the same reason I was thinking about changing the distance unit too (Mm for instance; it should be something simple, such as dividing by 10 or so). If we don't do so before taking the Sqrt, we will end up with huge numbers on the one hand and super-tiny ones on the other. :idea:
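
A quick Python sketch of this formula. The names are invented for the sketch, and the numbers in the usage line are made up purely to show the call (the post gives no worked example for this version).

```python
from math import sqrt
from statistics import mean


def interestingness_i(n_tracks, distance_km, days, hit_ratios, hits_per_leg):
    """I = n * sqrt(D * T) * MHR / (Hab + Hbc + ...), with T in months."""
    months = days / 30  # T as defined in the post: number of days / 30
    total_hits = sum(hits_per_leg)  # Hab + Hbc + ... + H(n-1)n
    if total_hits < 1:
        raise ValueError("there must be at least one hit between the locations")
    return n_tracks * sqrt(distance_km * months) * mean(hit_ratios) / total_hits


# Hypothetical two-entry hit: 100 km, 270 days, hit ratios 50 and 150,
# 3 hits recorded so far between the two locations
print(interestingness_i(2, 100, 270, [50, 150], [3]))
```

Changing the distance unit only rescales every score by the same constant (dividing D by 10 divides I by the square root of 10), so it affects the size of the numbers but not the ranking.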
doiknow
Euro-Master in Training
Posts: 766
Joined: Sun May 20, 2007 4:04 pm
Location: Hannover, Germany

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by doiknow »

Math Murderer wrote: So if we take (n - 1)^3 our formula would be:

(n - 1)^3 * number_of_countries_involved * Sqr(Total_distance(Km)) * Mean_of_Hit_ratios / hits_between_locations
Opinions?
I wrote this before Jes posted, but I think I somehow found a solution to the time question...

Well, we have considered the distance, but what about the time the bill travelled?
A hit is interesting imho if it travelled very fast or very slowly. On the other hand, a long hit is more interesting than a short one; I agree on this point.
So we need some kind of factor that highlights both slow and fast hits.
First attempt:
'days_between_entering / distance_travelled' gives us the slow hits
'distance_travelled / days_between_entering' gives us the fast hits.

Multiplying both is useless as the product is always one... so let's add them. Ugh, how do we add them when they have different dimensions? Let's introduce some dimensional factors that cancel the units, so we get two plain numbers:
'days_between_entering / (distance_travelled+1km) * km/days' =speed^-1
'distance_travelled / days_between_entering * days/km' =speed

So we multiply the formula cited above by the factor: (speed + speed^-1)
Finally, we get a problem with 0km hits. So let's add one to the distance:

(n - 1)^3 * number_of_countries_involved * Sqr(Total_distance/km+1) * Mean_of_Hit_ratios / hits_between_locations * (speed + speed^-1)

Some current examples:
Škofja Loka > Maribor (2-1)^3*1*sqrt(111+1) * (62,92 + 84,28)/2 / 9 * (831/112 + 111/831) = 653,70
Kuopio > Markdorf (2-1)^3*2*sqrt(2025+1) * (39,68 + 302,93)/2 /1 * (2025/1711 + 1711/2026) = 31.274,92
Pentuple Hit (5-1)^3*1*sqrt(1002+1) * (41,78 + 38,17 + 28,6 + 30,55 + 39,75)/5 /1 * (915/1003 + 1002/915) = 145.536,34

The factor should rather be logarithmic, as it gives better results (I don't want the very far hits to be beaten by the very, very far* hits):
(n - 1)^3 * number_of_countries_involved * Sqr(Total_distance(Km)+1) * Mean_of_Hit_ratios / hits_between_locations * ln(speed + speed^-1)
This results in:

Škofja Loka > Maribor (2-1)^3*1*sqrt(111+1) * (62,92 + 84,28)/2 / 9 * ln(831/112 + 111/831) = 174,99
Kuopio > Markdorf (2-1)^3*2*sqrt(2025+1) * (39,68 + 302,93)/2 /1 * ln(2025/1711 + 1711/2026) = 10.903,90
Pentuple Hit (5-1)^3*1*sqrt(1002+1) * (41,78 + 38,17 + 28,6 + 30,55 + 39,75)/5 /1 * ln(915/1003 + 1002/915) = 50.520,27
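
Here is a rough Python sketch of the formula with the speed factor, in both the plain and the logarithmic variant. All names are hypothetical; as in the examples above, the "+1 km" is applied inside the square root and in the inverse-speed term only, and `days` is assumed to be at least 1.

```python
from math import log, sqrt
from statistics import mean


def speed_factor(distance_km, days, logarithmic=False):
    """speed + speed^-1, optionally wrapped in a natural logarithm."""
    speed = distance_km / days                # km per day
    inverse_speed = days / (distance_km + 1)  # +1 km avoids dividing by zero
    total = speed + inverse_speed
    return log(total) if logarithmic else total


def interestingness_speed(n_tracks, countries, distance_km, days,
                          hit_ratios, hits_between, logarithmic=False):
    base = ((n_tracks - 1) ** 3 * countries * sqrt(distance_km + 1)
            * mean(hit_ratios) / hits_between)
    return base * speed_factor(distance_km, days, logarithmic)


# Kuopio > Markdorf: 2 countries, 2025 km, 1711 days, first hit between them
kuopio = (2, 2, 2025, 1711, [39.68, 302.93], 1)
print(round(interestingness_speed(*kuopio), 2))
print(round(interestingness_speed(*kuopio, logarithmic=True), 2))
```

The factor is smallest for hits that travel at about 1 km per day and grows for both much faster and much slower notes, which is the behaviour asked for.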


Edit:
*Set the speed of light to one or multiply the time with the speed of light => you come to a world where you can talk of time to be a distance...then study physics :) to understand what you just did 8)
Edit2: I added the Pentuple hit as example
One Currency, one Union, one Eurobilltracker...
My dots are of the same order as 10.
-STAR-
Euro-Master
Posts: 1540
Joined: Sat Jun 18, 2005 1:23 am
Location: 1100 Wien [Vienna, Austria]

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by -STAR- »

Hi guys,

I like all your ideas and formulas, but IMHO there are two more factors missing from your calculations:
One is the number of hits each involved user has in total, and the other is the number of hits the users have in common.
This way hits from users with a lot of hits (like me for example) are less interesting than those from users with few hits,
and uncommon hits between users who don't have lots of hits together are favoured over those happening more often.

Examples:
A Vienna-Vienna hit involving team nossi and me is less interesting than a hit involving diver-vienna and Ernesto_due.
A Klagenfurt-Vienna hit between carinthia and CobbDouglas is way more interesting than between Moise and zimmerge.

Maybe you find something for the formula to take these factors into consideration. ;-)
Rgds, Franz
Recent & 1000 day Stats | EBT Profile | AT Short Codes | VIE Revisited
30.08.2010 • U2 • 360° • Praterstadion • Vienna • Austria • I was there!

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by Jes »

True, but we shouldn't complicate the formula too much, so that the number stays easy to compute. Besides, if you put too many factors in the formula, the results will spread over a huge range, which will be difficult to control.

IMO the users are "secondary" to the interest of a hit; that is why I left them out.

mmm not sure :? :oops:

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by claudio vda »

I don't know if the user is important: should a Wien-Paris hit have the same interest whether it is made by a big user or a small user?
Nerzhul
Euro-Master
Posts: 1417
Joined: Tue Apr 27, 2004 12:33 am
Location: Berlin

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by Nerzhul »

Good ideas so far. Keep them coming. I'll try to contribute a bit, but I'm currently travelling through Laos with unreliable internet connections.

IMO we should start from scratch and first identify the factors which make a hit interesting or not. Then we should see how we can mathematically combine these attributes, normalize them and maybe simplify them.

So, what have we got:

1) Number of entries. A quintuple hit is much more uncommon than a "normal" hit (2 entries) and thus much more interesting.

2) Distance. The bigger, the better?

3) Time distance. Same.

4) Hit ratios of users / cities / countries involved. A hit involving France is much rarer than a hit involving Finland.

Anything else? Internationality? Hits involving different cities? I think 4) already covers this. However, I'm not really content factoring in the hit ratio (it will probably complicate things a lot so that the final score won't be transparent), but it really is a good indicator for interestingness / commonness, so let's work with it.

Now, how do we weigh the factors against each other? A 100days/100km hit is probably more interesting than a 99days/100km hit and could be considered equal to 99days/101km, but does this scale linearly? Probably not, since the max travel time is currently around 2800 days, but max distance is much higher than that.

Should we apply the number of entries as a factor, i.e. should a triple be 1.5 times as interesting as a normal hit, a quintuple 2.5 times? Intuitively I wouldn't want this to scale linearly but would prefer the really uncommon combinations over more common ones (actually a quadruple is about 30 times more common than a quintuple, a triple ~1800 times).

So what do you think about normalizing the factors to their commonness? Basically the same approach you already took with the hit ratio, but more global. We (EBT) can easily provide these numbers, e.g. average / median travel distance (of all hits). Likewise with total time difference and probability for being entered N times.

Basically, the more uncommon each factor, the higher it is weighed. If the probability for a specific event is extremely low, the specific attribute should be weighed very high, thus having a big impact on the interestingness score.

Of course the reverse should also be true. If the probability for an event is high (e.g. a Vienna - Vienna hit), the score should be lowered significantly.

Let's say the average hit is 100km, 100 days. The more a hit differs from these averages, the more interesting / less interesting it becomes. A 200 days / 100 km hit would then get factors of 2 and 1 (if scaled linearly).

The probability for the number of entries could be normalized ("how many times less likely than a normal hit is this N?") and simply factored into the score.

So a crude formula could be:

P(N) * max(days/avg days, km/avg km) * avg (hitratio between all involved cities)

P(2) = 1, P(3) = 59 etc.

Just some food for thought. What do you think?
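
A minimal Python sketch of this crude formula, under a few loudly stated assumptions: the averages of 100 days and 100 km are the illustrative values from the text above (not real EBT statistics), only the two P(N) values given in the post are included, and "avg (hitratio ...)" is read as the arithmetic mean of the hit ratios involved. All names are invented for the sketch.

```python
from statistics import mean

# Rarity factors quoted in the post; other values of N are not given there.
RARITY = {2: 1, 3: 59}


def crude_score(n_entries, days, km, hit_ratios, avg_days=100.0, avg_km=100.0):
    """P(N) * max(days / avg days, km / avg km) * mean hit ratio."""
    if n_entries not in RARITY:
        raise KeyError(f"no rarity factor known for N={n_entries}")
    deviation = max(days / avg_days, km / avg_km)
    return RARITY[n_entries] * deviation * mean(hit_ratios)


# A 200 days / 100 km double hit versus an average-length triple hit
print(crude_score(2, 200, 100, [100.0, 100.0]))
print(crude_score(3, 100, 100, [50.0, 50.0, 50.0]))
```

Taking the maximum means that only the more unusual of the two dimensions counts, so a hit that is both long and slow scores no higher than one that is only long.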
EBT Webmaster | Author of the EBT-Tool | Dothunter! | EBT News on Twitter

Re: Brainstorm for a metric for the "interestingness" of a hit

Post by doiknow »

I thought a bit about our attempts and came across two problems we should discuss or agree to ignore for now:
1. If we take average values, which average do we take? The average at the time the hit happened, or the average as of now? Let's assume place A in country B is barely tracked at the moment. A hit happens there and, due to its rareness, it gets rated 'very interesting'. The local press then writes about EBT, and our place gets tracked like Vienna or Finland. Hits like that will surely occur much more often, decreasing the hit's interestingness. The result is one and the same hit data having two different ratings. Should we rank down the first hit or shouldn't we? Why is the first hit still as interesting as it was at the time it happened? Why isn't it? Or think backwards: what if a hit happens in a tracked place that later becomes untracked? This leads us to some kind of 'average paradox'.
Every answer will seem illogical in one of those situations, so what to do?

2. The independent factor problem:
Up to now we have talked of hits being interesting if they fulfil some criteria (large distance, long travelling time...). At least some of those criteria are not independent. For example, the hit ratio influences the number of hits that have happened between two or more users/cities etc. Likewise, the distance is correlated with the number of countries involved, as is the number of times the note was entered. If we take all those factors into account, we could overestimate some hits while underestimating others, which results in some 'unfairness'. How can we get rid of non-independent factors?