Introducing Absolute Ratio

Let’s define the absolute ratio for positive numbers:

abs_ratio(x) = 1 / x when x < 1, otherwise: x

When x is smaller than 1 return 1 / x, otherwise return x. Here are a few example values:

x       abs_ratio(x)
0.5     2
2       2
0.2     5
5       5
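
In Python, the unary form is a one-liner. Here’s a minimal sketch (my own illustration, not necessarily the code in the repo linked further down):

def abs_ratio(x):
    # Only defined for positive x: flip values below 1 so the result is always >= 1.
    return x if x >= 1 else 1 / x

for x in (0.5, 2, 0.2, 5):
    print(x, abs_ratio(x))  # prints the values from the table above: 2.0, 2, 5.0, 5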

And a graph:

Absolute Ratio Graph

Another spelling for the same operator would take 2 positive numbers and give their absolute ratio:

abs_ratio(x, y) = abs_ratio(x / y) = max(x, y) / min(x, y)

And a graph:

Absolute ratio in 3D

Use case examples

  • Music and audio – an octave of a frequency F is 2F. More generally, a harmonic of a frequency F is N*F where N is a natural number. To decide whether one frequency is a harmonic of another we just take their absolute ratio and check if it’s whole. E.g. if abs_ratio(F1, F2) == 2 they’re an octave apart; if abs_ratio(F1, F2) is whole, they’re harmonics (see the sketch after this list).
  • Computer vision – to match shapes that have similar dimensions, e.g. their widths are within 10% of each other. We don’t care which is the bigger or smaller, we just want to know if 1/1.1 < W1 / W2 < 1.1, which is easier to pronounce as abs_ratio(W1, W2) < 1.1.
  • Real life – when we see 2 comparable objects we’re more likely to say one is “three times the other” vs “one third the other”. Either way in our brains both statements mean the same concept. We think in absolute ratios.
  • General case – When you want to know if X is K times bigger than Y or vice versa and you don’t care which is the bigger one.
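
Here’s a rough Python sketch of the first two use cases. The helper names and thresholds are made up for illustration, and a real pitch detector would also have to tolerate measurement noise:

def abs_ratio(a, b):
    # Binary form: divide the bigger by the smaller, so the result is always >= 1.
    return a / b if a > b else b / a

def is_harmonic(f1, f2, tolerance=0.01):
    # f1 and f2 are harmonics if their absolute ratio is (close to) a whole number.
    ratio = abs_ratio(f1, f2)
    return abs(ratio - round(ratio)) < tolerance

def similar_width(w1, w2):
    # Widths within ~10% of each other, no matter which one is bigger.
    return abs_ratio(w1, w2) < 1.1

print(is_harmonic(440, 880))    # True, an octave
print(is_harmonic(440, 620))    # False
print(similar_width(105, 100))  # True
print(similar_width(100, 79))   # False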

Interesting Properties

  • abs_ratio(Y / X) == abs_ratio(X / Y)
  • log(abs_ratio(X)) = abs(log(X))
  • log(abs_ratio(Y / X)) = abs(log(Y / X)) = abs(log(Y) - log(X))
  • You can see from the above that absolute ratio is something of an absolute value for log-space (quick check below).
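
A quick numeric check of that last property (just an illustration):

import math

def abs_ratio(x):
    return x if x >= 1 else 1 / x

for x in (0.25, 0.5, 3, 7.5):
    # abs_ratio does in log-space what abs() does on the number line.
    assert math.isclose(math.log(abs_ratio(x)), abs(math.log(x)))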

What’s next for absolute ratio

  • I’d love to hear more use cases and relevant contexts.
  • What would be the written symbol or notation?
  • How can we make this operator famous enough to be useful to mainstream minds?
  • About negative numbers and zero – right now that’s undefined as I don’t see a use case for that domain.
  • For some code and graphs in Python, check out https://github.com/ubershmekel/abs_ratio

EDIT – I’m growing to like the binary form of the operator more, so from now on let’s define it like this in Python:

def abs_ratio(a, b):
    # Divide the bigger by the smaller, so the result is always >= 1.
    return a / b if a > b else b / a

Precision, recall, sensitivity and specificity

Nowadays I work for a medical device company, where the big indicators of a medical test’s success are sensitivity and specificity. Every medical test strives to reach 100% on both criteria. Imagine my surprise today when I found out that other fields use different metrics for the exact same problem. To analyze this, I present to you the confusion matrix:

Confusion Matrix

E.g. we have a pregnancy test that classifies people as pregnant (positive) or not pregnant (negative).

  • True positive – a person we told is pregnant that really was.
  • True negative – a person we told is not pregnant, and really wasn’t.
  • False negative – a person we told is not pregnant, though they really were. Ooops.
  • False positive – a person we told is pregnant, though they weren’t. Oh snap.

And now some equations…

Sensitivity and specificity are statistical measures of the performance of a binary classification test:

sensitivity = true positives / (true positives + false negatives)

specificity = true negatives / (true negatives + false positives)

Sensitivity in yellow, specificity in red

In pattern recognition and information retrieval:

precision = |relevant ∩ retrieved| / |retrieved|

recall = |relevant ∩ retrieved| / |relevant|

Let’s translate:

  • Relevant documents are the positives
  • Retrieved documents are the ones classified as positive
  • Relevant and retrieved are the true positives.

Precision in red, recall in yellow

Standardized equations

  • sensitivity = recall = tp / (all actual positives) = tp / (tp + fn)
  • specificity = tn / (all actual negatives) = tn / (tn + fp)
  • precision = tp / (all classified as positive) = tp / (tp + fp)
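
Or, as a minimal Python sketch (the function and variable names are mine):

def metrics(tp, fn, fp, tn):
    # All three measures fall straight out of the four confusion matrix cells.
    return {
        "sensitivity": tp / (tp + fn),  # a.k.a. recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# The pregnancy test matrix from "More ways to cheat" below:
print(metrics(tp=8, fn=2, fp=10, tn=80))
# {'sensitivity': 0.8, 'specificity': 0.888..., 'precision': 0.444...}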

Equations explained

  • Sensitivity/recall – how good a test is at detecting the positives. A test can cheat and maximize this by always returning “positive”.
  • Specificity – how good a test is at avoiding false alarms. A test can cheat and maximize this by always returning “negative”.
  • Precision – how many of the positively classified were relevant. A test can cheat and maximize this by only returning “positive” on the one result it’s most confident in.
  • The cheating is resolved by looking at both relevant metrics instead of just one. E.g. the cheating test that always says “positive” gets 100% sensitivity but 0% specificity (quick demo below).
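
A quick numeric demo of those cheats, on a made-up sample of 10 positives and 90 negatives:

# A test that always says "positive":
tp, fn, fp, tn = 10, 0, 90, 0
print(tp / (tp + fn))  # sensitivity = 1.0, looks perfect
print(tn / (tn + fp))  # specificity = 0.0, exposes the cheat

# A test that says "positive" only on the one case it's most confident in (and gets it right):
tp, fn, fp, tn = 1, 9, 0, 90
print(tp / (tp + fp))  # precision = 1.0, looks perfect
print(tp / (tp + fn))  # recall = 0.1, exposes the cheat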

More ways to cheat

A Specificity buff – let’s continue with our pregnancy test where our experiments resulted in the following confusion matrix:

                 classified pregnant    classified not pregnant
pregnant                   8                        2
not pregnant              10                       80

Our specificity is only 88.9% and we need 97% for our FDA approval. We can tell our patients to run the test twice and only count double positives (e.g. two red lines), so we suddenly have 98.8% specificity. Magic. This would only be kosher if the test results are proven to be independent, and most tests probably aren’t (e.g. blood parasite tests that are triggered by antibodies may repeatedly give false positives from the same patient).
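
The arithmetic behind that buff, assuming the two runs really are independent:

fp, tn = 10, 80
single_specificity = tn / (tn + fp)        # 0.888...
false_alarm_rate = 1 - single_specificity  # 0.111...

# Both runs have to false-alarm for the combined test to false-alarm:
double_specificity = 1 - false_alarm_rate ** 2
print(double_specificity)                  # 0.9876..., roughly 98.8%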

A less ethical (though IANAL) approach would be to add 300 men to our pregnancy test experiment. Of course, part of our test is to ask “are you male?” and mark those patients as “not pregnant”. Thus we get a lot of easy true negatives, and this is the resulting confusion matrix:

                 classified pregnant    classified not pregnant
pregnant                   8                        2
not pregnant              10                      380

Voila! 97.4% specificity with a single test. Have fun trying to get that FDA approval though, I doubt they’ll overlook the 300 red herrings.

What does it mean, who won?

Finally the punchline:

  • A search engine only cares about the results it shows you. Are they relevant (tp) or are they spam (fp)? Did it miss any relevant results (fn)? The ocean of ignored (tn) results shouldn’t affect how good or bad a search algorithm is. That’s why true negatives can be ignored.
  • A doctor can tell a patient if they’re pregnant or not, or if they have cancer. Each decision may have grave consequences, and thus true negatives are crucial. That’s why all the cells in the confusion matrix must be taken into account.

References

http://en.wikipedia.org/wiki/Confusion_matrix

http://en.wikipedia.org/wiki/Sensitivity_and_specificity

http://en.wikipedia.org/wiki/Precision_and_recall

http://en.wikipedia.org/wiki/Accuracy_and_precision

Pendulums, WebGL and three.js

Here’s the pendulum waves three.js simulation I made.

 So I wanted to simulate a magical pendulum with waves to prove my point that the shapes are the result of a dead simple arithmetic progression. I was almost correct.

After testing, I saw that when the frequencies are in arithmetic progression we get the awesome patterns. The problem is that achieving such a feat by modifying the length of the strings alone is a bit harder. Here’s omega, the angular frequency, from HyperPhysics:

w = sqrt(g/L)

So all I had to do was choose omegas, increment them and from that calculate the string lengths. I got mixed up and solved the problem in a much more complicated way.
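
Inverting that formula gives L = g / w^2, so the straightforward approach looks roughly like this (a sketch with made-up numbers, not the code from the simulation):

g = 9.81  # m/s^2
# Pick angular frequencies in arithmetic progression...
omegas = [3.0 + 0.1 * i for i in range(15)]
# ...and derive each string length from w = sqrt(g / L), i.e. L = g / w**2.
lengths = [g / w ** 2 for w in omegas]

for w, length in zip(omegas, lengths):
    print(f"omega = {w:.1f} rad/s -> string length = {length:.3f} m")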

Anyhow, by faking it (choosing my omegas directly, with bogus L’s) I get a prettier result. Headache averted. I’m not sure pendulums swinging at these angles count as simple pendulums anyway.

WebGL and three.js are indeed awesome. They have their gotchas, but I was just so impressed with http://lights.elliegoulding.com/ and other things in the three.js gallery. It’s amazing how simple and accessible OpenGL is now that it’s in the browser. The “hello world” of about 20 lines for a rotating cube was good, though I think it should include the WebGL detection in it.

Thoughts On Ecosystems And If There Is No Cold

  1. Does a self-sustaining ecosystem aquarium/zoo exist? I mean a closed space with a few different kinds of animals that sustains itself without an external food supply. Of course I’ve heard of a few lakes that only have water coming in and out, but what is the minimum size of a body of water or piece of land that can sustain a non-trivial amount of life?
  2. We like to think of ‘cold’ as the lack of heat. Why? Because there is a minimum of heat, or you could say a maximum of cold. But actually there’s also a maximum of heat, though it would probably be hard to reproduce in a lab. The maximum temperature is what you’d get if you smashed all the particles in the universe together, summing up all their energies into one particle. That last surviving particle, containing all of the universe’s energy, has a temperature of MAX_T. So maybe this whole ‘there is no cold’ fiasco was a simplification of the real situation. Hot/cold is just a scalar with two end-points; call it heat, call it coolness, whatever.