Dominant misconceptions of recessive genes

I still remember my 10th grade Biology teacher explaining that because blonde hair is a recessive trait it will slowly vanish off the face of the earth. Recently I read a news article with a similar claim, and had an argument with a few other biology and genetics majors. So I wrote a simulation.

TraitSim is an animated toy for exploring random, neutral gene proliferation with brown and blue eyes as an example. I also wrote a Python version of the simulation and ran it tens of thousands of times. Thanks, PyPy, for the 8.3x speedup! Surprisingly, the chance to win the gene race grows linearly with the gene’s initial prevalence, as can be seen in the following graph, thanks to matplotlib.
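That linear relationship is what neutral drift predicts: with no selection, a variant's chance of eventually taking over equals its starting share. Here is a minimal sketch of the idea, not TraitSim's actual code. It assumes a simple haploid model where each generation is resampled from the previous one; the function names, population size and trial counts are made up for illustration.

```python
import random


def run_until_fixed(pop_size, initial_copies, rng):
    """Resample generations until the variant is lost or has taken over.

    Returns True if the variant reached 100% of the population.
    """
    copies = initial_copies
    while 0 < copies < pop_size:
        freq = copies / pop_size
        # Every individual in the next generation picks a random parent.
        copies = sum(1 for _ in range(pop_size) if rng.random() < freq)
    return copies == pop_size


def fixation_rate(pop_size, initial_copies, trials, seed=0):
    """Fraction of independent runs in which the variant wins the race."""
    rng = random.Random(seed)
    wins = sum(run_until_fixed(pop_size, initial_copies, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for start in (2, 5, 10, 15):
        rate = fixation_rate(pop_size=20, initial_copies=start, trials=2000)
        print(f"initial share {start / 20:.2f} -> win rate {rate:.2f}")
```

With enough trials the win rate should hover around the initial share, whichever variant you track.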

The thing with dominance is that it affects the gene expression which can be seen in the following graph:
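To put rough numbers on that, here is a minimal sketch of how allele share maps to visible traits. It assumes random mating (Hardy-Weinberg proportions), which is my simplification and not necessarily what TraitSim models; the function name is made up.

```python
def phenotype_shares(q):
    """Split a population by visible trait, given recessive allele share q.

    Assumes random mating (Hardy-Weinberg proportions).
    """
    p = 1.0 - q                          # share of the dominant allele
    recessive_trait = q * q              # two recessive copies, e.g. blue eyes
    dominant_trait = p * p + 2 * p * q   # at least one dominant copy
    return dominant_trait, recessive_trait


if __name__ == "__main__":
    for q in (0.1, 0.3, 0.5, 0.7):
        dominant, recessive = phenotype_shares(q)
        print(f"allele share {q:.1f} -> trait visible in {recessive:.2f} of people")
```

The recessive trait is always rarer than the gene behind it, but being hidden is not the same as disappearing.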

 

And here’s the source that generated these from the data. There’s more information in the project page on github. These github pages are strange creatures which I’m not sure how to work with yet. Can I modify the CSS or structure somehow without losing the benefit of the page generator?

Why I’m not leaving Python for Go

First of all, Go seems like a great language. It has an excellent tutorial which I joyfully went through and found:

  • Go is Fast.
  • Concurrent by design.
  • Typed (important for JIT and IDEs) but not cumbersome and ugly like C or C++’s spirals.
  • Duck-type-esque interfaces.
  • The defer mechanism is really nifty.

But there’s one problem I can’t live with. Which is a shame as I was eager to make the leap of faith in the name of concurrency. That problem is errors are handled in return values. 70’s style.

Verbose and repetitive error handling

The designers of Go consider this a virtue.

In Go, error handling is important. The language’s design and conventions encourage you to explicitly check for errors where they occur (as distinct from the convention in other languages of throwing exceptions and sometimes catching them). In some cases this makes Go code verbose, but fortunately there are some techniques you can use to minimize repetitive error handling.

This is one of the things I can’t stand in C. Every single line requires an if statement to prevent programs from doing crazy things. This is an official, canonical example from the aforementioned link with perhaps “minimal repetitive error handling”:

    if err := datastore.Get(c, key, record); err != nil {
        return &appError{err, "Record not found", 404}
    }
    if err := viewTemplate.Execute(w, record); err != nil {
        return &appError{err, "Can't display record", 500}
    }

The correct way to call a function in Go is to wrap it in an if statement. Even Println returns an error value that I’m sure most people on the planet will never check. Which brings me to…

Errors passing silently – ticking time bombs to go

To quote Tim Peters:

Errors should never pass silently
Unless explicitly silenced

Go isn’t just stuck with verbose and repetitive error handling. It also makes it easy and tempting to ignore errors. In the following program we would trigger the doomsday device even if we failed protecting the presidential staff.

package main

import "net/http"

func main() {
    http.Get("http://www.nuke.gov/seal_presidential_bunker") // returned error is dropped
    http.Get("http://www.nuke.gov/trigger_doomsday_device")  // runs even if the call above failed
}

What a shame. Oops.

In theory we could require that the programmer never ignore returned errors, by static analysis or by convention. In practice it’d be a pain worth enduring only in the most error-critical programming tasks. Perhaps that’s Go’s purpose.
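For contrast, here is a rough Python sketch of the same two calls when failures are raised as exceptions. The fetch function and its failure rule are made-up stand-ins (no real network access), so treat this as an illustration of control flow only.

```python
calls = []  # records which URLs were actually requested


def fetch(url):
    """Hypothetical stand-in for an HTTP GET that raises on failure."""
    calls.append(url)
    if "seal_presidential_bunker" in url:
        raise IOError("connection refused: " + url)
    return "200 OK"


def main():
    fetch("http://www.nuke.gov/seal_presidential_bunker")
    # Never reached if the line above raised; no if statement needed.
    fetch("http://www.nuke.gov/trigger_doomsday_device")


if __name__ == "__main__":
    try:
        main()
    except IOError as error:
        print("aborted:", error)
```

Ignoring the failure is still possible, but it takes an explicit try/except around the first call, which is the "unless explicitly silenced" part.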

panic/recover

Panic and recover aren’t good enough as long as the standard library rarely uses them. Why is an array out of bounds any more cause for panic than a bad format string or a broken connection? Go’s designers wanted to avoid exceptions entirely but, realizing they couldn’t, tacked a few on here and there, leaving me confused as to which error happens when.

Perhaps another time

So I say this with much regret because Go has a lot of amazing ideas and features, but without modern error handling – I’m not going.

I’m still waiting for that open source, concurrent, bottom left language to come along. Any suggestions are more than welcome.

Introducing Absolute Ratio

Let’s define the absolute ratio for positive numbers:

abs_ratio(x) = 1 / x when x < 1, otherwise: x

When x is smaller than 1 return 1 / x, otherwise return x. Here are a few example values:

x      abs_ratio(x)
0.5    2
2      2
0.2    5
5      5
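The definition fits in a few lines of Python. This is a minimal sketch; rejecting non-positive input is my reading of "for positive numbers".

```python
def abs_ratio(x):
    """Return 1 / x when x < 1, otherwise x (defined for positive x only)."""
    if x <= 0:
        raise ValueError("abs_ratio is only defined for positive numbers")
    return 1.0 / x if x < 1 else float(x)


if __name__ == "__main__":
    for x in (0.5, 2, 0.2, 5):
        print(x, abs_ratio(x))
```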

And a graph:

Absolute Ratio Graph

Another spelling for the same operator would take 2 positive numbers and give their absolute ratio:

abs_ratio(a, b) = a / b when a > b, otherwise: b / a

And a graph:

Absolute ratio in 3D

Use case examples

  • Music and audio – an octave of a frequency F is 2F. More generally a harmonic of a frequency F is N*F where N is a natural number. To decide if one frequency is a harmonic of another we just need to get their absolute ratio and see if it’s whole. E.g. if abs_ratio(F1, F2) == 2 they’re octaves. If abs_ratio(F1, F2) is whole – they’re harmonics.
  • Computer vision – to match shapes that have similar dimensions, e.g. one width is at most 10% larger or smaller than the other. We don’t care which is the bigger or smaller, we just want to know if 0.91 < W1 / W2 < 1.1, which is easier to write as abs_ratio(W1, W2) < 1.1
  • Real life – when we see 2 comparable objects we’re more likely to say one is “three times the other” vs “one third the other”. Either way in our brains both statements mean the same concept. We think in absolute ratios.
  • General case – When you want to know if X is K times bigger than Y or vice versa and you don’t care which is the bigger one.
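The first two use cases can be sketched with the two-argument form of the operator. The helper names and the tolerance for "whole" are my own choices, since measured frequencies are rarely exact multiples.

```python
def abs_ratio(a, b):
    """Ratio of the larger to the smaller of two positive numbers."""
    return a / b if a > b else b / a


def is_harmonic(f1, f2, tolerance=1e-6):
    """True when one frequency is a whole-number multiple of the other."""
    ratio = abs_ratio(f1, f2)
    return abs(ratio - round(ratio)) < tolerance


def similar_size(w1, w2, max_ratio=1.1):
    """True when two widths are within about 10% of each other, either way round."""
    return abs_ratio(w1, w2) < max_ratio


if __name__ == "__main__":
    print(abs_ratio(440.0, 880.0) == 2)  # an octave apart
    print(is_harmonic(1320.0, 440.0))    # three times the base frequency
    print(is_harmonic(440.0, 660.0))     # ratio 1.5, not whole
    print(similar_size(100.0, 108.0))
    print(similar_size(100.0, 125.0))
```

Neither helper needs to know which argument is the bigger one, which is the whole point.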

Interesting Properties

  • abs_ratio(Y / X) == abs_ratio(X / Y)
  • log(abs_ratio(X)) = abs(log(X))
  • log(abs_ratio(Y / X)) = abs(log(Y / X)) = abs(log(Y) - log(X))
  • You can see from the above that absolute ratio is somewhat of an absolute value for log-space.
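These identities are easy to spot-check numerically. A small sketch using the one-argument form; the sample values are arbitrary, and math.isclose absorbs floating point rounding.

```python
import math


def abs_ratio(x):
    """Unary form: 1 / x when x < 1, otherwise x."""
    return 1.0 / x if x < 1 else float(x)


def check_properties(x, y):
    """Check the listed identities for one pair of positive numbers."""
    symmetric = math.isclose(abs_ratio(y / x), abs_ratio(x / y))
    log_of_one = math.isclose(math.log(abs_ratio(x)), abs(math.log(x)))
    log_of_pair = math.isclose(
        math.log(abs_ratio(y / x)), abs(math.log(y) - math.log(x))
    )
    return symmetric and log_of_one and log_of_pair


if __name__ == "__main__":
    samples = [(0.5, 2.0), (3.0, 7.0), (10.0, 0.1)]
    print(all(check_properties(x, y) for x, y in samples))
```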

What’s next for absolute ratio

  • I’d love to hear more use cases and relevant contexts.
  • What would be the written symbol or notation?
  • How can we get this operator famous enough to be of use to mainstream minds?
  • About negative numbers and zero – right now that’s undefined as I don’t see a use case for that domain.
  • For some code and graphs in python check out https://github.com/ubershmekel/abs_ratio

EDIT – I’m growing to like the binary form of the operator more so from now on let’s call it like this in python:

def abs_ratio(a, b):
    return a / b if a > b else b / a

Ah the old Reddit switch-a-roo analyzed

So after clicking through what seemed like an infinite number of tabs from one of these switcheroo comments, I finally wrote a script that analyzes the graph. I’d suggest you ignore the following png and take a gander at the network pdf of the switcheroo graph, because there you can click through to the links.

The old reddit switch-a-roo analyzed image

To recap – 50 nodes, 52 edges, though there are probably more out there that point into some point of that chain. And here are the awards:

There. I hope that didn’t take away from the magic.

Appendix – The hardships

This was overly hard to do. First of all, NSFW links gave me the “are you over 18?” prompt, which for some reason I wasn’t able to solve with cookies. I eventually turned to the mobile version of the site (append “.compact”) to avoid the prompts completely. Also, matplotlib and networkx aren’t that fun for drawing graphs, it seems. To visualize and output the graph I eventually used gephi, which was fairly easy although it has its share of clunkiness.

Python isn’t English and iterator “labels”

Us python fanboys like to think of python as similar to English and thus more readable. Let’s examine a simple piece of code:

for item in big_list:
    if item.cost > 5:
        continue
    item.purchase()

For our discussion there are only 3 kinds of people:

  1. People who have never seen a line of code in their life.
  2. People who have programmed in other languages but have never seen python.
  3. Python programmers.

We’ll dabble between the first 2 groups and how they parse the above. Let’s try to forget what we know about python or programming and read that in English:

  • “for item in big_list” – either we’re talking about doing something for a specific item in a big_list or we’re talking about every single item. Ambiguous but the first option doesn’t really make sense so that’s fine.
  • “if item.cost > 5” – non-programmers are going to talk about the period being in a strange place, but programmers will know exactly what’s up.
  • “continue” – programmers read this as “move on to the next item”, but English speakers are going to get the completely wrong idea: in English “continue” means “that’s fine, keep going”, which is what pythonistas call “pass” (or “nop” in assembly). As programmers we’ve grown used to this convention, but we really should have called it “skip” or something.
  • “item.purchase()” – non-programmers are going to ask about the period and the parentheses but the rest grok that easily.

So I’m pretty sure this isn’t English. But it’s fairly readable for a programmer. I believe programmers of any of the top 8 languages on the TIOBE index can understand simple python. I definitely can’t say the same for Lisp and Haskell. Not that there’s anything wrong with Lisp/Haskell, these languages have specialized syntax for their honorable reasons.

Continue is a silly word, what about iterator labels?

Let’s say I want to break out of an outer loop from a nested loop, e.g.:

for item in big_list:
    for review in item.reviews:
        if review < 3.0:
            # next item or next review?
            continue
        if review > 9.0:
            # stop reading reviews or stop looking for items?
            break

Java supports specific breaks and continues by adding labels to the for loops but I think we can do better. How about this:

items_gen = (i for i in big_list)
for item in items_gen:
    for review in item.reviews:
        if review < 3.0:
            items_gen.continue()
        if review > 9.0:
            items_gen.break()

But how can that even be possible you may ask? Well, nowadays it isn’t but maybe one day if python-ideas like this idea we can have nice things. Here’s how I thought it could work: a for-loop on a generator can theoretically look like this:

while True:
    try:
        item = next(gen)
        # do stuff with item
    except StopIteration:
        break

But if it worked like I propose below we can support the specific breaks and continues:

while True:
    try:
        item = next(gen)
        # do stuff with item
    except gen.ContinueIteration:
        pass
    except gen.StopIteration:
        break
    except StopIteration:
        break

So every generator could have a method which throws its relevant exception and we could write specific breaks and continues. Or, if you prefer, a different spelling could be “break from mygen” or “continue from mygen”, since continue and break are reserved words and normally can’t be used as method names.

I think this could be nice. Although many times when I found myself using nested loops, I actually preferred to break the monster into 2 functions with one loop each. That way I could use the return value to do whatever I need in the outer loop (break/continue/etc). So perhaps it’s a good thing the language doesn’t help me build monstrosities and forces me to flatten my code. I wonder.
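Here is roughly what that two-function flattening looks like for the reviews example. The Item class and the verdict strings are hypothetical, and so is the policy I picked for the ambiguous cases: a review under 3.0 skips the item, a review over 9.0 keeps the item and ends the search.

```python
class Item:
    """Hypothetical item with a list of numeric review scores."""

    def __init__(self, name, reviews):
        self.name = name
        self.reviews = reviews


def judge(item):
    """Inner loop: the return value tells the outer loop what to do."""
    for review in item.reviews:
        if review < 3.0:
            return "skip item"       # plays the role of "continue outer"
        if review > 9.0:
            return "stop searching"  # plays the role of "break outer"
    return "keep"


def shortlist(big_list):
    """Outer loop: one flat loop that acts on judge()'s verdict."""
    kept = []
    for item in big_list:
        verdict = judge(item)
        if verdict == "skip item":
            continue
        kept.append(item.name)
        if verdict == "stop searching":
            break
    return kept
```

Each loop now has exactly one meaning for break and continue, so the "which loop?" comments are no longer needed.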

Statistics on reddit’s top 10,000 titles with NLTK

Drawing inspiration from this blog post on title virality I wanted to investigate what makes these top 10,000 titles the best of their breed. Which are the best superlatives? Who/what’s the most popular subject? Let’s start with some statistics:

  • On Feb. 03, 14:10:45 (UTC) the all-time top 10,000 submissions on reddit (/r/all) had a total of 82,751,429 upvotes and 62,655,532 downvotes (56.9% liked it).
  • 5.2 years between the oldest and newest submission
  • 8,331,382 comments. That’s about 833 comments per submission.
  • The #1 post has 26,758 – 4,882 = 21,876 points
  • The #10,000 post has 15,166 – 13,679 = 1,487 points
  • And now some graphs….

Adjectives – reddit loves “new”, “old”, “good” and “right”

Adjectives

Top Adjective, Superlative – “Best” is the best

Questions – reddit loves “how”?

Questions

What’s reddit talking about? People.

Or news, the president, man…

Reddit appreciates personal content about you, this, it and I.

Even NLTK doesn’t understand these…

I’m pretty sure you don’t need example links for these…

The top 10,000 seem to come mostly from 17:00 UTC and rarely from around 12:00 UTC

This isn’t exactly the probability of hitting the front page, as it’s not clear at what time the submission count is highest. But it’s something.

An apology

This is my first time using NLTK and though I’m ok at coding I most certainly have no idea how to parse natural language. Here’s hoping this was somewhat insightful.

I have no idea what I'm doing

Appendix

The Google App Engine Glass Ceiling

With the new billing arrangements, each and every paid GAE app costs at least $2.10 per week which is supposedly $9 per month ($9.125 by my calculation) regardless of quota usage. This cost does cover whatever quotas your app consumes and the regular free quotas are still “free”.

Now I have a GAE app that sends 70-80 emails per day where the free limit is 100. I’d gladly switch over to the paid side of GAE just to be sure that if it ever passes the 100 mark I don’t have any failed email requests, but the price of that is $9 per month. GAE is extremely expensive for apps that just barely brush the end of their free quotas. In order to actually use the $9 per month minimum I’d have to send out 3000 emails per day (at $0.0001 per email).

I don’t know if the free quota on email recipients is really low or if sending out an email is extremely cheap. Either way, GAE expects me to scale from 100 to 3000 while paying the price of 3000. Who knows if I’ll ever even reach that mark?
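The arithmetic behind those figures, as a quick sketch. The variable names are mine, and the 30-day month is an assumption for rounding.

```python
weekly_minimum = 2.10      # dollars per week for any paid app
price_per_email = 0.0001   # dollars per email
free_emails_per_day = 100

# Average month: 365 days / 7 days per week / 12 months.
monthly_minimum = weekly_minimum * 365 / 7 / 12

# Emails per day needed to use up the advertised $9 in a 30-day month.
break_even_per_day = 9.0 / price_per_email / 30

if __name__ == "__main__":
    print(f"monthly minimum: ${monthly_minimum:.3f}")
    print(f"emails per day to use up $9: {break_even_per_day:.0f}")
    print(f"multiple of the free quota: {break_even_per_day / free_emails_per_day:.0f}x")
```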

If google keeps with this plan, I’m probably never going to start another GAE app that has a chance to grow. Every time I have a chance of hitting the quota limits I have 2 choices:

  • Pay google and be screwed over for an indefinite amount of time until I reach the next landmark.
  • Migrate to a cheaper shared hosting option until I reach the next landmark.

Thanks, but no thanks. That’s the GAE glass ceiling.

Appendix

  • Other than this problem I do like GAE. It’s a shame I have to leave it.
  • I’ve made about 11 small python GAE apps. Only 2 of which ever reached the aforementioned glass ceiling.
  • This issue shouldn’t bother you if your app is already big enough to cost more than $9.
  • Maybe google can’t bill less than $9 per month? I doubt it, android apps can cost $0.99.
  • A proposed solution: Google takes $9 of credit at a time from your google wallet and eats quotas out of that. When the $9 run out, it bills another 9. Sounds reasonable and “don’t be evil” to me. Another thing that could be nice would be to allow multiple paid apps to feed from the same budget.

Python 3 Wall of Shame Updates

Earlier in December I was approached by Chris McDonough with a reddit PM asking if I could or would implement some kind of behavior regarding a “Python 2 only” classifier on the wall of shame. After some aggressive googling I found the original discussion in catalog-sig. The idea was to add a classifier that signified “the authors have no current intention to port this code to Python 3”. By declaring such an intent, Chris explained, a python package should be erased from the wall of shame. I didn’t completely understand this intuition, but I still tried to apply myself to the effort of improving the WOS. So here’s what’s new:

  • Packages with the “Programming Language :: Python :: 2 :: Only” trove classifier will have a lock next to their package with a mouse over explaining their intent.
  • Packages that have an equivalent py3k package are now not erased from the wall but rather show a link to the equivalent package. This rightfully boosts the compatibles count by 4. Note that packages that would doubly boost the count are still erased (e.g. Jinja is erased because Jinja2 is in the top 200).
  • Packages that are python 3 compatible but lack the trove classifier won’t stay red if brought to my attention. I’ve always stated the WOS can only be as good as pypi, not better. Hoping that in time PyPI would become more accurate, this move saddens me a bit. To keep a bit of the spirit the artificially green packages have a red triangle signifying the maintainer’s lack of trove classifiers (again with a relevant mouse over).
  • The WOS is now written for python 2.7 and migrated to the HRD, woohoo!
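The rules above boil down to a small decision function over a package's trove classifiers. This is a sketch of the logic as described, not the wall's actual source; the status strings and the known_py3_compatible flag are hypothetical names.

```python
PY3 = "Programming Language :: Python :: 3"
PY2_ONLY = "Programming Language :: Python :: 2 :: Only"


def declares_py3(classifier):
    """True for the Python 3 classifier and its children, e.g. ':: 3.2'."""
    return (
        classifier == PY3
        or classifier.startswith(PY3 + ".")
        or classifier.startswith(PY3 + " ::")
    )


def wall_status(classifiers, known_py3_compatible=False):
    """Map a package's trove classifiers to a display status."""
    if any(declares_py3(c) for c in classifiers):
        return "green"
    if known_py3_compatible:
        return "green with red triangle"  # works on 3, classifier missing
    if PY2_ONLY in classifiers:
        return "red with lock"            # authors declared no porting plans
    return "red"
```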

Please do contact me if there are any more inaccuracies or mistakes. I’m reachable at ubershmekel at gmail and by comments on this blog.

PS: we’re at 57/200, so maybe by this time next year we can have that Python 3 Wall of Superpowers party! Amen to that…

Duplicating Streams of Audio with Python

This morning I made a python script that uses pyaudio to read from one audio device and pipe it to another. I call it replicate.py.

This is a really old problem for me, ever since I first had 4.1 speakers and winamp only played on the front 2. Nowadays I just want VLC to play on both the TV and the computer speakers without switching between audio output modules in the preferences or fiddling with the default audio output in Windows 7.

PyAudio was really nice and easy to use. I just wish async I/O was added so I could lower the latency a bit, as I’m getting 240 ms right now, which is very far from perfect.
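The core of such a script is a blocking read/write loop. Below is a minimal sketch, not the actual replicate.py: the function names are made up, and 16-bit stereo at 44.1 kHz with 1024-frame chunks are assumed settings. The copy loop only relies on read() and write(), which PyAudio's blocking streams provide.

```python
def replicate(source, sinks, chunk_frames, max_chunks=None):
    """Copy audio from one input stream to every output stream.

    `source` needs a read(n) method and each sink a write(data) method.
    Runs forever unless max_chunks is given.
    """
    copied = 0
    while max_chunks is None or copied < max_chunks:
        data = source.read(chunk_frames)
        for sink in sinks:
            sink.write(data)
        copied += 1
    return copied


def buffer_latency_ms(chunk_frames, rate):
    """Delay added by buffering one chunk before it can be written."""
    return 1000.0 * chunk_frames / rate


def open_streams(input_index, output_indexes, rate=44100, chunk_frames=1024):
    """Open blocking PyAudio streams; device indexes depend on the machine."""
    import pyaudio  # imported here so the helpers above work without it

    audio = pyaudio.PyAudio()
    common = dict(format=pyaudio.paInt16, channels=2, rate=rate,
                  frames_per_buffer=chunk_frames)
    source = audio.open(input=True, input_device_index=input_index, **common)
    sinks = [audio.open(output=True, output_device_index=index, **common)
             for index in output_indexes]
    return source, sinks
```

Smaller chunks cut the per-buffer delay, but the blocking loop and the devices' own buffers add more on top, so the chunk size alone won't explain the full 240 ms.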

Python 2/3 and unicode file paths

This bug popped up in a script of mine:

For Python 2:

>>> os.path.abspath('.')
'C:\\Users\\yuv\\Desktop\\YuvDesktop\\??????'
>>> os.path.abspath(u'.')
u'C:\\Users\\yuv\\Desktop\\YuvDesktop\\\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5'

For Python 3:

>>> os.path.abspath('.')
'C:\\Users\\yuv\\Desktop\\YuvDesktop\\\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5'
>>> os.path.abspath(b'.')
b'C:\\Users\\yuv\\Desktop\\YuvDesktop\\??????'

That odd set of question marks is a completely useless and invalid path in case you were wondering. The windows cmd prompt sometimes has question marks that aren’t garbage, but I assure you, these are useless and wrong question marks.

The solution is to always use unicode strings with path functions. A bit of a pain. Am I the only one who thinks this is failing silently? I’ll file it in the bug tracker and we’ll see.
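A sketch of both halves of that, written for Python 3. The cp1252 encoding is only an assumed stand-in for whatever code page Windows uses on a given machine (the real conversion happens inside the OS), and the helper names are made up.

```python
import os


def lossy_bytes(path, encoding="cp1252"):
    """Encode a text path the lossy way: unknown characters become '?'."""
    return path.encode(encoding, errors="replace")


def absolute_text_path(path="."):
    """Always hand text (unicode) to os.path so no characters are lost."""
    if isinstance(path, bytes):
        path = os.fsdecode(path)
    return os.path.abspath(path)


if __name__ == "__main__":
    folder = "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5"  # the Hebrew folder name above
    print(lossy_bytes(folder))
    print(isinstance(absolute_text_path("."), str))
```

Once the characters have been replaced there is no way to get the original name back, which is why the question-mark path is unusable.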