Python 3 Wall of Shame Updates

Earlier in December, Chris McDonough approached me with a reddit PM asking whether I would implement some kind of behavior for a “Python 2 only” classifier on the wall of shame. After some aggressive googling I found the original discussion on catalog-sig. The idea was to add a classifier signifying “the authors have no current intention to port this code to Python 3”. By declaring such an intent, Chris explained, a python package should be erased from the wall of shame. I can’t say I completely share that intuition, but I tried to apply myself to the effort of improving the WOS. So here’s what’s new:

  • Packages with the “Programming Language :: Python :: 2 :: Only” trove classifier get a lock icon next to their name, with a mouseover explaining the intent.
  • Packages that have an equivalent py3k package are no longer erased from the wall; instead they show a link to the equivalent package. This rightfully boosts the compatibles count by 4. Note that packages that would double-count are still erased (eg Jinja is erased because Jinja2 is in the top 200).
  • Packages that are python 3 compatible but lack the trove classifier won’t stay red if brought to my attention. I’ve always stated the WOS can only be as good as pypi, not better, hoping that in time PyPI would become more accurate, so this move saddens me a bit. To keep some of that spirit, the artificially green packages show a red triangle signifying the maintainer’s missing trove classifiers (again with a relevant mouseover).
  • The WOS is now written for python 2.7 and migrated to the HRD, woohoo!
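
For maintainers who want that lock, the classifier goes in the package metadata. A minimal sketch of the relevant setup.py fragment (the package name and version here are invented for illustration):

```python
# Sketch of a setup.py fragment; in a real package this dict would be
# passed to setuptools.setup(**metadata). The classifier string must
# match the trove classifier exactly for PyPI (and the WOS) to see it.
PY2_ONLY = 'Programming Language :: Python :: 2 :: Only'

metadata = dict(
    name='example_pkg',   # hypothetical package name
    version='1.0',
    classifiers=[PY2_ONLY],
)
```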

Please do contact me if there are any more inaccuracies or mistakes. I’m reachable at ubershmekel at gmail and in the comments on this blog.

P.S. we’re at 57/200, so maybe by this time next year we can have that Python 3 Wall of Superpowers party! Amen to that…

Google App Engine costs the same as any shared hosting

There are many good and bad things about GAE but this issue with the new pricing is just strange in my eyes:

Every paid app must pay google at least $9 each month regardless of usage.

The main awesome thing about GAE has always been the pay-as-you-need pricing model. That concept is now completely shattered for a certain scale of apps. The Python 3 Wall of Shame needed a few extra DB writes to finish the day nicely, which would have cost roughly 10 cents a day. But now google will round that up to 30 cents a day – the $9 minimum spread over a month.

Apps that grow to use $1 worth of quotas a month are much better off heading to some form of shared hosting for $5.50 a month until they hit that $9-a-month necessity. That’s the case with the Python3WOS and another one of my apps. I can’t move the wall of shame as I don’t have the time to do the porting (locked in ಠ_ಠ), but the other app is simple enough. So google will never know whether it could have grown into an app that’s actually worth $9 monthly.

Android API Bugs – AudioRecord

When asking android to record from the mic to a buffer you have to do something like:

mRecorder = new AudioRecord(AudioSource.MIC, RATE, CHANNEL_MODE,
        ENCODING, BUFFER_SIZE); // BUFFER_SIZE must come from getMinBufferSize()
if (mRecorder.getState() != AudioRecord.STATE_INITIALIZED) {
    // fail and bail
}
mRecorder.startRecording();
bytesRead = mRecorder.read(audioBuffer, offsetInBuffer, readSize);
mRecorder.stop();
mRecorder.release(); // always release, or later AudioRecord reads will fail

Now, the bug that drove me mad: read() returned zero after I left the application (using the home button) and immediately returned to it. Luckily I was toying around with BUFFER_SIZE when I noticed that

mRecorder.getState()

failed when BUFFER_SIZE was set to 0x100. I guessed the buffer was too small, and indeed you must call getMinBufferSize() to find the limits for AudioRecord instantiation, as documented.

Don’t use getMinBufferSize() + 1, btw, as that throws an IllegalArgumentException at AudioRecord construction:

12-10 21:49:33.359: E/AndroidRuntime(28950): java.lang.IllegalArgumentException: Invalid audio buffer size.

Using something like getMinBufferSize() + 100 does work at first, but when you use the home button to leave and return, it triggers the read()-returning-zero bug from time to time. Not always, but from time to time. Also make sure you release() the recorder: forgetting to do so will still let you construct a new recorder, but reads will fail.

I can’t begin to describe how frustrating finding that bug was.

Now, audio programmers should know they’re supposed to read a size that divides the buffer size, yes. That doesn’t excuse the API from misbehaving instead of throwing relevant exceptions and errors that are debuggable and documented. This was SdkLevel 8, by the way. Things like this make you feel as though these are the guys who wrote the API.

Eventually the working code will be pushed to the android tuner google code project for your enjoyment.

Duplicating Streams of Audio with Python

This morning I made a python script that uses PyAudio to read from one audio device and pipe it to the next; I call it replicate.py.

This is a really old problem for me, ever since I first had 4.1 speakers and winamp only played on the front 2. Nowadays I just want VLC to play on both the TV and the computer speakers without switching between audio output modules in the preferences or fiddling with the default audio output in Windows 7.

PyAudio was really nice and easy to use; I just wish async I/O were available so I could lower the latency a bit, as I’m getting 240 ms right now, which is very far from perfect.
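
For reference, the core of such a pipe can be sketched with PyAudio’s blocking API like this (the device indices, chunk size, and helper names are my own choices for the sketch, not necessarily what replicate.py does):

```python
CHUNK = 1024   # frames per buffer; an assumption for the sketch
RATE = 44100

def buffer_latency_ms(frames, rate, buffers=1):
    """Latency contributed by queued buffers, in milliseconds."""
    return 1000.0 * frames * buffers / rate

def replicate(in_index, out_index, seconds=10):
    # Imported lazily so the sketch loads without audio hardware.
    import pyaudio
    p = pyaudio.PyAudio()
    src = p.open(format=pyaudio.paInt16, channels=2, rate=RATE,
                 input=True, input_device_index=in_index,
                 frames_per_buffer=CHUNK)
    dst = p.open(format=pyaudio.paInt16, channels=2, rate=RATE,
                 output=True, output_device_index=out_index,
                 frames_per_buffer=CHUNK)
    # Blocking read/write loop: simple, but every queued buffer adds delay.
    for _ in range(int(RATE / CHUNK * seconds)):
        dst.write(src.read(CHUNK))
    for s in (src, dst):
        s.stop_stream()
        s.close()
    p.terminate()
```

With blocking I/O each queued buffer adds frames/rate seconds of delay, which is why smaller buffers (or proper async I/O) would cut the latency.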

Pendulums, WebGL and three.js

Here’s the waves pendulum three.js simulation I made.

So I wanted to simulate a magical waves pendulum to prove my point that the shapes are the result of a dead simple arithmetic progression. I was almost correct.

After testing, I saw that when the frequencies form an arithmetic progression we get the awesome patterns. The problem is that achieving such a feat by modifying the length of the strings alone is a bit harder. Here’s omega, the angular frequency, from hyperphysics:

w = sqrt(g/L)

So all I had to do was choose omegas, increment them and from that calculate the string lengths. I got mixed up and solved the problem in a much more complicated way.

Anyhow, by faking it (choosing my omegas with bogus L’s) I get a prettier result. Headache averted. I’m not sure these swing angles are simple pendulums anyway.
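
In code, the fix amounts to inverting that formula: pick the omegas, then solve for the lengths. A quick sketch (g and the omega progression are arbitrary demo values):

```python
G = 9.81  # m/s^2; the omega values below are made-up demo numbers

def lengths_for_omegas(omega0, step, count):
    """Invert w = sqrt(g/L) into L = g / w**2 for an arithmetic
    progression of angular frequencies."""
    return [G / (omega0 + step * i) ** 2 for i in range(count)]

# Higher frequency means a shorter string, so lengths decrease.
lengths = lengths_for_omegas(omega0=3.0, step=0.1, count=15)
```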

WebGL and three.js are indeed awesome. There are gotchas, but I was just so impressed with http://lights.elliegoulding.com/ and other things in the three.js gallery. It’s amazing how simple and accessible opengl is now that it’s in the browser. The roughly 20-line “hello world” for a rotating cube was good, though I think it should include the WebGL detection.

Android api quirk number 5

Wow this is silly.

public void drawRect(float left, float top, float right, float bottom, Paint paint)

Usually it doesn’t matter if top is higher than bottom. Either way a rectangle is drawn.

UNLESS, top is outside of the screen. Then no rectangle is drawn at all.

So you can have rectangles that are partially visible, but only if their top is greater than their bottom.

ZOMG that was a hard bug to figure out….

Python 2/3 and unicode file paths

This bug popped up in a script of mine:

For Python 2:

>>> os.path.abspath('.')
'C:\\Users\\yuv\\Desktop\\YuvDesktop\\??????'
>>> os.path.abspath(u'.')
u'C:\\Users\\yuv\\Desktop\\YuvDesktop\\\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5'

For Python 3:

>>> os.path.abspath('.')
'C:\\Users\\yuv\\Desktop\\YuvDesktop\\\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5'
>>> os.path.abspath(b'.')
b'C:\\Users\\yuv\\Desktop\\YuvDesktop\\??????'

That odd set of question marks is a completely useless and invalid path, in case you were wondering. The windows cmd prompt sometimes shows question marks that aren’t garbage, but I assure you, these are useless and wrong question marks.

The solution is to always use unicode strings with path functions. A bit of a pain. Am I the only one who thinks this is failing silently? I’ll file it in the bug tracker and we’ll see.
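
In today’s Python 3, os.fsencode() and os.fsdecode() are the sanctioned bridge between text and byte paths. A minimal sketch, assuming a filesystem that can store these names:

```python
import os
import tempfile

# Make a directory with a non-ASCII (Hebrew) name, then list it both ways:
# text paths round-trip the characters, byte paths need explicit decoding.
base = tempfile.mkdtemp()
name = '\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5'
os.mkdir(os.path.join(base, name))

listed_text = os.listdir(base)[0]                # str in, str out
listed_bytes = os.listdir(os.fsencode(base))[0]  # bytes in, bytes out
```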

import – complex or complicated?

In Python, life is really easy when all your .py files are in one directory. The moment you want to organize your code into folders there’s a wall of challenges you have to climb. I believe this is an issue that can be alleviated with one small fix.

Here’s a comparison of how a developer shares code across a project in C/C++ and Python:

Forms

  • C/C++:
      #include <from_env_dirs_first>
      #include "from_local_dir_first"
      #include "abs_or_rel_file_system_path"
  • Python:
      import module
      import module as alias
      from module import var
      from module import *
      from ..package_relative_path import module
      from package.absolute_path import module
      try:
          import one_thing
      except ImportError:
          import another as one_thing

Namespacing

  • C/C++: public toilet – everything included is global.
  • Python: “Namespaces are one honking great idea — let’s do more of those!” Seriously, module encapsulation is fantastic.

Helpful extra knowledge

  • C/C++: Makefile/vcproj configuration of paths, #ifdef
  • Python: sys.path, __all__, __path__

Mandatory extra knowledge (“Gotchas”)

  • C/C++: #pragma once (or the equivalent #ifdef), certain things aren’t allowed in .h files, please don’t use absolute paths
  • Python: __init__.py, the syntax for intra-package imports, and the rule that modules intended for use as the main module of a Python application must always use absolute imports

Now, this isn’t an exhaustive list; I want to discuss just a small subset of the above table. Also note that I didn’t go into “ctypes”, “#pragma comment(lib…)”, etc., as we’re talking about sharing code, not binaries.

In the 6 years of keyboard tapping I’ve done in C and Python, not once was I confused about how to access code between directories in C; Python, on the other hand, has gotten me quite a few times, and I always need to re-RTFM. And I consider myself far more interested and fluent in Python than in C/C++. This may just be a problem with my head, but I’d like to vent either way.

Blah, blah, what’s the problem?

Skip this section if you’ve already had experience with said problem, I’m sure it’s as painful to read as it was to write.

Python has this really elegant solution for one-folder mode: “import x” just gives you what you expect, either from the standard library (sys.path, etc.) or your local directory. If you have “os.py” in that local directory then you shadow out the standard “import os”. Once you mix directories in, python is suddenly afraid of shadowing and you can’t import things from a folder named “os” unless it has an “__init__.py”. So shadowing is allowed here but not there. If you want to access modules from the outside (dot dot and beyond), you have to be in a package, use sys.path, use os.chdir, or maybe implement file-system imports on your own.

Personally, I find myself doing this design pattern a lot:

  1. The App directory
    1. main_app_entry.py
    2. framework
      1. general_useful_things.py
      2. more_frameworkey_goodness.py
    3. components
      1. this_solves_a_problem.py
      2. another_tool.py

I usually have an “if __name__ == ‘__main__’:” in my modules and there I have some sort of test, utility function, or a train of code-thought not yet organized.

How can another_tool.py access general_useful_things.py? First things first – __init__.py everywhere! After trying a few ways to do the import – here are a few results.

So what’s needed for another_tool to import general_useful_things:

  • “from framework import general_useful_things” works in another_tool.py if we only run main_app_entry.py; it does not work if we run another_tool.py directly. Does this mean __name__ == “__main__” is a useless feature I should ignore?
  • Here’s the rest of the list of failed attempts:
    #from app.framework import general_useful_things
    #from .app.framework import general_useful_things
    #from ..framework import general_useful_things
    #from .framework import general_useful_things
    #from . import framework
    #from .. import framework
  • And this little recipe works in most cases:
    SRC_DIR = os.path.dirname(os.path.abspath(__file__))
    os.sys.path.append(os.path.join(SRC_DIR, '..', 'framework'))
    import general_useful_things

If you want to tinker around with that example directory structure here you go: http://dl.dropbox.com/u/440522/importing%20is%20hard.zip
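
If you’d rather not download the zip, the working recipe can be reproduced in a self-contained way by building the tree in a temp directory (the ANSWER constant is invented purely so there’s something to import):

```python
import os
import sys
import tempfile

# Recreate the example layout: App/framework and App/components.
app = tempfile.mkdtemp()
os.makedirs(os.path.join(app, 'components'))
os.makedirs(os.path.join(app, 'framework'))
with open(os.path.join(app, 'framework', 'general_useful_things.py'), 'w') as f:
    f.write('ANSWER = 42\n')

# The sys.path recipe from the bullet above, as run from
# components/another_tool.py:
SRC_DIR = os.path.join(app, 'components')
sys.path.append(os.path.join(SRC_DIR, '..', 'framework'))
import general_useful_things
```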

Python doesn’t have file-system imports

To summarize my rant – python has this mantra that your import lines should be concise, and thus a complex searching import mechanism was built to avoid filesystem-path-like imports. The price we pay for that mechanism is that you really need to learn its implicit kinks, and even then it’s not that fun to use.

The theoretical ideal solution

“import x” always imports from sys.path etc.; if you want to import something local you use “import ./local_dir_module” – the forward slash signals to both the parser and the developer that a file-system import is taking place. “local_dir_module.py” needs to be in the current folder for the above example to work. Just in case it isn’t clear, the module “local_dir_module” will be accessed as usual, without the “.py”, dots or slashes. The import statement is the only place where slashes are allowed, and the result of the import is a module in the importing file’s namespace.

That’s as explicit, simple, concise and useful as it gets.

The practical solution

I don’t mind if “import x” still works as it does today, the main point is that now you can do things like “import ../../that_module_from_far_away”. So you can actually keep python 100% backwards compatible and still add this feature.

Concerning the backslash/forward-slash debate – I’m a windows guy and I don’t mind using the forward slash for Python; Windows doesn’t mind it either (“/” only fails in a few specific scenarios like cmd autocomplete). Another fun fact: you can avoid littering your app with __init__.py files if the code isn’t going to be accessed through that big old search-import package mechanism.

I realize this whole fiasco might raise the question of absolute-path imports; in my opinion these shouldn’t be allowed. Absolute includes in C/C++ destroy portability, impose annoying folder-structure constraints, and they’re ever so tempting at late hours when you don’t really want to count the number of “..” needed. For the special cases that might still need this, the instrumentation that exists in python, e.g. import_file, is enough.

The good things about __init__.py

Many packages use __init__.py as a way to organize their APIs to the outside world. Your package folder can have tons of scripts, and only what you include in __init__.py is exposed when the folder is imported directly (eg json in the std-library). So don’t take this as an attack on __init__.py; it’s just that the import mechanism seems incomplete in my eyes. To be specific – when package maintainers use __init__.py as their API, they don’t need to do things like “import os as _os” to avoid littering their module namespace; that’s a nice thing to have.
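
A self-contained sketch of that pattern, with all package, module, and function names invented for illustration:

```python
import os
import sys
import tempfile

# Build a throwaway package whose __init__.py curates the public API.
root = tempfile.mkdtemp()
pkg = os.path.join(root, 'mypackage')
os.mkdir(pkg)
with open(os.path.join(pkg, 'core.py'), 'w') as f:
    f.write('def solve():\n    return 42\n')
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    # Only `solve` is re-exported; everything else stays internal.
    f.write('from mypackage.core import solve\n__all__ = ["solve"]\n')

sys.path.insert(0, root)
import mypackage  # users see the curated API, not the folder's internals
```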

Also, I’d like to hear other justifications as I’m sure more than a few exist.

The drawbacks of slashes and file-system-imports

  1. From a compatibility viewpoint, old packages aren’t affected, since the forward slash would be introduced in some future python version. Whoever uses the feature won’t be compatible with older python versions.
  2. Windows users and *nix users might argue over whether or not to allow backslashes, I think it’s not that important. Though the internet has forward slashes, so that makes it 2 platforms against 1.
  3. It’s uglier (though today’s relative imports are just as ugly and harder to learn).
  4. People might ask for absolute imports.
  5. Dividing the community and its packages into “file-system-importers” and “package-search-importers”.
  6. *reserved for complaints in the comments*

Summary

I’ve tried to do packages the existing python way and I think we can do better. The __init__.py based search mechanism works great for whatever is in sys.path, though I believe its pains outweigh its gains for organizing code. Here’s to hoping there’s a chance for relative-file-system imports in standard python.

References

http://www.python.org/dev/peps/pep-0328/

http://en.cppreference.com/w/cpp/preprocessor/include

http://effbot.org/zone/import-confusion.htm – January 07, 1999 – “There are Many Ways to Import a Module” – “The import and from-import statements are a constant cause of serious confusion for newcomers to Python”

http://stackoverflow.com/questions/448271/what-is-init-py-for

http://stackoverflow.com/questions/1260792/python-import-a-file-from-a-subdirectory 

http://docs.python.org/tutorial/modules.html

http://www.slideshare.net/stuartmitchell/python-relative-imports-just-let-me-use-the-file-system-please

The GIL Detector

Ever wonder if your flavor of python has a Global Interpreter Lock? The recipe gil_detector.py checks that.

In this day and age CPython is still core-locked, though we are seeing improvements thanks to Antoine Pitrou and other great people of the Python community. Wanting to measure just how bad the problem is, I looked at it through a number I call the “python effective core count”.

Python Effective Core Count

How many cores does python see? If you bought a quad core, how many cores can each python process utilize? The approach: measure how long it takes to complete a given amount of work, W, then measure how long python takes to run 2W using 2 threads, 3W on 3 threads, etc. The script gil_detector.py calculates:

effective_cpus = amount_of_work / (time_to_finish / baseline)

Where baseline is the time_to_finish for 1 work unit on 1 thread. E.g. if finishing 4W (amount_of_work = 4) on 4 threads took the same time as 1W on 1 thread, python is utilizing 4 cores.
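
The measurement can be sketched like so (a simplified take on the idea; the real gil_detector.py differs in its details):

```python
import time
from threading import Thread

def work(n=1_000_000):
    # One unit of CPU-bound work, W.
    total = 0
    for i in range(n):
        total += i
    return total

def effective_cpus(threads):
    # Baseline: one unit of work on one thread.
    start = time.perf_counter()
    work()
    baseline = time.perf_counter() - start

    # Now `threads` units of work on `threads` threads.
    ts = [Thread(target=work) for _ in range(threads)]
    start = time.perf_counter()
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    elapsed = time.perf_counter() - start

    # amount_of_work / (time_to_finish / baseline)
    return threads / (elapsed / baseline)
```

Under a GIL the elapsed time scales with the thread count, so the ratio stays near 1; with real parallelism it approaches the thread count.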

Results

I recommend reading the whole output to see the exact numbers.

Implementation          Effective Core Count
Jython 2.5.2            3.8/4 cores
IronPython 2.7          3.2/4 cores
PyPy-1.5                1.0/4 cores
Stackless Python 3.2    1.0/4 cores
CPython 3.2             1.0/4 cores
CPython 2.7             0.2/4 cores

Basically, Jython has the best multithreading, with IronPython not far behind. I know multiprocessing is really easy in python, but it’s still harder than threads. We have to solve this before 8-core CPUs become the standard at grandma’s house. Those other languages (C/C++) easily utilize 3.9–4.0 cores of a quad-core machine; why can’t we? An honorable mention goes to PyPy, which was by far the fastest to execute the benchmark (10x faster). PyPy is definitely the future of Python; hopefully they can save us all. A note about CPython 2.7 – yes, that number is below 1, because adding threads to CPU-intensive python tasks hurt performance badly under older versions of the GIL; Antoine fixed that in CPython 3.2.

In my opinion the community should call this a bug and have a unittest in there yelling at us until we fix it, though it’s easy for me to complain when I’m not the core-dev. Maybe I can join PyPy and see what I can do when I find some free time. Hopefully so.

Edit – updated gil_detector.py as corrected at reddit.

A new module – import_file

So I had this python web server and I wanted to write a script that does some maintenance. The problem was that if the maintenance script isn’t part of the “__init__.py” package tree, it can’t use any of the web server’s modules. A hacky way around this is to add the target module’s directory to the path:

    import sys
    sys.path.append('/usr/wherever_that_module_is')
    import your_module

Another trick is using os.chdir. Even when importing modules from the same package, things can get confusing, as can be learned from PEP 328, PEP 366, and an abundance of stackoverflow questions on the subject. I’d like to quote The Zen of Python:

    Simple is better than complex.
    There should be one-- and preferably only one --obvious way to do it.
    If the implementation is hard to explain, it's a bad idea.

I don’t believe any of these can be said about python imports, at least not for anything past the trivial case of one-folder-with-all-the-modules. The moment you want to organize your project in folders it becomes complex if not complicated and unnatural.

“import math” just works and that’s great; I just wish there were an equivalent to the banal:

#include "path/to/module"

from those inferior languages. So I wrote import_file, which can be used like this:

    >>> from import_file import import_file
    >>> mylib = import_file('c:\\mylib.py')
    >>> another = import_file('relative_subdir/another.py')

It’s very similar to the imp module syntax, except the function requires one argument less. That’s the way it should be, imo. Enjoy import_file at google code, from the cheese shop, via “easy_install import_file”, pip, etc.
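
For the curious, a minimal stand-in with the same flavor can be written with today’s importlib machinery (this is a sketch, not the actual import_file implementation):

```python
import importlib.util
import os

def import_file(path):
    """Load a module directly from a file-system path, named after
    the file. A hypothetical minimal take on what import_file does."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```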