Curing Python’s Neglect
I’ve been working hard on the 0.9 release of Lamson and really enjoying myself while I do. Python is a great, very complete, and solid language with probably the best email handling capabilities I’ve ever seen.
After reading Jesse’s things he hates about Python I decided to do the same, but with a general theme. In general I really like Python, but if there was one thing I would change, it’s the culture of “Neglect” that permeates everything. No, not the neglect you think, but Hemispatial Neglect.
For those of you who refuse to read the article, neglect is where someone has a brain injury or defect that causes them to “neglect” one side of their body or visual field. It’s not that the person is blind in their right eye, as apparently everything functions and is being processed by the brain from that side. It is simply that they don’t cognitively comprehend that side, but, when this fact is pointed out to them they also don’t comprehend that they don’t comprehend. They’re just surprised that suddenly this event on the right side happened.
A good example is eating food. A normal person will eat everything in front of them, but a person with neglect will happily eat only the things on the left side of their body. They look at the right side, but they just don’t comprehend it. Then, you point out that they missed the whole right side and they look at it and go, “Oh! Where’d that food come from?” However, it’s not necessarily that they suddenly pointed their eyes at it, it’s that you pointed it out to them and made them focus on it with their left side.
Now for the creepy part. If you ask them why they didn’t see that, they can have any reaction from anger to giving a confused but totally plausible explanation. They honestly just don’t see that anything is wrong, but they also kind of know something is wrong, so they need to justify what just happened.
Demonstrating The Most Basic Neglect
To demonstrate this let’s look at Python’s list. Here’s how you add an element:
That seems pretty reasonable. Now, if “append” is how you “append” something to the end of a list, how do you think you remove that particular item from the list? Yep, it’s remove.
Now, what if you want to delete the item at index 4? Would you guess it’s the del statement, as in del mylist[4]?
When you first see this your brain (if you’re normal) goes, “WTF that’s not like the others.” However, if you have Neglect you would not only think this is totally normal, but wouldn’t even recognize it until someone pointed it out to you. Then when you are told about it, you’d make up excuses trying to explain why it is totally normal.
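The three list operations described above can be sketched as follows. The list name and its contents are made up for illustration:

```python
# A throwaway list; the name and values are assumptions for this example.
mylist = ["a", "b", "c", "d", "e"]

# Adding is a method call on the list...
mylist.append("f")

# ...and removing a particular item is a method call too.
mylist.remove("f")

# Deleting by index, however, is a statement rather than a method.
del mylist[4]
```

After these three steps the list holds the first four letters again.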
This is exactly what happens in Python, and to a lesser degree other languages, but I’ve never seen it so bad as in Python. Everything from the easy_install tool not being able to uninstall things to using del to delete an index but functions to append. You see it in the APIs, tools, and generally all over the Python world.
Here are some of my favorite examples.
I jokingly made fun of easy_install’s missing uninstall at PyCon one year, and people laughed, but then someone came up and told me the reason is some bizarre complexities in setuptools about where things might be installed and how they’re loaded and whether you’re in a virtualenv or not and you might have a project that’s on a space ship that has a drag queen between you and the actual file that’s in UTF-8…
Honestly, if your install tool can’t uninstall because of the complexity of your packaging and module design, then your design is wrong. Redesign it so people can reliably uninstall. End of subject. It doesn’t matter why you can’t do this, you are missing an important feature everyone needs and all the justifications as to why it can’t be done are pointless.
For the longest time (at least until 2000 when I last looked) Python’s os.rmdir and friends refused to remove directory trees that had files in them. They would remove the directory if it had only directories in it but not files. You had to use os.walk to find everything and remove it. In fact, they claimed this was because they wanted to protect the programmer from themselves, and yet they would then put the code to do this in the help while telling you that they crippled the feature on purpose to protect you.
Now they fixed it, but the way they did that is to create a new shutil module, even though everything in shutil has nothing to do with “shell” operations like running programs and everything to do with files. If you were looking for rmtree now you wouldn’t find it where all the other file stuff is (os), but instead have to go to shutil, which makes no sense at all.
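To make the split concrete, here is a small sketch using only the standard library. The directory and file names are invented for the example, and it works inside a temporary directory so nothing real gets deleted:

```python
import os
import shutil
import tempfile

# Build a throwaway tree: <tmp>/project/sub/file.txt (names are made up).
base = tempfile.mkdtemp()
tree = os.path.join(base, "project")
os.makedirs(os.path.join(tree, "sub"))
with open(os.path.join(tree, "sub", "file.txt"), "w") as handle:
    handle.write("hello\n")

# os.rmdir only removes empty directories, so it refuses this one.
try:
    os.rmdir(tree)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# The recursive delete lives in shutil, not in os.
shutil.rmtree(tree)
tree_gone = not os.path.exists(tree)

# The temporary base directory is empty now, so plain rmdir is enough.
os.rmdir(base)
```

The creation side (os.makedirs) and the recursive removal side (shutil.rmtree) end up in two different modules, which is the mismatch being complained about.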
Honestly every programming language makes time conversion a royal pain in the ass, but Python particularly makes this hard. It’s improved, but for the longest time you could get a string from a time object, but you couldn’t get a time object from a string. Even though this is basic POSIX functionality, Python just didn’t support it.
The explanation I was given at the time (this would be around 2000) was that I should buy the mxDateTime library and use it. Yes, that was the official answer.
I always wondered who the mxDateTime people had to blow to get Python to cripple time to the point that you needed to buy their stuff.
It’s improved, but it’s still a byzantine morass of API to do simple things with time in Python. Again though, this is universally a problem in programming languages, so I won’t place any blame on Python.
What I will blame Python for is the lack of even the most basic time conversion features that even C has. If all they did was give me the exact same POSIX C API I’d be happy. Instead, I got some half-assed library and some half-assed rationale about why this is a better way to do it.
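For reference, the string-to-time and time-to-string round trip asked for here can be sketched in current Python like this. The format string and the timestamp are assumptions chosen for illustration:

```python
import time
from datetime import datetime

# Illustrative format and value; neither comes from the original post.
FORMAT = "%Y-%m-%d %H:%M:%S"
original = "2009-06-13 14:30:00"

# String in, time object out (the POSIX-style strptime direction).
parsed = time.strptime(original, FORMAT)

# Time object in, string out (the strftime direction).
round_tripped = time.strftime(FORMAT, parsed)

# The datetime class offers the same pair of operations.
as_datetime = datetime.strptime(original, FORMAT)
datetime_round_tripped = as_datetime.strftime(FORMAT)
```

Both pairs mirror the C functions strptime and strftime, so whatever goes in as a string can come back out as the same string.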
Python’s email support is fantastic and very complete. I wouldn’t have been able to do Lamson in another language without writing a giant chunk of my own code just to do email. Thank you Python email, I love you.
However, it also suffers from some serious Neglect. There’s a ton of operations and APIs where you do one thing, and then the inverse is totally different for no reason.
- Create an email message object with email.message_from_string or email.message_from_file. Remember those two.
- Access the mail like it’s a dict. Awesome, the way it should be.
- Alright, now to get the Body of the mail, you call get_payload, but that doesn’t work all the time.
- Now, create a mailbox.Maildir object and you can just take these nice email objects and shove them onto the queue.
- Create another Maildir object and you can get, pop, remove, all the normal things you would expect from a queue.
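The first three steps in the list above can be sketched with the standard email package. The addresses and message text are invented for the example, and the multipart branch is one possible reading of why get_payload "doesn’t work all the time":

```python
import email
import io

# A made-up message; the addresses and text are assumptions.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "This is the body.\n"
)

# Two ways in: from a string, or from an open file-like object.
msg = email.message_from_string(raw)
msg_from_file = email.message_from_file(io.StringIO(raw))

# Headers are read like dict entries (lookup is case-insensitive).
subject = msg["Subject"]
sender = msg["from"]

# get_payload returns a string for a simple message, but a list of
# sub-messages for a multipart one, so callers have to check first.
if msg.is_multipart():
    body = [part.get_payload() for part in msg.get_payload()]
else:
    body = msg.get_payload()
```

For a plain single-part message like this one, body is just the text after the blank line.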
Quiz time: if you get an email object from the Maildir, what do you get? Do you get an email message? NO! You get a MaildirMessage which is sort of like what you’d expect, except to read the body of the message you have to do msg.fp.read to read the file pointer.
Oh wait, it gets better. You can rewind this fp “file pointer” but it won’t go to the beginning of the original file, only to the body, so it’s not really the file pointer. Well what if you want the damn whole mail contents so you can do email.message_from_string on it?
In the email module you simply do str(msg) and you get the whole thing in its original form as a string. Do you think that MaildirMessage supports this? Even worse, it supports it, but only returns the headers as a string. Yes, you end up having to do this:
str(msg) + "\n" + msg.fp.read()
to get the message back.
I really looked hard. All over the place in fact, and even asked smart folks on Twitter and combed through Google. Yes, this is how you have to do it, but the explanation I was given for why was absolutely bizarre.
If you go look at the API docs for MaildirMessage again you’ll see that it includes all sorts of operations for marking the message’s status and disposition. The design decision was apparently that the object you get from a Maildir would be both the message and the operations you need to tell Maildir how to handle it.
This could have been much better designed, with any of these options:
- Make str(msg) work like str(msg) in email and implement get_payload.
- Have mailbox queues return email messages, and then have a second API call to get at any specific implementation’s meta-attribute modifications.
- Return an object that’s not a message, but a combination of the meta-data manipulation and the message as an attribute.
Any of these options would be better than what is currently there, and this is basically what I’ve ended up implementing in Lamson.
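As a rough illustration of the third option, here is one way such a wrapper could look. The class name QueuedMessage, its fields, and the flag names are all hypothetical; this is not Lamson’s actual code and not part of the mailbox module:

```python
import email
from dataclasses import dataclass, field
from email.message import Message


@dataclass
class QueuedMessage:
    """Hypothetical wrapper: queue metadata plus the untouched message."""

    key: str                      # identifier the queue assigned (assumed)
    message: Message              # a plain email message, unchanged
    flags: set = field(default_factory=set)

    def mark(self, flag):
        """Record a status flag such as "seen"."""
        self.flags.add(flag)

    def unmark(self, flag):
        """Inverse of mark: drop the flag if it is present."""
        self.flags.discard(flag)


raw = "From: alice@example.com\nSubject: Hello\n\nThis is the body.\n"
item = QueuedMessage(key="1234", message=email.message_from_string(raw))

item.mark("seen")
flags_after_mark = set(item.flags)
item.unmark("seen")

# The message itself is still an ordinary email message.
subject = item.message["Subject"]
body = item.message.get_payload()
```

The point of the split is that anything that works on an email message keeps working on item.message, while the queue-specific bookkeeping stays on the wrapper.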
API Documentation Generation
I need to do some documentation for the Lamson APIs, so I figured that PyDoc could generate them in HTML. This turned into a two-hour yak shaving expedition into Python documentation tools, which demonstrated that all of them either just don’t work or seriously miss the basics of what a real programmer needs to generate documentation.
First off, I write comments in my code that serve as the documentation, and then I write full documents that show you how to use them. If you need to get started, you read the documents; if you need to use it while you work, you read the API docs.
Now, if I’m generating documentation from a Python API the entire process should be this:
doctool module html_directory
That’s it. Anything above that is cream. If your tool can’t do that then you fail. Miserably.
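A rough sketch of that one-step workflow, assuming pydoc’s own HTML renderer is an acceptable stand-in for the output. The function name write_module_docs is hypothetical, and the look of the generated pages is whatever pydoc produces, so this only shows the "module in, directory of HTML out" shape:

```python
import importlib
import pkgutil
import pydoc
from pathlib import Path


def write_module_docs(module_name, html_directory):
    """Write one HTML page per module under module_name (hypothetical helper)."""
    out_dir = Path(html_directory)
    out_dir.mkdir(parents=True, exist_ok=True)

    root = importlib.import_module(module_name)
    names = [root.__name__]
    # Packages have __path__; walking it finds submodules automatically.
    # Note that walk_packages imports subpackages in order to descend.
    search_path = getattr(root, "__path__", [])
    for info in pkgutil.walk_packages(search_path, root.__name__ + "."):
        names.append(info.name)

    written = []
    for name in names:
        # Same steps pydoc uses internally when asked to write HTML.
        obj, resolved = pydoc.resolve(name)
        page = pydoc.html.page(
            pydoc.describe(obj), pydoc.html.document(obj, resolved)
        )
        target = out_dir / (resolved + ".html")
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

Called as write_module_docs("json", "docs_html"), it would leave one page per module in that directory without listing the submodules by hand.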
Yet, here’s what you have to do for Sphinx, which is an insane amount of work for something that JavaDoc, POD, Doxygen, RubyDoc and nearly every other tool does better and easier.
PyDoc produces HTML. Sure, if you can call it that. It’s some bizarro nested colorized HTML that’s nearly impossible to read or use. PyDoc is basically this old as hell doc tool that (AFAIK) can’t take a template to modify the resulting output.
I thought, well maybe I could get Pygments to do it. Pygments will take a source file, and it will output a source highlighted version. That might be good. Oh, but pygmentize will only output the body to a file, and then the style has to be put into a separate file, and then you have to wrap the output file yourself with your own HTML header. Pygments is awesome for doing color output, but for just dumping a directory of source to HTML files it is useless.
Apydia looked promising. At least it produced something alright, could be more compact but that’s fine. Then you run it and realize it forces you to explicitly indicate every module you want, even the submodules (despite what the docs say). When I run it, it blows up on a Genshi exception error. Great. Awesome.
The list goes on, and continues to demonstrate Python’s Neglect for the simple reason that it is the only popular modern language that doesn’t have a decent API document generation tool. Yes, they have documentation tools, and yes they can generate some HTML from an API, but it’s all just not quite there. It’s simply missing the simple feature of “take module, make html, put on website”.
Curing The Neglect
There’s many more places where this kind of neglect is found, but these days I just accept it and move on unless I seriously get pissed off. The last tool I did this to was argparse and optparse which I replaced with a much nicer system in Lamson.
If I had the time I would try to fix this stuff, but I realize that none of this will be fixed until there’s a cultural shift in Python away from this habit of Neglect. There has to be a committed change in the way Python APIs are designed so that for every operation there is an inverse, or a damn good explanation as to why there is no inverse.
If there’s a put, there’s a get. If there’s an add there’s a delete. If I put a string in, I get a string out. If I can generate a time string, I can parse a time string. If there’s an install, there’s an uninstall.
No excuses about how that doesn’t fit your design. If your design is such that you can’t implement get when you have put then your design is flawed. Start over. Don’t be weird or different, just be boring and give me my uninstall.
Of course Python isn’t the only language with this problem; it’s kind of a programmer affliction. I just see this culture of Neglect much more commonly in Python.