Jason in a Nutshell

All about programming and whatever else comes to mind

Posts Tagged ‘Programming’

Programming and safe theater or Things you have to do but shouldn’t if you can avoid doing them at any cost

Posted by Jason Baker on May 14, 2009

Joel’s (in?)famous article saying never to rewrite your software seems to be circulating again on Hacker News.  Essentially, what Joel is getting at is that software companies must never, ever, under any circumstances rewrite their own software.  I agree with most of what Joel says, but I disagree with the ultimate conclusion that he draws from it.

I’m reminded of a scene from Hamlet 2 (which I’m paraphrasing from memory here):

Student:  I was thinking:  what if we had lowriders come across the stage in the final scene?

Mr. Marschz:  That sounds dangerous.

Student:  Ok, nevermind.

Mr. Marschz:  No, I’m not doing safe theater.  Let’s do it!

Joel’s stance is “safe theater” for programmers.  

Risk

Still not convinced?  Does safe theater sound just fine with you?  Willing to quote Joel’s stance on this unwaveringly?  Well, I counter your Spolsky with a Kay!

I believe that the only kind of science computing can be is like the science of bridge building. Somebody has to build the bridges and other people have to tear them down and make better theories, and you have to keep on building bridges. –Alan Kay, quoted from  A Conversation with Alan Kay

Granted, I don’t necessarily agree with Alan 100% either.  I think there’s a happy medium here.  When you get down to it, the risks in rewriting your software that Joel mentions are very real risks.  And several companies have found this out the hard way.

The problem is that you can’t innovate without taking risks, and a software rewrite is the ultimate risk.  Unfortunately for us, we as programmers are in the business of innovation.  Unwillingness to take risks is a sure route to becoming a “greybeard” who only codes COBOL on mainframes because it’s what they know.  I don’t know about you, but that’s not the kind of career that I had envisioned.

So you’re saying I should rewrite, correct?

Remember, when I said that I agree with Joel for the most part, I meant it.  A complete software rewrite is a crazy and maybe even a stupid move.  If you really want to innovate, sometimes you have to do things that are crazy and stupid.  But remember that the decision to completely rewrite a piece of software can be a job-ending or even business-ending move.  So if a blog post by me is enough, by itself, to convince you to do the rewrite, don’t do it.

Credit where credit is due

In fairness, I don’t think even Joel 100% believes software should never be rewritten.  In a book, he praises Microsoft for writing .NET even though it violates his “never rewrite” rule.  Granted, this is because Microsoft is “a center of gravity,” which is something that we can say about very few companies.

But even though this is a very small exception to Joel’s rule, it’s still enough to say that Joel doesn’t believe his blog post 100%.  But even if he still believes it 99.9%, there’s still 0.1% of leeway.  And never underestimate what a really good programmer can do with that 0.1%.


Posted in Programming

HTTP and you

Posted by Jason Baker on May 10, 2009

I was kind of surprised by the number of people who told me that they weren’t aware of the differences between HTTP POST and HTTP GET that my last post highlighted.  Not everyone who does web design and/or development has had a formal education on this kind of thing, so I’d like to focus a little bit more on the basics of HTTP.  A full summary of the HTTP protocol would take a couple hundred pages (or 175 to be exact).

In a lot of ways, doing web development and/or design without knowing how this stuff works is a bit like doing Calculus without knowing how addition and subtraction work.  True, you probably won’t ever need it.  But you would be surprised at how many questions can be answered by having a basic understanding of HTTP.

Anatomy of a URI

 

As this helpful diagram of the URI shows, there are 5 basic parts:

  • scheme – This is the protocol that we’re using to access whatever this URI represents.  For obvious reasons, we’re only interested in http schemes.
  • username/password – This isn’t really used much in the context of HTTP, but it should be pretty self explanatory.
  • hostname – This essentially tells us what computer we’re accessing.  This can be either an IP address (ex: 209.85.171.100 if you’re using IPv4) or a domain name (google.com).
  • port – This is the port number on the server we’re pulling data from.  In the context of HTTP this will usually be port 80, but occasionally it will be something different.  Also bear in mind that this may be different depending on the scheme (for example, FTP will be port 21 by default).
  • path – This represents where the website “lives” on the server.  It was largely designed for representing files and directories on a file system, but it’s worth mentioning that this part is ultimately little more than arbitrary text that may be interpreted by the server however it wishes.
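To see those parts pulled out of a real string, here is a minimal sketch using Python's standard urllib.parse module.  The URI itself (alice, secret, example.com, port 8080) is made up purely so that every part is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI, chosen only so that every part listed above is present.
uri = "http://alice:secret@example.com:8080/2009/05/10"
parts = urlsplit(uri)

print(parts.scheme)    # http
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /2009/05/10

# When the URI leaves the port out, urlsplit reports None instead of
# filling in the scheme's default (80 for http).
default_port = urlsplit("http://example.com/").port
print(default_port)    # None
```

Note that the parser does not guess the default port for you; code that needs one has to supply it based on the scheme.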

Anatomy of an HTTP request

When you access my blog via HTTP, your browser sends an HTTP request that looks something like this:

GET / HTTP/1.1 CRLF
Host: jasonmbaker.wordpress.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1 CRLF
Connection: close CRLF
CRLF

Your browser will receive a response that looks something like this (bonus:  there’s one header I left out.  Can you guess which one it is?  I hear there might be job offers if you can figure it out.):

HTTP/1.1 200 OK CRLF 
Server: nginx CRLF
Date: Sun, 10 May 2009 23:16:28 GMT CRLF
Content-Type: text/html; charset=UTF-8 CRLF
Transfer-Encoding: chunked CRLF
Connection: close CRLF
Vary: Cookie CRLF
X-Pingback: https://jasonmbaker.wordpress.com/xmlrpc.php CRLF
CRLF
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...HTML goes here...

There are two important parts here:  the request/response line and the headers.  Just in case you’re wondering, CRLF is a carriage return followed by a line feed (\r\n in most programming languages), which is the line ending that HTTP requires.

The Request line

The request line will usually be in this general form:

<method> <path> HTTP/<version> CRLF

There are three parts to be concerned with:

  • method – the HTTP method we’re using.  A full discussion of all of these methods would be rather lengthy.  The vast majority of webpages are requested using HTTP GET or POST.  I have a full discussion of the differences between these two methods here.
  • path – this is the path to the page we’re requesting. Usually, this is only the path part of the URI and nothing more.  There’s a simple reason for this.  By the time your web browser has connected to my blog, the server presumably already knows that it’s at jasonmbaker.wordpress.com.  Since this isn’t always the case though, the hostname is passed either in the Host header or sometimes in the path itself (as a full URI), depending on circumstances.
  • version – the version of HTTP we’re using.  Usually this will be HTTP 1.0 or 1.1, but you will sometimes run into antiquated HTTP 0.9 clients and servers.

The Response line

The response line will look like this:

HTTP/<version> <response code> CRLF

Here’s how that breaks down:

  • version – The version of HTTP.  See above.
  • response code – This indicates whether the server successfully found the requested page, if there was an error, or if the client needs to be redirected.  If it found the page, it will return 200 OK.  Otherwise, it will return some other code like the infamous 404 Not Found or a 302 Found if there is a redirect to be done.
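If you want to play with response codes, Python's standard library knows the usual ones.  This is just a rough sketch; the describe function and its category names are my own invention for illustration, not anything defined by HTTP.

```python
from http import HTTPStatus

def describe(code):
    """Return the 'code phrase' text and a rough category for a response code."""
    status = HTTPStatus(code)  # raises ValueError for codes it doesn't know
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "informational"
    return f"{code} {status.phrase}", kind

print(describe(200))  # ('200 OK', 'success')
print(describe(302))  # ('302 Found', 'redirect')
print(describe(404))  # ('404 Not Found', 'client error')
```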

 

HTTP Headers

An HTTP header will usually be of this form:

<header name>:  <header value> CRLF

Headers are basically just “metadata” about the request.  They include information about the encoding of the data, the browser requesting the page, and the server returning the page.  HTTP was designed to be extensible, so you will frequently run into headers that aren’t specified in the original RFC.
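Since the first line and the headers are just text separated by CRLFs, pulling them apart takes only a few lines.  Here's a minimal sketch; parse_head is a hypothetical helper of mine, and it deliberately ignores real-world wrinkles like repeated headers.

```python
def parse_head(raw):
    """Split an HTTP message into its first line and a dict of headers."""
    # Everything before the first blank line is the head; the rest is the body.
    head, _, _body = raw.partition("\r\n\r\n")
    first_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Store names in lowercase so lookups don't depend on capitalization.
        # A repeated header would overwrite the earlier one in this sketch.
        headers[name.strip().lower()] = value.strip()
    return first_line, headers

raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: nginx\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Connection: close\r\n"
    "\r\n"
    "<html>...</html>"
)

status_line, headers = parse_head(raw_response)
print(status_line)              # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html; charset=UTF-8
```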

Form Data

Sometimes webpages will require additional data to return a webpage.  There are two ways to do this:  in the query string and in the body of the request.

The query string

In the case of HTTP GET and a couple of other HTTP methods, this data will be passed through the query string.  This request will look something like this:

GET /?page=123 HTTP/1.1 CRLF
Host: jasonmbaker.wordpress.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1 CRLF
Connection: close CRLF
CRLF
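You rarely need to build a query string by hand.  As a quick sketch, Python's urllib.parse can go in both directions (the page parameter is the same one used in the request above):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Build the query string from a dict of form fields.
query = urlencode({"page": 123})
print(query)  # page=123

# This is what ends up in the request line after GET.
request_target = "/?" + query

# Going the other way: pull the fields back out.  Values come back as
# lists because a field may legitimately appear more than once.
fields = parse_qs(urlsplit(request_target).query)
print(fields)  # {'page': ['123']}
```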

The body

HTTP POST requests and all responses will pass data through the body.  An HTTP POST request will look something like this:

POST / HTTP/1.1 CRLF
Host: jasonmbaker.wordpress.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1 CRLF
Content-Type: application/x-www-form-urlencoded CRLF
Content-Length: 8 CRLF
Connection: close CRLF
CRLF
page=123

Notice that there are two CRLFs between the HTTP headers and the body.
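To make the layout concrete, here is a rough sketch that assembles a POST request as raw bytes.  build_post is a hypothetical helper of mine; it also adds the Content-Type and Content-Length headers that a request with a body normally carries, so the server knows how to read the body and where it ends.

```python
from urllib.parse import urlencode

def build_post(host, path, fields):
    """Assemble a form-encoded HTTP/1.1 POST request as bytes."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"  # the blank line that separates the headers from the body
    )
    return head.encode("ascii") + body

request = build_post("jasonmbaker.wordpress.com", "/", {"page": 123})
head, _, body = request.partition(b"\r\n\r\n")
print(body)  # b'page=123'
```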

Gotchas

Here are some of the things that will cause problems if you deal with HTTP often enough:

  1. HTTP is selectively case sensitive.  HTTP header names are not case sensitive, but other parts of a request, such as the method name, are.  This means that a server has to be prepared to treat CONTENT-ENCODING, content-encoding, and cOnTeNt-EnCoDiNg exactly the same.
  2. Slashes on the end DO matter.  For example, http://www.google.com/index.html and http://www.google.com/index.html/ are different URIs.  Unless you’re trying to be tricky, you usually want to make these point to the same thing.
  3. The www matters.  For example, http://www.google.com and http://google.com are not only different URIs, they might even point to different servers.  Usually, people expect these to be the same.
  4. Path handling is harder than it looks.  For example, what happens if I want to join “/2009/05” and “10” to make “/2009/05/10”?  I can’t just concatenate those two strings together because then I would get “/2009/0510.”  Nor can I arbitrarily append slashes because then I could end up with something like “/2009/05//10” if I’m not careful.
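For the path-joining gotcha, one possible approach is to strip the slashes off every piece and let posixpath put exactly one back between them.  This is only a sketch: join_path is a hypothetical helper of mine, and it always drops a trailing slash, which (per gotcha 2) is not always what you want.

```python
import posixpath

def join_path(base, *segments):
    """Join URI path pieces with exactly one slash between them."""
    # Strip slashes from each segment and skip segments that end up empty.
    cleaned = [s.strip("/") for s in segments if s.strip("/")]
    # Normalize the base so it has one leading slash and no trailing one.
    return posixpath.join("/" + base.strip("/"), *cleaned)

print(join_path("/2009/05", "10"))    # /2009/05/10
print(join_path("/2009/05/", "/10"))  # /2009/05/10
print(join_path("/", "2009", "05"))   # /2009/05
```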

Conclusion

So, you probably know more about HTTP than you ever wanted to know.  For what it’s worth, HTTP is a bit of an antiquated protocol with a lot of “historical” features.  But it does the job it was intended to do and it does it well.

If you find any inaccuracies, please post them in the comments.  But bear in mind that I intended this to be for a broad audience, so there might be a few points that I simplified for the sake of brevity.  If you want to fill in the holes, there’s not really any other place to look than the HTTP specifications (RFC 1945 for HTTP 1.0 and RFC 2616 for HTTP 1.1).  If you’re new to HTTP, I’d highly recommend looking at the HTTP 1.0 specification first as it’s about a third as complex as HTTP 1.1.

Posted in Networking, Programming

Make sure you use the right method

Posted by Jason Baker on April 29, 2009

Pop quiz:

What is the difference between HTTP POST and HTTP GET?

If your guess is that GET sends data via the query string while POST sends data in the request’s body, you are… absolutely correct.  And I hate your guts.  Why is this?  It’s not that what you’re saying is wrong by any stretch of the imagination.

Idem-what?

I hate you because while GET and POST do send data differently, that’s not the biggest difference between the two.  The biggest difference is that POST methods are not idempotent.  What does this piece of programmer-ese mean?  How many times have you seen something similar to this Chrome form?

 

[Screenshot: Chrome’s “Confirm Form Resubmission” page]

On poorly designed websites, you probably see it a lot.  Web development newbies will often ask how to disable this page (or its Firefox equivalent).  There’s a really simple answer for this:  you don’t.  That warning is there for a reason.  You will see that warning every time you hit the back button to go to a page that was requested via HTTP POST.

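The reason for this is that POST requests aren’t idempotent.  Idempotent is just a fancy word meaning that repeating a request has no more effect than making it once.  GET goes a step further:  it is supposed to be “safe,” meaning that it won’t make changes at all.  As the HTTP 1.1 specification says: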

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered “safe”. This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Suppose we’re writing software for a web forum, and we have a URI /forum/submit-message.  If we send data to this form, it will post a message.  Now, imagine that after submitting a message, the user visits /forum/view-message and then hits the back button.  They will go back to /forum/submit-message and send the same message over again if you were to use HTTP GET instead of POST.  However, if you were to use POST, that would be prevented by the maligned “Confirm Form Resubmission” page.  Sometimes annoying things can be useful, eh?
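Here’s a toy sketch of that forum, with no web server involved.  The handle function and the in-memory messages list are hypothetical stand-ins of mine for a real framework and database; the point is only that replaying a GET changes nothing, while replaying a POST posts the message twice.

```python
messages = []  # stands in for the forum's database

def handle(method, path, data=None):
    """Toy request handler: GET only reads, POST makes a change."""
    if method == "GET" and path == "/forum/view-message":
        return list(messages)
    if method == "POST" and path == "/forum/submit-message":
        messages.append(data)
        return "posted"
    # Submitting via GET is refused, so a replayed GET can't post anything.
    return "405 Method Not Allowed"

handle("POST", "/forum/submit-message", "Hello!")
handle("GET", "/forum/view-message")
handle("GET", "/forum/view-message")  # repeating a GET changes nothing
print(len(messages))  # 1

handle("POST", "/forum/submit-message", "Hello!")  # a resubmitted form
print(len(messages))  # 2
```

That second POST is exactly what the browser is warning you about before it resends the form.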

What’s the point?

The point is that nothing is more annoying than a website that uses GET and POST incorrectly.  And you don’t want to lose users by using the wrong one, do you?

Posted in Programming

You need to worry about deployment

Posted by Jason Baker on April 29, 2009

Oftentimes, people used to using PHP or Classic ASP will give up on Python because deploying Python scripts isn’t just a simple matter of copying and pasting files.  Usually, this isn’t because Python is making their lives difficult.  It’s more a matter of not thinking about how to deploy your scripts ahead of time.

Now, I’m far from a highly experienced programmer, but I can tell you one thing.  Deployment is a detail that will come back to bite you if you don’t spend a little bit of time on it up front.  So here are a few pointers I’ve come up with after experiences in deploying Python scripts for web apps.

  1. Batch/Shell scripts are your friends.  A common objection is that a person just doesn’t have time to learn a new tool for deploying.  I have two responses to this: 1) Drop that attitude, otherwise you’ll never get anywhere as a programmer, and 2) you don’t have to learn a new tool if you really don’t want to!  While batch and shell scripts aren’t the prettiest options, they’re a lot better than having nothing to automate deployment at all.  In fact, for the basic one or two page webapp, you can’t really do much better.
  2. If you invest some time in Continuous Integration, you won’t regret it.  I know what you’re saying.  Continuous Integration is a Java thing.  It’s too complicated.  And you’d be right to a point.  However, I would argue that making sense of the complication is worth your time.  It’s way too easy to deploy something that doesn’t work because somebody forgot to run their unit tests.
  3. Site-wide packages are evil.  If you aren’t already, you should really be taking advantage of virtualenv.  That is, unless of course you enjoy troubleshooting weird ImportErrors because of that egg you installed using the setuptools develop command a month ago and forgot to remove.
  4. Don’t underestimate the value of good docs.  Having good documentation is just one of those things that doesn’t become obviously necessary until it’s too late.  Don’t leave yourself trying to figure out how that one function you wrote a year ago works.  Write documentation as you go and use a tool like Sphinx to turn it into a webpage.  This ties in with point 2.  Using Continuous Integration will make doc generation that much easier.

Admittedly, doing this stuff can be a pain.  And you might get scorn from co-workers for not having something ready the next day.  But it will be worth it.  You’ll be surprised at how much time you’ll save in the long run.

Posted in Programming

The magic of python decorators

Posted by Jason Baker on April 25, 2009

Decorators in Python are one of the language’s more “magical” features.  I’ve tended to gloss over decorators in code because they always seemed to be fairly self-explanatory.  But how do you make your own?  Personally, I think understanding the uses of decorators and being able to write your own is one of the points where a Python newbie transitions to a knowledgeable Pythonista.

But what is a decorator?

There isn’t actually a dedicated language construct for defining decorators.  Any function that takes a function as a parameter and returns one as a result may be used as a decorator.  Functions like this are known as “higher-order functions” in functional programming circles.

Chances are, you’ve already seen decorators in use and maybe even used them.  I’ll give you a common example of decorator usage:

class SomeClass(object):
    @property
    def x(self):
        return 5

>>> var = SomeClass()
>>> var.x
5

For those of you familiar with the concept of properties, it should be pretty straight-forward what’s going on here.  But where the heck did property come from?  I’ll give you a hint.  The above code works identically to this code:

class SomeClass(object):
    def x(self):
        return 5

    x = property(x)
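The same equivalence holds for any function used as a decorator, not just property.  As a small sketch (shout, greet and greet_plain are made-up names for illustration), the @ line and the manual call produce functions that behave identically:

```python
def shout(func):
    """Hypothetical decorator: upper-case whatever func returns."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

# Using the @ syntax...
@shout
def greet(name):
    return "hello, " + name

# ...is the same as defining the function and wrapping it by hand.
def greet_plain(name):
    return "hello, " + name

greet_manual = shout(greet_plain)

print(greet("bob"))         # HELLO, BOB
print(greet_manual("bob"))  # HELLO, BOB
```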

 

I can write my own?!

Yes you can.  A lot of well-written libraries make very good use of decorators.  And given the right situation, the little bit of syntactic sugar they provide can do a lot of good.  But decorators aren’t just there for others to define.  Once you wrap your head around decorators, they can save you a lot of copy-and-pasting (which you’re not doing anyway, right?) when used in your own code.  To illustrate this, I want to show you a couple of real-world cases where I’ve found decorators to be useful.

Those pesky connection objects

I have a library that needs to call a few particular stored procedures in a SQL Server database.  Because my ORM doesn’t support stored procedures, I have to use straight adodbapi.  The calls look something like this:

import adodbapi

def LookupPerson():
    conn = adodbapi.connect(CONNECTION_STRING)
    try:
        pass  # do stuff here
    finally:
        # always release the connection, even if the lookup fails
        conn.close()

This is all well and good for just one function.  But what happens when you need 4 or 5 of these?   And what about the visual cruft that the try finally block is adding to the function (adodbapi doesn’t support with blocks before you ask)?  I’m sure you’ve already guessed the solution by now.  Here’s how you can solve this problem:

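import adodbapi

def with_connection(func):
    def _exec(*args, **argd):
        conn = adodbapi.connect(CONNECTION_STRING)
        try:
            # hand the open connection to func as its first argument
            # and pass its result back to the caller
            return func(conn, *args, **argd)
        finally:
            # always release the connection, even if func raises
            conn.close()
    return _exec

@with_connection
def LookupPerson(conn):
    pass  # do stuff here

LookupPerson()  # conn argument is passed by the decorator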

I think that the simplification that happens with LookupPerson here should be obvious.  But what is the purpose of the *args and **argd shenanigans?  The documentation covers arbitrary argument lists in depth, so I won’t go into too much detail.  But what happens if I want to look up a person by name?  The LookupPerson function would be transformed to this:

@with_connection
def LookupPerson(conn, name):
    pass  # do stuff here

LookupPerson('Bob')  # conn argument is passed by the decorator
LookupPerson(name='Jill')

 In fact, I can decorate any function that takes any number of arguments either by keyword or by position.  Pretty neat, eh?

Error handling

When doing web applications, it’s pretty important to have decent error handling.  But setting this up can be a pain.  For instance, what if I wanted to make my django application print a wonderfully informative traceback to a log file?  I could do that like this:

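import logging
from traceback import format_exc

from django.http import HttpResponseServerError

def index(request):
    try:
        pass  # do stuff
    except Exception:
        # write the full traceback to the log, then return a generic error page
        logging.error(format_exc())
        return HttpResponseServerError('Error!')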

But this definitely can become problematic.  What happens if you duplicate this code in all of your view functions and you want to make a change to your error handling?  The solution is simple:

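import logging
from traceback import format_exc

from django.http import HttpResponseServerError

def handle_errors(func):
    def _handler(*args, **argd):
        try:
            # pass the view's response back to the caller
            return func(*args, **argd)
        except Exception:
            logging.error(format_exc())
            return HttpResponseServerError('Error!')
    return _handler

@handle_errors
def index(request):
    pass  # do stuff here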

 As you can tell, this allows us to make our views worry about actually doing stuff instead of constantly handling errors.  Yes, there are also middlewares for doing this kind of thing.  But then I wouldn’t have a reason to make a blog post about python decorators, would I?

Conclusions

Ok, so I’ll admit something.  Python’s decorator syntax is ugly.  Its Java-like syntax alone may even be enough to scare some off.  But as I’ve shown, there are at least a few real-world cases where they are useful.

What are some other decorators that you’ve found to be useful?

Posted in Programming

Software update hell part II

Posted by Jason Baker on April 24, 2009

I just wanted to point out that Kas Thomas has an excellent blog post on automatic updates that is strangely similar to my own piece Software Update Hell.  As much as I’d like to use this as an opportunity to say “Great minds think alike,” I think this is more a case of software updates just being a total pain in the ass to install on some systems.

Posted in Programming

Java vs C# vs Python vs Ruby: an “objective” analysis

Posted by Jason Baker on April 21, 2009

At my place of employment, we’re looking to migrate some of our old Classic ASP applications to something newer (yes, we still have actively maintained Classic ASP code). So my boss asked me to write up an analysis of the different options we have available.

Now before I give you the link, I have a few disclaimers and other random thoughts:

  • I tried to avoid editorializing so that this can be as objective as possible, but it’s impossible to discuss these kinds of issues without being subjective.  Thus, don’t take this as the gospel truth.
  • There may be errors here.  In fact, I can almost guarantee that there are errors with Java and Ruby because I’m not terribly familiar with them.  If there are errors, feel free to leave comments.
  • My boss is a Java guy, so I left some blanks that I’m pretty sure he can fill in.
  • Some of these are blatant oversimplifications.  There’s only so much data that you can squeeze into a spreadsheet.
  • I’ll try and keep up with this for a while, but chances are that I won’t for long.  These languages are all being changed.
  • I’m biased towards Python.

Ok, without further ado, here’s the link:

http://spreadsheets.google.com/pub?key=p7efJLoHuYE-iw6JxBmpSQg&hl=en

Posted in Programming

Finding a user’s group membership in Active Directory using Python

Posted by Jason Baker on March 17, 2009

I spent some time trying to figure out what the best way is to determine group membership in Active Directory using Python.  The solution is almost embarrassingly easy, but can be difficult to find for someone who’s not familiar with win32 programming.  To make matters worse, Google is almost totally unhelpful for this.  So here’s the solution:

import win32net
win32net.NetUserGetGroups('domain_name.com', 'username')

 

To do this, you’ll need pywin32 and a windows computer.  Calling win32net’s NetUserGetGroups function will return a list of tuples.  Each tuple will contain the group name as element 0 and an attributes flag as element 1.  Don’t ask me to elaborate any more on the attributes flag because I can’t.  🙂
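If all you want is the group names, or a yes/no answer for one group, a thin wrapper is enough.  This is a sketch under a few assumptions: group_names and user_in_group are hypothetical helpers of mine, and the sample list (including the attribute value 7) is made up just to show the shape of the data.

```python
def group_names(groups):
    """Extract just the names from (group_name, attributes) tuples."""
    return [name for name, _attributes in groups]

def user_in_group(server, username, group):
    """Check whether a user belongs to a group (needs pywin32 on Windows)."""
    # Imported here so the rest of the module still loads on other systems.
    import win32net
    return group in group_names(win32net.NetUserGetGroups(server, username))

# Made-up data in the shape described above; the 7 is an arbitrary flag value.
sample = [("Domain Users", 7), ("Developers", 7)]
print(group_names(sample))  # ['Domain Users', 'Developers']
```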

Posted in Programming

A Case for People-Friendly Computers Part I

Posted by Jason Baker on March 6, 2009

From Toshiba’s Akimu Robotic Research Institute comes a rather funny story.  A robot they had programmed to emulate human emotions developed a bit of a strange problem:

After some limited environmental conditioning, Kenji first demonstrated love by bonding with a stuffed doll in his enclosure, which he would embrace for hours at a time.

 

What they didn’t count on were the effects of several months of self-iteration within the complex machine-learning code which gave Kenji his initial tenderness. As of last week, Kenji’s love for the doll, and indeed anybody he sets his ‘eyes’ on, is so intense that Dr. Takahashi and his team now fear to show him to outsiders.

The trouble all started when a young female intern began to spend several hours each day with Kenji, testing his systems and loading new software routines. When it came time to leave one evening, however, Kenji refused to let her out of his lab enclosure and used his bulky mechanical body to block her exit and hug her repeatedly. The intern was only able to escape after she had frantically phoned two senior staff members to come and temporarily de-activate Kenji.

You can read the full story here.

Computers aren’t human

This is indeed a comical story, but those of us who are programmers laugh for a different reason than everyone else:  we can see it happening.  There are two opposite but complementary problems here:

  1. Humans don’t understand computers very well.  Even the smartest of us fall into the trap of thinking that you can make a robot show affection by writing a “hug” procedure.  We forget that we will also need a “stop_hugging” procedure.
  2. Computers don’t understand humans very well.  They don’t understand that when a human programs it to hug things, they mean for there to be limits.

While these things seem obvious to most of you, we computer scientists tend to focus on the second point rather than the first.  In fact, I would argue that most of our efforts in making software user-friendly run contrary to the first point (this is a point I will elaborate more on later in this series).

There’s a fundamental problem with this approach though.  Computers are wonderful inventions that are capable of so much.  But in terms of being able to understand humans, I don’t expect any big advances in the near future.  And barring a fundamental change in the way computers work, there probably won’t be many advances in the long term either.  On the other hand, I think that human beings are capable of so much more than programmers think they are (and indeed, they’re probably more capable than they themselves think they are).  So why do we keep beating a dead horse?

I’m starting this series of posts so that I can elaborate more on these points.  Some topics that I’d like to cover include:

  • What does it mean for software to be people-friendly?
  • What’s wrong with user-friendliness?
  • What can we do to make software more people-friendly?

And who knows?  I might even write something insightful along the way.

Posted in Programming

Am I a bad programmer?

Posted by Jason Baker on February 27, 2009

Ever wonder why it is that poor programmers always seem to be the ones that are the most confident in their abilities? I’ve always held the opinion that part of what makes programmers good is to sit down and ask themselves “Am I a good programmer?”

According to David Dunning and Justin Kruger, I may be on to something:

People who do things badly, Dr. Dunning has found in studies conducted with a graduate student, Justin Kruger, are usually supremely confident of their abilities — more confident, in fact, than people who do things well.


Unlike their unskilled counterparts, the most able subjects in the study, Dr. Kruger and Dr. Dunning found, were likely to underestimate their own competence.

In other words, the incompetent don’t realize how incompetent they are. And possibly even more importantly, they don’t realize how competent everyone else is. This seems to explain why they seem to be so unwilling to listen to others’ ideas.

This serves as a reminder of two things:

  1. When you disagree with someone, make sure to spend a bit of time thinking about their point of view. This is common sense, but we all need to be reminded of this sometimes.
  2. Next time you find yourself worrying about how poor a job you did on that one class/method/module/entire program, remember to tell yourself that the simple fact that you’re feeling this way is a good sign.

As it turns out, there is a solution for the overconfident but poor programmer. Unfortunately, it’s not an easy one: you have to make them into non-poor programmers. After all, if they can’t become better programmers, why do you employ them in the first place?


Posted in Programming