Jason in a Nutshell

All about programming and whatever else comes to mind

Archive for May, 2009

In defense of feature-lacking software

Posted by Jason Baker on May 24, 2009

Let it be known from this point forward that if I produce a piece of software, and the only complaint against that piece of software is that it lacks functionality, I will consider that complaint a badge of honor.  I’m going to give some examples to help demonstrate why this is.

Please bear in mind that the examples in this post are personal opinion. You are welcome to disagree with me.  In fact, I don’t necessarily agree with myself 100%.  There are valid reasons to choose all of the products I mention.

  • Google Chrome vs Firefox: I have yet to see a review of Chrome 2.0 that wasn’t titled something like “Google Chrome 2:  lightning fast, but lacking features.”  In that same light, anytime Chrome is brought up on reddit/Hacker News/Ars Technica, there are invariably several users whose comments are along the lines of “Boy, that Google Chrome sure is nice.  If only it had <insert Firefox addon here>.”  Interestingly enough, Firefox’s main selling point against IE was at one time “IE is way more bloated than Firefox.”
  • iPod vs Creative Zen: Perhaps this comparison is a bit dated, but there are similar comparisons between the iPhone and various other phones.  At any rate, most of us have run into an Apple anti-zealot at one time or another.  Most of the time they will advocate a player like the Creative Zen (or maybe the Zune).  Their arguments against the iPod usually revolve around what it can’t do compared to what their media player of choice can do.  They seem to wonder why so many people find the iPod “easier to use.”
  • Python vs Haskell: Again, most of the arguments I see against Python by the Haskell community (and various other functional programming communities) seem focused on features it lacks.  Usually, these complaints are about lack of tail-call optimization or compile-time type checking or side-effect prevention.  I still don’t understand Haskell.

What I hope you’re starting to pull from this analysis is that all features are tradeoffs.  Each and every feature you add to a piece of software is another block of code to maintain.  And as any experienced software developer will tell you, features are easy to add.  They’re exponentially more difficult to remove.

So what’s the solution to featuritis?  Well, if you learn one thing from this blog post, make it this:  In technology, simplicity trumps all but necessity. Of course, therein lies another problem:  what is simplicity?  There’s not really any good answer to that question.  However, I would advise going with whatever makes your product simplest to the person that will use it.

At any rate, unless you’re Microsoft or Google, I’d recommend abandoning the idea of an “all singing, all dancing” product.  Chances are, it won’t happen.

Posted in Programming

Dear massive companies who are struggling to stay relevant: you’re not cool

Posted by Jason Baker on May 22, 2009

Jonathan Schwartz is talking about a new “App Store” for Java called Project Vector:

How will it work? Candidate applications will be submitted via a simple web site, evaluated by Sun for safety and content, then presented under free or fee terms to the broad Java audience via our update mechanism. Over time, developers will bid for position on our storefront, and the relationships won’t be exclusive (as they have been for search). As with other app stores, Sun will charge for distribution – but unlike other app stores, whose audiences are tiny, measured in the millions or tens of millions, ours will have what we estimate to be approximately a billion users. That’s clearly a lot of traffic, and will position the Java App Store as having just about the world’s largest audience.

When I read this, I couldn’t help but think of the Zune.  You know, the portable MP3 player that’s arguably as good as or better than the iPod?  I like to imagine a couple of Microsoft executives having a conversation like this:

Exec 1: Hey, have you seen this newfangled “iPod” contraption Apple’s selling?
Exec 2: My daughter has one.  What does it do again?
Exec 1: Dunno.  But I hear that they’re creating a music revolution or something like that.
Exec 2: Really?  Well if we made one, it would really help us connect with the kids.  How do we make one?
Exec 1: Dunno.  Let’s just let the engineers worry about it.   Then all we have to do is spend a few million on marketing and it will sell because we’re Microsoft and our software is on everyone’s computer.

As I’m sure you’ve guessed, the Zune has pretty much turned out to be everything short of a complete and total failure.  And just about all of Microsoft’s attempts to be cool have gone the same route.  And it seems like Sun is trying to follow in their footsteps.

What’s in a name?

Well, a lot is in a name actually.  The thing is, your name has to mean something other than “big company whose software I’m forced to install on my computer.”  I’m pretty sure a conversation similar to Microsoft’s iPod conversation happened at Sun.  Having billions of people installing your software makes a difference if you’re trying to get people to accidentally install crappy toolbars they won’t know how to uninstall.  It takes a little bit more to get people to actually buy a product.

Apple’s App Store is successful for a different reason.  They’re smart enough to know that it takes more than a successful platform to sell apps.  In fact, it’s the other way around.  Lots of good third-party apps will make a platform successful.  Sun is in the exact opposite situation.  They have a successful platform.  Now how can they make it profitable?

How can Project Vector succeed?

Easy.  Sun needs to play to their strengths instead of making a “me too” product in a field they know nothing about.  There are two potential areas where Sun could make an AppStore succeed:  software development and mobile applications.

Software Development

Let’s face it:  getting a development environment set up takes work.  For starters, you have to select your tools and your libraries.  Then, you have to purchase and install them.  And any programmer will tell you that this process is harder than it sounds.  Lots of programming languages have helped solve this problem by setting up repositories where you can download libraries and development tools pretty easily.  Perl arguably wouldn’t be in existence today if not for CPAN.

Why not make a CPAN for Java where developers may buy and sell tools like these?  There are plenty of killer apps already available for it too.  For example, imagine how much of a time saver it would be if you could install Eclipse or NetBeans with the click of a mouse.  And imagine how lucrative it would be for people who make plugins for these programs.

Mobile Applications

As Schwartz has noted, Java exists on plenty of mobile devices.  Now imagine if the makers of these mobile devices could have their own App Stores pre-made by Sun?  Granted, I’m sure they will want their share of the profits, but surely they’re smart enough to know that one of the reasons they’re getting kicked in the teeth by Apple is because of its App Store.

Update: Here’s another take on the idea that I find pretty interesting.

Posted in Programming

Can we please stop comparing Wolfram Alpha to Google?

Posted by Jason Baker on May 20, 2009

What if I told you about a website that can:

  • Build a knowledge base and pull data from it based on search text.
  • House the expertise of many of the world’s foremost authorities on a subject.
  • Give you a link or two to more information about that subject.

Does that website sound like a Google killer to you?  If you said yes, you’d be completely wrong.  I was actually talking about Wikipedia.  Now, it sounds really stupid to compare Wikipedia and Google.  And there’s a good reason for that:  it is stupid.

However, when you focus on the little bit of overlap in functionality that exists between the two services, it becomes a lot easier for IT journalists and bloggers to spin one of them into the next “Google killer.”  Comparing Wikipedia to Google is a lot like comparing Wolfram Alpha to Google.

After all, Wolfram Alpha is a website that can:

  • Calculate answers based on arbitrary queries.
  • Pull in data from the web to build a knowledge base.
  • Answer many questions that we use Google for currently.

However, there’s one thing that Wolfram Alpha can’t do:  search.  In fact, the first of their FAQs is “Is Wolfram|Alpha a search engine?”  The answer is obviously no.  Can Wolfram Alpha help me find that one really good article or blog post I’ve forgotten how to locate?  No.  Can it help me find more information about vague programming concepts?  No.

So why do people keep asking the question “Is Wolfram Alpha a Google-killer?”  There’s one common thread that ties bloggers and journalists together:  the need for attention.  Calling Wolfram Alpha a Google-killer is grandstanding, plain and simple.  Maybe this kind of grandstanding was acceptable when the details on Wolfram Alpha were sketchy, but it’s not now.  Thus, I would be suspicious of anyone who actually takes the question seriously.  And yes, that does apply even if their answer is no.

Posted in Blogging

Programming and safe theater or Things you have to do but shouldn’t if you can avoid doing them at any cost

Posted by Jason Baker on May 14, 2009

Joel’s (in?)famous article saying never to rewrite your software seems to be circulating again on Hacker News.  Essentially, what Joel is getting at is that software companies must never, ever, under any circumstances rewrite their own software.  I agree with most of what Joel says, but I disagree with the ultimate conclusion that he draws from it.

I’m reminded of a scene from Hamlet 2 (which I’m paraphrasing from memory here):

Student:  I was thinking:  what if we had lowriders come across the stage in the final scene?

Mr. Marschz:  That sounds dangerous.

Student:  Ok, nevermind.

Mr. Marschz:  No, I’m not doing safe theater.  Let’s do it!

Joel’s stance is “safe theater” for programmers.  

Risk

Still not convinced?  Does safe theater sound just fine with you?  Willing to quote Joel’s stance on this unwaveringly?  Well, I counter your Spolsky with a Kay!

I believe that the only kind of science computing can be is like the science of bridge building. Somebody has to build the bridges and other people have to tear them down and make better theories, and you have to keep on building bridges. –Alan Kay, quoted from  A Conversation with Alan Kay

Granted, I don’t necessarily agree with Alan 100% either.  I think there’s a happy medium here.  When you get down to it, the risks in rewriting your software that Joel mentions are very real risks.  And several companies have found this out the hard way.

The problem is that you can’t innovate without taking risks, and a software rewrite is the ultimate risk.  Unfortunately for us, we as programmers are in the business of innovation.  Unwillingness to take risks is a sure route to becoming a “greybeard” who only codes COBOL on mainframes because it’s what they know.  I don’t know about you, but that’s not the kind of career that I had envisioned.

So you’re saying I should rewrite, correct?

Remember, when I said that I agree with Joel for the most part, I meant it.  A complete software rewrite is a crazy and maybe even a stupid move.  If you really want to innovate, sometimes you have to do things that are crazy and stupid.  But remember that the decision to completely rewrite a piece of software can be a job-ending or even business-ending move.  So if a blog post by me is enough to convince you to do the rewrite by itself, don’t do it.

Credit where credit is due

In fairness, I don’t think even Joel 100% believes software should never be rewritten.  In a book, he praises Microsoft for writing .NET even though it violates his “never rewrite” rule.  Granted, this is because Microsoft is “a center of gravity,” which is something that we can say about very few companies.

But even though this is a very small exception to Joel’s rule, it’s still enough to say that Joel doesn’t believe his blog post 100%.  But even if he still believes it 99.9%, there’s still 0.1% of leeway.  And never underestimate what a really good programmer can do with that 0.1%.

Posted in Programming

HTTP and you

Posted by Jason Baker on May 10, 2009

I was kind of surprised by the number of people who told me that they weren’t aware of the differences between HTTP POST and HTTP GET that my last post highlighted.  Not everyone who does web design and/or development has had a formal education on this kind of thing, so I’d like to focus a little bit more on the basics of HTTP.  A full summary of the HTTP protocol would take a couple hundred pages (or 175 to be exact).

In a lot of ways, doing web development and/or design without knowing how this stuff works is a bit like doing Calculus without knowing how addition and subtraction work.  True, you probably won’t ever need it.  But you would be surprised at how many questions can be answered by having a basic understanding of HTTP.

Anatomy of a URI

 

As this helpful diagram of the URI shows, there are 5 basic parts:

  • scheme – This is the protocol that we’re using to access whatever this URI represents.  For obvious reasons, we’re only interested in http schemes.
  • username/password – This isn’t really used much in the context of HTTP, but it should be pretty self explanatory.
  • hostname – This essentially tells us what computer we’re accessing.  This can be either an IP address (ex: 209.85.171.100 if you’re using IPv4) or a domain name (google.com).
  • port – This is the port number on the server we’re pulling data from.  In the context of HTTP this will usually be port 80, but occasionally it will be something different.  Also bear in mind that this may be different depending on the scheme (for example, FTP will be port 21 by default).
  • path – This represents where the website “lives” on the server.  It was largely designed for representing files and directories on a file system, but it’s worth mentioning that this part is ultimately little more than arbitrary text that may be interpreted by the server however it wishes.
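
The five parts listed above can be pulled out of a URI with Python’s standard library.  This is only a minimal sketch:  the URI below is a made-up example (it doesn’t come from this post), chosen so that every part is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI, invented so that every part described above shows up.
uri = "http://user:secret@example.com:8080/2009/05/10/"

parts = urlsplit(uri)
uri_parts = {
    "scheme": parts.scheme,      # protocol used to reach the resource
    "username": parts.username,  # rarely used with HTTP
    "password": parts.password,
    "hostname": parts.hostname,  # domain name or IP address
    "port": parts.port,          # None when the URI doesn't name a port
    "path": parts.path,          # text the server interprets as it wishes
}

for name, value in uri_parts.items():
    print(name, "=", value)
```

Note that when the URI leaves the port out, urlsplit reports None rather than filling in 80.  The default is something the client is expected to know from the scheme.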

Anatomy of an HTTP request

When you access my blog via HTTP, your browser sends an HTTP request that looks something like this:

GET / HTTP/1.1 CRLF
Host: jasonmbaker.wordpress.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1 CRLF
Connection: close CRLF
CRLF

Your browser will receive a response that looks something like this (bonus:  there’s one header I left out.  Can you guess which one it is?  I hear there might be job offers if you can figure it out.):

HTTP/1.1 200 OK CRLF 
Server: nginx CRLF
Date: Sun, 10 May 2009 23:16:28 GMT CRLF
Content-Type: text/html; charset=UTF-8 CRLF
Transfer-Encoding: chunked CRLF
Connection: close CRLF
Vary: Cookie CRLF
X-Pingback: https://jasonmbaker.wordpress.com/xmlrpc.php CRLF
CRLF
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...HTML goes here...

There are two important parts here:  the request/response line and the headers.  Just in case you’re wondering, CRLF is a special kind of newline:  a carriage return followed by a line feed (written "\r\n" in most programming languages).

The Request line

The request line will usually be in this general form:

<method> <path> HTTP/<version> CRLF

There are three parts to be concerned with:

  • method – the HTTP method we’re using.  A full discussion of all of these methods would be rather lengthy.  The vast majority of webpages are requested using HTTP GET or POST.  I have a full discussion of the differences between these two methods here.
  • path – this is the path to the page we’re requesting. Usually, this is only the path part of the URI and nothing more.  There’s a simple reason for this.  By the time your web browser has connected to my blog, the server presumably already knows that it’s at jasonmbaker.wordpress.com.  This isn’t always the case though:  one server can host many sites, and proxies forward requests to other hosts.  So the hostname is passed either in the Host header or, when talking to a proxy, as part of a full URI in the path.
  • version – the version of HTTP we’re using.  Usually this will be HTTP 1.0 or 1.1, but you will sometimes run into antiquated HTTP 0.9 clients and servers.
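
To make the three parts concrete, here is a rough sketch of how a server might take a request line apart.  The function name parse_request_line is my own invention for illustration, and the sketch assumes a well-formed line with exactly two spaces in it.

```python
def parse_request_line(line):
    """Split '<method> <path> HTTP/<version> CRLF' into its three parts.

    Hypothetical helper for illustration; real servers are more forgiving.
    """
    # Drop the trailing CRLF, then split on the two separating spaces.
    method, path, protocol = line.rstrip("\r\n").split(" ")
    name, _, version = protocol.partition("/")
    if name != "HTTP":
        raise ValueError("not an HTTP request line: %r" % line)
    return method, path, version


print(parse_request_line("GET / HTTP/1.1\r\n"))
```

Notice that the query string (if there is one) simply comes back as part of the path.  As far as the request line is concerned, it’s all just text.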

The Response line

The response line will look like this:

HTTP/<version> <response code> <reason phrase> CRLF

Here’s how that breaks down:

  • version – The version of HTTP.  See above.
  • response code – This indicates whether the server successfully found the requested page, if there was an error, or if the client needs to be redirected.  If it found the page, it will return 200 OK.  Otherwise, it will return some other code like the infamous 404 Not Found or a 302 Found if there is a redirect to be done.
  • reason phrase – A short, human-readable description of the response code.  This is the “OK” in 200 OK and the “Not Found” in 404 Not Found.
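
Here is a small sketch of reading a response line like the “HTTP/1.1 200 OK” shown earlier.  Both function names are hypothetical, and the grouping into success/redirect/error is a simplification based on the first digit of the code.

```python
def parse_status_line(line):
    """Split 'HTTP/<version> <code> <text> CRLF' into (version, code, text)."""
    # Split on the first two spaces only, since the trailing text
    # ("Not Found") may itself contain spaces.
    protocol, code, text = line.rstrip("\r\n").split(" ", 2)
    version = protocol.partition("/")[2]
    return version, int(code), text


def classify(code):
    """Roughly group a response code by its first digit."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 600:
        return "error"
    return "other"


version, code, text = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, text, classify(code))
```

In practice, clients should make decisions based on the numeric code.  The text after it is there for humans.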

 

HTTP Headers

An HTTP header will usually be of this form:

<header name>:  <header value> CRLF

Headers are basically just “metadata” about the request or response.  They include information about the encoding of the data, the browser requesting the page, and the server returning the page.  HTTP was designed to be extensible, so you will frequently run into headers that aren’t specified in the original RFC.
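
As a sketch, a block of headers can be turned into a dictionary like so.  The helper name parse_headers is made up, and to keep things short it ignores two real-world wrinkles:  headers that appear more than once and header values that continue onto a second line.

```python
def parse_headers(raw):
    """Turn 'Name: value CRLF' lines into a dict keyed by lowercased name."""
    headers = {}
    for line in raw.split("\r\n"):
        if not line:
            break  # an empty line marks the end of the headers
        # Split at the first colon only; values such as URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


raw = (
    "Server: nginx\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "X-Pingback: https://jasonmbaker.wordpress.com/xmlrpc.php\r\n"
    "\r\n"
)
headers = parse_headers(raw)
print(headers["content-type"])
```

Lowercasing the names up front is one simple way to avoid caring about how the other side capitalized them.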

Form Data

Sometimes a server will require additional data from the client to return a webpage.  There are two ways to send it:  in the query string and in the body of the request.

The query string

In the case of HTTP GET and a couple of other HTTP methods, this data will be passed through the query string.  This request will look something like this:

GET /?page=123 HTTP/1.1 CRLF
Host: jasonmbaker.wordpress.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1 CRLF
Connection: close CRLF
CRLF
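
You rarely need to build a query string by hand.  Here’s a quick sketch using Python’s standard library.  The “q” field is an extra one I made up to show how a space gets escaped.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# "page" comes from the request above; "q" is a made-up extra field.
query = urlencode({"page": 123, "q": "http basics"})
target = "/?" + query
print(target)  # /?page=123&q=http+basics

# Going the other way: every value comes back as a list of strings,
# because the same name may legally appear more than once.
fields = parse_qs(urlsplit(target).query)
print(fields)
```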

The body

HTTP POST requests and all responses will pass data through the body.  An HTTP POST request will look something like this:

POST / HTTP/1.1 CRLF
Host: jasonmbaker.wordpress.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1 CRLF
Content-Type: application/x-www-form-urlencoded CRLF
Content-Length: 8 CRLF
Connection: close CRLF
CRLF
page=123

Notice that there are two CRLFs between the HTTP headers and the body.
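
Putting the pieces together, here is a sketch of building the raw bytes of a POST request.  The function build_post_request is hypothetical, and it assumes ordinary form data, so it adds the Content-Type and Content-Length headers that tell the server how to read the body.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Build the raw bytes of a form POST (hypothetical helper)."""
    body = urlencode(fields).encode("ascii")
    head = (
        "POST %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: %d\r\n"
        "Connection: close\r\n"
        "\r\n"  # the empty line that separates the headers from the body
    ) % (path, host, len(body))
    return head.encode("ascii") + body


request = build_post_request("jasonmbaker.wordpress.com", "/", {"page": 123})

# Everything before the first pair of CRLFs is the request line and the
# headers; everything after it is the body.
head, _, body = request.partition(b"\r\n\r\n")
print(body)
```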

Gotchas

Here are some of the things that will cause problems if you deal with HTTP often enough:

  1. HTTP is selectively case sensitive.  Essentially, HTTP header names are not case sensitive, but method names are.  This means that a server has to be prepared to treat CONTENT-ENCODING, content-encoding, and cOnTeNt-EnCoDiNg exactly the same, while GET and get are not the same method.
  2. Slashes on the end DO matter.  For example, http://www.google.com/index.html and http://www.google.com/index.html/ are different URIs.  Unless you’re trying to be tricky, you usually want to make these point to the same thing.
  3. The www matters.  For example, http://www.google.com and http://google.com are not only different URIs, they might even point to different servers.  Usually, people expect these to be the same.
  4. Path handling is harder than it looks.  For example, what happens if I want to join “/2009/05” and “10” to make “/2009/05/10”?  I can’t just concatenate those two strings together because then I would get “/2009/0510.”  Nor can I arbitrarily append slashes because then I could end up with something like “/2009/05//10” if I’m not careful.
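
A couple of these gotchas are easy to demonstrate.  The sketch below shows the trailing-slash difference and one possible way to join path segments safely.  The helper join_path is my own invention, not a standard function.

```python
import posixpath
from urllib.parse import urlsplit

# Gotcha 2: a trailing slash makes for a different path.
with_slash = urlsplit("http://www.google.com/index.html/").path
without_slash = urlsplit("http://www.google.com/index.html").path
print(with_slash == without_slash)  # False


def join_path(base, *segments):
    """Join URI path segments with exactly one slash between them.

    Hypothetical helper. Slashes are stripped from each segment first,
    because posixpath.join throws away everything before a segment that
    starts with a slash.
    """
    return posixpath.join(base, *(s.strip("/") for s in segments))


# Gotcha 4: naive concatenation versus a careful join.
print("/2009/05" + "10")              # /2009/0510
print(join_path("/2009/05", "10"))    # /2009/05/10
print(join_path("/2009/05/", "/10"))  # /2009/05/10
```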

Conclusion

So, you probably know more about HTTP than you ever wanted to know.  For what it’s worth, HTTP is a bit of an antiquated protocol with a lot of “historical” features.  But it does the job it was intended to do and it does it well.

If you find any inaccuracies, please post them in the comments.  But bear in mind that I intended this to be for a broad audience, so there might be a few points that I oversimplified for the sake of brevity.  If you want to fill in the holes, there’s not really any other place to look than the HTTP specifications (RFC 1945 for HTTP 1.0 and RFC 2616 for HTTP 1.1).  If you’re new to HTTP, I’d highly recommend looking at the HTTP 1.0 specification first as it’s about a third as complex as HTTP 1.1.

Posted in Networking, Programming