Jason in a Nutshell

All about programming and whatever else comes to mind

This blog has moved!

Posted by Jason Baker on October 10, 2009

Here’s my new blog.

Posted in Programming | Leave a Comment »

The Rise of SYDNI, or YAGNI is Only About Problems, Not Solutions

Posted by Jason Baker on September 6, 2009

I’ve got a new programming methodology to propose. I call it SYDNI
(Sometimes You Do Need It). It is a response to the problems that I
see with YAGNI. In fairness, I don’t dislike YAGNI. In fact, I agree
with it 100% (well, maybe 95%). But to truly appreciate it, you need
a bit of context.

 

On YAGNI

 I’ve started thinking of YAGNI almost as a recursive way of
thinking. That is to say that I’ve begun to think of YAGNI as being
something that uses itself to implement itself. Allow me to explain.

 

What is YAGNI?

 YAGNI stands for “you ain’t gonna need it.” I don’t want to make this post
an in-depth discussion of what YAGNI actually is, so click the
Wikipedia link if you aren’t familiar with YAGNI. The important thing
to take away from reading about YAGNI is that it’s saying that you
shouldn’t implement functionality if you don’t need it.

 

What YAGNI ISN’T

 YAGNI sounds like a pretty straightforward way of thinking. And in a
lot of ways it is. But it’s more nuanced than one may think at first.
 The “recursive” element of YAGNI that I speak of above is that YAGNI
(in my opinion) is a very specific solution to a very specific
problem, and that problem is over-engineering.

 And YAGNI does its job well (especially in the context of Test Driven
Development). I tend to find myself throwing out a lot less code when
using YAGNI.

 A lot of people take YAGNI to mean that the simplest solution is
always the best. That isn’t the case. Or at a very minimum, that
shouldn’t be the case. There’s a key thing about simplicity
that should be understood: it’s defined by the problem, not the
solution. This is key to understanding why YAGNI is so useful. Once
you’ve gotten to the point of choosing a solution, YAGNI is no help to
you. Thus, you have to use YAGNI to choose problems, not solutions.

 

You’re not in school anymore

 In school, things are always so simple. You’re assigned a problem.
And you’re given a grade based on how well you solved that problem.
The real world is more complex.

 You see, people too often forget that software developers don’t just
define solutions to problems. After all, aren’t all feature requests
nothing more than a statement of a problem? And isn’t choosing
software features a decision about what problems you will solve?

 However, once you’ve chosen a problem to solve, there’s still the
issue of how to solve it.

 

Sometimes You Do Need It

 In solving a problem, YAGNI’s usefulness starts to fade. It does have
some importance. You do have to make sure your solution is solving
the problem you set out to solve. However, beyond that, YAGNI just
doesn’t apply. In fact, it is likely harmful. That’s where SYDNI
comes in. Although SYDNI’s name is something of a jab at YAGNI, the
principle itself isn’t. Instead, SYDNI can be thought of as a
complement to YAGNI. A yin to YAGNI’s yang (alliteration for the
win).

 Oftentimes, thoughts may enter your head that start with something
like “we’ll never need…” or “this will never have to…”. This kind
of thinking is helpful when choosing the problem to solve. However,
it’s destructive when choosing a solution. In a couple of years,
there is only one thing that will be certain about the software you’re
writing: it will be different. And it will be different in ways you
can’t have predicted or imagined. If you’re using YAGNI
appropriately, you’re choosing the easiest problems to solve.
However, at least a few of these problems come out of left field.

 Therefore, I would put SYDNI this way: ideally, a piece of software
will be no more simple or complex than the problem it is trying to
solve. Therefore, there is danger not only in solutions that are
overly complicated, but there is also danger in solutions that are
overly simple.

 This leads to another conclusion: if SYDNI is followed appropriately,
the complexity of your source code is a direct measure of how complex
the problems it is solving are. The reverse is true as well. The
complexity of the problems you’re solving is a direct measure of how
complex your source code will be.

 

But I don’t live in an ideal world!

 The key hole in SYDNI is the word “ideally”. Unfortunately, some
problems just don’t have perfectly compatible solutions. Therefore, a
key decision to be made is whether it is better to err on the side of
over-engineering or under-engineering a solution. We are now delving
into the realm of many disputes between programmers. Many people
(mis)educated on the arts of YAGNI will say that it is always better
to tend towards under-engineering. If this were true, YAGNI wouldn’t
be as useful to as many people as it has been.

 Even more unfortunately, there is no “one size fits all” answer of
whether it is better to over-engineer or under-engineer. It is highly
situational and care must be taken to arrive at the appropriate
solution. If you don’t believe me, consider the following two
questions:

  1. Which life support machine would you rather be hooked up to?
     * A machine whose software developers always did the simplest
       thing possible
     * A machine whose software developers went out of their way to
       anticipate possible problems and planned for each of them
  2. Which one-page web app do you feel would be easiest to maintain?
     * An application that is implemented as two or three source files
       and a few database tables
     * An application with a highly normalized database, highly
       modular source, and great flexibility

 I should hope that the answer to number 1 is obvious. And why it is
the correct answer should also be obvious: if you missed a particular
contingency, people can die. Thus, it makes sense to err on the side
of over-engineering.

 But number 2 is a little bit less obvious (and maybe more debatable).
However, I would err on the side of under-engineering. After all, no
matter what changes come up, a one-page web app is still a one-page
web app. The worst case is that the app would be rewritten from
scratch. That’s not to say that you need to throw caution into the
wind and ignore normal good practice. Rather, it’s saying that it’s
not really a good idea to stress much over how maintainable that
application is.

 Therefore, when deciding on a solution, there are two things that need
to be decided upon beforehand:

  1. How complex the problem is.
  2. Whether under-engineering is more harmful than over-engineering.

 Once you get those two things squared away, it should be easy to get
an idea of how complex the solution should be.


Posted in Programming | Leave a Comment »

The relational model: of tuples, relations, rows, and tables

Posted by Jason Baker on July 5, 2009

So, whilst looking at the answers to a question I had about the relational model on StackOverflow, I stumbled across this question.  Before I educated the unwashed masses with my enlightened answer (kidding of course; I’m hardly an expert on the relational model), the answers that were there were mostly correct.  In fact I’d say that they were about 90% correct.  But as programmers (should) know, being 10% wrong can sometimes be as dangerous as being 100% wrong, if not more so.  I’d like a chance to expand my answer a bit beyond tuples and rows.

I’ll summarize the answers I saw with one statement:  “A relation is a SQL table.  A tuple is a SQL row.”  Now, you may be shocked to find out that this isn’t totally correct.  But then that would mean you didn’t read the first paragraph, asshole.  At any rate, it is true that a relation is a table’s closest analog in the relational model and that a tuple is a row’s closest analog in the relational model.  But that’s a bit like saying that a dog is a wolf’s closest analog in the domestic household.  Though there may be some similarities, they’re still different animals.

Tuples vs Rows

(This section is largely the same as my SO answer)

Tuples are unordered sets of known values with names (and they’re not quite the same as tuples in different fields of mathematics). Thus, the following tuples are the same thing (I’m using an imaginary tuple syntax since a relational tuple is largely a theoretical construct):

(x=1, y=2, z=3)
(z=3, y=2, x=1)
(y=2, z=3, x=1)

…assuming of course that x, y, and z are all integers. Also note that there is no such thing as a “duplicate” tuple. Thus, not only are the above equal, they’re the same thing. Lastly, tuples can only contain known values (thus, no nulls).
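To make the "unordered, named, no duplicates" idea concrete, here is a minimal Python sketch. It is only a model and not how any database represents tuples: the helper `rel_tuple` is a hypothetical name, and representing a relational tuple as a frozenset of (name, value) pairs is my own assumption for illustration.

```python
# Hypothetical model: a relational tuple as a frozenset of (name, value) pairs.
# Attribute order carries no meaning, so the three spellings below are one tuple.
def rel_tuple(**attrs):
    return frozenset(attrs.items())


a = rel_tuple(x=1, y=2, z=3)
b = rel_tuple(z=3, y=2, x=1)
c = rel_tuple(y=2, z=3, x=1)

# All three compare equal regardless of the order the attributes were written in.
print(a == b == c)

# Putting them in a set shows there is no such thing as a "duplicate" tuple:
# the set ends up holding a single element.
print(len({a, b, c}))
```

The frozenset was chosen because it is both unordered and immutable, which matches the two properties of tuples that matter here.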

A row is an ordered set of known or unknown values with names (although they may be omitted).  Now, you may not realize it, but any set of values in a set of parentheses is a row.  In fact, single values are converted into single-valued rows without the parentheses.  Thus, the following queries are equivalent:

SELECT x, y, z FROM point WHERE x = 1

SELECT x, y, z FROM point WHERE (x) = (1)

SELECT x, y, z FROM point WHERE ROW(x) = ROW(1)

Because rows are ordered, the following comparisons return false in SQL:

(1, 2, 3) = (3, 2, 1)
(3, 1, 2) = (2, 3, 1)

Note that there are ways to “fake it” though. For example, consider this INSERT statement:

INSERT INTO point VALUES (1, 2, 3)

This may be rewritten into either of the two following queries:

INSERT INTO point (x, y, z) VALUES (1, 2, 3)

INSERT INTO point (y, z, x) VALUES (2, 3, 1)

…but all we’re really doing is changing the ordering rather than removing it.
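As a quick check, the three INSERT statements above can be run against an in-memory database. This sketch assumes SQLite (via Python's standard sqlite3 module) as a stand-in for "a SQL database", and assumes point has the columns x, y, z in that order; other vendors may behave differently in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the column order x, y, z is fixed when the table is created.
conn.execute("CREATE TABLE point (x INT, y INT, z INT)")

# Positional form relies on the table's declared column order.
conn.execute("INSERT INTO point VALUES (1, 2, 3)")
# Named forms state an ordering explicitly; the ordering is changed, not removed.
conn.execute("INSERT INTO point (x, y, z) VALUES (1, 2, 3)")
conn.execute("INSERT INTO point (y, z, x) VALUES (2, 3, 1)")

rows = conn.execute("SELECT x, y, z FROM point").fetchall()
print(rows)  # all three statements stored the same values in x, y, z
```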

And also note that there may be unknown values as well. Thus, you may have rows with unknown values:

(1, 2, NULL) = (1, 2, NULL)

…but note that this comparison will always yield UNKNOWN. After all, how can you know whether two unknown values are equal?

And lastly, rows may be duplicated. In other words, (1, 2) and (1, 2) may compare to be equal, but that doesn’t necessarily mean that they’re the same thing.
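The three row behaviors described above (ordering matters, NULL comparisons are unknown, duplicates are allowed) can be tried out with a short script. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 module, it needs a SQLite build with row-value support (3.15 or later), and the table name `pair` and the helper `scalar` are made up for the example. SQLite reports false as 0 and UNKNOWN as NULL, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Run a query that yields one value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Rows are ordered: the same values in a different order are not equal.
reordered = scalar("SELECT (1, 2, 3) = (3, 2, 1)")

# A NULL on both sides makes the comparison unknown rather than true.
with_null = scalar("SELECT (1, 2, NULL) = (1, 2, NULL)")

# Rows may be duplicated: two equal rows are still two rows.
conn.execute("CREATE TABLE pair (a INT, b INT)")
conn.executemany("INSERT INTO pair VALUES (?, ?)", [(1, 2), (1, 2)])
dupes = scalar("SELECT COUNT(*) FROM pair WHERE (a, b) = (1, 2)")

print(reordered, with_null, dupes)
```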

So has the SQL part of this confused you yet?  If so, then you can probably begin to see how much simpler the relational model is.  I should note that I’m largely speaking in terms of the SQL standard, which very few (if any) SQL databases are fully compliant with.  Therefore, your mileage may vary depending on your vendor.

Relations

Before discussing relations in detail, there is one subject you should be familiar with.  And that is the dichotomy between relations and relation variables.

Relations and Relvars

Think about this query for a moment:

CREATE TABLE point (x INT, y INT, z INT)

What is point?  Your first guess may be “point is a table”.  And that guess would be correct in terms of the SQL standard.  But think about it this way.  In the following line of Python code, what is x?

x = 1

Is x 1?  Well, not really.  Rather, x is a variable that has a value of 1.  Our CREATE TABLE query above can be viewed in much the same manner.  Therefore, you can think of point as a variable that holds a table rather than a table.  So if point isn’t actually a table, how do I make one?  It’s actually rather simple.  In fact, you’ve probably been doing it all along without realizing it.  Ever wonder what the purpose of the VALUES keyword is in an INSERT statement?  I’ll give you a hint:  INSERT statements insert tables, not rows.  Thus, you can create a table like this:

VALUES (1, 'foo'), (2, 'bar'), (3, 'baz')

Now, as I’ve pointed out, SQL doesn’t really make this distinction.  Both point and my table created using VALUES are tables (I suppose you could think of the VALUES table as an “anonymous table”).  The relational model, however, does make this distinction by way of relations and relation variables (relvars), and it’s very important to understand.  Remembering to use the correct term can be difficult if you come from a SQL background, though.  I say this because I’m going to try to use the correct terminology, but will probably mess it up horribly.
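If you want to see an "anonymous table" for yourself, here is a small sketch. It assumes SQLite (through Python's sqlite3 module), which happens to accept a bare VALUES clause as a query; not every vendor does. The table name `item` is hypothetical, and the string literals are single-quoted, as standard SQL expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A bare VALUES clause is a query in its own right: it yields a table
# that was never given a name.
anonymous = conn.execute(
    "VALUES (1, 'foo'), (2, 'bar'), (3, 'baz')"
).fetchall()
print(anonymous)

# INSERT takes that same table and adds its rows to a named one.
conn.execute("CREATE TABLE item (id INT, name TEXT)")
conn.execute("INSERT INTO item VALUES (1, 'foo'), (2, 'bar'), (3, 'baz')")
stored = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
print(stored)
```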

Relations vs Tables

The difference between tables and relations is actually less complicated than you might think.  In fact, the difference can mostly be summarized by saying that tables have rows while relations have tuples.  This actually has some fairly important implications.

Most of the properties of tuples map into the properties of relations about how you would expect them to.  In other words, a relation is a set of tuples without duplicates or any concept of ordering.  A table is a collection of rows that may have duplicates and does have a concept of ordering (in that columns will always follow a predetermined order).  However, there is another property of tuples that causes relations to behave differently from tables in a way you might not expect.

Remember how I said above that equal tuples represent the same thing?  Suppose I want to run the following SQL query:

UPDATE point SET x=2 WHERE x = 1

In SQL, this is a fairly straightforward procedure.  If we had an imaginary database that worked in terms of tuples and relations, this query would be nonsense.  After all, if point is a relvar holding a relation that contains the tuple (x=1, y=2, z=3), I can’t really change that tuple to (x=2, y=2, z=3).  This is because the tuple (x=1, y=2, z=3) has always been the tuple (x=1, y=2, z=3) and always will be.  Thus, tuples and relations are immutable.  With that said, there is a way to do the same thing.  If we think about relations more in procedural terms, we can perform the above query like this (in pseudocode):

point = (point - (x=1, y=2, z=3)) + (x=2, y=2, z=3)

Similarly, a DELETE statement would look like this:

point = point - (x=1, y=2, z=3)

And an INSERT statement would look like this:

point = point + (x=1, y=2, z=3)

Therefore, you can’t change a relation.  You can change what relation a relvar holds, though.
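The pseudocode above translates almost directly into Python if we model things with immutable sets. This is only a sketch of the idea and not a real relational engine: the helper `t` is a hypothetical name, a tuple is modeled as a frozenset of (name, value) pairs, a relation as a frozenset of such tuples, and the ordinary Python variable `point` plays the part of the relvar.

```python
def t(**attrs):
    """Hypothetical helper: build an immutable, unordered, named tuple."""
    return frozenset(attrs.items())


# The relvar `point` starts out holding a relation with one tuple.
point = frozenset({t(x=1, y=2, z=3)})
original = point  # keep a handle on the first relation

# UPDATE point SET x=2 WHERE x = 1
# Nothing is modified in place; the relvar is assigned a new relation.
point = (point - {t(x=1, y=2, z=3)}) | {t(x=2, y=2, z=3)}
after_update = point

# INSERT: union with a one-tuple relation.
point = point | {t(x=1, y=2, z=3)}
after_insert = point

# DELETE: set difference.
point = point - {t(x=1, y=2, z=3)}
after_delete = point

# The first relation is untouched; only what `point` holds has changed.
print(original == frozenset({t(x=1, y=2, z=3)}))
print(len(after_insert), after_delete == after_update)
```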

Conclusion

This is a rather interesting topic to me that I’ve done some reading about.  However, I’m not a mathematician and I’m hardly an expert on the relational model.  Therefore, think of this as being “The Relational Model for Dummies” in that it’s a good intro, but hardly teaches you everything you need to know about the relational model.  If this is a subject that interests you, I’d highly recommend learning more about it from the horse’s mouth.  I find C. J. Date’s SQL and Relational Theory: How to Write Accurate SQL Code to be a very good book to read if you want to learn more on this.

Posted in Programming | Leave a Comment »

IronPython in Action reviewed

Posted by Jason Baker on June 28, 2009

IronPython in Action by Michael Foord is a somewhat interesting book in that it really isn’t about IronPython.  It’s a book about programming in .Net using Python.  Although some will accuse me of being overly pedantic, there is a subtle difference.

After all, IronPython is just a piece of software.  Anytime you have a bridge between two languages and/or programming environments, there are always some “devil’s in the details” type issues that come up.  These are questions like:

  • Are IronPython strings the same thing as .Net strings?
  • Can I use ASP.NET with IronPython?
  • Can you call any .Net code from IronPython?

Who it’s for

If you’re a new programmer, IronPython in Action probably isn’t the book for you.  Although it does have an introduction to programming in Python, it’s very brief.

On the other hand, if you’re an experienced programmer who already knows what variables are and why classes are useful, you’ll likely find IronPython in Action a pretty good primer.  Although this book does have an introduction to both Python and .Net, I think you’d get the most benefit if you have experience with one of those technologies.

IronPython in Action in Action

IronPython in Action covers three main areas:  Python, .Net, and usage of IronPython.  I’d like to cover these sections individually.

Python

As mentioned, the book does include a brief tutorial on Python.  If you’re a total newbie to Python, you may want to keep the official Python tutorial nearby in case it goes too fast for you.

That said, the section on Python is fairly readable.  It even has diagrams if you’re the kind of person who learns visually:

A diagram from the book

…although I’m pretty sure I could have figured out what a set of parentheses and a colon are on my own, thank you very much.

I’ve always thought that there were two layers to Python:  the “normal” stuff and the black magic.  Foord doesn’t just stick to the easy stuff either.  The deepest depths of the Python language are covered all the way up to metaclasses, the archetypal example of Python voodoo.  This isn’t what makes this part of the book shine though.  Any idiot can learn how to use a metaclass by spending some time with the Python docs.  No, the place where this book really shines is in how it teaches these concepts.  The line between Python newbie and Python wizard is in knowing when to use the magical parts, and Foord does an excellent job of giving real world examples of these.

I also found it interesting that the book mentions some popular third-party Python libraries and tools.  For instance, the section on testing includes a summary of some test-runners such as nose to help simplify running tests.  As anyone will tell you, Python’s greatest advantage is its huge standard library and its even bigger set of third-party libraries.

In all, this isn’t just a good way to learn how to use IronPython.  This is a good introduction to the Python language itself.

.Net

After that, IronPython in Action delves into how .Net works and how IronPython works with it.  Just as with the Python introduction, you won’t come out of this a .Net expert if you weren’t before.  However, it’s helpful if you’re not familiar with what an assembly is or want to know how the heck you’d use a generic with a language as dynamic as Python.  As it turns out, IronPython works pretty well with .Net in all but a few cases.

The book leaves no stone unturned as far as the .Net runtime is concerned.  It goes from the basics like using .Net classes all the way up to the dirty stuff like using P/Invoke and creating dynamic objects in C#/VB.NET.  In short, I can’t think of many features of the runtime itself that this book doesn’t cover.

I was a bit disappointed that third-party .Net tools didn’t get as much coverage as Python’s did.  For instance, ADO.NET simply isn’t enough to get a full picture of how to use databases with .Net.  Since most of the Python ORMs may not have very good support as they rely on C extensions, why not cover an ORM like NHibernate or Subsonic?  Or why not show how to write tests using NUnit in addition to Python’s unittest framework?  Or why not cover interoperability with other .Net languages like F# or Boo?

Usage

Arguably the best part of this book is in the sections “Core Development Techniques” and “IronPython and advanced .Net”.  This is where you get into the real meat of using IronPython.

IronPython in Action is a very practical book.  It teaches you not only the theory of using IronPython, but the practice as well.  It teaches you how to use IronPython to do test-driven development and how to read in XML files much more easily than in Python or C#.  It even goes into detail on some of the hottest .Net technologies like WPF and WMI.  Virtually any type of programmer will get a brief introduction to doing the things they want to do in IronPython.

One area that I’d like to see covered more is GTK#.  It’s not quite as cool as WPF, but it’s cross-platform and much better than WinForms.  My experience has been that PyGTK+ can be a major pain to install on some platforms, so it would be helpful to have an introduction to the .Net equivalent.

Conclusions

In summary, if you’re a developer wanting to work with Windows technologies using Python there’s no question:  go and buy IronPython in Action.  Right now.  If you’re wanting to develop IronPython applications for various platforms or don’t want to tie yourself to just Microsoft technology… still go and buy IronPython in Action.  There are still some holes that can be filled, but all in all, this is a pretty solid book.

Posted in Programming | 1 Comment »

Windows for Python programmers: IIS

Posted by Jason Baker on June 4, 2009

Ok, so you finally talked your employer into letting you use Python.  Good job!  But there’s a catch:  it has to run on Windows.  This actually isn’t too problematic.  Python runs very well on Windows.  Unfortunately, it isn’t always terribly well documented.

IIS

Ok, now I know that you’re envious of those *nix Python programmers.  Apache running mod_wsgi is a pythonista’s dream.  And newer servers like lighttpd and nginx are a pythonista’s wet dream.  You might as well forget they exist:  Apache runs on Windows, but it just wasn’t made for it.  And forget about lighttpd and nginx.  You’re stuck with IIS.  Don’t worry, it’s really not that bad.

Options for running Python

This is the most important part for obvious reasons.  You have several options:

Classic ASP

No really, I’m being serious.  If you need a quick and dirty way to serve Python applications, classic ASP is the way to go.  You’ll need the Python win32 extensions (pywin32), but chances are you’ll be needing those some time anyway.  I’d recommend just installing ActivePython.  Besides the obvious ease of setup, there is another benefit to using Python this way:  compatibility with other Classic ASP applications.  After all, you’re a Windows shop.  You have at least one Classic ASP application lying around still, don’t you?

There is documentation here.

FastCGI

IIS’s FastCGI module is made and officially supported by Microsoft.  Thus, this is the best option for you if your management’s usual response to using open source software involves them saying something along the lines of “take a shower, hippie!”  Unfortunately, it’s a bit difficult to find documentation for it (at least for Python), and a lot of Python software isn’t made to run as a daemon on Windows (Django is still reliant on forking, for example).  Thus, I’d recommend it as a last resort.

ISAPI-WSGI

Of course the biggest downside of this software is that it can be a bit of a tongue-twister:  say ISAPI-WSGI five times fast.  Other than that, I’ve had pretty good luck with it.  It’s easy to set up, and the performance isn’t too shabby.  Plus, it’s open source.  Currently, this seems to be the best way to get Python running in IIS.  Now you just need to convince your boss that you do in fact take showers.

There are instructions for getting ISAPI-WSGI set up with Django and Pinax, TurboGears and CherryPy, and Pylons.
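Whichever adapter you pick, what it ends up hosting is an ordinary WSGI application.  Here’s a minimal sketch of one, exercised locally with the standard library’s wsgiref helpers.  The names `application` and `fake_start_response` are my own for illustration, and the ISAPI-WSGI wiring itself is deliberately not shown; follow the project’s instructions for that part.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI app: the kind of callable a WSGI adapter would host."""
    body = b"Hello from Python behind IIS\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Exercise the app locally, without any web server, by faking a request.
environ = {}
setup_testing_defaults(environ)  # fills in a plausible GET request environment
captured = {}


def fake_start_response(status, headers):
    # Record what the app would have sent back to the server.
    captured["status"] = status
    captured["headers"] = dict(headers)


result = b"".join(application(environ, fake_start_response))
print(captured["status"], result)
```

Checking the callable this way before touching IIS helps separate bugs in your application from problems in the server setup.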

PyISAPIe

Unfortunately, I don’t know enough about this project to say much one way or another.  But it is an option.

Miscellaneous tricks

Here are a few tricks that I’ve found to be helpful:

Use Application Pools

You may be envious of mod_wsgi’s daemon mode and its ability to run applications in their own process, but IIS 6 and above actually have something better:  Application Pools.  You have the ability to restrict certain sets of applications to a particular pool of processes.  This is handy if you need to isolate certain applications.

This is also handy because if you make any changes, you will need to restart the Python interpreter.  If you’ve got Python running on its own server, then you have nothing to worry about.  Otherwise, you’ll want a way to restart the Python interpreter without disturbing any other applications that may be running.  This is where Application Pools really shine because you can not only restart Python separately from the rest of the server, but you can also restart individual Python applications as well!

To learn more about creating Application Pools, see this technet article.  Also check out this article to get more info about how to recycle (restart) an Application Pool.

Use ActivePython

I’ve already mentioned this once, but it bears mentioning again.  Use ActivePython.  It comes pre-bundled with the Python win32 extensions, which you’ll probably be installing anyway.

Choose a good framework

I suppose I could probably write a separate blog post about choosing web frameworks, but that’s for a different time.  The good news is that I can sum it up by saying this:  if you’re not sure, choose Django.

Posted in Programming | 3 Comments »

In defense of feature-lacking software

Posted by Jason Baker on May 24, 2009

Let it be known from this point forward that if I produce a piece of software, and the only complaint against that piece of software is that it lacks functionality, I will consider that complaint a badge of honor.  I’m going to give some examples to help demonstrate why this is.

Please bear in mind that the examples in this post are personal opinion. You are welcome to disagree with me.  In fact, I don’t necessarily agree with myself 100%.  There are valid reasons to choose all of the products I mention.

  • Google Chrome vs Firefox: I have yet to see a review of Chrome 2.0 that wasn’t titled something like “Google Chrome 2:  lightning fast, but lacking features.”  In that same light, anytime Chrome is brought up on reddit/Hacker News/Ars Technica, there are invariably several users whose comments are along the lines of “Boy, that Google Chrome sure is nice.  If only it had <insert Firefox addon here>.”  Interestingly enough, Firefox’s main selling point against IE was at one time “IE is way more bloated than Firefox.”
  • iPod vs Creative Zen: Perhaps this comparison is a bit dated, but there are similar comparisons between the iPhone and various other phones.  At any rate, most of us have run into an Apple anti-zealot at one time or another.  Most of the time they will advocate a player like the Creative Zen (or maybe the Zune).  Their arguments against the iPod usually revolve around what it can’t do compared to what their media player of choice can do.  They seem to wonder why so many people find the iPod “easier to use.”
  • Python vs Haskell: Again, most of the arguments I see against Python by the Haskell community (and various other functional programming communities) seem focused on features it lacks.  Usually, these complaints are about lack of tail-call optimization or compile-time type checking or side-effect prevention.  I still don’t understand Haskell.

What I hope you’re starting to pull from this analysis is that all features are tradeoffs.  Each and every feature you add to a piece of software is another block of code to maintain.  And as any experienced software developer will tell you, features are easy to add.  They’re exponentially more difficult to remove.

So what’s the solution to featuritis?  Well, if you learn one thing from this blog post, make it this:  In technology, simplicity trumps all but necessity. Of course, therein lies another problem:  what is simplicity?  There’s not really any good answer to that question.  However, I would advise going with whatever makes your product simplest to the person that will use it.

At any rate, unless you’re Microsoft or Google, I’d recommend abandoning the idea of an “all singing, all dancing” product.  Chances are, it won’t happen.

Posted in Programming | Leave a Comment »

Dear massive companies who are struggling to stay relevant: you’re not cool

Posted by Jason Baker on May 22, 2009

Jonathan Schwartz is talking about a new “App Store” for Java called Project Vector:

How will it work? Candidate applications will be submitted via a simple web site, evaluated by Sun for safety and content, then presented under free or fee terms to the broad Java audience via our update mechanism. Over time, developers will bid for position on our storefront, and the relationships won’t be exclusive (as they have been for search). As with other app stores, Sun will charge for distribution – but unlike other app stores, whose audiences are tiny, measured in the millions or tens of millions, ours will have what we estimate to be approximately a billion users. That’s clearly a lot of traffic, and will position the Java App Store as having just about the world’s largest audience.

When I read this, I couldn’t help but think of the Zune.  You know, the portable MP3 player that’s arguably as good as or better than the iPod?  I like to imagine a couple of Microsoft executives having a conversation like this:

Exec 1: Hey, have you seen this newfangled “iPod” contraption Apple’s selling?
Exec 2: My daughter has one.  What does it do again?
Exec 1: Dunno.  But I hear that they’re creating a music revolution or something like that.
Exec 2: Really?  Well if we made one, it would really help us connect with the kids.  How do we make one?
Exec 1: Dunno.  Let’s just let the engineers worry about it.   Then all we have to do is spend a few million on marketing and it will sell because we’re Microsoft and our software is on everyone’s computer.

As I’m sure you’ve guessed, the Zune has pretty much turned out to be everything short of a complete and total failure.  And just about all of Microsoft’s attempts to be cool have gone the same route.  And it seems like Sun is trying to follow in their footsteps.

What’s in a name?

Well, a lot is in a name actually.  The thing is, your name has to mean something other than “big company whose software I’m forced to install on my computer.”  I’m pretty sure a conversation similar to Microsoft’s iPod conversation happened at Sun.  Having billions of people installing your software makes a difference if you’re trying to get people to accidentally install crappy toolbars they won’t know how to uninstall.  It takes a little bit more to get people to actually buy a product.

Apple’s App Store is successful for a different reason.  They’re smart enough to know that it takes more than a successful platform to sell apps.  In fact, it’s the other way around.  Lots of good third-party apps will make a platform successful.  Sun is in the exact opposite situation.  They have a successful platform.  Now how can they make it profitable?

How can Project Vector succeed?

Easy.  Sun needs to play to their strengths instead of making a “me too” product in a field they know nothing about.  There are two potential areas where Sun could make an AppStore succeed:  software development and mobile applications.

Software Development

Let’s face it:  getting a development environment set up takes work.  For starters, you have to select your tools and your libraries.  Then, you have to purchase and install them.  And any programmer will tell you that this process is harder than it sounds.  Lots of programming languages have helped solve this problem by setting up repositories where you can download libraries and development tools pretty easily.  Perl arguably wouldn’t be in existence today if not for CPAN.

Why not make a CPAN for Java where developers may buy and sell tools like these?  There are plenty of killer apps already available for it too.  For example, imagine how much of a time saver it would be if you could install Eclipse or NetBeans with the click of a mouse.  And imagine how lucrative it would be for people who make plugins for these programs.

Mobile Applications

As Schwartz has noted, Java exists on plenty of mobile devices.  Now imagine if the makers of these mobile devices could have their own App Stores pre-made by Sun?  Granted, I’m sure they will want their share of the profits, but surely they’re smart enough to know that one of the reasons they’re getting kicked in the teeth by Apple is because of its App Store.

Update: Here’s another take on the idea that I find pretty interesting.

Posted in Programming | Leave a Comment »

Can we please stop comparing Wolfram Alpha to Google?

Posted by Jason Baker on May 20, 2009

What if I told you about a website that can:

  • Build a knowledge base and pull data from it based on search text.
  • House the expertise of many of the world’s foremost authorities on a subject.
  • Give you a link or two to more information about that subject.

Does that website sound like a Google killer to you?  If you said yes, you’d be completely wrong.  I was actually talking about Wikipedia.  Now, it sounds really stupid to compare Wikipedia and Google.  And there’s a good reason for that:  it is stupid.

However, when you focus on the little bit of overlap in functionality that exists between two services, it becomes a lot easier for IT journalists and bloggers to spin one of them into the next “Google killer.”  The comparison between Wikipedia and Google bears an obvious likeness to the comparison between Wolfram Alpha and Google.

After all, Wolfram Alpha is a website that can:

  • Calculate answers based on arbitrary queries.
  • Pull in data from the web to build a knowledge base.
  • Answer many questions that we use Google for currently.

However, there’s one thing that Wolfram Alpha can’t do:  search.  In fact, the first of their FAQs is “Is Wolfram|Alpha a search engine?”  The answer is obviously no.  Can Wolfram Alpha help me find that one really good article or blog post I’ve forgotten how to locate?  No.  Can it help me find more information about vague programming concepts?  No.

So why do people keep asking the question “Is Wolfram Alpha a Google-killer?”  There’s one common thread that ties bloggers and journalists together:  the need for attention.  Calling Wolfram Alpha a Google-killer is grandstanding, plain and simple.  Maybe this kind of grandstanding was acceptable when the details on Wolfram Alpha were sketchy, but it’s not now.  Thus, I would be suspicious of anyone who actually takes the question seriously.  And yes, that does apply even if their answer is no.

Posted in Blogging | 1 Comment »

Programming and safe theater or Things you have to do but shouldn’t if you can avoid doing them at any cost

Posted by Jason Baker on May 14, 2009

Joel’s (in?)famous article saying never to rewrite your software seems to be circulating again on Hacker News.  Essentially, what Joel is getting at is that software companies must never, ever, under any circumstances rewrite their own software.  I must say that I agree with Joel for the most part.  What I disagree with is the ultimate conclusion that he draws from it.

I’m reminded of a scene from Hamlet 2 (which I’m paraphrasing from memory here):

Student:  I was thinking:  what if we had lowriders come across the stage in the final scene?

Mr. Marschz:  That sounds dangerous.

Student:  Ok, nevermind.

Mr. Marschz:  No, I’m not doing safe theater.  Let’s do it!

Joel’s stance is “safe theater” for programmers.  

Risk

Still not convinced?  Does safe theater sound just fine with you?  Willing to quote Joel’s stance on this unwaveringly?  Well, I counter your Spolsky with a Kay!

I believe that the only kind of science computing can be is like the science of bridge building. Somebody has to build the bridges and other people have to tear them down and make better theories, and you have to keep on building bridges. –Alan Kay, quoted from  A Conversation with Alan Kay

Granted, I don’t necessarily agree with Alan 100% either.  I think there’s a happy medium here.  When you get down to it, the risks in rewriting your software that Joel mentions are very real risks.  And several companies have found this out the hard way.

The problem is that you can’t innovate without taking risks, and a software rewrite is the ultimate risk.  Unfortunately for us, we as programmers are in the business of innovation.  Unwillingness to take risks is a sure route to becoming a “greybeard” who only codes COBOL on mainframes because it’s what they know.  I don’t know about you, but that’s not the kind of career that I had envisioned.

So you’re saying I should rewrite, correct?

Remember, when I said that I agree with Joel for the most part, I meant it.  A complete software rewrite is a crazy and maybe even a stupid move.  If you really want to innovate, sometimes you have to do things that are crazy and stupid.  But remember that the decision to completely rewrite a piece of software can be a job-ending or even business-ending move.  So if a blog post by me is, by itself, enough to convince you to do the rewrite, don’t do it.

Credit where credit is due

In fairness, I don’t think even Joel 100% believes software should never be rewritten.  In a book, he praises Microsoft for writing .NET even though it violates his “never rewrite” rule.  Granted, this is because Microsoft is “a center of gravity,” which is something that we can say about very few companies.

But even though this is a very small exception to Joel’s rule, it’s still enough to say that Joel doesn’t believe his blog post 100%.  But even if he still believes it 99.9%, there’s still 0.1% of leeway.  And never underestimate what a really good programmer can do with that 0.1%.

Posted in Programming | Leave a Comment »

HTTP and you

Posted by Jason Baker on May 10, 2009

I was kind of surprised by the number of people who told me that they weren’t aware of the differences between HTTP POST and HTTP GET that my last post highlighted.  Not everyone who does web design and/or development has had a formal education on this kind of thing, so I’d like to focus a little bit more on the basics of HTTP.  A full summary of the HTTP protocol would take a couple hundred pages (or 175 to be exact).

In a lot of ways, doing web development and/or design without knowing how this stuff works is a bit like doing Calculus without knowing how addition and subtraction work.  True, you probably won’t ever need it.  But you would be surprised at how many questions can be answered by having a basic understanding of HTTP.

Anatomy of a URI

 

As this helpful diagram of the URI shows, there are 5 basic parts:

  • scheme – This is the protocol that we’re using to access whatever this URI represents.  For obvious reasons, we’re only interested in http schemes.
  • username/password – This isn’t really used much in the context of HTTP, but it should be pretty self explanatory.
  • hostname – This essentially tells us what computer we’re accessing.  This can be either an IP address (ex: 209.85.171.100 if you’re using IPv4) or a domain name (google.com).
  • port – This is the port number on the server we’re pulling data from.  In the context of HTTP this will usually be port 80, but occasionally it will be something different.  Also bear in mind that this may be different depending on the scheme (for example, FTP will be port 21 by default).
  • path – This represents where the website “lives” on the server.  It was largely designed for representing files and directories on a file system, but it’s worth mentioning that this part is ultimately little more than arbitrary text that may be interpreted by the server however it wishes.
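If you want to poke at these parts yourself, here is a small sketch using Python’s standard urllib.parse module.  The URI below is made up purely so that all five parts are present; it isn’t a real address.

```python
from urllib.parse import urlsplit

# Hypothetical URI, invented only so that every part is filled in.
uri = "http://jason:secret@example.com:8080/2009/05/10"
parts = urlsplit(uri)

print(parts.scheme)    # http
print(parts.username)  # jason
print(parts.password)  # secret
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /2009/05/10

# When the port is left out, urlsplit reports None rather than guessing 80;
# applying the scheme's default is up to whoever makes the connection.
no_port = urlsplit("http://example.com/")
print(no_port.port)    # None
```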

Anatomy of an HTTP request

When you access my blog via HTTP, your browser sends an HTTP request that looks something like this:

GET / HTTP/1.1 CRLF
Host: jasonmbaker.wordpress.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1 CRLF
Connection: close CRLF
CRLF

Your browser will receive a response that looks something like this (bonus:  there’s one header I left out.  Can you guess which one it is?  I hear there might be job offers if you can figure it out.):

HTTP/1.1 200 OK CRLF 
Server: nginx CRLF
Date: Sun, 10 May 2009 23:16:28 GMT CRLF
Content-Type: text/html; charset=UTF-8 CRLF
Transfer-Encoding: chunked CRLF
Connection: close CRLF
Vary: Cookie CRLF
X-Pingback: https://jasonmbaker.wordpress.com/xmlrpc.php CRLF
CRLF
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...HTML goes here...

There are two important parts here:  the request/response line and the headers.  Just in case you’re wondering, the CRLF is a special kind of newline.  

The Request line

The request line will usually be in this general form:

<method> <path> HTTP/<version> CRLF

There are three parts to be concerned with:

  • method – the HTTP method we’re using.  A full discussion of all of these methods would be rather lengthy.  The vast majority of webpages are requested using HTTP GET or POST.  I have a full discussion of the differences between these two methods here.
  • path – this is the path to the page we’re requesting. Usually, this is only the path part of the URI and nothing more.  There’s a simple reason for this.  By the time your web browser has connected to my blog, the server presumably already knows that it’s at jasonmbaker.wordpress.com.  Since this isn’t always the case though, this is passed either in the Host header or sometimes in the path depending on circumstances.
  • version – the version of HTTP we’re using.  Usually this will be HTTP 1.0 or 1.1, but you will sometimes run into antiquated HTTP 0.9 clients and servers.
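To make the three parts concrete, here’s a rough sketch of pulling a request line apart in Python.  The function name parse_request_line is my own invention, and the sketch assumes a well-formed HTTP 1.0 or 1.1 line with exactly one space between parts.  It won’t cope with HTTP 0.9, whose request lines carry no version at all.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a line like 'GET / HTTP/1.1\\r\\n' into (method, path, version).

    Hypothetical helper: assumes single spaces and an HTTP/x.y suffix.
    """
    method, path, protocol = line.rstrip("\r\n").split(" ")
    name, _, version = protocol.partition("/")
    if name != "HTTP":
        raise ValueError(f"not an HTTP request line: {line!r}")
    return method, path, version


print(parse_request_line("GET / HTTP/1.1\r\n"))
# ('GET', '/', '1.1')
print(parse_request_line("POST /?page=123 HTTP/1.0\r\n"))
# ('POST', '/?page=123', '1.0')
```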

The Response line

The response line will look like this:

HTTP/<version> <response code> CRLF

Here’s how that breaks down:

  • version – The version of HTTP.  See above.
  • response code – This indicates whether the server successfully found the requested page, if there was an error, or if the client needs to be redirected.  It is a three-digit number followed by a short human-readable phrase.  If the server found the page, it will return 200 OK.  Otherwise, it will return some other code like the infamous 404 Not Found or a 302 Found if there is a redirect to be done.
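Here’s a similar sketch for the response line.  Note that what I’m calling the response code is really a three-digit number plus a bit of text (“200 OK”), so the split has to stop after the second space or “Not Found” would get chopped in half.  The names parse_status_line and describe are made up for this example, and the sketch assumes the text after the number is present.

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split 'HTTP/1.1 404 Not Found\\r\\n' into (version, code, reason)."""
    # maxsplit=2 keeps multi-word reasons such as "Not Found" in one piece.
    protocol, code, reason = line.rstrip("\r\n").split(" ", 2)
    version = protocol.partition("/")[2]
    return version, int(code), reason


def describe(code: int) -> str:
    """Classify a status code by its first digit."""
    kinds = {2: "success", 3: "redirect", 4: "client error", 5: "server error"}
    return kinds.get(code // 100, "other")


for raw in ("HTTP/1.1 200 OK\r\n",
            "HTTP/1.1 302 Found\r\n",
            "HTTP/1.1 404 Not Found\r\n"):
    version, code, reason = parse_status_line(raw)
    print(code, reason, "->", describe(code))
```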

 

HTTP Headers

An HTTP header will usually be of this form:

<header name>:  <header value> CRLF

Headers are basically just “metadata” about the request or response.  They include information about the encoding of the data, the browser requesting the page, and the server returning the page.  HTTP was designed to be extensible, so you will frequently run into headers that aren’t specified in the original RFC.
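As a rough illustration, a block of headers can be turned into a dictionary with a few lines of Python.  The parse_headers function is a hypothetical helper, not anything from a library.  It lowercases the names so lookups don’t depend on how the sender capitalized them, and it simply keeps the last value if a header shows up twice, which a real parser wouldn’t get away with.

```python
def parse_headers(raw: str) -> dict[str, str]:
    """Parse 'Name: value' lines up to the first blank line."""
    headers = {}
    for line in raw.split("\r\n"):
        if not line:
            break  # a blank line marks the end of the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


raw = (
    "Server: nginx\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Connection: close\r\n"
    "\r\n"
    "...HTML goes here..."
)
headers = parse_headers(raw)
print(headers["content-type"])  # text/html; charset=UTF-8
print(headers["server"])        # nginx
```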

Form Data

Sometimes a server will need additional data from the client before it can return a webpage.  There are two ways to send it:  in the query string and in the body of the request.

The query string

In the case of HTTP GET and a couple of other HTTP methods, this data will be passed through the query string.  This request will look something like this:

GET /?page=123 HTTP/1.1 CRLF
Host: jasonmbaker.wordpress.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1 CRLF
Connection: close CRLF
CRLF
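If you’d rather not build query strings by hand, Python’s standard urllib.parse module can go in both directions.  This is just a sketch using the same page=123 data as above; the search example is made up to show what happens to spaces and ampersands.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Building a query string from form data.
query = urlencode({"page": 123})
print(query)  # page=123

# Characters with a special meaning in a query string get escaped.
escaped = urlencode({"q": "a b&c"})
print(escaped)  # q=a+b%26c

# Going the other way: pull the data back out of a request target.
target = "/?" + query
parsed = parse_qs(urlsplit(target).query)
print(parsed)  # {'page': ['123']}
```

Notice that parse_qs hands back a list for every key.  That’s because the same name may legitimately appear more than once in a query string.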

The body

HTTP POST requests and all responses will pass data through the body.  An HTTP POST request will look something like this:

POST / HTTP/1.1 CRLF
Host: jasonmbaker.wordpress.com CRLF
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1 CRLF
Content-Type: application/x-www-form-urlencoded CRLF
Content-Length: 8 CRLF
Connection: close CRLF
CRLF
page=123

Notice that there are two CRLFs between the HTTP headers and the body.
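Here’s a sketch of assembling such a request as raw bytes.  The build_post function is hypothetical, and I’ve assumed the usual form encoding.  One thing a real POST needs is a Content-Length header, so that the server knows where the body ends; the sketch computes it from the encoded body.

```python
from urllib.parse import urlencode

CRLF = "\r\n"


def build_post(host: str, path: str, form: dict) -> bytes:
    """Assemble a minimal form POST (hypothetical helper, ASCII data only)."""
    body = urlencode(form).encode("ascii")
    head = CRLF.join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",
        "Connection: close",
    ])
    # One CRLF ends the last header and a second marks the end of the headers.
    return head.encode("ascii") + (CRLF * 2).encode("ascii") + body


request = build_post("jasonmbaker.wordpress.com", "/", {"page": 123})
head, _, body = request.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(body)  # b'page=123'
```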

Gotchas

Here are some of the things that will cause problems if you deal with HTTP often enough:

  1. HTTP is selectively case sensitive.  Essentially, HTTP header names are not case sensitive, while method names like GET are.  This means that a server has to be prepared to treat CONTENT-ENCODING, content-encoding, and cOnTeNt-EnCoDiNg exactly the same.
  2. Slashes on the end DO matter.  For example, http://www.google.com/index.html and http://www.google.com/index.html/ are different URIs.  Unless you’re trying to be tricky, you usually want to make these point to the same thing.
  3. The www matters.  For example, http://www.google.com and http://google.com are not only different URIs, they might even point to different servers.  Usually, people expect these to be the same.
  4. Path handling is harder than it looks.  For example, what happens if I want to join “/2009/05” and “10” to make “/2009/05/10”?  I can’t just concatenate those two strings together because then I would get “/2009/0510.”  Nor can I arbitrarily append slashes because then I could end up with something like “/2009/05//10” if I’m not careful.
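To show what I mean about paths, here’s one rough way to do the join in Python.  The join_path helper is something I made up for this example; it strips the slashes off every segment and puts exactly one back between them.  It also throws away any trailing slash, which (per gotcha 2) may or may not be what you want.

```python
from urllib.parse import urljoin


def join_path(*segments: str) -> str:
    """Join path segments with exactly one slash between them.

    Hypothetical helper: always returns an absolute path and drops any
    trailing slash.
    """
    pieces = [s.strip("/") for s in segments]
    return "/" + "/".join(p for p in pieces if p)


print(join_path("/2009/05", "10"))    # /2009/05/10
print(join_path("/2009/05/", "/10"))  # /2009/05/10

# The standard library's urljoin resolves relative references the way a
# browser does, so a missing trailing slash changes the answer.
print(urljoin("http://example.com/2009/05", "10"))   # http://example.com/2009/10
print(urljoin("http://example.com/2009/05/", "10"))  # http://example.com/2009/05/10
```

The urljoin lines are there as a warning as much as anything:  it is the right tool for resolving links in a page, but not for gluing path segments together.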

Conclusion

So, you probably know more about HTTP than you ever wanted to know.  For what it’s worth, HTTP is a bit of an antiquated protocol with a lot of “historical” features.  But it does the job it was intended to do and it does it well.

If you find any inaccuracies, please post them in the comments.  But bear in mind that I intended this to be for a broad audience, so there might be a few points that I oversimplified for the sake of brevity.  If you want to fill in the holes, there’s not really any other place to look than the HTTP specifications (RFC 1945 for HTTP 1.0 and RFC 2616 for HTTP 1.1).  If you’re new to HTTP, I’d highly recommend looking at the HTTP 1.0 specification first as it’s about a third as complex as HTTP 1.1.

Posted in Networking, Programming | 3 Comments »