Jason in a Nutshell

All about programming and whatever else comes to mind

Posts Tagged ‘sql’

The relational model: of tuples, relations, rows, and tables

Posted by Jason Baker on July 5, 2009

So, whilst looking at the answers to a question I had about the relational model on StackOverflow, I stumbled across this question.  Before I educated the unwashed masses with my enlightened answer (kidding of course; I’m hardly an expert on the relational model), the answers that were there were mostly correct.  In fact I’d say that they were about 90% correct.  But as programmers (should) know, being 10% wrong can sometimes be as (if not more) dangerous as being 100% wrong.  I’d like a chance to expand my answer a bit beyond tuples and rows.

I’ll summarize the answers I saw with one statement:  “A relation is a SQL table.  A tuple is a SQL row.”  Now, you may be shocked to find out that this isn’t totally correct.  But then that would mean you didn’t read the first paragraph, asshole.  At any rate, it is true that a relation is a table’s closest analog in the relational model and that a tuple is a row’s closest analog in the relational model.  But that’s a bit like saying that a dog is a wolf’s closest analog in the domestic household.  Though there may be some similarities, they’re still different animals.

Tuples vs Rows

(This section is largely the same as my SO answer)

Tuples are unordered sets of known values with names (and they’re not quite the same as tuples in different fields of mathematics). Thus, the following tuples are the same thing (I’m using an imaginary tuple syntax since a relational tuple is largely a theoretical construct):

(x=1, y=2, z=3)
(z=3, y=2, x=1)
(y=2, z=3, x=1)

…assuming of course that x, y, and z are all integers. Also note that there is no such thing as a “duplicate” tuple. Thus, not only are the above equal, they’re the same thing. Lastly, tuples can only contain known values (thus, no nulls).

A row is an ordered set of known or unknown values with names (although they may be omitted).  Now, you may not realize it, but any set of values in a set of parenthesis is a row.  In fact, single values are converted into single-valued rows without the parenthesis.  Thus, the following queries are equivalent:

SELECT x, y, z FROM point WHERE x = 1

SELECT x, y, z FROM point WHERE (x) = (1)

SELECT x, y, z FROM point WHERE ROW(x) = ROW(1)

Therefore, the following comparisons return false in SQL:

(1, 2, 3) = (3, 2, 1)
(3, 1, 2) = (2, 3, 1)

Note that there are ways to “fake it” though. For example, consider this INSERT statement:

INSERT INTO point VALUES (1, 2, 3)

This may be rewritten into either of the two following queries:

INSERT INTO point (x, y, z) VALUES (1, 2, 3)

INSERT INTO point (y, z, x) VALUES (2, 3, 1)

…but all we’re really doing is changing the ordering rather than removing it.

And also note that there may be unknown values as well. Thus, you may have rows with unknown values:

(1, 2, NULL) = (1, 2, NULL)

…but note that this comparison will always yield UNKNOWN. After all, how can you know whether two unknown values are equal?

And lastly, rows may be duplicated. In other words, (1, 2) and (1, 2) may compare to be equal, but that doesn’t necessarily mean that they’re the same thing.

So has the SQL part of this confused you yet?  If so, then you can probably begin to see how much simpler the relational model is.  I should note that I’m largely speaking in terms of the SQL standard, which very few (if any) SQL database is fully compliant with.  Therefore, your milage may vary depending on your vendor.

Relations

Before discussing relations in detail, there is one subject you should be familiar with.  And that is the dichotomy between relations and relation variables.

Relations and Relvars

Think about this query for a moment:

CREATE TABLE point (x INT, y INT, z INT)

What is point?  Your first guess may be “point is a table”.  And that guess would be correct in terms of the SQL standard.  But think about it this way.  In the following line of Python code, what is x?

x = 1

Is x 1?  Well, not really.  Rather, x is a variable that has a value of 1.  Our CREATE TABLE query above can be viewed in much the same manner.  Therefore, you can think of point as a variable that holds a table rather than a table.  So if point isn’t actually a table, how do I make one?  It’s actually rather simple.  In fact, you’ve probably been doing it all along without realizing it.  Ever wonder what the purpose of the VALUES keyword is in an INSERT statement?  I’ll give you a hint:  INSERT statements insert tables, not rows.  Thus, you can create a table like this:

VALUES (1, "foo"), (2, "bar"), (3, "baz")

Now, as I’ve pointed out, SQL doesn’t really make this distinction.  Both point and my table created using VALUES are both tables (I suppose you could think of the VALUES table as an “anonymous table”).  However, the relational model does make this distinction by way of relations and relation variables (relvars), and it’s very important to understand.  However, remembering to use the correct term can be difficult if you come from a SQL background.  I say this because I’m going to try to use the correct terminology, but will probably mess it up horribly.

Relations vs Tables

The difference between tables and relations is actually less complicated than you might think.  In fact, the difference can mostly be summarized by saying that tables have rows while relations have tuples.  This actually has some fairly important implications.

Most of the properties of tuples map into the properties of relations about how you would expect them to.  In other words, a relation is a set of tuples without duplicates or any concept of ordering.  A table is a set of rows that may have duplicates and does have a concept of ordering (in that columns will always follow a predetermined order).  However, there is another property of tuples that causes relations to behave differently from tables in a way you might not expect.

Remember how I said above that equal tuples represent the same thing?  Suppose I want to run the following SQL query:

UPDATE point SET x=2 WHERE x = 1

In SQL, this is a fairly straight-forward procedure.  If we had an imaginary database that worked in terms of tuples and relations, this query would be nonsense.  After all, if point is a relvar to a relation that contains the tuple (x=1, y=2, z=3), I can’t really change it to (x=2, y=2, z=3).  This is because the tuple (x=1, y=2, z=3) has always been the tuple (x=1, y=2, z=3) and always will be.  Thus, tuples and relations are immutable.  With that said, there is a way to do the same thing.  If we think about relations more in procedural terms, we can perform the above query like this (in pseudocode):

point = (point - (x=1, y=2, z=3)) + (x=2, y=2, z=3)

Similarly, a DELETE statement would look like this:

point = point - (x=1, y=2, z=3)

And an INSERT statement would look like this:

point = point + (x=1, y=2, z=3)

Therefore, you can’t change a relation.  You can change what relation a relvar holds, though.

Conclusion

This is a rather interesting topic to me that I’ve done some reading about.  However, I’m not a mathematician and I’m hardly an expert on the relational model.  Therefore think of this as being “The Relational Model for Dummies” in that it’s a good intro, but hardly teaches you everything you need to know about the relational model.  If this is a subject that interests you, I’d highly recommend learning more about it from the horse’s mouth.  I find CJ Date’s SQL and Relational Theory: How to Write Accurate SQL Code to be a very good book to read if you want to learn more on this.

Posted in Programming | Tagged: , , , | Leave a Comment »