Friday, January 30, 2009

Mass of a multi-dimensional cube

Recently I've been thinking a lot about multi-dimensional arrays. Don't ask me why; I can't tell you.

It's quite common to comment on the sparsity of a multi-dimensional array. In data warehousing, sparsity can be either quite helpful or the bane of your data management strategies. Keeping track of indexes and pointers can also be a royal pain.

Given a few dimensions and the expected size of each individual key space, it is easy to calculate the volume of the multi-dimensional space.
Say you have two dimensions, Friends and Cities. You have 100 Friends and you are tracking 100 Cities. The possible combinations of these keys number 100 x 100, or 10,000 cells in the space. But what if you wanted to estimate how many of those cells are actually filled in, that is, which of your friends had visited which of the cities? That number would certainly be much smaller.

So, you might estimate that most of your friends have probably visited about 10 of the cities. Thus, only 10 x 100, or 1,000, cells are filled in. But then, many of them may have visited the same cities, so you also want to think about how many of the cities have been visited by any friends at all. Maybe most of the cities have been visited by about 5 of your friends. Certainly there is a high likelihood that some of the cities haven't been visited by any of your friends.
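To make the arithmetic concrete, here is a minimal Python sketch of the Friends and Cities example. The function names `volume` and `density` are my own illustrative choices, and the figure of 10 cities per friend is just the rough guess from above, not measured data.

```python
from math import prod

def volume(dimension_sizes):
    """Total number of cells: the product of the key-space sizes."""
    return prod(dimension_sizes)

def density(filled_cells, dimension_sizes):
    """Fraction of the cells that actually hold a value."""
    return filled_cells / volume(dimension_sizes)

sizes = [100, 100]      # 100 Friends x 100 Cities
total = volume(sizes)   # every possible Friend/City combination
filled = 10 * 100       # guess: each friend has visited about 10 cities

print(total, filled, density(filled, sizes))  # 10000 1000 0.1
```

Under that guess, only about a tenth of the space is occupied. The same two functions work unchanged for six or more dimensions, where the volume grows much faster than the number of filled cells.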

Now, imagine that you have 6 or more dimensions. You are actually trying to model a much more complex set of keys and their combinations where not all combinations are likely to happen. Thus, you have lots of sparsity. But what about where you do have combinations?

Well, what's been racing through my mind is a visualization of the clusters and the density of the combinations, and that leads me to the notion of calculating the "mass of the multi-dimensional cube". OK, it's really a multi-dimensional array, since a cube has three dimensions, but data warehousing seems to use the cube concept.
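I haven't pinned down a definition, but one possible reading is sketched below in Python: store only the occupied cells, treat the value in each cell as its weight, and call the sum of those weights the mass. Everything here is an assumption for illustration, including the names `mass` and `mass_density`, the friends, the cities, and the visit counts.

```python
from math import prod

def mass(cells):
    """Sum of the weights in the occupied cells; empty cells weigh nothing."""
    return sum(cells.values())

def mass_density(cells, dimension_sizes):
    """Mass spread over the full volume of the key space."""
    return mass(cells) / prod(dimension_sizes)

# Sparse storage: only the key combinations that actually occur are kept.
# Keys are (friend, city) tuples; values are made-up visit counts.
visits = {
    ("alice", "paris"): 3,
    ("alice", "tokyo"): 1,
    ("bob", "paris"): 2,
}
sizes = [100, 100]  # 100 Friends x 100 Cities

print(len(visits))                  # 3 occupied cells
print(mass(visits))                 # 6 visits in total
print(mass_density(visits, sizes))
```

Because the keys are plain tuples, the same dictionary layout extends to six or more dimensions by making the tuples longer.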

There you have it: nothing conclusive, but I thought I would coin my phrase.

Thursday, January 22, 2009

Technorati indexes new White House Blog

My colleague Ian Kallen put up an excellent post about the change in Washington and, in particular, the new White House Blog. He was far too humble in stating his contribution to our new crawler architecture, which is rolling out this month.

Excellent post, Ian, and incredible job on the new crawler. I know it has been a long time coming.

On a separate note, I just have to comment on the title of the White House Blog. All it says is:

Blog

Go figure.
