4.26.2009

Smoke Demo

I built touchlib and got the smoke demo running on our prototype today, as well as a simple touch app that prints touch events (finger down, finger up) to stdout.

The way touchlib works is definitely as a library. Touch applications import touchlib and happen to read the same configuration file for background subtraction, calibration settings, and so forth, but they drive the camera and do the image processing themselves (hence there is no concept of multitasking touch apps).
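
For reference, the little stdout app amounts to something like this. It is a sketch from memory, so treat the class and method names (ITouchListener, TouchScreenDevice::getTouchScreen(), getEvents(), and friends) as approximations to double-check against the touchlib headers:

// Sketch of a touchlib client that prints finger events to stdout.
// Class and method names are from memory and may not match the headers exactly.
#include <cstdio>
#include "ITouchScreen.h"
#include "TouchScreenDevice.h"

using namespace touchlib;

class PrintListener : public ITouchListener
{
public:
    void fingerDown(TouchData d)   { printf("down id=%d x=%f y=%f\n", d.ID, d.X, d.Y); }
    void fingerUpdate(TouchData d) { }                      // ignore moves for now
    void fingerUp(TouchData d)     { printf("up   id=%d\n", d.ID); }
};

int main()
{
    ITouchScreen *screen = TouchScreenDevice::getTouchScreen();
    PrintListener listener;

    screen->loadConfig("config.xml");    // same config.xml every touchlib app reads
    screen->registerListener(&listener);
    screen->beginProcessing();
    screen->beginTracking();             // the app itself drives the camera...

    for (;;)
        screen->getEvents();             // ...and pumps the filter chain for events
}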

I personally don't like that design choice, so I am glad that we are writing our own instead of adapting touchlib. It seems like our gesture system will work that way though, which perhaps makes more sense since we have decided gestures are more application specific.

I had to apply this patch to make touchlib not segfault:
--- ./src/CMakeLists.txt.orig	2008-04-22 19:47:46.000000000 +0000
+++ ./src/CMakeLists.txt	2008-04-22 19:47:56.000000000 +0000
@@ -9,3 +9,9 @@
 #	SET_TARGET_PROPERTIES(configapp
 #		PROPERTIES LINK_FLAGS ${OPENCV_LINKDIR} LINK_FLAGS ${OPENCV_LDFLAGS})
 ENDIF(OPENGL_FOUND AND GLUT_FOUND)
+
+INCLUDE(UsePkgConfig)
+PKGCONFIG(gdk-2.0 GDK2_INCLUDE_DIR GDK2_LINK_DIR GDK2_LINK_FLAGS GDK2_CFLAGS)
+IF (GDK2_INCLUDE_DIR)
+	SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${GDK2_CFLAGS}")
+ENDIF (GDK2_INCLUDE_DIR)
--- ./src/configapp.cpp.orig	2008-04-22 19:43:36.000000000 +0000
+++ ./src/configapp.cpp	2008-04-22 19:47:56.000000000 +0000
@@ -835,6 +835,7 @@
 		}
 
 		if( keypressed == 13 || keypressed == 10)	// enter = calibrate position
 		{
+			screen->unlockHighGUIMutex();
 			startGLApp(argc, argv);
 		}
--- ./src/CTouchScreen.cpp.orig	2008-04-22 19:46:11.000000000 +0000
+++ ./src/CTouchScreen.cpp	2008-04-22 19:55:09.000000000 +0000
@@ -19,6 +19,27 @@
 
 using namespace touchlib;
 
+#ifdef linux
+#include <gdk/gdk.h>
+/* A mutex lock to prevent multiple threads to access the HighGUI
+ * functions in the same time. For OpenCV <= 1.0.0 in Linux platform,
+ * the cvWaitKey and cvShowImage function use different set of mutex lock,
+ * but in fact both of the functions would access the GDK critical
+ * session. Before the bug was being resolved in OpenCV, use this
+ * mutex lock to fix the problem.
+ * */
+#define highgui_mutex_init() { if (!g_thread_supported ()) { \
+	g_thread_init (NULL);\
+	gdk_threads_init();} \
+	}
+#define highgui_mutex_lock() gdk_threads_enter()
+#define highgui_mutex_unlock() gdk_threads_leave()
+#else
+#define highgui_mutex_init()
+#define highgui_mutex_lock()
+#define highgui_mutex_unlock()
+#endif
+
 #ifdef WIN32
 HANDLE CTouchScreen::hThread = 0;
 HANDLE CTouchScreen::eventListMutex = 0;
@@ -31,6 +52,7 @@
 
 CTouchScreen::CTouchScreen()
 {
+	highgui_mutex_init();
 	frame = 0;
 
 #ifdef WIN32
@@ -230,7 +252,9 @@
 	if(filterChain.size() == 0)
 		return false;
 	//printf("Process chain\n");
+	highgui_mutex_lock();
 	filterChain[0]->process(NULL);
+	highgui_mutex_unlock();
 	IplImage *output = filterChain.back()->getOutput();
 
 	if(output != NULL) {
@@ -803,6 +827,14 @@
 	}
 }
 
+void CTouchScreen::lockHighGUIMutex(){
+	highgui_mutex_lock();
+}
+
+void CTouchScreen::unlockHighGUIMutex(){
+	highgui_mutex_unlock();
+}
+
 // Code graveyard:
 /*
 // Transforms a camera space coordinate into a screen space coord
--- ./include/CTouchScreen.h.orig	2008-04-22 19:43:36.000000000 +0000
+++ ./include/CTouchScreen.h	2008-04-22 19:47:56.000000000 +0000
@@ -122,6 +122,8 @@
 		// returns -1 if none found..
 		int findTriangleWithin(vector2df pt);
 
+		void lockHighGUIMutex();
+		void unlockHighGUIMutex();
 
 
 	private:
--- ./include/ITouchScreen.h.orig	2008-04-22 19:43:36.000000000 +0000
+++ ./include/ITouchScreen.h	2008-04-22 19:56:39.000000000 +0000
@@ -83,6 +83,8 @@
 
 		// start the processing and video capturing
 		virtual void beginTracking() = 0;
+		virtual void lockHighGUIMutex() = 0;
+		virtual void unlockHighGUIMutex() = 0;
 
 		/**
 		 * Sets the blob tracker.

The patch fixes a race condition in the way that OpenCV's HighGUI library interacts with one of the native X toolkits, GTK. I ported the patch to the latest version of touchlib in their CVS, r400. It ran without segfaulting, but it would stall out and hang very often (and come back as well). I assume something else in their code was broken, so I went back to revision 393, which the patch was made for, and it worked like a charm.

Sadly, Smoke does not seem as cool in person as in videos. My touches still seem pretty bad, me being me and all. Hopefully it is only a matter of me tweaking the touchlib calibration settings, since syncing LED pulsing against all the webcams sounds pretty tricky.

4.19.2009

Software Meeting

It is 4am and I just came home from lab.

I'm ready to blog.

The software meeting today was semi-useful to me. The agenda was essentially video & camera calibration, and threading/IPC.

I will expound on video calibration/correction in detail in another post tomorrow, so I will defer it for now.

My opinion about camera calibration/correction is this: I think there are two parts to the problem. There is a need to have a backend implementation of the stitching which is essentially the application of a transform on the various camera [blobs] so they are all consistent. This I view as a hard deliverable: this is a core element of what we are building.

Now, what the particulars of the transform are, we won't really know ahead of time. We could possibly get close with pencil and paper, but Jas pointed out to me that it is pretty hard because you can't "see" it interactively, so it would be tedious to determine empirically. The calibration frontend is what makes this easier. The frontend would inform the backend about calibration details, probably by populating a configuration file.

So I view it as two programs connected by that config file. Two separate problems. You would probably want to write the backend first, since you can't really validate the frontend too well without it. In the meantime, you could get away with a manually populated config file until the frontend was complete. With this sense of dependencies, I kind of find the frontend to be optional. At the very least, I think too much work on it is deferrable until the backend is complete: I don't think the frontend design would inform the backend design very much. If anything, it would probably be the other way around (feel free to correct me Kevin!)

Another reason I suggest deferring frontend discussion is that we can get away with delivering a table without a functioning frontend if the backend is OK, but we can't do it the other way around. So just in the sense of plotting out the dependencies and doing standard CYA procedures, it seems like a safer plan to me.
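
To make the "manually populated config file" idea concrete, here is roughly what I picture the backend doing. The file format and names here are made up for illustration; the real transform will probably be fancier than a scale-plus-offset per camera:

// Backend side of the config-file idea: load one transform per camera from a
// hand-written text file and apply it to every blob that camera reports.
// The format ("<cam> <sx> <sy> <ox> <oy>" per line) and the names are invented.
#include <cstdio>
#include <map>

struct CamTransform { float sx, sy, ox, oy; };   // screen = scale * camera + offset
struct Blob { int cam; float x, y; };

std::map<int, CamTransform> loadCalibration(const char *path)
{
    std::map<int, CamTransform> cal;
    FILE *f = fopen(path, "r");
    if (!f) return cal;

    int cam;
    CamTransform t;
    while (fscanf(f, "%d %f %f %f %f", &cam, &t.sx, &t.sy, &t.ox, &t.oy) == 5)
        cal[cam] = t;

    fclose(f);
    return cal;
}

void toScreenSpace(Blob &b, const std::map<int, CamTransform> &cal)
{
    std::map<int, CamTransform>::const_iterator it = cal.find(b.cam);
    if (it == cal.end()) return;                 // uncalibrated camera: leave as-is
    b.x = b.x * it->second.sx + it->second.ox;
    b.y = b.y * it->second.sy + it->second.oy;
}

The frontend, whenever it exists, just has to write that file; until then a text editor works.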

On threading:

In our midnight meeting, I gave an explanation of the threaded programming model vs the multiprocess programming model vs the synchronous (there is probably a better name for this) programming model. I'm going to explain them in a little more detail here.

The synchronous model is the simplest. It would be one gigantic program doing each piece of work one at a time. This would be the simplest to write, but the drawback is performance. It literally cannot do more than one thing at a time. While it is sifting through blobs for gestures, it cannot process webcam images or stitch anything together. If we had a fast computer, this might be OK, but I suspect the computer will not be quick enough to hide latencies here. So we want to avoid this model, as the other two essentially give us software pipelining.

The process model is the next one up. A process is a program that is in a state of running. I.e., a program is code on disk or in memory and a process is the concept of a running program to the operating system. The running state involves the instruction pointer, registers, kernel accounting data structures, program data, etc. So to have multiple processes, we'd pretty much have multiple programs (technically it could be one, via fork, which happens to run blob detection or gesture detection or whatever after we forked it, but that seems overly complex.)
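
Just to illustrate the fork variant I'm dismissing, this is the shape of it; the two module functions are stand-in stubs:

// One program that becomes two processes via fork(). After the fork each
// copy has its own address space; they only share what they explicitly set up.
#include <cstdio>
#include <unistd.h>
#include <sys/wait.h>

static void runBlobDetection()    { printf("blob detection in pid %d\n", getpid()); }
static void runGestureDetection() { printf("gesture detection in pid %d\n", getpid()); }

int main()
{
    pid_t pid = fork();
    if (pid == 0) {                 // child
        runBlobDetection();
        _exit(0);
    }
    runGestureDetection();          // parent carries on as a separate process
    waitpid(pid, 0, 0);             // reap the child when it finishes
    return 0;
}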

Multiple processes, since they are standalone, would need to use Inter-Process Communication (IPC) of some form. The main ones we would consider are either named pipes/Unix sockets or shared memory. Pipes and sockets are pretty similar: offhand, I'm not sure I can say what the practical difference in use would be besides setup code.

Besides the code for setting them up, named pipes and Unix domain sockets essentially have the same semantics as files. You keep a handle on them via a file descriptor and you can use read() and write(). There are other functions, like send() or recv() which behave a little differently, but the key to me is the file-like interface. This makes testing pretty elegant, since you can just feed them sample files on disk while you wait for your counterpart to write his program which would feed you.
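
Here is what I mean about the file-like interface; this is the reader side of a named pipe, and "blobs.fifo" is just a name I made up. Point the open() at a regular file of canned data instead and nothing else changes, which is exactly the testing trick:

// Reader side of a named pipe: plain open()/read()/close(), same as a file.
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main()
{
    mkfifo("blobs.fifo", 0666);              // harmless if it already exists
    int fd = open("blobs.fifo", O_RDONLY);   // blocks until a writer opens it
    if (fd < 0) return 1;

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, stdout);           // dump whatever the producer wrote

    close(fd);
    return 0;
}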

Network sockets are very similar as well: they pretty much differ only in setup code. From my administration experience, I can tell you that it is very common for a lot of network programs to run either with a Unix domain socket if it is all local communication, or with a network socket if you would prefer to spread the application across the network. I would wager that the code inside does not differ much beyond the setup code and perhaps handling special cases and so on. I think we can all agree that being able to run some of our code over a network between multiple tables or multiple table elements could be very cool in the future.

One drawback is that I'm pretty sure we cannot shove pointers through pipes or sockets. They would not be very meaningful on the receiving end since that process would have a different virtual address space. The same drawback would apply to shared memory as well (I'm not totally sure), although shared memory does seem more performant.

Shared memory is when multiple programs can request that the operating system assign them the same chunk of physical memory. This way, they can read and write values from the same place and avoid lengthy copies. I'm not going to research it right now, but I would guess that when the operating system gives people handles into a shared memory region it does not map it to the same location in those particular programs' virtual address spaces. So pointers would still be broken unless they were relative. I suppose I am kind of unclear on it though. It looks like we will not use this option so it may not be important.
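
For completeness, the POSIX flavor of it looks like this (the segment name is arbitrary). Note that mmap() is free to place the mapping at a different virtual address in each process, which is why raw pointers into the region would break but offsets relative to the start would not:

// POSIX shared memory: any process that shm_open()s the same name and mmap()s
// it sees the same bytes. Link with -lrt on older glibc.
#include <cstring>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main()
{
    const size_t kSize = 4096;
    int fd = shm_open("/scimp_blobs", O_CREAT | O_RDWR, 0666);  // made-up name
    if (fd < 0) return 1;
    ftruncate(fd, kSize);                     // size the region

    void *mem = mmap(0, kSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) return 1;

    std::strcpy(static_cast<char *>(mem), "hello from one process");

    munmap(mem, kSize);
    close(fd);
    // shm_unlink("/scimp_blobs") once everyone is done with it
    return 0;
}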

As opposed to the process model, threads run within a single process. A thread is pretty much a separate context and stack within the same process. This way, there can be multiple states of execution happening all at once with direct access to the process's resources: file descriptors, memory, etc. The operating system may or may not schedule multiple threads across CPUs; that is a CS111 discussion however: modern Linux has done that for years.

The thread API is pretty cool. It is called POSIX Threads, or pthreads. The essence of it is that you do a tiny bit of setup code, and if you feel like spawning a thread you invoke it as a function. I.e., you can imagine it as making a function call that spawns a thread for that function and then returns immediately so you can keep going along on your way.
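
In code it is about this small; detectBlobs() here is a stand-in, and you build with -lpthread:

// pthread_create() takes a function pointer and an argument, starts a thread
// running that function, and returns immediately to the caller.
#include <cstdio>
#include <pthread.h>

static void *detectBlobs(void *arg)
{
    int camera = *static_cast<int *>(arg);
    printf("detecting blobs on camera %d\n", camera);
    return 0;
}

int main()
{
    pthread_t tid;
    int camera = 3;

    pthread_create(&tid, 0, detectBlobs, &camera);  // returns right away
    // main() keeps going here, in parallel with detectBlobs()
    pthread_join(tid, 0);                           // wait for it before exiting
    return 0;
}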

The two big benefits of threads are that spawning them is far less expensive than creating a new process (although if our threads or processes are long-lived, we would amortize this over time) and that they have access to any program data with the same privileges that a function in your program might. I.e., global variables, arguments, pointers resolving correctly, etc.

The downside is that debugging may be weird. These things are happening asynchronously so you cannot guarantee ordering of how things occur. You synchronize with mutexes or semaphores, which isn't too bad -- unless someone forgets to set or unset one, and then it is real tough to track down.

A real common programming model here is a producer-consumer model. You have some amount of producer or consumer threads filling up or depleting some buffers, and once they have done their work, a semaphore or mutex conveniently makes the one using the buffer block while the other thread uses it up and then it flip-flops the other way again.

Without thinking too hard, an example off the top of my head might be this: there are 6 buffers, one for each camera. Each buffer has an associated mutex. The blob detection code will want to fill the buffers and the blob stitching code will want to deplete them. Both of these modules will be written so that they do not touch the buffers unless they grab the mutex. Trying to grab it while the other guy has it will cause you to block. So the cameras start off owning the mutex. They detect blobs and fill the buffers. Some cameras might be faster to detect than others because they have more blobs to go through. In the meantime, right after they started, the stitching module began trying to acquire mutexes on all those buffers. He is going to block now until all the buffers have been filled and the detection threads released their mutexes. Then they block while trying to acquire the mutex, and he goes to work. When he is done, he releases his mutex and they go to work filling buffers.

* This is actually a really simple overview. After I wrote it, I realized there were several timing problems. Ask me about it later.
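
For the curious, the textbook way to patch up those timing problems is to pair each mutex with a condition variable, so each side waits for the state it actually needs. A one-buffer sketch (the real thing would have six buffers, one per camera; build with -lpthread):

// One-buffer producer/consumer: a mutex guards the buffer and a condition
// variable lets each side sleep until the buffer is in the state it needs.
#include <cstdio>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int  blobCount = 0;      // stand-in for the real blob buffer
static bool full = false;

static void *detector(void *)   // producer: fills the buffer
{
    for (int frame = 0; frame < 5; ++frame) {
        pthread_mutex_lock(&lock);
        while (full) pthread_cond_wait(&cond, &lock);   // wait until consumed
        blobCount = frame * 2;                          // pretend we found blobs
        full = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return 0;
}

static void *stitcher(void *)   // consumer: drains the buffer
{
    for (int frame = 0; frame < 5; ++frame) {
        pthread_mutex_lock(&lock);
        while (!full) pthread_cond_wait(&cond, &lock);  // wait until produced
        printf("stitching %d blobs\n", blobCount);
        full = false;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return 0;
}

int main()
{
    pthread_t p, c;
    pthread_create(&p, 0, detector, 0);
    pthread_create(&c, 0, stitcher, 0);
    pthread_join(p, 0);
    pthread_join(c, 0);
    return 0;
}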

Finally, to end this post, I want to briefly discuss the final architecture which Jose suggested. It was pretty insightful; I am not sure I would have thought of it: he suggested using both techniques. I think this is a good idea too. Jose suggested that the blob detection and blob stitching be threaded in the same program, and for gesture detection to run as a separate process and communicate via sockets.

I think this is a good idea for two reasons. 1, we get the performance edge of threads. 2, the gesture detection has a good modular divide from everyone else. The two components are now decoupled pretty well, so they could stand better alone also. Where to draw the line as far as moving things over the network is a little arbitrary, but consider this: our high level software block diagram has TUIO coming out of the blob stitching. This is a natural fit then: just have the blob stitching serve up TUIO (probably non-trivial!) and then ask the gesture daemon to connect. This makes it easier for existing apps to run on our system (since we have more of an excuse to implement some kind of TUIO server) and conversely for our gesture daemon to run on other people's systems.
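
The gesture daemon's half of that split is not much code either. A sketch of it connecting to the stitcher over a local socket and reading whatever gets served; the port number and the choice of TCP are placeholders, and real TUIO parsing is its own story:

// Gesture daemon sketch: connect to the blob stitcher on the same machine and
// read its output. Port 4444 and TCP are stand-ins for whatever we settle on.
#include <cstdio>
#include <cstring>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main()
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return 1;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(4444);                       // made-up port
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);   // same box for now

    if (connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) < 0)
        return 1;

    char buf[2048];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        printf("got %ld bytes of blob data\n", (long)n);  // parse + detect gestures here

    close(fd);
    return 0;
}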

tl;dr:

I think I might want to pull some CS115 moves and try to have another software meeting to work out a precise object model. I can bring my book from that class and some of my work from last quarter to show people what I'm talking about. I need to study up on it some first, though.

4.18.2009

Mac Mini 2

I got the Mini up and running mostly well enough. The primary lacking feature at the moment is wifi support; the Mini uses a Broadcom chip for wifi which is notoriously painful to use in Linux. It requires weird reverse-engineered drivers and things of that nature. I can research it and set that up later, though.

I repartitioned the hard drive and then reinstalled OSX. Once I had OSX installed, I spent a little time scrutinizing "Boot Camp". It seems like really what happens is that Leopard maybe has the BIOS emulation stub for EFI out of the box (or maybe it was left over from the prior install), so without doing anything, you can tell it to boot a normal PC operating system. The Boot Camp tool in the Utilities folder is just a hand-holding mechanism for repartitioning and rebooting your Mac.

Because of this, I resized the OSX partition in their normal partition tool instead. I made a 50GB "Msdos" partition. Then I installed rEFIt. rEFIt seems to replace the Apple-provided bootloader with something nicer (I don't really know what happens in EFI land here; it looks more like Apple's loader will load rEFIt if I let it). rEFIt gives me tools for synchronizing between the GPT partition map and the MBR partition map, as well as choosing between Linux and OSX.

Once I was satisfied that rEFIt was installed correctly, I proceeded to install Ubuntu. I went with 8.04 amd64, because that is what we will probably be running on the final table. This means a lot of programs are somewhat out of date, but it shouldn't really matter (knock on wood) too much for the code we will write later.

I had to build a 2.6.29 kernel on the Mini, which surprisingly took nearly an hour with 2 threads. I suspect this is because the Mini uses a 2.5" hard drive, so it was probably blocking on IO for a good chunk of the compilation (I should have noted user time vs system time).

There was some trouble getting the Nvidia driver installed. I wanted to go with an older driver because there are some goofy rendering problems in Compiz with PowerMizer in the newer ones. The problem can be resolved by turning off PowerMizer, which never seems to obey, or by avoiding Compiz. Or by running an old driver. It turns out that even slightly older drivers won't compile against newer kernels. This has always been a problem, but I burned a few hours on it because the error message wasn't quite indicating this.

Instead, it told me it could not find the kernel sources that I was directly pointing it to. I eventually tried the most recent driver on a lark, and it worked fine. Later I read that the structure of the kernel source tree has changed over time, and that is what caused the confusing message.

The Mini now boots into Linux by default and does an auto-login for the scimp user. I left a shortcut on the desktop that runs a script with simple mplayer arguments for showing the webcam. Just about everything should be in order for running our code, aside from installing whatever random libraries we need along the way.

Postscript: along the way, while messing around with the Mini, I noticed it would throw an odd resolution to the projector: 824x632.

4.14.2009

Display Correction

I have just about exhausted my options on this. These are the different methods I have thought of or tried:
  • Custom resolutions in video driver

  • Custom screen offsets

  • Dialing settings into projectors

  • Seeing if projectors have any useful DDC controls (they don't have any controls)

  • Running a custom Compiz
    • Running a custom Xgl

  • Trying RANDR 1.3, which just came out and has options for custom screen transforms

  • Running Chromium

  • Running DMX amongst multiple instances of Xnest

(There may be more that have come across my mind which I do not recall at the moment)

Some explanation:
I view a final solution as falling into 3 tiers: individual screen transformation, individual screen trimming, and doing nothing.

Screen transformation would require some combination of Compiz, XGL, DMX, and Xnest. Or RANDR. There are wrinkles to all of these, however. I want to give a description of the current X architecture and then discuss the theory. The architecture has mostly been in my head since we started, and the possible solutions piled up as time went on.

The X Window System (also called X or X11) is a network system. A server controls the video output, the keyboard, and the mouse. Any program that runs, like a terminal, is a client. The server listens over TCP/IP or over a Unix socket. Clients connect and tell the server what to draw and the server sends back user input. This is commonly referred to as network transparency: the EXECUTION of gui applications and the INTERFACE with them can occur on different machines. This may sound convoluted, but it can be useful at times.
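
To make that concrete, the client side of the conversation at its rawest is just Xlib talking to whatever $DISPLAY points at; build with -lX11:

// Smallest useful X client: connect to the server (TCP or Unix socket, per
// $DISPLAY), ask it to create and map a window, then read events it sends back.
#include <X11/Xlib.h>

int main()
{
    Display *dpy = XOpenDisplay(0);                  // connect to the X server
    if (!dpy) return 1;

    Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                     0, 0, 320, 240, 0, 0,
                                     WhitePixel(dpy, DefaultScreen(dpy)));
    XSelectInput(dpy, win, KeyPressMask | ExposureMask);
    XMapWindow(dpy, win);                            // "please draw this" request

    for (;;) {
        XEvent ev;
        XNextEvent(dpy, &ev);                        // user input comes back over the wire
        if (ev.type == KeyPress) break;
    }

    XCloseDisplay(dpy);
    return 0;
}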

A simple example of this network transparency is that VNC-like capabilities have been there since day 1. A more complex example might be Chromium: because the system is architected with the thought that apps may have a disjointed operation, Chromium is able to grab and reroute applications.

Apps speak a (relatively) simple protocol with the server, and fancier things like dialog boxes are implemented with higher level libraries which aren't strictly standard. Communication at this level is probably analogous to Quartz on OSX or GDI+ on Windows. One of the driving philosophies here also is that the system provides mechanism and not policy, which is why the GUI can be weird sometimes.

Part of the design of this weird system, which might seem to lack features, is that it is extensible. If you look at /var/log/Xorg.0.log on your machine, you can see a lot of the extensions that have been added to the most common X server on Linux, X.org. For what I have been working on, the extensions I care about are Xinerama, Composite, and RANDR.

This is the description of Composite from the X.org wiki: "This extension causes a entire sub-tree of the window hierarchy to be rendered to an off-screen buffer. Applications can then take the contents of that buffer and do whatever they like. The off-screen buffer can be automatically merged into the parent window or merged by external programs, called compositing managers. Compositing managers enable lots of fun effects.".

To digress a little: a window manager is a privileged X application that is charged with managing windows. A given X program can only draw what is inside its window; everything outside the window, like the border or the desktop, is implemented by the window manager. Consequently, window movement or minimization is handled by the window manager (it manages windows, after all!)

Window managers like Compiz rely on the Composite extension to perform distortions on the windows themselves, since the drawn output sits in an off-screen buffer rather than having been painted directly onto the screen. I don't know exactly how it is structured inside, but it pretty much uses Opengl to drive the entire screen and presumably imports drawn windows as textures onto 3D objects.

This provides a possible technique for screen transformation: since Compiz is using Opengl to draw everything, it should not be *too* complicated to modify the object(s) representing the entire screen to be slightly distorted, as needed.
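
The distortion itself would boil down to something like this: the composited desktop is already sitting in a texture, so instead of pasting it onto an axis-aligned quad you nudge the quad's corners by calibrated amounts. This is only the inner draw call, assuming a GL context and the screen texture already exist (and Compiz's internals certainly don't look exactly like this):

// Draw the composited screen texture onto a quad whose corners have been
// moved to counteract the projector's distortion (rough keystone correction).
#include <GL/gl.h>

void drawCorrectedScreen(GLuint screenTex, const float corners[4][2])
{
    // corners[] holds the calibrated corner positions in normalized device
    // coordinates; {-1,-1},{1,-1},{1,1},{-1,1} would mean "no correction".
    glBindTexture(GL_TEXTURE_2D, screenTex);
    glEnable(GL_TEXTURE_2D);

    glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex2f(corners[0][0], corners[0][1]);
    glTexCoord2f(1, 0); glVertex2f(corners[1][0], corners[1][1]);
    glTexCoord2f(1, 1); glVertex2f(corners[2][0], corners[2][1]);
    glTexCoord2f(0, 1); glVertex2f(corners[3][0], corners[3][1]);
    glEnd();
}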

Problem 1: Nvidia DISABLES the Composite extension in a multi-gpu setup like we have, so Compiz will NOT run. There are a couple of possible solutions to this, however:

1) Use XGL
When you configure a multi-gpu setup, you typically use the Xinerama extension. It is apparently straightforward to do what amounts to running an individual X server on each gpu. This doesn't achieve the desired effect, however, as applications are stuck on whichever server they connected to (actually, this description is pretty hand-wavy and this isn't what really happens, but this is how it looks from a user perspective.)

When you use Xinerama, the different gpus are united and this extension figures out how to split the different drawing commands and so on between the different heads. It even splits Opengl between the heads, with heavy assistance by Nvidia's driver I'm sure. Key point here: random hardcore fullscreen Opengl works fine in a Xinerama arrangement.

Enter Xgl: Xgl is a pure software X server which renders into an Opengl window rather than directly onto the video card. As such, you can turn on Xinerama so you get nice multi-gpu Opengl and then run Xgl which will place its own window so it perfectly covers all the screen area Xinerama tells it about.

Problem solved, right? I've gotten this running in lab, and it looks nice. You can imagine a performance hit, and there is one, but it isn't too bad unless you do serious 3D work, which the table technically won't do. This method possibly shifts the coding work into either Xgl or Compiz: one of them would need to be informed about the layout of the various heads and be told to draw onto N objects with various distortions rather than the one. This is possibly doable.

Problem 2: the code for Xgl is huge and intimidating. It may be a huge time sink understanding it and modifying it.

Problem 3: Xgl is deprecated! It isn't in any version of Ubuntu past 8.04, but maybe if we are lucky it can be coerced into running on 9.04. This of course only enables me to dump time into rewriting part of it as well; see Problem 2.

Possible Solution: Either stick with an older distro (no MPX support then) or try to get Xgl running on a newer one.

2) Use DMX
When experimenting with Chromium, which isn't a good fit for our project as it exists now, I found out about DMX. DMX stands for Distributed Multihead X. It is what Chromium relies on for a display wall. The concept of DMX is simple: again, it is a software-only X server like Xgl, but instead of drawing into its own window, it draws onto OTHER X servers. So a possible method is to ditch Xinerama, run 1 X server per gpu, and run DMX on top of all of them. Then run Compiz inside of DMX.

Problem 4: DMX segfaults on everything I've tried it with.

Possible Solution: run older versions of DMX, which apparently work (but I have not tried, sounds shady to me and apparently has rendering glitches.)

3) Use Xnest
Xnest is another pure-software X server. It simply draws into a normal window on your desktop. It could be possible to run one Xnest server for each projector, and then have DMX draw into them.

Problem 5: This would perform horribly (no hardware Opengl happens here and we kind of want to use an Opengl window manager) and DMX crashes all the time anyway.

Possible Solution: I could fix DMX (either by fixing the code or running an old version of DMX) and then if Compiz runs like a pig, I could try to modify Xnest for display transformations.

This post has almost exhausted the pure software solutions. There are two more: RANDR and writing my own weird custom display layer.

RANDR is an extension for controlling the resizing and rotation of screens. It is what we use when we set up the multihead on our lab computers. In the newest version of X, which came out a few weeks ago and comes with Ubuntu 9.04, RANDR has been updated to allow arbitrary transformations. I was excited when I found out about this, and I rushed to try it in lab. Sadness: the xrandr tool simply segfaults. I think perhaps it is buggy and/or needs driver support from Nvidia. If the latter, I doubt we can rely on Nvidia to give us something usable anytime soon. For one, just because it is really new. Also, Nvidia disables RANDR when doing multi-gpu setups as it is, so I am not optimistic it would even be usable in our situation anyway.

The other solution is writing my own custom display layer. This sounds very hefty, and to be at all useful, it would need to be on the order of Xnest or Xgl anyway. It is unclear whether I am good enough or can justify the time.

One glimmer of hope is that when I first brought up the projector wall, I had a minor misconfiguration left over from when I was tiling 4 of our widescreens before. It caused one of the projectors to have a horizontally trimmed picture. I have not been able to recreate this effect in lab, sadly. If I did figure it out, it would be a matter of adjusting the MetaModes or Xinerama layout specified in Xorg.conf.



Summary:
Doing the correction in software is tough. To do full distortion looks like it would require me to pretty much dig in and write arguably OS-level code. I don't feel comfortable facing that in terms of my programming skill or the time for this project. Trimming *may* be achievable, simply because I may have accidentally done it last week.

I want to resolve the possibility of trimming and work on the possibility of physically aligning the projectors "well enough". I've experimented with it a little bit, and it doesn't seem *too* bad. Not having perfectly adjustable surfaces for mounting the projectors is the biggest impediment in my tests. It may be all we can get away with, though.

4.13.2009

Mac Mini

The Mac Mini is kind of a nice deal because it is simple to buy and small for our table (but why didn't we just pick one up from the campus bookstore?)

Getting Linux up and running on it has been a minor hassle, however. Apple uses EFI instead of a traditional PC BIOS. This is why Boot Camp is used to run other operating systems.

EFI is a far more sophisticated pre-boot environment than the old PC BIOS. One of the features it has is the ability to run programs. Apparently, much of what Boot Camp does is install a program that provides legacy BIOS compatibility.

Linux actually is able to talk to EFI natively (unlike most versions of Windows), but because Boot Camp is so simple, most people just go with a normal Linux install. EFI, however, brings a couple of problems that I would not have to deal with at all were this a normal PC.

Problem one: the dual-boot interactions between Linux and OSX are unclear. Grub doesn't know anything about OSX, of course, and apparently other people install an EFI program named rEFIt to provide boot selection between OSX and Linux. We did not do this before installing Linux, though. I do not know how to boot back into OSX at this time.

Problem two: EFI specifies the use of GPT partition tables rather than the old MBR style. GPT, like EFI, is far more modern than the old MBR layout. One of the things GPT provides is compatibility for programs that expect the old MBR style. When I installed Linux, the installer relied on this to create Linux partitions. The problem is that apparently this legacy shim needs to be synchronized with the REAL GPT layout, so my partitioning changes are not valid to OSX at the moment.

The solution to that is to use OSX's Disk Utility to make my Linux partitions, or to tell rEFIt to synchronize GPT and MBR. At the moment though, we cannot seem to boot OSX at all.

A final wrinkle in the Mini so far is that Nvidia's Linux drivers do not seem to want to drive the output at more than 640x480. I have not figured out why this is being problematic. I am sure this is an innocent problem that has nothing to do with the computer being a Mac.

4.06.2009

Chromium-inspired blob detection

I thought I would write this as a separate post; it felt awkward to work into the one on Chromium.

On Saturday I was talking to Eddie about any implications Chromium would have with the image stitching. I didn't think it would have any direct effect on how to go about the stitching, but I had another idea.

It is actually not a given that stitching should come before blob detection.

You could either
1) Combine N camera images into one big image and detect blobs in that. In the final image, overlapping areas in the cameras have been discarded.
2) Detect blobs in each camera feed, then combine all of the blob data. Redundant blobs can be discarded.

Both ways sound relatively easy to me. I think which one to use is just a performance issue. In both cases, I'm not imagining "stitching" that is any more complex than dialing in areas to trim which we have experimentally found to be overlapping or uninteresting. A manual calibration, if you will.
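
A sketch of what option 2's "combine" step could be, once every blob has already been mapped into shared table coordinates; the distance threshold is made up:

// Merge blob lists from all cameras: a blob seen by a different camera within
// the threshold of one we already kept is treated as redundant and dropped.
#include <cmath>
#include <vector>

struct Blob { int cam; float x, y; };

std::vector<Blob> mergeBlobs(const std::vector<Blob> &all, float threshold = 10.0f)
{
    std::vector<Blob> merged;
    for (size_t i = 0; i < all.size(); ++i) {
        bool duplicate = false;
        for (size_t j = 0; j < merged.size(); ++j) {
            float dx = all[i].x - merged[j].x;
            float dy = all[i].y - merged[j].y;
            // same spot seen by a different camera -> redundant blob
            if (all[i].cam != merged[j].cam &&
                std::sqrt(dx * dx + dy * dy) < threshold) {
                duplicate = true;
                break;
            }
        }
        if (!duplicate) merged.push_back(all[i]);
    }
    return merged;
}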

Chromium

I've spent a good amount of my weekend reading about and playing with Chromium. I've still not totally "evaluated" it, but I think I've been looking at it enough to write something.

For one, all of the pictures of display walls in their documentation are very encouraging. They also have pictures of Quake 3 running. Essentially, what Chromium lets you do is modify the Opengl command stream by directing it through different servers. The servers may be running on the same machine or across the network. The servers can do all kinds of things like substitute the Opengl commands, log them, or pass them on to a GPU's driver. Zero or more network hops may be along the way. They use the word parallel a lot also, which probably gets people's blood pumping. Parallel in this context, I think, is to say they can be intelligent about how to divide the Opengl command stream so it can execute in parallel across multiple machines.

There are some gotchas though. Specifically, for our project, is it worth our time? I think for an *ideal* version of our table, something like Chromium could be very cool if we made little table modules that could be connected for an arbitrarily large table/wall.

In the real world though, it is unclear that it would be a home run for our project. For one, we need a PC for each head. We also need an application PC that programs will actually run on. This is how they intend you to use Chromium, and also how we want things to run. It would be very baroque if we broke with what everyone else is doing and required table apps to actually be a collection of smaller processes that would coordinate their display output, etc.

Second, in trying to run a network of PCs in our table, we'd need to install a network too. This would add a bit to our cost (we would probably want a gigabit switch), but more importantly it would add latency.

I've heard the idea that maybe (and this sounds like it could be a rumor) it would be a good idea to run the cameras on the nodes attached to the projectors. While it sounds elegant, it is also unnecessarily complicated. Because the table applications would (presumably) run on the app node, the individual camera information would need to be accumulated there anyway.

The worst part, though, is that Chromium may be too crusty to be reliable. The last release was in August 2006. I got their simplest demo setup compiled and running in a virtual machine though. All it does is redirect the rendering of an Opengl app through the Chromium pipeline into another window on the same desktop. I have not yet tried to make it go across the network to a 2nd vm (this specifically is hardly novel: remote X accomplishes it also.)

For the display wall setup, Chromium relies on something named Distributed Multihead X (DMX). DMX is very cool - I'm almost embarrassed I had NEVER heard of it before. It is a pretty elegant program: all it does is become another X server which proxies draw commands to other X servers over the network. A very simple idea... but that is actually all you need to do a display wall. If we just directly used DMX, I'm not sure where we'd need Chromium, except that perhaps it may be faster for Opengl -- but I don't really know, since I haven't gotten either running too well yet.

DMX segfaults in every configuration I've tried to run it in. This may be a symptom of it also being somewhat old (the newest article I found about it was from 2006 as well). I could bang on it some more tomorrow, *MAYBE* try to roll back to an older distro in a VM. I do not feel too hopeful, though.

The biggest drawback to these solutions, for me, is that I already have 4-head output running fine at my desk. I'm pretty confident that I could drop another card in there to get our 6 heads. If I try to go for Compiz on it, then my config gets a bit more arcane and actual Opengl performance (say, trying to play Q3 or WoW inside) will drop off, but it still works. And it is all one computer. And stuff going over PCI Express is probably faster than over the network. And it appears to be stable as well -- I was nervous about Nvidia drivers being flaky.

If I *DON'T* try to run Compiz, it is more reliable and performs a bit better. I don't really need Compiz unless I try to go for the display correction, which the solution above actually complicates a little.

As an aside, and I've mentioned this to Jas already, I'm pretty much at the point where going much further into the display work would involve me trying to understand a MASSIVE codebase that could absorb a lot of my time and which could simply be out of my league.

So past checking out Chromium/DMX and having this multihead box working, I would like to move on and just be sure we do a really good job mounting up our projectors in the final table.

In case anyone is interested though, these are the options I have for the display correction:
  • Customizing XGL (hardest)

  • Customizing Compiz (hard)

  • Asking Nvidia's driver to give me custom resolutions (not too bad)