Last Post

I felt like I should write one final post here, on the off chance someone reads through all of this.

In the prior post, I showed the working display correction using homographies. We had all stayed up all night because our checkoff with the professors was at noon. I got the homographies working satisfactorily around 10am that morning. The checkoff went OK for me at least -- the professors were somewhat dissatisfied with a couple of other parts of the project.

We thought we would polish everything up in the week between checkoffs and our final presentations. Instead, all momentum was lost. That was the first weekend I had taken off for the entire quarter. Prior to that, I came in almost every day and probably averaged 50 hours a week working in lab (and many more sitting there doing homework or reading or something else for myself). Leaving our lab and graduating on July 14th felt very strange to me. During the summer, I made a few return trips to gather my things from the lab that was not ours anymore.

I essentially lived in there during my final quarter, and while it was a fitting climax to my undergraduate career, it was a grueling experience. In retrospect, it was satisfying overall, I suppose. I wish our project had turned out more complete than it did. I regret that the display correction code occupied most of my time; there were several other parts of the project that I wanted to attack. I did not get around to implementing any of the "to do" ideas from my previous post. I resurrected my older mouse driver toward the end and almost had it completely working.

At the beginning of summer, I gave Tom (a grad student who inherited our work) a debriefing of everything. The first things he wanted to complete were the parts I had attacked or started: improve the display correction, implement a mouse driver, improve the networking.

After that debriefing, I ended up almost blocking out the project. I almost cringe when I think about large touchscreens now, only because I dumped so much time into that project. At the Penny Arcade Expo this summer, I saw a Microsoft Surface. Amusingly, it looked very familiar: people had pulled up a diagnostic program that showed the raw camera feeds inside the table.

Working on Scimp, and Compiz in particular, made me feel empowered to work on open source projects in my free time. During the summer though, I did not act on that impulse very much. Maybe I was too drained from the quarter, maybe my Florida vacation in July broke me, or maybe I am just fundamentally too lazy (I have been meaning to write this post for a while now..)

I spent most of my summer lounging around in Santa Cruz and trying to find a job. At the end of summer, shortly before I had to move out of my current place, I found a job as a temp worker at Google. On my resume, I had mentioned that we won first place in the senior design contest. The interview for this particular job (I got turned down late in the process interviewing for another Google position) was the first time the project came up in any detail. They seemed impressed.

I'm still in Santa Cruz now, commuting to Mountain View to work at Google for the next 9 months. After that, who knows. I would like to make a try at graduate school I think, but I am not sure how things will play out. I am still trying to get used to a normal 9-5 schedule which also requires me to get up early. I think I want to maintain the university philosophy and lifestyle in myself. I am fortunate to be working at Google, even if it is temporary: they maintain a very university-like environment, the people around me are brilliant, and I get exposure to working in a leading technology company.


Image Correction

Jas's latest homography is pretty good. It takes us from this:

To this:

I still need to implement mouse remapping support and make it read the homography out of a file. Currently, I have to recompile it to change the homography (I only got it working two hours before checkoff). It is possible for me to pull Jas's entire OpenCV homography generation code into the plugin's startup code as well. These are all ideas for the coming week.



I've spent all of my lab time since discovering the bug mentioned in my last post trying to resolve it. I've tried all of the interesting Nvidia xorg.conf settings, all of the interesting settings from their config tool, several Xgl settings, and turning on Twinview. I haven't tried *permutations* of all these yet (ick), but I don't think it would help since I have observed no difference in Xgl's behavior. I am good at reproducing the bug, however.

Of course spending the last 30 hours troubleshooting it is time away from the plugin. I *need* to be able to run Compiz across all the video outputs for display correction to work as I've written it so far. Anything else would be a bit of an architecture shift.

Therefore, I'm suggesting we perhaps return the 9800 GTs I picked up and order a couple of these to run with one of Renau's cards.

It is a box that will split a single VGA or DVI input into 3 DVI outputs. It does whatever hardware magic needs to happen to simply present one high-resolution screen to the PC. With two of them, we could drive 6 projectors from one videocard using one framebuffer, which avoids the problem I'm having now.

The downside is the price: $600. On the other hand, we are doing OK on our budget right? And I did drop $120 x 3 on videocards this weekend so it isn't too extreme. Also, the whole setup tops out at 6 projectors I think.. unless we can daisy chain them..

The upside is that display correction will work, and it liberates our software stack a lot. Using one card, there is no longer a dependency on Xinerama, and therefore on Xgl, and therefore on an older X server. Not using Xgl yields better performance, as does not using Xinerama -- graphics data does not have to be copied between cards. Not having to run an older X server lets us run newer distros and, even better, MPX.

Obviously it is a bit late now to realize all of those improvements. Just getting display correction working will be enough for now.

I remember talking about the triplehead2go boxes near the beginning of the quarter, and we balked at them for being pricey. I think what really happened, though, is that I became focused on a certain approach because of the cards Renau gave us initially, and I did not realize, when I stopped using those cards, that the design could change in this way. There is probably some IDEO design pattern I should have been remembering in order to avoid that trap.



I spent the entire night trying to debug a showstopper bug I came across.

I put all those videocards into our computer and noticed that windows would smear or draw incorrectly when moved across GPUs. I don't remember noticing it before when I had set up the GX2s back in the corner, but I can't recall having observed it being *correct* in those cases either. Having used it more while debugging, though, it is super annoying, so I think I would have noticed.

I posted about it on Nvidia's forums, and linked them a video (had to host on my own because Google doesn't want me to put files over 11mb in the locker). I hope someone has a solution. If not, then the display side is kind of screwed.

It seems possible that it could be a bug in Xgl, but again, I am surprised I did not notice sooner. When I run a window manager on the display Xgl draws into, I can sometimes manipulate the glitch by causing Xgl to redraw myself. That would kind of imply it is a problem with Xgl and/or (maybe) Damage, but the fact that it primarily occurs across GPUs points back to Nvidia.

It figures something like this would take me down a notch right after I get a proof of concept going for the display correction.



I picked up three of these 9800 GTs at Fry's on my way back from Maker Faire for $120 (after tax) each. Notice the connector on the right? It is component output, but I wonder if that can be converted back to composite? If so, we could run three projectors from one card..

Also, they come with molex to 6-pin power converters, so now *ONLY* the power supply wattage ought to be an issue.

I was choosing between these and some 1024MB Geforce 9500 GTs which were about $75 before taxes. The 9800 has way more processing power, but I wasn't sure if the video memory would be a performance issue with a 3D desktop.

There used to be issues with video memory on Nvidia cards and a compositing window manager like Compiz, because each window needed to be backed by video memory. I just checked though, and it looks like it may have been fixed in this driver release, which is a bit older than even the old driver I'm installing on the final table.

Moment of truth

Compiz Plugin

My plugin may be doing the distortion now. I left last night when it seemed to be drawing a proper unmodified screen.

I need to reread my graphics book a little bit about display transforms and make up a simple homography to try. The homographies the plugin uses are just Opengl transformation matrices, nothing particularly special. I'm crossing my fingers and hoping most everything is in order now.
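For the record, applying one of these homographies to a point is simple; this is a general sketch (not the plugin's actual code, and the struct name is my own), with the 3x3 matrix stored row-major and applied in homogeneous coordinates:

```cpp
#include <array>

// A 3x3 homography: (x, y, 1) -> (x', y', w'), then divide through by w'.
struct Homography {
    std::array<double, 9> h;   // row-major

    void apply(double x, double y, double &ox, double &oy) const {
        double xp = h[0]*x + h[1]*y + h[2];
        double yp = h[3]*x + h[4]*y + h[5];
        double w  = h[6]*x + h[7]*y + h[8];
        ox = xp / w;   // perspective divide
        oy = yp / w;
    }
};
```

An identity matrix leaves points untouched; for OpenGL the same nine numbers just get padded out into a 4x4 transformation matrix.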


Multithreading blobd's networking

I partly multithreaded blobd's netcode tonight, which will give proper behavior for multiple clients. It sort of seems to run ok on my computer, in that it seems to kind of print interesting things on stdout when I wave my hand in front of the camera, but I have no idea what appropriate output should be. This is what it looks like when I wave my hand in front:
0x7fff5cdb9fb0BlobList size is: 33
0x7fff5cdb9fb0BlobList size is: 2
0x7fff5cdb9fb0BlobList size is: 1
0x7fff5cdb9fb0BlobList size is: 10
0x7fff5cdb9fb0BlobList size is: 0
BlobList size is: 1
0x7fff5cdb9fb0BlobList size is: 0
BlobList size is: 27
0x7fff5cdb9fb0BlobList size is: 0
BlobList size is: 3
0x7fff5cdb9fb0BlobList size is: 20
0x7fff5cdb9fb0BlobList size is: 2
0x7fff5cdb9fb0BlobList size is: 14
0x7fff5cdb9fb0BlobList size is: 0

I doubt that memory address should be there -- but does it just list blob counts now, not coordinates? If so, then perhaps mine is behaving correctly. I haven't committed it because I don't want to mess people up, and also because it is a huge hack.

Two big problems remain. When a client goes away, there is no thread managing the camera video anymore -- the server just hangs waiting for clients. The other is that if two clients connect, both of them grab frames, which is probably not what we want.

I need to rework it, I suppose, so there is one thread handling the video (and Kevin will expand that later) and N threads feeding the data out on the network.

I think I would like to rework the network API also. So if/when I get into this, it'll be a good segue into that task as well.



I got trimming to work on my laptop with Compiz:

Need to see if Xgl won't ruin it though..

.. and after some testing, it doesn't. There are some other caveats though. The regions I am blowing away still represent logical pixels, so if I remove them from the middle of the screen, I leave a giant hole.

On the other hand, I figured out how to draw DUPLICATE screen areas. So we could maybe draw duplicated images in the overlapped areas, which would look coherent (assuming rotation is good..) albeit being brighter as well.

And actually, this is WITHOUT a plugin. I found it after talking to another Compiz developer.


Compiz plugins

I've been spending all day digging further into the Compiz code. Cscope in Vim has been pretty helpful as far as enabling me to jump around and look at things quickly. I don't think it has fundamentally improved my process, only made me faster. I've basically been looking through the code that does the cube rotation to try and see where it sets up the cube geometry and texture maps the composited desktop.

I feel like I'm getting close, but there is a weird lack of OpenGL in spots where I would have expected it to be. Obviously, I need to read more..

Tonight I was talking on IRC to the guy who wrote MPX. I've been talking to him periodically since maybe week 7 or 8 last quarter. He suggested I re-email the Austrians who did Compiz work similar to mine and drop his name. I also noticed that I had emailed the professor, who probably just deletes all his email every morning anyway.

So I emailed the students who worked on the project instead, this time. I hope they come back with something interesting. I'm a little nervous because the whole project depends on me now since we cannot resolve the projector overlaps mechanically.


More Netcode

It is a good thing I threaded the netcode earlier. We probably need to use it soon.

I debugged a problem in blobd today. The apparent problem was that when two clients connected, the server would crash upon one of them disconnecting. First, I will discuss the networking, and then I will address the nature of the crash.

On networking:
I was surprised that two clients could connect at all. If this were a TCP connection, that would not work. I suppose that in this case, the semantics of Unix sockets allowed two clients to share the same socket, since neither had a named client socket. I am guessing it is even more accidental since the clients do not talk to the server at all right now.

The reason I did not expect it to accept multiple clients at first is that a connection involves two sockets: the server socket and the client socket. The server binds an address to a socket and then waits for clients to connect on it. When someone connects, the server gets a second socket file descriptor which is associated with that client: if the server reads or writes to it, the operating system makes that interact with the client who connected.

So the model is that the server waits, a client connects, and the server uses that newly forged socket to talk to the other program. Normally, the wrinkle here is that at this point the server is no longer waiting -- so a new connection would be denied (or queued by the OS for the next time the server gets around to waiting.)

It appears that with Unix sockets -- maybe because, the way it is written, the clients all have the same "address" right now -- if a second client tries to connect after the server has taken the first, the OS just hooks it up anyway. The socket file is only an address in the namespace of the filesystem, really just an inode. It identifies a communications channel between programs which the operating system maintains, and it isn't completely unintuitive that this behavior may happen; UDP behaves in a similar way. I think both of these go away if the server chooses to inspect its peers' addresses. Anyway, that behavior is a pleasant surprise, I guess -- although I don't want to keep it.

The better way to handle it, which my test code does, is spawn a new thread for servicing a client when someone connects. The main thread then returns to waiting for new clients. It isn't too bad in the way of restructuring code, and has two big payoffs: you can spread processing load better and you can more easily send different data to different clients. We will want to be able to send different data if we ever implement a window manager plugin which can inform the blob server about the window geometries of clients and thus which blobs they ought to receive.
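A sketch of that model, with names of my own invention (serve_client stands in for the real blob-sending loop, and I'm using std::thread here rather than whatever threading API we'd actually pick):

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstring>
#include <thread>

// Placeholder for the per-client service loop: in real code this would
// keep sending blobLists until the client goes away.
void serve_client(int fd) {
    const char msg[] = "BlobList size is: 0\n";
    write(fd, msg, sizeof msg - 1);
    close(fd);
}

// The main thread only accepts; each connected client gets its own
// detached service thread, and we immediately go back to accept().
int run_server(const char *path, int max_clients) {
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    unlink(path);                               // remove any stale socket file
    if (bind(srv, (sockaddr *)&addr, sizeof addr) < 0) return -1;
    listen(srv, 8);
    for (int i = 0; i < max_clients; ++i) {     // forever, in real code
        int cli = accept(srv, nullptr, nullptr);
        if (cli < 0) break;
        std::thread(serve_client, cli).detach();
    }
    close(srv);
    unlink(path);
    return 0;
}
```

The point is just that accept() and the per-client I/O live in different threads, so a slow or misbehaving client never blocks new connections.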

The downside is that we have to have more consideration of data manipulation. There are now N threads reading the blobLists to send over the network and 1 thread writing to that list. Mutexes or semaphores would need to be implemented to synchronize those accesses so sane results come out. I stopped short of implementing semaphores in my test code because I would also need to implement some dynamic data generation too (it only sends the same blobList over and over).
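The 1-writer/N-reader arrangement could look roughly like this (a sketch only; Blob is a stand-in for the real class, and std::mutex stands in for whatever mutex/semaphore API we'd use):

```cpp
#include <mutex>
#include <vector>

struct Blob { float x, y; };   // stand-in for the real blob type

// One camera thread writes the current blobList; N network threads copy
// it out. A single mutex keeps readers from seeing a half-updated list.
class SharedBlobList {
    std::vector<Blob> blobs_;
    mutable std::mutex m_;
public:
    void update(const std::vector<Blob> &latest) {   // camera thread
        std::lock_guard<std::mutex> lock(m_);
        blobs_ = latest;
    }
    std::vector<Blob> snapshot() const {             // sender threads
        std::lock_guard<std::mutex> lock(m_);
        return blobs_;   // copy under the lock, do the slow send() after
    }
};
```

Copying under the lock and sending outside it keeps network latency from stalling the camera thread.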

On the crash:
With some pointers from Zach, I figured out the problem behind the crashing. My test server happens not to crash, as Zach and Davide noticed, because only the service threads terminate and not the main program. It may also be because they used my test client, which is friendlier and does not quite tickle this issue.

What happens is that a client goes away. The server doesn't know when a client is going to go away. So when a client goes away, the server is almost always in the middle of sending blob data over the socket. The server is probably inside, or will enter, blobList::send() which will write() to the socket. Because the client quit, his socket was implicitly closed (I doubt there was an explicit close() call!) Writing to a closed socket/pipe generates a SIGPIPE signal, which has the default action of causing the application to terminate. Suppressing the signal (right now I wrote a handler that tells us the signal happened and then returns) allows us to instead pick up the return code of the write() calls and inspect errno.
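The failure mode is easy to reproduce in isolation. This standalone sketch (not blobd code) ignores SIGPIPE so that the write() to a dead peer reports EPIPE instead of killing the process:

```cpp
#include <csignal>
#include <cerrno>
#include <unistd.h>

// Write to a pipe whose read end is gone. With SIGPIPE at its default
// disposition the process would terminate; with it ignored, write()
// returns -1 with errno == EPIPE and the caller can recover normally.
int write_after_peer_gone() {
    signal(SIGPIPE, SIG_IGN);    // or install a no-op handler
    int fds[2];
    if (pipe(fds) < 0) return -1;
    close(fds[0]);               // simulate the client quitting
    ssize_t n = write(fds[1], "x", 1);
    int err = (n < 0) ? errno : 0;
    close(fds[1]);
    return err;                  // EPIPE expected
}
```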

I think that is a better behavior for now, because in any case we want to recover from the error in the application logic and not in a signal handler. So now, if someone disconnects, the server says something nice about it and then exits. A slight improvement (maybe I will write this now) will be to make it go back up to the accept() call -- essentially wiping the slate and waiting for a whole new client. That would still mess up the current multi-client situation though.

I think the thing to do now is to multithread the client handling code and to also consider adding some synchronization messages like "I am ready for blobs", "I am leaving now" to the network protocol.


Threaded Netcode

Today I threaded skt_server so it can service multiple clients at once. I've tested it to verify that it can service multiple people at once, and (mostly) clean up after itself. I need to think harder about how to make it service people with changing data streams as time goes on. That is one of the more complicated concurrency problems.

I wrote it today because I think we may soon approach where we will want to service multiple clients at once, and it is either this or forking processes. The netcode has actually been upgraded a bit since it was put into blobd. It is a little bit simplified (could stand to be more simplified probably) and can be easily told to run over unix sockets or TCP.
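The "easily told to run over unix sockets or TCP" part is basically one helper that branches on transport; a sketch with made-up names, not the actual skt_server code:

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <cstring>
#include <unistd.h>

// One entry point for both transports: pass a filesystem path for a
// Unix socket, or nullptr plus a port for TCP on loopback.
// Returns a listening fd, or -1 on failure.
int make_listener(const char *unix_path, int tcp_port) {
    if (unix_path) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        sockaddr_un a{};
        a.sun_family = AF_UNIX;
        std::strncpy(a.sun_path, unix_path, sizeof a.sun_path - 1);
        unlink(unix_path);   // clear any stale socket file
        if (bind(fd, (sockaddr *)&a, sizeof a) < 0 || listen(fd, 8) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in a{};
    a.sin_family = AF_INET;
    a.sin_port = htons(tcp_port);
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (sockaddr *)&a, sizeof a) < 0 || listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Everything after accept() is identical for the two transports, which is what makes the switch cheap.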

I am wondering if soon we might start encountering cases where the protocol is inadequate as well -- Davide's program is encouraging since it appears to maintain an arbitrarily long blob subscription from the server.

It occurred to me that it may behoove me, depending on where I get with Compiz, to work on a proxy from our protocol to Tuio. Tom is programming against Tuio I assume, and it will be a little rough for him if we don't deliver that. I was kind of hoping that our mouse driver, if we made one, would consume Tuio as well -- although in the short term, it is probably simpler to implement it with our current protocol.

On Compiz

The Compiz codebase is pretty intimidating. Last night I had the idea that I should begin documenting their code as I unravel it while trying to figure out what I need to do with it.

I don't think it is entirely a sure thing that I can pull off all the screen correction effects we want. I reread the paper by the people in Austria who did something similar; for two paragraphs they describe their Compiz code at a high level. The description is helpful, but it may be a little beyond me -- if not because of the concepts, then because of the (to me) large codebase.

One thing I am especially worried about is pixel loss. It seems like if I straight away "trim" pixels, that they will get lost. Writing this sentence just now, I think I may have had an idea how that might work out though: Opengl texturing.

In the paper, it sounds like they create some number of polygons to represent their corrective surface and then texture-map the composited desktop onto that. I don't know the ins and outs of texturing, but maybe some of the Opengl options will let that texturing algorithm span gaps in the polygons without losing the pixels which would normally fall in them. I do not really know, though. I'm still just unraveling the Compiz framework to let me get at that functionality, let alone brushing off my Opengl knowledge and taking a crack at the more advanced parts.

Another depressing part is that Compiz puts a pretty archaic set of constraints on the software versions in our system. To span a display across multiple GPUs, we need something called the Xinerama extension. To use Compiz, we need something called the Composite extension. Unfortunately, right now, they are not compatible. Maybe if we were a year in the future, they would work together in the "current" X server.

To get around that incompatibility, we use a third program: Xgl. Xgl essentially sits on top of Xinerama and beneath Composite so they work together. Unfortunately, XGL is deprecated so we can't easily use it past Ubuntu 8.04 (specifically, this is to say X.org server 1.4 I think.)

Ok, that is doable, I suppose. But we are already installing boost libraries outside of apt for Davide, and this precludes us from using MPX as well: release-quality MPX only exists in the 1.6 X.org server.

I think XGL is the worst part of the setup because it kind of holds up everything else. But if we don't use XGL, we can't do the display correction across multiple GPUs because Compiz won't run because the Composite extension would be absent.

I think doing the display correction outside of Compiz would be an order of magnitude more difficult, because Compiz exposes this functionality and sits at the right spot in the graphics pipeline. To do the work in another spot would possibly mean rearchitecting many other components -- and I'm barely capable of doing it in Compiz anyway.

On Configuration Files

I meant to write this almost a week ago when people were more active in discussing the format of configuration files.

For one, I think that ideally we would drop our main configuration in /etc/blobd.conf by default, but have the path configurable with an optional command line argument. This is in keeping with Unix conventions. That is only the config file for blobd, of course. Any client apps ought to have their own config files, modulo default choices for the blobd socket path (/var/run/scimp.sock probably) and tcp/udp port numbers (I picked 42000).
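In code, that convention is just a default that an argument overrides. A sketch -- the path /etc/blobd.conf is from above, but the -c flag spelling is my own assumption:

```cpp
#include <string>

// Return the config file path: a hypothetical "-c <path>" argument
// wins, otherwise fall back to the conventional /etc location.
std::string config_path(int argc, char **argv) {
    for (int i = 1; i + 1 < argc; ++i)
        if (std::string(argv[i]) == "-c")
            return argv[i + 1];
    return "/etc/blobd.conf";
}
```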

It sounds like we've already picked our file format, which is kind of the one I preferred. I'd like to reiterate my thoughts on XML though.

In my opinion, XML is rather heavyweight. We would need to pull in a library to parse it because it would be too much work to do ourselves, and then we would have to learn that library. Typically, the traversal functions in such libraries resemble tree traversal functions. That may or may not feel like overkill when we may not have too many configuration options in the first place.

Plus, XML is very over-engineered for what we need. I get the impression that it is designed for generic data interchange, not just config files. So it has a mini-language for transforming our particular XML schema to another XML schema and validating files against our schema. These features are useful, but we would not use them. We would be living with the complexity of the system built to support those features, though.

I consider the human readability to be a tiny bit of a myth -- just a little. Yes, the files are ascii, but if you are doing all the validation above and you have a very complex schema, then it is NOT fun at all to edit by hand, in my opinion. I hate it whenever I have to do this for other tools. As opposed to a simpler key=value paradigm..

Key=value is what I was imagining for our format. Even if the above sounds negative, I'm overall neutral on the format. I would just ask people to consider the time to get up and running with XML vs the benefits. Seems like it kind of already happened.
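For comparison, a whole key=value parser fits in a couple dozen lines. This is a sketch of the idea (not what Kevin wrote): one pair per line, '#' starts a comment, whitespace trimmed:

```cpp
#include <map>
#include <sstream>
#include <string>

// Minimal key=value config parser: one "key=value" pair per line,
// '#' starts a comment, blank/malformed lines are ignored.
std::map<std::string, std::string> parse_config(const std::string &text) {
    std::map<std::string, std::string> out;
    std::istringstream in(text);
    std::string line;
    auto trim = [](std::string s) {
        size_t b = s.find_first_not_of(" \t");
        size_t e = s.find_last_not_of(" \t");
        return b == std::string::npos ? std::string() : s.substr(b, e - b + 1);
    };
    while (std::getline(in, line)) {
        line = line.substr(0, line.find('#'));   // strip comments
        size_t eq = line.find('=');
        if (eq == std::string::npos) continue;   // not a pair
        std::string key = trim(line.substr(0, eq));
        if (!key.empty())
            out[key] = trim(line.substr(eq + 1));
    }
    return out;
}
```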

Oh, and one more thing. In my compiler class, we learned (and I subsequently kind of forgot) how to use some nice tools for generating scanners and parsers. By now, Kevin has pretty much written the parser already. With the tools we used in CS104, I can just write a couple of files specifying the special characters of our "language" and its syntax, and the tools will generate C code to handle the actual parsing. There is a lot of CS theory involving state machines behind writing good parsers, and these tools (yacc/bison specifically) handle it nicely. Jas' C++ book has the Yacc grammar for C++. Here is the one for C: http://www.quut.com/c/ANSI-C-grammar-y.html

I think it might be fun to go back and do it for our config file. But if Kevin already has something that works, then I would just move on until it became an issue. I'm sure there are innocent bugs in his, whereas a machine-generated parser is much less likely to be buggy.

Possible Calibration Programming Interface

Note: I am writing several posts tonight. I've had things to say saved up, but never quite sat down to write them.

On Sunday night, Davide and I implemented 1-camera calibration manually inside his keyboard app. It was not too hard to work out the math, and it took even less time to empirically discover our calibration parameters.

Earlier, when I took my shower, I was thinking about how the calibration class might look software-wise. It may have been because Davide and I discussed calibration a little the night before, or because I had been thinking about Opengl for Compiz.

My thought is that it may be neat if the Calibrator/Stitcher/Whatever class wrapped or extended the blobList class. They could then expose an interface where we feed them some kind of transform. Then when we request blobs from them using the same interfaces we do now, the blobs pop out with the transformation applied. This is not very different from how Opengl behaves or how some of Compiz behaves.

Implementing a hardcoded calibration in Davide's app validated this for me a little bit, because we are pretty much manually transforming all the blobs in a single spot in his code. Of course, the question is really whether it is useful to have arbitrary transforms like that. If there is only ever going to be one transform (the one to stitch/fix the cameras), then it can be hidden altogether inside something that wraps or extends blobList.
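A sketch of what I mean -- Blob, the affine transform, and the class name are all stand-ins for the real blobList machinery:

```cpp
#include <vector>

struct Blob { float x, y; };   // stand-in for the real blob type

// A 2x3 affine transform: x' = a*x + b*y + tx, y' = c*x + d*y + ty.
struct Transform {
    float a, b, tx, c, d, ty;
    Blob apply(const Blob &p) const {
        return { a*p.x + b*p.y + tx, c*p.x + d*p.y + ty };
    }
};

// Wraps the blob storage and owns a transform; callers use the same
// getter interface as before, but blobs come out transformed.
class CalibratedBlobList {
    std::vector<Blob> raw_;    // would wrap/extend the real blobList
    Transform t_;
public:
    explicit CalibratedBlobList(Transform t) : t_(t) {}
    void push(const Blob &b) { raw_.push_back(b); }
    std::vector<Blob> blobs() const {
        std::vector<Blob> out;
        for (const Blob &b : raw_)
            out.push_back(t_.apply(b));
        return out;
    }
};
```

With this shape, swapping the hardcoded calibration for a per-camera stitching transform is just a matter of feeding in a different Transform.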



More to come.


On Networking

I really want to go to bed, but I felt I should write about this now since I probably will not be awake for a while.

I don't like the idea of using RTP or SIP. My gut as a network guy tells me it is a strange pick of a protocol. Honestly I don't know enough about the protocols to argue it down technically, although I have the RTP RFC open on my workstation right now. I'm not sure if I want to invest time researching the protocols very far.

Instead, let me point out a few things.

First, RTP and SIP are typically associated with VoIP apps. Apparently some programs use RTP for video as well, but the dominant use is VoIP, I am pretty sure. I think it is maybe a little goofy to classify our touch data as multimedia. That could be a small discussion on its own, though.

What are our goals for choosing a protocol? Is it ultimate correctness and scalability? Or are we trying to just build something that works? We wouldn't even need a protocol before if we didn't choose to split the blob and gesture programs (I still think that is a good design choice).

If we wanted ultimate correctness, I think we would probably use a Tuio variant (which is apparently out of the picture) or think long and hard about a good protocol design ourselves (which we do not have time for.)

Since I think we are going for a standard of just building something "good enough", then we can put far simpler things on the wire. So why aren't we just doing the stupidest, rawest, simplest stuff over our socket? Choosing SIP or RTP doesn't totally save us from protocol design: we still have to devise a system of messages we would like to pass in the RTP stream -- some kind of protocol for messages, you might say. We just would not have to worry about error detection -- instead we get to learn some huge non-standard library.

If we are aiming for good enough, then we should just use TCP. Over loopback, there should be few or no dropped packets. Most of the latency from TCP comes from managing reliability on an imperfect line, but when we talk to ourselves the line WILL be perfect. The only latency then is an extra few dozen bytes of memory copies, which hopefully is not that big of a deal. If we really, really cared, we could benchmark it ourselves and choose.

In contrast to RTP, TCP is definitely simple to program against, and the same programming interface will exist on any system we could imagine. Ditto for UDP. For UDP, though, we would probably want some degree of error detection. Simply computing some kind of checksum of the data sent/received before we do something important with it would be enough.
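Something as simple as this would do -- a sketch, not cryptographic, just corruption detection; the sender appends the value to the payload and the receiver recomputes and compares before trusting the data:

```cpp
#include <cstdint>
#include <cstddef>

// Simple rolling checksum over a payload. Any flipped byte changes the
// result, which is all we need to reject a corrupted UDP datagram.
uint32_t checksum(const unsigned char *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum = sum * 31 + data[i];
    return sum;
}
```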

At the very least, if/when we discuss this subject, we should do it in light of updated assumptions: how forward-looking should the design be, how important is latency, do we care very much about this iteration having good performance BETWEEN different machines or not, etc.


Smoke Demo

I built touchlib and got the smoke demo running on our prototype today, as well as a simple touch app that prints simple touch events to stdout (finger down, finger up).

The way touchlib seems to work is definitely as a library. Touch applications import touchlib and happen to read the same configuration file for background subtraction, calibration settings, and so forth, but drive the camera and do image processing themselves (hence no concept of multitasking touch apps).

I personally don't like that design choice, so I am glad that we are writing our own instead of adapting touchlib. It seems like our gesture system will work that way though, which perhaps makes more sense since we have decided gestures are more application specific.

I had to apply this patch to make touchlib not segfault:
--- ./src/CMakeLists.txt.orig	2008-04-22 19:47:46.000000000 +0000
+++ ./src/CMakeLists.txt	2008-04-22 19:47:56.000000000 +0000
@@ -9,3 +9,9 @@
--- ./src/configapp.cpp.orig	2008-04-22 19:43:36.000000000 +0000
+++ ./src/configapp.cpp	2008-04-22 19:47:56.000000000 +0000
@@ -835,6 +835,7 @@
 	if( keypressed == 13 || keypressed == 10) // enter = calibrate position
 	{
+		screen->unlockHighGUIMutex();
 		startGLApp(argc, argv);
--- ./src/CTouchScreen.cpp.orig	2008-04-22 19:46:11.000000000 +0000
+++ ./src/CTouchScreen.cpp	2008-04-22 19:55:09.000000000 +0000
@@ -19,6 +19,27 @@
 using namespace touchlib;
 
+#ifdef linux
+/* A mutex lock to prevent multiple threads to access the HighGUI
+ * functions in the same time. For OpenCV <= 1.0.0 in Linux platform,
+ * the cvWaitKey and cvShowImage function use different set of mutex lock,
+ * but in fact both of the functions would access the GDK critical
+ * session. Before the bug was being resolved in OpenCV, use this
+ * mutex lock to fix the problem.
+ * */
+#define highgui_mutex_init() { if (!g_thread_supported ()) { \
+                                   g_thread_init (NULL); \
+                                   gdk_threads_init(); } \
+                             }
+#define highgui_mutex_lock() gdk_threads_enter()
+#define highgui_mutex_unlock() gdk_threads_leave()
+#else
+#define highgui_mutex_init()
+#define highgui_mutex_lock()
+#define highgui_mutex_unlock()
+#endif
 
 #ifdef WIN32
 HANDLE CTouchScreen::hThread = 0;
 HANDLE CTouchScreen::eventListMutex = 0;
@@ -31,6 +52,7 @@
+	highgui_mutex_init();
 	frame = 0;
 #ifdef WIN32
@@ -230,7 +252,9 @@
 	if(filterChain.size() == 0)
 		return false;
 	//printf("Process chain\n");
+	highgui_mutex_lock();
+	highgui_mutex_unlock();
 	IplImage *output = filterChain.back()->getOutput();
 	if(output != NULL) {
@@ -803,6 +827,14 @@
+void CTouchScreen::lockHighGUIMutex(){
+	highgui_mutex_lock();
+}
+
+void CTouchScreen::unlockHighGUIMutex(){
+	highgui_mutex_unlock();
+}
 
 // Code graveyard:
 
 // Transforms a camera space coordinate into a screen space coord
--- ./include/CTouchScreen.h.orig	2008-04-22 19:43:36.000000000 +0000
+++ ./include/CTouchScreen.h	2008-04-22 19:47:56.000000000 +0000
@@ -122,6 +122,8 @@
 	// returns -1 if none found..
 	int findTriangleWithin(vector2df pt);
+	void lockHighGUIMutex();
+	void unlockHighGUIMutex();
--- ./include/ITouchScreen.h.orig	2008-04-22 19:43:36.000000000 +0000
+++ ./include/ITouchScreen.h	2008-04-22 19:56:39.000000000 +0000
@@ -83,6 +83,8 @@
 	// start the processing and video capturing
 	virtual void beginTracking() = 0;
+	virtual void lockHighGUIMutex() = 0;
+	virtual void unlockHighGUIMutex() = 0;
 
 	* Sets the blob tracker.
The patch fixes a race condition in the way that OpenCV's HighGUI library interacts with GTK, one of the native X toolkits. I ported the patch to the latest version of touchlib in their CVS, r400. It ran without segfaulting, but it would stall out and hang very often (and come back as well). I assume that something else in their code was broken too, so I went back to revision 393, which the patch was made for, and it worked like a charm.

Sadly, Smoke does not seem as cool in person as in videos. My touches still seem pretty bad, me being me and all. Hopefully it is only a matter of me tweaking the touchlib calibration settings, since syncing LED pulsing against all the webcams sounds pretty tricky.


Software Meeting

It is 4am and I just came home from lab.

I'm ready to blog.

The software meeting today was semi-useful to me. The agenda was essentially video & camera calibration, and threading/IPC.

I will expound on the video calibration/correction in detail in another post tomorrow, so I will defer it for now.

My opinion about camera calibration/correction is this: I think there are two parts to the problem. There is a need for a backend implementation of the stitching, which is essentially the application of a transform on the various camera blobs so they are all consistent. This I view as a hard deliverable: it is a core element of what we are building.

Now, what the particulars of the transform are, we won't really know ahead of time. We could possibly get close with pencil and paper, but Jas pointed out to me that it is pretty hard because you can't "see" it interactively, so it would be tedious to determine empirically. The calibration frontend is what makes this easier. The frontend would inform the backend about calibration details, probably by populating a configuration file.

So I view it as two programs connected by that config file. Two separate problems. You would probably want to write the backend first, since you can't really validate the frontend too well without it. In the meantime, you could get away with a manually populated config file until the frontend was complete. With this sense of dependencies, I kind of find the frontend to be optional. At the very least, I think too much work on it is deferrable until the backend is complete: I don't think the frontend design would inform the backend design very much. If anything, it would probably be the other way around (feel free to correct me, Kevin!)

Another reason I suggest deferring frontend discussion is because we can get away with delivering a table without a functioning frontend if the backend is ok, but we can't do it the other way. So just in a sense of plotting out the dependencies and doing standard CYA procedures, it seems like a safer plan to me.

On threading:

In our midnight meeting, I gave an explanation of the threaded programming model vs the multiprocess programming model vs the synchronous (there is probably a better name for this) programming model. I'm going to explain them in a little more detail here.

The synchronous model is the simplest. It would be one gigantic program doing each piece of work one at a time. This would be the simplest to write, but the drawback is performance. It literally cannot do more than one thing at a time. While it is sifting through blobs for gestures, it cannot process webcam images or stitch anything together. If we had a fast computer, this might be ok, but I suspect the computer will not be quick enough to hide latencies here. So we want to avoid this model, as the other two essentially give us software pipelining.

The process model is the next one up. A process is a program in a state of running. I.e., a program is code on disk or in memory, and a process is the operating system's concept of a running program. The running state involves the instruction pointer, registers, kernel accounting data structures, program data, etc. So to have multiple processes, we'd pretty much have multiple programs (technically it could be one program which, via fork, happens to run blob detection or gesture detection or whatever after we forked it, but that seems overly complex.)

Multiple processes, since they are standalone, would need to use Inter-Process Communication (IPC) of some form. The main ones we would consider are either named pipes/Unix sockets or shared memory. Pipes and sockets are pretty similar: offhand, I'm not sure I can say what the practical difference would be besides setup code.

Besides the code for setting them up, named pipes and Unix domain sockets essentially have the same semantics as files. You keep a handle on them via a file descriptor, and you can use read() and write(). There are other functions, like send() or recv(), which behave a little differently, but the key to me is the file-like interface. This makes testing pretty elegant, since you can just feed your program sample files on disk while you wait for your counterpart to write the program which would feed you.

Network sockets are very similar as well: they pretty much differ only in setup code. From my administration experience, I can tell you that it is very common for network programs to run either with a Unix domain socket if all communication is local, or with a network socket if you would prefer to spread the application across the network. I would wager that the code inside does not differ much beyond the setup code and perhaps handling of special cases and so on. I think we can all agree that being able to run some of our code over a network between multiple tables or multiple table elements could be very cool in the future.

One drawback is that I'm pretty sure we cannot shove pointers through pipes or sockets. They would not be very meaningful on the receiving end, since that process would have a different virtual address space. The same drawback would apply to shared memory (which I'm not totally sure about, though shared memory does seem more performant).

Shared memory is when multiple programs can request that the operating system assign them the same chunk of physical memory. This way, they can read and write values from the same place and avoid lengthy copies. I'm not going to research it right now, but I would guess that when the operating system gives programs handles into a shared memory region, it does not map it to the same location in each program's virtual address space. So pointers would still be broken unless they were relative. I suppose I am kind of unclear on it, though. It looks like we will not use this option, so it may not be important.

As opposed to the process model, threads run within a single process. A thread is pretty much a separate context and stack within the same process. This way, there can be multiple states of execution happening all at once with direct access to a process's resources: file descriptors, memory, etc. The operating system may or may not schedule multiple threads across CPUs; that is a CS111 discussion however: modern Linux has done that for years.

The thread API is pretty cool. It is called POSIX Threads, or pthreads. The essence of it is that you do a tiny bit of setup code, and if you feel like spawning a thread, you invoke it like a function. I.e., you can imagine making a function call that spawns a thread for that function and then returns immediately so you can keep going along on your way.

The two big benefits of threads are that spawning them is far less expensive than creating a new process (although if our threads or processes are long lived, we would amortize this over time) and that they have access to any program data with the same privileges that a function in your program might have. I.e., global variables, arguments, pointers resolving correctly, etc.

The downside is that debugging may be weird. These things are happening asynchronously, so you cannot guarantee the ordering of how things occur. You synchronize with mutexes or semaphores, which isn't too bad -- unless someone forgets to set or unset one, and then it is really tough to track down.

A very common programming model here is the producer-consumer model. You have some number of producer or consumer threads filling up or depleting some buffers, and once they have done their work, a semaphore or mutex conveniently makes the one using the buffer block while the other thread uses it up, and then it flip-flops the other way again.

Without thinking too hard, an example off the top of my head might be this: there are 6 buffers, one for each camera. Each buffer has an associated mutex. The blob detection code will want to fill the buffers and the blob stitching code will want to deplete them. Both of these modules will be written so that they do not touch the buffers unless they grab the mutex. Trying to grab it while the other guy has it will cause you to block. So the cameras start off owning the mutex. They detect blobs and fill the buffers. Some cameras might be faster to detect than others because they have more blobs to go through. In the meantime, right after they started, the stitching module began trying to acquire mutexes on all those buffers. He is going to block now until all the buffers have been filled and the detection threads released their mutexes. Then they block while trying to acquire the mutex, and he goes to work. When he is done, he releases his mutex and they go to work filling buffers.

* This is actually a really simple overview. After I wrote it, I realized there were several timing problems. Ask me about it later.

Finally, to end this post, I want to briefly discuss the final architecture which Jose suggested. It was pretty insightful; I am not sure I would have thought of it: he suggested using both techniques. I think this is a good idea too. Jose suggested that the blob detection and blob stitching be threaded in the same program, and that gesture detection run as a separate process and communicate via sockets.

I think this is a good idea for two reasons. First, we get the performance edge of threads. Second, the gesture detection has a good modular divide from everything else. The two components are decoupled pretty well, so they could also stand alone better. Where to draw the line as far as moving things over the network is a little arbitrary, but consider this: our high-level software block diagram has TUIO coming out of the blob stitching. This is a natural fit, then: just have the blob stitching serve up TUIO (probably non-trivial!) and then ask the gesture daemon to connect. This makes it easier for existing apps to run on our system (since we have more of an excuse to implement some kind of TUIO server) and, conversely, for our gesture daemon to run on other people's systems.


I think I might want to pull some CS115 moves and try to have another software meeting to work out a precise object model. I can bring my book from that class and some of my work from last quarter to show people what I'm talking about. I need to study up on it some first, though.


Mac Mini 2

I got the Mini up and running mostly well enough. The primary lacking feature at the moment is wifi support; the Mini uses a Broadcom chip for wifi which is notoriously painful to use in Linux. It requires weird reverse-engineered drivers and things of that nature. I can research it and set that up later, though.

I repartitioned the hard drive and then reinstalled OSX. Once I had OSX installed, I spent a little time scrutinizing "Boot Camp". It seems that what really happens is that Leopard has the BIOS emulation stub for EFI out of the box (or maybe it was left over from the prior install), so without doing anything, you can tell it to boot a normal PC operating system. The Boot Camp tool in the Utilities folder is just a hand-holding mechanism for repartitioning and rebooting your Mac.

Because of this, I resized the OSX partition in their normal partition tool instead. I made a 50GB "Msdos" partition. Then I installed rEFIt. rEFIt seems to replace the Apple-provided bootloader with something nicer (I don't really know what happens in EFI land here; it looks more like Apple's loader will load rEFIt if I let it). rEFIt gives me tools for synchronizing between the GPT partition map and the MBR partition map, as well as for choosing between Linux and OSX.

Once I was satisfied that rEFIt was installed correctly, I proceeded to install Ubuntu. I went with 8.04 amd64, because that is what we will probably be running on the final table. This means a lot of programs are somewhat out of date, but it shouldn't really matter (knock on wood) too much for the code we will write later.

I had to build a 2.6.29 kernel on the Mini, which surprisingly took nearly an hour with 2 threads. I suspect this is because the Mini uses a 2.5" hard drive, so it was probably blocking on IO for a good chunk of the compilation (I should have noted user time vs system time).

There was some trouble getting the Nvidia driver installed. I wanted to go with an older driver because there are some goofy rendering problems in Compiz with PowerMizer in the newer ones. The problem can be resolved by turning off PowerMizer (which never seems to obey), avoiding Compiz, or running an old driver. It turns out that even slightly older drivers won't compile against newer kernels. This has always been a problem, but I burned a few hours on it because the error message wasn't quite indicating this.

Instead, it told me it could not find the kernel sources that I was directly pointing it to. I eventually tried the most recent driver on a lark, and it worked fine. Later I read that the structure of the tree has changed over time and that is what caused the confusing message.

The Mini now boots into Linux by default and does an auto-login for the scimp user. I left a shortcut on the desktop that runs a script with simple mplayer arguments for showing the webcam. Just about everything should be in order for running our code, aside from installing whatever random libraries we need along the way.

Postscript: along the way, while messing around with the Mini, I noticed it would throw an odd resolution to the projector: 824x632.


Display Correction

I have just about exhausted my options on this. These are the different methods I have thought of or tried:
  • Custom resolutions in video driver

  • Custom screen offsets

  • Dialing settings into projectors

  • Seeing if projectors have any useful DDC controls (they don't have any controls)

  • Running a custom Compiz
    • Running a custom Xgl

  • Trying RANDR 1.3, which just came out and has options for custom screen transforms

  • Running Chromium

  • Running DMX amongst multiple instances of Xnest

(There may be more that have come across my mind which I do not recall at the moment)

Some explanation:
I view a final solution as falling into 3 tiers: individual screen transformation, individual screen trimming, and doing nothing.

Screen transformation would require some combination of Compiz, XGL, DMX, and Xnest. Or RANDR. There are wrinkles to all of these, however. I want to give a description of the current X architecture and then discuss the theory. The architecture has mostly been in my head since we started, and the possible solutions piled up as time went on.

The X Window System (also called X or X11) is a network system. A server controls the video output, the keyboard, and the mouse. Any program that runs, like a terminal, is a client. The server listens over TCP/IP or over a Unix socket. Clients connect and tell the server what to draw and the server sends back user input. This is commonly referred to as network transparency: the EXECUTION of gui applications and the INTERFACE with them can occur on different machines. This may sound convoluted, but it can be useful at times.

A simple example is that VNC-like capabilities have been there since day 1. A more complex example might be Chromium: because the system is architected with the thought that apps may have a disjointed operation, Chromium is able to grab and reroute applications.

Apps speak a (relatively) simple protocol with the server, and fancier things like dialog boxes are implemented with higher level libraries which aren't strictly standard. Communication at this level is probably analogous to Quartz on OSX or GDI+ on Windows. One of the driving philosophies here also is that the system provides mechanism and not policy, which is why the GUI can be weird sometimes.

Part of the design of this weird system, which might have a lack of features, is that it is extendable. If you look at /var/log/Xorg.0.log on your machine, you can see a lot of the extensions that have been added to the most common X server on Linux, X.org. For what I have been working on, the extensions I care about are Xinerama, Composite, and RANDR.

This is the description of Composite from the X.org wiki: "This extension causes an entire sub-tree of the window hierarchy to be rendered to an off-screen buffer. Applications can then take the contents of that buffer and do whatever they like. The off-screen buffer can be automatically merged into the parent window or merged by external programs, called compositing managers. Compositing managers enable lots of fun effects."

To digress a little: a window manager is a privileged X application that is charged with managing windows. A given X program can only draw what is inside its window; everything outside the window, like the border or desktop, is implemented by the window manager. Consequently, window movement or minimization is handled by the window manager (it manages windows, after all!)

Window managers like Compiz rely on the Composite extension to perform distortions on the windows themselves, since the drawn output sits in an off-screen buffer rather than having been painted directly onto the screen. I don't know exactly how it is structured inside, but it pretty much uses Opengl to drive the entire screen and presumably imports drawn windows as textures onto 3D objects.

This provides a possible technique for screen transformation: since Compiz is using Opengl to draw everything, it should not be *too* complicated to modify the object(s) representing the entire screen to be slightly distorted, as needed.

Problem 1: Nvidia DISABLES the Composite extension in a multi-gpu setup like we have, so Compiz will NOT run. There are a couple of possible solutions to this, however.

1) Use XGL
When you configure a multi-gpu setup, you typically use the Xinerama extension. It is apparently straightforward to do what amounts to running an individual X server on each gpu. This doesn't achieve the desired effect, however, as applications are stuck on whichever server they connected to (actually, this description is pretty hand-wavy and this isn't what really happens, but this is how it looks from a user perspective.)

When you use Xinerama, the different gpus are united and this extension figures out how to split the different drawing commands and so on between the different heads. It even splits Opengl between the heads, with heavy assistance by Nvidia's driver I'm sure. Key point here: random hardcore fullscreen Opengl works fine in a Xinerama arrangement.

Enter Xgl: Xgl is a pure software X server which renders into an Opengl window rather than directly onto the video card. As such, you can turn on Xinerama so you get nice multi-gpu Opengl and then run Xgl which will place its own window so it perfectly covers all the screen area Xinerama tells it about.

Problem solved, right? I've gotten this running in lab, and it looks nice. You can imagine a performance hit, and there is one, but it isn't too bad unless you do serious 3D work, which the table technically won't do. This method possibly shifts the coding work into either Xgl or Compiz: one of them would need to be informed about the layout of the various heads and be told to draw onto N objects with various distortions rather than the one. This is possibly doable.

Problem 2: the code for Xgl is huge and intimidating. It may be a huge time sink understanding it and modifying it.

Problem 3: Xgl is deprecated! It isn't in any version of Ubuntu past 8.04, but maybe if we are lucky it can be coerced into running on 9.04. That of course only enables me to dump time into rewriting part of it as well; see Problem 2.

Possible Solution: Either stick with an older distro (no MPX support then) or try to get Xgl running on a newer one.

2) Use DMX
When experimenting with Chromium, which isn't a good fit for our project as it exists now, I found out about DMX. DMX stands for Distributed Multihead X. It is what Chromium relies on for a display wall. The concept of DMX is simple: again, it is a software-only X server like Xgl, but instead of drawing into its own window, it draws onto OTHER X servers. So a possible method is to ditch Xinerama, run 1 X server per gpu, and run DMX on top of all of them. Then run Compiz inside of DMX.

Problem 4: DMX segfaults on everything I've tried it with.

Possible Solution: run older versions of DMX, which apparently work (I have not tried this; it sounds shady to me and apparently has rendering glitches.)

3) Use Xnest
Xnest is another pure-software X server. It simply draws into a normal window on your desktop. It could be possible to run one Xnest server for each projector, and then have DMX draw into them.

Problem 5: This would perform horribly (no hardware Opengl happens here and we kind of want to use an Opengl window manager) and DMX crashes all the time anyway.

Possible Solution: I could fix DMX (either by fixing the code or running an old version of DMX) and then if Compiz runs like a pig, I could try to modify Xnest for display transformations.

This post has almost exhausted the pure software solutions. There are two more: RANDR and writing my own weird custom display layer.

RANDR is an extension for controlling the resizing and rotation of screens. It is what we use when we set up the multihead on our lab computers. In the newest version of X, which came out a few weeks ago and comes with Ubuntu 9.04, RANDR has been updated to allow arbitrary transformations. I was excited when I found out about this, and I rushed to try it in lab. Sadness: the xrandr tool simply segfaults. I think perhaps it is buggy and/or needs driver support from Nvidia. If the latter, I doubt we can rely on Nvidia to give us something usable anytime soon. For one, just because it is really new. Also, Nvidia disables RANDR when doing multi-gpu setups as it is, so I am not optimistic it would even be usable in our situation anyway.

The other solution is writing my own custom display layer. This sounds very hefty, and to be at all useful, it would need to be on the order of Xnest or Xgl anyway. It is unclear whether I am good enough or can justify the time.

One glimmer of hope is that when I first brought up the projector wall, I had a minor misconfiguration from when I was tiling 4 of our widescreens before. It caused one of the projectors to have a horizontally trimmed picture. I have not been able to recreate this effect in lab, sadly. If I did figure it out, it would be a matter of adjusting the Metamodes or Xinerama layout specified in Xorg.conf.

Doing the correction in software is tough. To do full distortion looks like it would require me to pretty much dig in and write arguably OS-level code. I don't feel comfortable facing that in terms of my programming skill or the time for this project. Trimming *may* be achievable, simply because I may have accidentally done it last week.

I want to resolve the possibility of trimming and work on the possibility of physically aligning the projectors "well enough". I've experimented with it a little bit, and it doesn't seem *too* bad. Not having perfectly adjustable surfaces for mounting the projectors is the biggest impediment in my tests. It may be all we can get away with, though.


Mac Mini

The Mac Mini is kind of a nice deal because it is simple to buy and small for our table (but why didn't we just pick one up from the campus bookstore?)

Getting Linux up and running on it has been a minor hassle however. Apple uses EFI instead of a traditional PC bios. This is why Boot Camp is used to run other operating systems.

EFI is a far more sophisticated pre-boot environment than the old PC bios. One of the features it has is the ability to run programs. Apparently, much of what Boot Camp does is install a program that provides legacy bios compatibility.

Linux actually is able to talk to EFI natively (unlike most versions of Windows), but because Boot Camp is so simple, most people just go with a normal Linux install. EFI, however, brings a couple of problems that I would not have to deal with at all were this a normal PC.

Problem one: the dual-boot interactions between Linux and OSX are unclear. Grub doesn't know anything about OSX of course, and apparently other people install an EFI program named rEFIt to provide boot selection between OSX and Linux. We did not do this though before installing Linux. I do not know how to boot back into OSX at this time.

Problem two: EFI specifies the use of GPT partition tables rather than the old MBR style. GPT, like EFI, is far more modern than the old MBR layout. One of the things GPT provides is compatibility for programs that expect the old MBR style. When I installed Linux, the installer relied on this to create Linux partitions. The problem is that this legacy shim apparently needs to be synchronized with the REAL GPT layout, so my partitioning changes are not valid to OSX at the moment.

The solution to that is to use OSX's Disk Utility to make my Linux partitions, or to tell rEFIt to synchronize GPT and MBR. At the moment though, we cannot seem to boot OSX at all.

A final wrinkle in the Mini so far is that Nvidia's Linux drivers do not seem to want to drive the output at more than 640x480. I have not figured out why this is being problematic. I am sure this is an innocent problem that has nothing to do with the computer being a Mac.


Chromium-inspired blob detection

I thought I would write this as a separate post; it felt awkward to work into the one on Chromium.

On Saturday I was talking to Eddie about any implications Chromium would have with the image stitching. I didn't think it would have any direct effect on how to go about the stitching, but I had another idea.

It is actually not a given that stitching should come before blob detection.

You could either
1) Combine N camera images into one big image and detect blobs in that. In the final image, overlapping areas in the cameras have been discarded.
2) Detect blobs in each camera feed, then combine all of the blob data. Redundant blobs can be discarded.

Both ways sound relatively easy to me. I think which one to use is just a performance issue. In both cases, I'm not imagining "stitching" that is any more complex than dialing in areas to trim which we have experimentally found to be overlapping or uninteresting. A manual calibration, if you will.


Chromium

I've spent a good amount of my weekend reading about and playing with Chromium. I've still not totally "evaluated" it, but I think I've been looking at it enough to write something.

For one, all of the pictures of display walls in their documentation are very encouraging. They also have pictures of Quake 3 running. Essentially, what Chromium lets you do is modify the Opengl command stream by directing it through different servers. The servers may be running on the same machine or across the network. The servers can do all kinds of things, like substitute the opengl commands, log them, or pass them on to a GPU's driver. Zero or more network hops may be along the way. They use the word parallel a lot too, which probably gets people's blood pumping. Parallel in this context, I think, is to say they can be intelligent about how to divide the opengl command stream so it can execute in parallel across multiple machines.

There are some gotchas though. Specifically, for our project, is it worth our time? I think for an *ideal* version of our table, something like Chromium could be very cool if we made little table modules that could be connected for an arbitrarily large table/wall.

In the real world though, it is unclear that it would be a home run for our project. For one, we need a PC for each head. We also need an application PC that programs will actually run on. This is how they intend you to use Chromium, and also how we want things to run. It would be very baroque if we broke with what everyone else is doing and required table apps to actually be a collection of smaller processes that would coordinate their display output, etc.

Second, in trying to run a network of PCs in our table, we'd need to install a network too. This would add a bit to our cost (we probably want a gigabit switch), but more importantly it would add latency.

I've heard the idea that maybe (and this sounds like it could be a rumor) it would be a good idea to run the cameras on the nodes attached to projectors. While it sounds elegant, it is also unnecessarily complicated. Because the table applications would (presumably) run on the app node, the individual camera information would need to be accumulated there anyway.

The worst part, though, is that Chromium may be too crusty to be reliable. The last release was in August 2006. I got their simplest demo setup compiled and running in a virtual machine though. All it does is redirect the rendering of an Opengl app through the Chromium pipeline into another window on the same desktop. I have not yet tried to make it go across the network to a 2nd vm (this specifically is hardly novel: remote X accomplishes it also.)

For the display wall setup, Chromium relies on something named Distributed Multihead X (DMX). DMX is very cool - I'm almost embarrassed I had NEVER heard of it before. It is a pretty elegant program: all it does is become another X server which proxies draw commands to other X servers over the network. A very simple idea... but that is actually all you need to do a display wall. If we just used DMX directly, I'm not sure where we'd need Chromium, except that it may perhaps be faster for Opengl -- but I don't really know, since I haven't gotten either running too well yet.

DMX segfaults in every configuration I've tried to run it in. This may be a symptom of it also being somewhat old (the newest article I found about it was also from 2006). I could bang on it some more tomorrow, and *MAYBE* try to roll back to an older distro in a vm. I do not feel too hopeful though.

The biggest drawback to these solutions, for me, is that I already have 4-head output running fine at my desk. I'm pretty confident that I could drop another card in there to get our 6 heads. If I try to go for Compiz on it, then my config gets a bit more arcane and actual Opengl performance (say, trying to play Q3 or WoW inside) will drop off, but it still works. And it is all one computer. And stuff going over PCI Express is probably faster than over the network. And it appears to be stable as well -- I was nervous about Nvidia drivers being flaky.

If I *DON'T* try to run Compiz, it is more reliable and performs a bit better. I don't really need Compiz unless I try to go for the display correction, which the solution above actually complicates a little.

As an aside (and I've mentioned this to Jas already), I'm pretty much at the point where going much further into the display work would mean trying to understand a MASSIVE codebase -- one that could absorb a lot of my time and may simply be out of my league.

So beyond checking out Chromium/DMX and getting this multihead box working, I would like to move on and make sure we do a really good job mounting the projectors in the final table.

In case anyone is interested though, these are the options I have for the display correction:
  • Customizing XGL (hardest)

  • Customizing Compiz (hard)

  • Asking Nvidia's driver to give me custom resolutions (not too bad)



Ah, I finally got multihead to come up. I tried the other card, and it came up fine. So perhaps the old one is bad.

I put the old one back in just to confirm, and now I'm waiting for the new one to boot. Knock on wood: I hope it still works ok.

The old card would drive video with one GPU but not the other. I'm tempted to put it back in and see if I can run any CUDA code on the unresponsive GPU. The system otherwise saw the card fine; the driver just couldn't bring that GPU up.

Another wrinkle is that we may be hamstrung for cards. If the HDMI idea I posted about earlier works out, we may be OK; it is unclear right now. Jose probably has more of these things lying around his lab anyway.

HDMI & multihead

I had the idea this afternoon of trying to use the HDMI output on our video cards.

This has the benefit of slightly easing the multimonitor work, while also freeing up three GPUs for nothing but CUDA down the road, if we end up implementing that.

I picked up an HDMI-to-DVI adapter from Santa Cruz Electronics today, but it doesn't fit on the card at the same time as a normal DVI plug. So I need to either pick up an HDMI extension cable tonight, or return this adapter tomorrow for a kit that includes an extension cable.

Still trying to decide if I should pick one up tonight. I'm rerunning memory tests on my computer now. The RAM seems to pass when I try it in different DIMM slots. That isn't as bad as last night, but it's still somewhat discouraging.

Once I get the computer up again, I need to try a couple of kernel options to see if I can get multihead to work. If that doesn't work, I will try the other video card. If that doesn't work, then maybe both cards together. I think I had one more idea after that. EDIT: these were the VGA palette snoop option and disabling UseEvents.

If multihead still doesn't work after all of that, it may be an incompatibility with my motherboard. I could pick up another one from Fry's, or we could try to get another PC from Jose. If multihead STILL doesn't work (and at that point things are shady, because Nvidia's documentation says what I am trying to do IS supported), then we are kind of SOL and will need to look into Chromium.

Bad Memory

Last night I was in lab for a long time trying to get multimonitor output to work with Jose's cards in my computer.

I probably should have kept a precise log of my actions in an engineering notebook, but this is my recollection of how things went down:

1) I updated to the latest BIOS and tried multimonitor output with my existing config. No go. Note: the system was still stable at this point.
2) I tried to boot from my old laptop HD over USB, but couldn't, because I had to change some settings on the HD first.
3) I booted a Knoppix CD (either before or after the BIOS update) and it ran fine. Subsequent boots did not produce a properly running Knoppix; sometimes I got assertion failures on boot.
4) My desktop install started having trouble booting.
5) I changed the USB HD settings on my laptop, then tried booting from it. Weird hangs and segfaults.
6) I pulled some RAM, and things began to work.
7) I burned a memtest86 CD and started testing RAM.
8) Eventually I tested each stick of RAM individually, and every one came back bad. That means I have 4GB of bad RAM in a computer which was otherwise very stable.

So one of the following seems to have happened:

  • My RAM is bad. OR

  • My motherboard is bad. OR

  • I simply seated the memory wrong.

I hope I can narrow it down today. Does anyone have access to a motherboard that uses DDR2 memory?


Webcams Part 2 & more

So today I built a 2.6.29 kernel, and the PS3 Eye driver in it seems OK. I still have a couple of app-compatibility wrinkles, where some things that run fine against my Logitech camera don't against the Eye. But Eddie's OpenCV test app works fine on it (most important), and mplayer shows video off it with low latency (also good.)

By the time we're done in June, Ubuntu 9.04 will have been out for a month. I don't think 9.04 will ship 2.6.29, though -- last I read, they were planning to hold off on it. It will have the next X.org release, however, which has the MPX code merged in, so it supports multiple pointers. I hope I have time at the end to implement a basic mouse driver.
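If that X.org release does land in time, the multi-pointer plumbing should be testable from a shell before we write any driver code. A sketch using the MPX-aware xinput tool -- the device names below are made up:

```shell
# Create a second master pointer/keyboard pair (this is the MPX bit)
xinput create-master "touch"

# List devices to find the new "touch pointer" / "touch keyboard" masters
xinput list

# Attach a physical device to the new master pointer
# ("Some USB Mouse" is a placeholder name)
xinput reattach "Some USB Mouse" "touch pointer"
```

A real driver would presumably create one master per tracked finger, but this would at least prove the multi-cursor path works on our build.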

I am sitting at home now watching this video about Git. Git is the SCM tool that Jose set up for us last quarter. I also made a Google Code account for us, so we can do SVN through Google as well. I don't want to pick one or the other until I've researched the tradeoffs between them.


CUDA Part 2

In CUDA 2.2, which isn't out yet, Nvidia is adding zero-copy support. This could help with the performance issues in the image-processing code, assuming we get around to implementing it.

Details here.



Today I picked up this Logitech camera. It may be better off software-wise because it is a UVC (USB Video Class) camera, so it has a standard driver interface. Semi-standard, anyway: I have not researched it *too* deeply yet, but it seems the UVC driver still has to be updated periodically for new cameras.

Playing with a few apps, I can change the resolution and FPS of the Logitech camera pretty easily on my laptop. At first it had a lag like the PlayStation Eye -- which sounds bad, but implies it wasn't necessarily the Eye's driver that was laggy. After fiddling with the settings it seems to have gotten faster; it now runs at pretty low latency on my laptop, although it tends to smear images at high framerates.
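Since it's UVC, the generic V4L2 tooling should be enough for this kind of fiddling; something like the following, assuming the camera shows up as /dev/video0:

```shell
# Dump every pixel format / resolution / frame interval the camera advertises
v4l2-ctl --device=/dev/video0 --list-formats-ext

# Request 320x240, then ask for a high capture rate (the driver may clamp it)
v4l2-ctl --device=/dev/video0 --set-fmt-video=width=320,height=240
v4l2-ctl --device=/dev/video0 --set-parm=125
```

This is also a quick way to check whether the smearing correlates with a particular format or frame interval.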

This makes me want to go back and tinker with the Eye some more. I may take both cameras over to Jas' house later tonight.


Presentation Tips

I emailed a guy I know who gives really, REALLY good presentations, asking for general tips and the things he thinks about when putting a presentation together. All of his presentations that I have seen are technical in nature but still very engaging.

This is what he wrote to me:
You are telling a story.

More than anything else, *that* is what matters. You are telling a
tale, with twists, turns, surprises, and a moral at the end. Timing is
critical. If it's an hour presentation, no individual phase should last
longer than 5 minutes. If it's a 30 minute presentation, 3 minutes. If
it's a 10 minute presentation, it's beginning, middle, and end. If it's
a 5 minute presentation, it's context and discovery.

Slides: You are not talking to your slides. If you are delivering
slides to be read later, you can have lots of detail. But your slides
are a vague reference, just enough to remind you of the story.

Never say um. Just be silent.

Eye contact -- pick one person, in each area of your audience, who you
are speaking to. Don't be too piercing about it, but very much imagine
you are convincing or educating that *one* person. Feel their
understanding, or lack thereof, as you are talking to that one person.
As they understand, be proud. As they are confused, roll back.

Don't be afraid to drop material if it doesn't fit into the storyline.
Better to say less and explain more.

Remember. Tell a story!

It might be a little too late now to build an overarching narrative, but we have pieces of one in our presentation already. I will try to apply some of these suggestions to the parts I present.



Eddie was asking me today what the problem was with the PS3 Eye.

This is what I did with it last week:
I first tried a set of drivers from the repository of the guy who seemed most popular via Google. The best I could get out of the camera with his drivers was audio. I then tried drivers by a guy from NUI Group, who had modified the first guy's drivers. Several people in the NUI Group thread for that set posted that the drivers worked for them. When I used them, however, I was getting about one second of lag on the video.

Some details:
  • The drivers in question come as part of a huge pile of webcam drivers that people are maintaining. I happen to be building all of them and installing them on my laptop.
  • My laptop runs 64-bit Linux, so there may be a gotcha there. I could try on my desktop, which is 32-bit.
  • The NUI Group guy's drivers I used were from Feb 15. I ought to post there or talk to him about my issue and see if it is known.
  • The driver is otherwise nice: I can select 16 different camera modes. I was getting something like 125fps out of the camera @ 320x240 -- just lagged by one second.
  • Apparently this driver will be in the 2.6.29 kernel.


I am still undecided about how to manage my email conversations for this project. I just emailed the teachers a copy of the block diagram from my non-Scimp Gmail address, with a CC to our group. I did this because I wanted the mail to hit the group without writing out a bunch of CCs, and I want to keep most of my Scimp activity (as opposed to just 123 class activity) in my Gmail.

I set up my main Gmail to pick up my Scimp mail as another POP3 account, so I can send from the Scimp address if I choose to. That address seems like a bit of an anachronism now, though, because my main Gmail address is already attached to two mailing lists.

Maybe we should add our Scimp addresses to our internal list? It may also be convenient to have a Scimp alias "teachers" that expands to Kip, Petersen, and Dave, so I don't have to write out lengthy CC lists for them either.

Exponential Push [Winter]

I have to do an exponential push tonight on my CS115 project, but after that I can mostly throw myself at our Scimp stuff. I would like to finalize the block diagram (I'm going to send the teachers my current one tonight) and the charter, then pitch in on the final report.

The writing discussion in class today bothered me a little. I am conflicted on the subject. I used to think I was a decent writer, because I've always gotten decent grades. My worst writing grade in 123A has been an A-. I did pretty well in 185. In prior GEs I've popped out papers pretty quickly and been complimented to my face by the teacher on my writing quality.

In my GE this quarter, however, the TA has been ruthless in eviscerating my work. Granted, I did crank out the paper in question *really* fast, and I didn't feel good about the prompt to begin with, but it still plants doubt in my head.

How well written is this post?



I got Nvidia's SDK examples running on my laptop last night, but I'm afraid that OpenGL might be putting memory pressure on the GPU, which makes CUDA a little problematic. Nvidia's driver is supposed to do some kind of memory management, but I have no idea how it pages things to main memory, and I'm 99% certain it doesn't give the OS very good notifications about what it is doing. I need to investigate.

~/NVIDIA_CUDA_SDK/bin/linux/release$ LD_LIBRARY_PATH=/usr/local/cuda/lib ./bandwidthTest
Running on......
device 0:Quadro FX 570M
Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1152.5

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1240.4

Quick Mode
Device to Device Bandwidth
cudaSafeCall() Runtime API error in file , line 725 : out of memory.