4.19.2009

Software Meeting

It is 4am and I just came home from lab.

I'm ready to blog.

The software meeting today was semi-useful to me today. The agenda was essentially video & camera calibration, and threading/IPC.

I will expound on the video calibration/correction in another post tomorrow with detail, so I will defer it for now.

My opinion about camera calibration/correction is this: I think there are two parts to the problem. There is a need to have a backend implementation of the stitching which is essentially the application of a transform on the various camera [blobs] so they are all consistent. This I view as a hard deliverable: this is a core element of what we are building.

Now, what the particulars of the transform are, we won't really know ahead of time. We could possibly get close with pencil and paper, but Jas pointed out to me that it is pretty hard because you can't "see" it interactively so it would be tedious to determine empirically. The calibration frontend is what makes this easier. The frontend would inform the backend about calibration details by probably populating a configuration file.

So I view it as two programs connected by that config file. Two separate problems. You would probably want to write the backend first, since you can't really validate the frontend too well without it. In the meantime, you could get away with a manually populated config file until the frontend was complete. With this sense of dependencies, I kind of find the frontend to be optional. At the very least, I think too much work on it is deferrable until the backend is complete: I don't think the frontend design would inform the backend design very much. If anything, it would probably be the other way around (feel free to correct me Kevin!)

Another reason I suggest deferring frontend discussion is because we can get away with delivering a table without a functioning frontend if the backend is ok, but we can't do it the other way. So just in a sense of plotting out the dependencies and doing standard CYA procedures, it seems like a safer plan to me.

On threading:

In our midnight meeting, I gave an explanation of the threaded programming model vs the multiprocess programming model vs the synchronous (there is probably a better name for this) programming model. I'm going to explain them in a little more detail here.

The synchronous model is the simplest. It would be one gigantic program doing each piece of work one at a time. This would be the simplest to write, but the drawback is performance. It literally cannot do more than one thing at a time. While it is sifting through blobs for gestures, it cannot process webcam images or stitch anything together. If we had a fast computer, this might be ok, but I suspect the computer will not be quick enough to hide latencies here. So we want to avoid this model as the other two essentially give us the software pipelines.

The process model is the next one up. A process is a program that is in a state of running. I.e., a program is code on disk or in memory and a process is the concept of a running program to the operating system. The running state involves the instruction pointer, registers, kernel accounting data structures, program data, etc. So to have multiple processes, we'd pretty much have multiple programs (technically it could be one, via fork, which happens to run blob detection or gesture detection or whatever after we forked it, but that seems overly complex.)

Multiple processes, since they are standalone, would need to use Inter-Process Communication (IPC) of some form. The main ones we would consider are either named pipes/Unix sockets or shared memory. Pipes and sockets are pretty similare: offhand, I'm not sure I can say what the practical difference would be to use besides setup code.

Besides the code for setting them up, named pipes and Unix domain sockets essentially have the same semantics as files. You keep a handle on them via a file descriptor and you can use read() and write(). There are other functions, like send() or recv() which behave a little differently, but the key to me is the file-like interface. This makes testing pretty elegant, since you can just feed them sample files on disk while you wait for your counterpart to write his program which would feed you.

Also, network sockets are very similar as well: they pretty much differ by setup code. From my administration experience, I can tell you that it is very common for a lot of network programs to run either with a Unix domain socket if it is all local communication, or use a network socket if you would prefer to spread the application across the network. I would wager that the code inside does not differ much beyond the setup code and perhaps handling special cases and so on. I think we can all agree that being able to run some of our code over a network between multiple tables or multiple table elements could be very cool in the future.

One drawback is that I'm pretty sure we cannot shove pointers through pipes or sockets. They should not be very meaningful on the receiving end since that process would have a different virtual address space. This same drawback would apply to shared memory (I'm not totally sure, but it seems more performant.)

Shared memory is when multiple programs can request that the operating system assign them the same chunk of physical memory. This way, they can read and write values from the same place and avoid lengthy copies. I'm not going to research it right now, but I would guess that when the operating system gives people handles into a shared memory region it does not map it to the same location in those particular programs' virtual address spaces. So pointers would still be broken unless they were relative. I suppose I am kind of unclear on it though. It looks like we will not use this option so it may not be important.

As opposed to the process model, threads run within a single process. A thread is pretty much a separate context and stack within the same process. This way, there can be multiple states of execution happening all at once with direct access to a processes resources: file descriptors, memory, etc. The operating system may or may not schedule multiple threads across CPUs; that is a CS111 discussion however: modern Linux has done that for years.

The thread api is pretty cool. It is called POSIX Threads or pthreads. The essence of it is that you do a tiny bit of setup code, and if you feel like spawning a thread you invoke it as a function. I.e., you can imagine it as making a function call that spawns a thread for that function and then returns immediately so you can keep going along on your way.

The two big benefits of threads is that spawning them is far less expensive than creating a new process (although if our threads or processes are long lived, we would amortize this over time) and they have access to any program data with the same privileges that a function in your program might. I.e., global variables, arguments, pointers resolving correctly, etc.

The downside is that debugging may be weird. These things are happening asynchronously so you cannot guarantee ordering of how things occur. You synchronize with mutexes or semaphores, which isn't too bad -- unless someone forgets to set or unset one, and then it is real tough to track down.

A real common programming model here is a producer-consumer model. You have some amount of producer or consumer threads filling up or depleting some buffers, and once they have done their work, a semaphore or mutex conveniently makes the one using the buffer block while the other thread uses it up and then it flip-flops the other way again.

Without thinking too hard, an example off the top of my head might be this: there are 6 buffers, one for each camera. Each buffer has an associated mutex. The blob detection code will want to fill the buffers and the blob stitching code will want to deplete them. Both of these modules will be written so that they do not touch the buffers unless they grab the mutex. Trying to grab it while the other guy has it will cause you to block. So the cameras start off owning the mutex. They detect blobs and fill the buffers. Some cameras might be faster to detect than others because they have more blobs to go through. In the meantime, right after they started, the stitching module began trying to acquire mutexes on all those buffers. He is going to block now until all the buffers have been filled and the detection threads released their mutexes. Then they block while trying to acquire the mutex, and he goes to work. When he is done, he releases his mutex and they go to work filling buffers.

* This is actually a really simple overview. After I wrote it, I realized there were several timing problems. Ask me about it later.

Finally, to end this post, I want to briefly discuss the final architecture which Jose suggested. It was pretty insightful, I am not sure if I would think of it: he suggested using both techniques. I think this is a good idea too. Jose suggested that the blob detection and blob stitching be threaded in the same program, and for gesture detection to run as a separate process and communicate via sockets.

I think this is a good idea for two reasons. 1, we get the performance edge of threads. 2, the gesture detection has a good modular divide from everyone else. The two components are now decoupled pretty well, so they could stand better alone also. Where to draw the line as far as moving things over the network is a little arbitrary, but consider this: our high level software block diagram has TUIO coming out of the blob stitching. This is a natural fit then: just have the blob stitching serve up TUIO (probably non-trivial!) and then ask the gesture daemon to connect. This makes it easier for existing apps to run on our system (since we have more of an excuse to implement some kind of TUIO server) and conversely for our gesture daemon to run on other people's systems.

tl;dr:

I think I might want to pull some CS115 moves and try to have another software meeting to work out a precise object model. I can bring my book from that class and some of my work from last quarter to show people what I'm talking about. I need to study up on it some first, though.

No comments:

Post a Comment