Tuesday, 9 April 2013

Lee Going Perceptual - Final Part

Swan Song

Well folks, this final blog pretty much wraps up my entry into the Ultimate Coder Challenge, and I hope you found it interesting and inspiring. The last seven weeks have seen some very late nights, some triumphs and a fair few setbacks on a quest to build a Perceptual Computing app.



I hope this diary has helped and will help other coders who are considering entering the world of Perceptual coding.  It is an exciting new technology to get involved in and I invite anyone with a sense of adventure to give it a go.

The Final App

By the grace of Intel I have a few more days before I submit my final final version of the app to the judges, but I wanted to take the opportunity to release an almost final version today as part of my final blog.




If you would like to download and try it out with a friend, and let me know if anything horrible occurs, it will help me produce a solid final final version when the time comes.  I will be testing internally of course, but there is no substitute for external testing and an app like this needs a lot of field testing!

Instructions

The download is a complete installer which will install DirectX, the Perceptual SDK and of course the app itself.  Once complete, you can launch the app from the installed desktop icon.  If you have already installed the Perceptual SDK, you can click Cancel to skip that part of the installer.  You may need to restart your PC/Ultrabook after installing the SDK before the camera springs to life!



If you are lucky enough to own a Gesture Camera, select or say the word CAMERA to see yourself as a virtual 3D avatar and see what happens when you move your finger really close to the lens!  Everyone else is welcome to try the CALL feature which will connect two on-line users in a virtual conference call, including the ability to sketch to the screen.  To clear the sketch, just press the SPACE BAR.

I have added built-in video help in the form of the ASSISTANCE button and to exit the app entirely touch or speak the word TERMINATE.


Firewalls and Routers

Setting up your PC to allow the software to communicate over the network is potentially the trickiest part of this app. It's a two-step process. First you must configure your Router/Cable/ADSL modem to allow the app to both send and receive data. You do this by accessing your Port Forwarding settings (on my BT HomeHub the address is 192.168.1.254) and ensuring both TCP and UDP data can travel both ways through port 15432, and that all such traffic goes to the local IP address of the PC that will make or receive calls.



The second task is to configure any firewalls you have in place along the same lines, ensuring the local app (installed by default to your Documents folder) is permitted two-way TCP and UDP communication on all ports.  At your own risk, you can optionally disable the firewall while you run the app should firewall configuration prove uncooperative.

The app is designed to work over the internet by default, but if you want to use the app over your local network (LAN), you can locate the app folder, rename "_ipoverride.txt" to "ipoverride.txt" and amend the contents to the local IP address of the PC you have installed the app onto.  Do the same with the app on another PC in the network and you will then be able to discover registered users locally.  Alternatively, you can simply enter the IP address manually from the CALL screen.
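
How the app itself reads the override file is not shown in this post, so treat the following as a minimal sketch only. It assumes the file holds a single IP address on its first non-blank line, and ReadIpOverride is a hypothetical helper name, not a function from the app:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not from the app's source): returns the first
// non-blank line of the override file with surrounding whitespace removed,
// or an empty string when the file is missing or contains nothing useful.
std::string ReadIpOverride(const std::string& path)
{
    std::ifstream file(path.c_str());
    if (!file.is_open()) return std::string();

    const char* whitespace = " \t\r\n";
    std::string line;
    while (std::getline(file, line))
    {
        const std::string::size_type first = line.find_first_not_of(whitespace);
        if (first == std::string::npos) continue; // blank line, keep looking
        const std::string::size_type last = line.find_last_not_of(whitespace);
        return line.substr(first, last - first + 1);
    }
    return std::string();
}
```

An empty result would mean "no override, use the internet lookup as normal", which mirrors the renamed "_ipoverride.txt" being ignored by default.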


The Last Seven Days

Most of the work since the last blog has been focused on improving the speed of the app, the audio visual synchronisation and the front-end GUI polish. For some reason my new GUI has been inspired by Chocolate and on a fast Ultrabook the app can reach 60 fps now thanks to some last minute jiggery-pokery.

I also invented a new tracker :)  Instead of a head, eye, gaze, hand, tongue or foot tracker, I created a 'body mass' tracker. That is, the camera figures out where most of the 'mass' exists within shot and provides a coordinate to reflect this. The advantage over a head tracker is that individual hands and other protrusions don't affect the track, meaning when you lean left or right you are almost certainly going to get the desired effect, which in the case of the app is to move the camera position so you can see left and right along the table.
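
For the curious, the idea can be sketched in a few lines of C++. This is not lifted from the app: it assumes a row-major 16-bit depth buffer, and the names FindBodyMassCentre, nearLimit and farLimit are made up for illustration. Every reading inside the depth window counts as one unit of 'mass' and the result is simply the average position:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct MassCentre { bool valid; float x; float y; };

// Illustrative sketch: average the coordinates of every depth reading that
// falls inside [nearLimit, farLimit]. Readings outside that window (the
// backdrop, or missing data) contribute nothing.
MassCentre FindBodyMassCentre(const std::vector<std::uint16_t>& depth,
                              int width, int height,
                              std::uint16_t nearLimit, std::uint16_t farLimit)
{
    const MassCentre none = { false, 0.0f, 0.0f };
    if (width <= 0 || height <= 0) return none;
    if (depth.size() < static_cast<std::size_t>(width) * static_cast<std::size_t>(height)) return none;

    double sumX = 0.0, sumY = 0.0;
    long long count = 0;
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const std::uint16_t d = depth[static_cast<std::size_t>(y) * width + x];
            if (d < nearLimit || d > farLimit) continue;
            sumX += x;
            sumY += y;
            ++count;
        }
    }
    if (count == 0) return none;

    const MassCentre centre = { true,
                                static_cast<float>(sumX / count),
                                static_cast<float>(sumY / count) };
    return centre;
}
```

Because a hand is only a small share of the total, waving it about nudges the average far less than leaning your whole body does.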

I could not resist adding a little spark to the new CAMERA mode. Now when you wave your finger in front of the camera, a green spark is emitted. When you move the finger closer to the camera, it blasts out hot red sparks. 


It is the precursor to an instant-feedback gesture system that I have a few ideas for, but perhaps this is something for another day :)  It was fun to see when I added it though, so I kept it in for you.

The Full Source Code

For coding fans, find below a link to the entire DBP and CPP source for the app which are not pretty, efficient or elegant, but they are honest and readable, and a great place to cut and paste from:

http://videochat.thegamecreators.com/PerceptuCam.dba
http://videochat.thegamecreators.com/PerceptuCam.cpp

If there is enough demand, I will formalise the Perceptual Computing code into official DBP commands, and perhaps even migrate them to AGK as well.


That's All Folks!

I'd like to extend my thanks to Bob and Wendy for their diligence and professionalism, making this competition a smooth and enjoyable experience. Big hugs to all the judges for your encouragement and of course for putting up with my occasionally rambling blog posts. I would also like to tip my hat to the six teams, comrades who made the competition feel more like a chilled out V.I.P Hackathon. I've learned so much from you guys these past few weeks and I'm humbled to have been part of this select group.

Mushy stuff over with, I'll finish by saying goodbye and I look forward to dismantling more cutting edge technology for you in the very near future.



Friday, 5 April 2013

Lee Going Perceptual - Part Six

Read the main blog direct from the Intel site here:

http://software.intel.com/en-us/blogs/2013/03/31/ultimate-coder-challenge-ii-lee-going-perceptual-week-six

I returned from GDC 2013 with a good sense of what I needed to do. With precious little time to do it in, find out what happened in Part Seven, due to be published on the 8th April 2013...

Tuesday, 19 March 2013

Lee Going Perceptual - Part Five

Voice Control And Other Animals

Rather than repeat my blog post, here is the link to the original one:

http://software.intel.com/en-us/blogs/2013/03/17/ultimate-coder-challenge-ii-lee-going-perceptual-week-five

And my video for this week:



And There's More

After I posted the blog I solved the Voice Control problem.  Here is the code and solution - hurray!

First you set up your grammar:


// Grammar initialization
pxcUID gid;
pmyVS->vc->CreateGrammar(&gid);
pmyVS->vc->AddGrammar(gid,1,L"Host");
pmyVS->vc->AddGrammar(gid,2,L"Call");
pmyVS->vc->AddGrammar(gid,3,L"Exit");
pmyVS->vc->AddGrammar(gid,4,L"Conference");
pmyVS->vc->AddGrammar(gid,5,L"Rick");
pmyVS->vc->AddGrammar(gid,6,L"Lee");
pmyVS->vc->AddGrammar(gid,7,L"Toggle");
pmyVS->vc->AddGrammar(gid,8,L"Import");
pmyVS->vc->AddGrammar(gid,9,L"Magnificent");
pmyVS->vc->AddGrammar(gid,10,L"Ploppy");
pmyVS->vc->AddGrammar(gid,11,L"Horse");
pmyVS->vc->AddGrammar(gid,12,L"Dog");
pmyVS->vc->AddGrammar(gid,13,L"Lion");
pmyVS->vc->AddGrammar(gid,14,L"Wolf");
pmyVS->vc->SetGrammar(gid);


And then you look for the respective value in the callback function:


gLatestVoiceCommand = -1;
int labelvalue = cmd->label;
if ( labelvalue > 0 )
{
    gLatestVoiceCommand = labelvalue;
}


Presto, a VERY fast voice recognition system!   When realisation dawned, I kicked myself, then wanted to tell someone, so I am telling my blog :)
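
If you want to act on that value in your main loop, something like the sketch below would do. It is only an illustration: ConsumeVoiceCommand and DescribeVoiceCommand are hypothetical helpers, not SDK calls, and only the first three labels from the grammar above are mapped. If your callback fires on a different thread from your main loop, you would also want to guard the shared variable.

```cpp
#include <string>

// Written by the recognition callback, as in the snippet above.
int gLatestVoiceCommand = -1;

// Hypothetical helper: fetch the pending label once, then clear it so the
// same spoken word is not acted on twice.
int ConsumeVoiceCommand()
{
    const int label = gLatestVoiceCommand;
    gLatestVoiceCommand = -1;
    return label;
}

// Hypothetical mapping from grammar label to an action name. The label
// numbers follow the AddGrammar calls above (1 = Host, 2 = Call, 3 = Exit).
std::string DescribeVoiceCommand(int label)
{
    switch (label)
    {
        case 1:  return "host";
        case 2:  return "call";
        case 3:  return "exit";
        default: return "none";
    }
}
```

Each frame you would call ConsumeVoiceCommand() once and branch on the result, so a single utterance triggers a single action.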

Signing Off

Decided to break my challenge tradition and do some Ultimate Coding during the week to get a good demo for GDC ready.  The size of another project on my plate shrinks this week so I should have some quality time :)

Monday, 11 March 2013

Lee Going Perceptual - Part Four


This Is Lee Calling

With the realisation that I couldn't top my week three video, I decided the smart thing was to get my head down and code the necessaries to turn my prototype into a functioning app. This meant adding a front end to the app and getting the guts of the conferencing functionality coded.


I also vowed not to bore the judges and fellow combatants to tears this week, and will stick mainly to videos and pictures.

Latest Progress

This main video covers the new additions to the app, and attempts to demonstrate conferencing in action. With only one head and two cameras, the biggest challenge was where to look.


I also made a smaller video from a mobile camera so you can see both PCs in the same shot, and you will also see a glimpse of what the Perceptual Camera data looks like once it’s been chewed up by network packets.


Top priority in week five will be to reduce the network packet size and improve the rendered visuals so we don’t see the disconcerting transition effects in the current version.

How My Perceptual 3D Avatar Works

One of the Ultimate Coder Judges asked how the 3D avatar was constructed in the week three demo, and it occurs to me that this information may be of use to other coders so here is a short summary of the technique.


The Gesture Camera provides a 16-bit plane of depth data streaming in at 30 frames per second, which produces very accurate measurements of distance from the camera to a point on the subject in front of the camera. This depth data also provides a reference offset to allow you to look up the colour at that point too.

Once the camera is set up and actively sending this information, I create a grid of polygons, 320 by 240 evenly spaced on the X and Y axis. The Z axis of the vertex at each corner is controlled by the depth data, so a point furthest from the camera would have the greatest Z value. Looking at this 3D construct front on you would see the polygons with higher Z values nearer to the render viewpoint. I then take the camera colour detail for that point and modify the ‘Diffuse’ element of the respective vertex to match it. The 3D model is not textured. The vertex coordinates are so densely packed together that they produce a highly detailed representation of the original colour stream.

This process is repeated 30 times per second in sync with the rate at which the video stream outputs each frame, providing a high fidelity render.  Points that are too far in the distance have the alpha component of the diffuse set to zero, making them invisible to the user. This removes the backdrop from the rendered mesh, creating an effective contour.
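
As a rough illustration of that per-frame step, here is a small C++ sketch. It is not the code from the app: it assumes the colour for each depth pixel has already been looked up into a matching array of 0x00RRGGBB values, and the names AvatarVertex, BuildAvatarVertices, farLimit and depthScale are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct AvatarVertex
{
    float x, y, z;
    std::uint32_t diffuse; // packed as 0xAARRGGBB
};

// Illustrative sketch: one vertex per depth reading, evenly spaced on X and
// Y, with Z driven by depth and the camera colour copied into the diffuse.
// Readings beyond farLimit keep their colour but get zero alpha, which hides
// the backdrop.
std::vector<AvatarVertex> BuildAvatarVertices(
    const std::vector<std::uint16_t>& depth,
    const std::vector<std::uint32_t>& colour, // assumed: 0x00RRGGBB per depth pixel
    int width, int height,
    std::uint16_t farLimit, float depthScale)
{
    std::vector<AvatarVertex> vertices;
    if (width <= 0 || height <= 0) return vertices;
    const std::size_t count = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    if (depth.size() < count || colour.size() < count) return vertices;

    vertices.reserve(count);
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            const std::uint16_t d = depth[i];
            const std::uint32_t alpha = (d > farLimit) ? 0x00u : 0xFFu;

            AvatarVertex v;
            v.x = static_cast<float>(x) - width * 0.5f;   // centre the grid on X
            v.y = height * 0.5f - static_cast<float>(y);  // flip so +Y is up
            v.z = d * depthScale;                         // further away = greater Z
            v.diffuse = (alpha << 24) | (colour[i] & 0x00FFFFFFu);
            vertices.push_back(v);
        }
    }
    return vertices;
}
```

Running this once per camera frame and copying the result into the vertex buffer gives the 30-updates-per-second behaviour described above.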

The advantage in converting camera stream data into vertex data is that you have direct access to a 3D representation of any object in front of the camera, and the possibility exists to apply reduction and optimisation algorithms from the 3D world that could never have been used on a 2D stream.

Voice Over In Pain

Here is a summary of my attempt to get voice networking into my app. I tried to compile the Linphone SDK in Visual Studio: no joy. An old VS2008 project I found on Google Code: no joy. LibJingle, to see if that would help: no joy. I checked out and attempted to compile MyBoghe: many dependency errors (no joy). After about six hours of fruitless toil, I found myself looking closer to home. It turns out DarkBASIC Pro released a module many moons back called DarkNET which provides full TCP/UDP networking commands and, yes you guessed it, built-in VOIP commands!  A world of pain has been reduced to about six commands that are fully compatible with the language I am using. Once I discovered this, my conferencing app came on in leaps and bounds.

Signing Off

As promised I have kept my blog shorter this week. I hope you liked the app and video, and please let me know if you would like blog five to include lots of source code. Next week is my last week for finishing app functionality, so we shall see VOIP (so you can hear and speak in the conference call) and optimisations and compatibility testing so you can run the app in a variety of scenarios.  Given the time constraints, I am aiming to limit the first version of the app to two users in order to cram as much fidelity into the visuals as possible.  This will also give me time to refine and augment the Perceptual Computing elements of the app, and show off more of what the Gesture Camera can do.

P.S. Hope Sascha is okay after sitting on all those wires. Ouch!

Source Code For Swipe Tracking

A request in the comments section of the sister blog to this one suggested some source code would be nice. I have extracted the best 'Perceptual Bit' from week four which is the swipe gesture detection. As you will see, it is remarkably simple (and a bit of a fudge), but it works well enough most of the time. Here you go:


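// track left/right sweeps
if ( bHaveDepthCamera )
{
    if ( iNearestX[1]!=0 )
    {
        // clear the swipe signal (set to 5 further down when a swipe completes)
        iNearestY[1] = 0;

        // stage 0 -> 1 : nearest point starts out beyond X=240
        if ( iSwipeMode==0 && iNearestX[1] > 160+80 )
        {
            iSwipeMode=1;
            iSwipeModeLives=25;
        }

        // stage 1 -> 2 : point has moved into the 160..240 band
        if ( iSwipeMode==1 && iNearestX[1] > 160+0 )
        {
            if ( iNearestX[1] < 160+80 )
            {
                iSwipeModeLives=25;
                iSwipeMode=2;
            }
        }
        else
        {
            // otherwise burn a life, and give up on the swipe when they run out
            iSwipeModeLives--;
            if ( iSwipeModeLives < 0 ) iSwipeMode=0;
        }

        // stage 2 -> 3 : point has moved into the 80..160 band
        if ( iSwipeMode==2 && iNearestX[1] > 160-80 )
        {
            if ( iNearestX[1] < 160+0 )
            {
                iSwipeModeLives=25;
                iSwipeMode=3;
            }
        }
        else
        {
            iSwipeModeLives--;
            if ( iSwipeModeLives < 0 ) iSwipeMode=0;
        }

        // stage 3 -> done : point has passed below X=80
        if ( iSwipeMode==3 && iNearestX[1] < 160-80 )
        {
            // swiped
            iNearestY[1] = 5;
            iSwipeMode = 0;
        }
        else
        {
            iSwipeModeLives--;
            if ( iSwipeModeLives < 0 ) iSwipeMode=0;
        }
    }
}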
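
If you would rather have the same idea wrapped up in something you can test in isolation, here is one possible tidy-up. It is a sketch and not a drop-in replacement: the SwipeDetector name and its members are invented, it keeps the 160 centre, 80 band and 25 frame patience from the code above, and it burns only one 'life' per frame rather than several, so its timing differs slightly from the original.

```cpp
// Illustrative swipe detector: the nearest point must pass through four
// zones in order (X > 240, 160..240, 80..160, X < 80), taking no more than
// 'patience' frames between one zone and the next.
struct SwipeDetector
{
    int stage;
    int lives;

    SwipeDetector() : stage(0), lives(0) {}

    // nearestX: X coordinate of the nearest point (0 = nothing tracked,
    // matching the check in the original code). Returns true on the frame
    // a swipe completes.
    bool Update(int nearestX)
    {
        const int centre = 160, band = 80, patience = 25;
        if (nearestX == 0) return false;

        if (stage == 0)
        {
            if (nearestX > centre + band) { stage = 1; lives = patience; }
            return false;
        }

        // Has the point reached the zone required by the current stage?
        bool advanced = false;
        if (stage == 1)      advanced = nearestX > centre && nearestX < centre + band;
        else if (stage == 2) advanced = nearestX > centre - band && nearestX < centre;
        else                 advanced = nearestX < centre - band;

        if (advanced)
        {
            if (stage == 3) { stage = 0; return true; }
            ++stage;
            lives = patience;
            return false;
        }

        // Not there yet: burn a life and abandon the swipe when they run out.
        if (--lives < 0) stage = 0;
        return false;
    }
};
```

Feeding it one X value per frame, a sequence such as 300, 200, 120, 40 reports a swipe on the last value, while a hand that lingers outside the expected zone for more than 25 frames resets the detector.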

Monday, 4 March 2013

Lee Going Perceptual - Part Three


Gazing At The Future

Welcome back to my humble attempt to re-write the rule book on teleconferencing software, a journey that will see it dragged from its complacent little rectangular world. It’s true we've had 3D for years, but we've never been able to communicate accurately and directly in that space. Thanks to the Gesture Camera, we now have the first in what will be a long line of high fidelity super accurate perceptual devices.  It is a pleasure to develop for this ground breaking device, and I hope my ramblings will light the way for future travels and travellers. So now, I will begin my crazy rant.

Latest Progress


You may recall that last week I proposed to turn that green face blob into a proper head and transmit it across to another device. The good news is that my 3D face looks a lot better, and the bad news is that getting it transmitted is going to have to wait.  In taking the advice of judges, I dug out a modern webcam product and realised the value-adds were nothing more than novelties. The market has stagnated, and the march of Skype and Google Talk does nothing more than perpetuate a flat and utilitarian experience.

I did come to appreciate however that teleconferencing cannot be taken lightly. It’s a massive industry and serious users want a reliable, quality experience that helps them get on with their job. Low latency, ease of use, backwards compatibility and essential conferencing features are all required if a new tool is to supplant the old ones.

Voice over I.P Technology

I was initially tempted to write my own audio streaming system to carry audio data to the various participants in the conferencing call, but after careful study of existing solutions and the highly specialised disciplines required, I decided to take the path of least resistance and use an existing open source solution. At first I decided to use the same technology Google Talk uses for audio exchange but after a few hours of research and light development, it turns out a vital API was no longer available for download, mainly because Google had bought the company in question and moved the technology onto HTML5 and JavaScript. As luck would have it, Google partnered with another company who they did not buy called Linphone, and they provide a great open source solution that is also cross platform compatible with all the major desktops and mobiles.

https://www.linphone.org/

A long story short, this new API is right up to date and my test across two Windows PCs, a Mac and an iPad in four way audio conferencing mode worked a treat.  Next week I shall be breaking down the sample provided to obtain the vital bits of code needed to implement audio and packet exchange between my users. As a bonus, I am going to write it in such a way that existing Linphone client apps can call into my software to join the conference call, so anyone with regular webcams or even mobile phones can join in. I will probably stick a large 3D handset in the chair in place of a 3D avatar, just for fun.

On a related note, I have decided to postpone even thinking about voice recognition until the surrounding challenges have been conquered. It never pays to spin too many plates!

Gaze Solved? – Version One

In theory, this should be a relatively simple algorithm. Find the head, then find the eyes, then grab the RGB around the eyes only. Locate the pupil at each eye, take the average, and produce a look vector. Job’s a good one, right? Well, no. At first I decided to run away and find a sample I once saw at an early Beta preview of the Perceptual SDK which created a vector from face rotation which was pretty neat. Unfortunately that sample was not included in Beta 3, and it was soon apparent why. On exploring the commands for getting ‘landmark’ data, I noticed my nose was missing. And more strikingly, all the roll, pitch and yaw values were empty too. Finding this out from the sample saved me a bucket load of time, which I would have lost had I added the code to my main app first. Phew. I am sure it will be fixed in a future SDK (or I was doing something silly and it does work), but I can’t afford the time to write even one email to Intel support (who are great by the way). I needed Gaze now!

I plumped for option two: write everything myself using only the depth data as my source.  I set to work and implemented my first version of the Gaze Algorithm. I have detailed the steps in case you like it, and want to use it:
  1. Find the furthest depth point from the upper half of the camera depth data
  2. March left and right to find the points at which the ‘head’ depth data stops
  3. Now we know the width of the head, trace downwards to find the shoulder
  4. Once you have a shoulder coordinate, use that to align the Y vector of the head
  5. You now have a stable X and Y vector for head tracking (and Z of course)
  6. Scan all the depth between the ears of the face, down to the shoulder height
  7. Add all depth values together, weighting them as the coordinate moves left/right
  8. Do the same for top/bottom, weighting them with a vertical multiplier
  9. You are essentially using the nose and facial features to track the bulk of the head
  10. Happily, this bulk determines the general gaze direction of the face
  11. You have to enhance the depth around the nose to get better gaze tracking
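
Steps 6 to 8 can be sketched roughly as below. This is an interpretation of the list rather than the code from the app: it assumes the head rectangle has already been found by the earlier steps, treats a reading of zero as 'no data', and weights each point by how near it is to the camera (farLimit minus depth) so the nose pulls hardest. EstimateGaze and its parameters are invented names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct GazeEstimate { bool valid; float x; float y; };

// Illustrative sketch: sum the 'nearness' of every reading inside the head
// rectangle, weighted by its horizontal and vertical offset from the centre
// of that rectangle. The result is roughly -1..+1 on each axis, where 0
// means the bulk of the face is centred.
GazeEstimate EstimateGaze(const std::vector<std::uint16_t>& depth,
                          int width, int height,
                          int left, int right, int top, int bottom,
                          std::uint16_t farLimit)
{
    const GazeEstimate none = { false, 0.0f, 0.0f };
    if (width <= 0 || height <= 0) return none;
    if (depth.size() < static_cast<std::size_t>(width) * static_cast<std::size_t>(height)) return none;
    if (left < 0 || top < 0 || right >= width || bottom >= height) return none;
    if (right <= left || bottom <= top) return none;

    const double centreX = (left + right) * 0.5;
    const double centreY = (top + bottom) * 0.5;
    const double halfW = (right - left) * 0.5;
    const double halfH = (bottom - top) * 0.5;

    double sumWeight = 0.0, sumX = 0.0, sumY = 0.0;
    for (int y = top; y <= bottom; ++y)
    {
        for (int x = left; x <= right; ++x)
        {
            const std::uint16_t d = depth[static_cast<std::size_t>(y) * width + x];
            if (d == 0 || d >= farLimit) continue; // no reading, or backdrop
            const double weight = static_cast<double>(farLimit) - d; // nearer = heavier
            sumWeight += weight;
            sumX += weight * (x - centreX);
            sumY += weight * (y - centreY);
        }
    }
    if (sumWeight == 0.0) return none;

    const GazeEstimate gaze = { true,
                                static_cast<float>(sumX / (sumWeight * halfW)),
                                static_cast<float>(sumY / (sumWeight * halfH)) };
    return gaze;
}
```

Step 11 (enhancing the depth around the nose) would amount to boosting the weight of the nearest readings before summing, which is left out here to keep the sketch short.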

I have included my entire source code to date for the two DBP commands you saw in the last blog so you can see how I access the depth and colour data, create 3D constructs and handle the interpretation of the depth information.  This current implementation is only good enough to determine which corner of the screen you are looking at, but I feel with more work this can be refined to provide almost pinpoint accurate gazing.


Interacting with Documents

One thing I enjoyed when tinkering with the latest version was holding up a piece of paper, maybe with a sketch on it, and shouting ‘scan’ in a firm voice.  Nothing happened of course, but I imagined what could happen. We still doodle on paper, or have some article or clipping during a meeting. It would be awesome if you could hold it up, bark a command, and the computer would turn it into a virtual item in the conference. Other attendees could then pick it up (copy it I guess), and once received could view it or print it during the call. It would be like fax but faster! I even thought of tying in your tablet to the conference call too, so when a document is shared, it instantly goes onto a tablet carousel so everyone who has a tablet can view the media. It could work in reverse too, so you could find a website or application, and then just wave the tablet in front of the camera; the camera would detect you are waving your tablet and instantly copy the contents of the tablet screen to the others in the meeting.  It was around this time I switched part of my brain off so I could finish up and record the video for your viewing pleasure.

Developer Tips

TIP 1 : Infra-red gesture cameras and 6AM sun rise do not mix very well. As I was gluing Saturday and Sunday together, the sun’s rays blasted through the window and disintegrated my virtual me. Fortunately a wall helped a few hours later.  For accurate usage of the gesture camera, ensure you are not bathed in direct sunlight!


TIP 2 : If you think you can smooth out and tame the edges of your depth data, think again. I gave this one about five hours of solid thought and tinkering, and I concluded that you can only get smoothing by substantially trimming the depth shape. As the edges of a shape leap from almost zero to full depth reading, it is very difficult to filter or accommodate it. In order to move on, I moved on, but I have a few more ideas and many more days to crack this one. The current fuzzy edges are not bad as such, but it is something you might associate with low quality and so I want to return to this. The fact is the depth data around the edges is very dodgy, and some serious edge cleaning techniques will need to be employed to overcome this feature of the hardware.

The Code

Last week you had some DBP code. This week, try some C++. Here is some pretty horrible unoptimised code, but it's all there and you might glean some cut and paste usage from something that's been proved to compile and work:


Next Time

Now I have the two main components running side by side, the 3D construction and the audio conferencing, next week should be a case of gluing them together in a tidy interface. One of the judges has thrown down the gauntlet that the app should support both Gesture Camera AND Ultrabook, so I am going to pretend the depth camera is ‘built’ into the Ultrabook and treat it as one device. As I am writing the app from scratch, my interface design will make full use of touch when touch makes sense and intuitive use of perception for everything else.

P.S. The judges’ video blog was a great idea and fun to watch!   Hope you all had a good time in Barcelona and managed to avoid getting run over by all those meals on wheels.




Monday, 25 February 2013

Lee Going Perceptual : Part Two

So what does Lee have in store for you this week? You may wonder, and you won't have to wonder for long as I have provided a nice big video for those who don't have the stomach for the reams of text to follow.



The Promised Meat

I thought I would give you a day in the life to track the progress of the prototype as it was being built.  This part of the blog can get highly fluffy and very techie, so I advise readers to skip this part and jump past all the date stamps. If you absolutely must drill down into the basement, read on:

01:48 Dug Out A DBP Module and Cuppa

I have decided for the sake of expedience to 'graft' the PerC stuff onto the Basic3D module, the central 3D command set of DBP. I know it's hacky and not very responsible, but there is method to the madness.

I will be able to publicly share my modifications through Google Code when I have finished my PerC stuff so other DBP users can benefit from it.  It also means the contaminating code does not affect the new VS2010 build of the modules which I intend to overhaul before starting into the guts of DBP for another development.

02:44 Two New Commands and a Blob

I've created the first of what may be many DBP apps for this project, and added two new hacky commands called MAKE OBJECT PERCBLOB and UPDATE OBJECT PERCBLOB. I have added code into the module to create and modify the vertices of a basic sphere so I can see this on screen.


Now I know everything is running fine and I can see the 3D, and manipulate it, I just have to replace the mesh form with something that represents the depth data in some reasonable way.  It means if anything goes wrong, it's nothing to do with my 3D, just the data and the code that finds the data. Clever huh!

02:58 Ambitious Way

I was just about to create a basic vertices only grid then realised later on I would have to change four vertices for every coordinate in the depth buffer that changed. It is times like this that you realise taking the slightly longer route during early development will save headaches later. Going to use an index buffer and keep it to one vertex per depth reading, which will make a smaller footprint for the 3D object and make changing it MUCH easier.
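
To show what the index buffer route looks like, here is a small standalone sketch (not the module code). It keeps one vertex per depth reading and describes each grid cell as two triangles that share those vertices. BuildGridIndices is an invented name, and it uses 32-bit indices purely to keep the example simple:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch: build the index list for a width x height grid of
// vertices stored row by row. Each cell becomes two triangles, so changing
// one depth reading means touching exactly one vertex.
std::vector<std::uint32_t> BuildGridIndices(int width, int height)
{
    std::vector<std::uint32_t> indices;
    if (width < 2 || height < 2) return indices;

    indices.reserve(static_cast<std::size_t>(width - 1) * (height - 1) * 6);
    for (int y = 0; y < height - 1; ++y)
    {
        for (int x = 0; x < width - 1; ++x)
        {
            const std::uint32_t topLeft     = static_cast<std::uint32_t>(y * width + x);
            const std::uint32_t topRight    = topLeft + 1;
            const std::uint32_t bottomLeft  = topLeft + static_cast<std::uint32_t>(width);
            const std::uint32_t bottomRight = bottomLeft + 1;

            // First triangle of the cell
            indices.push_back(topLeft);
            indices.push_back(topRight);
            indices.push_back(bottomLeft);
            // Second triangle of the cell
            indices.push_back(topRight);
            indices.push_back(bottomRight);
            indices.push_back(bottomLeft);
        }
    }
    return indices;
}
```

With a vertex-only mesh the same cell would need six separate vertices, and the four cells around any one depth reading would each hold their own copy of it.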

03:16 Who'd Be A Programmer

It turns out 320x240 grid and six indices per face works out at over 400K in size, and a 16-bit indices buffer has a maximum size of 65535. The current DBP uses WORD index values (designed many moons ago when WORDs were faster and more memory efficient than gorging on 32-bit index buffers).  The natural step is to go back to vertex only where I can have a very large mesh. That said, I don't think I will need (or want) to use the entire 320x240 depth area for the final 3D object (unless the subject is extremely fat and wide). As I am both, if I can squeeze myself into a single index buffer it should be good for most users.  I can have 10,922 faces, which works out at roughly 45x40 capture area which is okay to start with.  If I had more time, I would just bite the bullet and step up to DirectX 11 and get mucky with tessellation and geometry shading tricks but I don't have the luxury of time here. I will proceed with my little 45x40 grid and basically detect the best place to grab the depth data from as it's probably not far from the actual area I am interested in.
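
The sums are easy to check with a few lines of C++. This is just arithmetic under one assumption: a grid of width x height vertices has (width-1) x (height-1) cells at six indices each. The function names are made up for the example.

```cpp
#include <cstddef>

// The 16-bit limit quoted above.
constexpr std::size_t kMaxSixteenBitIndices = 65535;

// Number of vertices in a grid with one vertex per depth reading.
constexpr std::size_t GridVertexCount(std::size_t width, std::size_t height)
{
    return width * height;
}

// Number of indices when every grid cell is drawn with six indices.
constexpr std::size_t GridIndexCount(std::size_t width, std::size_t height)
{
    return (width < 2 || height < 2) ? 0 : (width - 1) * (height - 1) * 6;
}

constexpr bool FitsSixteenBitLimit(std::size_t count)
{
    return count <= kMaxSixteenBitIndices;
}

// 320x240 needs 457,446 indices, far beyond the limit...
static_assert(!FitsSixteenBitLimit(GridIndexCount(320, 240)), "full grid should not fit");
// ...while the 45x40 grid needs 10,296 indices and 1,800 vertices.
static_assert(FitsSixteenBitLimit(GridIndexCount(45, 40)), "small grid should fit");
// 65535 / 6 gives the 10,922 figure mentioned above.
static_assert(kMaxSixteenBitIndices / 6 == 10922, "six indices per face");
```

So the small grid sits comfortably inside the limit, with room to grow if the capture area turns out to be too tight.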

03:29 Smug Mode Times Two

I adjusted the size and my entire vertex+index construction code compiled and ran first time perfectly (almost). When I ran the app there was no 3D object. Thanks to the fact I added a CONTROL CAMERA WITH ARROWKEYS to my DBP program, I was able to go for a short walk to view the other side of my object. And presto, there it was. The face culling winding order was reversed, that's all.  I could have been hammering at that for ages wondering where my 3D went, but after over ten years of messing with 3D graphics I know that old chestnut!
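
If you hit the same chestnut and would rather fix the data than switch culling off, one common approach is to swap two indices in every triangle. A minimal sketch, assuming a plain triangle list (FlipWinding is an invented name, not a DBP or DirectX call):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative sketch: reverse the winding order of every triangle in a
// triangle list by swapping its second and third index. Any trailing
// indices that do not form a whole triangle are left alone.
void FlipWinding(std::vector<std::uint32_t>& indices)
{
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3)
    {
        std::swap(indices[i + 1], indices[i + 2]);
    }
}
```

The mesh looks identical afterwards, it just faces the other way as far as the culling test is concerned.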

03:44 Make Short Video

I've just made a short video of the current state of the DBP app, with the wibbly wobbly 3D grid, ready to have depth data added onto it. 



Adding the Depth Data is a tense moment as I will be adding lots of new headers and dependencies from the SDK sample code, and many wonderfully bad things can happen.  Fingers crossed, except mine as I need them for typing.

04:24 HOT TIP : Warning For PerC SDK C++ Users

Been banging my head against a wall wondering why my cut and paste code is not linking properly, and decided to compare the decorated name it was creating with that used by the PerC SDK Util Library. Turns out you cannot use a project that switches off wchar_t as a unique type, as this confuses the linker. This option can be changed under "Project Properties > C/C++ > Language > Treat WChar_t As Built in Type", or via the "/Zc:wchar_t" compiler option.
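
One way to catch this at compile time rather than at link time is a guard like the one below. It relies on the fact that when Visual C++ has the built-in type switched off (/Zc:wchar_t-), wchar_t becomes a typedef for unsigned short. This is a suggestion of mine rather than anything from the SDK, and WCharIsBuiltInType is an invented name:

```cpp
#include <type_traits>

// With wchar_t as a proper built-in type it is distinct from unsigned short.
// If the project has the setting switched off, the two are the same type and
// the build stops here with a readable message instead of a linker error.
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t must be a built-in type (/Zc:wchar_t) to match the library");

// The same check as a value, in case you prefer to report it at run time.
constexpr bool WCharIsBuiltInType()
{
    return !std::is_same<wchar_t, unsigned short>::value;
}
```

Dropping the static_assert into any source file that includes the SDK headers should be enough to flag a misconfigured project early.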

07:13 Way Too Much Fun

I just want to report that I became absorbed with the 3D version of myself. Once I got the basic representation of depth working in 3D, I just kept going. Adding normalisation, averaging vertices, tweaking depth scope and scale, all sorts of tweaks. I had to stop though, as I needed to produce a blog video.

The Final DBP Code So Far

Rem Project: PercA.exe

Rem Created: Saturday, February 23, 2013
`
rem App Init
sync on : sync rate 0 : color backdrop rgb(255,128,0)
`
rem Make 3D floor
make matrix 1,1000,1000,100,100
position matrix 1,-500,-55,-500
`
rem Make 3D Object
load image "brick.png",1
make object percblob 1,50,50,50
set object cull 1,0
`
rem Place camera
position camera 0,0,75
point camera 0,0,0
`
rem Add lights
make light 1 : set directional light 1,0,0.9,-0.1 : color light 1,512,255,0
make light 2 : set directional light 2,0,-0.9,-0.1 : color light 2,0,255,512
make light 3 : set directional light 3,-0.5,0.0,-0.2 : color light 3,-100,255,-100
`
rem Main loop
do
 `
 rem Move camera around
 control camera using arrowkeys 0,1,2
 set point light 0,camera position x(),camera position y(),camera position z()
 set light range 0,200
 `
 rem Each cycle refresh 3D data in object
 update object percblob 1,0,0,0
 `
 rem Prompts
 set cursor 0,0
 print screen fps();"fps"
 if spacekey()=1 then texture object 1,1
 if returnkey()=1 then texture object 1,0
 `
 rem Update screen
 sync
 `
rem End loop
loop

The Progress In Brief

So what you have seen is the depth data from the Perceptual Camera used to generate a 3D construct in a prototype that allows me to move around the object and view it from different angles.

I discovered that there is enough fidelity in the depth data to create a good face shape and with further work I can produce more striking 3D elements. I am also happy with the speed of everything so far, and will probably stick to detecting my own gestures and intents using this raw data too.

The technical side of getting the SDK into the DBP module was painless, and apart from a sticky moment with the wchar_t type, quite easy. I now have a good foundation on which I can add DBP BASIC and C++ willy nilly to solve the various challenges ahead.

The Prototype Binary

If you have a Perceptual Camera all set up, you are welcome to try the prototype yourself. Find it at the end of this link:


I will offer the disclaimer this has only been tested on one machine and is not guaranteed to work on any other. If it does, please comment and let me know as I need all the testers I can get.

Signing Off

The next step will be to round off the 3D object so it's a real head instead of a rubber wall with protrusions.  I also want to get it textured and get it transmitted to another client sooner rather than later.  It's always a good idea to get the main chunks of your functionality in early so you know what general shape your app is going to be.  I am happy with the shape of the app at week two, and looking forward to seeing my Perceptual 3D gubbins evolve.

Monday, 18 February 2013

Lee Going Perceptual : Part One


Welcome to the start of another exciting adventure into the weird and wonderful world of cutting edge software development. Beyond these walls lurk strange unfathomable creatures with unusual names, ready to tear the limbs off any unsuspecting coder careless enough to stray from the path.



For me, this is a journey for off-piste adventurers, reckless pioneers and unconventional mavericks. We're off the chain and aiming high so prepare yourself, it's gonna get messy!

A Bit About Me

My name is Lee Bamber, co-founder of The Game Creators (circa 1999). I have been programming since the age of nine and thanks to my long tenure as a coder, brands now include DarkBASIC, The 3D Gamemaker, FPS Creator, App Game Kit and, soon to be developed, FPSC-Reloaded. I have been invited by Intel to participate in a challenge to exploit cutting edge technologies, which as we say in the UK is right up my cup of tea.


The Challenge



The basic premise of the challenge is to put six elite developers (and me) into a 'room' and ask them to create an app in seven weeks that takes full advantage of either a convertible Ultrabook and/or gesture camera. We will be judged on our coding choreography by a panel of hardened industry experts who expect everything and forgive nothing. At the end, one coder will be crowned the 'Ultimate Coder' and the rest will be consigned to obscurity.



The Video Blog

I will be posting video versions of my blog every week to communicate with expansive arm gestures and silly accents what I cannot describe in text. Hopefully amusing, sometimes informative, certainly low-budget.




The Real Blog


I do like to blog, but a video of me can be too much for the mind to cope with, so I have provided copious amounts of text as a safer alternative. In these pages you will learn about the technical details of the project, any useful discoveries I make and the dangers to be avoided.


The Idea

For this challenge I want to create a new kind of Web Cam software, perhaps even the sort of app you would find bundled with the hardware when you buy the product. Hardware manufacturers only bundle software that has mass market appeal showing off the best of what their device has to offer. Rather than shoe-horn the technology into something I was already doing, or come up with crazy ideas around what I could do with these wonderful new toys, I wanted to produce a relevant app. An app that users want, something that relates this new hardware to the needs of the human user, not the other way around. If my app can fix an existing problem, or improve a situation, or open a new door, then I will have created a good app.


The Perceptual Computing Myth

Forget the movies! That scene out of such and such was not designed with good computer interaction in mind; it was created to entertain. We all know large physical keyboards are better for writing blogs than virtual keyboards or voice dictation. Simple fact. Ask Hollywood for a futuristic keyboard and they'd replace it with a super-intelligent robot, writing the blog for you and correcting your metaphors.

In the real world, we like stuff that 'just works'. The better it works for us, the more we like it. The keyboard works so well we've been using it for over 140 years, but only for writing text. You would not, for example, use it to peel potatoes. Similarly, we would not use Perceptual Interfaces to write a blog, nor would we use them to point at something right in front of our nose; we'd just reach out and touch it.

Context is king, and just as you would not chop tomatoes on your touch tablet, there will be many scenarios where you would not employ Perceptual Computing. What those scenarios are, and to what degree this new technology will improve our lives, remains to be seen. What I do know is that app developers are very much on the front line and the world is watching!


The Development Setup

To create my masterpiece, I have a few tools at my disposal.  My proverbial hammer will be the programming language Dark Basic Professional. It was designed for rapid software development and has all the commands I need to create anything I can dream up.

I will be using an Ivy Bridge-based Desktop PC running at 4.4GHz for the main development and a Creative Gesture Camera device for camera and depth capture.




The Gesture Camera & SDKs

I have created a quick un-boxing video of the Perceptual device I will be using, which comes with a good sized USB cable and handy mounting arm which sits very nicely on my ageing Sony LCD.



The SDKs used will be the Intel Perceptual Computing SDK Beta 3 and the accompanying Nuance Dragon voice SDK.


The Convertible Ultrabook

To test my app for final deployment and for usage scenarios, I will be using the new Lenovo Ideapad Yoga 13. This huge yet slim 13 inch Ultrabook converts into a super fast touch tablet, and it will be interesting to see how many useful postures I can bend the Ultrabook into over the course of this competition.  Here is a full un-boxing video of the device.



I also continued playing with the Yoga 13 after the un-boxing and had a great time with the tablet posture. I made a quick video so you can see how smooth and responsive this form factor was. Very neat.






The State Of Play


As I write this, there is no app, no design and no code. I have a blank sheet of paper and a few blog videos. The six developers I am competing against are established, seasoned and look extremely dangerous. My chances of success are laughable, so given this humorous outcome, I'm just going to close my eyes and start typing. When I open them in seven weeks, I'll either have an amazing app or an amazing lemon.



My Amazing Lemon

Allow me now, with much ado, to get to the point. The app I am going to create for you today will be heralded as the next generation of Web Cam software. Once complete, other webcam software will appear flat and slow by comparison. It will revolutionise remote communication, and set the standard for all web camera software.

The basic premise will be to take the depth information captured from the Gesture Camera and convert it to a real-time 3D mesh. It will take the colour information from the regular camera output and use this to create a texture for the 3D mesh. These assets are then streamed to a client app running on another computer where the virtual simulation is recreated. By controlling the quantity of data streamed, a reliable visual connection can be maintained where equivalent video streaming techniques would fail.  Additionally, such 3D constructs can be used to produce an augmented virtual environment for the protagonists, effectively extracting the participants from the real world and placing them in artificial environments.

Such environments could include board rooms for serious teleconferencing or school rooms for remote teaching. Such measures also protect privacy by allowing you to control the degree to which you replace the actual video footage, from pseudo realistic 3D to completely artificial. You could even use voice recognition to capture your voice and submit the transcript to those watching your webcam feed, protecting your identity further.
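To make the depth-to-mesh idea a little more concrete, here is a minimal sketch in Python. It is an illustration only, not the app's code (the app itself is written in Dark Basic Professional), and the function name depth_to_mesh, the step and max_depth parameters and the depth units are all assumptions made up for the example. It turns a grid of depth readings into vertices and triangles, skips anything far enough away to count as background, and uses a sampling step to trade detail for a smaller amount of data to stream.

```python
def depth_to_mesh(depth, step=1, max_depth=1000):
    """Turn a 2D grid of depth readings into vertices and triangles.

    depth     -- list of rows, each a list of distances (units are arbitrary here)
    step      -- sample every step-th pixel; larger values mean less data to send
    max_depth -- readings at or beyond this count as background and are skipped
    All three names are hypothetical, chosen for this sketch only.
    """
    rows = range(0, len(depth), step)
    cols = range(0, len(depth[0]), step)

    index = {}      # (row, col) -> vertex number
    vertices = []   # (x, y, z) tuples, one per kept sample
    for r in rows:
        for c in cols:
            z = depth[r][c]
            if z < max_depth:
                index[(r, c)] = len(vertices)
                vertices.append((c, r, z))

    triangles = []  # triples of vertex numbers
    for r in rows:
        for c in cols:
            quad = [(r, c), (r, c + step), (r + step, c), (r + step, c + step)]
            # Only build a quad when all four corners survived the depth test.
            if all(corner in index for corner in quad):
                a, b, d, e = (index[corner] for corner in quad)
                triangles.append((a, b, d))
                triangles.append((b, e, d))
    return vertices, triangles


# A flat 4x4 patch of "face" with one background pixel in the corner.
grid = [[500] * 4 for _ in range(4)]
grid[0][0] = 2000

full_vertices, full_triangles = depth_to_mesh(grid, step=1)
coarse_vertices, coarse_triangles = depth_to_mesh(grid, step=2)
print(len(full_vertices), len(full_triangles))
print(len(coarse_vertices), len(coarse_triangles))
```

Raising step is the simplest way to shrink what has to be sent: the coarse mesh above carries far fewer vertices than the full one, at the cost of a blockier result.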

And that's just the start. With real-time access to the depth information of the caster, you can use facial tracking to work out which part of the 3D scene the speaker is interested in. The software would then rotate the camera to focus in on that area, much like you would in real life.  Your hand position and gestures could be used to call up pre-prepared material for the web cast such as images, bullet points and video footage without having to click or hunt for the files. Using voice recognition, you could bring likely material to the foreground as you speak, and use gestures to throw that item into the meeting for the rest of the group to see.
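As a rough illustration of the camera-follows-face idea, the Python sketch below maps a tracked face position to a pair of virtual camera angles and eases towards them so the view does not jitter. Everything in it is an assumption for the sake of the example: the function names look_angles and smooth, the normalised 0 to 1 image coordinates, and the angle limits. None of it comes from the Perceptual Computing SDK.

```python
def look_angles(face_x, face_y, max_yaw=30.0, max_pitch=20.0):
    """Map a face centre in normalised image coordinates (0..1) to camera angles.

    (0.5, 0.5) is the middle of the image and gives no rotation.
    max_yaw and max_pitch are hypothetical limits in degrees.
    """
    yaw = (face_x - 0.5) * 2.0 * max_yaw
    pitch = (0.5 - face_y) * 2.0 * max_pitch  # image y grows downwards
    return yaw, pitch


def smooth(current, target, factor=0.2):
    """Move a fraction of the way from current to target each frame."""
    return current + (target - current) * factor


# Ease the camera towards a face that sits right of centre.
camera_yaw = 0.0
target_yaw, target_pitch = look_angles(0.75, 0.5)
for _ in range(3):
    camera_yaw = smooth(camera_yaw, target_yaw)
print(camera_yaw)
```

The smoothing step matters more than the mapping: raw tracking data tends to wobble from frame to frame, and a camera that follows it exactly would be unpleasant to watch.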

Current web cam and web casting technologies use the camera in an entirely passive way. All interaction is done with keyboard and mouse. In the real world you don't communicate with other humans by pressing their buttons and rolling them around on the floor (er, most of the time). You stand at arm's length and you just talk, you move your arms and you exchange ideas. This is how humans want things to work.

By using Perceptual Computing technology to enable this elevated form of information exchange, we get closer to bridging the gap between how humans communicate face to face and how they communicate with each other through computers.




Signing Off

Note to judges: quality development is one part inspiration and ten parts iteration. If you feel my blog is too long, too short, too wordy, too nerdy or too silly, or my app is too ugly, too confusing, too broken or too irrelevant, I insist you comment and give me your most candid response. I work with a team who prize brutal honesty, with extra brutal. Not only can I handle criticism, I can fold it like origami into a pleasing shape.

Congratulations! You have reached the end of my blog post.  Do pass go, do collect £200 and do come back next week to hear more tantalising tales of turbulence and triumph as I trek through trails of tremendous technology.

NOTE: This blog is also published officially on the IDZ site at: http://software.intel.com/en-us/blogs/2013/02/17/ultimate-coder-challenge-ii-lee-going-perceptual-part-one