Disclaimer: I work at Moodstocks (http://www.moodstocks.com/).

Only Google engineers could really answer your questions, but here are a 
few hints:

# How does it work?

Read "What are some resources about the technological background of 
Google Goggles?" and "How does Plink Art recognize paintings?", but remember 
that the technology has probably evolved since then. I would say their 
in-house descriptors are closer to SURF than to SIFT.

All the image recognition work is done on the server: clients send 
images over the network. As far as I can tell, they are *not* computing the 
descriptors on the client, as this project 
https://github.com/Moodstocks/si... attempts to do.

# What kind of image indexing goes on behind the scenes?

That's probably the hardest thing to know, precisely because it all happens 
behind the scenes. Given the size of their image database, they are 
almost certainly quantizing descriptors against a precomputed vocabulary, 
but that's the easy part. I cannot tell whether they are able to do 
more interesting things, such as adding images to their index instantly 
without recomputing the whole index. Given that they now do this for text, 
and that we have successfully done it at Moodstocks with a similar 
stack, I would say yes.
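To make "quantizing descriptors using a precomputed vocabulary" and "adding images to the index instantly" a bit more concrete, here is a minimal bag-of-visual-words sketch in Python. It is not Google's (or Moodstocks') implementation: the toy 2-D descriptors, the 4-word vocabulary and the names `nearest_word` and `InvertedIndex` are hypothetical, and scoring is plain vote counting rather than the tf-idf weighting real systems usually add.

```python
import math
from collections import defaultdict


def nearest_word(descriptor, vocabulary):
    """Quantize one descriptor to the index of its closest visual word (Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))


class InvertedIndex:
    """Hypothetical bag-of-visual-words index that accepts new images incrementally."""

    def __init__(self, vocabulary):
        # The vocabulary is assumed to be precomputed offline (e.g. by k-means); it never changes here.
        self.vocabulary = vocabulary
        self.postings = defaultdict(set)  # visual word id -> ids of images containing it

    def add_image(self, image_id, descriptors):
        # Only the posting lists of this image's words are updated;
        # the rest of the index is left untouched.
        for d in descriptors:
            self.postings[nearest_word(d, self.vocabulary)].add(image_id)

    def query(self, descriptors, top_k=3):
        # Each distinct visual word in the query votes for every image that contains it.
        votes = defaultdict(int)
        for word in {nearest_word(d, self.vocabulary) for d in descriptors}:
            for image_id in self.postings.get(word, ()):
                votes[image_id] += 1
        # Highest vote count first; ties broken by image id for a stable order.
        return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy 2-D "descriptors" and a 4-word vocabulary, for illustration only.
vocab = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
index = InvertedIndex(vocab)
index.add_image("poster", [(0.1, 0.0), (0.9, 0.1)])
index.add_image("painting", [(0.0, 0.9), (1.0, 1.0)])
before = index.query([(0.05, 0.05), (0.95, 0.0)])

# A new image becomes searchable immediately, with no global rebuild.
index.add_image("flyer", [(0.0, 0.1)])
after = index.query([(0.05, 0.05), (0.95, 0.0)])
print(before)  # [('poster', 2)]
print(after)   # [('poster', 2), ('flyer', 1)]
```

The point of the inverted index is that adding an image only appends its id to a handful of posting lists, which is why incremental indexing is feasible once the vocabulary is fixed.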

# How much space does it all consume?

Lots :) Given the kind of descriptors they use, I would say space grows 
linearly with the number of images indexed, at something like 100 kB per 
image. That is only a baseline, though: it doesn't account for replicas 
and similar overhead.
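The linear estimate is easy to turn into numbers. The sketch below assumes 100 kB = 100,000 bytes per image and a hypothetical collection of one billion images with three replicas; none of these figures come from Google, and `index_size_bytes` is just an illustrative helper.

```python
def index_size_bytes(num_images, bytes_per_image=100_000, replicas=1):
    """Linear storage model: per-image footprint x number of images x number of copies."""
    return num_images * bytes_per_image * replicas


# Hypothetical collection size and replication factor, for illustration only.
raw = index_size_bytes(1_000_000_000)
replicated = index_size_bytes(1_000_000_000, replicas=3)
print(raw / 1e12, "TB before replication")      # 100.0 TB before replication
print(replicated / 1e12, "TB with 3 replicas")  # 300.0 TB with 3 replicas
```

Doubling the number of images doubles the footprint, which is what "grows linearly" means here; the `replicas` factor is the part the 100 kB baseline leaves out.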

That being said, disk space is not really an issue when you are Google. I 
guess they run it on top of GFS now. It would be a much bigger issue if 
you tried to do the same thing yourself :)

# How is it so fast?

It is not. Well, it depends on what you compare it to, but technologies 
that do offline (client-side) recognition are much faster for the user.

On the server, answering a request may take only a few tens of 
milliseconds, but once you take network latency into account, this 
doesn't matter: you are sending a picture over a mobile network, and that 
will always be slow.
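A back-of-the-envelope model shows why the server time gets drowned out. This sketch treats latency as one round trip plus upload time plus server processing, and ignores handshakes, TCP slow start and retries. All the numbers (a 50 kB JPEG, a 500 kbit/s uplink, a 200 ms round trip, 30 ms on the server) are illustrative assumptions, not measurements of Goggles, and `end_to_end_ms` is a hypothetical helper.

```python
def end_to_end_ms(payload_bytes, uplink_kbps, rtt_ms, server_ms):
    """Rough request latency in ms: one round trip + upload time + server processing.

    Simplified model (assumption): ignores handshakes, TCP slow start and retries.
    """
    upload_ms = payload_bytes * 8 / uplink_kbps  # bits / (kbit/s) = milliseconds
    return rtt_ms + upload_ms + server_ms


# Illustrative, hypothetical numbers for a photo sent over a mobile uplink.
total = end_to_end_ms(payload_bytes=50_000, uplink_kbps=500, rtt_ms=200, server_ms=30)
server_share = 30 / total
print(f"total: {total:.0f} ms, server share: {server_share:.1%}")
# total: 1030 ms, server share: 2.9%
```

Under these assumptions, the upload alone takes 800 ms, so even halving the server time would barely change what the user experiences.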

Their most impressive feat is not their speed, it is the size of their 
index.

That being said, at Google's scale they have no other choice. You might 
expect them to speed things up by reducing the size of the data they 
send, but as far as I know they use JPEG blobs.

Now they're Google, so they are probably working hard on what's really 
important: reducing network latencies...