For such a large amount of data, I wouldn't store it all in a searchable datastore. I would pre-process it to extract metadata, store that metadata in a datastore, and keep the bulk data on a filesystem. That's Google's approach with GFS, and a common practice on AWS with S3 as the filesystem and SimpleDB as the metadata store. If you're not on Amazon's platform, I would consider Riak as the metadata store.

Don't use SimpleDB unless all of your processing happens inside AWS; otherwise network latency and bandwidth will kill you. And don't even try to store the data itself in SimpleDB, or at least read http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/SDBLimits.html?r=3544 first to understand why it will be a nightmare on the ops side.

For the filesystem, you can use a distributed filesystem such as GlusterFS, MooseFS, Ceph or MogileFS; you could use S3 as a service if you don't query it often; or you could simply use a pool of storage nodes and use the metadata to find out which node(s) hold which piece of data.
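
To make the split concrete, here is a minimal Python sketch of that pattern, under my own assumptions: the `STORAGE_NODES` paths, the `metadata_store` dict, and the `put_blob`/`get_blob` helpers are hypothetical stand-ins (the dict plays the role Riak or SimpleDB would, the local directories stand in for remote storage nodes or S3 prefixes). The point is just that only small, queryable attributes go into the metadata store, while the bulk data lands on whichever storage node the content hash maps to, and reads always consult the metadata first to find the data.

    import hashlib
    import json
    import os

    # Hypothetical pool of storage nodes, simulated with local directories.
    # In practice these would be remote servers or S3 buckets/prefixes.
    STORAGE_NODES = ["./node-a", "./node-b", "./node-c"]

    # Stand-in for the metadata store (Riak, SimpleDB, ...): key -> JSON document.
    metadata_store = {}


    def put_blob(key: str, data: bytes) -> None:
        """Write the blob to a node chosen by hashing its content,
        then record where it lives plus searchable attributes as metadata."""
        digest = hashlib.sha256(data).hexdigest()
        node = STORAGE_NODES[int(digest, 16) % len(STORAGE_NODES)]
        os.makedirs(node, exist_ok=True)
        with open(os.path.join(node, digest), "wb") as f:
            f.write(data)
        # Only small, queryable attributes go into the metadata store.
        metadata_store[key] = json.dumps(
            {"node": node, "sha256": digest, "size": len(data)}
        )


    def get_blob(key: str) -> bytes:
        """Look up the metadata first, then fetch the blob from the node it names."""
        meta = json.loads(metadata_store[key])
        with open(os.path.join(meta["node"], meta["sha256"]), "rb") as f:
            return f.read()


    if __name__ == "__main__":
        put_blob("report-2011-03", b"...large binary payload...")
        print(json.loads(metadata_store["report-2011-03"]))
        assert get_blob("report-2011-03") == b"...large binary payload..."

Swapping the dict for a real Riak or SimpleDB client, and the local directories for S3 or a distributed filesystem, doesn't change the shape of the code: the metadata record is what tells you where to look.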