In my opinion the main problem is that we need to stop thinking in terms of Computer Vision. The issues we are having now increasingly have to do with AI in general rather than with CV. We have a very hard time teaching computers concepts, and the notion of "concept" itself.

Take books, for instance. As a child you are shown books and learn what they are. One day you encounter a comic, and to you it is a book. But once you have seen a few of them, you understand that you have to create a new category for them. Even if nobody tells you that comics are a subcategory of books, you can come up with it independently. Now take e-books and audiobooks. They could be seen as subcategories of books too. Yet if you had asked somebody years ago what a book is, the answer would probably have involved ink and paper. Concepts evolve with experience. That means you cannot just take a fixed corpus of labelled data and form categories from it. You need emergence: software must be able to find new categories by itself. You need online, mostly unsupervised learning.

Concepts go beyond that, too. They are fuzzy. Take "broken", for instance: if you have seen broken toys and broken glasses, how do you recognize a broken door? What about a broken TV? Harder: broken software? If you see a picture of a miniature car, how do you know it is a miniature? Because you infer the scale from context. So it is not enough to segment objects and recognize them independently of context; you need global scene understanding.

These problems are not exclusive to vision; they are core AI problems. Many of them apply to other senses as well. And after all, blind humans are still much better at most things than our algorithms.

In the 70s people were focusing on core AI, and some thought it would solve everything. They were proved wrong when their techniques were crushed by much simpler, statistical ones on some of the problems they were expected to solve (for instance hidden Markov models for speech recognition).
So the pendulum swung to more focused, specialized, low-level research. Now I think the pendulum is swinging again. The most impressive recent results in CV (classification) involve neural networks and deep learning. What these teams have done is leverage relatively simple algorithms, massive computing power and large volumes of data to take on sophisticated, hand-tuned algorithms. And they have won by a huge margin. It looks like Peter Norvig was proved right once again. So the most important problem of CV may well be: how do we stop solving those low-level problems ourselves, and instead formulate them so that computers can solve them for us?
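
To make the earlier point about emergence a bit more concrete, here is a minimal sketch of online, unsupervised category formation. It is only an illustration under strong assumptions: examples arrive as small numeric feature vectors, "similarity" is plain Euclidean distance, and a new category is opened whenever an example is farther than a fixed threshold from every known category (a simple "leader"-style clustering heuristic). The class name `OnlineCategorizer` and the parameter `new_category_distance` are made up for this example; this is nowhere near real concept learning.

```python
import math


class OnlineCategorizer:
    """Toy online clustering: categories are created as the data demands them."""

    def __init__(self, new_category_distance):
        # Hypothetical threshold: beyond this distance an example is "something new".
        self.new_category_distance = new_category_distance
        self.centroids = []  # one running-mean feature vector per category
        self.counts = []     # number of examples absorbed by each category

    def observe(self, point):
        """Assign one example to a category, creating a new one if nothing fits."""
        best_index, best_distance = None, math.inf
        for index, centroid in enumerate(self.centroids):
            distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, centroid)))
            if distance < best_distance:
                best_index, best_distance = index, distance

        if best_index is None or best_distance > self.new_category_distance:
            # Nothing known is close enough: a new category emerges.
            self.centroids.append(list(point))
            self.counts.append(1)
            return len(self.centroids) - 1

        # Otherwise refine the existing category with a running mean.
        self.counts[best_index] += 1
        rate = 1.0 / self.counts[best_index]
        centroid = self.centroids[best_index]
        for i, value in enumerate(point):
            centroid[i] += rate * (value - centroid[i])
        return best_index


# Made-up 2-D "features": two groups of examples arriving interleaved, with no labels.
stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (0.1, 0.3), (5.2, 4.9)]
categorizer = OnlineCategorizer(new_category_distance=1.0)
labels = [categorizer.observe(point) for point in stream]
print(labels, len(categorizer.centroids))
```

Nobody tells the program how many categories exist; the second one appears the first time an example does not fit the first. The hard part, of course, is everything this sketch assumes away: where the features come from, what distance means for a fuzzy concept like "broken", and how categories should split, merge, or nest as experience grows.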