| AI base of science and technology (rgznai100)

Contributor | Joe



Across the room, someone throws a ball at you and you catch it. Seems so simple, right?


In fact, this is one of the most complex processes we have ever tried to understand, let alone reproduce. Building a machine that sees the way we do is extremely difficult, not only because it is hard to get computers to do it, but also because we don’t fully understand how we do it ourselves.

What happens is roughly this: the image of the ball passes through the eye and lands on the retina, which does some basic analysis and sends the result to the brain, where the visual cortex analyzes the image more thoroughly. The visual cortex then passes it to the rest of the cortex, which compares it to everything it already knows, classifies the objects and their dimensions, and decides on a response: raise your hand and grab the ball (having predicted its path). The whole process takes less than a second, with almost no conscious involvement and almost no mistakes. Reconstructing human vision is therefore not a single problem but a set of problems, each of which depends on the others.

Of course, no one said it would be easy. Except, perhaps, AI pioneer Marvin Minsky, who in 1966 instructed a graduate student to connect a camera to a computer and have it describe what it saw. Pity the kid: fifty years later, we’re still working on it.

Serious research began in the 1950s along three lines: simulating the eye (difficult); simulating the visual cortex (very difficult); and simulating the rest of the brain (arguably the most difficult problem ever attempted).


See

Simulating the eye is the area where we have had the most success. Over the past few decades, we’ve created sensors and image processors that in some ways surpass the human eye. With larger, more precise optical lenses and semiconductor subpixels fabricated at the nanometer scale, modern cameras are incredibly accurate and sensitive. Cameras can also record thousands of images per second and measure distances accurately.




                                               
An image sensor in a digital camera

But while the output of these devices is high fidelity, in many ways they are not much more advanced than the pinhole cameras of the 19th century. They only record the distribution of photons arriving from a particular direction. Even the best camera sensor can’t identify a ball, let alone catch it.


In other words, hardware is severely limited without software, and software turns out to be by far the bigger problem. But modern camera technology does give us a rich and flexible platform to work with.

Describe

This is not meant to be a complete course in visual neuroanatomy, but suffice it to say that our brains are built from the ground up with seeing in mind. More of the brain is devoted to vision than to any other task, and that specialization extends all the way down to the cells themselves. Billions of cells work together to extract patterns from the jumble of signals coming from the retina.


Groups of neurons excite one another when there is contrast along a line at a certain angle, say, or rapid motion in a certain direction. Higher-level networks aggregate these patterns into meta-patterns: a circle, moving upwards. Another network chimes in: the circle is white with red lines. Another: it is getting bigger. An image is thus assembled from these crude but complementary descriptions.



       
A histogram of oriented gradients, which finds edges and other features using a technique similar to one found in the visual areas of the brain
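To make the gradient-histogram idea in the caption above a little more concrete, here is a minimal sketch in plain Python. It is not taken from the original article and is far simpler than a full histogram-of-oriented-gradients descriptor: it treats the whole image as a single cell and skips block normalization. The function name `orientation_histogram`, the nine-bin layout, and the tiny test images are all assumptions made for illustration.

```python
import math


def orientation_histogram(image, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `image` is a list of rows of brightness values. Orientations are
    folded into the range [0, 180) degrees and split into `bins` bins.
    Simplification: the whole image is one cell, with no normalization.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the brightness gradient.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no edge evidence here
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The modulo guards against rounding that lands exactly on 180.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Hypothetical 6x6 test images: one with a vertical edge, one horizontal.
vertical_edge = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
horizontal_edge = [[0] * 6 for _ in range(3)] + [[1] * 6 for _ in range(3)]

for name, img in (("vertical", vertical_edge), ("horizontal", horizontal_edge)):
    hist = orientation_histogram(img)
    print(name, "edge -> strongest bin:", hist.index(max(hist)))
```

The two test images put their weight in different bins, which is the sense in which such a histogram "describes" the edges in a patch: each bin plays a role loosely comparable to a group of cells tuned to one orientation.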

Early computer vision research, treating these networks as unfathomably complex, took a different approach: “top-down” reasoning. A book looks like this, so watch for this pattern, unless it is lying on its side, in which case it looks more like that. A car looks like this, and when it moves, it looks like that.



That works for a few objects in controlled situations, but imagine having to describe every object around you from every angle, under different lighting, in motion, and with many, many other variations. Clearly, reaching even a young child’s level of recognition would require an enormous amount of data.


A “bottom-up” approach that mimics how the brain processes visual information looks more promising. A computer can apply a series of transformations to an image to pick out edges and shadows and, given multiple images, perspective and movement. These processes involve a lot of math and statistics, but they amount to the computer trying to match the shapes it sees to shapes it has been trained to recognize, much as our brains do.
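As a rough illustration of that bottom-up idea, the sketch below transforms an image into an edge map and then matches it against edge maps stored from labeled examples. It is a toy under stated assumptions, not the method used by any system mentioned in the article: "training" here just means remembering one example per label, the match is a plain cosine similarity, and the helper names (`edge_map`, `classify`, `draw_square`, `draw_plus`) are hypothetical.

```python
import math


def edge_map(image):
    """Transform an image into gradient magnitudes at interior pixels."""
    h, w = len(image), len(image[0])
    return [
        [math.hypot(image[y][x + 1] - image[y][x - 1],
                    image[y + 1][x] - image[y - 1][x])
         for x in range(1, w - 1)]
        for y in range(1, h - 1)
    ]


def cosine_similarity(a, b):
    """Similarity of two equally sized edge maps, ignoring overall scale."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(p * q for p, q in zip(flat_a, flat_b))
    norm = math.sqrt(sum(p * p for p in flat_a)) * math.sqrt(sum(q * q for q in flat_b))
    return dot / norm if norm else 0.0


def classify(image, templates):
    """Return the label whose stored edge map best matches this image."""
    edges = edge_map(image)
    return max(templates, key=lambda label: cosine_similarity(edges, templates[label]))


def draw_square(value=1.0, background=0.0, size=7):
    """Hypothetical training shape: a 3x3 filled square on a 7x7 canvas."""
    img = [[background] * size for _ in range(size)]
    for y in range(2, 5):
        for x in range(2, 5):
            img[y][x] = value
    return img


def draw_plus(value=1.0, background=0.0, size=7):
    """Hypothetical training shape: a plus sign on a 7x7 canvas."""
    img = [[background] * size for _ in range(size)]
    for i in range(1, 6):
        img[3][i] = value
        img[i][3] = value
    return img


# "Training": remember the edge map of one labeled example per shape.
templates = {"square": edge_map(draw_square()), "plus": edge_map(draw_plus())}

# A dimmer square on a gray background still matches the square template,
# because the edge map depends on contrast patterns, not absolute brightness.
print(classify(draw_square(value=0.7, background=0.2), templates))
```

Working on edges rather than raw pixels is what lets the dimmer, grayer test image match its template. Handling changes in angle, scale, and motion, which the article points out are the hard part, would need far more than this sketch.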




                             
Image (from Purdue University’s Electronics Lab): by its calculations, the computer shows that the highlighted objects look and behave, to some degree of certainty, like other examples of the same kind of object.

Proponents of the bottom-up approach might say “I told you so.” Until recent years, though, building and running artificial neural networks was impractical because of the enormous amount of computation they require. Advances in parallel computing have broken through those barriers, and the past few years have seen an explosion of research into systems that loosely imitate those in our own brains. Pattern recognition is getting dramatically faster, and we are making more progress every day.


Understand

Of course, you could build a system that recognizes every kind of apple, from any angle, in any situation, stationary or moving, with or without a bite taken out of it. But it couldn’t recognize an orange. It couldn’t even tell you what an apple is, whether it’s edible, how big it is, or what it’s used for.


The problem is that good software and hardware are useless without an operating system.


Artificial intelligence and control




For us, that operating system is the rest of the brain: short-term and long-term memory, input from the other senses, attention and cognition, and lessons internalized over eons of evolution, all written into the brain’s neural networks in a way we barely understand and more complex than anything we’ve encountered before.



This is where the cutting edge of computer science meets artificial intelligence more generally, and it’s an area we are still struggling with. Computer scientists, engineers, psychologists, neuroscientists and philosophers can barely agree on a working definition of how the mind works, let alone how to simulate it.

But that doesn’t mean we’re at a dead end. The future of computer vision lies in integrating the powerful but specialized systems we have created with broader systems that focus on conceptual understanding: context, attention, intention and so on.

That said, even in its infancy, computer vision is very useful. It’s in our cameras, recognizing faces and smiles. It’s in self-driving cars, reading traffic signs and watching for pedestrians. It’s in factory robots, monitoring for problems and assisting human workers. There is still a long way to go before computers can see the way humans do. But given how much the progress made so far has already changed the world, it will be remarkable if that day ever comes.



Author | Devin Coldewey

Original article | WTF is computer vision?