Saturday, October 4, 2008

Artificial Intelligence, Language Recognition, and Babies

As you may or may not know, I have a son who was born 6 (almost 7) months ago. He is just the most incredible thing. Besides excelling in the normal metrics of cuteness and snugglability, he is also doing something that all babies do, but is probably the most incredible of all--he is learning. He is learning how to move, how to recognize patterns, how to read expressions, how to form phonemes, how to parse sounds. He is training the most incredible neural network on the face of the earth to solve problems that the best minds in the world have been working on for decades and haven't come close to solving. And he's making it look easy.

How is he doing it? Why can a baby, who knows nothing of the world, who lacks a fundamental understanding of physics, biology, optics, machine learning, linguistics, language, categorization (the list goes on...), succeed at these tasks when very bright people using supercomputers cannot? I have some theories--some from an intro linguistics class I once took, some from personal experience trying to code speech recognition, some from learning foreign languages, some from my signal processing background, and some from watching this little ball of wonder over here. For what it's worth, here are my thoughts on the matter.

First, some observations:

  1. The problems listed above (motor control, image/speech recognition, etc.) are HARD. Just because the human brain is extremely adept at solving them, let's not make the mistake of underestimating their complexity.
  2. Babies don't come out with the answers. They pretty much can't do anything in the beginning. On the other hand...
  3. Babies have a definite propensity for arriving at a solution. They may not consciously know what they are doing, but they have a pre-programmed "boot sequence" that gets them walking, talking, and causing trouble by age 2. This boot sequence is remarkably consistent between babies (no baby walks before babbling, etc.).
  4. Babies have trainers (parents) who are instrumental in their development. However, babies work on problems of their choosing--a parent aids in language acquisition, for example, but cannot get a baby to start babbling before they come to it themselves. You can see this all the time when you watch kids. They have incredible attention spans for the skills they are working on, but things outside of that range are summarily ignored.
  5. Babies do not reason their way to solutions. Reasoning comes later.
  6. Changing gears... large neural networks don't work. Small neural networks are very good at discriminating between patterns on a small set of inputs, but one can't throw 1e4 pixels into a neural network and expect to train it to recognize any old picture of a cat. (See the back-of-the-envelope parameter count after this list.)
  7. A lot of skills that we think of as a single skill (say, speech recognition) are actually many interrelated skills. For example, it is certainly my experience in learning foreign languages that: a) without adequate vocabulary, I have trouble hearing the sounds that are being spoken, b) without an understanding of what a conversation is about at a high level, I have trouble knowing what words to expect, and c) without being able to hear the sounds that are being spoken, I have no clue what a conversation is about. I'm not just being silly here. There's a real, circular dependence to speech recognition that requires several skills to be developed in parallel (phoneme recognition, vocabulary, grammar, cultural expectations) in order to advance.
  8. And finally, humans have an incredible knack for categorization--grouping things by common traits, and defining groups at all sorts of levels of generality.
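
To put a rough number behind observation 6, here is a quick back-of-the-envelope parameter count in Python. The layer sizes are made up purely for illustration, not a claim about any particular system:

    # Rough parameter counts for fully connected networks.
    # The layer sizes below are invented for illustration only.

    def mlp_params(layer_sizes):
        """Weights plus biases in a fully connected network."""
        return sum(n_in * n_out + n_out
                   for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

    # A small discriminator: a handful of inputs, one small hidden layer.
    print(mlp_params([4, 8, 2]))                # 58 parameters

    # Naively wiring 1e4 raw pixels into a comparable fully connected net.
    print(mlp_params([10000, 1000, 100, 2]))    # 10,101,302 parameters

With millions of free parameters and no built-in structure, you need a staggering amount of labeled data just to keep such a network from memorizing noise--that is the practical sense in which I mean "large neural networks don't work."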

I believe the above observations are only consistent with the idea of a modular mind with a very strong hierarchy. In order to overcome the fact that large neural networks are untrainable, the brain has to be divided into modules that are trained on particular sub-tasks requiring fewer inputs. The mere fact that the brain is built of neurons, and that each neuron has only so many inputs, suggests that this must be so.
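
As a concrete (if toy) picture of what "modules trained on fewer inputs" could look like, here is a minimal sketch in Python with numpy. The shapes and wiring are my own invention for illustration, not a claim about how the brain actually partitions things:

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, W, b):
        """One fully connected layer with a tanh nonlinearity."""
        return np.tanh(x @ W + b)

    # Hypothetical setup: a 16-value input split into four slices, each slice
    # handled by its own small module, with one higher-level module on top.
    n_slices, slice_dim, hidden, top_out = 4, 4, 3, 2

    # Each low-level module sees only its own 4 inputs--never the full 16.
    modules = [(rng.normal(size=(slice_dim, hidden)), np.zeros(hidden))
               for _ in range(n_slices)]

    # The top module sees only the modules' outputs, not the raw input.
    W_top, b_top = rng.normal(size=(n_slices * hidden, top_out)), np.zeros(top_out)

    x = rng.normal(size=n_slices * slice_dim)
    features = np.concatenate([layer(x[i * slice_dim:(i + 1) * slice_dim], W, b)
                               for i, (W, b) in enumerate(modules)])
    y = layer(features, W_top, b_top)
    print(y)  # 2 outputs computed from 12 intermediate features, not 16 raw inputs

The only point of the sketch is the wiring: every trainable piece has a small fan-in, and higher levels only ever see the compressed outputs of the levels below them.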

Furthermore, the division of the brain into these modules must be pre-programmed. Brain imaging reveals that the same physical locations in everyone's brain are responsible for certain functions, and it's clear that reason, cognition, and other general-purpose processing are not primarily responsible for language, image recognition, or motor control (although in adults, skills like language acquisition and motor control are sometimes augmented by reasoning and cognition). Humans have a natural language instinct, and a capacity for image processing and motor control, that point to an underlying, inherent cerebral architecture dedicated to these skills.

So what is the upshot of all of this? I think that work on artificial intelligence needs to reflect strong modularity and hierarchy. While designing hierarchical processing is not hard, training the kind of multi-tiered, cross-linked, sometimes circularly dependent system that AI requires is. Why do babies go through the same boot sequence? Do the stages of child development reflect the training of different tiers of neural networks in the brain's hierarchy? It seems possible to me that the wonder of the human brain might be more than this incredible hard-wired signal-processing architecture--a fundamental component might be this incredible boot sequence, taking 10-20 years to complete, that trains neural networks ranging from simple movement and stimulus response up to abstract thought and language.
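
If I had to sketch what such a boot sequence might look like in code, it would be something like the staged training below: each tier is trained in order, against the frozen outputs of the tier beneath it. The stages, sizes, and tasks here are invented for illustration, and plain least squares stands in for "training a module":

    import numpy as np

    rng = np.random.default_rng(0)

    def fit_linear(X, Y):
        """Least-squares fit of Y ~ X @ W, standing in for training a module."""
        W, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
        return W

    # Stage 1: the low-level tier learns a simple mapping from raw input
    # (think early motor control or phoneme discrimination).
    X_raw = rng.normal(size=(200, 8))
    low_target = X_raw @ rng.normal(size=(8, 3))
    W_low = fit_linear(X_raw, low_target)

    # Stage 2: the low tier is now frozen; the higher tier never touches the
    # raw input, only the features the lower tier produces.
    low_features = X_raw @ W_low
    high_target = low_features @ rng.normal(size=(3, 2))
    W_high = fit_linear(low_features, high_target)

    print(np.allclose(low_features @ W_high, high_target))  # True

All the sketch captures is the ordering: each stage trains on top of what the previous stage has already settled, which is one way to read the fact that no baby walks before babbling.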