The holy grail of computational neuroscience: Invariance

There are quite a few problems that computational neuroscientists need to solve in order to achieve a true theoretical understanding of biological intelligence.  But I’d like to talk about one problem that I think is the holy grail of computational neuroscience and artificial intelligence: the quest for invariance. From a purely scientific and technological perspective I think this is a far more important and interesting problem than anything to do with the “C-word”: Consciousness. 🙂

Human (and animal) perception has an extraordinary feature that we still can’t fully emulate with artificial devices. Our brains somehow create and/or discover invariances in the world. Let me start with a few examples and then explain what invariance is.

Invariance in vision

Think about squares. You can recognize a square irrespective of its size, color, and position. You can even recognize a square with reasonable accuracy when viewing it from an oblique angle. This ability is something we take for granted, but we haven’t really figured it out yet.

Now think about human faces. You can recognize a familiar face in various lighting conditions, and under changes of facial hair, make-up, age, and context. How does the brain allow you to do things like this?

Invariance in hearing

Think about a musical tune you know well. You will probably be able to recognize it even if it is slowed down, sped up, hummed, whistled, or even sung wordlessly by someone who is tone-deaf. In some special cases, you can even recognize a piece of music from its rhythmic pattern alone, without any melody. How do you manage to do this?

Think about octave equivalence. A sound at a particular frequency sounds like the same note as a sound at double the frequency. In other words, notes an octave apart sound similar. What is happening here?
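Octave equivalence can be stated numerically. The sketch below is only an illustration of the arithmetic, not a model of the auditory system: the function name `chroma` and the 440 Hz reference are assumptions chosen for the example. It measures where a frequency falls within its octave, and that position does not change when the frequency is doubled or halved.

```python
import math

def chroma(freq_hz, ref_hz=440.0):
    """Position within the octave, in semitones (0 up to 12) relative to ref_hz.

    ref_hz=440.0 is an assumed reference pitch used only for illustration.
    """
    # log2 turns each doubling of frequency into a step of exactly 1,
    # so taking the result modulo 12 semitones discards whole octaves.
    return (12 * math.log2(freq_hz / ref_hz)) % 12

for f in (220.0, 440.0, 880.0, 660.0):
    print(f, round(chroma(f), 3))
```

The first three frequencies are octaves of one another and land on the same position, while 660 Hz lands about seven semitones away.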

What is invariance?

How does your brain discover similarity in the midst of so much dissimilarity? The answer is that the brain somehow creates invariant representations of objects and patterns. Many computational neuroscientists are working on this problem, but there are no unifying theoretical frameworks yet.

So what does “invariance” mean? It means “immunity to a possible change”. It’s related to the formal concept of symmetry. According to mathematics and theoretical physics, an object has a symmetry if it looks the same even after a change. For example, a square looks exactly the same if you rotate it by 90 degrees around its center. We say it is invariant (or symmetrical) with respect to a 90 degree rotation.
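To make the definition concrete, here is a minimal sketch that treats a shape as a set of corner points and checks whether a transformation leaves that set unchanged. The names `rotate90`, `rotate180`, and `is_invariant`, and the particular coordinates, are made up for this illustration.

```python
def rotate90(point):
    """Rotate a point by 90 degrees counterclockwise about the origin."""
    x, y = point
    return (-y, x)

def rotate180(point):
    """Two successive 90 degree rotations."""
    return rotate90(rotate90(point))

def is_invariant(shape, transform):
    """A shape is invariant under a transform if its set of points is unchanged."""
    return {transform(p) for p in shape} == shape

# Corner points of a square and of a non-square rectangle, both centered on the origin.
square = {(1, 1), (-1, 1), (-1, -1), (1, -1)}
rectangle = {(2, 1), (-2, 1), (-2, -1), (2, -1)}

print(is_invariant(square, rotate90))      # the square maps onto itself
print(is_invariant(rectangle, rotate90))   # the rectangle does not
print(is_invariant(rectangle, rotate180))  # but it does under a half turn
```

The rectangle shows that invariance is always relative to a particular change: it lacks the 90 degree symmetry of the square but keeps the 180 degree one.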

Our neural representations of sensory patterns somehow allow us to discover symmetries and use them for recognition and flexible behavior. And we manage to do this implicitly, without any conscious effort. This type of ability is limited and it varies from person to person, but all people have it to some extent.

Back to the examples

We can redefine our examples using the language of invariance.

 

  • The way humans represent squares and other shapes is invariant with respect to rotation, as well as with respect to changes in position, lighting, and even viewing angle.
  • The way humans represent faces is invariant with respect to changes in make-up, facial hair, context, and age. (This ability varies from person to person, of course.)
  • The way humans represent musical tunes is invariant with respect to changes in speed, musical key, and timbre.
  • The way humans represent musical notes is invariant with respect to doubling of frequency (which is equivalent to shifting by an octave).


All these invariances are partial and limited in scope, but they are still extremely useful, and far more sophisticated than anything we can do with artificial systems.

Invariance of thought patterns?

The power of invariance is particularly striking when we enter the domain of abstract ideas — particularly metaphors and analogies.

Consider perceptual metaphors. We can touch a surface and describe it as smooth. But we can also use the word “smooth” to describe sounds. How is it that we can use texture words for things that we do not literally touch?

Now consider analogies, which are the more formal cousins of metaphors. Think of analogy questions in tests like the GRE and the SATs. Here’s an example:

Army: Soldier :: Navy : _____

The answer is “Sailor”.

These questions take the form “A:B::C:D”, which we normally read as “A is to B as C is to D”. The test questions normally ask you to specify what D should be.

To make an analogy more explicit, we can re-write it this way: “R(x,y) for all (x,y) = (A,B) or (C,D)”. The relation “R” holds for pairs of words (x,y), and in particular, for pairs (A,B) as well as (C,D).

In this example, the analogical relationship R can be captured in the phrase “is made up of”. An army is made up of soldiers and a navy is made up of sailors. In any analogy, we are able to pick out an abstract relationship between things or concepts.
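The R(x,y) notation can be sketched as a toy lookup table. This is only an illustration of the notation, not a claim about how people or machines actually solve analogies. The `RELATIONS` table and the `complete_analogy` function are hypothetical, and the relation is simply stored as a set of pairs.

```python
# Toy table (hypothetical): each relation R is stored as a set of (x, y) pairs.
RELATIONS = {
    "is made up of": {("army", "soldier"), ("navy", "sailor")},
}

def complete_analogy(a, b, c, relations=RELATIONS):
    """Solve A : B :: C : ? by finding a relation R with R(a, b) and R(c, d)."""
    for name, pairs in relations.items():
        if (a, b) in pairs:           # R holds for the first pair
            for x, y in pairs:
                if x == c:            # R also holds for (c, y)
                    return y, name
    return None                       # no stored relation covers both pairs

print(complete_analogy("army", "soldier", "navy"))
```

The answer depends only on whether R holds for both pairs, not on which particular words fill the slots.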

Here’s another example discussed in the Wikipedia page on analogy:

Hand: Palm :: Foot: _____

The answer most people give is “Sole”. What’s interesting about this example is that many people can understand the analogy without necessarily being able to explain the relationship R in words. This is true of various analogies. We can see implicit relationships without necessarily being able to describe them.

We can translate metaphors and analogies into the language of invariance.

 

  • The way humans represent perceptual experiences allows us to create metaphors that are invariant with respect to changes in sensory modality. So we can perceive smoothness in the modalities of touch, hearing and other senses.
  • The way humans represent abstract relationships allows us to find/create analogies that are invariant with respect to the particular things being spoken about. The validity of the analogy R(x,y) is invariant with respect to replacing the pair (x,y) with (A,B) or (C,D).


The words “metaphor” and “analogy” are essentially synonyms for the word “invariant” in the domains of percepts and concepts. Science, mathematics and philosophy often involve trying to make explicit our implicit analogies and metaphors.

Neuroscience, psychology and cognitive science aim to understand how we form these invariant representations in the first place. In my opinion doing so will revolutionize artificial intelligence.

 



Further reading:

I’ve only scratched the surface of the topic of invariance and symmetry.

I talk about symmetry and invariance in this answer too:

Mathematics: What are some small but effective theses or ideas in mathematics that you have came across? [Quora link. Sign-up required]

I talk about the importance of metaphors in this blog post:

Metaphor: the Alchemy of Thought

I was introduced to many of these ideas through a book by physicist Joe Rosen called Symmetry Rules: How Science and Nature Are Founded on Symmetry. It’s closer to a textbook than a popular treatment, but for people interested in the mathematics of symmetry and group theory, and how it relates to science, this is an excellent introduction. Here is a summary of the book: [pdf]

Relatively recent techniques such as deep learning have helped artificial systems form invariant representations. This is how the facial recognition software used by Google and Facebook works. But these algorithms still don’t have the accuracy and generality of human skills, and the way they work, despite being inspired by real neural networks, is sufficiently unlike real neural processes that these algorithms may not shed much light on how human intelligence works.


 

Notes:

This post is a slightly edited form of a Quora answer I wrote recently.

In the comments section someone brought up the idea that some invariants can be easily extracted using Fourier decomposition. This is what I said in response:

Good point. Fourier decomposition is definitely part of the story (for sound at the very least), but it seems there is a lot more.

Some people think that the auditory system is just doing a Fourier transform. But this was actually shown to be partially false a century ago. The idea that pitch corresponds to the frequencies of sinusoids is called Ohm’s acoustic law.

From the wiki page:

 

For years musicians have been told that the ear is able to separate any complex signal into a series of sinusoidal signals – that it acts as a Fourier analyzer. This quarter-truth, known as Ohm’s Other Law, has served to increase the distrust with which perceptive musicians regard scientists, since it is readily apparent to them that the ear acts in this way only under very restricted conditions.
—W. Dixon Ward (1970)


This web page discusses some of the dimensions other than frequency that contribute to pitch:

Introduction to Psychoacoustics – Module 05

There are interesting aspects of pitch perception that render the Fourier picture problematic. For example, there is the phenomenon of the missing fundamental: “the observation that the pitch of a complex harmonic tone matches the frequency of its fundamental spectral component, even if this component is missing from the tone’s spectrum.”

Evidence suggests that the human auditory system uses both frequency and time/phase coding.

Missing fundamental:  “The brain perceives the pitch of a tone not only by its fundamental frequency, but also by the periodicity of the waveform; we may perceive the same pitch (perhaps with a different timbre) even if the fundamental frequency is missing from a tone.”

This book chapter also covers some of the evidence: [pdf]

“One of the most remarkable properties of the human auditory system is its ability to extract pitch from complex tones. If a group of pure tones, equally spaced in frequency are presented together, a pitch corresponding to the common frequency distance between the individual components will be heard. For example, if the pure tones with frequencies of 700, 800, and 900 Hz are presented together, the result is a complex sound with an underlying pitch corresponding to that of a 100 Hz tone. Since there is no physical energy at the frequency of 100 Hz in the complex, such a pitch sensation is called residual pitch or virtual pitch (Schouten 1940; Schouten, Ritsma and Cardozo, 1961). Licklider (1954) demonstrated that both the place (spectral) pitch and the residual (virtual) pitch have the same properties and cannot be auditorally differentiated.”
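The 700, 800, and 900 Hz example can be checked numerically. The sketch below is a rough illustration of the periodicity idea, not a model of the auditory system. It assumes an 8000 Hz sample rate and a search range of 70 to 1000 Hz, and the function names are invented for the example. It builds the waveform from the three tones and looks for the time shift at which the waveform best matches itself (its autocorrelation peak).

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumed value for this sketch)

def synthesize(freqs_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Sum of equal-amplitude sine waves at the given frequencies."""
    return [
        sum(math.sin(2 * math.pi * f * n / sample_rate) for f in freqs_hz)
        for n in range(n_samples)
    ]

def periodicity_pitch(freqs_hz, min_hz=70, max_hz=1000, sample_rate=SAMPLE_RATE):
    """Estimate pitch from the lag at which the waveform best matches itself."""
    min_lag = sample_rate // max_hz   # shortest period considered, in samples
    max_lag = sample_rate // min_hz   # longest period considered, in samples
    window = 640                      # 0.08 s of signal compared at every lag
    signal = synthesize(freqs_hz, window + max_lag, sample_rate)

    def autocorr(lag):
        # Similarity between the signal and a copy of itself shifted by `lag`.
        return sum(signal[n] * signal[n + lag] for n in range(window))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

print(periodicity_pitch([700, 800, 900]))
```

Although none of the three components is at 100 Hz, the combined waveform repeats every 10 ms, so the strongest self-match in the searched range is at the lag corresponding to 100 Hz.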

The status of Fourier decomposition in vision might be more controversial. Spatial frequency based models have their adherents, but also plenty of critics. One of my professors says that claiming the visual system does spatial Fourier analysis amounts to confusing the object of study with the tools of study. 🙂 We still don’t know whether and how the brain performs spatial Fourier decomposition.

A very recent paper reviews this issue:

The neural bases of spatial frequency processing during scene perception

“how and where spatial frequencies are processed within the brain remain unresolved questions.”

Vision scientists I know often talk about how the time domain cannot be ignored in visual processing.

A general point to be made is that even if we have mathematical solutions that are invariant, computational neuroscientists haven’t quite figured out how neural networks achieve such invariant representations. The quest for invariance is more about plausible neural implementation than mathematical description per se.