[I was asked this on Quora. Here’s a slightly modified version of my answer.]
This is an excellent question! I’m pretty sure there is not yet a definitive answer, but I suspect that the eventual answer will involve two factors:
- The visual system in humans is much more highly developed than the auditory system.
- Human cultures typically teach color words to all children, but formal musical training — complete with named notes — is relatively rare.
When you look at the brain’s cortical regions, you realize that the primary visual cortex has the most well-defined laminar structure in the whole brain. Primary auditory cortex is less structured. We still don’t know exactly how the brain’s layers contribute to sensory processing, but some theories suggest that the more well-defined cortices are capable of making finer distinctions.
However, I don’t think the explanation for the difference between music and color perception is purely neuroscientific. Culture may well play an important role. I think that with training, absolute pitch (the ability to identify the exact note rather than the interval between notes) could become more common. Speakers of tonal languages like Mandarin or Cantonese are more likely to have absolute pitch, especially if they’ve had early musical training. (More on this below.)
Also: when people with no musical training are exposed to tunes they are familiar with, many of them can tell if the absolute pitch is correct or not. Similarly, when asked to produce a familiar tune, many people can hit the right pitch. This suggests that at least some humans have the latent ability to use and/or recognize absolute pitch.
Perhaps with early training, note names will become as common as color words.
This article by a UCSD psychologist described the mystery quite well:
As someone with absolute pitch, it has always seemed puzzling to me that this ability should be so rare. When we name a color, for example as green, we do not do this by viewing a different color, determining its name, and comparing the relationship between the two colors. Instead, the labeling process is direct and immediate.
She has some fascinating data on music training among tonal language speakers:
“Figure 2. Percentages of subjects who obtained a score of at least 85% correct on the test for absolute pitch. CCOM: students at the Central Conservatory of Music, Beijing, China; all speakers of Mandarin. ESM: students at Eastman School of Music, Rochester, New York; all nontone language speakers.”
Looks like if you speak a tonal language and start learning music early, you are far more likely to have perfect pitch. (Separating causation from correlation may be tricky.)