The Race to Translate Animal Sounds Into Human Language

In 2025 we will see AI and machine learning leveraged to make real progress in understanding animal communication, answering a question that has puzzled humans as long as we have existed: “What are animals saying to each other?” The recent Coller Dolittle Prize, offering cash prizes of up to half a million dollars for scientists who “crack the code,” is an indication of bullish confidence that recent developments in machine learning and large language models (LLMs) are placing this goal within our grasp.

Many research groups have been working for years on algorithms to make sense of animal sounds. Project Ceti, for example, has been decoding the click trains of sperm whales and the songs of humpbacks. However, these modern machine learning tools require extremely large amounts of data, and until now, such quantities of high-quality, well-annotated data have been lacking.

Consider LLMs such as ChatGPT, whose training data includes the entirety of text available on the internet. Nothing comparable on animal communication has been accessible in the past. It’s not just that human data corpora are many orders of magnitude larger than the kind of data we have access to for animals in the wild: More than 500 GB of words were used to train GPT-3, compared to just over 8,000 “codas” (or vocalizations) for Project Ceti’s recent analysis of sperm whale communication.

Additionally, when working with human language, we already know what is being said. We even know what constitutes a “word,” which is a huge advantage over interpreting animal communication, where scientists rarely know whether a particular wolf howl, for instance, means something different from another wolf howl, or even whether the wolves consider a howl as somehow analogous to a “word” in human language.

Nonetheless, 2025 will bring new advances, both in the quantity of animal communication data available to scientists, and in the types and power of AI algorithms that can be applied to those data. Automated recording of animal sounds has been placed in easy reach of every scientific research group, with low-cost recording devices such as AudioMoth exploding in popularity.

Massive datasets are now coming online, as recorders can be left in the field, listening to the calls of gibbons in the jungle or birds in the forest, 24/7, across long periods of time. Until recently, such massive datasets were impossible to manage manually. Now, new automatic detection algorithms based on convolutional neural networks can race through thousands of hours of recordings, picking out the animal sounds and clustering them into different types, according to their natural acoustic characteristics.
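To make the clustering step concrete, here is a minimal sketch in Python. It does not use a convolutional neural network; it swaps in plain k-means on two hand-picked features, and it assumes the calls have already been detected and measured. The feature values, the `calls` list, and the `kmeans` helper are hypothetical and purely illustrative.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Tiny k-means: returns one cluster label per point."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each call to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of the calls assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Hypothetical features per detected call: (duration in seconds, peak frequency in kHz).
calls = [(0.20, 2.1), (0.25, 2.0), (0.22, 2.3), (1.10, 0.6), (1.05, 0.7), (1.20, 0.5)]
labels = kmeans(calls, k=2)
print(labels)
```

In this toy data the short, high-pitched calls and the long, low-pitched calls fall into separate groups. A real pipeline would work from spectrograms, normalise the features, and have to decide how many call types to look for.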

Once those large animal datasets are available, new analytical algorithms become a possibility, such as using deep neural networks to find hidden structure in sequences of animal vocalizations, which may be analogous to the meaningful structure in human language.
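As a much simpler stand-in for the deep neural networks mentioned above, the sketch below illustrates what “hidden structure” in a sequence can mean. It measures how predictable the next call type is given the previous one, then compares that with a shuffled copy of the same sequence. The call labels and the `conditional_entropy` helper are hypothetical, and the sequence is invented for illustration.

```python
from collections import Counter
from math import log2
import random

def conditional_entropy(seq):
    """Entropy, in bits, of the next symbol given the previous symbol."""
    pair_counts = Counter(zip(seq, seq[1:]))   # counts of (previous, next) pairs
    prev_counts = Counter(seq[:-1])            # how often each symbol has a successor
    total = len(seq) - 1
    h = 0.0
    for (prev, _nxt), n in pair_counts.items():
        p_pair = n / total                     # P(previous, next)
        p_next_given_prev = n / prev_counts[prev]
        h -= p_pair * log2(p_next_given_prev)
    return h

# Invented sequence of call types with a strict repeating order.
sequence = list("ABC" * 20)

# Shuffling keeps the same calls but destroys their order.
shuffled = sequence[:]
random.Random(0).shuffle(shuffled)

structured_h = conditional_entropy(sequence)
shuffled_h = conditional_entropy(shuffled)
print(structured_h, shuffled_h)
```

If the ordered sequence is noticeably more predictable than its shuffled copy, the order of the calls carries information. That alone says nothing about what the calls mean, only that their arrangement is not random.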

However, the fundamental question remains unclear: What exactly are we hoping to do with these animal sounds? Some organizations, such as Interspecies.io, state their goal quite clearly as “to transduce signals from one species into coherent signals for another.” In other words, to translate animal communication into human language. Yet most scientists agree that non-human animals do not have an actual language of their own, at least not in the way that we humans have language.

The Coller Dolittle Prize is a little more sophisticated, looking for a way “to communicate with or decipher an organism’s communication.” Deciphering is a slightly less ambitious goal than translating, because it allows for the possibility that animals may not, in fact, have a language that can be translated. Today we don’t know just how much information, or how little, animals convey between themselves. In 2025, humanity will have the potential to leap forward in our understanding of not just how much animals say but also what exactly they are saying to each other.