An artificial intelligence that learns language from a camera on a baby’s head
Today I read an article about a fascinating advance in the field of artificial intelligence (AI): a study in Nature that offers new perspectives on human learning, inspired by nothing less than a view of the world through the eyes of a baby.
This study, led by Wai Keen Vong and his team at New York University, focuses on an AI model that learned to recognize objects and words like “crib” and “ball” by analyzing recordings from a camera mounted on a baby’s head. What makes this research unique is the model’s learning method: observing only a small fraction of a single infant’s life, which raises an interesting debate about how humans acquire language.
Unlike other language models such as ChatGPT, which learn from billions of data points, this approach attempts to mimic the way a baby begins to understand the world. Vong points out a crucial difference: “They don’t give us the Internet at birth.” This highlights an important gap in our current understanding of how humans learn language, and it suggests that AI can play a crucial role in unraveling this mystery.
The model was trained on 61 hours of recordings, capturing roughly 250,000 words and the corresponding images from the baby’s perspective, a unique window into their learning process. The technique used, contrastive learning, allowed the model to discern which images and which bits of text tend to occur together, providing a basis for predicting what certain words refer to.
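For readers curious about what contrastive learning looks like in practice, here is a minimal sketch of a CLIP-style contrastive objective in PyTorch. It is illustrative only: the linear projections standing in for real image and text encoders, the dimensions, and the toy batch of random features are my assumptions, not the study’s actual architecture. What it shows is the core idea: matching image-text pairs are pulled together in a shared embedding space while mismatched pairs are pushed apart.

```python
# Minimal sketch of a CLIP-style contrastive objective.
# The linear layers stand in for real encoders (e.g. a CNN over video
# frames and an embedding over transcribed words); sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveModel(nn.Module):
    def __init__(self, image_dim=512, text_dim=512, embed_dim=256):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.0))  # learned temperature

    def forward(self, image_feats, text_feats):
        # Project both modalities into a shared space and L2-normalize.
        img = F.normalize(self.image_proj(image_feats), dim=-1)
        txt = F.normalize(self.text_proj(text_feats), dim=-1)
        # Similarity matrix: entry (i, j) scores image i against text j.
        return self.logit_scale.exp() * img @ txt.t()

def contrastive_loss(logits):
    # Matching pairs lie on the diagonal: image i co-occurred with text i.
    # Cross-entropy pulls true pairs together and pushes mismatches apart.
    targets = torch.arange(logits.size(0))
    loss_i = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i + loss_t) / 2

# Toy batch: 8 co-occurring frame/utterance feature pairs.
model = ContrastiveModel()
images = torch.randn(8, 512)
texts = torch.randn(8, 512)
loss = contrastive_loss(model(images, texts))
loss.backward()
```

The appealing part is that no labels are needed: each batch treats frames and utterances that co-occurred in the recordings as positives, and every other pairing in the batch as negatives, which is enough for the model to start associating words with what they refer to.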
Challenges and discoveries
In these tests the model identified the correct object 62% of the time, notably higher than the 25% that would be expected by chance and comparable to AI models trained on up to 400 million image-text pairs. Even more impressive is its ability to correctly identify never-before-seen examples of certain words, such as “apple” and “dog,” albeit with an average success rate of 35%.
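A quick note on where the 25% figure comes from: it suggests a four-alternative forced-choice test, where the model sees one word and must pick the matching image out of four candidates. The sketch below assumes that setup (the exact evaluation protocol is my inference, not something spelled out here) and shows that untrained, random embeddings land right around chance.

```python
# Sketch of a four-alternative forced-choice test, the setup implied by the
# 25% chance baseline (one target image plus three distractors; this exact
# protocol is an assumption, not taken from the paper).
import torch
import torch.nn.functional as F

def four_way_trial(word_emb, target_img, distractor_imgs):
    """Return True if the word's nearest candidate image is the target."""
    candidates = torch.cat([target_img.unsqueeze(0), distractor_imgs])  # 4 x D
    sims = F.cosine_similarity(word_emb.unsqueeze(0), candidates)      # 4 scores
    return sims.argmax().item() == 0  # index 0 holds the true match

torch.manual_seed(0)
dim, trials = 256, 2000
hits = sum(
    four_way_trial(torch.randn(dim), torch.randn(dim), torch.randn(3, dim))
    for _ in range(trials)
)
print(f"accuracy: {hits / trials:.2%}")  # ~25% at chance; the trained model hit 62%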
This study challenges existing theories in cognitive science which hold that babies need innate knowledge about how language works in order to attach meaning to words, and it offers a counterpoint to linguists such as Noam Chomsky on the complexity of language acquisition and the supposed need for special mechanisms.
Future implications
Although drawing on the experience of a single child may raise questions about how well these findings generalize, the study opens exciting avenues for future research. Limiting the model to static images and written text also underscores the complexity of real human learning, and it points to refinements that could align AI more closely with that process.
At Mediaboooster we have covered various technological innovations, and this unusual approach to machine learning not only expands our understanding of AI but also illuminates fundamental aspects of human learning. The potential for future improvements to bring these models closer to the complexity of human learning is immense, and it represents an exciting avenue for advances in cognitive science.
References
- “This AI learns language by seeing the world through a baby’s eyes”, Nature News (2024): https://www.nature.com/articles/d41586-024-00288-1