Abstract: Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A vision model for general organismal biology questions on images is a timely need. To approach this, we curate and release TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images. We then develop BioCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TreeOfLife-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge. We rigorously benchmark our approach on diverse fine-grained biology classification tasks and find that BioCLIP consistently and substantially outperforms existing baselines (by 16% to 17% absolute). Intrinsic evaluation reveals that BioCLIP has learned a hierarchical representation conforming to the tree of life, shedding light on its strong generalizability. Models, data, and code are available at imageomics.github.io/bioclip.