- Keywords: HCI, explainable AI, conversational AI, commonsense grounding, multimodal annotation, language and vision
- TL;DR: A presentation of a modular software framework designed to bridge the gap between dataset annotation interfaces and machine learning visualization platforms, together with a collection of past and present research applications.
- Abstract: Artificial Intelligence (AI) research, including machine learning, computer vision, and natural language processing, requires large amounts of annotated data. In the current research and development (R&D) pipeline, each group collects its own datasets using an annotation tool tailored specifically to its needs, then invests further engineering effort in loading external datasets and building custom interfaces, often reimplementing components of existing annotation tools. We present a modular annotation, visualization, and inference software framework for computational language and vision research. Our framework enables researchers to set up a web interface for efficiently annotating language and vision datasets, visualizing the predictions made by a machine learning model, and interacting with an intelligent system. In addition, the tool accommodates many standard and popular visual annotation types, such as bounding boxes, segmentation, landmark points, temporal annotations, and attributes, as well as textual annotations such as tagging and free-form entry. These annotations are represented directly as nodes and edges in the graph module, allowing visual and textual information to be linked. Extensible and customizable as required by individual projects, the framework has been successfully applied to a number of research efforts in human-AI collaboration, including commonsense grounding of language and vision, conversational AI, and explainable AI.