Artsheets for Art Datasets
R. Srinivasan, R. Denton, J. Famularo, N. Rostamzadeh, F. Diaz and B. Coleman
Conference on Neural Information Processing Systems (NeurIPS)

Artsheets for Datasets

Machine learning (ML) techniques are increasingly being employed within a variety of creative domains. For example, ML tools are being used to analyze the authenticity of artworks, to simulate artistic styles, and to augment human creative processes. While this progress has opened up new creative avenues, it has also created the potential for adverse downstream effects such as cultural appropriation (e.g., cultural misrepresentation, offense, and undervaluing) and representational harm. Many of these issues stem from the training data, in ways that diligent evaluation can uncover, prevent, and mitigate. As such, when developing an arts-based dataset, it is essential to consider the social factors that shaped its conception and design, and to examine the resulting gaps in order to maximize understanding of the dataset’s meaning and future impact. Each decision a dataset creator makes produces opportunities, but also omissions. Each choice, moreover, builds on preexisting histories of the data’s formation and handling over time by prior actors including, but not limited to, art collectors, galleries, libraries, archives, museums, and digital repositories.