MineRL : Training AI agents in Open World Environments
The task of the competition is solving the MineRLObtainDiamond-v0 environment. In this environment, the agent begins in a random starting location without any items, and is tasked with obtaining a diamond. This task can only be accomplished by navigating the complex item hierarchy of Minecraft.
The agent receives a high reward for obtaining a diamond as well as smaller, auxiliary rewards for obtaining prerequisite items. In addition to the main environment, we provide a number of auxiliary environments. These consist of tasks which are either subtasks of ObtainDiamond or other tasks within Minecraft.
The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled . In this work, we utilized InfoGAIL algorithm in MineRL task to infer the latent structure of expert demonstrations in an unsupervised way.
Above figure shows the model summary and four sub-steps to mine diamond. Although we also could not gain diamonds from the environment, we performed the best compared to the baselines previously used such as PPO, rule-based, naive BC. The future work should contain the evaluation process of how efficient the algorithm utilized the expert demonstrations and how can agents benefit from language instructions.
 William H. Guss and Codel, Cayden and Hofmann, Katja and Houghton, Brandon and Kuno, Noboru and Milani, Stephanie and Mohanty, Sharada and Liebana, Diego Perez and Salakhutdinov, Ruslan and Topin, Nicholay and others, "The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors" at NeurIPS Competition Track.
 Yunzhu Li, Jiaming Song, and Stefano Ermon, "Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs." at NeurIPS.