Many controlled systems, such as robots in open environments or traffic and energy networks, are large-scale: they have many continuous variables. Such systems may also be nonlinear, stochastic, and impossible to model accurately. Optimistic planning (OP) is a recent paradigm for general nonlinear and stochastic control that works when a model is available; reinforcement learning (RL) additionally works model-free, by learning from data. However, existing OP and RL methods cannot handle the number of continuous variables that large-scale systems involve.