Low-rank tensor formats, especially the tensor train (TT) format, have emerged as a powerful tool for the solution of large-scale problems. In the context of modeling dynamical systems using Koopman operators, tensor formats arise when a product basis is employed as the Galerkin subspace. In this case, low-rank formats are practically unavoidable because the size of the basis set grows combinatorially. In this work, we present a data-driven method to efficiently approximate the Koopman generator using the TT format. The centerpiece of the method is a TT representation of the tensor of generator evaluations at all data sites. We analyze the consistency and complexity of the approach, and present extensions to two practically relevant settings: the use of importance sampling data, and the estimation of a projected generator on coarse-grained coordinates. We illustrate our findings using two examples: the first is a low-dimensional diffusion process in a model potential, the second is a coarse-grained representation of the deca-alanine peptide.
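To make the combinatorial growth concrete, the following minimal sketch compares the number of coefficients in a full product basis with the number of entries stored in the cores of a TT representation. The function names `full_size` and `tt_size`, as well as the chosen dimension, basis sizes, and ranks, are illustrative assumptions and are not taken from the method itself.

```python
def full_size(mode_sizes):
    """Number of coefficients for a full product basis with the given mode sizes."""
    size = 1
    for n in mode_sizes:
        size *= n
    return size


def tt_size(mode_sizes, ranks):
    """Number of entries stored in the TT cores.

    Core k has shape (ranks[k], mode_sizes[k], ranks[k + 1]);
    the boundary ranks ranks[0] and ranks[-1] must equal 1.
    """
    if len(ranks) != len(mode_sizes) + 1:
        raise ValueError("ranks must have one more entry than mode_sizes")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary ranks must be 1")
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(mode_sizes))


# Hypothetical setting: 10 coordinates, 5 basis functions each, all TT ranks 4.
d, n, r = 10, 5, 4
modes = [n] * d
ranks = [1] + [r] * (d - 1) + [1]

print(full_size(modes))       # 9765625 coefficients in the full tensor
print(tt_size(modes, ranks))  # 680 entries across all TT cores
```

With bounded ranks, the TT storage grows linearly in the number of coordinates, whereas the full product basis grows exponentially. Whether small ranks suffice depends on the problem at hand.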