Statistics for Self-attention policy architectures for reinforcement learning under partial observability