Abstract: Recent Transformer-based language representation techniques commonly model textual context as a linear sequence of successive tokens. However, ...
Abstract: There is a vast literature on representation learning based on principles such as coding efficiency, statistical independence, causality, controllability, or symmetry. In this paper, we ...