ChatGPT isn’t the first of its sort, and it probably won’t be the final. The masking ensures that when producing the i-th phrase, the decoder only attends to the primary i phrases of the sequence, preserving the autoregressive property important for generating coherent text. ChatGPT generates its answers one phrase at a time, Ouyang mentioned, and scans the words it’s already produced to determine on the next one.