There is a race toward language models with ever-longer context windows. But how well do these models actually use that context, and how can we tell?
Originally appeared here:
Evaluating Long Context Large Language Models