BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · 2018
Google AI Language
BERT pre-trains a deep bidirectional Transformer encoder with a masked language modeling objective (plus next sentence prediction), defining the pretrain-then-finetune recipe that dominated NLP until decoder-only LLMs took over.
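The masked language modeling objective mentioned above corrupts 15% of input tokens before asking the model to recover them; of the selected positions, 80% become [MASK], 10% become a random token, and 10% are left unchanged. Below is a minimal plain-Python sketch of that corruption rule, not the paper's implementation; the token ids and the [MASK] id are hypothetical placeholders.

```python
# Minimal sketch of BERT-style masked language modeling (MLM) input corruption.
# The token ids and MASK_ID below are hypothetical placeholders, not the
# paper's actual WordPiece vocabulary entries.
import random

MASK_ID = 103        # hypothetical [MASK] token id
VOCAB_SIZE = 30522   # BERT-Base WordPiece vocabulary size reported in the paper
IGNORE = -100        # label value for positions excluded from the MLM loss

def mask_tokens(token_ids, mask_prob=0.15, seed=None):
    """Apply the paper's 15% selection and 80/10/10 replacement rule."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                              # 85% of positions are untouched
        labels[i] = tok                           # model must predict the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                   # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE) # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

# Example: corrupt a toy sequence and inspect which positions carry MLM labels.
ids = [7592, 2088, 2003, 1037, 2204, 2154]
corrupted, targets = mask_tokens(ids, seed=0)
print(corrupted, targets)
```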
Metadata
Type
paper
Credibility
Primary source
Language
en
Publication date
October 11, 2018
Organization
Google AI Language
Authors
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova