cross encoders and colbert

April 2, 2025


TIL about cross encoders and ColBERT while reading about how to introduce joint attention between a query and its candidate documents, to prepare for my interview.

Cross Encoders:

  • process query and document pairs together through a single transformer
  • compute relevance scores directly without creating separate embeddings
  • highly accurate but computationally expensive for large document collections
  • cannot pre-compute document representations, requiring full processing for each query
  • use case: re-ranking top-k search results from a first-pass retrieval system (see the sketch below)
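
here's a minimal re-ranking sketch, assuming the sentence-transformers library is installed; the checkpoint name is just one public MS MARCO cross encoder i know of, not anything from the material i was reading:

```python
# a minimal sketch, assuming sentence-transformers is installed and
# using one public MS MARCO checkpoint as an example
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "what is joint attention between query and candidates?"
candidates = [
    "Cross encoders run the query and document through one transformer.",
    "Longan is a fruit related to lychee.",
    "Bi-encoders embed queries and documents independently.",
]

# each (query, doc) pair is scored jointly through the transformer,
# so nothing can be pre-computed per document -- accurate but expensive
scores = model.predict([(query, doc) for doc in candidates])

# re-rank the candidates by relevance score, highest first
for score, doc in sorted(zip(scores, candidates),
                         key=lambda pair: pair[0], reverse=True):
    print(f"{score:.3f}  {doc}")
```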

ColBERT:

  • uses late interaction architecture with separate encodings for queries and documents
  • creates contextualized embeddings for each token rather than a single vector
  • performs efficient token-level interactions between query and document representations
  • enables both pre-computation of document representations and fast retrieval
  • use case: semantic search over millions of documents with better accuracy than bi-encoders (a toy MaxSim sketch follows below)
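
and a toy sketch of the late-interaction scoring step in plain PyTorch; the random tensors are stand-ins for the token embeddings a trained ColBERT encoder would actually produce:

```python
# a toy late-interaction (MaxSim) scorer in plain PyTorch; real ColBERT
# gets the token embeddings from a trained BERT encoder, so the random
# tensors below are stand-ins just to show the interaction step
import torch
import torch.nn.functional as F

def maxsim_score(q_emb: torch.Tensor, d_emb: torch.Tensor) -> float:
    """q_emb: (query_tokens, dim), d_emb: (doc_tokens, dim), L2-normalized."""
    sim = q_emb @ d_emb.T                 # cosine sim, (q_tokens, d_tokens)
    # for each query token, keep its best-matching document token,
    # then sum over query tokens -- the MaxSim operator
    return sim.max(dim=1).values.sum().item()

dim = 128
query_emb = F.normalize(torch.randn(8, dim), dim=-1)

# document token embeddings can be encoded and indexed offline;
# only this cheap max/sum interaction runs at query time
doc_embs = [F.normalize(torch.randn(n, dim), dim=-1) for n in (50, 120, 30)]
print([round(maxsim_score(query_emb, d), 3) for d in doc_embs])
```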

some herbs that made me very heaty today, which i will avoid in the future, especially when i'm stressed

  1. 淮山 (huái shān) - Chinese Yam
  2. 党参 (dǎng shēn) - Codonopsis Root
  3. 北祈 (běi qí) - likely a typo for 北芪; Astragalus Root
  4. 龙眼肉 (lóng yǎn ròu) - Longan Fruit
  5. 首乌 (shǒu wū) - Fo-Ti Root
  6. 当归 (dāng guī) - Angelica Sinensis
  7. 熟地 (shú dì) - Prepared Rehmannia Root
  8. 黄精 (huáng jīng) - Polygonatum
  9. 川芎 (chuān xiōng) - Ligusticum Wallichii
  10. 构纪子 (likely a typo for 枸杞子, gǒu qǐ zǐ) - Goji Berries
  11. 大枣 (dà zǎo) - Red Dates/Jujube