Explore Meta AI's groundbreaking Multi-Token Prediction Model. This deep dive explains how predicting multiple tokens at once can enhance LLM performance, detailing its unique architecture and clever techniques for reducing GPU memory usage. A must-read for AI and ML enthusiasts.
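To make the core idea concrete, here is a minimal PyTorch sketch of multi-token prediction: the shared trunk's hidden states feed several independent output heads, and head k is trained against the token k+1 steps ahead. Module names and the loss layout are illustrative assumptions, and the GPU-memory tricks the post covers are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """Several independent output heads on top of a shared trunk;
    head k predicts the token k+1 positions ahead of the current one."""
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future)
        )

    def forward(self, trunk_hidden: torch.Tensor) -> list[torch.Tensor]:
        # trunk_hidden: (batch, seq_len, d_model) from the shared transformer trunk
        return [head(trunk_hidden) for head in self.heads]

def multi_token_loss(logits_per_head: list[torch.Tensor],
                     tokens: torch.Tensor) -> torch.Tensor:
    """Average cross-entropy over heads; head k is scored against the
    token sequence shifted k+1 steps into the future."""
    total = 0.0
    for k, logits in enumerate(logits_per_head):
        future = tokens[:, k + 1:]                 # targets k+1 steps ahead
        pred = logits[:, : future.size(1)]         # drop positions with no target
        total = total + F.cross_entropy(
            pred.reshape(-1, pred.size(-1)), future.reshape(-1)
        )
    return total / len(logits_per_head)
```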
Discover how to efficiently train powerful Multimodal LLMs (MLLMs). This post explores a new ICLR 2024 technique that achieves top performance, rivaling full fine-tuning, by simply adjusting the LayerNorm in attention blocks—all while saving significant GPU memory.
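As a rough illustration of how little actually gets trained, the sketch below freezes an MLLM and re-enables gradients only for LayerNorm modules inside attention blocks. The name-matching rule (an "attn" substring in the module path) is an assumption about the model's naming, not part of the paper.

```python
import torch
import torch.nn as nn

def tune_attention_layernorm_only(model: nn.Module) -> list[nn.Parameter]:
    """Freeze every parameter, then unfreeze only LayerNorm modules that
    live inside attention blocks (matched here by module name)."""
    for param in model.parameters():
        param.requires_grad = False

    trainable = []
    for name, module in model.named_modules():
        if isinstance(module, nn.LayerNorm) and "attn" in name.lower():
            for param in module.parameters():
                param.requires_grad = True
                trainable.append(param)
    return trainable

# Usage: hand only the unfrozen parameters to the optimizer.
# params = tune_attention_layernorm_only(mllm)
# optimizer = torch.optim.AdamW(params, lr=1e-4)
```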
Why do powerful AIs like GPT-4 struggle with tasks humans find easy? Dive into GAIA, the game-changing benchmark from Yann LeCun's team, and discover how it tests the real-world capabilities a General AI Assistant actually needs, going beyond standard benchmarks. A must-read for anyone in AI.
Dive into ChatEval, an easy-to-understand multi-agent LLM framework from ICLR 2024. Learn how multiple LLM agents with unique personas debate to evaluate model outputs. An ideal starting point for understanding multi-agent systems.
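For a feel of the setup, here is a minimal Python sketch of a one-by-one debate loop: a few reviewer personas take turns critiquing two candidate answers while seeing the discussion so far. The personas, prompt wording, and the `chat_fn` hook are illustrative placeholders, not ChatEval's exact templates.

```python
from typing import Callable

# Illustrative reviewer personas; ChatEval uses its own role descriptions.
PERSONAS = [
    "You are a strict grader focused on factual accuracy.",
    "You are an editor focused on clarity and style.",
    "You are a general reader judging overall helpfulness.",
]

def debate(chat_fn: Callable[[str, str], str], question: str,
           answer_a: str, answer_b: str, rounds: int = 2) -> list[str]:
    """Each persona speaks in turn for a fixed number of rounds,
    reading the full discussion so far before adding its critique."""
    history: list[str] = []
    for _ in range(rounds):
        for persona in PERSONAS:
            prompt = (
                f"Question: {question}\n"
                f"Answer A: {answer_a}\nAnswer B: {answer_b}\n"
                "Discussion so far:\n" + "\n".join(history) + "\n"
                "Give your critique and say which answer is better."
            )
            history.append(chat_fn(persona, prompt))
    return history  # a final vote over this transcript picks the winner
```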
Discover Meta's Branch-Train-MiX (BTX), a powerful Mixture-of-Experts (MoE) technique. Learn how it trains domain-expert LLMs in parallel and then merges them into a single MoE model, sidestepping the communication bottlenecks of distributed training and mitigating catastrophic forgetting.
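A rough PyTorch sketch of the "mix" step: the feed-forward blocks taken from the separately trained expert models become the experts of one MoE layer, fronted by a newly initialized token-level router. The class name, top-k value, and routing details are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class MixedMoELayer(nn.Module):
    """Experts are the FFN blocks of separately trained models; a fresh
    router learns (in a later fine-tuning stage) how to dispatch tokens."""
    def __init__(self, expert_ffns: list[nn.Module], d_model: int, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(expert_ffns)        # one FFN per source model
        self.router = nn.Linear(d_model, len(expert_ffns))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); send each token to its top-k experts
        scores = self.router(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```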
Learn about Sparse Upcycling, a Google Research technique for converting dense models into efficient Mixture-of-Experts (MoE). Discover how to boost AI performance and reduce training costs by leveraging existing checkpoints instead of training from scratch.
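In code, the upcycling step itself is tiny. The sketch below (names are illustrative) initializes every expert as an exact copy of the dense model's feed-forward block and adds a freshly initialized router; continued training then starts from this MoE checkpoint instead of from random weights.

```python
import copy
import torch.nn as nn

def upcycle_ffn(dense_ffn: nn.Module, d_model: int, num_experts: int = 8):
    """Sparse-upcycling sketch: each expert begins as a copy of the dense
    FFN, and only the router is new. The rest of the network keeps the
    dense checkpoint's weights unchanged."""
    experts = nn.ModuleList(
        [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
    )
    router = nn.Linear(d_model, num_experts)
    return experts, router
```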
Discover Direct Preference Optimization (DPO), a simpler and more efficient method for fine-tuning LLMs. Learn how DPO improves upon complex RLHF by eliminating the separate reward model and optimizing the policy directly on preference data with a simple classification-style loss, for more stable and effective training.
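The whole method boils down to one loss. Below is a small PyTorch sketch of the DPO objective on a batch of preference pairs: the inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model, with no reward model anywhere. Variable names are mine; beta is the usual hyperparameter controlling how far the policy may drift from the reference.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs: push the policy to
    increase the log-ratio of chosen vs. rejected responses relative
    to the frozen reference model."""
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```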