Blog Writing Guide & Template
This guide explains how to write and publish a blog post on this site. It covers the required frontmatter fields, recommended section structure, markdown formatting reference, and a complete worked example.
1. Frontmatter Reference
Every blog post must begin with a YAML frontmatter block. Here is a complete example:
---
title: "Why RLHF Works Better Than You Think"
description: "A clear-eyed look at reinforcement learning from human feedback: what it actually optimises for, when it fails, and what comes next."
date: 2024-03-15
category: Reinforcement Learning
tags: [rlhf, alignment, llm, post-training]
featured: false
---
| Field | Type | Required | Notes |
|---|---|---|---|
| `title` | string | Yes | Keep under 80 characters. Use sentence case. |
| `description` | string | Yes | 1–2 sentences. Shown in list views and SEO meta. |
| `date` | `YYYY-MM-DD` | Yes | Publication date. Posts are sorted by this field. |
| `category` | string | Yes | A single category (e.g. Machine Learning, Career). |
| `tags` | string[] | Yes | 2–5 lowercase tags without spaces. Use hyphens. |
| `featured` | boolean | No | Set `true` to feature on the home page. Defaults to `false`. |
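The required fields above can be checked mechanically before publishing. As a minimal sketch (a naive line-by-line parser for illustration only; a real pipeline should use a proper YAML library), a frontmatter validator might look like:

```python
import re

# Required frontmatter fields from the table above
REQUIRED = {"title", "description", "date", "category", "tags"}

def parse_frontmatter(text: str) -> dict:
    """Extract the leading ----delimited block and parse simple `key: value` lines."""
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return {}
    meta = {}
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def validate_frontmatter(text: str) -> list[str]:
    """Return a list of problems with a post's frontmatter (empty list = valid)."""
    meta = parse_frontmatter(text)
    if not meta:
        return ["missing or empty frontmatter block"]
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - meta.keys())]
    if "date" in meta and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", meta["date"]):
        problems.append("date is not in YYYY-MM-DD format")
    return problems
```

For example, `validate_frontmatter` on a post missing `category` returns `["missing field: category"]`, which is easy to surface in a pre-commit hook.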
2. Recommended Section Structure
A well-structured blog post typically follows this pattern:
Introduction (hook + thesis)
- Open with a concrete question, claim, or observation that the reader will care about.
- State your main argument clearly within the first three paragraphs.
- Avoid preamble ("In this post I will…").
Body (argument + evidence)
- Organise into 3–6 sections with clear `##` headings.
- Each section should make one coherent point.
- Use examples, code snippets, or equations to ground abstract claims.
- Keep paragraphs short (3–5 sentences).
Conclusion (synthesis + takeaways)
- Summarise what you showed, not just what you said.
- End with a concrete implication, open question, or call to action.
Key Takeaways (optional but recommended)
A bullet-point summary for readers who skim:
## Key Takeaways
- RLHF optimises for reward model score, not for human preference directly.
- Reward hacking is a feature of the optimisation landscape, not a bug.
- Constitutional AI and process reward models address different failure modes.
3. Markdown Formatting Reference
Headings
## Section Heading (use for major sections)
### Subsection Heading (use for sub-points)
#### Minor heading (use sparingly)
Emphasis
**bold** for key terms on first use or critical claims.
*italic* for titles, foreign terms, or light emphasis.
`code` for inline code, model names as strings, config keys.
Code Blocks
Use fenced code blocks with a language identifier for syntax highlighting:
```python
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
```
LaTeX Math
Inline math: $\mathcal{L}(\theta) = -\mathbb{E}[r_\phi(x, y)]$
Display math (on its own line):
$$
\mathcal{L}_{\text{RLHF}}(\theta) = -\mathbb{E}_{(x,y) \sim \pi_\theta}\left[r_\phi(x,y)\right] + \beta \cdot D_{\text{KL}}(\pi_\theta \| \pi_{\text{ref}})
$$
Images
Place images in public/images/blog/ and reference them as:

Always include meaningful alt text.
Links
[Link text](https://example.com)
[Internal link](/publications)
Tables
| Column A | Column B | Column C |
|---|---|---|
| Row 1 | Value | Value |
| Row 2 | Value | Value |
Blockquotes
Use for quotations or callout notes:
> This is a blockquote. Use it for key quotes from papers or notable claims worth highlighting.
4. Style Guidelines
Tone: Write for a technically literate audience (ML researchers, senior engineers) who are smart but busy. Be direct. Avoid hedging every claim with "it is worth noting that…"
Length: 800–2000 words for most posts. Long-form tutorials can go up to 4000 words. Short takes (300–600 words) are fine if the content warrants it.
Audience assumption: Assume the reader knows basic ML. Don't define "gradient descent" or "transformer" from scratch. Do define domain-specific terms (e.g., GRPO, verifiable rewards) on first use.
Avoid:
- Listicles with no analysis ("5 Things About LLMs")
- Hedged non-claims ("it depends")
- Passive voice throughout
- Concluding with "I hope you found this useful"
5. Worked Example
Below is a minimal but complete blog post following this guide:
---
title: "The Difference Between SFT and RLHF (and When It Matters)"
description: "Supervised fine-tuning and RLHF are often conflated. Here is a precise account of what each does, what signal each uses, and when to choose one over the other."
date: 2024-05-20
category: Reinforcement Learning
tags: [rlhf, sft, post-training, alignment]
featured: false
---
Post-training is where the capability of a base model meets the intent of its designers.
Two techniques dominate the landscape: supervised fine-tuning (SFT) and reinforcement
learning from human feedback (RLHF). They are often used together, but they solve
fundamentally different problems.
## What SFT Does
SFT maximises the likelihood of a curated set of (prompt, response) pairs. The signal
is behavioural cloning: show the model the output you want, and train it to reproduce it.
$$
\mathcal{L}_{\text{SFT}}(\theta) = -\mathbb{E}_{(x,y) \sim \mathcal{D}}\left[\log \pi_\theta(y \mid x)\right]
$$
The limitation is obvious: you need labelled demonstrations, and the model learns to
imitate, not to judge. If your demonstrations are wrong, the model learns to be wrong.
## What RLHF Does
RLHF optimises a policy against a reward model trained on human preference comparisons.
The model is free to explore responses the demonstrators never wrote—as long as they
score well on the reward model.
This introduces a new failure mode: reward hacking. The policy finds outputs that the
reward model scores highly but that humans would not prefer. This is not a bug in RLHF;
it is a property of any optimisation process with a proxy objective.
## When to Use Each
| Scenario | Technique |
|---|---|
| Teaching a new format or style | SFT |
| Aligning to nuanced human preferences | RLHF |
| Cold-starting a new task | SFT → RLHF |
| Verifiable correctness (math, code) | RL with ground-truth reward |
## Key Takeaways
- SFT is behaviour cloning. RLHF is preference optimisation.
- Use SFT to teach format and style; use RLHF to align values and preferences.
- Reward hacking is the central challenge of RLHF, not a peripheral concern.
6. Publishing Checklist
Before saving your post, verify:
- Frontmatter is complete and valid YAML
- `date` is in `YYYY-MM-DD` format
- File is saved as `content/blog/your-post-slug.md`
- All images are in `public/images/` and paths are correct
- Run `npm run generate-search && npm run generate-rss` after saving