IT Home reported in April that AI startup Rumi found that OpenAI's o0 and o0-mini models embed special Unicode characters, such as the Narrow No-Break Space (NNBSP, U+202F), in their output.
Note: These characters look identical to standard spaces in ordinary display, but specialized tools such as SoSciSurvey or Sublime Text can reveal their distinct character codes.
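For readers without those tools, the same check can be sketched in a few lines of Python. This is only an illustration, not Rumi's detection method: the function name `find_special_spaces`, the watch list `SUSPECT_SPACES`, and the sample sentence are all assumptions made up for this example.

```python
import unicodedata

# Hypothetical watch list: space-like code points that render much like U+0020.
SUSPECT_SPACES = {"\u202f", "\u00a0", "\u2009"}

def find_special_spaces(text):
    """Return (index, 'U+XXXX', Unicode name) for each suspect character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in SUSPECT_SPACES
    ]

# Made-up sample with a narrow no-break space after the currency symbol.
sample = "The fee is $\u202f20 per seat."
print(find_special_spaces(sample))
```

An empty list means none of the listed characters are present; it says nothing about who or what wrote the text.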
Rumi said that these characters did not appear in the output of OpenAI's earlier models such as GPT-4o, and that they can be removed with a simple "find and replace". It speculated that they may be a watermark deliberately added by OpenAI.
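The "find and replace" step is easy to picture in code. The following is a minimal sketch under the assumption that U+202F is the only character involved; `strip_nnbsp` and the sample string are hypothetical names invented for this illustration.

```python
NNBSP = "\u202f"  # Narrow No-Break Space

def strip_nnbsp(text):
    """Replace every narrow no-break space with an ordinary space."""
    return text.replace(NNBSP, " ")

# Made-up sample with NNBSP between initials and after a currency symbol.
marked = "Dr.\u202fJ.\u202fSmith paid $\u202f20."
cleaned = strip_nnbsp(marked)
print(NNBSP in marked, NNBSP in cleaned)  # True False
```

A single string replacement is enough to erase the marker, which is why a character-based watermark of this kind would be so easy to bypass.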
Rumi emphasized that detecting these characters has a very low false positive rate, but the obvious weakness is that the check is easy to bypass. Another explanation is that the characters simply follow typographic rules that prevent line breaks between a currency symbol and its amount or between initials, possibly a habit the model learned from its training data.
OpenAI has previously explored several watermarking schemes, such as adding C2PA metadata to DALL·E 3 images in early 2024 and testing a visible "ImageGen" label on the GPT-4o model in April 2025.
Elsewhere in the industry, Google's SynthID, Microsoft's metadata embedding, and Meta's mandatory labels reflect the same focus on content provenance, though research shows that many watermarking techniques are vulnerable.