Component
comm
Description
uutils comm converts each output line to UTF-8 using String::from_utf8_lossy() before printing, which replaces invalid UTF-8 byte sequences with U+FFFD. GNU comm writes raw bytes directly to stdout without UTF-8 conversion, preserving byte-exact input.
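The difference described above can be illustrated with a minimal sketch. The function names `write_line_lossy` and `write_line_raw` are hypothetical and are not taken from the uutils source; they only contrast a lossy UTF-8 conversion with a byte-preserving write, assuming each line is available as a byte slice.

```rust
use std::io::{self, Write};

/// Hypothetical helper: the lossy path described in this report.
/// `String::from_utf8_lossy` replaces every invalid UTF-8 sequence
/// with U+FFFD, which is encoded as the bytes ef bf bd.
fn write_line_lossy<W: Write>(out: &mut W, line: &[u8]) -> io::Result<()> {
    out.write_all(String::from_utf8_lossy(line).as_bytes())
}

/// Hypothetical helper: a byte-preserving path.
/// The input bytes are written unchanged, with no UTF-8 validation.
fn write_line_raw<W: Write>(out: &mut W, line: &[u8]) -> io::Result<()> {
    out.write_all(line)
}
```

Calling `write_line_lossy` on the single line `b"\xfe\n"` writes `ef bf bd 0a`, whereas `write_line_raw` writes `fe 0a`. Any `Write` implementor works as the sink, for example a `Vec<u8>` or a locked `io::stdout()`.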
Test / Reproduction Steps
echo -ne "\xfe\n\xff\n" > /tmp/a
echo -ne "\xff\n\xfe\n" > /tmp/b
comm /tmp/a /tmp/b | od -An -tx1
GNU output:
uutils output:
ef bf bd 0a 09 09 ef bf bd 0a 09 ef bf bd 0a
Impact
Non-UTF-8 text is silently corrupted in the output.