Consider the following, run on the terminal on macOS:
    > touch ä
    > ls
    ä
    > ls *
    ä
    > ls * | hexdump
    00000000 c3 a4 0a
    00000003
    > ls | hexdump
    00000000 61 cc 88 0a
    00000004
Using * on the command line results in the file name being expanded normally, with its normal UTF-8 byte values (c3 a4). However, when ls retrieves the file names in the current directory itself, without them being given to it as parameters, for some unfathomable reason the ä character has been transmogrified into an a followed by a UTF-8 combining diacritic (61 cc 88). Does anybody have any idea why that's happening?
This is a bit problematic because programs that resolve file names in directories are seeing that exact same difference.
macOS applies Unicode normalization to filenames so that programs always find the same file regardless of whether they ask for it in the composed or the decomposed form.
Unusually, macOS with the HFS+ filesystem uses NFD normalization, which always decomposes the characters into base + combining diacritics.
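(The newer APFS behaves differently: as far as I know it preserves whichever form the name was created with and compares names in a normalization-insensitive way, which is more compatible with non-macOS systems, where precomposed NFC characters are more common.)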
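If a program needs to match names coming from a directory listing against names typed by a user, one option is to normalize both sides to the same form before comparing. The following is a minimal sketch using Python's standard unicodedata module; the helper names utf8_hex and same_name are made up for this illustration and are not part of any library.

```python
import unicodedata


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex, like hexdump shows them."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


def same_name(a, b):
    """Compare two file names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e4"                                  # precomposed ä, the NFC form
decomposed = unicodedata.normalize("NFD", composed)  # a + U+0308 combining diaeresis

print(utf8_hex(composed))               # c3 a4
print(utf8_hex(decomposed))             # 61 cc 88
print(composed == decomposed)           # False: different code point sequences
print(same_name(composed, decomposed))  # True: equal once both are normalized
```

The two hex lines correspond to the two hexdump results in the question, minus the trailing newline byte 0a. Normalizing to NFC here is an arbitrary choice; NFD would work just as well, as long as both sides of the comparison use the same form.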