This may be a GTK+ problem and not Thunar per se, but Thunar could mitigate this perhaps.
File names seem to sort just fine within a single script: "A" before "B" and "김" before "박." But when a file name mixes non-ASCII and Latin characters, Thunar sorts based on the first Latin letter it finds.
"김.mp3" should come before "박.mp3" and all seems fine, but if I have a file that is "김B.mp3" and a file that is "박A.mp3", then Thunar just looks at the "A" and the "B," ignores everything else before the Latin letters and then incorrectly puts "박A.mp3" before "김B.mp3".
Thank you for the great XFCE software. I actually forgot about this until I had a bunch of Korean files together in one folder again. (Last time I just renamed all the files in Romanized characters or translated the titles into English). Then I remembered this issue.
I don't have time to look into this, though if you can figure it out yourself, an MR would be welcome!
I cannot help there as I have no coding experience, or at least nothing relevant or above doing "Hello, world!" in the 1990s.
If you or anyone else works with non-ASCII characters often, can you comment on whether you have had the same issue?
Do German users, for example, get frustrated at "ẞ1.txt" coming before "Ä39.txt" or "Üß2.txt" coming before "ä5.txt"?
You need to set LC_COLLATE (Locale Categories) to enable locale-aware sorting. Since the described behavior is shared across most (all?) applications, I do not think that we should add extra code to Thunar.
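For anyone unfamiliar with what LC_COLLATE changes, here is a minimal sketch in Python (not Thunar or GLib code) comparing plain code point order with locale-aware order. The helper name `sort_names` and the locale name `ko_KR.UTF-8` are only illustrative assumptions; the locale has to be installed on the system, otherwise the sketch falls back to code point order.

```python
import locale


def sort_names(names, locale_name=None):
    """Sort file names by the collation rules of locale_name.

    Falls back to plain code point order when no locale is given
    or when the requested locale is not installed.
    """
    if locale_name is None:
        return sorted(names)

    # Remember the current setting; setlocale() affects the whole process.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # The C library does not know this locale.
        return sorted(names)
    try:
        # strxfrm() turns each name into a key that compares per LC_COLLATE.
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


names = ["박A.mp3", "김B.mp3"]
print(sort_names(names))                 # code point order
print(sort_names(names, "ko_KR.UTF-8"))  # locale order, if that locale exists
```

The result of the second call depends on which locales the system has and on the C library's collation tables, so it may differ between machines.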
Hmm, most people don't know about LC_COLLATE (me included 10 minutes back).
Some languages don't use ASCII letters, so I actually wonder why locale-aware sorting should not be the default. Though possibly I am overlooking some aspect?
This reminds me of !248 (merged) which maybe is related.
@lastonestanding Thank you for the suggestion. I have thought about this and wrote notes below in a reply in the thread (independent of this LC_COLLATE aside).
You need to set LC_COLLATE (Locale Categories) to enable locale-aware sorting.
That seemed like a neat workaround, but I got the following message when I tried this command:
(thunar:94095): Gtk-WARNING **: 21:22:59.842: Locale not supported by C library. Using the fallback 'C' locale.
I imagine this requires one to have the relevant locale installed on one's operating system. In any case,
Since the described behavior is shared across most (all?) applications, I do not think that we should add extra code to Thunar.
I can understand that feeling. Thunar is just behaving like the terminal ls command, which I recently found out frustrates others who try to front-load certain folders that are not hidden by prefixing _ as in _backups or whatever name you want. This also bothers me.
Somehow, KDE's Dolphin File Manager sorts seemingly perfectly, regardless of what locales the operating system has installed.
Here are the arguments for and against adding code to Thunar:
Pro (add code to Thunar):
If you copy what Dolphin does, then the sorting works out of the box for everyone.
If you copy what Dolphin does, then the sorting works for more than one locale at once.
I like the US English locale for many things as the default. Also, many people may operate mostly in one locale, but need sorting to work for multiple languages and scripts. Anyone multilingual where the dominant scripts differ from each other (e.g., Bulgarian speakers in Greece) would face the same problems that I have faced.
Thunar (or GTK?) already includes code to put folders first as an option, overriding the default and ls pattern of mixing files and folders and only sorting by name (or date or whatever). Adding code to change sorting behavior to include non-ASCII/non-English is not a qualitatively significant change. (Although it may be quantitatively a significant amount of work.)
Con (do not add code to Thunar):
One can install the locale for the operating system. Even if this is a huge installation and you can only have one locale, most people only work in one locale ever.
Adding code here is expending limited resources.
You may have different languages alphabetize differently within the same script: think of ña versus nb. In Anglo-America, I just ignore foreign diacritics when sorting; Chinese Pinyin tone marks would be unimportant to me, but important for Chinese. Then of course there are the Turkish Iıİi and the endless debate on whether ij is its own letter in Dutch and whether to use a digraph or just i and j.
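To make the "ignore diacritics" policy concrete, here is a small Python sketch. It is only one possible policy and not what Thunar, GTK, or Dolphin does; the helper name `fold_diacritics` is made up for illustration. It decomposes each name (Unicode NFD), drops the combining marks, and casefolds the rest before comparing.

```python
import unicodedata


def fold_diacritics(name):
    """Return a sort key with combining marks removed and case folded."""
    # NFD splits e.g. "ñ" into "n" followed by a combining tilde.
    decomposed = unicodedata.normalize("NFD", name)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


names = ["ñb.txt", "na.txt", "nc.txt"]
print(sorted(names))                       # ['na.txt', 'nc.txt', 'ñb.txt']
print(sorted(names, key=fold_diacritics))  # ['na.txt', 'ñb.txt', 'nc.txt']
```

A Spanish speaker who treats ñ as its own letter after n would want a different order, which is exactly the per-language disagreement described above.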
No implementation is perfect, but a different one could still be much better than the current way, in which case Thunar should add some extra code.
The real question is whether XFCE wants such a code change or would reject it out of hand as setting a bad precedent or adding "bloat." I only bring up Dolphin as an example I know as a user. I have no idea if KDE has some overly complicated solution involving Phonon and twenty thousand dependencies.
No implementation is perfect, but a different one could still be much better than the current way, in which case Thunar should add some extra code.
A different implementation could be presented and evaluated. Based on your pro and contra arguments this appears to be a rather complex matter with various aspects. Therefore, the new behavior would have to be optional (to not upset some users).
The real question is whether XFCE wants such a code change or would reject it out of hand as setting a bad precedent or adding "bloat." I only bring up Dolphin as an example I know as a user. I have no idea if KDE has some overly complicated solution involving Phonon and twenty thousand dependencies.
Do you have any references (bug reports, merge requests, commits)? It would help to know what exactly the KDE developers have done to address this matter.
I am not tech/coding literate enough to plumb the code, but I may be able to ask KDE developers in some forum. That may be the only hope for me.
Simply narrowing down what bit of KDE does what for which reason can be tricky with how many dependencies the typical KDE program pulls in.
I respect many things about KDE and I especially commend how much smoother, cleaner, and more organized 5 is compared to 4, but it is still difficult when so many kf5 packages have so many layers of dependencies, with each one pulling in so much. (Why does KWrite end up pulling in KWallet on even Debian, the distro that prides itself on separating out source packages into separate binary components all the time?)
I was a bit harsh, as the only really bad design still around is Phonon, which was made for reasons that turned out to be neither relevant nor likely, and which unnecessarily complicates the AV chain so that locating any problem in playback is onerous.