Feature Request: Add support for Windows-932/CP932/"Windows's version of Shift JIS"
Mousepad has no support for Windows-932/CP932/"Windows's version of Shift JIS."
Yes, we should do UTF-8 everywhere and stop using legacy encodings, little-endian encodings, non-ASCII-compatible encodings, difficult-to-search encodings, etc. I know this.
But some systems, in my experience smart playback devices like TVs that play subtitle files, only work with one-or-two-byte encoding schemes. Oh, and Korea and Japan probably still have oodles of government webpages and many more old files on servers that are in UHC (Korea) and Windows-932 (Japan). Yeah, they will still be kicking around for another decade at least.
True Shift JIS and Windows's version of Shift JIS (a.k.a. Windows-932, a.k.a. Windows Code page 932) are different.
Currently, Mousepad claims in open and save dialogs to handle Shift JIS, which it seemingly does.
However, there is no support for Windows-932.
How are they different? Windows-932 saves tilde ~ as tilde ~ and backslash \ as backslash \. In UTF-8, ASCII, and Windows-932, these are 0x7E and 0x5C, respectively.
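A quick way to check the byte values, as a minimal sketch that assumes Python's built-in cp932 codec is a fair stand-in for Windows-932:

```python
# Encode tilde and backslash under three encodings and compare the bytes.
text = "~\\"  # a tilde followed by a single backslash

encoded = {codec: text.encode(codec) for codec in ("ascii", "utf-8", "cp932")}

for codec, data in encoded.items():
    # Each line shows the same two bytes: 7e (tilde) then 5c (backslash).
    print(f"{codec:>6}: {data.hex(' ')}")
```

All three encodings agree on these two characters, which is the point: Windows-932 stays ASCII-compatible at 0x5C and 0x7E.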
In Shift JIS, there is no ~ and no \. Therefore, Mousepad saves any tilde ~ as 0x7E and any backslash \ as 0x5C to "preserve them." This action re-maps them to overline ‾ and yen sign ¥, because those are the characters true Shift JIS assigns to 0x7E and 0x5C. (On display, a UTF-8 program will convert them to correctly display as overline at U+203E and yen sign at U+00A5.)
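To make the two readings of the same bytes concrete, here is a rough sketch in Python. It uses a hand-written two-entry table rather than any library codec, since codecs labelled "Shift JIS" vary in how they treat these two bytes. The helper name read_single_bytes and its true_shift_jis flag are made up for illustration, and only the single-byte range below 0x80 is covered.

```python
# The two byte positions where the single-byte half of true Shift JIS
# (JIS X 0201 Roman) differs from ASCII.
JIS_ROMAN_OVERRIDES = {
    0x5C: "\u00a5",  # YEN SIGN instead of backslash
    0x7E: "\u203e",  # OVERLINE instead of tilde
}


def read_single_bytes(data: bytes, true_shift_jis: bool) -> str:
    """Interpret bytes below 0x80 as true Shift JIS or as Windows-932.

    Hypothetical helper for illustration; it rejects anything >= 0x80.
    """
    chars = []
    for byte in data:
        if byte >= 0x80:
            raise ValueError("only the single-byte range is sketched here")
        if true_shift_jis and byte in JIS_ROMAN_OVERRIDES:
            chars.append(JIS_ROMAN_OVERRIDES[byte])
        else:
            chars.append(chr(byte))  # identical to ASCII everywhere else
    return "".join(chars)


raw = bytes([0x5C, 0x31, 0x30, 0x30, 0x7E])  # the same five bytes on disk
as_cp932 = read_single_bytes(raw, true_shift_jis=False)
as_shift_jis = read_single_bytes(raw, true_shift_jis=True)

print(as_cp932)      # \100~
print(as_shift_jis)  # ¥100‾
```

The bytes never change; only the label on the file decides whether a reader sees a backslash and tilde or a yen sign and overline.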
N.b., the overline and the macron are different, despite the fullwidth overline being erroneously called "fullwidth macron." Unicode is infuriating. If Unicode wants its "no changes" rule to apply to re-mappings, decompositions, code points, and descriptions, then Unicode people should proofread their work!
The Windows operating system usually used only Windows-932. It correctly stored tilde ~ as tilde at 0x7E and backslash \ as backslash at 0x5C. However, Windows releases in Japan had a default font that made all instances of ~ look like overlines and all \ look like yen signs. This was, of course, an ultimately bad decision. The simple overline is a breaking character. (You can't use it in Romanizations as is.) And the Japanese should have gotten into the habit of writing yen signs in ASCII as -Y-, but looking like Shift JIS was important for some reason.
Windows in Korea, unfortunately, also used default fonts that made the backslash look like a won sign instead of telling Koreans to type the won sign in ASCII as -W-, but here we are. Instead of a small changeover period when Koreans had some documents that said \3000 for a money amount and the Japanese had some documents that said \40 for a money amount, we got a proliferation of fonts with confusing glyphs.
Now there are billions of documents that say \240 and you have no idea which it is unless you know the context. Because of such problems, we should really display \ as \ as much as possible and avoid using \ as a currency symbol just because one's local font looks "right." As long as legacy encodings stick around (and they will stick around), getting Shift JIS and Windows CP 932 distinct and accurate is essential.
Don't bother checking out Wikipedia. The various articles on text encoding disagree with each other and have myriad technical errors from people not understanding the technical documents they read. IBM's current web page is not helpful, either. You have to have some pre-existing knowledge to know which pages are reliable references and which are unreliable.
EDIT: I meant to write "CP932" as the Windows near-equivalent of Shift JIS. I messed up the copy-pasting of the number. I did slip in a criticism of Unicode committees not proofreading and making things tricky, and I hadn't proofread either, but I'm not setting down permanent standards. I still stand by what I said about avoiding Wikipedia. Even if one Wikipedia article has the right info, another Wikipedia article that is linked to it may not.