Since there are many possible combinations of tools and versions for kirikiri2/kirikiriZ, this is more of an example workflow for this particular kirikiriZ title. There are also some rough spots that are not particularly well optimized. If anyone knows of ways to extract the .ks files faster, please let me know!
Part A) Introduction and determining the game engine
Part B) Translating the first line of dialogue
Part C) Translating images
Part D) Changing the choices and title
Part E) Translating game engine strings
Part F) Adding subfolders to Patch.xp3
Part G) Automating dialogue extraction, translation, and insertion 1
Part H) Automating dialogue extraction, translation, and insertion 2
Part I) Automating OCR
Part A) Introduction and determining the game engine
For this case study, the title in question is こあくまちゃんの誘惑っ!, Koakuma-chan no Yuuwaku!, which could be translated as Koakuma-chan's Temptation.
The story is about two people not-quite falling in love and then maybe doing something together after one of them maybe seduces the other, or not. Okay, well... a compelling story is not really the emphasis of this eroge apparently. Moving on.
Here is what the files look like in the folder after installation.
backup/
html/
plugin/
char.xp3
data.xp3
event.xp3
Koakuma-chan no Yuuwaku.config.cmd
Koakuma-chan no Yuuwaku.exe
manual.html
parts.xp3
readme.JPN.txt
startup.exe
startup_readme.JPN.txt
voice.xp3
To keep everything organized, I am using a base folder on the desktop named "Koakuma-chan no Yuuwaku" and putting everything related to this project underneath it in more subfolders. So most paths in this case study are relative to that one since it is serving as the main project directory. For me, the main executable for this game is under "Desktop\Koakuma-chan no Yuuwaku\bin\Koakuma-chan no Yuuwaku\". The folder can be moved to other places, and it should still launch.
Here is a short summary of the steps when converting the installation iso to the repackage.
1. Download the archive files from girlcelly's thread
2. Extract the iso contents
3. Mount the disc image using iso mounting software
4. Install the contents
5. Copy over the no-dvd patch
6. Rename the no-dvd patch to something memorable like "Koakuma-chan no Yuuwaku.exe"
7. Rename and fix the config script by adding quotes around the renamed executable
8. Archive prior to first run
So back to staring at the extracted files for Koakuma-chan. .xp3 files mean some variant of the kirikiri game engine, but let's deal with that later.
As always, the first thing to do is to first run the game and verify it actually works.
Japanese kirikiri games always need the locale set to Japanese, Japan, so right-click on "Koakuma-chan no Yuuwaku.exe" and run it using a Japanese locale emulator. Keep clicking stuff in the user interface (ui) until the first line of the game is displayed. Then, take a screenshot or otherwise record it. The objective of the initial proof of concept is to translate that single line.
Since this is a visual novel, the go-to tool to use when translating a new visual novel is Garbro. Always try Garbro first before anything else for visual novels. Even if it cannot be used for the full workflow, it can provide important clues for the type of engine we are dealing with.
Using Garbro, open data.xp3. See the bottom left where it says "KiriKiri game engine resource archive"? That indicates the .xp3 is actually a valid kirikiri archive as opposed to a developer arbitrarily naming their archives .xp3.
- There are also .tjs scripts which implement TJS/TJS2, a unique scripting language in the kirikiri game engine.
- k2compat is a compatibility layer unique to kirikiri games, specifically kirikiriZ.
- data.xp3\scenario\ has .ks files, and .ks is a common extension for dialogue scripts in kirikiri games.
- event.xp3 has a lot of .tlg files. .tlg files are a type of image unique to the kirikiri game engine.
Koakuma-chan no Yuuwaku.exe/TEXT/00142 also mentions "Kirikiri Z Project Contributors", and if we open up backup/こあくまちゃんの誘惑っ!.exe/RT VERSION/00001, we see this.
Code:
BLOCK "041104b0"
{
VALUE "FileDescription", "TVP(KIRIKIRI) Z core / Scripting Platform for Win32"
VALUE "FileVersion", "1.4.0.8"
VALUE "InternalName", "tvp2/win32"
VALUE "LegalCopyright", "(KIRIKIRI core) (C) W.Dee and contributors All Rights Reserved. This software is based in part on the work of Independent JPEG Group. For details: Run this program with '-about' option."
VALUE "OriginalFilename", "tvpwin32.exe"
VALUE "ProductName", "TVP(KIRIKIRI) Z core / Scripting Platform for Win32"
VALUE "ProductVersion", "1.4.0.8"
}
Based upon the above evidence, this is a kirikiri game.
吉里吉里/krkr/kirikiri is an open source game engine originally developed in Japan for visual novels.
Kirikiri has a lot of documentation, extensive support for third party plugins (.dll), is easily customizable, and has a readily available royalty free official engine sdk which together have made kirikiri the most popular engine to use for Japanese VN developers overall. Internationally, Ren'Py is more popular, but Japanese developers tend to prefer kirikiri. For Japanese VNs, if there is any game engine worth learning how to work with in detail, it is kirikiri which is why this case study is so long and detailed.
Every developer tends to use a different selection of plugins, their own snapshot of the source tree/sdk, a slightly different script syntax, different versions of this or that in general, and often semi-custom DRM. Developers also tend to heavily customize their own games with custom functions, custom plugins, and developer written macros, so each kirikiri game engine tends to be slightly different even if kirikiri games in general share the same overall codebase.
kirikiri - The original open source game engine intended for Japanese visual novels.
kirikiri2 - A complete rewrite of kirikiri. It understands kag3 and tjs2.
kirikiriZ - A newer engine based on kirikiri2.
tjs2 (.tjs) - A scripting language commonly used in kirikiri game engines. This is interpreted programming code used to write the game code.
KAG3 (.ks) - KiriKiri AdventureGame script format 3. This is used for text files and dialogue. It is a scene and UI building system written by W.Dee in TJS for KRKR2/KRKRZ. KAG3 runs .ks files and is the base system on top of which all older KiriKiri/KiriKiri2 VNs are built.
e-mote/freemote (.psb/.scn/.scn.txt) - A newer script format used by kirikiriZ that supports utf-8.
K2Compat - A backwards compatibility layer to allow KRKR2 scripts to run under KRKRZ.
Technically, kirikiri refers only to the original kirikiri game engine which was replaced by kirikiri2 and then later kirikiriZ. However, kirikiri is functionally synonymous with both kirikiri2 and kirikiriZ since end users tend to not really care about the exact version of the underlying engine. That is more of a concern for developers and translators.
KirikiriZ is a much more modern continuation of kirikiri2 that also supports .psb/.scn/.scn.txt (e-mote/freemote) utf-8 encoded scripts and has k2compat to support reading older kag3 (.ks) text/dialogue scripts (shift-jis, utf-16-le-bom).
Kirikiri also supports the tjs2 programming language commonly implemented inside of .tjs files which has a completely different syntax from the kag3/e-mote dialogue scripts.
Regrettably, or perhaps fortunately, a lot of the strings the game engine displays relating to the game's user interface are found in the .tjs files, so translating the UI usually means hunting around for those strings, and where they end up tends to be developer specific. Unfortunately, sometimes UI elements will obtain the display string for the option from a .dll which makes it overly difficult/impossible to translate them. That is not the case for most translatable strings, but it does happen so just be aware of it.
Due to the presence of the k2compat folder, one could conclude that Koakuma-chan is a kirikiriZ game. kirikiri and kirikiri2 games do not have k2compat folders. That means that Koakuma-chan is using a relatively modern version of the kirikiri game engine.
But... if they wrote it using kirikiriZ, not kirikiri2, then why did the developer write the scripts in .ks instead of the e-mote format (.psb/.scn/.scn.txt)? The newer script format supports utf-8, so it seems like it would be better to write the game in that format if they intended for their work to be translated into many languages. Instead, the developer opted to use .ks scripts which are written natively in shift-jis which is an encoding that does not support all languages. Maybe it is easier to work with or something? I doubt the developer is going to tell us, so let's just move on.
The kirikiri engine also supports utf-16-le-bom/ucs-2 encoded KAG3 (.ks) files for people translating kag3 .ks files into other languages.
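As a rough illustration of the encodings mentioned above, here is a minimal Python sketch that re-encodes a shift-jis .ks file as utf-16-le with a byte order mark. The function name and file names are made up for this example, and "cp932" (the Windows flavor of shift-jis) is an assumption about what the original scripts use.

```python
from pathlib import Path


def convert_ks_to_utf16(src: Path, dst: Path, src_encoding: str = "cp932") -> None:
    """Re-encode a KAG3 script from shift-jis to utf-16-le with a BOM."""
    # Decode from raw bytes so the original line endings are left untouched.
    text = src.read_bytes().decode(src_encoding)
    # The "utf-16-le" codec does not write a BOM by itself, so add it by hand.
    dst.write_bytes(b"\xff\xfe" + text.encode("utf-16-le"))


if __name__ == "__main__":
    # Hypothetical file names, only for illustration.
    source = Path("00_01.ks")
    if source.exists():
        convert_ks_to_utf16(source, Path("00_01.utf16.ks"))
```

Always write the result to a new file so the original script stays untouched.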
Even though Garbro can read the file names in .xp3 archives, the actual contents of the files may be obfuscated or encrypted. Simply extracting them as-is without verifying they were correctly rendered can and will produce files with opaque or corrupt contents.
The most direct way to check if the archives are decoded correctly is to find an image (.tlg, .jpg, .png) or text file (.txt, .ks) inside of the archive and click on it. Garbro's preview pane on the right should display the contents if the contents were decoded correctly. For text files, the file may have some mojibake until the correct encoding is selected at the top right of the preview pane. If nothing at all displays, then Garbro was not able to render the file properly which indicates a decoding error due to encryption or obfuscation or unsupported file or archive type.
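For files that have already been extracted to disk, a similar sanity check can be sketched in Python by looking at the first few bytes of each file. This is only a rough supplement to Garbro's preview pane, not a replacement for it. It only knows about a few image signatures, and the assumption that every valid .tlg file starts with the ASCII letters "TLG" is mine. Text files have no signature, so they are not covered.

```python
from pathlib import Path
from typing import Optional

# Leading bytes ("magic numbers") of a few image formats.
SIGNATURES = {
    b"TLG": "tlg image",            # assumption: all TLG variants start with "TLG"
    b"\x89PNG\r\n\x1a\n": "png image",
    b"\xff\xd8\xff": "jpeg image",
}


def identify(header: bytes) -> Optional[str]:
    """Return the image type suggested by the header bytes, or None."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None


def looks_decoded(path: Path) -> bool:
    """True if the file starts with one of the known image signatures."""
    with path.open("rb") as handle:
        return identify(handle.read(16)) is not None
```

An image that was extracted without being decoded correctly will usually fail this check because its header is scrambled.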
If the developer obfuscated or encrypted their .xp3 files in some way, then it is necessary for Garbro to be aware of that information before it can successfully extract the correctly decoded contents. Many developers do not obfuscate or encrypt their .xp3 files.
In the case of Koakuma-chan, since Garbro does not display any contents of the .xp3 files in the preview pane at all, that means the developer did implement some sort of mechanism to obscure the contents, so we will need to figure out some way to correctly decode the files.
- Open Garbro
- Click on a .xp3
- Archive parameters - Archive content could be encrypted. Choose...
- Click on "no encryption"
- Look for "Koakuma-chan no Yuuwaku"
One minor issue is that Koakuma-chan is not on the list of games known to Garbro so it is impossible to select it.
While it is certainly possible to check every entry one by one to see if they work anyway and that can work sometimes, that is highly annoying, and I am not willing to do it. That list is just way too long. A more intelligent way is to check if the developer or the parent company of the developer has released any VNs that are known to Garbro and try those VNs in hopes of a match.
For obfuscation, developers sometimes use the same obfuscation scheme for the games they release so they do not have to rewrite complicated obfuscation algorithms again. This is especially true for older titles. For encryption, each game should use a unique encryption key which makes it simultaneously easier and harder to correctly decode the game's encrypted contents. This is more common for newer titles.
Let's check VNDB to see if any game by the developer is on the list provided by Garbro.
Clicking on CloverGame, they have only made Koakuma-chan so far, but they do have a parent company listed, so let's try the titles released by the parent brand and its subsidiaries.
Code:
Hulotte, Imouto no Okage de Motesugite Yabai, SourireCrypt
Hulotte, Kanae to Meguri to no Sonogo ga Icha Love Sugite Yabai, SourireCrypt
Hulotte, Yome Sagashi ga Hakadorisugite Yabai, HashCrypt
Hulotte, Deatte 5-fun wa Ore no Mono!, XorCrypt(0x35)
Hulotte, Ore no Sugata ga, Toumei ni!? Invisible to Suuki na Unmei, XorCrypt(0x95)
Hulotte, Ore no Cupid ga Ponkotsu Sugite Kowa~i, XorCrypt(0x0E)
Hulotte, Ore no Cupid ga Ponkotsu Sugite Kowa~i Trial, XorCrypt(0x78)
Hulotte, Ore no Hitomi de Maruhadaka! Fukachi na Mirai to Misukasu Vision, XorCrypt(0x3C)
MintCUBE, Ama Koi Syrups, HashCrypt
MintCUBE, Ninki Seiyuu [Steam], NinkiSeiyuuCrypt
MintCUBE, Yuusha to Maou to, Majo no Café, XorCrypt(0xCD)
And all of the Sonora, Sphere, and CUBE titles.
Nothing worked meaning the developer is using a unique obfuscation method or encryption with a unique encryption key, like XorCrypt with a custom hex value. At some point, I would like to figure out how those custom hex values were determined and reverse engineer Koakuma's encoding technique since that would make extracting files from Koakuma-chan and other titles faster, but for now, let's use a different tool.
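On the question of how such hex values might be determined, here is a minimal Python sketch of a known-plaintext guess. It assumes that an entry like XorCrypt(0x35) means every byte of a file is XORed with that one value, which is my reading of the name and not something verified against Garbro's source. It also assumes the encrypted file is a .tlg image that should start with the letters "TLG". Neither assumption is confirmed for Koakuma-chan, so treat this as an illustration of the idea only.

```python
from typing import Optional


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same single-byte key."""
    return bytes(b ^ key for b in data)


def guess_xor_key(encrypted_header: bytes, known_magic: bytes) -> Optional[int]:
    """Try all 256 keys and return the one that reveals the expected magic."""
    for key in range(256):
        if xor_bytes(encrypted_header[: len(known_magic)], key) == known_magic:
            return key
    return None


if __name__ == "__main__":
    # Simulate an obfuscated .tlg header and recover the key from it.
    scrambled = xor_bytes(b"TLG5.0\x00raw\x1a", 0x35)
    print(hex(guess_xor_key(scrambled, b"TLG")))  # prints 0x35
```

If no single-byte key reveals the expected header, the scheme is something other than a plain XOR.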
Since the game itself knows how to decrypt them, one of the most future proof ways to extract assets from games and software in general is to let the software decode the assets and then extract the decoded contents afterwards from the software itself. This can be accomplished either as a memory dump/reading or with filesystem level hooks.
For kirikiriZ, there are several projects that can dump the decoded files using hooking techniques. The game should read the files, decode them properly, and the hooking software takes advantage of that to write the properly decoded file to the filesystem. This bypasses the obfuscation or encryption of the .xp3 archives without actually breaking the scheme itself.
The upsides of this hooking approach are that it can bypass schemes that Garbro does not know about without having to reverse engineer the .xp3 obfuscation or encryption scheme. That is a major upside. The downside is that this approach can only dump files as they are read which also means files cannot be dumped until they are read by the game. That means dumping the files means playing through the game until all files are read. That is a major downside. Yeah... Well, let's get started.
Here are five projects from different developers that attempt to dump files from kirikiriZ titles during read time. If you know of any more, please let me know of their existence by commenting on this thread. The same applies to faster ways to extract files from Koakuma-chan.
KrkrDump seems more recently updated, so this case study, and my actual workflow, will use it, but the other ones should also work, probably, maybe.
1. Download KrkrDump v1.4+ https://github.com/crskycode/KrkrDump/releases
2. Extract the files of v1.4.zip somewhere. I extracted it to "Desktop\Koakuma-chan no Yuuwaku\tools\KrkrDump\v1.4\"
It looks like there are two files. What does the readme say to do with these files?
KrkrDump.dll
KrkrDumpLoader.exe
README.md
The tool reads a json-based config file when it starts up. That config file should have the same name as the dll. e.g KrkrDump.json
Here is an example of a valid config file:
[...]
If your config file is ready, put KrkrDump.dll and KrkrDump.json and KrkrDumpLoader.exe in the same folder, then drag Game.exe to KrkrDumpLoader.exe
outputDirectory should probably be changed. I changed mine to
Code:
"outputDirectory": "C:\\Users\\User\\Desktop\\Koakuma-chan no Yuuwaku\\extracts\\Yuu_extracts",
Remember to include two backslashes \\ instead of one backslash \ in the path just like in the example.
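One way to avoid escaping the backslashes by hand is to let a JSON library do it. This is a small Python sketch that only produces the outputDirectory entry. The rest of the config file is elided in the readme excerpt above, so it is not reproduced here.

```python
import json

# A raw string keeps the Windows path exactly as it would be typed in Explorer.
output_dir = r"C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\extracts\Yuu_extracts"

# json.dumps doubles every backslash, which is what the config file expects.
fragment = json.dumps({"outputDirectory": output_dir}, indent=2)
print(fragment)
```

The printed value should match the outputDirectory line shown above, minus the trailing comma.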
Next, the instructions say to "drag Game.exe to KrkrDumpLoader.exe". That is easier if they are in the same folder, so let's copy "KrkrDumpLoader.exe", "KrkrDump.dll", and "KrkrDump.json" to the main game folder.
Immediately after dragging "Koakuma-chan no Yuuwaku.exe" onto "KrkrDumpLoader.exe", a command prompt pops up and a log file with the current date gets created.
Checking the path that was specified in the .json, there are now a lot of .tlg (image) files and .tjs (script) files. Garbro can render the .tlg files using its preview window, and Notepad++ can render the .tjs files using shift-jis encoding.
There are also k2compat and system folders that have a particularly important script called "initialize.tjs". Initialize.tjs has, among other things, the archive names that the game will check to load its assets. It also usually outlines the flow of control for the program, especially for older kirikiri2 titles prior to loading the main game window or most scripts.
In particular, the names of the file(s) the game engine will check for when applying patches, if any, are usually in this file. For Koakuma-chan, that is this line of code.
Code:
AddAutoPath("Patch.xp3>");
In other words, if we can fill "Patch.xp3" with our own files, the game engine will read them in place of any existing files, including .tjs scripts. That allows for arbitrary files to load at runtime which we can use to translate the game.
Initialize.tjs has a wealth of other information that we will need to carefully analyze later line by line, but for now, I will content myself with figuring out only this part of it near the top.
It looks like the first part runs System.createAppLock() and it conditionally prints out "すでに起動しています。" which could be translated to "It is already launched.". It seems like that part exists to prevent duplicate instances from running. Trying to run a duplicate instance of "Koakuma-chan no Yuuwaku.exe" displays "すでに起動しています。" in an error box, which lends support to this hypothesis.
The second part refers to this program called Hidemaru.exe under Program Files. What is that and what does it have to do with this program's debug mode?
Searching for it in Windows Explorer shows nothing, meaning that it is not part of a standard Windows install. Searching for it online leads to an interesting plugin for VSCode, a text editor. Searching for Hidemaru Editor leads to https://hide.maruo.co.jp/ Naively clicking on "English (英語)" hides everything on the page instead of translating it. These people fail at UI design, seriously. Clicking on their Japanese site link leads to their actual home page. Searching for Hide... does not come up with anything, but highlighting their tabs at the top brings us to their software index page. Highlighting each link under their エディタ, editor, category leads to two links with hidemaru in the name of the .html files.
Translating both sites and clicking around semi-randomly gives me the impression that Hidemaru.exe is a text editor with two distributions, a normal one and a Microsoft store version, possibly of a commercial nature.
Going back to the code above, it says if(__DEBUGMODE__ != 0), which means do something whenever __DEBUGMODE__ is anything other than 0. Presumably 0 represents debug mode being disabled, so the block only runs when debug mode is enabled. That something is to use System.setArgument() to add -exceptionexe and the path to the Hidemaru text editor. It also adds "-exceptionarg", "/j%line% %filepath%" which could be interpreted to mean "open this line of a particular file".
All together, those lines of code probably mean if an error occurs in debug mode, then open a text editor of the failed file name at the line that failed. Why would that be useful? To help the developer with debugging when creating the game.
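To make that interpretation concrete, here is a tiny Python sketch of how the "/j%line% %filepath%" template might be filled in before the editor is launched. This mirrors the guess in the text and is not taken from the engine's source. The function name, file path, and line number are made up.

```python
def expand_exception_arg(template: str, filepath: str, line: int) -> str:
    """Fill in the %line% and %filepath% placeholders of the argument template."""
    return template.replace("%line%", str(line)).replace("%filepath%", filepath)


if __name__ == "__main__":
    # Hypothetical script path and line number, for illustration only.
    print(expand_exception_arg("/j%line% %filepath%", r"C:\game\system\initialize.tjs", 42))
    # prints: /j42 C:\game\system\initialize.tjs
```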
Usually developers remove this sort of semi-privileged or sensitive environment specific debug code prior to launching their game since this information gives us an insight into their development environment. This developer did not do that. That does give us an insight into their mindset when coding and releasing this game.
They are not really trying to hide anything. If they are, they are not trying very hard or only trying to give the superficial impression that they are.
This information on the developer's mindset might come in handy later when we have to make probabilistic arguments in how to interpret their .tjs code.
Getting back on track, the next step is to extract a few more files. Specifically, we are looking for a file with the line that we are trying to translate from the first scene of the game. Which file is that?
In Garbro, opening up data.xp3\scenario\ lists a lot of .ks files. Those are probably the dialogue scripts we are looking to decode. Most developers tend to number their scripts sequentially, possibly excluding a prologue, meaning that we are looking for either 00_01.ks or A01_01.ks in the .json outputDirectory.
Let's go back to the game and start a new playthrough until the first line of the game is displayed and then look for any .ks files in the outputDirectory.
Here is what the log says after playing the game until it displays the first line of the first scene.
西日の射す廊下に自分の足音が響く。 looks like the screenshot earlier, so that means 00_01.ks has the first line of the game.
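Instead of opening each dumped .ks file by hand, the search can be sketched in Python. The function name is made up, and decoding with "cp932" (the Windows flavor of shift-jis) is an assumption about how the dumped scripts are encoded.

```python
from pathlib import Path
from typing import List, Tuple


def find_line(root: Path, needle: str, encoding: str = "cp932") -> List[Tuple[str, int]]:
    """Return (file name, line number) for every .ks line containing needle."""
    hits = []
    for path in sorted(root.rglob("*.ks")):
        # errors="replace" keeps the search going if a file is not shift-jis.
        text = path.read_bytes().decode(encoding, errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.name, number))
    return hits


if __name__ == "__main__":
    # Point this at the outputDirectory from KrkrDump.json.
    for name, number in find_line(Path("."), "西日の射す廊下に自分の足音が響く。"):
        print(name, number)
```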
Before making substantial changes, it is time to repack the script to check if repackaging actually works or not.
Older versions of kirikiri sometimes had the ability to load files from folders of the same name instead of the included archive.xp3 files. That meant that all that was needed to translate them was to extract the archive.xp3 files into appropriately named folders for the game to read them. Updating the contents as needed was very simple that way since no repacking was necessary. Most modern implementations of the kirikiriZ engine do not allow this however.
Many other game engines do allow it, so when looking for ways to patch games, keep in mind that recreating the archives may not even be necessary.
Despite the inability to do this on modern versions of kirikiri, it is still easy to translate games using this game engine because kirikiri includes a native way to modify the files used in the game engine through a built-in patch system.
The idea is that kirikiri has one global file repository in memory that contains pointers to files on disk and to files inside the archive.xp3 files. That file repository in memory is "flat" meaning that only one file with a given name may exist at any one time. Files added to that repository will completely replace any files in that repository that have the same name.
This allows developers who discover bugs in their games or who wish to add content to their game to update the game files in a very simple manner. This same system for patching the base game can be used as a mechanic to translate the title easily.
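The "flat" repository idea can be modeled with a dictionary in Python. This is a toy model of the behavior described above, not the engine's actual implementation. The class and method names are invented, and treating names as case-insensitive and ignoring folders are simplifying assumptions.

```python
from typing import Dict, Iterable, Optional


class FlatStorage:
    """Toy model: one global table that maps a file name to its location."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def add_archive(self, archive_name: str, entries: Iterable[str]) -> None:
        for path in entries:
            # Only the bare file name is used as the key (assumption).
            key = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
            # A later registration overwrites an earlier one with the same name.
            self._files[key] = f"{archive_name}>{path}"

    def resolve(self, name: str) -> Optional[str]:
        return self._files.get(name.lower())


if __name__ == "__main__":
    storage = FlatStorage()
    storage.add_archive("data.xp3", ["scenario/00_01.ks", "system/initialize.tjs"])
    storage.add_archive("Patch.xp3", ["00_01.ks"])
    print(storage.resolve("00_01.ks"))        # prints: Patch.xp3>00_01.ks
    print(storage.resolve("initialize.tjs"))  # prints: data.xp3>system/initialize.tjs
```

In this model the patch archive wins simply because it was registered last, and files it does not contain still resolve to the original archive.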
The most common name for archives in xp3 format to patch the game is "patch.xp3". If there are additional patches, it is common for them to be named patch2.xp3, patch3.xp3 and so on. Koakuma-chan seems to support "Patch.xp3" so let's try creating it for this title.
Normally, it is best to use the engine sdk's packing tool if it is available to generate assets related to an engine to minimize the sources of potential bugs and incompatibilities. In the case of kirikiri .xp3 archives, the game engine itself is open source and the archive format is well documented. That documentation and the open source nature of kirikiri allows third party developers to implement their tools that work with the kirikiri engine with minimal bugs. That lowers the relative importance of working with primary source material for this game engine.
In other words, we can just use Garbro to create .xp3 and have a high degree of confidence that it is creating the archives properly, unlike third party tools for other engines. Feel free to use the official tool if you prefer.
For other game engines where assets need to be replaced, the sign of successfully packing the scripts is the game loading correctly. In the case of kirikiri where assets are only updated in memory and all of the original files are untouched, there is no way to know if the changes loaded at all without some sort of visible change in the loaded files. That means we should either translate the first line to see if the translation is visible or try to produce some error, perhaps by zeroing out the file or using invalid syntax.
The point is to have some sort of easily visible sign the game is loading the asset from the Patch.xp3 and overriding the original 00_01.ks file.
Let's try translating the first line and seeing if that loads. Regardless of whether it translates correctly or produces a syntax error, as long as something changes, then that shows the files are getting read by the game engine.
Code:
@ファイル先頭 bg=BG02b01
@playBgm file=BGM03
@Talk name=心の声
The sound of my own footsteps echoes in the corridor in the western sun.
@Hitret id=1
@Talk name=心の声
夕方に響く、ひとりぶんの足音が嫌いだ。
@Hitret id=2
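The manual edit above can also be sketched in Python. In this sample, lines starting with @ are commands and everything else is dialogue. Skipping lines that start with ; (comments) and * (labels) follows general KAG conventions and is an assumption for this title. Inline [tags] are not handled. The function names are made up.

```python
from typing import Dict


def is_dialogue(line: str) -> bool:
    """True for non-empty lines that are not commands, comments, or labels."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith(("@", ";", "*"))


def replace_dialogue(script: str, translations: Dict[str, str]) -> str:
    """Swap known dialogue lines for their translations, keeping line endings."""
    out = []
    for line in script.split("\n"):
        body = line.rstrip("\r")
        ending = line[len(body):]  # "\r" for CRLF files, otherwise empty
        if is_dialogue(body) and body in translations:
            body = translations[body]
        out.append(body + ending)
    return "\n".join(out)


if __name__ == "__main__":
    source = "@Talk name=心の声\r\n西日の射す廊下に自分の足音が響く。\r\n@Hitret id=1\r\n"
    table = {
        "西日の射す廊下に自分の足音が響く。":
            "The sound of my own footsteps echoes in the corridor in the western sun."
    }
    print(replace_dialogue(source, table))
```

Lines without a translation are left as-is, which matches the partially translated script above.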
In my case, I copied the file to "Koakuma-chan no Yuuwaku\MTL\Patch\00_01.ks". Be sure to copy it. Never modify the original.
Garbro has this bug where packing files that are open in the preview window does not work even though there is no technical reason why it can't just open the file in a read-only mode temporarily just long enough to create the archive. This is probably one of those "working as intended" bugs by the developer that is never going to be fixed.
The easiest way to get around it is to single click on the two dots .. at the top of the folder window to empty the preview pane and select the files using either ctrl + A to select all, shift + left click to select a contiguous range, or ctrl + left click to select individual items. Then right click on the two dots .. and create archive, or press F3.
Once the archive is created, it is possible to enter it and double check the file was created properly. Next, move Patch.xp3 to the root directory of the game and start it to see if the changes load.
When the game goes to load 00_01.ks, Koakuma-chan just restarts in a loop. Why?
It is doing something different now with the Patch.xp3, so something is happening, but what exactly and what is not working? Since there is proof now that the game tries to load the Patch.xp3 in the form of an error, let's try reverting the changes to 00_01.ks to see if it is a script error or something else.
Loading Koakuma-chan with a completely unmodified Patch.xp3\00_01.ks results in exactly the same behavior as before where it loops back to the start. That means the looping is not a result of a script error or any error introduced in the translation, but something else. What else?
Removing the Patch.xp3 makes the game load normally again, and reinserting it makes it break again. The only changes made are adding 00_01.ks and the Patch.xp3. The Patch.xp3 is unencrypted, so it is not that the game cannot read the .xp3 file, but adding it produces the bug which shows there must be some error related to the Patch.xp3 archive itself or 00_01.ks. Since the .ks file is unmodified, the error cannot be related to that. The game must be refusing to read the Patch.xp3 archive correctly.
This behavior is actually pretty common where kirikiri games refuse to read some or all of the .xp3 files unless they are obfuscated or encrypted in the way the game expects. If Garbro had an entry for Koakuma-chan, we could create the patch in the way the game expects.
Since it does not, we have to find some other way to load the patch.
For kirikiri games, one way to do this is to disable the game's requirement for only loading obfuscated or encrypted patches through a runtime patch or dedicated loader that adds support for unencrypted .xp3 patches. There are a lot of tools online for doing this. Some do or do not work with this or that particular game depending on various factors. Examples:
- crskycode's KrkrPatch software. This only works for newer kirikiri games which means kirikiriZ only.
- bynejake's KrkrPatch and GalPatch.
- arcusmaximus's KirikiriTools
This case study will be using KirikiriTools, but if that does not work, then feel free to try the others. There is a report that seems to suggest that the KirikiriTools approach does not work with DLSite's DRM for this title. The repackaged.7z version posted above does not have this issue.
Here is some documentation from KirikiriTools.
A DLL (named "version.dll") that makes games accept unencrypted .xp3 archives. By using this file, it's no longer necessary to identify and replicate the game's encryption when trying to add/replace .xp3 files; just create an unencrypted one with the Xp3Pack tool in this repository, throw the version.dll in the game's folder, and you're done.
[...]
Xp3Pack
Creates unencrypted .xp3 archives for use with the KirikiriUnencryptedArchive DLL. Unlike other packing tools, it sets all hashes in the file table to zero; this serves as a marker for the DLL to bypass the game's decryption for those files.
Typical usage is to place Xp3Pack.exe in the game folder, create a "patch" subfolder containing the files you want to include, and run "Xp3Pack patch" from the command line. This will create a patch.xp3 in the game folder. If the game already has its own patch.xp3, name your folder "patch2" and run "Xp3Pack patch2" instead. If the game already has a patch2.xp3, name your folder "patch3", and so on.
So it looks like there are two files we need from the releases page. One named version.dll that will add support for reading unencrypted archives, and Xp3Pack.exe which can create those archives and "sets all hashes in the file table to zero; this serves as a marker for the DLL to bypass the game's decryption for those files" which is intended to work with version.dll.
Is Xp3Pack.exe really needed or does version.dll add support for reading from all unencrypted archives? Does version.dll also add support for reading from folders?
Adding version.dll with the Patch.xp3 produced by Garbro does not do anything to the looping bug. It is still there.
Extracting the contents of Patch.xp3 into a folder named "Patch" makes the game engine ignore the files. Same with naming the folder "Patch.xp3".
Okay, enough experimenting. Let's try Xp3Pack.exe like the developer said to do. Open a command prompt (cmd.exe), drag Xp3Pack.exe into the window, and press enter.
Code:
C:\Users\User>"C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\KirikiriTools\Xp3Pack.exe"
Usage: Xp3Pack folder
C:\Users\User>
Some additional documentation at the command line interface (cli) would have been nice. Let's try following the Usage: syntax. It says to use Xp3Pack.exe, a space, and then a folder. Does it matter what the folder is called? Back to experimenting.
Code:
C:\Users\User>"C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\KirikiriTools\Xp3Pack.exe" "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\Patch"
The above creates Koakuma-chan no Yuuwaku\MTL\Patch.xp3
Code:
C:\Users\User>"C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\KirikiriTools\Xp3Pack.exe" "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\mtl_files"
The above creates Koakuma-chan no Yuuwaku\MTL\mtl_files.xp3
- The tool seems to name the .xp3 after the folder and place the resulting .xp3 file next to the target folder regardless of the current location of the tool or the command prompt.
- Garbro can read the contents in the created archives using no encryption selected in the xp3 window.
- Creating a subfolder in Koakuma-chan no Yuuwaku\MTL\mtl_files results in that folder and the files in that subfolder also added in the resulting .xp3 as a subfolder meaning that Xp3Pack.exe actually respects folder structure, which is a very nice and useful feature.
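The observations above can be wrapped in a small Python helper so that repacking is a single call. This is a sketch that relies only on the "Usage: Xp3Pack folder" syntax shown earlier and on the observed naming behavior. The function names are made up, and whether Xp3Pack.exe reports failures through its exit code is an assumption.

```python
import subprocess
from pathlib import PureWindowsPath


def expected_archive(folder: str) -> PureWindowsPath:
    """Where the .xp3 is observed to appear: next to the folder, same name."""
    path = PureWindowsPath(folder)
    return path.with_name(path.name + ".xp3")


def pack(xp3pack_exe: str, folder: str) -> PureWindowsPath:
    """Run 'Xp3Pack folder' and return the path of the archive it should create."""
    # check=True raises if the tool exits with a non-zero code (assumed behavior).
    subprocess.run([xp3pack_exe, folder], check=True)
    return expected_archive(folder)


if __name__ == "__main__":
    print(expected_archive(r"C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\Patch"))
    # prints: C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\Patch.xp3
```

Passing the arguments as a list means the spaces in "Koakuma-chan no Yuuwaku" do not need manual quoting.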
Okay, enough experimenting ...for now. Copying the Patch.xp3 to the main game folder with version.dll also present results in this screenshot.
Oh right. 00_01.ks is not translated since the translation was reverted as part of the debugging process. Translating it again, creating Patch.xp3 with Xp3Pack.exe again, and copying Patch.xp3 to the game directory results in this.
Okay better. Word wrapping issues aside, that is enough to translate the scripts. I went ahead and used MTL to translate all of the dialogue lines based upon the above logic. Word wrapping was done by splitting the translation into multiple lines in the .ks file at ~51 characters up to a maximum of 3 lines.
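The word wrapping rule can be sketched with Python's textwrap module. The width of 51 and the limit of 3 lines come from the text above. Breaking only at spaces and the function name are assumptions for this example.

```python
import textwrap
from typing import List, Tuple


def wrap_dialogue(text: str, width: int = 51, max_lines: int = 3) -> Tuple[List[str], bool]:
    """Wrap at word boundaries and report whether the result fits the text box."""
    lines = textwrap.wrap(text, width=width)
    # Nothing is cut off here; the flag marks lines that need manual shortening.
    return lines, len(lines) <= max_lines


if __name__ == "__main__":
    sample = "The sound of my own footsteps echoes in the corridor in the western sun."
    wrapped, fits = wrap_dialogue(sample)
    print("\n".join(wrapped))
    print("fits:", fits)
```

Translations that come back with the flag set to False are the ones that would overflow the text box and need to be shortened by hand.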
All of the .tlg files in Koakuma-chan no Yuuwaku\extracts\Yuu_extracts are images. Some of those images are for the UI which should be translated. Decoding them from the .xp3 means making sure the game tries to read each image in the UI at least once so they can be dumped into that folder.
Assuming that has already been done and they have all been dumped, next we need to open and edit them.
I will be using Krita which is an open source cross platform image editing and drawing program. Some alternatives are mspaint, paint.net, photoshop, and gnu image manipulation program (gimp).
The problem is that Krita does not understand .tlg image files. .tlg files are unique to the kirikiri game engine, so it is not reasonable to expect image editing programs to understand the obscure format.
That means we need a .tlg converter to convert the .tlg images to a more common format, preferably one that retains the transparency of the images. The most common format to use for this is portable network graphic (.png) because it is lossless, so no information is lost during the conversion provided no color space conversion occurs, and it also supports transparency. tiff is an alternative format that also fulfills those requirements.
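Since transparency is the main requirement here, it can be worth confirming that a converted .png actually kept an alpha channel. The following is a rough sketch that only reads the PNG header using the standard library. The function name png_has_alpha_channel is made up for this example. It only detects the two color types that carry a full alpha channel (4 and 6), so a palette image that stores transparency in a separate tRNS chunk would be reported as having none.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha_channel(data):
    """Return True if the PNG header declares an alpha channel (color type 4 or 6)."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # IHDR is always the first chunk: 4-byte length, 4-byte type, 13 bytes of data.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG header")
    # Width, height, bit depth, and color type are the first 10 bytes of IHDR.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return color_type in (4, 6)
```

To use it on a file, pass in the bytes, for example pathlib.Path("frm_0506.png").read_bytes().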
What program can do that? What program understands .tlg files? Well, if Garbro already understands .tlg, then there is not any particular reason to try to find a dedicated .tlg->.png/.tiff tool. Let's just keep using Garbro and see if Garbro can convert them even though they are already extracted.
But first, let's separate out the .tlg files to convert. For Koakuma-chan, that is all the files named "frm_*". This was determined by looking at "Koakuma-chan no Yuuwaku\extracts\Yuu_extracts" in Garbro to see which .tlg files contained translatable elements. Eventually, it became clear they all started with "frm_".
* is a wildcard that matches any sequence of characters, including none. "frm_*.tlg" will match all files that begin with "frm_" and end with ".tlg" meaning that it selects all of the ui files but excludes the computer graphics art (cgs). To select the cgs and all other .tlg files, use "*.tlg" instead. For the scenario files, it would be "*.ks".
Open a command prompt.
Code:
cd Desktop\Koakuma-chan no Yuuwaku\extracts\Yuu_extracts
mkdir ui
move frm_*.tlg ui
mkdir scenario
move *.ks scenario
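The same wildcard sorting can be previewed in Python before moving anything. This is only a sketch of the matching logic. The function name sort_extracted and the "other" group are my own additions, and it only groups names in memory without touching any files.

```python
from fnmatch import fnmatch


def sort_extracted(names):
    """Group file names the same way the two move commands above would."""
    groups = {"ui": [], "scenario": [], "other": []}
    for name in names:
        lowered = name.lower()  # Windows wildcards ignore case, so compare in lowercase
        if fnmatch(lowered, "frm_*.tlg"):
            groups["ui"].append(name)
        elif fnmatch(lowered, "*.ks"):
            groups["scenario"].append(name)
        else:
            groups["other"].append(name)
    return groups
```

Feeding it the output of os.listdir() for the Yuu_extracts folder shows which files each move command would pick up.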
Then open Garbro.
Navigate to the .tlg files that need conversion.
Koakuma-chan no Yuuwaku\extracts\Yuu_extracts\ui\
ctrl + A to select all.
File->Convert Multimedia (F6)
Images: PNG
C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\extracts\decoded\ui_png
Next, separate out the images that do not need adjustments.
Open "Koakuma-chan no Yuuwaku\extracts\decoded\ui_png" and move any files that have no UI elements to translate to a different folder like "Koakuma-chan no Yuuwaku\extracts\decoded\ui_png\no_conv". .png files can be previewed in Windows Explorer, so it should be easy to figure out which ones do not have anything to translate.
Before going any further, always test to see if repacking works prior to making any changes if at all possible. That will separate out format conversion errors from errors produced by the image editing program. This process has already been done with the scripts, but making sure repackaging works with images prior to any additional format conversions beyond .tlg -> .png or further modifications will help validate the workflow before getting too deep into a possibly flawed workflow.
One interesting note is that the kirikiri game engine supports .png files natively.
However, the game itself expects them to be in .tlg format and named appropriately as .tlg files. If every reference to a modified file in the game's scripts were changed from .tlg to .png, then the translated .png files should load correctly.
However, that would mean modifying the game's scripts which is just asking for errors. Modifying the scripts to translate UI elements is already very error prone and adjusting them further to also change the format of the UI images introduces another category of potential bugs. To avoid that, converting them back to the way the game expects should allow the game engine to read them with minimal bugs produced.
So, we need to repack the .png files back to .tlg files. The best way to convert and repackage items is usually to use the tools provided by the game engine. In the case of kirikiri, third party tools like Garbro should also work fairly reliably, but I did not see any .tlg conversion option in the Garbro Media conversion ui.
Here is the main documentation page and also the format document for the sorts of files that kirikiriZ can input. The main documentation page links to the sdks for kirikiriZ and kirikiri2. The format documentation also has links to the documentation for various other tools used in the game engine, the tools to create .xp3 and .tlg files.
[How to make a novel game]
Download "KAG for Kiri-Kiri-Z" from http://krkrz.github.io/ and place the data folder in the extracted folder in the same folder as tvpwin32/tvpwin64.exe. Instructions on how to use KAG3 can be found on the following page. https://krkrz.github.io/krkr2doc/kag3doc/contents/index.html There is also a "KAG3 for Kiri-Kiri Z" that adds a save load screen and a config screen to KAG3. This may be easier at first. The basic usage is the same.
[File]
\imageviewer : A simple image viewer. You can start it by D&D this folder into tvpwin32/tvpwin64.exe.
\movieplayer: A simple video player. You can start it by D&D this folder into tvpwin32/tvpwin64.exe.
\plugin : Contains various plugins.
\plugin64 : Contains 64-bit versions of various plugins.
debugger.sdp: The debugger configuration file.
krkrdebg.exe : The debugger itself.
krkrdegb_readme.txt: A description of the debugger.
license.txt : A text file containing the license statement.
readme.txt : This file. A brief description is provided.
tvpwin32.exe : This is the main body of Kiri-Kiri Z.
tvpwin32_dbg.exe : This is the main body of Kiri-Kiri Z built with the debugger function enabled.
tvpwin64.exe : This is a 64-bit version of Kiri-Kiri Z.
[About the operation of the TJS2 script of Kiri-Kiri 2]
Kiri-Kiri-Z is not fully compatible with Kiri-Kiri-2, so some changes are required to make TJS2 scripts written for Kiri-Kiri-2 work.
In Kiri-Kiri-Z, the standard character encoding has been changed to UTF-8. If you want to run the old script as is, you need to add -readencoding=Shift_JIS on the command line.
The built-in KAGParser and menus are now plugins. If you need KAGParser and the menu class, you need to link KAGParser.dll and menu.dll.
Touch is enabled on devices that support multi-touch. (That is, onTouchDown arrives instead of onMouseDown.) If touch processing is not performed, it must be disabled so that it can be handled by conventional mouse processing. To disable it, set Window.enableTouch to false.
PassThroughDrawDevice has been removed, so the part that is used needs to be rewritten.
If you are using other deleted methods, you need to rewrite those processes.
For more detailed changes, please refer to the "List of changes from Kiri-Kiri 2 in Kiri-Kiri Z" page above.
The kirikiriZ sdk says to download the kirikiri2 sdk for the format conversion tools.
Sigh. Was it really that hard for the developer to include the conversion tools in the kirikiriZ sdk? Really? They could not have just added them to the sdk considering they are needed for kirikiriZ games too? Well whatever. Anyway, since the developer opted not to include files necessary to create kirikiriZ game assets in the kirikiriZ sdk, download the kirikiri2 sdk instead.
TLG5:
吉里吉里独自形式。
展開速度が高速(PNGの4倍速程度)なのが特徴。
画像フォーマットコンバータで変換可能。
Kirikiri's own format.
It is characterized by a high decompression speed (about 4 times faster than PNG).
Can be converted with an image format converter.
TLG6:
吉里吉里独自形式。
PNGやTLG5よりも圧縮率は高いが、TLG5の倍ほど展開に時間がかかります。
ただ、それでもPNGの2倍速程度なので高速です。
画像フォーマットコンバータで変換可能。
Kirikiri's own format.
It has a higher compression ratio than PNG and TLG5, but takes twice as long to decompress as TLG5.
However, it is still fast, as it is about twice as fast as PNG.
Can be converted with an image format converter.
It looks like TLG6 is better than TLG5 because it offers better compression. Here is the linked documentation on the image format converter.
Here is a Yandex translated version of the UI shown in the documentation. While there are English translations of the tools themselves floating around, there is also no particular reason to use them since the UI's functionality is fairly clear with the Yandex translation. Personally, I would rather use mainline tools than forks or obscure versions if at all possible. Reduced attack surface.
The tool itself is in the kirikiri2 sdk at kr2_232r2.zip\kr2_232r2\kirikiri2\tools\krkrtpc.exe
However, all .zip files, especially those from Japanese developers, may have malformed metadata due to possibly being created with Windows Explorer because Windows Explorer always uses incorrect metadata encoding for .zip files at least up to and including Windows 10 22H2 if the file or folder names include non-ascii characters. Is that the case with this one?
Yes, yes it is. To extract it without corruption, use an extraction program that allows specifying the text encoding for .zip metadata.
The 7-Zip graphical user interface (7zFM.exe) does not support this but the 7-Zip command line interface (7z.exe) does using the undocumented -mcp= switch. Alternatively, use WinRar (Windows), unzip (Linux), or Python 3.11+'s zipfile module (python -m zipfile --help).
Code:
1. Open a command prompt (cmd.exe).
2. Drag C:\Program Files\7-Zip\7z.exe to the command prompt window.
3. Hit space bar to enter a single space into the command prompt window.
4. Type x to tell 7z.exe to eXtract the archive with its full folder paths.
5. Hit space bar to enter a single space into the command prompt window.
6. Enter "-mcp=932", without the quotes, to signify windows/ansi code page 932 which is the one for shift-jis.
7. Hit space bar to enter a single space into the command prompt window.
8. Drag Desktop\Koakuma-chan no Yuuwaku\tools\kr2_232r2.zip to the command prompt window.
9. Press enter.
10. Move kr2_232r2 from the main user profile folder (C:\Users\User) to Desktop\Koakuma-chan no Yuuwaku\tools\
Open the tool under Koakuma-chan no Yuuwaku\tools\kr2_232r2\kirikiri2\tools\krkrtpc.exe with a Japanese locale emulator. Getting the tool to show the UI in the documentation above requires checking the box, so check it.
With the tool's UI finally displayed, we can use it to batch convert all the .png files to .tlg.
Select TLG5 or TLG6 for both Opaque image and Image with transparency. TLG6 is smaller than TLG5, so I would opt for TLG6. Under Output Folder, select the Specified Folder radio button, check the Overwrite box, and then click Browse...
Browse to wherever the converted files should be placed into, such as into the folder used to create the proof of concept patch earlier. For me that is Koakuma-chan no Yuuwaku\MTL\Patch.
In Japanese locale, \ is displayed as the yen symbol ¥ in non-unicode programs, which commonly includes the command prompt and krkrtpc.exe. Typing \ on a keyboard into a command prompt on Windows using Japanese, Japan locale will also display as ¥. This means the path "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\Patch" will display as "C:¥Users¥User¥Desktop¥Koakuma-chan no Yuuwaku¥MTL¥Patch" but that is cosmetic only. Copy-pasting a path that includes ¥ out of the command prompt or another non-unicode program will properly interpret ¥ as the \ symbol, so the target, such as a utf-8 encoded text file, will receive \ properly. This means the underlying path is not corrupt even though it displays in an odd way. Just be aware of this, and do not worry too much about how paths get displayed.
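To see why the yen symbol is only cosmetic, the small sketch below encodes the same path with code page 932 and with utf-8 and compares the bytes. It only shows that the path separator is stored as the same byte value (0x5C) in both encodings. How that byte is drawn on screen is up to the font and the program, which this sketch does not try to reproduce.

```python
path = r"C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\Patch"

# Encode the path the way a Japanese non-unicode program and a utf-8 text file would store it.
cp932_bytes = path.encode("cp932")
utf8_bytes = path.encode("utf-8")

# The separator is byte 0x5C in both cases, so the stored path is identical.
same_bytes = cp932_bytes == utf8_bytes
separator_count = cp932_bytes.count(b"\x5c")

# Decoding the cp932 bytes gives back the original path with backslashes.
round_trip = cp932_bytes.decode("cp932")
```

Here same_bytes is True and round_trip equals the original path, since this particular path is plain ASCII.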
To actually convert images, drag and drop .png images into that window. It will bring up a box that says it worked. When dropping multiple selected files, it will create a cute progress bar window and then say it worked. Once you have done it a couple times and are more confident it is working as intended, check the "Do not show logs if no error occurred" checkbox at the bottom so it is easier to use quickly and leave it running in the background. It can be minimized using task manager.
Since unedited images seem to work, next is testing to see if edited images load correctly.
That does raise the question of what translated text to put in place of the Japanese glyphs. In other words, what should the kanji, hiragana, and katakana be translated to? That requires a passable translation but translation software only works on text. To convert images to text for use in translation software requires optical character recognition (ocr) software to perform the conversion.
There are a lot of different ways to perform ocr on images to convert Japanese glyphs into translatable text. Here is the yandex web one. Remember to select Japanese as the from language.
yandex's ocr is passable even if the translation quality is not. In addition to having image->text OCR, yandex can also overlay the translations over the original image with a high success rate which makes it easier for westerners to associate a particular translation with a particular part of the source image whenever there are multiple ones displayed, like in configuration screens.
In addition, since the ocr is separate from the translated text, a different translation engine can be used on the ocr'd text if accuracy is relevant.
Feeding frm_0506.png into yandex yields this.
Code:
この作品はフィクションて多。
登錫多る人物名、 団体名、 地名、 設定はどは全ご架空のものごあり、
実在多るちのとは一切関係ありませい。
一登錫多る人物 (性約び表現のある本つラりラー) は全て18歳以上てる。
18歳末満の方の購入及びブレイは禁止されていま事。
このゲームは日本回内向じてる。
For Japan Only
This work is a lot of fiction.
The names of many people, groups, place names, and settings are all fictitious.、
It has nothing to do with the reality many chino.
All of them are over the age of 18.
The purchase and bray of those who are 18 years old are prohibited.
This game is introverted in Japan.
For Japan Only
That translation is hilariously wrong in all the right ways, so I am keeping it.
- Load the image
- lock the image in layers pane to treat it as a background layer without ever altering it
- create a new layer copied from the selected text that needs translation
- Edit + fill that new layer with the background color of the original
- Add a text box which will automatically add a third vector only layer for the text
- Type in the translation and format it appropriately so it looks nice
- export as png
- make sure alpha is selected and saving as krita file (.kra) is also selected which is useful to update the images again later
Use Xp3Pack.exe to repackage the images into a Patch.xp3 file, and then load them into the game. If it loads correctly, then the repackaging process was successful.
After editing all of the images here is what the UI looks like.
The next thing to translate is the game engine strings like the title, help tips, questions when moving around the UI, and, most importantly, the dialogue choices.
The screenshot has options for あげる, give, and for あげない, give. Humm. It seems like something was lost in translation there. Let's try that again. yandex says あげる is "i'll give it to you" and あげない is "i won't give it to you". Okay, that makes more sense. When in doubt, get a second opinion.
So where are あげる and あげない? They are not written as normal dialogue or they would have been translated as part of the MTL for the .ks scripts. The most common places developers can put choices are in either the game engine scripts (.tjs) or more commonly in the dialogue scripts (.ks/.scn.txt), so let's search the dialogue scripts first.
-In Windows explorer, select every .ks file.
--Single click the first one + hold shift + single click the last one will select every entry in between in order.
-Right-click on a .ks file
-Select "Edit in Notepad++"
-ctrl+f
-Enter あげない in the Find box
-Select Find all in All Opened Documents
There is 1 hit in 01_01.ks.
Code:
@Talk name=心の声
Here...
@Hitret id=245
@AddSelect text=あげる hint=自爆系
@AddSelect text=あげない hint=甘々系
@StartSelect
@if exp="ChkSelect(1)"
@onFlag id=201
@addlove id=自爆系 num=1
@Talk name=心の声
I can't show any more agitation than this.
@Hitret id=246
It looks like the choices are part of the dialogue scripts and placed after "@AddSelect text=". They were not captured because they start with @, and I am excluding most lines that start with @ since they all need special handling compared to normal dialogue.
The line that starts with "I can't show" is missing a tab character in front of it to keep it aligned like the original untranslated line. Since the misalignment does not result in an error, I am not fixing it. There are also some ruby= lines here and there that were left alone that presumably show ruby text in the original text. However, translating the lines that this ruby= syntax applies to does not result in an error. In general, if it does not result in an error or have other visible effects, then it is better to not change things. When in doubt, only translate visible text and stop adjusting things once it works.
Hummm. Changing text=あげる to text=give would work, but that does not explain how to handle spaces and quotes. Time to experiment!
Code:
@AddSelect text="give it" hint=自爆系
@AddSelect text=not give hint=甘々系
It is usually a very bad idea to make more than one change at a time because then we do not know what exactly failed and why, but I am feeling adventurous. Let's do it and regret it later.
It looks like quotes do not display but are required for values that contain spaces. Values without quotes get truncated at the first space. Two pieces of useful information and without any regrets in sight! That was very unexpected. I was hoping for a crash to be honest. Anyway.
That is enough to translate the choices. To find all of the choices, search for @AddSelect text= in Notepad++ with all of the .ks files opened. To find the context they appear in, use the game's built in skip to choice feature. In the config screen under Message Settings, be sure to select "skip all messages" to jump instantly to the next choice even without complete save data.
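Searching and rewriting the choices can also be scripted. Below is a hedged sketch that lists every @AddSelect text= value in a folder of .ks files and rewrites one line with the translation always wrapped in quotes, since quotes were needed for spaces in the experiment above. The function names find_choices and set_choice_text are my own, and the utf-8 default is an assumption. Pass whatever encoding the extracted scripts actually use.

```python
import re
from pathlib import Path

# Group 1 is everything up to and including text=, group 2 is the value (quoted or bare).
ADDSELECT = re.compile(r'^(\s*@AddSelect\s+text=)("[^"]*"|\S+)')


def find_choices(script_dir, encoding="utf-8"):
    """Return (file name, line number, choice text) for every @AddSelect line."""
    hits = []
    for path in sorted(Path(script_dir).glob("*.ks")):
        text = path.read_text(encoding=encoding, errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            match = ADDSELECT.match(line)
            if match:
                hits.append((path.name, number, match.group(2).strip('"')))
    return hits


def set_choice_text(line, translation):
    """Replace the text= value, always quoting it so spaces are not truncated."""
    return ADDSELECT.sub(lambda m: m.group(1) + '"' + translation + '"', line, count=1)
```

Lines that are not @AddSelect lines pass through set_choice_text unchanged, and the hint= part of the line is left exactly as it was.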
Let's translate the title next.
Let's go back to the Initialize.tjs file. Remember that Initialize.tjs is the file that is responsible for loading Patch.xp3 and all of the translated contents. Searching for "title" in that file brings up the following lines.
Code:
//キャプション登録
System.title = GAME_CAPTION;
- Lines that start with // are comments, meaning that the developer wrote themselves a comment. キャプション登録 could be translated to "Caption registration" meaning that they were telling themselves what the next line of code does.
- In some programming languages, like tjs2, every statement must end with a ;.
- In programming, usually whatever is on the left gets assigned the value on the right of =. That means System.title is probably what we are trying to change.
- The value on the right, GAME_CAPTION, is a variable. Think of variables as little buckets that store whatever you put into them. Using them in the code always hands back the contents. As an example, a variable can store the title for a game, like こあくまちゃんの誘惑っ!. Every time the variable GAME_CAPTION is used in the script, the game's interpreter will dynamically substitute the variable GAME_CAPTION with こあくまちゃんの誘惑っ! so the developer does not have to literally write こあくまちゃんの誘惑っ! every time they want to refer to it. That makes updating the game's title much easier, and it means the same code can work without any modifications for different game names, like "こあくまちゃんの誘惑っ! Demo version", as long as the developer sets the initial value appropriately before the code runs.
- The all caps thing is just the developer choosing to follow a convention many programmers follow where variables that are not meant to be changed after they are declared are written in all caps. game_caption would also work as a variable name in general. However, since most programming languages are case sensitive, for Koakuma-chan, we have to keep using GAME_CAPTION since that is the case the developer already decided on for that variable.
The developer is taking the value of the variable GAME_CAPTION and assigning System.title the real value of GAME_CAPTION which is very likely こあくまちゃんの誘惑っ!. We probably need to write something like the following to change the title.
Code:
System.title=Koakuma-chan no Yuuwaku
Oh right. The semicolon.
Code:
System.title=Koakuma-chan no Yuuwaku;
But, what is the exact syntax for doing that, and what is the program's control flow?
Let's look for GAME_CAPTION and see how it gets used. There should be some line somewhere that says GAME_CAPTION= to assign the variable GAME_CAPTION its real value which should be こあくまちゃんの誘惑っ!. Where and when does that happen?
- Select every .tjs file in Koakuma-chan no Yuuwaku\extracts\Yuu_extracts\
- Right-click and "Edit in Notepad++"
- Find GAME_CAPTION (ctrl + f)
- Find all in all opened files
There were 11 hits. Here is the most promising in Status.tjs. Promising means that GAME_CAPTION appears on the left side of an = sign.
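Picking out the "promising" hits, meaning the ones where the name sits on the left side of an = sign, can be automated with a regular expression. This is a rough sketch and not a real tjs2 parser. The function names are made up, it treats everything after // as a comment even inside a string, and it deliberately skips == comparisons.

```python
import re


def assignment_pattern(name):
    """Match name followed by a single = that is not part of ==."""
    return re.compile(r"\b" + re.escape(name) + r"\s*=(?!=)")


def find_assignments(name, sources):
    """sources maps file name to file text. Returns (file, line number, line) tuples."""
    pattern = assignment_pattern(name)
    hits = []
    for filename, text in sources.items():
        for number, line in enumerate(text.splitlines(), start=1):
            code = line.split("//", 1)[0]  # crude: drop anything after a // comment
            if pattern.search(code):
                hits.append((filename, number, line.strip()))
    return hits
```

Building the sources dictionary is just a matter of reading each .tjs file into a string with the right encoding. Lines that only read the variable, like System.title = GAME_CAPTION;, are not reported.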
Code:
// HTMLマニュアル
var HTMLMANUAL_ADDRESS = "manual.html";
//ゲームタイトル
var GAME_TITLE = "こあくまちゃんの誘惑っ!";
var DATA_PATH = System.dataPath;
@if(__TRIAL__ == 0)
//製品版
//キャプション
var GAME_CAPTION = GAME_TITLE;
//ゲームバージョン
var GAME_VERSION = "1.00";
The value for the variable GAME_CAPTION actually just comes from another variable named GAME_TITLE. Ugh. Okay, well, where does GAME_TITLE get its value from? A couple lines above it, it gets its value set.
Code:
var GAME_TITLE = "こあくまちゃんの誘惑っ!";
It looks like the syntax is that new variables must be declared using the "var" keyword and a series of literal characters that are not variables must be enclosed in quotes. Spaces in between the = and the variable and string are inconsequential. In other words, the following line should work to set the title.
Code:
System.title="Koakuma-chan no Yuuwaku!";
The next problem is... where does that line go? What script file (.tjs) needs to be altered with that change for the title to get updated?
The game needs to load certain assets like initialize.tjs in order for the Patch.xp3 to load the updated changes. However, since initialize.tjs is already loaded into memory when Patch.xp3 gets read, updating initialize.tjs through Patch.xp3 would be pointless because the updated initialize.tjs would never be read.
We need to find some .tjs script file that gets read after initialize.tjs executes so the files changed in Patch.xp3 have a chance to be read and then executed by the game engine. Ideally, that script should also be executed before the main window displays because by then the System.title is already displayed and changing it might be too late.
What script is ready to be executed after initialize.tjs? What order do the script files get read and executed?
Sometimes, at the very bottom of a script, the next script will be called in a sequence. Does that happen here? Unfortunately no. There are no ".tjs" scripts at the bottom of Initialize.tjs. Okay, back to square 0.
In that case, let's step back a moment. How does Initialize.tjs get invoked anyway?
Altering scripts that load prior to Initialize.tjs will not result in any changes because those have already been executed. Unless the game engine reads them back in and executes them later, no changes would take place. However, knowing which script invoked Initialize.tjs would give us a clue as to what file it might load afterwards if that parent script is handling control flow instead of Initialize.tjs.
If we search all of the script files for "Initialize.tjs", there is 1 hit in startup.tjs.
If we search for startup.tjs, there are no hits meaning the string "startup.tjs" does not appear in any of the other scripts. Did we not dump all the files, or perhaps it is never loaded by any other script? Perhaps this is the file loaded by the executable "Koakuma-chan no Yuuwaku.exe" and loading it is hardcoded into the executable itself? If it is the first script to load, then it should provide crucial information about the program's flow of control.
That is the entire file. What can we learn from it?
It looks like Status.tjs is defined and read if it exists in Storages. If it does not exist in Storages, then the game will look for the file at system/ + file which resolves to system/Status.tjs.
That is interesting and makes me want to play some games with the engine logic that involves deleting some files and creating some folders, but that is not the logic we are looking for right now. Let's not get distracted and keep reading the script's logic.
- Next, it does the same thing to Initialize.tjs where it reads it if it is loaded.
- Then it deletes the file so it does not take up memory.
- Some code about starting a new MainWindow() gets loaded.
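-- The () after MainWindow means that MainWindow is something that gets called, not a normal variable. Here it is called with new, so it is most likely a class that builds the main window object, but the idea is the same as with functions. Functions are little code blocks that execute multiple lines of code whenever they are called. Instead of holding a value, like variables, they hold multiple lines of code.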
- Then it reads save data.
- If the save data, sysReg and saveMan, has CONFIG.bootSettingWindow set to True, then it will start BootSetup.tjs.
- Otherwise, it will show a window and begin executing begin.tjs.
BootSetup.tjs is probably the file responsible for sometimes showing that small window prior to the start of the game that allows selecting what resolution to run the game at prior to launch. Users can skip that screen if they want by setting CONFIG.bootSettingWindow to False. If we modify that file and a user selects to skip that window in their game options, then none of our changes would get executed.
But what about begin.tjs? If a user opts to skip the bootSettingsWindow, then begin.tjs will get executed instead of BootSetup.tjs. Does begin.tjs always get executed or only if BootSetup.tjs is skipped? What does BootSetup.tjs say?
At the end of BootSetup.tjs is the following code.
Code:
function BootWindowToGame(){
invalidate BootWindow;
win.showWindow();
Scripts.execStorage("begin.tjs");
}
Okay that is enough to guess that begin.tjs is probably executed after BootSetup.tjs. That likely means that regardless of the CONFIG.bootSettingWindow setting, begin.tjs will always get executed.
But first! I am curious. BootWindowToGame() is a function. Defining a function is not enough to invoke it. Just because a function appears at the end of the file does not mean that is when it is actually executed. When does it actually get invoked and start loading begin.tjs?
Double clicking on BootWindowToGame in Notepad++ will highlight every place that series of characters, called a string, appears in the file. Scrolling up, BootWindowToGame also appears in the following line.
Code:
_asyncTrigger = new AsyncTrigger(BootWindowToGame, '');
I am not sure what AsyncTrigger does, but maybe it is some sort of trigger that is asynchronous? It also appears right below the definitions of screen resolutions. Synchronous means in a well timed or sequential order. Asynchronous means the program has no idea when something might happen. What does that sound like on a screen that appears before the game starts? Maybe a user's mouse click?
If we assume it refers to a mouse click, then perhaps begin.tjs gets invoked some time after the user clicks something? That makes sense since the game waits at the resolution selection window indefinitely until the user clicks on something. So the user clicks on a resolution, and then the game launches begin.tjs. Sometime after that, the game itself launches. All of that makes sense.
Okay, curiosity satisfied now. Moving on.
Since we are modifying begin.tjs, let's copy it from the extracts\Yuu_extracts folder to the MTL\Patch folder. And add the following to the top of begin.tjs.
We probably do not need to update all of them for just the title, but I can't be bothered to figure out which ones need updating. They are all displayable strings that share the same value at runtime anyway, so just update them all.
Code:
// start of changes
// update title
System.title="Koakuma-chan no Yuuwaku!";
GAME_TITLE=System.title
GAME_CAPTION=System.title
// end of changes
//================================================
//レンダリング済みフォント登録
PrerenderedFontInit();
And after repacking Patch.xp3 and restarting the game, the title does not change. Why not? Humm.
The code above that we are using to update the title should work, right? That is how the game is setting the title, so maybe that is working as intended. If it is, then the problem is something else. What could it be if it is something else?
Well, maybe we are trying to modify the code too late in the boot process once the title is already displayed?
The game needs to load the code after Patch.xp3 gets added but before the window actually gets displayed. Once the title is actually displayed in the titlebar, updating the variable the title was stored in will not do anything. That variable's contents have already been used.
If that is the case, then how do we update that variable earlier in the boot process before the game tries to read it and before it sets the title bar? Let's go back to Initialize.tjs. What happens immediately after Patch.xp3 gets loaded?
Immediately after it loads Patch.xp3, it does more AddAutoPath() calls to add subfolders for parts. Then it adds some .dll files.
I randomly tried loading files from parts and _debug here, but it did not work.
Then after the .dll files, there is this interesting bit of code.
So the first script to load is k2compat.tjs but it must be loaded from a subfolder called "k2compat/". Messing with subfolders certainly is possible if there is no other choice, but it looks like Utility.tjs loads next and it loads from the root which makes it easier to work with. That is our next target.
First though, let's think this through. Should updating Utility.tjs work to update the title?
The problem with messing with data too early in the boot process, as opposed to too late, is that, since we cannot change the script load order in Initialize.tjs directly, our changes may get overwritten later by the original data. There are also problems with loading early boot files by way of Patch.xp3 because changes to files too early in the boot process will not matter since they occur before Patch.xp3 has a chance to update them. If Utility.tjs is too early in the boot process, then the game will just override any data it sets and our changes will not occur. Is that the case?
Should it work? Why or why not? What can we base our reasoning on? Wait, what was the program's control flow again?
Going back to startup.tjs, it loads Status.tjs, then Initialize.tjs, then BootSetup.tjs/begin.tjs.
begin.tjs is too late in the boot process, as we discovered. Status.tjs and Initialize.tjs are too early because they are already read by the time Patch.xp3 loads, so the game would never read any changes. BootSetup.tjs might work, but because it has that conditional loading, it is unreliable to use for code that we always want loaded even if it works.
Therefore, we want to use a script that will always load immediately after Patch.xp3 loads. Utility.tjs seems to fit, but at the same time, we do not want to load our code too early and have it overwritten by the original files. What code are we trying to load again?
These are the variables we are trying to update.
System.title
GAME_TITLE
GAME_CAPTION
Do they get updated before Utility.tjs loads or after it? If they get updated before, then we can alter them in Utility.tjs and our changes will stay. If they get updated after Utility.tjs, then our changes will get overwritten by the original data. Where do the values of those variables get set?
System.title gets set in Initialize.tjs, as we discovered earlier, but what about the others? If we search for GAME_TITLE in Notepad++ with all of the .tjs files opened, there is a hit in Status.tjs. Oh, right, right. There is already code above showing the contents of Status.tjs too.
Thinking about the flow of control, Status.tjs is the only file loaded by startup.tjs even before Initialize.tjs. Therefore, if we update those variables in Utility.tjs, which is one of the scripts that Initialize.tjs loads midway through, then the changes are guaranteed to stay because they are never overwritten later. The variables are defined before Utility.tjs loads, and they are not used until after Utility.tjs executes.
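The reasoning above can be written down as a tiny model to double check it. The sketch below is not engine code. The TIMELINE list is my reading of the boot order described so far, and where exactly System.title gets set relative to the other scripts is an assumption. A script is a good place for the override only if it loads after Patch.xp3 is added, after Status.tjs defines the variables, and before the title is set.

```python
# Boot order as described in the text. The position of "set System.title" is an assumption.
TIMELINE = [
    "load Status.tjs",     # defines GAME_TITLE and GAME_CAPTION
    "start Initialize.tjs",
    "add Patch.xp3",       # files in the patch can replace originals from here on
    "load k2compat.tjs",
    "load Utility.tjs",
    "set System.title",    # Initialize.tjs reads GAME_CAPTION here
    "load begin.tjs",
]


def override_sticks(script, timeline=TIMELINE):
    """Return True if overriding the title variables in script should take effect."""
    at = timeline.index("load " + script)
    patchable = at > timeline.index("add Patch.xp3")
    defined = at > timeline.index("load Status.tjs")
    before_use = at < timeline.index("set System.title")
    return patchable and defined and before_use
```

With this model, override_sticks("Utility.tjs") is True, while begin.tjs fails because it loads after the title is set and Status.tjs fails because it loads before Patch.xp3 is added.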
That means what we are trying to do should work. So, let's copy Utility.tjs to the MTL\Patch folder and update MTL\Patch\Utility.tjs with the same code as above.
Code:
// start of changes
// update title
System.title = "Koakuma-chan no Yuuwaku!";
GAME_TITLE = System.title
GAME_CAPTION = System.title
// end of changes
//追加関数
//一度に複数の要素をaddする
Array.adds = function (args*){
for(var i=0;i<args.count;i++){
add(args[i]);
}
};
Syntax error. Uh... Oh! Right! Semicolons. Always with the semicolons.
Code:
// start of changes
// update title
System.title = "Koakuma-chan no Yuuwaku!";
GAME_TITLE = System.title;
GAME_CAPTION = System.title;
// end of changes
//追加関数
//一度に複数の要素をaddする
Array.adds = function (args*){
for(var i=0;i<args.count;i++){
add(args[i]);
}
};
Progress! The boot screen shown above displays the updated title with some characters appended to it. The main game shows the base title without any additional characters.
Okay, so that shows that we had the right approach with the code that updates the title, but we were using the wrong file before. Using the same code in a file that is read sooner than begin.tjs allowed the title variable to get updated before the game tried to display the title using that variable as intended.
Remember BootSetup.tjs? It is the file that defines how to handle the different resolutions the game can launch at, and it starts before the game does. Looking through that file, it has this line.
Code:
caption = global.GAME_CAPTION + " 起動オプション";
To finish translating the above title bar, we can update the code to
Next up is to translate all of those hints and game ui questions.
Taking a screenshot with printscreen, opening mspaint.exe, pasting with ctrl+v, cutting just the hint with Snipping Tool, and then dumping the resulting image into manga_ocr gives this as output.
選択肢後もオート談定を継続します
Since this is a game engine string, it is probably not part of the dialogue kag3/.ks scripts because it is not dialogue. Let's search all the .tjs files again since that is where the game engine strings tend to be.
No results. It might be an ocr error or in the .ks files. Searching the .ks files leads to the same thing. No results. If it is an ocr error, that would require checking each individual character, but that just takes too long and requires patience. Let's just search the .tjs files for random substrings of 選択肢後もオート談定を継続します instead.
Searching for 継続します has 2 matches in ConfigWindow.tjs. ConfigWindow... hummm. Maybe that is the .tjs that governs the configuration window?
The strings in quotes after hint: are probably what needs to be translated. We can ignore everything else. Same as before, copy ConfigWindow.tjs to MTL\Patch and update it appropriately.
Code:
%[type:RBUTTON, id:"Window" , group:"ScreenMode", file:"RadioButton", ptn:3, pos:[370, 168], width:118,
hint:"Change to windowed mode"],
%[type:RBUTTON, id:"FullScreen", group:"ScreenMode", file:"RadioButton", ptn:3, pos:[511, 168], width:155,
hint:"Change to full-screen mode"],
%[type:RBUTTON, id:"SZ1920", group:"ScreenSize", file:"RadioButton", ptn:3, pos:[370, 198], width:158,
hint:"Change the screen size to 1920x1080"],
Not having to ocr stuff makes translating so much easier. But, now we need to find out where the rest of the game engine strings are. Back to ocr again... *sigh*
Well, since I am avoiding having to run ocr again for as long as possible and ConfigWindow.tjs is still open, is there anything obvious that needs to be translated in ConfigWindow.tjs besides the hints from earlier?
That string in the .output("サンプルメッセージです"); line might need translating. What is it exactly? The config screen has a message display sample.
So, how and why is that "obvious" that it needs to be translated? In ConfigWindow.tjs, there is a lot of code that looks like this.
Code:
@if(__TRIAL__ == 0)
case "VolMovie" : file = "BGM_OP"; break;
@endif
@if(__TRIAL__ != 0)
case "VolMovie" : file = "BGM08"; break;
@endif
case "Master_VolVoice" :
case "Master_TestVoice" : file = PlaySystemVoice("音声マスター:ボリューム", false); break;
case "Master_PlayVoice" : if(_btn[type].state) file = PlaySystemVoice("音声マスター:ON", false);
else file = PlaySystemVoice("音声マスター:OFF", false);
break;
Just because there are some characters in quotes, like "Master_VolVoice", "BGM_OP", "音声マスター:OFF", or "音声マスター:ボリューム", does that mean those strings can be translated? No. A lot of strings in the .tjs files are internal game engine strings that refer to values or files, like in startup.tjs where it mentioned "Initialize.tjs" in quotes. Altering internal values will result in an overt crash or other unintended behavior, like voice lines not getting played.
In the translatable line above, the string appears in .output("サンプルメッセージです"); which means it is being fed into a function called output().
What makes .output("サンプルメッセージです"); different enough that we know it has a translatable string just by looking at it?
Nothing really. There is no definitive way to know. The most we can do is guess and then test to see if translating it crashes the game or produces some sort of error.
Every developer is free to write functions, call them whatever, and feed them whatever values they want.
output() can move some data around and never output anything as far as we know. PlaySystemVoice() can do nothing at all. Even for standard game engine sounding functions like PlaySystemVoice(), every developer is free to redefine them to alter their behavior completely. Every developer does things slightly differently, so there is no way to be sure without looking up each individual function definition, having a thorough understanding of the tjs2 language itself, and understanding the program's flow of control. Either that, or simply experiment and see if anything crashes.
Even then, all of the information can only be used to form a hypothesis for why something should or should not work. Knowing requires testing that hypothesis by altering the string and seeing if anything breaks in the game. In other words, experimentation is still always required.
In the case of this particular developer and going by their script names so far, like ConfigWindow.tjs, their naming system seems pretty straightforward. They likely are not trying to obscure how they are writing their code, especially given that they even included the path to their preferred text editor in the game's final release files, so we can make some guesses.
- The output() function probably tries to output something.
- file = "BGM08" is probably setting a file name.
- PlaySystemVoice("音声マスター:OFF") is probably trying to play a system voice that is referred to as "音声マスター:OFF" internally.
However, the word "probably" in all of the above means there is no way to be sure without translating the string and seeing if the game crashes.
This dynamic involving a dramatic increase in the likelihood of game crashes and errors when modifying internal game engine scripts is why a lot of translators and mtl publishers do not even try to translate anything beyond the dialogue. Compared to the easy parsing and translating of the dialogue scripts (.ks) which are in a known format and are relatively safe to mess with, messing with the game engine scripts (.tjs) is seen as too risky.
Ultimately, knowing a particular game engine string is safe to translate requires either detailed knowledge of the game engine or experimenting. Both of those take considerable time and effort for very little benefit, especially if the translator or mtl publisher is not particularly invested in the content.
This means the preferred way of translating games for persons not willing to invest a lot of time and effort into each individual title they translate is to translate the dialogue only and let someone else handle the images and game engine scripts if they are interested.
Since there is no way to be 100% sure a game engine string is safe to translate, it is just not possible to automate the extraction of random strings, translate them, and reinsert them back into the game engine safely without human interaction. Hence, completely avoiding this problem is relatively common.
One alternative approach to blindly translating everything and hoping things do not break is to play through the game and look for anything not translated. Then only translate just those strings and nothing else.
This alternative approach alters the fewest game engine strings, which minimizes the potential for crashes. It also makes it clear why a crash is happening since each individual type of string, in terms of the context it appears in, can be tested immediately during translation. While it increases the potential for the translator to not find a string to translate, that is a much better problem to have than game crashes.
This does mean dramatically reduced automation compared to a fully automated approach where every string in double quotes is dumped in a file for translation, but it also means significantly reduced chances of crashes while still translating particularly important and highly visible game engine strings, like the game title and choices.
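To make the "dump every string in double quotes" idea concrete, here is a minimal Python sketch of what such a dump could look like. It is not one of my published scripts. The function name candidate_strings and the Japanese character ranges are assumptions made up for this illustration. It only lists candidates for a human to review and never inserts anything back.

```python
import re

# Hiragana, katakana, and common kanji. These ranges are an assumption
# and will miss some characters, like full-width punctuation.
JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")
# A double-quoted string on a single line, allowing backslash escapes.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

def candidate_strings(source_text):
    """Return (line number, quoted string, full line) for every quoted
    string that contains Japanese text. A human still has to decide
    which candidates are safe to translate."""
    rows = []
    for line_no, line in enumerate(source_text.splitlines(), start=1):
        for match in QUOTED.finditer(line):
            text = match.group(1)
            if JAPANESE.search(text):
                rows.append((line_no, text, line.strip()))
    return rows
```

Feeding it the ConfigWindow.tjs lines from earlier returns both "サンプルメッセージです" and the internal "音声マスター:ボリューム" string, with nothing to tell them apart. That is exactly the problem: the filter can narrow the list, but it cannot say which strings are safe.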
In the case of .output("サンプルメッセージです");
- The developer does not seem to be obscuring their code, which means output() probably does some sort of output. This cannot be taken for granted with other developers.
- We know the string appears in the game.
- The string appears in both the configuration screen in game, and in ConfigWindow.tjs, which is probably the game engine file that handles the configuration screen meaning that the string appears where it is expected to appear.
- The string appears in the same file as other translatable strings associated with the configuration screen.
Bundling the above evidence together makes it likely that translating this particular string will not crash the game. However, having to determine that from the context makes it not possible to fully automate since computers are not great at using induction.
In the first place, computers are based around deduction, deriving resulting data from a set of axioms, mathematical functions, implemented in the processor. The idea of using induction, evidence based reasoning that focuses on making judgements of probability, is entirely a foreign concept to computers.
But, that is what contemporary AI is supposed to address right? That is why AIs are such a powerful concept, since they can start to take computers from pure deductive reasoning and make them able to consider context when determining their output. Well, LLMs do not really know if their output makes sense or not. Considering that humans have a hard enough time determining fact from fiction, that is understandable.
Still, maybe AI will get advanced enough to one day be able to determine if strings fed to .output() and other game engine functions should be translated or not? Instead of guessing the response based upon their training data, which is what contemporary AIs do, as long as an AI has a means of forming and testing its own hypotheses, then it can learn just like humans. We are a very long way away from that considering that contemporary LLMs do not even "know" anything at all. They are just fancy text predictors.
Anyway, I digress. Let's get back to translating this eroge. Here is what happens when the above strings get changed, including the one in output().
Here are the next strings to translate.
- the return to title screen confirmation
- the exit game confirmation
- the hints in the Sound configuration menu
- the Q.Save during dialogue flying confirmation
- the クイックセーブ, Quick Save, string in the loading screen
- a hover help tip in the post game cg gallery
- the song titles in the post game music gallery
- the location strings in the post game character gallery
Translating them is done exactly the same way as before.
- ocr them in the game window
- search the .tjs files for the string.
-- If the string is not found, then search for a substring as a workaround for ocr errors.
- Copy the relevant file from extracts\Yuu_extracts to MTL\Patch
- update the quoted string
- test each change as it is made to see if translating that string type crashes the game engine
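The search-for-a-substring workaround above can be sketched in Python roughly like this. This is only an illustration and not one of my published scripts. The function names, the minimum substring length of 4, and the list of encodings to try are all assumptions. Check what encoding the extracted .tjs files actually use before trusting it.

```python
from pathlib import Path

def read_text_guess(path):
    """Decode a script file, guessing between UTF-16, UTF-8, and cp932 (shift-jis)."""
    raw = Path(path).read_bytes()
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):  # UTF-16 byte order mark
        return raw.decode("utf-16")
    for encoding in ("utf-8-sig", "cp932"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("cp932", errors="replace")

def substrings_longest_first(text, min_len):
    """Yield every substring of text, longest first, down to min_len characters."""
    for length in range(len(text), min_len - 1, -1):
        for start in range(len(text) - length + 1):
            yield text[start:start + length]

def find_ocr_text(ocr_text, folder, pattern="*.tjs", min_len=4):
    """Return (matched substring, file names) for the longest piece of
    ocr_text found in any file under folder, or (None, []) if nothing hits."""
    contents = {p: read_text_guess(p) for p in sorted(Path(folder).rglob(pattern))}
    for candidate in substrings_longest_first(ocr_text, min_len):
        hits = [p.name for p, text in contents.items() if candidate in text]
        if hits:
            return candidate, hits
    return None, []
```

Calling find_ocr_text() with the raw ocr output and the extracts folder tries the whole string first and then shorter and shorter pieces, so a single misread character only costs a few extra iterations instead of a manual character by character check.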
Technically, there are end credits as well, but... no. I am drawing the line there. People's kanji names also do not translate very well without furigana, so... ad-hoc rationalization found! That does not apply to the job titles, but let's just ignore that and move on. This game might also implement the credits as videos.
The last thing to translate are any videos, such as the opening theme song. That requires finding the song name, the lyrics to the song, somehow, converting the lyrics into subtitles, somehow, and then hard coding the subtitles into the video, somehow. Including a video in the Patch.xp3, while possible, would bloat the size a lot, 177 MB, with minimal benefit, so I am skipping it for this case study. Including it is also somewhat complicated since it covers subtitling, maybe subtitle styling with kfx, and video encoding. That is a bit much for right now.
Currently, the Koakuma-chan no Yuuwaku\MTL\Patch folder is a mess because kag3 scripts, tjs2 scripts and .tlg images are all at the root meaning the resulting Patch.xp3 also has the same issue. We can leave it that way, or we have the option to clean it up by moving the files into subfolders.
While technically optional, adding syntax to load assets from subfolders
- keeps assets organized into subfolders
- allows us to control the load order of assets, like Utility.tjs
- simplifies integrating any official Patch.xp3 released later
- allows adding support for additional patch.xp3 files
- and helps manage integrating a large amount of patches
From what Initialize.tjs implies, Koakuma-chan does not currently support loading content from additional Patch.xp3 files. The developer probably intended to add support for a second Patch2.xp3 with their first Patch.xp3 instead of having support for multiple patches built in to the original game like most other kirikiri developers.
This developer has this habit of being really minimalistic that way. That is good for performance, but bad for development time.
If we were to add support for multiple patches ourselves, that would allow us to potentially do things like separate out the essential dialogue and ui from optional CG heavy uncensoring and video translations which tend to have large filesizes in comparison.
Honestly, for Koakuma-chan, none of the above are really necessary since it is a fairly small game. VNDB says Koakuma-chan is only 4h30min. Fully translating it only required altering 114 files or ~6 MB of files. There were a handful of .tjs scripts, 48 kag3 scripts, the rest were mostly small images. For games that are 10x longer, have thousands of files to edit that have been patched by the developer multiple times, and translation patches are 500 MB+ in size, organization is a must. Still, we might as well do it here for Koakuma-chan to document the syntax and have a template to work from for future translations.
So, how do we add support for subfolders and additional patches? Well, the developer already gave us the syntax to do so in their Initialize.tjs.
Code:
function AddAutoPath(name){
[...]
}
AddAutoPath("Patch.xp3>");
[...]
delete AddAutoPath;
[...]
LoadScript("Utility.tjs");
Our Patch.xp3 gets added using the AddAutoPath() function which is defined right above the patch enumeration and loading code. However, right after AddAutoPath() gets used, it gets deleted by this minimalistic developer. That means that we cannot use the same instance of the function while in Utility.tjs, the script we are hijacking to load custom code, since Utility.tjs does not get executed until after AddAutoPath() is already deleted.
If AddAutoPath does not exist, then we just need to define it again. Copy the entire function AddAutoPath() definition from Initialize.tjs to Utility.tjs and put it right after the title loading code. A function's body runs from the first { until the matching closing }. Certain editors including Notepad++ will highlight the matching closing brace, but since the code is reasonably formatted, that is not really necessary.
Here is the top of Utility.tjs so far after blindly copying the function.
Code:
// start of changes
// update title
System.title = "Koakuma-chan no Yuuwaku!";
GAME_TITLE = System.title;
GAME_CAPTION = System.title;
// update storage
function AddAutoPath(name){
//Split by ">"
var path = name.split(">");
if(path.count == 1){
//it could not be split, so add it normally
Storages.addAutoPath(name);
return true;
}else{
//if it can be split, check for the existence of the archive file before adding it
if(Storages.isExistentStorage(System.exePath + path[0])) {
//the archive is found
Storages.addAutoPath(System.exePath + name);
return true;
}
}
return false;
}
// add new storages here
// end of changes
//追加関数
//一度に複数の要素をaddする
That should work, but when copying it, I noticed something interesting. It is actually this one line that is the actual api call to kirikiriZ that is important.
Code:
Storages.addAutoPath(System.exePath + name);
The user defined AddAutoPath() function is just a wrapper for the "Storages" system function Storages.addAutoPath(). Let's simplify the code then. Before we do, what is the exact syntax for adding random .xp3 archives and subfolders anyway?
Unfortunately, we cannot just look through Storages.tjs for more fun games to play with the storage subsystem, especially to enhance the parts that require encryption to also support non-encrypted archives, since that file does not actually exist, or at least not in Koakuma-chan. In Koakuma-chan, the "Storages" class is actually implemented in some compiled .dll somewhere or perhaps in the main exe, and like everything else, is subject to developer modifications prior to being compiled. Looking at the C++ source code on Github for that class is not necessarily useful for our purposes.
アーカイブ内のストレージを指定する場合は、 > で区切り、> より前をアーカイブストレージのストレージ名、> より後をアーカイブ内でのパスに指定します。
To specify a storage inside an archive, separate the parts with >. The part before > is the storage name of the archive, and the part after > is the path inside the archive.
データ保存場所は コマンドラインオプション の -datapath オプションで指定されたフォルダです。System.dataPath プロパティで取得することができます。
The data storage location is the folder specified by the -datapath command line option. It can be obtained with the System.dataPath property.
吉里吉里が出力する各種ログやユーザごとの設定ファイルはここに出力されます。
Various logs and user-specific configuration files output by Kirikiri are written here.
また、ユーザがゲームやツールなどを作る場合は、データはここに保存することが推奨されます。
Also, if the user creates a game or tool, it is recommended to store the data here.
[Storages.addAutoPath(path)]
path
自動検索パスに追加するパスを指定します。
パスの最後は、アーカイブ内のルートフォルダを指定するときは '>'、通常のフォルダを 指定するときは '/' で終わる必要があります ( 例 : "Archive/arc.xp3>" や "System/" ) 。
2.19 beta 14 よりアーカイブの区切り文字が '#' から '>' に変わりました。
path
Specifies the path to add to the auto-search path.
The path must end with '>' to specify the root folder in the archive or with a '/' to specify a regular folder (e.g. "Archive/arc.xp3>" or "System/").
Starting with 2.19 beta 14, the archive delimiter has been changed from '#' to '>'.
Based on the documentation and examples above, .xp3 files must end in ">" to load the contents of the .xp3 instead of the literal .xp3 archive. If there are no .xp3 or > involved, then that refers to a bare folder path relative to the System.exePath.
Based upon experience, loading folders outside of .xp3 files does not work. The code in Initialize.tjs is there, but there is something preventing Storages.addAutoPath() from working as documented which is likely an aspect of the game's DRM. Thankfully, we can add support for specially crafted unencrypted archive.xp3 files by using version.dll and create such archives with Xp3Pack.exe, but that does mean, even if we can partially manipulate the storage subsystem, that we have to always use such archives when loading files.
Since we can load .tjs scripts, it should be possible to add support for bare folders and normal unencrypted .xp3 now, but we would have to overwrite or duplicate aspects of the existing Storages class as native tjs2 script. So we could do,
Code:
function Storages2(folder_or_xp3archive){
// tjs2 code that does whatever Storages.addAutoPath() does
}
Improving the compatibility of Koakuma-chan's Storage class to enable loading files from folders or regular archives is a bit beyond our purposes here, but is worth looking into for future development. For now, we are only trying to add subfolders to Patch.xp3 which we can do by replacing the archive.xp3 names above with "Patch.xp3" to get our desired syntax.
That covers syntax, but what subfolders should we actually add?
Since we have files from potentially multiple archive.xp3 files each with their own subfolders, personally, I am in favor of keeping the original folder structure. That documents where each file came from, and the folder names imply the purpose of the file. That is a more useful folder structure than separating files based on something else, like the file type. Thankfully, we can check the original folder structure and file names using Garbro to recover this information.
It looks like the kag3 scripts are under data.xp3/scenario. Most of the tjs2 scripts are under data.xp3/system. And the frm_ files for the ui are at parts.xp3/frame.
So frm_ apparently stands for frame, and, except for frm_1401.tlg, the game internally stores the ui asset filenames in uppercase. Interesting. Our lowercase frm_ files work despite not matching the original filenames because kirikiri will internally always alter the case of files to lowercase. How do we know this? By reading the documentation I linked above.
On a similar note, a lowercase patch.xp3 will also work because the Windows api is not case sensitive and will return patch.xp3 if the kirikiriZ game engine requests Patch.xp3. Most other apis do not work in such a convenient way and tend to be case sensitive.
Anyway, after copying the same folder structure into MTL\Patch, we get this folder structure.
Utility.tjs has to stay at the root, despite belonging in Patch/data/system because that is the file we are using to load our code that adds support for additional subfolders. However, we only need the name "Utility.tjs" to be at the root of Patch.xp3 for the file to be executed by Initialize.tjs. More on that in a bit.
Here is the above folder structure combined with the previous syntax in Utility.tjs.
Code:
// start of changes
// update title
System.title = "Koakuma-chan no Yuuwaku!";
GAME_TITLE = System.title;
GAME_CAPTION = System.title;
// add new storages here
Storages.addAutoPath(System.exePath + "Patch.xp3>data/system/");
Storages.addAutoPath(System.exePath + "Patch.xp3>data/scenario/");
Storages.addAutoPath(System.exePath + "Patch.xp3>parts/frame/");
if(Storages.isExistentStorage(System.exePath + "Patch2.xp3")){
Storages.addAutoPath(System.exePath + "Patch2.xp3>");
}
if(Storages.isExistentStorage(System.exePath + "Patch3.xp3")){
Storages.addAutoPath(System.exePath + "Patch3.xp3>");
}
// end of changes
//追加関数
//一度に複数の要素をaddする
[...]
Isn't that folder structure a thing of beauty? Now let's test it! Beauty means nothing if it doesn't work! Function over form I say!
So true art is where beauty and function combine together to form a fully translated Koakuma-chan. Is this the true meaning of enlightenment?
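For a larger patch with many subfolders, typing each Storages.addAutoPath() line by hand gets old fast. Here is a rough Python sketch that walks a patch folder and prints one line per subfolder that directly contains files, using the same syntax as above. The function name autopath_lines and its parameters are made up for this example, and the output is alphabetical, so reorder the lines by hand if the load order matters.

```python
from pathlib import Path

def autopath_lines(patch_root, archive_name="Patch.xp3"):
    """Build one Storages.addAutoPath() line for every subfolder of
    patch_root that directly contains at least one file. Files at the
    root itself are skipped since the archive root is already loaded."""
    root = Path(patch_root)
    lines = []
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        if any(child.is_file() for child in folder.iterdir()):
            relative = folder.relative_to(root).as_posix()
            lines.append(
                f'Storages.addAutoPath(System.exePath + "{archive_name}>{relative}/");'
            )
    return lines

if __name__ == "__main__":
    # Assumes the script is run from the main project directory.
    for line in autopath_lines(Path("MTL") / "Patch"):
        print(line)
```

The printed lines can then be pasted under "// add new storages here" in the root Utility.tjs.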
One additional nitpick is that we are currently hijacking the original Utility.tjs to load our code. Now that we can load files from arbitrary folders, we can separate our code from Utility.tjs.
Remember how kirikiri has one global flat "Storages" filesystem that consists of pointers based on unique filenames? When kirikiriZ loads Utility.tjs, it loads that file with our code into the computer's memory to execute it.
Changing the Storages filesystem's file pointer for Utility.tjs at that point does nothing since Utility.tjs is already in memory executing. Initialize.tjs just finishes executing Utility.tjs and moves on to the next .tjs in the list.
Before this, that behavior was a big problem for us because we were trying to update the game's title by updating Status.tjs, the .tjs where the title is originally defined. However, updating Status.tjs was pointless since that .tjs file was executed before we could update it using Patch.xp3. As a workaround, we had to overwrite the variables defined in Status.tjs later on. We were also not able to alter the control flow of the program before.
Now that we can, we have exactly the reverse situation. Now we can load our code before Utility.tjs executes. That means we can play games like being in a position to decide the execution order before control is passed back to Initialize.tjs, and we can mess with the Storages filesystem to do things like updating that file pointer for Utility.tjs dynamically.
1. Let's copy the original unmodified Koakuma-chan no Yuuwaku/extracts/Yuu_extracts/system/Utility.tjs to MTL/Patch/data/system/Utility.tjs.
2. Then let's remove all of the original Utility.tjs code from MTL/Patch/Utility.tjs, everything below "// end of changes"
3. And finally, add a pointer at the end of MTL/Patch/Utility.tjs to MTL/Patch/data/system/Utility.tjs based upon the syntax of loading scripts found in Initialize.tjs.
Here is the entire contents of MTL/Patch/Utility.tjs.
Since kirikiri has one global flat filesystem, that final pointer to Patch.xp3/data/system/Utility.tjs is actually just the filename. Convenient, right?
Hopefully this should happen.
0. Initialize.tjs loads Patch.xp3
1. Patch.xp3 immediately overwrites Utility.tjs
2. Initialize.tjs executes that customized Utility.tjs
3. Utility.tjs updates the variables that will be used to set the title
4. Utility.tjs loads all files in the Patch.xp3 subfolders data/system, data/scenario, and parts/frame. This will update the Storages filesystem pointer for Utility.tjs to point to Patch.xp3/data/system/Utility.tjs. The other updated files will translate the game.
5. Utility.tjs adds support for Patch2.xp3 and Patch3.xp3
6. Utility.tjs executes the file that is available at the Storages filesystem pointer for "Utility.tjs" which was updated in #4 to point to Patch.xp3/data/system/Utility.tjs, the original unmodified Utility.tjs
7. After the original Utility.tjs finishes executing, control is handed back to our custom Utility.tjs script.
8. Our custom Utility.tjs script ends and hands back control to Initialize.tjs.
9. Initialize.tjs loads the next .tjs script, and so forth.
After testing, it works beautifully.
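If the pointer juggling in steps 1, 4, and 6 is hard to picture, here is a toy Python model of it. This is not how kirikiriZ implements anything. It only mimics the behavior described in this case study: one flat table of lowercase filenames where a path added later replaces an earlier entry with the same name. The class FlatStorage and its method names are invented for this sketch.

```python
class FlatStorage:
    """Toy model of a flat, case-insensitive filename table where a
    location added later replaces an earlier one with the same name."""

    def __init__(self):
        self.index = {}  # lowercase filename -> full location

    def add_auto_path(self, location, filenames):
        # Register every file found at location, overwriting older entries.
        for name in filenames:
            self.index[name.lower()] = f"{location}{name}"

    def resolve(self, name):
        # Look up where a bare filename currently points.
        return self.index[name.lower()]

storage = FlatStorage()
# The original game data.
storage.add_auto_path("data.xp3>system/", ["Utility.tjs", "ConfigWindow.tjs"])
# Step 1: Initialize.tjs adds the root of Patch.xp3.
storage.add_auto_path("Patch.xp3>", ["Utility.tjs"])
print(storage.resolve("Utility.tjs"))  # Patch.xp3>Utility.tjs
# Step 4: our custom Utility.tjs adds the data/system subfolder.
storage.add_auto_path("Patch.xp3>data/system/", ["Utility.tjs", "ConfigWindow.tjs"])
print(storage.resolve("Utility.tjs"))  # Patch.xp3>data/system/Utility.tjs
```

The second lookup is what step 6 relies on: by the time our custom script asks for "Utility.tjs" by name, the name already points at the unmodified copy in the subfolder.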
The above posts contain enough information to fully translate the game. However, the above only outlines a potential workflow.
The actual workflow is that I used some scripts to extract and insert the translations for the dialogue, hints, a few other functions, and to ocr images. They are available publicly at my Codeberg repository. A link to that repository is in my signature.
The rest of this case study covers how to perform the above steps in more efficient and automated ways using the scripts I created for this game and for working with Japanese text in general.
Part G) Automating dialogue extraction, translation, and insertion 1
The value of automation during translation workflows cannot be overstated. Knowing how to automatically extract and insert text into the .ks files, and to a lesser extent the .tjs files,
- saves a lot of time copying and reinserting strings
- allows giving the translator just the strings they need to alter, thus simplifying their task dramatically
- allows validating the workflow for the entire game before attempting a full translation by doing a quick mtl to root out any script syntax errors, especially in the scripts towards the end of the game
- allows applying word wrap to the translation automatically
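As a small taste of that last point, word wrapping a translated line takes only a few lines of Python with the standard textwrap module. This is a generic sketch and not how my tool does it. The function name wrap_dialogue is made up, and the width has to be tuned per game by checking how many characters fit in the message window.

```python
import textwrap

def wrap_dialogue(text, width, max_lines=3):
    """Wrap text to at most width characters per line.
    Returns (lines, fits) where fits is False if the text needs more
    than max_lines lines and will probably overflow the message window."""
    lines = textwrap.wrap(text, width=width)
    return lines, len(lines) <= max_lines
```

Lines that come back with fits set to False are the ones worth flagging for the translator to shorten.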
A lot of translation projects are not completed because they are simply too much work. Automation reduces that amount of work to the point where a single person can translate a VN in a reasonable time frame, excluding the translation step.
The translation step, especially for best quality translations involving humans, always takes a while, but there are still ways to reduce the burden on the translator to make them more efficient. Automation can also make it easy to include whatever information they feel is relevant to them or would make their life easier during translating. Depending on the translator, possible things to include are the speaker for each line, a romaji version, and an mtl or two to use as reference or edit from.
Most automation also requires using a command line and the local shell language proficiently. If you were getting through this case study by manually moving files around using Windows Explorer and using WinRar to extract shift-jis encoded archives with a GUI, that ends here. The rest of this section assumes you are very familiar with the command prompt or terminal and can run batch scripts because working at the cli is required to work productively.
To translate Koakuma-chan's dialogue
1. open the .ks file
2. look for non-empty lines that do not start with @, ;, or *
3. assume those lines are dialogue and translate them
4. reinsert the translated lines replacing the original lines
The speaker for each line is listed above each line after @Talk name=.
If the speaker name is 心の声, then leave it untranslated.
Otherwise, if the speaker name is anything else, then translate it.
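The steps above can be sketched in a few lines of Python. This is a simplified illustration and not my actual tool. The function name extract_dialogue is made up, the exact shape of the @Talk line is assumed from the description above, and a speaker is assumed to stay active until the next @Talk line.

```python
import re

# Assumes the speaker appears as @Talk ... name=VALUE, optionally quoted.
TALK = re.compile(r'^@Talk\b.*?\bname=("[^"]*"|\S+)')

def extract_dialogue(ks_text):
    """Return (line number, speaker, text) for every line that looks
    like dialogue: non-empty and not starting with @, ;, or *."""
    rows = []
    speaker = None
    for line_no, raw in enumerate(ks_text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        talk = TALK.match(line)
        if talk:
            # Remember the speaker for the dialogue lines that follow.
            speaker = talk.group(1).strip('"')
            continue
        if line[0] in "@;*":
            continue  # command, comment, or label
        rows.append((line_no, speaker, line))
    return rows
```

Keeping the line number with each row is what makes reinsertion possible later, since the translated text can be written back to exactly the line it came from.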
A series of steps used to accomplish a task is called an algorithm. If you can follow the steps, then you can accomplish that task. Scripting and programming in general are about taking different approaches to accomplishing a task, algorithms, and expressing them in a way that a computer can understand so it knows what to do. Usually that means using a programming language like C, C#, C++, Java, JavaScript, Rust, Lua, or Python. KirikiriZ uses C++ and tjs2.
For the reasons to automate listed above, I wrote a script that performs the above steps for Koakuma-chan's dialogue and a second smaller script for extracting some lines from Koakuma-chan's ConfigWindow.tjs. All they do is follow the same steps to translate the dialogue and extract some of the strings in ConfigWindow.tjs that were discovered in this workflow.
The scripts were written in Python which means the program requires the Python interpreter to function because interpreted languages require an interpreter to transform the program script into machine language. For the purposes of this case study, I will only mention how to use them going forward instead of how they were written.
However, for people interested in automation and programming, open the spoiler.
The easiest and most productive programming languages to learn are shell scripting languages like Windows batch used at the command prompt and bash commonly used in Linux and OSX terminals. If you know how to use a command prompt, scripting in those languages is a very natural extension since the gist of those languages is just putting the commands that would be typed into the command prompt or terminal into a plain text file and then running that file. Learning other programming languages also assumes you already have familiarity with a command line interface, so that also makes it the best starting point.
After shell languages, I would recommend Python as the next easiest and most productive language to learn. The web development languages (HTML/CSS/Javascript) are after that.
1. First, open a command prompt or terminal and type "python --version" to see if it is already installed or not. If it says command not found or invalid, then install the Python interpreter first. Consider Python 3.10-3.12.
2. Download the script translation_tools/games/Koakuma-chan no Yuuwaku/Koakuma-chanNoYuuwaku_ScriptExtractInsertTool.py to tools\. Open the file on my codeberg and then click on the "Download file" icon next to the History button at the top, or "Download Zip" under the three dots at the top right.
-- The script is just a text file. If you want to know what it does, open it in Notepad++, and read it. Knowing Python helps.
3. Run the script with --help (or with no arguments at all) in a terminal to print out the usage information.
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\Koakuma-chanNoYuuwaku_ScriptExtractInsertTool.py"
warning, pyexcel is not available. To read and write formats besides .csv, install with 'python -m pip install pyexcel pyexcel-xls pyexcel-xlsx pyexcel-ods pyexcel-ods3 pyexcel-odsr openpyxl==3.0.10'
usage: Koakuma-chanNoYuuwaku_ScriptExtractInsertTool.py [-h] [-s SPREADSHEET]
[-o OUTPUT]
[-cn CHARACTER_NAMES]
[-c COLUMN]
[-w WORDWRAP]
[-cd CSV_DIALECT] [-v]
[-d]
mode inputfile
Extract and insert text into kirikiriz .ks files after they have been
extracted from archive.xp3. Any folders in the paths must already exist. For
usage, 'python tool.py -h' version=0.1.0
positional arguments:
mode must be extract or insert
inputfile the source file to extract strings from and insert
them into
options:
-h, --help show this help message and exit
-s SPREADSHEET, --spreadsheet SPREADSHEET
the file name to read and write the extracted strings
to and from, he first row is reserved for column
headers, must be .csv if pyexcel is not installed with
'python -m pip install pyexcel pyexcel-xls pyexcel-
xlsx pyexcel-ods pyexcel-ods3 pyexcel-odsr
openpyxl==3.0.10'
-o OUTPUT, --output OUTPUT
the output file name for the resulting file, only used
for mode=insert
-cn CHARACTER_NAMES, --character_names CHARACTER_NAMES
a .csv or spreadsheet mapping the raw character name
to a translation, the first row is reserved for column
headers
-c COLUMN, --column COLUMN
the column number in the spreadsheet to use as
replacements, only used for mode=insert, starts from
1, column A is the 1st column, so enter 1, column D is
the 4th column, so enter 4, column C is 3, default is
4
-w WORDWRAP, --wordwrap WORDWRAP
word wrap setting, enter the number of characters per
line, word wrap assumes a maximum of 3 lines, default
is 51
-cd CSV_DIALECT, --csv_dialect CSV_DIALECT
specify the csv dialect when reading spreadsheet.csv
files, normal settings are used otherwise, ignored for
non .csv formats, valid options are unix, excel,
excel-tab
-v, --version print version information
-d, --debug print debug information
In other words, the minimum syntax is
Code:
python tool.py mode file.ks
"mode" can be either to extract strings into a spreadsheet for translation or to insert the translated strings back into the original file by reading from the spreadsheet. The warning about pyexcel means that spreadsheets must be in comma separated value (.csv) format unless pyexcel is installed. To import/export other spreadsheet formats, like .ods, .xlsx, .xls, install pyexcel using the command above, but that is not necessary when working with only .csv files.
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\Koakum
a-chanNoYuuwaku_ScriptExtractInsertTool.py" extract "C:\Users\User\Desktop\Koaku
ma-chan no Yuuwaku\extracts\Yuu_extracts2\00_01.ks"
Running it makes it complain that lots of names were not found in the character spreadsheet. That makes sense because we did not specify a character spreadsheet to use for the translated names, so the script does not know how to map the raw Japanese names to their translations.
Opening the .csv in LibreOffice Calc gives this window. I am using LibreOffice, but other spreadsheet software also works, probably.
.csv spreadsheets are actually just normal plain text files. Asking spreadsheet software like LibreOffice Calc to interpret that textfile as a spreadsheet requires telling Calc how it is formatted. Some .csv files are actually mislabeled tab separated value files (.tsv), or use a different string quoting syntax, so Calc does not really know how to open generic .csv files.
The comma separated value file was made using commas, so be sure to check "Comma" in the UI. Leave the String delimiter as ". Everything else should be unchecked and left as defaults. If you are not sure, open the .csv in Notepad++ and check how the file is structured, then insert that structure into the Libre Office Calc's dialogue box.
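If it is not obvious how a given .csv is structured, Python's built-in csv module can make a guess. This is only a rough sketch: the function name guess_csv_format is made up for illustration, and the sniffer can guess wrong on unusual files, so opening the file in Notepad++ is still the reliable check.

```python
import csv

def guess_csv_format(sample_text):
    """Guess the delimiter and quote character of csv-like text."""
    # csv.Sniffer inspects the sample; only comma, tab, and semicolon are considered.
    dialect = csv.Sniffer().sniff(sample_text, delimiters=',\t;')
    return dialect.delimiter, dialect.quotechar

if __name__ == '__main__':
    # Made-up sample rows in the same shape as the extracted spreadsheets.
    sample = 'texts,character,metadata\n"a, b",Mei,31_1\n"c, d",Sumiya,35_1\n'
    print(guess_csv_format(sample))
```

Whatever it reports is what goes into the Calc import dialog as the separator and string delimiter.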
To skip this very annoying step, either open the .csv file in Notepad++ or use the --spreadsheet option with a different spreadsheet format. Using other formats does require installing pyexcel. After installing pyexcel, using --spreadsheet with 00_01.ks.csv, 00_01.ks.ods, 00_01.ks.xls, 00_01.ks.xlsx will all work. Without that, just resign yourself to always specifying the format in the spreadsheet software.
The rest of this case study will assume .csv for the spreadsheet format. And not all the scripts I am using for automation support pyexcel, so using .csv is somewhat unavoidable.
Code:
python -m pip install pyexcel pyexcel-xls pyexcel-xlsx pyexcel-ods pyexcel-ods3 pyexcel-odsr openpyxl==3.0.10
python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\Koakum
a-chanNoYuuwaku_ScriptExtractInsertTool.py" extract "C:\Users\User\Desktop\Koaku
ma-chan no Yuuwaku\extracts\Yuu_extracts2\00_01.ks" --spreadsheet "C:\Users\User
\Desktop\Koakuma-chan no Yuuwaku\extracts\Yuu_extracts2\00_01.ks.ods"
Here is what the data looks like after clicking on OK with the correct options.
The first column is the untranslated strings. The second is what character spoke that line for context. Column C is metadata required to reinsert the strings in the correct locations. Column D is empty, and that is where to insert a translation for whatever is in column A.
Before we even start translating or altering the 00_01.ks.csv file, the first step is to change the Japanese character names to their latin equivalents so we know who is talking. That is useful later on when editing the translated lines.
Create a new .csv file. Open Windows Explorer to Desktop\Koakuma-chan no Yuuwaku, right-click in an empty area, New, Text Document. Name it "Koakuma-chan no Yuuwaku character names.csv" or similar. Open it in Notepad++. Enter this as the first line.
Code:
japanese name,english name
The first line is the column labels. For the second line onward, start adding the strings found in column B of the spreadsheet, each followed by a comma. Make sure each entry is unique, which means do not add duplicate ones.
Code:
japanese name,english name
心の声,
澄哉,
芽衣,
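Collecting the unique names by hand gets old quickly, so here is a minimal sketch of doing it with Python's csv module. The function names unique_names and names_csv_skeleton are hypothetical and not part of the extraction tool, and the sketch assumes the character name sits in the second column as described above.

```python
import csv
import io

def unique_names(csv_text, column=1):
    """Return the unique values of one column (0-based), in first-seen order."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    seen = []
    for row in reader:
        if len(row) > column and row[column] and row[column] not in seen:
            seen.append(row[column])
    return seen

def names_csv_skeleton(names):
    """Build the starting contents of a character names .csv."""
    lines = ['japanese name,english name'] + [name + ',' for name in names]
    return '\n'.join(lines) + '\n'

if __name__ == '__main__':
    # Made-up rows; real input would be the text of 00_01.ks.csv.
    sample = 'texts,character,metadata\nA,心の声,31_1\nB,澄哉,73_1\nC,心の声,77_1\n'
    print(names_csv_skeleton(unique_names(sample)))
```

To use it on a real spreadsheet, read the file into a string first. The encoding of the generated .csv is an assumption to check in Notepad++ before relying on this.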
Next translate the character names using any documentation available. For Koakuma-chan, that means vndb.org/v48419. The Character tab on vndb says the names are Suzumori Mei 鈴森芽衣 and Otonashi Sumiya 鳴無澄哉. VNDB lists names in proper japanese order, so 澄哉 should be Sumiya and 芽衣 should be Mei. Apparently, Otonashi Sumiya is the protagonist of this game.
Code:
japanese name,english name
心の声,
澄哉,Sumiya
芽衣,Mei
Do not add spaces between the latin names and the commas. 心の声 is not in vndb's character database. Translating it results in "Voice of the Heart" or "The Voice of the Heart". So basically, an inner voice.
Code:
japanese name,english name
心の声,Inner Voice
澄哉,Sumiya
芽衣,Mei
Next, delete the 00_01.ks.csv file. Rerun the command earlier, but this time, add --character_names characternames.csv.
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\Koakum
a-chanNoYuuwaku_ScriptExtractInsertTool.py" extract "C:\Users\User\Desktop\Koaku
ma-chan no Yuuwaku\extracts\Yuu_extracts2\00_01.ks" --spreadsheet "C:\Users\User
\Desktop\Koakuma-chan no Yuuwaku\extracts\Yuu_extracts2\00_01.ks.csv" --charact
er_names "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\Koakuma-chan no Yuuwaku
character names.csv"
That should translate all of the character names so the resulting .csv file looks like this in Notepad++. Remember that each comma separates a different column.
Next, run the tool on every script in the directory. More character names will show up as untranslated. Get the untranslated names, add them to the character_names.csv, add latin versions to each one, and then regenerate the .csv files until every character name has been added for every script and no more mentions of missing character names display. The end result of that process is this .csv.
There is another character named Rinne, and apparently Mei and Sumiya's moms show up. How romantic. -_- I am not sure what is going on with those Jump and Bow names, but the scripts will crash if there is a problem, so let's just keep going and pretend nothing is wrong for now.
However, running the command on every .ks script is actually really tedious. Let's automate that too. Create a new file called "extract.cmd" with these contents.
Code:
@echo off
set working_directory=C:\Users\User\Desktop\Koakuma-chan no Yuuwaku
set tool=%working_directory%\tools\Koakuma-chanNoYuuwaku_ScriptExtractInsertTool.py
set names_list=%working_directory%\Koakuma-chan no Yuuwaku character names.csv
for /f %%i in ('dir /b "%working_directory%\MTL\Patch\*.ks"') do python "%tool%" extract "%working_directory%\MTL\Patch\%%i" --character_names "%names_list%"
Adjust the paths as appropriate for your environment. As errors pop up from missing names,
- stop the command prompt using ctrl+c + y + enter
- get the missing names added and translated to the names_list
- delete all the spreadsheets with del *.csv
- keep regenerating the spreadsheets until there are no more missing name errors creating them
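For anyone who prefers Python over batch, the same loop can be sketched as below. The function name build_extract_commands is made up for this example; the "extract" mode and the --character_names option are the ones from the tool's help output above. Whether the tool returns a failing exit code on missing names is an assumption, so the part that actually runs the commands is left as a comment.

```python
import sys
from pathlib import Path

def build_extract_commands(script_dir, tool, names_list):
    """Return one command (as an argument list) per .ks file in script_dir."""
    commands = []
    for ks_file in sorted(Path(script_dir).glob('*.ks')):
        commands.append([sys.executable, str(tool), 'extract', str(ks_file),
                         '--character_names', str(names_list)])
    return commands

# To actually run them (assumption: stop at the first command that fails):
# import subprocess
# for command in build_extract_commands(script_dir, tool, names_list):
#     subprocess.run(command, check=True)
```

Passing the command as a list of arguments sidesteps the quoting problems that paths with spaces cause in batch files.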
Move the correctly generated spreadsheets into their own subfolder to keep things tidy.
Code:
cd C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\Patch
mkdir csv
move *.csv csv
move csv ..\csv
That should move the 48 spreadsheets to Koakuma-chan no Yuuwaku\MTL\csv. There are actually 49 .ks files, but macro.ks has no translatable text according to the dialogue extraction script. Is that correct? Try opening up macro.ks with Notepad++ to check if you are curious.
The tool itself has a line that says
Code:
if len(extracted_string_list) > 0:
and then some code about writing output files. In other words, if the number of extracted strings from the .ks file is not greater than 0, then no output will be written. If there are no translatable strings in the .ks file, then output.csv will not be generated.
Anyway, the next step before attempting a final translation is to validate the workflow so we know the scripts we are handing over to a translator are actually valid to use during translation. Basically, we need to check to see if there are any parsing errors when using the tool.
The best way to do that is to simulate inserting the final translation. The easiest way to simulate inserting a final translation is to machine translate every line and insert the poorly translated lines back into the game. Then we can set the skip setting to "on" in Koakuma-chan's config menu and let the game run to see if it crashes. If it does not crash, then we can hand over the scripts to a translator.
This step is important so we do not waste other people's time, especially that of a highly valuable human translator or editor. If the texts or translations need to be formatted differently, then we should figure that out first before asking for someone else's time. In the worst case scenario, not validating the workflow means wasting your time or your coworkers' time by having to transfer the work done to a different workflow.
Since some translators and editors want baselines to work from, let's add both romaji and a basic translation when validating the workflow. If they do not want those included in the spreadsheets, then we can easily hand over backups of the spreadsheets without that information present or even regenerate them from scratch to cater to their preferences.
For romaji, the Cutlet python library using Unidic can do this accurately. Open a command prompt and install both, typically with "python -m pip install cutlet unidic" followed by "python -m unidic download".
Downloading unidic (0.5 GB) will take a while. While that is downloading, also download the script "romaji.py" to tools\. To download romaji.py, open the file url on codeberg and then click on the "Download file" icon next to the History button at the top.
Here is the --help from romaji.py that uses Cutlet.
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\romaji
.py"
Usage: romaji.py string_or_csv [-s] [-h]
Prints the romaji-hepburn of Japanese input text by using the python library Cut
let and unidic. If a string is specified, output is returned to stdout, otherwis
e if a .csv is specified, then the first column is used as data and output is wr
itten to the last column. .csv files are assumed to have headers.
Positional arguments:
string_or_csv the string or .csv to display as romaji
Options:
-h, --help show this help message and exit
-s, --same_output for .csv files, append data to the existing file
In other words, to add romaji to the existing csv files, we can do,
Code:
python romaji.py output.csv -s
The romaji.py file only works on .csv files since I never bothered to add pyexcel support to it. Anyway, wait until unidic is fully downloaded, and then run it over one .csv file to make sure it works.
Here is what 00_01.ks.csv looks like afterwards.
Code:
texts,character,metadata,cutlet_hepburn
西日の射す廊下に自分の足音が響く。,InnerVoice,31_1,Nishibi no sasu rouka ni jibun no ashioto ga hibiku.
夕方に響く、ひとりぶんの足音が嫌いだ。,InnerVoice,35_1,"Yuugata ni hibiku, hitori bun no ashioto ga kirai da."
[...]
「……ここにもいないか」,Sumiya,73_1,"""...... koko ni mo inai ka"""
がらんとした教室を見回して呟く。,InnerVoice,77_1,Garan to shita kyoushitsu wo mimawashite tsubuyaku.
"やけに広い教室に響く自分の声は、
どこか頼りなく弱々しい。",InnerVoice,81_2,"Yake ni hiroi kyoushitsu ni hibiku jibun no koe wa, doko ka tayorinaku yowayowashii."
「あーっ、やっと来た!」,Mei,96_1,"""aa,, yatto kita!"""
Triple double quotes """ are how csv files escape quotes while also using them for syntax: the outer pair wraps the field, and each literal quote inside the field is doubled. The rules governing how to display and read them seem oddly inconsistent to me sometimes, but the software understands them. As long as the software creating and reading them understands the syntax, we do not have to worry about it too much. If you are wondering what the actual data is after all the odd """ formatting and escaping, then open it in Libre Office Calc.
So, to add romaji to all of the .csv files at once, create Koakuma-chan no Yuuwaku\tools\add_romaji.cmd with the following contents.
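To see the quoting rule in action without opening Calc, here is a small sketch using Python's built-in csv module with its default settings. The helper names to_csv_line and from_csv_line are made up for this example. A field is only wrapped in quotes when it contains a comma, a quote, or a line break, and any quote inside the field is doubled.

```python
import csv
import io

def to_csv_line(fields):
    """Serialize one row the way Python's csv module does by default."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerow(fields)
    return buffer.getvalue()

def from_csv_line(line):
    """Parse one csv line back into its fields."""
    return next(csv.reader(io.StringIO(line)))

# The romaji field starts and ends with a literal quote character.
row = ['「……ここにもいないか」', 'Sumiya', '73_1', '"...... koko ni mo inai ka"']
line = to_csv_line(row)
print(line)                    # what ends up in the .csv file
print(from_csv_line(line)[3])  # what the software reads back
```

Reading the line back gives the original field with a single quote at each end, which is why the odd-looking """ does not need to be fixed by hand.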
Code:
@echo off
set working_directory=C:\Users\User\Desktop\Koakuma-chan no Yuuwaku
set tool=%working_directory%\tools\romaji.py
for /f %%i in ('dir /b "%working_directory%\MTL\csv\*.csv"') do python "%tool%" "%working_directory%\MTL\csv\%%i" -s
Run add_romaji.cmd. In the unlikely case it worked perfectly, we can move on.
Now that the .csv files all have romaji input, the next step is to create a basic mtl. I have not really found any good software to do this fully automatically.
- All of the Python libraries that support web hooks are not in compliance with EULAs for the services they are hooking. Violating service agreements means IP bans without any room to complain and the software breaking whenever the web provider changes their server code.
- The DeepL library requires an API key which, while technically free, is not actually available in many parts of the world.
- The same is true for all other cloud translation APIs. They all require API keys, especially the AI ones.
- I tried Translator++, but it distorts the input spreadsheets heavily regardless of the translation backend. It required duplicating the spreadsheets, loading them in T++, copying the data out of the program, and then fixing the broken output. Using T++ for translating is just a miserable experience honestly, especially for highly automated workflows like the one we are trying to build.
- SLR Translator does not even support spreadsheets yet. Technically it can use the same spreadsheet parser as T++, but then it just duplicates all the same problems.
- Sugoi Toolkit by MingShiba supposedly supports text files, not spreadsheets, and it does not support non-English translations, so I have not looked into it much. There was also a lot of code broken or removed when going from v8 to v9. When coupled with the lack of documentation, getting anything to work with the official toolkit is error prone to say the least.
- There is a Textractor plugin for the Atlas translation software, but Atlas is now discontinued and was only ever commercially available closed source software.
If anyone knows of a more automated method to translate arbitrary amounts of text from Japanese besides Sugoi Toolkit's NMT model or my Sugoi Repackage of it, then please mention it, especially for other languages besides English. There might be something useful on HuggingFace, but I have no clue on how to get any of their stuff to actually work without other people's software providing templates. That is way beyond what I understand.
I think right now the only way to do endless Japanese to English translations reliably in an automated way is just to host the Sugoi model on the local system and write scripts to interact with it based upon the available published syntax. Other languages besides English? No idea.
Code:
import requests

address='http://localhost:14366'
data_to_translate=[]  # list of Japanese strings to send
json_dictionary=dict([('content',data_to_translate),('message','translate sentences')])
returned_list=requests.post(address, json=json_dictionary).json()
assert(len(returned_list)==len(data_to_translate))
for entry in returned_list:
    print(entry)  # do stuff with each translated string
The client software, sugoi_mtl.py, requires a sugoi server to process the results. One server is available from Sugoi Toolkit at "Sugoi_Toolkit_V9.5_645455\Code\backendServer\Program-Backend\Sugoi-Japanese-Translator\offlineTranslation\activateOfflineTranslationServer.bat". Sugoi Toolkit is available to download on the 15-16th of every month from MingShiba's Patreon about page. The other option is to use my Sugoi Repackage which was linked above. See the thread for information on how to get it running.
Once the server software is running, here is the sugoi_mtl.py --help output.
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\sugoi_
mtl.py" --help
Usage: sugoi_mtl.py string_or_csv [-s] [-h]
Uses Sugoi Toolkit to translate Japanese input to English. If a .csv is input,
then the first column is used as data and output is written to the last column.
Otherwise, the input is returned to stdout as output. .csv files are assumed to
have headers. The address of the Sugoi server is hardcoded as localhost.
Positional arguments:
string_or_csv the string or .csv to translate
Options:
-h, --help show this help message and exit
-s, --same_output for .csv files, append data to the existing file
So to add sugoi_mtl translations to one of the .csv files, we do,
Code:
python sugoi_mtl.py output.csv -s
The interface looks the same as before because I copy-pasted most of the romaji.py code to create the sugoi_mtl.py script. Most programming is actually just copy pasting existing code.
Now, perform the same process as before. With the server running, run the client script on one file to make sure it works.
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\sugoi_mtl.py" "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\csv\00_01.ks.csv" -s
It will take a while to process. Once it is done, it will say that it wrote 00_01.ks.csv, and it should look like this in Calc.
Columns A-E have stuff. The final output should go into column F with a column named "Output" or "Final" or whatever. Since we are just validating the workflow, let's use column E as output. Column E is the 5th column from left to right.
But first, let's queue the translation of the other files. Create a new file called "Koakuma-chan no Yuuwaku\tools\add_sugoi_mtl.cmd" with the following contents, and then run it.
Code:
@echo off
set working_directory=C:\Users\User\Desktop\Koakuma-chan no Yuuwaku
set tool=%working_directory%\tools\sugoi_mtl.py
for /f %%i in ('dir /b "%working_directory%\MTL\csv\*.csv"') do python "%tool%" "%working_directory%\MTL\csv\%%i" -s
Part H) Automating dialogue extraction, translation, and insertion 2
Running the Sugoi model without cuda will take a very long while. With cuda, it can finish in <10 min depending upon how the server is configured and the available hardware. While that is running, let's figure out how to reinsert the scripts for the first 00_01.ks file using the data in 00_01.ks.csv.
To check the syntax for the ScriptExtractInsertTool.py, append --help.
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\Koakuma-chanNoYuuwaku_ScriptExtractInsertTool.py" --help
The tool needs the .csv spreadsheets to be either in the same folder as the .ks or specified with --spreadsheet. Since we are about to make alterations to all the .ks files, we also need to figure out how to not overwrite the originals. The --output option can help with that. And the --column option can be used to pick which column in the spreadsheet will be used for insertions. Column E is 5.
Let's try doing,
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\Koakuma-chanNoYuuwaku_ScriptExtractInsertTool.py" insert "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\extracts\Yuu_extracts\00_01.ks" --character_names "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\Koakuma-chan no Yuuwaku character names.csv" --column 5 --output "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\Patch\00_01.ks" --spreadsheet "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\csv\00_01.ks.csv"
These commands are getting long and illegible. To see if it worked or not, remake Patch.xp3 using KirikiriTools's Xp3Patch.exe and load it into the game.
So, the good part was that it worked, and the correct word wrap was applied automatically. The problem is that 心の声 InnerVoice was also translated as a character name and now it is showing up in game. In the original scripts, if a character name=心の声, then the game would not display it, so we need to revert this behavior in the automated workflow.
To stop that from happening, we need to tell the insertion script to stop doing it for 心の声. Instead of messing with code, one way is to duplicate "Koakuma-chan no Yuuwaku character names.csv" and remove the 心の声,InnerVoice entry so the insertion script does not know how to translate that sequence of characters which results in it leaving it alone.
Re-run the above command but with the altered character .csv.
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\Koakuma-chanNoYuuwaku_ScriptExtractInsertTool.py" insert "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\extracts\Yuu_extracts\00_01.ks" --character_names "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\Koakuma-chan no Yuuwaku character names2.csv" --column 5 --output "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\Patch\00_01.ks" --spreadsheet "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\csv\00_01.ks.csv"
>translated file already exists C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\MTL\Patch\00_01.ks
Right, right. Delete the old Patch\00_01.ks first.
Better. Now we need to check to see if there are any parsing errors, so play through the game until untranslated dialogue begins again. Enabling Options - "Skip all messages" helps a lot here.
This is why it is useful to QC the scripts. In 00_01.ks.csv, the lines are,
Code:
何か――例えば、孤独の記憶、とか。
Nan ka―― tatoeba, kodoku no kioku, to ka.
Something—for example, a memory of loneliness.
Here is the line from the translated 00_01.ks file.
Code:
@Talk name=心の声
Something\N{EM DASH}for example, a memory of loneliness.
@Hitret id=9
Here is the untranslated line from 00_01.ks.
Code:
@Talk name=心の声
何か――例えば、孤独の記憶、とか。
@Hitret id=9
So the error is about two ― getting turned into — and then turned into \N{EM DASH}. In other words, the error is not from the game's rendering of the scripts since the script.ks itself also has the error. This is an error generated when going from the .csv to the final generated .ks.
The original shift-jis .ks script uses ―, the horizontal bar (U+2015), which serves as the full width dash in Japanese text. Sugoi translated it to —, the normal half width em dash (U+2014), which python does not automatically convert back to the full width version when writing the output to the generated translated.ks. The half width em dash does not have any valid representation in shift-jis encoding, which results in a character encoding error.
The immediate fix is obvious. Just replace the half width em dash — with the full width version ―. That does not truly solve the problem though since the objective here is automation, fixing all possible instances of this problem, not just the 9th text entry in the game. A comparable fix needs to be applied to every character encoding error for all of the files and preferably automatically. How? Here are some options.
- Do nothing. The game still works. While it is distracting, as long as the game does not crash, that ultimately means this is purely a cosmetic error. Low quality non-solution.
- Remove or replace characters that are not valid in the destination encoding with different characters such as an empty string '' or a blank space ' '. This is actually what Python already did. It replaced the half width em dash — with \N{EM DASH} which are individually all valid characters in shift-jis, thereby allowing the script to output despite the usually fatal character encoding error. Automated low quality solution since this tends to output either merged words "Somethingfor example" or double spaces "Something  for example" but at least it would be less distracting than \N{EM DASH}. Python can also insert question marks ? automatically on encoding errors instead of \N{EM DASH} along with a few other alternative error schemes.
- Leave the characters as-is and manually search for encoding mistakes in the finished scripts. This is the least automated option and not really reasonable for non english languages where character encoding errors happen constantly due to accents and grave characters on everything. For languages with very few instances of this happening, like english, there should only be a dozen or so of these errors at most. If there are more than that, then it tends to be one particular character appearing again and again. In that case, applying a fix in the insertion_script.py to change that one character to another one should reduce the errors down to a manageable number. This is potentially very high quality at the cost of reduced or no automation.
- Use a different encoding that the game understands besides shift-jis. Ideally this different encoding should support more characters like the half width em dash. For kirikiri that means using utf-16 le. Shift-jis was really only ever intended to work with japanese and a subset of english characters, so if the game understands any other encoding, then it is more easily translatable into other languages, especially non english ones. The issue with this approach is that the scripts were originally in shift-jis and the game assumes they will be in that format, not in utf-16 le. The core kirikiriZ engine supports utf-16 le, so that aspect of the game may work as intended, but perhaps the developer included a third party plugin somewhere that creates a conflict because it requires shift-jis? There is no real way to know if changing the encoding will work without errors besides testing. However, changing the encoding complicates the debugging as well because then two things are changing at once in the dialogue scripts, the encoding and the newly inserted text. If the game crashes, is the issue the encoding or the alterations? Alternatively, converting the scripts to utf-16 le without any translation applied would allow testing for utf-16 le encoding errors but doubles or triples the number of debugging passes needed to check for crashes.
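To make the second and third options more concrete, here is a small sketch of how Python's built-in error handlers treat the em dash when encoding to shift-jis. The helper name encode_with_handler is made up for this example, and the replacement at the end (two horizontal bars for one em dash) is only one possible targeted fix, not something the insertion tool does.

```python
def encode_with_handler(text, handler, encoding='shift_jis'):
    """Encode text and decode it back to show what the handler left behind."""
    return text.encode(encoding, errors=handler).decode(encoding)

line = 'Something\u2014for example'  # \u2014 is the em dash

# Compare what each error handler writes in place of the em dash.
for handler in ('namereplace', 'replace', 'ignore', 'backslashreplace'):
    print(handler, '->', encode_with_handler(line, handler))

# A targeted fix (assumption: swap the em dash for two horizontal bars, U+2015,
# which shift-jis can encode) applied before the file is written.
fixed = line.replace('\u2014', '\u2015\u2015')
print(encode_with_handler(fixed, 'strict'))
```

With 'strict', any character that is still not encodable raises an error instead of being replaced silently, which is handy for confirming that a targeted fix caught everything.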
Non english translations do not really have a choice but to change the shift-jis encoded .ks files to utf-16 le and deal with the more complicated debugging. However, since I am doing english and shift-jis supports most english characters, I would strongly prefer leaving the encoding as shift-jis since that is the format they were originally in. Unless changing something is required for translation, I would rather leave it alone even at the cost of reduced or no automation.
My chosen solution to fix the above error is to search all of the .ks files for \N{ in Notepad++ and deal with every character encoding error manually. For english, this is reasonable, but not for languages like portuguese, which uses ã, á, é, ç, ú constantly, or for cyrillic or ideogram based languages.
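The Notepad++ search can also be approximated in Python when a list of every hit is more useful than clicking through results. This is a rough sketch; the names NAME_MARKER and find_name_markers are made up for this example.

```python
import re

# Matches the markers left behind by errors='namereplace', e.g. \N{EM DASH}.
NAME_MARKER = re.compile(r'\\N\{([^}]+)\}')

def find_name_markers(text):
    """Return (line_number, character_name) pairs for each marker in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in NAME_MARKER.finditer(line):
            hits.append((number, match.group(1)))
    return hits

if __name__ == '__main__':
    sample = '@Talk name=心の声\nSomething\\N{EM DASH}for example, a memory of loneliness.\n@Hitret id=9\n'
    print(find_name_markers(sample))
```

To scan the generated scripts, read each .ks file in MTL\Patch as text with the shift-jis encoding and pass the contents to find_name_markers. The character names it reports show at a glance whether one particular character is responsible for most of the errors.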
Assuming we wanted to change the encoding to utf-16 le, how would we do it?
Switching to utf-16 le means 1) changing the encoding of the unmodified .ks files and creating a Patch.xp3 from them without any additional changes, which tests for utf-16 le errors separately from modifications, and 2) telling ScriptExtractInsertTool.py to always create its output using utf-16 le, which can be used to test for automatic insertion errors.
- Create a copy of the Koakuma-chan no Yuuwaku\extracts\Yuu_extracts\*.ks files.
- Open the 48 unmodified .ks files in Notepad++.
- Select encoding and convert to UTF-16 LE BOM.
- Ctrl+s to save the file.
- Ctrl+w to close the file.
- Change the encoding of the next one.
There are ways to automate the above using python of course. Create a Patch.xp3 based upon those scripts and play through the game to check if anything crashes.
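As a minimal sketch of that automation, assuming the unmodified scripts all decode cleanly as shift-jis, the conversion could look like this. The function names convert_to_utf16le_bom and convert_folder are made up for this example. The BOM is written explicitly because Python's 'utf-16-le' codec does not add one.

```python
import codecs
from pathlib import Path

def convert_to_utf16le_bom(source_path, destination_path, source_encoding='shift_jis'):
    """Re-encode one text file as UTF-16 LE with a byte order mark."""
    # Decode the raw bytes so existing line endings (\r\n) pass through untouched.
    text = Path(source_path).read_bytes().decode(source_encoding)
    data = codecs.BOM_UTF16_LE + text.encode('utf-16-le')
    Path(destination_path).write_bytes(data)

def convert_folder(source_dir, destination_dir):
    """Convert every .ks file in source_dir, writing copies to destination_dir."""
    destination = Path(destination_dir)
    destination.mkdir(parents=True, exist_ok=True)
    for ks_file in sorted(Path(source_dir).glob('*.ks')):
        convert_to_utf16le_bom(ks_file, destination / ks_file.name)
```

Point convert_folder at the copy of Yuu_extracts and at an empty output folder so the originals are never overwritten. If a file fails to decode, that script may be using a different code page than plain shift-jis, which would need checking by hand.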
Next we need to tell ScriptExtractInsertTool.py to always output utf-16 le. In the ScriptExtractInsertTool.py, the lines responsible for generating the output are these towards the end of the file.
Code:
# write to file
with open(input.output, 'w',encoding=ks_file_encoding,newline='',errors='namereplace') as file:
    for line in outputfile_as_a_list:
        file.write(line + '\r\n')
As background information, character encoding errors in python generally result in a fatal crash since the default error handling method is "strict" meaning no error handling at all. The errors='namereplace' argument to open() makes it so character encoding errors do not result in an error but instead replace the data with other characters silently, so — turns into \N{EM DASH}. This results in the insertion script writing the file without crashing and the game not crashing when rendering the script, but the end result is still the incorrect characters that you see above. Here are the alternative error handlers.
The actual encoding is determined by "encoding=ks_file_encoding". Earlier in the file, that was defined as "ks_file_encoding='shift-jis'" meaning that files are being output as shift-jis. Here are the standard encodings available. Plain utf-16 le is referred to as "utf_16_le" and "UTF-16LE".
- https://en.wikipedia.org/wiki/UTF-16
- https://www.ibm.com/docs/en/i/7.1?topic=unicode-ucs-2-its-relationship-utf-16
- https://www.unicode.org/faq/utf_bom.html
- When Garbro and notepad.exe (not Notepad++) say "Unicode," they actually mean utf-16-le-bom.
- "Unicode big endian" is actually utf-16-be-bom.
- "UTF-8" is actually utf-8-bom. Windows always uses a byte order mark (bom) even though including one "makes no difference as to the endianness of the byte stream. UTF-8 always has the same byte order." -unicode.org and including one unnecessarily breaks compatibility with ascii in situations where there is no technical need to do so.
- ANSI is actually whatever the local Windows/ANSI code page happens to be which depends on the locale. It would have been nice for Microsoft to implement some code to detect and display the actual code page name dynamically to save the planet a lot of encoding headaches and confusion stemming from their software.
Garbro was coded in C# which is a programming language intended for Microsoft Windows development using the .Net framework, so it shares some of the same bugs and quirks as notepad.exe.
Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.
So we can also use UTF-16-LE and utf_16le and they should work too. utf-16le is confusing since it is missing a delimiter between the endian format and the bit length, so let's use "utf-16-le" for clarity. Update the ScriptExtractInsertTool.py script so it looks like this and save it.
Code:
# write to file
with open(input.output, 'w',encoding='utf-16-le',newline='',errors='namereplace') as file:
    for line in outputfile_as_a_list:
        file.write(line + '\r\n')
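As a quick sanity check of the alias rules quoted above, this sketch asks Python which codec each spelling resolves to. The list of spellings is just the ones mentioned in this post.

```python
import codecs

# Different spellings that should all point at the same codec.
spellings = ['utf-16-le', 'UTF-16-LE', 'utf_16_le', 'utf_16le', 'UTF-16LE', 'utf-16le']

# codecs.lookup() raises LookupError for an unknown name, so reaching the
# print loop means every spelling was accepted.
resolved = {spelling: codecs.lookup(spelling).name for spelling in spellings}
for spelling, name in resolved.items():
    print(spelling.ljust(10), '->', name)

# The explicit little endian codec does not write a bom by itself.
print(';'.encode('utf-16-le'))
```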
To save the old code to make it clearer what was changed and to revert back to the original if something goes wrong, one way is to keep the original code as a comment. Python comments start with #. Leading whitespace is ignored for comments.
Code:
# write to file
#with open(input.output, 'w',encoding=ks_file_encoding,newline='',errors='namereplace') as file:
with open(input.output, 'w',encoding='utf-16-le',newline='',errors='namereplace') as file:
    for line in outputfile_as_a_list:
        file.write(line + '\r\n')
Hummm. So Notepad++ is displaying a lot of garbage null bytes but notepad.exe works fine. Here is what the resulting file looks like in wxhexeditor.
That is little endian utf-16 no bom without a doubt. utf-16 stores each code unit in 2 bytes, so 4 hex characters. Most characters fit in a single code unit, while characters outside the Basic Multilingual Plane, like most emoji, take two code units (4 bytes). In big endian, the most significant byte comes first, so the bytes 3b 00 or b'\x3b\x00' are read as 0x3b00 which means 㬀 in utf-16-be. In little endian, which is more common, the least significant byte comes first, so the bytes 3b 00 are read as 0x003b, as if they were swapped to b'\x00\x3b', which means ; in utf-16-le. Incidentally, \x3b in decimal is 59 which means ; in ascii which is the first character of the file's data.
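The same two bytes can be decoded both ways in Python to confirm this. This is only a minimal sketch using the first two bytes from the hex editor, plus one emoji to show a character that needs more than 2 bytes.

```python
# The first two bytes of the file as seen in the hex editor.
raw = bytes([0x3b, 0x00])

as_big_endian = raw.decode('utf-16-be')     # read as 0x3b00
as_little_endian = raw.decode('utf-16-le')  # read as 0x003b

print(hex(ord(as_big_endian)), as_big_endian)
print(hex(ord(as_little_endian)), as_little_endian)

# Characters outside the Basic Multilingual Plane need two 16-bit code units.
emoji_bytes = '\U0001F600'.encode('utf-16-le')
print(len(emoji_bytes), 'bytes for one emoji')
```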
If the BOM is missing, RFC 2781 recommends that big-endian (BE) encoding be assumed. In practice, due to Windows using little-endian (LE) order by default, many applications assume little-endian encoding. It is also reliable to detect endianness by looking for null bytes, on the assumption that characters less than U+0100 are very common. If more even bytes (starting at 0) are null, then it is big-endian.
[...]
Clause D98 of conformance (section 3.10) of the Unicode standard states, "The UTF-16 encoding scheme may or may not begin with a BOM. However, when there is no BOM, and in the absence of a higher-level protocol, the byte order of the UTF-16 encoding scheme is big-endian." Whether or not a higher-level protocol is in force is open to interpretation. Files local to a computer for which the native byte ordering is little-endian, for example, might be argued to be encoded as UTF-16LE implicitly. Therefore, the presumption of big-endian is widely ignored. The W3C/WHATWG encoding standard used in HTML5 specifies that content labelled either "utf-16" or "utf-16le" are to be interpreted as little-endian "to deal with deployed content".
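The null byte heuristic described in the quote can be sketched in a few lines. The function name guess_utf16_byte_order() is made up for this example, and this is only a rough guess that works when the data has plenty of characters below U+0100. Pure Japanese text has few or no null bytes, so the function may return None for it.

```python
def guess_utf16_byte_order(data):
    """Guess 'le' or 'be' for utf-16 data without a bom by counting null bytes.

    Returns None when the counts tie, meaning the heuristic cannot decide.
    """
    even_nulls = data[0::2].count(0)  # bytes at offsets 0, 2, 4, ...
    odd_nulls = data[1::2].count(0)   # bytes at offsets 1, 3, 5, ...
    if even_nulls > odd_nulls:
        return 'be'  # the high (null) byte comes first
    if odd_nulls > even_nulls:
        return 'le'  # the low byte comes first
    return None


print(guess_utf16_byte_order('; sample'.encode('utf-16-le')))
print(guess_utf16_byte_order('; sample'.encode('utf-16-be')))
print(guess_utf16_byte_order('テスト'.encode('utf-16-le')))
```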
And kirikiriZ itself just restarts when trying to read 00_01.ks encoded as utf-16-le no bom.
So what is wrong with Notepad++ and kirikiriZ? Well, one pattern is that both notepad.exe and vscode are written by Microsoft. It seems their software assumes utf-16 is always little endian unless a bom is present. The other important part is that utf-16 always needs a byte order determined, unlike utf-8, ascii, and shift-jis which have a well defined or no byte order and should not have a bom. Notepad++ does not even try to detect the byte order, and kirikiriZ's kcompact2 needs the bom or it will assume shift-jis encoding probably.
Since I was curious, opening utf-16-be no bom in Notepad++ and notepad.exe produced invalid output, but vscodium detected it properly. KirikiriZ produced a very fun looking blank error box with "OK" before crashing.
In other words, whenever using utf-16, we need to always add a bom or a lot of software breaks, even though the standards state that a bom is not required to produce valid utf-16 and the byte order can often be detected heuristically. To add it in text mode, write the bom character first with file.write('\ufeff'). The codec then stores it as the bytes \xff\xfe for utf-16-le or \xfe\xff for utf-16-be. Do not write '\ufffe' since that is a different, invalid character that produces the bom for the opposite byte order. Alternatively, open the file in binary mode and write the raw bytes from the codecs library, "codecs.BOM_UTF16_LE" or "codecs.BOM_UTF16_BE".
Since utf-16 should always be written with a bom for compatibility reasons, Python's "utf-16" codec adds the bom automatically when writing, so on this computer "utf-16" is effectively shorthand for "utf-16-le-bom".
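Here is a short check of how the bom character relates to the constants in the codecs library. In text mode the character to write is U+FEFF for both byte orders, and the codec decides which order the two bytes end up in.

```python
import codecs

BOM_CHARACTER = '\ufeff'  # U+FEFF, the same character for both byte orders

little = BOM_CHARACTER.encode('utf-16-le')
big = BOM_CHARACTER.encode('utf-16-be')
print(little, little == codecs.BOM_UTF16_LE)
print(big, big == codecs.BOM_UTF16_BE)

# The plain 'utf-16' codec adds a bom by itself, in the native byte order
# of the computer running the code (codecs.BOM_UTF16).
native = 'A'.encode('utf-16')
print(native[:2] == codecs.BOM_UTF16)
```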
Code:
with open(input.output, 'wt',encoding='utf-16',newline='',errors='namereplace') as file:
    for line in outputfile_as_a_list:
        file.write(line + '\r\n')
The above code produces this output.
That is definitely utf-16-le-bom based upon the presence of the \xff\xfe bom and every character being 2 bytes long.
However, that creates another problem. Python has a history of creating text files based upon the local computer's environment. That is why Windows line endings are hardcoded in the script as +'\r\n' together with newline='' instead of letting Python handle them automatically, since it would pick unix line endings \n on unix-like platforms. The "utf-16" codec has the same issue because it writes in the native byte order of the computer running it. Python created utf-16-le-bom on my local system since it is little endian, but what if the script.py tool is run on a big endian system and Python creates utf-16-be-bom instead?
This is how kirikiriZ handles utf-16-be-bom.
Not pretty. notepad.exe and Notepad++ can read utf-16-be-bom just fine, but kirikiriZ requires utf-16-le-bom, or at least it does when running on a local computer that uses little endian. What should we do if we wanted to create a specific variant of utf-16 such as utf-16-le-bom on a big endian computer or the reverse, create utf-16-be-bom on a little endian computer? Ideally, the code should make sure Python can't mess up the output by relying on the local computer's settings which are subject to change.
There is probably a better way, but here is one example of platform independent utf-16-le-bom in Python.
Code:
# write to file
import codecs
with open(input.output, 'wb') as file:
    file.write(codecs.BOM_UTF16_LE)
with open(input.output, 'at',encoding='utf-16-le',newline='',errors='namereplace') as file:
    for line in outputfile_as_a_list:
        file.write(line + '\r\n')
And big endian utf-16.
Code:
# write to file
import codecs
with open(input.output, 'wb') as file:
    file.write(codecs.BOM_UTF16_BE)
with open(input.output, 'at',encoding='utf-16-be',newline='',errors='namereplace') as file:
    for line in outputfile_as_a_list:
        file.write(line + '\r\n')
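One possible simplification, sketched here as an alternative and not what the tool currently does, is to open the file only once with the explicit 'utf-16-le' codec and write the bom character U+FEFF as the first thing. The write_utf16_le_bom() helper and the demo.ks file name are made up for this example.

```python
import codecs
import os
import tempfile


def write_utf16_le_bom(path, lines):
    """Write lines as utf-16-le with a bom and Windows line endings."""
    with open(path, 'w', encoding='utf-16-le', newline='',
              errors='namereplace') as file:
        file.write('\ufeff')  # stored as the bytes ff fe by the utf-16-le codec
        for line in lines:
            file.write(line + '\r\n')


# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, 'demo.ks')  # hypothetical file name
    write_utf16_le_bom(demo_path, ['; comment', 'テスト\u2014'])
    with open(demo_path, 'rb') as file:
        raw = file.read()

print(raw[:2] == codecs.BOM_UTF16_LE)
print(repr(raw.decode('utf-16')))
```

Because the byte order is named explicitly, the output should not depend on the byte order of the computer running the script. Swapping in 'utf-16-be' gives the big endian variant.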
Isn't working with utf-16 great? It supports so many languages!
Whenever using utf-16 with kirikiri, double check these two things.
- it must include a bom \xfe\xff or \xff\xfe to indicate endianness
- both the bom and the data must be in little endian byte order \xff\xfe meaning that bytes are read from right to left, at least on little endian computers
In other words, it must be utf-16-le-bom. In practice, almost all utf-16 for files and programs is actually utf-16-le-bom, like in Notepad++, because most PCs are little endian and most software that creates it will automatically include a bom, e.g. Python's open(file,encoding='utf-16') adds it automatically. In contrast, big endian is used a lot with networking protocols.
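Both items on the checklist can be verified by looking at the first two bytes of a file. This is a minimal sketch with made up helper names. It only inspects the bom and does not validate the rest of the data.

```python
import codecs


def describe_utf16_bom(data):
    """Classify the start of a file by its utf-16 byte order mark."""
    # Caveat: a utf-32-le bom (ff fe 00 00) also starts with ff fe.
    if data.startswith(codecs.BOM_UTF16_LE):
        return 'utf-16-le-bom'
    if data.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16-be-bom'
    return 'no utf-16 bom'


def describe_file(path):
    """Read only the first two bytes of a file and classify them."""
    with open(path, 'rb') as file:
        return describe_utf16_bom(file.read(2))


print(describe_utf16_bom(codecs.BOM_UTF16_LE + ';'.encode('utf-16-le')))
print(describe_utf16_bom(codecs.BOM_UTF16_BE + ';'.encode('utf-16-be')))
print(describe_utf16_bom(';'.encode('utf-16-le')))
```

Running describe_file() on every .ks file before packing Patch.xp3 would catch a script that was accidentally saved as something other than utf-16-le-bom.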
Now that we took the time to learn more about utf-16 handling than anyone ever wanted to know, the 48 .csv scripts should have finished processing by now and have Sugoi NMT translations.
Let's make a copy of extract.cmd. Rename it to "insert.cmd".
Code:
@echo off
set working_directory=%userprofile%/Desktop/Koakuma-chan no Yuuwaku
set tool=%working_directory%/tools/Koakuma-chanNoYuuwaku_ScriptExtractInsertTool.py
set names_list=%working_directory%/Koakuma-chan no Yuuwaku character names2.csv
set spreadsheets_directory=%working_directory%/MTL/csv
set output_directory=%working_directory%/MTL/Patch
for /f %%i in ('dir /b "%working_directory%\extracts\Yuu_extracts\*.ks"') do python "%tool%" insert "%working_directory%/extracts/Yuu_extracts/%%i" --character_names "%names_list%" --spreadsheet "%spreadsheets_directory%/%%i.csv" --output "%output_directory%/%%i" --column 5
Delete the .ks files under Koakuma-chan no Yuuwaku/MTL/Patch/*.ks and then run the script.
It should print out lots of warnings about word wrap not working right since some of the lines are too long and output the files to Koakuma-chan no Yuuwaku/MTL/Patch. The word wrap errors do not need to be fixed now, but it would be nice to fix them before publishing the final Patch.xp3 once the final translation is available in the spreadsheets.
The important part is to count the number of .ks files in MTL/Patch/*.ks and make sure there are 48 of them, the same number of scripts that have dialogue in them from the extracts/Yuu_extracts folder.
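Counting by hand works, but the comparison can also be scripted. This sketch assumes the naming used by insert.cmd, where each spreadsheet is called scriptname.ks.csv and each output file is called scriptname.ks. The missing_scripts() helper is made up for this example and the demonstration runs on temporary folders instead of the real project folders.

```python
import tempfile
from pathlib import Path


def missing_scripts(spreadsheets_dir, output_dir):
    """Return the .ks names that have a spreadsheet but no inserted output."""
    expected = {path.name[:-len('.csv')]
                for path in Path(spreadsheets_dir).glob('*.ks.csv')}
    produced = {path.name for path in Path(output_dir).glob('*.ks')}
    return sorted(expected - produced)


# Demonstration with temporary folders standing in for MTL/csv and MTL/Patch.
with tempfile.TemporaryDirectory() as csv_dir, \
        tempfile.TemporaryDirectory() as patch_dir:
    for name in ('00_01.ks.csv', '00_02.ks.csv'):
        Path(csv_dir, name).touch()
    Path(patch_dir, '00_01.ks').touch()
    demo_missing = missing_scripts(csv_dir, patch_dir)

print(demo_missing)
```

An empty list means every spreadsheet produced a script.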
Paths in Windows can almost always use either / or \. Windows defaults to \ but / also works transparently. Even mixing the syntax works fine.
Code:
dir "C:/Users/User\Desktop/Koakuma-chan no Yuuwaku"
The exception is inside the () in the for loop in batch. Those have to be \. The batch language has a lot of quirks. Batch transparently accepting / unless inside of a for loop's () is one of them apparently.
Is that em dash displaying correctly, in either shift-jis or utf-16-le-bom, not the most glorious thing you have ever seen?
Now we can set the skip setting to "on" in Koakuma-chan's config menu and let the game run to see if it crashes.
While that is running, let's work on automating the translations for ConfigWindow.tjs since that has the bulk of the engine strings that we need to translate. Does it make sense to automate this? Not really, but the alternative is copy/pasting each one. Working on automating the extracting and reinserting of random strings is more interesting, so I did that instead. Sometimes I automate stuff anyway for whatever reason, so now there is a script for it. Download codeberg/translation_tools/games/Koakuma-chan no Yuuwaku/extract_hints.py.
Code:
python extract_hints.py --help
Usage: extract_hints.py filename [-h]
Extract and insert strings from files if those strings are are surrounded in
quotes and have 'hint' and a few other keywords in the line. Output is read and
written from filename.csv. The third column is reserved for the translation.
This is meant to be run on ConfigWindow.tjs
Positional arguments:
mode must be 'extract' or 'insert'
filename the file to parse
Options:
-h, --help show this help message and exit
Run it on ConfigWindow.tjs.
Code:
C:\Users\User>python "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools\extract_hints.py" extract "C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\extracts\Yuu_extracts\system\configwindow.tjs"
number of double quotes missmatch at line 317 : 8
number of double quotes missmatch at line 342 : 6
Those printed out line numbers probably have translatable text but were complicated to parse, so no extraction was done for those lines. Indexes start at 0 in python, so line 317 in python is actually 318 in Notepad++ because N++ line numbers start at 1, not 0. Here is 317.
Nothin' here. Recalling earlier about not all game engine strings being translatable, it is probably not a good idea to translate every extracted string. However, for automation purposes, we can translate everything and delete whatever should not have a translation.
Just like before, add the romaji and then the translation. One minor issue is that I wrote extract_hints.py pretty fast, so it does not consider spreadsheet headers even though the other scripts do. So, open up ConfigWindow.tjs.csv and add a line at the top like "data,line," before running these next commands.
Code:
cd "Desktop\Koakuma-chan no Yuuwaku\tools"
python romaji.py ..\extracts\Yuu_extracts\system\ConfigWindow.tjs.csv -s
python sugoi_mtl.py ..\extracts\Yuu_extracts\system\ConfigWindow.tjs.csv -s
[config.tjs.csv.automation1.png - maxed image limit reached- see the top of the next post for this picture ]
It looks like my column detection code messed up, so the columns are in the wrong slots. I will need to fix that later. In addition, the columns are also in the wrong order for extract_hints.py since that tool needs the translated strings in column C.
To change the order of columns in Libre Office Calc, do the following.
1. Left click on the letter of the column that should be moved. The entire column should now be selected.
2. Hold the left Alt button on the keyboard.
3. While holding the left Alt button, left click and hold on any cell in the selected column.
4. Drag the column to the correct slot as indicated by the vertical line on the left.
5. Let go of the left mouse button.
6. Let go of the alt button on the keyboard.
Using the above procedure, put the columns in order of data, line, sugoi_mtl, cutlet_hepburn. Then scan the sugoi_mtl column for anything that should not be translated or was improperly translated. Improperly translated here means that it contains invalid symbols that will crash the game. We can deal with translation quality later.
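For anyone who would rather not drag columns around in Calc, the same reordering can be sketched with Python's csv module. The reorder_columns() helper and the sample row are made up for illustration. It relies on the header row being present, so run something like this before deleting the header.

```python
import csv
import io


def reorder_columns(rows, wanted_order):
    """Return rows with their columns rearranged to match wanted_order.

    rows[0] must be the header row. Short rows are padded with empty cells.
    """
    header = rows[0]
    indexes = [header.index(name) for name in wanted_order]
    return [[row[i] if i < len(row) else '' for i in indexes] for row in rows]


# Made up sample with the columns in the wrong order.
sample = 'data,cutlet_hepburn,line,sugoi_mtl\n音量,onryou,12,Volume\n'
rows = list(csv.reader(io.StringIO(sample)))
reordered = reorder_columns(rows, ['data', 'line', 'sugoi_mtl', 'cutlet_hepburn'])

output = io.StringIO()
csv.writer(output).writerows(reordered)
print(output.getvalue())
```

For a real file, read it with open(path, newline='', encoding='utf-8') and write the result back the same way.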
Spreadsheet row 76 has
Code:
ブラウザで『こあくまちゃんの誘惑っ!』の公式サイトを開きます
getting translated to
Code:
the browser, I open the official site forKoakuma-chan's Temptation"
The " at the end of the translation would result in an extra quotation mark after the quotation mark that closes the string once the text is inserted again "creating a string that looks like this"". The odds of tjs2 handling that without crashing are low enough to not be worth considering.
ブラウザでHTMLマニュアルを閲覧します getting translated to "I'm going to browse through the browser's holographic manual" should work normally. It does have two single quotes but they should display properly while inside of the double quotes. Same with (). Incidentally, that translation is so hilariously bad that I am leaving it alone.
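Scanning the column by eye works for a file this small, but the check for stray double quotes is easy to script. This is only a sketch with a made up helper name, and it assumes the first translation in the list corresponds to row 1. It flags the ASCII double quote only, not single quotes or parentheses.

```python
def find_unsafe_translations(translations):
    """Return (row_number, text) pairs for translations containing a double quote."""
    return [(row_number, text)
            for row_number, text in enumerate(translations, start=1)
            if '"' in text]


# The two machine translations discussed above.
samples = [
    'the browser, I open the official site forKoakuma-chan\'s Temptation"',
    "I'm going to browse through the browser's holographic manual",
]

for row_number, text in find_unsafe_translations(samples):
    print('check row', row_number, ':', text)
```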
After fixing the quotation mark error, delete the header row, and save the file.
Code:
cd "Desktop\Koakuma-chan no Yuuwaku\tools"
python extract_hints.py insert ..\extracts\Yuu_extracts\system\ConfigWindow.tjs
wrote: ..\extracts\Yuu_extracts\system\ConfigWindow.tjs.translated.tjs
Now let's move ConfigWindow.tjs.translated.tjs to Koakuma-chan no Yuuwaku/MTL/Patch/ConfigWindow.tjs and create Patch.xp3 again.
During a quality control pass, I later found spreadsheet row 71, ConfigWindow.tjs line 314 "お風呂エコーボイスを再生します" was extracted for translation but I could not find it in the UI, so I removed the translation for it. The same happened for spreadsheet lines 84, 87, and 90. The less that gets translated, the lower the potential for crashes.
Here is config.tjs.csv.automation1.png from the previous post.
Part I) Automating OCR
Automating image translation really just means automating and improving the accuracy of ocr since further automation beyond that is currently of very questionable quality.
Still, compared to text, translating images is very tedious, so anything that can help make it less tedious while still being accurate is welcome. Translating images requires
1. running ocr software on the image
2. translating based upon the ocr text
3. inserting the translation into the original image
Do note that I have not worked on translating images very much personally. There are definitely much better ways to extract images to run through ocr software than the Windows Snipping Tool. Once the pictures are identified and extracted from the large image, the rest is mostly automatic until step 3, which is where the challenge to automate really begins.
The current fully automated ocr workflows that I have seen that both ocr and replace text in images are not very good.
- Ocr errors are still very common, even with state of the art ocr software.
- Translating Japanese text requires context and small strings obtained from ocr have no text surrounding them which makes translation quality very poor. When combined with even a single ocr error, this makes the translated text nonsense half the time.
- Automatically replaced text in images often looks bad and distracting. Just leaving the original text and providing external translation guides can provide a better user experience than distracting images.
However, even if there are limits to what we can automate, there are still some improvements we can perform for each step to make our lives easier. For the first step of running ocr software, getting local ocr automation to work can increase the speed of getting back the results and local ocr also eliminates the need to upload images somewhere. With a good local model, the quality is also often better than many online ocr translators and enables running ocr on batches of images instead of one at a time.
As an example of a quality comparison, tesseract and yandex are pretty good ocr options for convenience, but they are almost never perfect.
Code:
ocr software, ocr result, translation
Tesseract, アイ‡Pツヲを省略賣る, Abbreviation of I‡P
yandex, アイエロッラを省略づる, Omitting Aierolla
manga_ocr, アイキャッチを省略する, Skip the eye-catching
When accuracy and automation are needed, manga_ocr provides a nice boost in accuracy compared to previous generation ocr technology. manga_ocr also works locally, so the internet is not needed after initial setup and first launch. manga_ocr uses a special image model together with PyTorch to produce excellent quality ocr.
The current model in use by manga_ocr is intended to work with special fonts and multiple lines commonly found in manga making it appropriate for use in game text where strange fonts are used. This model happens to be the same one Sugoi Toolkit uses.
Here are some decent quality manga_ocr alternatives.
- easy_ocr
- Microsoft's Text Extractor Powertoy
- Google lens
Although manga_ocr says it also supports clipboard and folder watch modes, I could not get those modes to work, so I wrote a small wrapper script instead that also works with directories to handle running ocr on batches of images.
1. Install rust using other installation methods. Consider using the msi available under "Standalone installers". Rust is a programming language that contains a compiler required to build some of the dependencies of manga_ocr.
Running the kha-white--manga-ocr-base model requires PyTorch. PyTorch can be a large download, especially the CUDA versions. To avoid having duplicate instances of PyTorch CUDA installed (5 GB+ per instance), either reuse the PyTorch instance in the CUDA Sugoi Repackage from earlier (Sugoi Repackage/apps/Python310/python.exe) or use a single global install of PyTorch instead of venv for all of your software that uses PyTorch. If you already have PyTorch installed in the Sugoi Repackage or globally, then do not install it again.
Check if it is installed with the following command.
Code:
python -m pip freeze
If it prints out something like the following, then it is already installed.
The software that runs the ocr model itself, which manga_ocr uses, is widely compatible with at least PyTorch 1.13+CPU and PyTorch 2.3+CUDA 11.8, so really, any PyTorch version should work. Specific versions of PyTorch may need specific versions of other libraries, especially numpy, safetensors, and transformers, when used alongside certain software. The smallest PyTorch version is the CPU version which is enough for our purposes. The CUDA/ROCm versions may make sense for running ocr on thousands of files.
If some PyTorch version is not already available, then this installs the latest CPU version.
numpy 2.x has quite a few behavior changes from 1.x, so it is better to use numpy 1.x until more developers update their software.
Recent versions of transformers are threatening to break compatibility with a very large number of existing models over pickling and 4.45 did not work for me due to a tokenizers issue. Until model developers update their models, the best thing to do is to use an older known working transformers version <4.45.0 like transformers==4.44.2.
If it gives an import error, then there is a problem. Otherwise, if it imported normally, nothing should print out to the screen and the dependencies should work now.
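As a rough alternative check, this sketch asks Python whether each dependency can be found at all without actually importing it. The missing_modules() helper is made up for this example. It only confirms that the packages are installed, not that their versions are compatible with each other, so a real import test is still the better confirmation.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The libraries mentioned in the steps above.
wanted = ['torch', 'numpy', 'transformers', 'manga_ocr']
print('missing:', missing_modules(wanted))
```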
7. Download translation_tools/ocr.py
Here is the basic manga_ocr usage syntax for single files.
The script I wrote uses the above code syntax to ocr images. Download it from codeberg at entai2965/translation_tools/ocr.py. Open the file on codeberg and then next to the History button at the top, click on the "Download file" icon.
Download it to Koakuma-chan no Yuuwaku/tools/ocr.py
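To give a rough idea of what a batch wrapper like this does, here is a simplified sketch. It is not the actual code of ocr.py. The ocr_folder() function, the list of image suffixes, and the fake recognizer are all made up for this example. With manga_ocr installed, the recognize argument could be a MangaOcr() object from "from manga_ocr import MangaOcr", which per the manga_ocr readme is called with an image path and returns the text.

```python
import tempfile
from pathlib import Path

# Assumed list of image types worth sending to ocr.
IMAGE_SUFFIXES = {'.png', '.jpg', '.jpeg', '.bmp', '.webp'}


def ocr_folder(folder, recognize):
    """Run recognize(path) on every image in folder.

    Returns a list of (file name, recognized text) pairs sorted by path.
    """
    results = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results.append((path.name, recognize(str(path))))
    return results


def fake_recognize(image_path):
    """Stand-in for a real ocr engine so the sketch runs without any model."""
    return 'text from ' + Path(image_path).name


# Demonstration on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ('b.png', 'a.PNG', 'notes.txt'):
        Path(folder, name).touch()
    demo_results = ocr_folder(folder, fake_recognize)

print(demo_results)
```

Passing the engine in as an argument means the model only has to be loaded once for the whole folder.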
8. (optional) Download the model
This is optional because if the model used by manga_ocr is not present where transformers expects it to be, then it will automatically download the model from HuggingFace on first use to ~/.cache/. Just be aware of this.
~ means home directory which is usually C:\Users\user.
The ~\.cache directory is the default transformers download location for the model for commit hash aa6573bd10b0d446cbf622e29c3e084914df9741. The commit hash will change as newer versions of the model get released, so update the model as newer versions get released.
In addition to the ~\.cache directory above which manga_ocr will use by default, ocr.py also looks for a folder called "models--kha-white--manga-ocr-base" inside one of the following locations.
If the model is not in the ~\.cache and not in any of those locations above, then it will be downloaded automatically to ~\.cache the first time manga_ocr runs.
The next part is to cut the image that has strings to translate, like the config menu, into lots of little pictures that individually only show text.
There are machine learning models that try to detect where the text is in a picture to automate this, similar in concept to what yandex does. The more images we have to cut, the more automation makes sense. The fewer, the less it makes sense since the automation is not 100% perfect.
I should work on automating this at some point. For our immediate purposes, we can just use Windows's Snipping Tool instead to manually capture any text that we want to ocr.
Once we have a folder of tiny images containing Japanese text, we can ocr them by using ocr.py.
Code:
C:\Users\User\Desktop\Koakuma-chan no Yuuwaku\tools>python ocr.py --help
Usage: ocr.py filename [-o] [-h]
Run optical character recognition (ocr) on images using
models--kha-white--manga-ocr-base
Positional arguments:
filename the image file or folder containing images to ocr
Options:
-h, --help show this help message and exit
-o write ocr result to file instead of stdout
Here is the syntax to ocr a single image and write the results to a text file.
Now we have successfully run batch ocr software. The next step is to translate everything. We can use the existing romaji.py and sugoi_mtl.py scripts.
Both of those scripts are expecting .csv output which means I should make ocr.py produce a .csv instead of a .txt in a future update. For now, convert the .txt to .csv manually. A good delimiter with this early version of ocr.py is a blank space. An alternative delimiter is :.
Open it in Libre Office Calc with the settings above. Add a header, move the ocr result column to A, and then export it as a .csv with comma as a delimiter. To select the export options, make sure "Edit filter settings" is checked while exporting as .csv and export it as utf-8.
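The manual conversion can also be sketched in Python. The exact layout of the .txt file is an assumption here: each line is taken to be an image name, then the delimiter, then the ocr text, and the text is assumed not to contain the delimiter itself. The ocr_lines_to_rows() helper and the header names are made up for this example, so adjust them to whatever romaji.py and sugoi_mtl.py expect.

```python
import csv
import io


def ocr_lines_to_rows(lines, delimiter=' '):
    """Convert 'image<delimiter>text' lines to rows with the ocr text in column A."""
    rows = [['ocr_text', 'image']]  # hypothetical header names
    for line in lines:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue  # skip blank lines
        # Split at the last delimiter so paths like C:\... survive when using ':'.
        image, _, text = line.rpartition(delimiter)
        rows.append([text.strip(), image.strip()])
    return rows


# Made up sample lines standing in for the .txt written by ocr.py.
sample_lines = ['01.png アイキャッチを省略する\n', '\n', '02.png 音量\n']
rows = ocr_lines_to_rows(sample_lines)

output = io.StringIO()
csv.writer(output).writerows(rows)
print(output.getvalue())
```

To write a real file, open it with open(path, 'w', newline='', encoding='utf-8') and pass that to csv.writer() instead of the StringIO object.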
With the .csv in the correct format, run romaji.py and then sugoi_mtl.py
Code:
cd "Desktop\Koakuma-chan no Yuuwaku\tools"
python romaji.py "C:\Users\Public\Downloads\config_screen_sounds\output.ocr.csv" -o
python sugoi_mtl.py "C:\Users\Public\Downloads\config_screen_sounds\output.ocr.csv" -o
That produces an output.ocr.csv with these contents. Remember that every comma is a new column.
The final step for image translations is to insert the translated text into the source images. I do not have any fancy scripts to share for this last part of image editing. If I ever do, they will be on codeberg. As partial solutions,
- Photoshop macros exist.
- Krita used to support macros, but they were removed. Krita automation has since moved to "actions" which are defined using Python scripts that implement PyQt5.
- https://imagemagick.org/
- Just use Python again and write some code with some hardcoded settings to handle your particular dataset.