My tailor-made suite for studying Japanese.
Download submodules as well:
git clone --recurse-submodules https://github.com/makuto/japanese-for-me
Jam is used to build Japanese for Me. You will need a C++ toolchain (clang or gcc) installed as well.
sudo apt install jam
Install LibNotify development headers:
sudo apt install libnotify-dev
Download EDICT2 and extract to
Run the following command to convert EDICT2 to UTF-8:
iconv -f EUC-JP -t UTF-8 data/edict2 > data/utf8Edict2
TODO: Make this automatic via libiconv.
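Until that TODO is done, the same conversion can be sketched in Python instead of calling iconv by hand. This is only an illustration, not part of the repository: the function name `convert_edict_to_utf8` is hypothetical, and the paths in the usage note simply mirror the iconv command above.

```python
from pathlib import Path


def convert_edict_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode an EUC-JP text file as UTF-8 (hypothetical helper)."""
    # Work on raw bytes so line endings are passed through untouched.
    # Decoding is strict by default, so invalid EUC-JP input raises
    # UnicodeDecodeError instead of silently producing garbage.
    text = src.read_bytes().decode("euc_jp")
    dst.write_bytes(text.encode("utf-8"))
```

Calling `convert_edict_to_utf8(Path("data/edict2"), Path("data/utf8Edict2"))` from the repository root should produce the same file as the iconv command.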
Download the Japanese sentences, then the English sentences, then the links. Download all as Detailed so that CC attribution can be upheld.
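To show why the Detailed downloads matter, here is a rough sketch of joining the three files while keeping each sentence's author for attribution. It is not how Japanese for Me reads the corpus. The column layout is an assumption (detailed sentences as tab-separated id, language, text, username; links as sentence id, translation id), and `read_tsv` and `pair_sentences` are hypothetical names.

```python
import csv
import io


def read_tsv(text: str) -> list[list[str]]:
    """Split tab-separated text into rows, treating quote characters literally."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))


def pair_sentences(jpn_rows, eng_rows, link_rows):
    """Join Japanese sentences to English translations, keeping both authors.

    Assumed columns (not confirmed by this ReadMe):
      sentence rows: id, language, text, username, ...
      link rows:     sentence id, translation id
    """
    jpn_by_id = {row[0]: row for row in jpn_rows if len(row) >= 4}
    eng_by_id = {row[0]: row for row in eng_rows if len(row) >= 4}
    pairs = []
    for row in link_rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        source_id, translation_id = row[0], row[1]
        if source_id in jpn_by_id and translation_id in eng_by_id:
            jpn, eng = jpn_by_id[source_id], eng_by_id[translation_id]
            pairs.append({
                "japanese": jpn[2], "japanese_author": jpn[3],
                "english": eng[2], "english_author": eng[3],
            })
    return pairs
```

Without the username column from the Detailed export, the `*_author` fields could not be filled in, and per-sentence attribution would be lost.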
(for Anki pacer only; not necessary for text analysis)
Anki is used to manage spaced repetition.
Build CURL and Mecab:
Build Japanese for Me:
If you want system-wide notifications, run this instead:
This script automatically detects Japanese articles based on whether there are any CJK characters in the article title. It then collates them into an .epub for offline reading (thanks to this article for the idea). Wallabag is used for article gathering and Calibre is used for conversion.
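The title check described above can be sketched roughly as follows. This is an illustration rather than the code in the recipe: the code point ranges, the `has_cjk` and `select_japanese_articles` names, and the assumption that each article is a dict with a `"title"` key are all mine.

```python
# Code point ranges counted as "CJK" in this sketch (an assumption;
# the actual recipe may check a different set of ranges).
CJK_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
)


def has_cjk(title: str) -> bool:
    """Return True if any character of the title falls in a CJK range."""
    return any(low <= ord(ch) <= high for ch in title for low, high in CJK_RANGES)


def select_japanese_articles(articles):
    """Keep only the articles whose title contains at least one CJK character."""
    return [article for article in articles if has_cjk(article["title"])]
```

A check like this cannot tell Japanese from Chinese when a title consists only of ideographs, which is acceptable if the reading list contains no Chinese articles.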
Install Calibre (sudo apt install calibre on Linux; on Windows, make sure you install it and add it to your path)
Open your Wallabag account
Click on your Profile -> Config
Create a token and copy the token string
Paste the token string into src/Calibre_Wallabag_To_EPUB.recipe at the URL line, along with your username
Run the corresponding DownloadCalibreArticles*.sh script to create a .epub in the same directory with your articles
Read that with an EPUB reader that has good Japanese dictionary support, e.g. Typhon
This script is somewhat ridiculous. It takes a video and .vtt subtitles file and converts it to an .epub file with images for each subtitle. This makes it possible to "read" a TV episode on an E-Ink tablet, for example.
I made this because I'm not good enough at listening to Japanese at full speed. Now, I can "read the episode" first, then watch it later.
The full pipeline requires both ffmpeg and pandoc:
sudo apt install ffmpeg pandoc
And here's how to run it:
# Run the script
python3 VideoToEPUB.py MyVideo.mp4 MyVideo.ja.vtt output/
# Convert to EPUB
pandoc -f org -t epub output/MyVid.org -o ~/Documents/MyVid.epub
Note that this should work for any language, so long as the subtitles file is in .vtt format. Any video format supported by ffmpeg should work.
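For a feel of what such a pipeline involves, here is a minimal sketch of the three steps: read cue timings from the .vtt file, build an ffmpeg command that grabs one frame per cue, and write Org text that pandoc can turn into an EPUB. It is not the code in VideoToEPUB.py. The function names, the choice of the cue start time for the frame, and the image file naming are all assumptions.

```python
import re

# Matches a WebVTT cue timing line such as "00:01:02.500 --> 00:01:04.000".
# The hours field is optional, and cue settings may follow the end time.
CUE_TIMING = re.compile(
    r"^((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def parse_vtt_cues(vtt_text: str):
    """Return a (start, end, text) tuple for each cue in a .vtt document."""
    cues = []
    # Blocks in a .vtt file are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n")):
        lines = block.strip().split("\n")
        for index, line in enumerate(lines):
            match = CUE_TIMING.match(line)
            if match:
                # Everything after the timing line is the subtitle text.
                text = "\n".join(lines[index + 1:])
                cues.append((match.group(1), match.group(2), text))
                break
    return cues


def frame_command(video_path: str, timestamp: str, image_path: str):
    """Build an ffmpeg argument list that saves one frame at `timestamp`."""
    return ["ffmpeg", "-y", "-ss", timestamp, "-i", video_path,
            "-frames:v", "1", image_path]


def build_org(cues, image_dir: str) -> str:
    """Emit Org text with one image link followed by the text of each cue."""
    parts = []
    for number, (_start, _end, text) in enumerate(cues):
        parts.append(f"[[file:{image_dir}/{number:05d}.jpg]]\n\n{text}\n")
    return "\n".join(parts)
```

Each list returned by `frame_command` can be passed to `subprocess.run`, and the string from `build_org` can be saved as an .org file for the pandoc step. Blocks without a timing line, such as the WEBVTT header, are skipped.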
The repository itself is under the MIT license.
Tatoeba corpus licensing details are available here. Licenses vary per sentence, so I will assume attribution is required for every sentence (the most restrictive license they have).