My tailor-made suite for studying Japanese
Macoy Madson a32944bc6e Added video to epub converter 4 months ago
Dependencies Added parallel-hashmap 6 months ago
src Added video to epub converter 4 months ago
.clang-format Added C++ version using RapidJSON and CURL 7 months ago
.gitignore Added C++ version using RapidJSON and CURL 7 months ago
.gitmodules Added parallel-hashmap 6 months ago
AnkiInterface.py Got due cards working 8 months ago
BuildDependencies_Debug.sh Added mecab and mecab test 6 months ago
Build_Debug.sh Updated readme and added build scripts 7 months ago
Build_Release.sh Updated readme and added build scripts 7 months ago
Build_WithLibNotify_Debug.sh Added system notifications and block notifications 7 months ago
DownloadCalibreArticles.sh Fixed Calibre download recipe 5 months ago
DownloadCalibreArticles_Debug.sh Fixed Calibre download recipe 5 months ago
Jamfile Added another Mecab test function 6 months ago
Jamrules Added dumb dictionary, Calibre Wallabag rule 5 months ago
LICENSE Initial commit 9 months ago
ReadMe.org Added video to epub converter 4 months ago

ReadMe.org

Japanese For Me

My tailor-made suite for studying Japanese.

Setup

Clone Repository

Download submodules as well:

git clone --recurse-submodules https://github.com/makuto/japanese-for-me

Install Dependencies

Jam is used to build Japanese for Me. You will need a C++ toolchain (clang or gcc) installed as well.

sudo apt install jam

For system notifications

Install LibNotify development headers:

sudo apt install libnotify-dev

Download and prepare data

Dictionary

Download EDICT2 and extract to data/edict2. Run the following command to convert EDICT2 to UTF-8:

iconv -f EUC-JP -t UTF-8 data/edict2 > data/utf8Edict2

TODO: Make this automatic via libiconv.
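Until the conversion is automated, the same re-encoding can be sketched in Python with only the standard library. This is a minimal sketch, not part of the repository: the function name convert_edict2_to_utf8 is hypothetical, and it assumes Python's built-in euc_jp codec accepts every byte sequence in your copy of EDICT2.

```python
from pathlib import Path


def convert_edict2_to_utf8(source: Path, destination: Path) -> int:
    """Re-encode an EUC-JP text file as UTF-8 and return the number of lines written.

    Hypothetical helper mirroring: iconv -f EUC-JP -t UTF-8 source > destination
    """
    line_count = 0
    # newline="" keeps the original line endings untouched on both sides.
    # Decoding is strict by default, so invalid bytes raise UnicodeDecodeError
    # instead of being silently dropped.
    with source.open("r", encoding="euc_jp", newline="") as infile, \
            destination.open("w", encoding="utf-8", newline="") as outfile:
        for line in infile:
            outfile.write(line)
            line_count += 1
    return line_count
```

Calling convert_edict2_to_utf8(Path("data/edict2"), Path("data/utf8Edict2")) should produce the same output file as the iconv command above.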

Tatoeba downloads (not required)

Download the Japanese sentences, then the English sentences, then the links. Download all as Detailed so that CC attribution can be upheld.

See the License section for Tatoeba licensing details.
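To illustrate why the Detailed downloads matter, here is a rough sketch of pairing Japanese sentences with their English translations while keeping each contributor's username for attribution. It is not code from this repository. It assumes the detailed exports are tab-separated with the columns id, language, text, username (followed by dates), and that the links file holds sentence id and translation id pairs. All function and type names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str
    text: str
    username: str  # kept so each sentence can be attributed


def read_tsv(path: str) -> Iterator[List[str]]:
    """Yield rows from a tab-separated file without treating quotes specially."""
    with open(path, encoding="utf-8", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def read_sentences(rows: Iterable[List[str]]) -> Dict[int, Sentence]:
    """Index rows of the assumed form (id, lang, text, username, ...) by id."""
    sentences = {}
    for row in rows:
        if len(row) < 4:
            continue  # skip rows that do not match the assumed layout
        sentences[int(row[0])] = Sentence(int(row[0]), row[1], row[2], row[3])
    return sentences


def pair_translations(
    japanese: Dict[int, Sentence],
    english: Dict[int, Sentence],
    links: Iterable[List[str]],
) -> List[Tuple[Sentence, Sentence]]:
    """Return (Japanese, English) pairs for links that go from Japanese to English."""
    pairs = []
    for row in links:
        if len(row) < 2:
            continue
        source_id, translation_id = int(row[0]), int(row[1])
        if source_id in japanese and translation_id in english:
            pairs.append((japanese[source_id], english[translation_id]))
    return pairs
```

Each returned pair carries both usernames, which is the information needed to credit sentences individually.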

Prepare Anki

(for Anki pacer only; not necessary for text analysis)

Anki is used to manage spaced repetition.

Building

Build CURL and Mecab:

./BuildDependencies_Debug.sh

Build Japanese for Me:

./Build_Debug.sh

If you want system-wide notifications, run this instead:

./Build_WithLibNotify_Debug.sh

Using the Calibre Wallabag download script

This script automatically detects Japanese articles based on whether there are any CJK characters in the article title. It then collates them into an .epub for offline reading (thanks to this article for the idea). Wallabag is used for article gathering and Calibre is used for conversion.

  • Install Calibre (sudo apt install calibre on Linux; on Windows, install it and make sure it is added to your PATH)

  • Open your Wallabag account

  • Click on your Profile -> Config

  • Open Feeds

  • Create a token and copy the token string

  • Paste the token string into src/Calibre_Wallabag_To_EPUB.recipe at the URL line, along with your username

  • Run the corresponding DownloadCalibreArticles*.sh script to create a .epub in the same directory with your articles

  • Read the .epub with an EPUB reader that has good Japanese dictionary support, e.g. Typhon
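The title check described at the top of this section can be sketched roughly as follows. This is an illustration only: the recipe's actual detection code may differ, and both the name contains_cjk and the choice of Unicode ranges are assumptions.

```python
# Unicode blocks treated as "CJK" in this sketch (an assumption, not the recipe's list).
CJK_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def contains_cjk(title: str) -> bool:
    """Return True if any character of the title falls inside one of CJK_RANGES."""
    return any(
        start <= ord(character) <= end
        for character in title
        for start, end in CJK_RANGES
    )
```

A check like this also matches Chinese titles, since kanji share code points with Chinese characters, so it works best when the non-Japanese articles in your Wallabag account use other scripts.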

Using the Video to EPUB script

This script is somewhat ridiculous. It takes a video and a .vtt subtitles file and converts them into an .epub file with an image for each subtitle. This makes it possible to "read" a TV episode on an E-Ink tablet, for example.

I made this because I'm not good enough at listening to Japanese at full speed. Now, I can "read the episode" first, then watch it later.

The full pipeline requires both ffmpeg and pandoc:

sudo apt install ffmpeg pandoc

And here's how to run it:

# Run the script
python3 VideoToEPUB.py MyVideo.mp4 MyVideo.ja.vtt output/
# Convert to EPUB
pandoc -f org -t epub output/MyVid.org -o ~/Documents/MyVid.epub

Note that this should work for any language, so long as the subtitles file is .vtt format. Any video format supported by ffmpeg should work.
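For a sense of what the first step of such a pipeline involves, here is a minimal sketch of reading cue timings from a .vtt file and building an ffmpeg command that grabs one frame per subtitle. It is not the code in VideoToEPUB.py: the names parse_vtt and frame_command are hypothetical, and taking the frame from the middle of each cue is an assumption.

```python
import re
from typing import List, NamedTuple

# Matches WebVTT cue timings such as "00:01:02.500 --> 00:01:04.000".
# The hours field is optional in WebVTT, so "01:02.500" is accepted too.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(hours, minutes, seconds, millis) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(vtt_text: str) -> List[Cue]:
    """Collect cues: a timing line followed by text lines up to the next blank line."""
    cues = []
    lines = vtt_text.splitlines()
    index = 0
    while index < len(lines):
        match = CUE_TIMING.match(lines[index].strip())
        index += 1
        if not match:
            continue  # header, cue identifier, note, or blank line
        text_lines = []
        while index < len(lines) and lines[index].strip():
            text_lines.append(lines[index].strip())
            index += 1
        groups = match.groups()
        cues.append(Cue(_to_seconds(*groups[:4]), _to_seconds(*groups[4:]), "\n".join(text_lines)))
    return cues


def frame_command(video_path: str, cue: Cue, image_path: str) -> List[str]:
    """Build an ffmpeg command that saves a single frame from the middle of the cue."""
    midpoint = (cue.start + cue.end) / 2
    return ["ffmpeg", "-y", "-ss", f"{midpoint:.3f}", "-i", video_path,
            "-frames:v", "1", image_path]
```

Each command list can be passed to subprocess.run, and the resulting images paired with the cue text in an intermediate document before handing it to pandoc.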

License

The repository itself is under the MIT license.

Tatoeba corpus licensing details are available here. Licenses vary per sentence, so I will assume attribution is required for every sentence (which is the most restrictive license they have).