easyjasub

easyjasub is a tool that helps people with a little knowledge of Japanese understand Japanese subtitles, as an aid to language learning. It produces subtitles with furigana and in-line translation, in several formats suitable for both video rendering and self-study.

It is designed to work as a command-line utility or as a Java library that can be integrated into other applications, such as a custom UI.

easyjasub is still in the alpha development stage: you may find defects, and it may be difficult to use. Do not hesitate to ask for help by opening a new ticket.

Visit the project blog to get support and to discuss the usage and development of easyjasub.

The automatic parsing of Japanese is not always accurate, so you may see incorrect words in the subtitles. This is unfortunately unavoidable.

Links

Release packages and support pages are available on SourceForge.net:

Source code is available on GitHub:

If you are a developer and want to use easyjasub as part of your application, it is deployed on oss.sonatype.org, where you can also find all releases and development snapshots.

Setup

This application requires third-party software to run. Please make sure you install:

Subtitles are rendered as HTML pages. Below you can see an example text:

まほー せかい
そこ 魔法 世界

Additional software may be required: you may need to edit or produce text-based subtitles, and to play your video with an external subtitle file.

Basic usage

  1. As input, you need to provide Japanese subtitles and, optionally, translated subtitles in text format; both must be in sync with your video. Supported input formats are EBU's STL, .SCC, .ASS/.SSA, .SRT and TTML; .ASS is preferred. Make sure each subtitle file name has an extension matching one of the supported formats above, and copy the files into a new, empty directory. Note that picture-based subtitles (normally found on DVDs) cannot be used as input.
  2. The generated subtitles use the timing of the Japanese subtitles. Lines of the secondary subtitles may be repeated across multiple lines if their timings do not match perfectly (this is fine, since sentences are normally split or joined in translation). Synchronization of the input subtitles is very important for associating them properly: make sure both are in sync with the video and, if they are not, edit them (try Aegisub). easyjasub has been tested only with UTF-8 encoding, so you may face issues if the subtitle files are not in UTF-8; in that case, edit the subtitles and save them with UTF-8 encoding.
  3. Download the easyjasub release package (choose .tar by default, or .zip if you use Windows) and unpack it in the same directory.
  4. Open a terminal or a command prompt (on Windows, run "cmd") and run:
    java -jar easyjasub-cmd.jar -ja subtitle.ja.srt -tr subtitle.en.srt
    Where subtitle.ja.srt is the filename for Japanese subtitles and subtitle.en.srt is the filename for translated subtitles. On Windows you can use the executable file:
    easyjasub.exe -ja subtitle.ja.srt -tr subtitle.en.srt
    If you do not have translated subtitles, or you do not want to use them, you can exclude them with the "disabled" keyword:
    java -jar easyjasub-cmd.jar -ja subtitle.ja.srt -tr disabled
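
Step 2 above notes that only UTF-8 input has been tested. The following is a minimal sketch, not part of easyjasub, of how you might check a subtitle file and convert it with the standard iconv utility before running the tool. The function names and file names are hypothetical, and the source encoding is something you have to know or guess yourself (Shift_JIS is a common one for Japanese text).

```shell
#!/bin/sh
# is_utf8 FILE: succeeds if FILE decodes cleanly as UTF-8.
# iconv fails with a non-zero status when it meets an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# to_utf8 SOURCE_ENCODING INPUT OUTPUT: writes a UTF-8 copy of INPUT to OUTPUT.
# SOURCE_ENCODING is not detected automatically; you must supply it.
to_utf8() {
    iconv -f "$1" -t UTF-8 "$2" > "$3"
}

# Example (placeholder file names, assumed Shift_JIS source):
#   is_utf8 subtitle.ja.srt || to_utf8 SHIFT_JIS subtitle.ja.srt subtitle.ja.utf8.srt
```

Keeping the original file and writing the converted copy under a new name (still with a supported extension) makes it easy to retry with a different source encoding if the result looks wrong.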

A new folder is created to store output and intermediate files, with the exception of SUB/IDX subtitles, which by default are produced in the same folder as the input files. Files produced by default are:

Subtitles look like this: Sample easyjasub subtitle picture

If something fails or you want to try again with different options, be sure to delete all files that easyjasub produced. The tool does not overwrite existing files but reuses them.

wkhtmltoimage is searched for in its default installation folder. If it is not found, you can use the -wk option to specify the full path to the executable:

java -jar easyjasub-cmd.jar -ja subtitle.ja.srt -tr subtitle.en.srt -wk /usr/local/bin/wkhtmltoimage
easyjasub.exe -ja subtitle.ja.srt -tr subtitle.en.srt -wk D:\wkhtmltopdf\bin\wkhtmltoimage.exe
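
If wkhtmltoimage is installed somewhere on your PATH rather than in its default folder, a small wrapper can look it up and pass it with -wk. This is only a sketch: the function name build_cmd is hypothetical, and it prints the command line for inspection instead of running it.

```shell
#!/bin/sh
# build_cmd JA_FILE TR_FILE: prints an easyjasub command line, adding
# "-wk <path>" only when wkhtmltoimage can be found on PATH.
build_cmd() {
    ja=$1
    tr=$2
    set -- java -jar easyjasub-cmd.jar -ja "$ja" -tr "$tr"
    # command -v prints the full path of an executable found on PATH
    if wk=$(command -v wkhtmltoimage); then
        set -- "$@" -wk "$wk"
    fi
    # Joined with spaces for display only; quoting is not preserved.
    printf '%s\n' "$*"
}

# Example (placeholder file names):
#   build_cmd subtitle.ja.srt subtitle.en.srt
```

When the lookup fails, the command is printed without -wk, so easyjasub falls back to searching the default installation folder as described above.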

You can customize the elements displayed in the subtitles: for example, you can show romaji, hide the translation, or even show just the hiragana rewriting while hiding the kanji. To do so, use the following options:

Here is a sample without translation and furigana but with romaji: Sample easyjasub subtitle picture

And here is a sample without kanji: Sample easyjasub subtitle picture

Using JMdict

easyjasub can use a JMdict (http://www.edrdg.org) file to add a rough in-line translation of Japanese words. You can enable dictionary usage with the following options:

You need to download a JMdict file to your system. Only English is supported for the dictionary translation.

Advanced usage

easyjasub has many options to tune subtitle generation; run it with the -h option to view them. Many options, if left unspecified, get a default value that depends on the other options you have set.

Set the width and height of subtitles with the --width and --height options to best fit your screen or to match the video:

easyjasub <opts> --width 1024 --height 768

Try subtitle generation on a subset of the subtitle lines with --select-lines, for example to process lines up to line 10:

easyjasub <opts> --select-lines -10

Or you can select an interval:

easyjasub <opts> --select-lines 5-20

Nearly all options accept the special value "disabled", which suppresses that feature; for example, to skip producing the text file with transcribed subtitles:

easyjasub <opts> --output-text disabled

Or you can disable the usage of wkhtmltoimage even if it is found on your system:

easyjasub <opts> --wkhtmltoimage disabled

You can fine-tune the matching of subtitle lines by reducing or increasing the difference in milliseconds that you accept between Japanese and translated timestamps:

easyjasub <opts> --match-diff 500 --approx-diff 400

Note that the matching algorithm is currently not very sophisticated.
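
To get a feel for what a tolerance in milliseconds means for two subtitle timestamps, here is a small illustrative sketch. It is not easyjasub's matching algorithm, and it does not model the difference between --match-diff and --approx-diff; the function names are hypothetical and the timestamps use the SRT form HH:MM:SS,mmm.

```shell
#!/bin/sh
# srt_to_ms HH:MM:SS,mmm: prints the timestamp as a number of milliseconds.
# awk is used so that fields with leading zeros (e.g. "08") are read as decimal.
srt_to_ms() {
    printf '%s\n' "$1" | awk -F'[:,]' '{ print (($1 * 60 + $2) * 60 + $3) * 1000 + $4 }'
}

# within_diff TS_A TS_B MAX_MS: succeeds if the two timestamps are at most
# MAX_MS milliseconds apart, in either direction.
within_diff() {
    a=$(srt_to_ms "$1")
    b=$(srt_to_ms "$2")
    d=$((a - b))
    if [ "$d" -lt 0 ]; then
        d=$((0 - d))
    fi
    [ "$d" -le "$3" ]
}

# Example: these two start times are 400 ms apart, so a tolerance of 500
# accepts them and a tolerance of 300 does not.
#   within_diff 00:00:10,000 00:00:10,400 500
```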

easyjasub performs the following actions:

  1. Reads Japanese subtitles file
  2. Converts Japanese subtitles to plain text format
  3. Writes a text file with subtitles
  4. Reads the translated subtitles file (the subtitleConverter library is used to read the text subtitles)
  5. Parses the Japanese text (the Kuromoji analyzer of Apache Lucene is used)
  6. Creates the output folder to store files
  7. Writes a CSS file to style subtitles
  8. Writes a .JGLOSS file with furigana annotation
  9. Writes an HTML page with furigana annotation and vocab
  10. Writes one HTML file for each subtitle line
  11. Renders HTML files to PNG pictures
  12. Writes the BDN XML file
  13. Converts the BDN XML file to a SUB/IDX subtitles file

easyjasub never overwrites an existing file, so make sure you manually delete the generated files if you need to run the tool again. You can exploit this behavior to customize some of the files by hand: run the tool, edit the files you want to customize, delete all the others, and run the tool again. This is useful, for example, to edit the CSS file and change the text style of the generated subtitles.
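
As a sketch of the "edit some files, delete all others" workflow for the CSS case, the helper below removes every regular file under a folder except CSS files. The function name is hypothetical and the folder you pass is whatever output folder easyjasub created for your run; remember that SUB/IDX files are written next to the input files by default, so they have to be deleted separately.

```shell
#!/bin/sh
# keep_only_css DIR: deletes every regular file under DIR except *.css files,
# so that a later run regenerates everything else around the edited CSS.
keep_only_css() {
    # Refuse to run on something that is not an existing directory.
    [ -d "$1" ] || return 1
    find "$1" -type f ! -name '*.css' -exec rm -f {} +
}

# Example (the folder name is a placeholder):
#   keep_only_css ./my-output-folder
```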

Contributing

This software is in an early alpha stage. If you find problems or need help, do not hesitate to open a new ticket. If a particular subtitle line is inaccurate, attach the .html file produced for it.

If you find the tool interesting and would like to extend it, go to the easyjasub project page on GitHub and join!
