jganitkevitch/joshua

Running the Joshua Decoder:
---------------------------

If you wish to run the complete machine translation pipeline, Joshua includes a
black-box implementation that enables the entire pipeline to be run by typing
a single restartable command.  See the documentation for a walkthrough and more
information about the many options available to the pipeline.

   - web:           http://joshua-decoder.org/5.0/pipeline.html 
   - local mirror:  ./joshua-decoder.org/5.0/pipeline.html

Manually Running the Joshua Decoder:
------------------------------------

To run the decoder, first set these environment variables:

    export JAVA_HOME=/path/to/java  # maybe /usr/java/home
    export JOSHUA=/path/to/joshua

You might also find it helpful to set these:

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
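
Before compiling, it can help to confirm that the two required variables are
set and point at real directories.  The function below is a minimal sketch of
such a check; the name check_joshua_env is hypothetical and is not part of
Joshua.

```shell
# Hypothetical helper (not shipped with Joshua): verify that the
# environment variables required above are set and name directories.
check_joshua_env() {
    status=0
    for name in JAVA_HOME JOSHUA; do
        # Look up the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory" >&2
            status=1
        fi
    done
    return $status
}
```

Running `check_joshua_env && echo ok` after the exports prints a message on
stderr for each variable that is missing or does not name a directory.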

Then, compile Joshua by typing:

    cd $JOSHUA
    ant devel
    ant all

The basic method for invoking the decoder looks like this:

    cat SOURCE | $JOSHUA/bin/decoder -c CONFIG > OUTPUT

You can test this using the sample configuration file and input, which can be
found in the examples/example/ directory.  For example, type:

    cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config

The decoder will load the language model and translation models defined in the
configuration file, and will then decode the five sentences in the example
file.

There are a variety of command line options that you can feed to Joshua.
For example, you can enable multithreaded decoding with the -threads N flag:

    cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config -threads 5

The configuration file defines many additional parameters, all of which can be
overridden on the command line by using the format -PARAMETER value.  For
example, to output the top 10 hypotheses instead of just the top 1 specified in
the configuration file, use -top-n N:

    cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config -top-n 10
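
The invocation pattern above can be wrapped in a small shell function that
reads a source file, writes the translations to an output file, and passes any
further options straight through to the decoder.  This is only a sketch: the
name decode_file is hypothetical, and the only decoder options it relies on
are -c and whatever overrides you supply yourself.

```shell
# Hypothetical wrapper (not part of Joshua).
# usage: decode_file SOURCE CONFIG OUTPUT [decoder options...]
decode_file() {
    if [ "$#" -lt 3 ]; then
        echo "usage: decode_file SOURCE CONFIG OUTPUT [options...]" >&2
        return 2
    fi
    src=$1
    config=$2
    out=$3
    shift 3
    if [ ! -r "$src" ]; then
        echo "cannot read source file: $src" >&2
        return 1
    fi
    # Remaining arguments (e.g. -threads 5 -top-n 10) override the config.
    "$JOSHUA/bin/decoder" -c "$config" "$@" < "$src" > "$out"
}
```

For example, assuming the current directory is $JOSHUA:

    decode_file examples/example/test.in examples/example/joshua.config output.txt -threads 5 -top-n 10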

Parameters, whether in the configuration file or on the command line, are
converted to a canonical internal representation that ignores hyphens,
underscores, and case.  So, for example, the following parameters are all
equivalent:

  {top-n, topN, top_n, TOP_N, t-o-p-N}
  {poplimit, pop-limit, pop_limit, popLimit}

and so on.  For an example of parameters, see the Joshua configuration file
template in $JOSHUA/scripts/training/templates/tune/joshua.config or the online
documentation at joshua-decoder.org/4.0/decoder.html.  There is a wealth of
information in the online documentation.
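
To illustrate the canonicalization rule described above, the following sketch
strips hyphens and underscores and lowercases the result.  It mimics the
stated behavior only; it is not Joshua's own code, and the function name
normalize_param is hypothetical.

```shell
# Hypothetical illustration of the rule: ignore hyphens, underscores,
# and case when comparing parameter names.
normalize_param() {
    # Delete '_' and '-', then map uppercase letters to lowercase.
    printf '%s\n' "$1" | tr -d '_-' | tr '[:upper:]' '[:lower:]'
}
```

Under this rule, every spelling in each set above maps to the same string, so
`normalize_param top-n` and `normalize_param TOP_N` print identical output.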

After you have successfully run the decoding example above, we recommend that
you take a look at the Joshua pipeline script, which allows you to do full 
end-to-end training of a translation model.  It is stored in

    $JOSHUA/examples

 
