CHANGES

5.3.0  2023-09-06  Jamie Norrish  <jamie@artefact.org.nz>

  * Fixed Report issue that prevented non-tacl assets from being
    copied to an output directory.

  * Updated docs build requirements.


5.2.0  2023-06-15  Jamie Norrish  <jamie@artefact.org.nz>

  * Fixed catalogue relabelling to not incorrectly delete relabelled
    works when mapped to an existing label that is not included in the
    relabel map.


5.1.0  2023-06-14  Jamie Norrish  <jamie@artefact.org.nz>

  * Modified results reduce to properly handle the same witness under
    different labels, namely to treat them as entirely distinct.

  * Changed from deprecated pkg_resources code to importlib.resources
    for handling package files.

  * Changed from deprecated Bio.pairwise2 to Bio.Align.

  * Added logging statement when about to output CSV results from tacl
    results.


5.0.6  2023-05-10  Jamie Norrish  <jamie@artefact.org.nz>

  * Reworked build configuration.

  * Updated corpus preparation for Taisho, both markup handling and
    further splits.


5.0.5  2022-09-24  Jamie Norrish  <jamie@artefact.org.nz>

  * Improved performance of denormalisation sufficiently to be useful
    in real cases.


5.0.4  2022-08-06  Jamie Norrish  <jamie@artefact.org.nz>

  * Updated templating to work with recent versions of Jinja2.

  * Changed DataFrame.append to pd.concat when adding rows to a
    DataFrame, to suit recent versions of pandas.

  * Added the name of the missing label to the error message when
    referencing a non-existent label in a catalogue.

  * Added split of T1646 into "pin" divs.

  * Updated build process.

  * Fixed unwanted inclusion of text material within apparatus
    criticus markers.

  * Added ability to specify a pattern of works to return from a Corpus.

  * Allowed catalogues not to be sorted when saved.


5.0.3  2022-03-31  Jamie Norrish  <jamie@artefact.org.nz>

  * Removed characters that are significant within regular expressions
    when preparing CBETA TEI texts. These are used within regular
    expression contexts and so are not safe.


5.0.2  2022-02-26  Jamie Norrish  <jamie@artefact.org.nz>

  * Rebuild wheel from clean.

  * No source changes.


5.0.1  2022-02-24  Jamie Norrish  <jamie@artefact.org.nz>

  * Fixed bug when preparing a CBETA TEI text that includes a tei:g
    within anchors demarcating an apparatus criticus entry, such that
    the tei:g was included after the tei:app.

  * Added --version to tacl command to print out the version number.

  * Bumped minimum Python version to 3.8.

  * Added split of T0328 into verse and prose sub-works.

  * Added raising an error when building a database from a catalogue
    when a referenced work is not in the corpus.

  * Removed all instances of characters that are illegal in Windows
    filenames when preparing CBETA TEI texts.


5.0.0  2021-09-27  Jamie Norrish  <jamie@artefact.org.nz>

  * Updated Corpus API and its use within DataStore to allow for
    user-provided Text subclasses when getting witnesses.

  * Changed some methods of Text and WitnessText to be properties:
    get_content is now content, get_names is now siglum and work.

  * Updated the process for generating a corpus from CBETA TEI XML
    files. Certain parts of texts are extracted into new texts based
    on the markup-defined properties (such as containing div/@type).

  * Added the tacl split command to split texts based on user-supplied
    configuration files.

  * Added the tacl join-works command to join two or more prepared TEI
    XML works together into a new work.

  * Added the tacl query command to run supplied SQL commands on the
    data store with supplied parameters.

  * Added mulu title to prepared filename for extracted div where one
    exists.

  * Improved updating of the database when adding n-grams: all
    witnesses that no longer exist in the corpus will be deleted.

  * Added handling of some errors in order to provide useful error
    messages.

  * Reimplemented tacl results' extend operation to be faster and use
    less RAM. Its determination of whether the initial results are
    intersect results or not (and thus whether the results are
    automatically run through reciprocal remove) has become
    stricter. Only those results with more than one label and where
    every n-gram occurs in every label now count as intersect results.

  * Added the tacl normalise command, to normalise a corpus according
    to a user-supplied mapping. This is an as yet unoptimised feature.

  * Added options to tacl results to denormalise a set of
    results. This is an as yet unoptimised feature.


4.2.0  2018-12-04  Jamie Norrish  <jamie@artefact.org.nz>

  * Added option to restrict Results prune requirements to a labelled
    subset of results.

  * Added a lifetime report, showing details about in which labelled
    corpora n-grams occurred in.

  * Moved the jitc command to the tacl-extra repository.

  * Renamed the tacl.command package to tacl.cli.


4.1.0  2018-10-18  Jamie Norrish  <jamie@artefact.org.nz>

  * Added remove_label method to Catalogue.

  * Added method to relable a catalogue.

  * Added --relabel option to tacl results, to relabel results
    according to a su pplied catalogue.

  * Added stripping of cb:mulu contents from prepared TEI.

  * Modified Results to also accept a pandas DataFrame as well as a
    path or buffer of results.

  * Added get_works_by_label method to Catalogue.

  * Modified tacl search to allow both for multiple n-gram files and
    no n-gram f ile, in which case all n-grams are returned.

  * Added get_works method to Corpus to return a list of work names.


4.0.3  2018-05-01  Jamie Norrish  <jamie@artefact.org.nz>

  * Removed obsolete reference to tacl-helper command in setup.py.


4.0.2  2018-04-21  Jamie Norrish  <jamie@artefact.org.nz>

  * Corrected release date for version 4.0.0.


4.0.1  2018-04-21  Jamie Norrish  <jamie@artefact.org.nz>

  * UNRELEASED

  * Updated references to documentation, pointing now to
    tacl.readthedocs.io.


4.0.0  2018-04-21  Jamie Norrish  <jamie@artefact.org.nz>

  * Refactored search to output results with the same columns as for
    diff/intersect, and added grouping options to tacl results.

  * Added add-label-work-count option to tacl results to add a column
    to results giving the count of works per label.

  * Moved label-count from tacl-helper to tacl results as the option
    --add-label-count.

  * Added tacl excise command (and excise method on Text) to remove
    n-grams from a text.

  * Added --excise option to tacl results, to remove any results whose
    n-gram contains the supplied n-gram.

  * Removed tacl-helper command. Its remaining functionality could be
    duplicated through simple scripts (mostly concatenation of tacl
    commands).

  * Added reference to tacl-catalogue-manager project in
    documentation.

  * Removed support for CBETA 2011 files.

  * Changed witness handling so that all witnesses are explicit.

  * Added "latin" tokenizer for whitespace delimited sequences of
    non-punctuation characters.

  * Added check for each Result method that the supplied results have
    the required columns.

  * Improved error handling when tacl results is supplied either
    results without the necessary columns (eg, after
    collapse-witnesses), or an empty set of results.


3.0.0  2016-12-12  Jamie Norrish  <jamie@artefact.org.nz>

  * Moved useful script functions to tacl.command.utils, and added
    this to the API documentation.

  * Added colour to logging output from the command line scripts.

  * Added validate-catalogue as a tacl-helper subcommand, to validate
    a catalogue file with respect to a corpus.

  * Added label-count as a tacl-helper subcommand, to add a column to
    results with a count within a label for each n-gram.

  * Added -c/--catalogue option to tacl ngrams, to limit the texts
    added to a database to those labelled in a catalogue.

  * Added experimental command, tacl-jitc, to generate a report on
    overlap between pairs of texts in a sub-corpus, against the
    background of a second sub-corpus.

  * Modified tacl highlight command to allow for either a heatmap
    display (based on a results file) or a simple highlight (based on
    a file of n-grams).

  * Added support for preparing and stripping CBETA TEI files from
    their GitHub repository. This also changes the XML expected by the
    markup stripping operation.

  * Renamed "tacl report" to "tacl results", and tacl.Report to
    tacl.Results. (#46)

  * Added --ngrams option to tacl results, to exclude results whose
    n-gram occurs in the supplied list of n-grams.

  * Added --min-count-text and --max-count-text options to tacl
    results, to filter results that do not have at least one text
    carrying an n-gram with a count within the specified range.

  * Added bifurcated-extend options to "tacl results", to
    generate results containing n-grams extended from the provided
    results, but including only those that occur at sizes that mark a
    bifurcation (change in label count between an n-gram and its
    containing (n+1)-grams).

  * Clarified the language used through code, comments, and docs, to
    distinguish clearly between a "work" (abstract 'text', such as
    T0220, distinct from any particular expression in a witness), a
    "witness" (particular expression of a work), and "text" (the
    actual words etc). This has changed the names of some command line
    options to tacl results, and the API (Corpus.get_text is now
    Corpus.get_witness, for example). The database schema has also
    changed, meaning that databases must be recreated from
    scratch. Any existing results will need to have the "text name"
    header in the first line changed to "work". (#50)

  * Added display of aligned sequences in text order of one of the
    texts to tacl align. (#32)


2.3.2  2016-02-26  Jamie Norrish  <jamie@artefact.org.nz>

  * Added autogenerated API documentation.

  * Added convenience method to Text to get the tokenized text as a
    string.

  * Expanded test of extend to cover more cases.


2.3.1  2016-01-06  Jamie Norrish  <jamie@artefact.org.nz>

  * Fixed bug that meant extended diff results were immediately
    removed after being generated. (#44)

  * Noted on the tacl report help the reason for --extend coming
    before --reduce when both options are specified.


2.3.0  2015-12-23  Jamie Norrish  <jamie@artefact.org.nz>

  * Added automatic removal of filler n-grams from diff results. Only
    n-grams that are entirely composed of (n-1)-grams, or which do not
    overlap with any (n-1)-grams, are kept. This then allows for
    reduce and extend to work correctly on diff results.

    As a result of this change, the API for all difference queries has
    changed.

  * Add example invocations of tacl subcommands. (#30)


2.2.0  2015-11-20  Jamie Norrish  <jamie@artefact.org.nz>

  * Added basic formatting of highlighted text to match that of the
    stripped text.

  * Changed calls to pandas' to_csv method to keep up with current API
    (use 'columns' instead of 'cols').

  * Changed call to pandas' sort_index to sort_values to keep up with
    current API.

  * Specified minimum version of pandas (0.17.0) in setup.py.

  * Changed arguments to sdiff and sintersect subcommands, to work
    around Python bug https://bugs.python.org/issue9338

  * Changed tacl highlight and align to use Jinja2 templates. This
    paves the way for the HTML/CSS/JS-rich output of future scripts.


2.1.0  2015-06-27  Jamie Norrish  <jamie@artefact.org.nz>

  * Added zero-fill option to tacl report, supplying result rows with
    a count of zero to witnesses of a text that has at least one
    witness with a non-zero count.

  * Added collapse-witnesses subcommand to tacl-helper, to collapse
    result rows for witnesses to the same text with the same count
    into a single row.

  * Fixed a bug in tacl report's extend if the same witness is
    associated with more than one label in the results.

  * Added check for texts being relabelled in a catalogue file.

  * Added check for only a single label being supplied to a query.


2.0.0  2015-05-27  Jamie Norrish  <jamie@artefact.org.nz>

  * Added support for multiple witnesses to a text. Witnesses are
    automatically extracted from the CBETA texts, and results now
    include a 'siglum' field indicating which witness of the text a
    result comes from.

  * Added total count of n-grams occurrences per text to 'tacl search'
    results.

  * Split 'tacl strip' into 'tacl prepare' and 'tacl strip'.

  * Reimplemented supplied queries to be more useful and not have
    hidden gotchas.

  * Fixed a bug causing 'tacl stats' to sometimes return incorrect
    results.


1.1.0  2014-05-27  Jamie Norrish  <jamie@artefact.org.nz>

  * Added a tacl align command, to output aligned sequences of
    intersections between all pairs of differently labelled texts in a
    set of results.

  * Added LICENCE and CHANGES to MANIFEST.in.


1.0.2  2014-05-20  Jamie Norrish  <jamie@artefact.org.nz>

  * Added install_requires to setup.py, so that installation via pip
    installs the dependences.


1.0.1  2014-05-20  Jamie Norrish  <jamie@artefact.org.nz>

  * Added a MANIFEST.in to explicitly include README.rst in the
    distribution.


1.0.0  2014-05-20  Jamie Norrish  <jamie@artefact.org.nz>

  * Initial publicised release.