
Akron/XML-Loy-XStandoff


NAME

XML::Loy::XStandoff - Read and Write XStandoff Documents

SYNOPSIS

use XML::Loy::XStandoff;

# Create new corpus element
my $xsf = XML::Loy::XStandoff->new('corpus');

# Create new corpusData element
my $cd = $xsf->corpus_data(id => 'cs_1');

# Add meta information
$cd->meta->add('dc:title' => 'My Document');

# Set textual content
$cd->textual_content('My text');

# Create segment spans
$cd->segment(0, 2);
my $seg = $cd->segment(3, 7);

# Return textual content based on segment spans
print $xsf->segment_content($seg);
# text

# Modify primary data
$xsf->segment_content(
  $seg => sub {
    uc $_[0];
  });

# Return XStandoff document
print $xsf->to_pretty_xml;

# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
# <corpus xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
#         xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1">
#   <corpusData xml:id="cs_1">
#     <meta>
#       <dc:title>My Document</dc:title>
#     </meta>
#     <primaryData end="7" start="0" xml:id="pd-F82533A0-...">
#       <textualContent>My TEXT</textualContent>
#     </primaryData>
#     <segmentation>
#       <segment end="2" start="0" type="char" xml:id="seg-F825B7DA-..." />
#       <segment end="7" start="3" type="char" xml:id="seg-F82617FC-..." />
#     </segmentation>
#   </corpusData>
# </corpus>

DESCRIPTION

XML::Loy::XStandoff is an XML::Loy class for dealing with XStandoff documents.

This code may help you to create your own XML::Loy extensions.

This module is an early release! There may be significant changes in the future.

METHODS

XML::Loy::XStandoff inherits all methods from XML::Loy and implements the following new ones.

new

my $corpus = XML::Loy::XStandoff->new('corpus');
my $cd = XML::Loy::XStandoff->new('corpusData');

Create a new XML::Loy::XStandoff document with either a corpus or a corpusData root element.

annotation

my $anno = $cd->annotation;

$cd->annotation->add('level');

Retrieve the annotation element, creating it (along with a corpusData element) if it doesn't exist.

corpus_data

$corpus->corpus_data(id => 'cd-1');
my $cd = $corpus->corpus_data('cd-1');
my $cd = $corpus->corpus_data;

Get or add corpus data to the corpus. Accepts a parameter hash for adding a new corpusData element, or a single id parameter for retrieving one. If no parameter is given, the first corpusData node is returned. If no corpusData node exists, a new one is created with an autogenerated id.
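The retrieval rules above (lookup by id, fallback to the first node, creation with a generated id) can be modelled in plain Perl to make their order explicit. This is a sketch under assumptions: get_or_create and its array-of-hashes model are hypothetical, it does not call XML::Loy::XStandoff, the id format is invented for illustration, and returning undef for an unknown id is an assumption the documentation does not state.

```perl
use strict;
use warnings;

# Hypothetical model of the corpus_data retrieval rules (not the module's
# code): look up a node by id, otherwise return the first node in document
# order, otherwise create a node with a generated id. The id format is an
# assumption made for this sketch.
my $counter = 0;

sub get_or_create {
  my ($nodes, $prefix, $id) = @_;    # $nodes: array ref in document order
  if (defined $id) {
    my ($found) = grep { $_->{id} eq $id } @$nodes;
    return $found;                   # undef if no node has this id (assumed)
  }
  return $nodes->[0] if @$nodes;     # first node in document order
  my $node = { id => sprintf('%s-%04d', $prefix, ++$counter) };
  push @$nodes, $node;
  return $node;
}

my @corpus_data;
my $first = get_or_create(\@corpus_data, 'cd');   # creates cd-0001
my $again = get_or_create(\@corpus_data, 'cd');   # returns the same node
print $first->{id}, ' ', ($first == $again ? 'same' : 'different'), "\n";
# cd-0001 same
```

Calling the getter twice on an empty list creates only one node, which mirrors the documented "first node, or create one" behaviour.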

layer

my $lay = $a->layer('xml:id' => 'lay-1');
my $lay = $a->layer('lay-1');
my $lay = $a->layer;

Add an annotation layer to the annotation level or retrieve it.

Accepts a hash of attributes for adding a new layer element.

For retrieval it accepts an id value. If no value is passed, the first element in document order is returned. In case no layer element exists, it is created (along with a level, an annotation, and a corpusData element).
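The cascading creation described for layer (a missing layer brings along any missing level, annotation and corpusData element) can be illustrated with a small tree of hashes. The sketch below is a hypothetical model, not the module's implementation; first_or_create is an invented helper, and only the element names are taken from this section.

```perl
use strict;
use warnings;

# Hypothetical sketch of lazy, cascading creation (not the module's code):
# asking for the first layer of an empty document creates every missing
# ancestor on the way down, following the hierarchy described here.
my @path = qw(corpusData annotation level layer);

sub first_or_create {
  my ($parent, $name) = @_;
  my ($child) = grep { $_->{name} eq $name } @{ $parent->{children} };
  unless ($child) {
    $child = { name => $name, children => [] };
    push @{ $parent->{children} }, $child;
  }
  return $child;
}

my $corpus = { name => 'corpus', children => [] };
my $node = $corpus;
$node = first_or_create($node, $_) for @path;

# Walk down the first children again and print the created chain
my @chain;
for (my $n = $corpus; $n; $n = $n->{children}[0]) { push @chain, $n->{name} }
print join(' > ', @chain), "\n";
# corpus > corpusData > annotation > level > layer
```

Running the walk a second time finds the existing elements instead of creating duplicates, which is the behaviour the retrieval description implies.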

level

my $lev = $a->level('xml:id' => 'lev-1');
my $lev = $a->level('lev-1');
my $lev = $a->level;

Add an annotation level to the annotation or retrieve it.

Accepts a hash of attributes for adding a new level element.

For retrieval it accepts an id value. If no value is passed, the first element in document order is returned. In case no level element exists, it is created (along with an annotation and a corpusData element).

meta

my $meta = $cd->meta;

$cd->meta->add('dc:title' => 'My title');

$cd->meta(uri  => '/meta.xml');
$cd->meta(file => '/meta.xml');
$cd->meta(uri  => 'http://.../meta.xml');
$cd->meta(as   => [-Loy, -DublinCore])->dc('Title');

Set meta information of the current node or retrieve it.

If no parameter is given, the content of the meta element is returned. If no meta element exists but a metaRef element does, the referenced document is returned (either from a local file or a URI). If an as parameter is given, the passed array reference is used to transform the document using the as method of XML::Loy. If no meta document is associated with the node, an empty one is created.

If a file parameter is passed, the content of the document is embedded as a child of the meta element. If a uri parameter is passed, a metaRef node is created.

Note: External meta documents will be extended with XML::Loy::File and thus have to be stored separately when changed.

primary_data

my $pd = $cd->primary_data('xml:id' => 'pd-1');
my $pd = $cd->primary_data('pd-1');
my $pd = $cd->primary_data;

Add primary data to corpus data or retrieve it.

Accepts a hash of attributes for adding a new primaryData element.

For retrieval it accepts an id value. If no value is passed, the first element in document order is returned. In case no primaryData element exists, it is created (along with a corpusData element).

segment_content

print $lay->at('token')->segment_content;
print $cd->segment('seg-1')->segment_content;

$lay->at('token')->segment_content('new');
$lay->at('token')->segment_content(sub {
  return lc $_[0];
});

Retrieve, replace or modify the content of a specific segment. If invoked on a segment node, that segment is used; otherwise, the segment referenced by the xsf:segment attribute of the invoking node is used.

If no parameter is given, the textual content of the segment is returned. If a string is passed, it replaces the textual content of the segment. If a callback is passed, it receives the textual content of the segment and returns a string that replaces it.

On change, the primary data (either embedded or on a local filesystem) and segments are updated.
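The documentation says segments are updated on change but does not spell out how. The core-Perl sketch below shows one plausible way to keep character offsets consistent after a replacement: replace the covered text, then shift every boundary at or after the old end by the change in length. replace_segment and the hash-based segment table are hypothetical and not part of XML::Loy::XStandoff; the module's actual update logic may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code): replace the text covered by
# one segment and shift other segment boundaries so they keep pointing at
# the same characters. Offsets are assumed to be start-inclusive and
# end-exclusive, as in the SYNOPSIS. Boundaries that fall strictly inside
# the replaced span are left unchanged; a real implementation needs a
# policy for them.
sub replace_segment {
  my ($text_ref, $segments, $id, $new) = @_;
  my $seg = $segments->{$id} or die "Unknown segment $id\n";
  my ($start, $end) = @{$seg}{qw(start end)};
  my $old = substr($$text_ref, $start, $end - $start);

  # Accept a replacement string or a callback receiving the old content
  $new = $new->($old) if ref $new eq 'CODE';

  substr($$text_ref, $start, $end - $start) = $new;
  my $delta = length($new) - length($old);
  $seg->{end} = $start + length $new;

  # Shift every other boundary located at or after the old end position
  for my $other (values %$segments) {
    next if $other == $seg;
    $other->{start} += $delta if $other->{start} >= $end;
    $other->{end}   += $delta if $other->{end}   >= $end;
  }
  return $new;
}

my $text = 'My text here';
my %segments = (
  s1 => { start => 0, end => 2 },    # "My"
  s2 => { start => 3, end => 7 },    # "text"
  s3 => { start => 8, end => 12 },   # "here"
);

replace_segment(\$text, \%segments, 's2', 'words');
print "$text\n";                                      # My words here
print "$segments{s3}{start}-$segments{s3}{end}\n";    # 9-13
replace_segment(\$text, \%segments, 's1', sub { uc $_[0] });
print "$text\n";                                      # MY words here
```

Accepting either a string or a code reference mirrors the two modification forms documented for segment_content.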

segmentation

my $seg = $cd->segmentation;

$cd->segmentation->add('segment');

Retrieve the segmentation element, creating it (along with a corpusData element) if it doesn't exist.

segment

my $seg = $cd->segment('seg-1');
print $seg->attr('start');

my $seg_id = $cd->segment(14, 20);
$cd->segment('seg-1', 14, 21);

Add, retrieve or modify segments.

Accepts a segment id for retrieving a segment, two integers (start and end position) for adding a new segment, or a segment id followed by two integers for changing the start and end position of an existing segment.
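The SYNOPSIS output suggests that segment offsets are character positions (type="char") with an inclusive start and an exclusive end: segment(3, 7) over "My text" yields "text". The core-Perl sketch below illustrates that interpretation with a hypothetical segment_text helper; it does not use XML::Loy::XStandoff and is not its implementation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Loy::XStandoff: return the slice of
# the primary data covered by a segment. Assumes "start" is inclusive and
# "end" is exclusive, matching the SYNOPSIS (start="3" end="7" on
# "My text" yields "text").
sub segment_text {
  my ($text, $start, $end) = @_;
  die "Invalid span [$start, $end)\n"
    if $start < 0 || $end < $start || $end > length $text;
  return substr($text, $start, $end - $start);
}

my $text = 'My text';
print segment_text($text, 0, 2), "\n";  # My
print segment_text($text, 3, 7), "\n";  # text
```

With this convention the length of a segment is simply end minus start, and adjacent segments can share a boundary without overlapping.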

seg

$lay->add('token')->seg('seg-1');
# <token xsf:segment="seg-1" />

print $lay->at('token')->seg;
# seg-1

Attach segment information to arbitrary elements or retrieve it.
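Attaching only a segment id to an annotation element is what makes the format standoff: the annotation never contains the text itself, only a reference to a segment, which in turn points into the primary data. The sketch below models that two-step indirection with plain Perl hashes; all names are hypothetical and the code does not call XML::Loy::XStandoff.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of a standoff annotation (not the module's
# data structures): tokens carry only a segment id (like xsf:segment),
# segments carry character offsets into the primary data.
my $primary = 'My text';
my %segment = (
  'seg-1' => { start => 0, end => 2 },
  'seg-2' => { start => 3, end => 7 },
);
my @tokens = ({ segment => 'seg-1' }, { segment => 'seg-2' });

# Resolve a token to its text via its segment reference
sub token_text {
  my ($token) = @_;
  my $seg = $segment{ $token->{segment} }
    or die "Dangling segment reference $token->{segment}\n";
  return substr($primary, $seg->{start}, $seg->{end} - $seg->{start});
}

print join('|', map { token_text($_) } @tokens), "\n";  # My|text
```

Because tokens only hold ids, several annotation layers can reference the same segments without duplicating or conflicting over the underlying text.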

textual_content

$pd->textual_content(uri => '/text.txt');
$pd->textual_content(uri => 'http://.../text.txt');
$pd->textual_content(file => '/text.txt');
$pd->textual_content('Hello World');

print $pd->textual_content;

Add textual data to corpus data or retrieve it.

If no parameter is given, the content of the textualContent element is returned as an XML::Loy::XStandoff::Data object. If no textualContent element exists but a primaryDataRef element does, the referenced document is returned (either from a local file or a URI). If no textual content is associated with the primary data, it is created empty and returned as an XML::Loy::XStandoff::Data object.

If a file parameter is passed, the content of the file is embedded as the content of the textualContent element. If a uri parameter is passed, a primaryDataRef node is created.

Note: External textual content files referenced by a URI cannot be altered using segment_content.

DEPENDENCIES

XML::Loy.

SEE ALSO

XML::Loy, XML::Loy::XStandoff::Data, XML::Loy::File, XStandoff.net.

AVAILABILITY

https://github.com/Akron/XML-Loy-XStandoff

COPYRIGHT AND LICENSE

Copyright (C) 2013-2021, Nils Diewald.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl.
