XML::Loy::XStandoff - Read and Write XStandoff Documents
use XML::Loy::XStandoff;
# Create new corpus element
my $xsf = XML::Loy::XStandoff->new('corpus');
# Create new corpusData element
my $cd = $xsf->corpus_data(id => 'cs_1');
# Add meta information
$cd->meta->add('dc:title' => 'My Document');
# Set textual content
$cd->textual_content('My text');
# Create segment spans
$cd->segment(0, 2);
my $seg = $cd->segment(3, 7);
# Return textual content based on segment spans
print $xsf->segment_content($seg);
# text
# Modify primary data
$xsf->segment_content(
  $seg => sub {
    uc $_[0];
  });
# Return XStandoff document
print $xsf->to_pretty_xml;
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
# <corpus xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
#         xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1">
#   <corpusData xml:id="cs_1">
#     <meta>
#       <dc:title>My Document</dc:title>
#     </meta>
#     <primaryData end="7" start="0" xml:id="pd-F82533A0-...">
#       <textualContent>My TEXT</textualContent>
#     </primaryData>
#     <segmentation>
#       <segment end="2" start="0" type="char" xml:id="seg-F825B7DA-..." />
#       <segment end="7" start="3" type="char" xml:id="seg-F82617FC-..." />
#     </segmentation>
#   </corpusData>
# </corpus>
XML::Loy::XStandoff is an XML::Loy class for dealing with XStandoff documents.
This code may help you to create your own XML::Loy extensions.
This module is an early release! There may be significant changes in the future.
XML::Loy::XStandoff inherits all methods from XML::Loy and implements the following new ones.
my $corpus = XML::Loy::XStandoff->new('corpus');
my $cd = XML::Loy::XStandoff->new('corpusData');
Create a new XML::Loy::XStandoff document, either as a corpus or a corpusData element.
my $anno = $cd->annotation;
$cd->annotation->add('level');
Retrieve the annotation element. If it doesn't exist, it is created (along with a corpusData element).
$corpus->corpus_data(id => 'cd-1');
my $cd = $corpus->corpus_data('cd-1');
my $cd = $corpus->corpus_data;
Get or add corpus data to the corpus. Accepts a parameter hash for adding corpus data, or a single id parameter for retrieving it. If no parameter is given, the first corpus data node is returned. If no corpus data exists, a new node is created with an autogenerated id.
my $lay = $a->layer('xml:id' => 'lay-1');
my $lay = $a->layer('lay-1');
my $lay = $a->layer;
Add an annotation layer to the annotation level or retrieve it.
Accepts a hash of attributes for adding a new layer element.
For retrieval it accepts an id value. If no value is passed, the first element in document order is returned. If no layer element exists, it is created (along with a level, an annotation, and a corpusData element).
my $lev = $a->level('xml:id' => 'lev-1');
my $lev = $a->level('lev-1');
my $lev = $a->level;
Add an annotation level to the annotation or retrieve it.
Accepts a hash of attributes for adding a new level element.
For retrieval it accepts an id value. If no value is passed, the first element in document order is returned. If no level element exists, it is created (along with an annotation and a corpusData element).
my $meta = $cd->meta;
$cd->meta->add('dc:title' => 'My title');
$cd->meta(uri => '/meta.xml');
$cd->meta(file => '/meta.xml');
$cd->meta(uri => 'http://.../meta.xml');
$cd->meta(as => [-Loy, -DublinCore])->dc('Title');
Set meta information of the current node or retrieve it.
If no parameter is given, the content of the meta element is returned. If no meta element exists, but a metaRef element exists, the referenced document is returned (either from a local file or a URI). If an as parameter is given, the passed array reference is used to transform the document using the as method of XML::Loy. If no meta document is associated with the node, an empty one is created.
If a file parameter is passed, the content of the document is embedded as a child of the meta element. If a uri parameter is passed, a metaRef node is created.
Note: External meta documents will be extended with XML::Loy::File and thus have to be stored separately when changed.
my $pd = $cd->primary_data('xml:id' => 'pd-1');
my $pd = $cd->primary_data('pd-1');
my $pd = $cd->primary_data;
Add primary data to corpus data or retrieve it.
Accepts a hash of attributes for adding a new primaryData element.
For retrieval it accepts an id value. If no value is passed, the first element in document order is returned. If no primaryData element exists, it is created (along with a corpusData element).
print $lay->at('token')->segment_content;
print $cd->segment('seg-1')->segment_content;
$lay->at('token')->segment_content('new');
$lay->at('token')->segment_content(sub {
  return lc $_[0];
});
Retrieve, replace or modify the content of a specific segment. If invoked on a segment node, that segment is used; otherwise the segment is taken from the xsf:segment attribute value of the invoking node.
If no parameter is given, the textual content of the segment is returned. If a string is passed, it replaces the textual content of the segment. If a callback is passed, it receives the textual content of the segment and returns the string that replaces it.
On change, the primary data (either embedded or on a local filesystem) and segments are updated.
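The bookkeeping implied by such an update can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's implementation: replace_segment is a hypothetical helper, segments are assumed to be non-overlapping half-open character spans counted from zero, and the module's actual update logic may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Loy::XStandoff): replace the
# text covered by segment $id and adjust the stored offsets.
# Assumes non-overlapping half-open spans [start, end), zero-based.
sub replace_segment {
  my ($text, $segs, $id, $new) = @_;
  my ($start, $end) = @{ $segs->{$id} };
  my $delta = length($new) - ($end - $start);

  # Four-argument substr replaces the span in the local copy of $text
  substr($text, $start, $end - $start, $new);

  for my $key (keys %$segs) {
    next if $key eq $id;
    my $seg = $segs->{$key};
    # Segments starting at or after the old end move by $delta
    if ($seg->[0] >= $end) {
      $seg->[0] += $delta;
      $seg->[1] += $delta;
    }
  }

  # The edited segment keeps its start and gets a new end
  $segs->{$id}[1] = $end + $delta;
  return $text;
}

my %segs = (a => [0, 2], b => [3, 7], c => [8, 12]);
my $text = replace_segment('My text here', \%segs, 'b', 'document');
print "$text\n";           # My document here
print "@{ $segs{b} }\n";   # 3 11
print "@{ $segs{c} }\n";   # 12 16
```

Segments before the edited span are left untouched; overlapping or nested segments would need extra handling that this sketch deliberately leaves out.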
my $seg = $cd->segmentation;
$cd->segmentation->add('segment');
Retrieve the segmentation element. If it doesn't exist, it is created (along with a corpusData element).
my $seg = $cd->segment('seg-1');
print $seg->attr('start');
my $seg_id = $cd->segment(14, 20);
$cd->segment('seg-1', 14, 21);
Add or retrieve segments.
Accepts a segment id for retrieving a segment. Accepts two integers for defining start and end position of the segment. Accepts a segment id, followed by two integers for modifying start and end position of the segment.
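Judging from the synopsis, start and end appear to be zero-based character offsets with an exclusive end: on the text 'My text', the span (3, 7) yields 'text'. A minimal plain-Perl sketch of that reading, using the core substr function rather than this module (span_text is a hypothetical helper, not part of XML::Loy::XStandoff):

```perl
use strict;
use warnings;

# Hypothetical helper: return the characters of $text covered by
# the half-open span [$start, $end), counted from zero.
sub span_text {
  my ($text, $start, $end) = @_;
  return substr($text, $start, $end - $start);
}

my $text = 'My text';
print span_text($text, 0, 2), "\n"; # My
print span_text($text, 3, 7), "\n"; # text
```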
$lay->add('token')->seg('seg-1');
# <token xsf:segment="seg-1" />
print $lay->at('token')->seg;
# seg-1
Attach segment information to arbitrary elements or retrieve it.
$pd->textual_content(uri => '/text.txt');
$pd->textual_content(uri => 'http://.../text.txt');
$pd->textual_content(file => '/text.txt');
$pd->textual_content('Hello World');
print $pd->textual_content;
Add textual data to corpus data or retrieve it.
If no parameter is given, the content of the textualContent element is returned as an XML::Loy::XStandoff::Data object. If no textualContent element exists, but a primaryDataRef element exists, the referenced document is returned (either from a local file or a URI). If no textual content is associated with the primary data, it is created empty and returned as an XML::Loy::XStandoff::Data object.
If a file parameter is passed, the content of the file is embedded as the content of the textualContent element. If a uri parameter is passed, a primaryDataRef node is created.
Note: External textual content files referenced by a URI cannot be altered using segment_content.
XML::Loy, XML::Loy::XStandoff::Data, XML::Loy::File, XStandoff.net.
https://github.com/Akron/XML-Loy-XStandoff
Copyright (C) 2013-2021, Nils Diewald.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl.