Perl's aggregate data types, arrays and hashes, allow you to store scalars indexed by integers or string keys. Perl 5's references allow you to access aggregate data types indirectly, through special scalars. References also make nested data structures possible, such as an array of arrays or a hash of hashes.
A simple declaration of an array of arrays might be:
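As a minimal sketch, an array of arrays is an ordinary array whose elements are anonymous array references. The name @famous_triplets and its contents are illustrative only:

```perl
use strict;
use warnings;

# Each [ ... ] creates an anonymous array and returns a reference to it.
my @famous_triplets = (
    [qw( eenie miney moe   )],
    [qw( huey  dewey louie )],
    [qw( duck  duck  goose )],
);
```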
... and a simple declaration of a hash of hashes might be:
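A comparable sketch for a hash of hashes uses anonymous hash references as the values. The name %meals and its keys are illustrative only:

```perl
use strict;
use warnings;

# Each { ... } creates an anonymous hash and returns a reference to it.
my %meals = (
    breakfast => { entree => 'eggs',   side => 'hash browns'   },
    lunch     => { entree => 'panini', side => 'apple'         },
    dinner    => { entree => 'steak',  side => 'avocado salad' },
);
```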
Accessing elements in nested data structures uses Perl's reference syntax. The sigil denotes the amount of data to retrieve, and the dereferencing arrow indicates that the value of one portion of the data structure is a reference:
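A small sketch of element access with the explicit arrow, using illustrative data (@famous_triplets and %meals are example names, not part of any library):

```perl
use strict;
use warnings;

my @famous_triplets = ( [qw( eenie miney moe )], [qw( huey dewey louie )] );
my %meals           = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# The $ sigil asks for a single scalar; the arrow dereferences the
# reference stored in the outer structure.
my $last_nephew = $famous_triplets[1]->[2];     # 'louie'
my $breaky_side = $meals{breakfast}->{side};    # 'hash browns'
```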
Because the only way to nest a data structure is through references, the arrow between adjacent subscripts is superfluous. This code is equivalent and clearer:
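A sketch of the arrow-free form, again with illustrative data. Both spellings reach the same element:

```perl
use strict;
use warnings;

my @famous_triplets = ( [qw( eenie miney moe )], [qw( huey dewey louie )] );
my %meals           = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Between adjacent subscripts, Perl implies the arrow.
my $last_nephew = $famous_triplets[1][2];     # same as $famous_triplets[1]->[2]
my $breaky_side = $meals{breakfast}{side};    # same as $meals{breakfast}->{side}
```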
Accessing components of nested data structures as if they were first-class arrays or hashes requires disambiguation blocks:
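As a sketch with illustrative data, the block after the @ or % sigil tells Perl which expression yields the reference to dereference:

```perl
use strict;
use warnings;

my @famous_triplets = ( [qw( eenie miney moe )], [qw( huey dewey louie )] );
my %meals           = ( dinner => { entree => 'steak', side => 'avocado salad' } );

# @{ ... } treats the inner reference as a whole array;
# in scalar context that yields its element count.
my $nephew_count = @{ $famous_triplets[1] };     # 3

# %{ ... } treats the inner reference as a whole hash;
# keys in scalar context yields the number of keys.
my $dinner_courses = keys %{ $meals{dinner} };   # 2
```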
Similarly, slicing a nested data structure requires additional punctuation:
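A sketch of a hash slice taken from a nested hash, using the illustrative %meals data. The leading @ requests a list, the block names the reference, and the trailing braces hold the keys:

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Hash slice through a reference: @{ REFERENCE }{ LIST OF KEYS }
my ($entree, $side) = @{ $meals{breakfast} }{qw( entree side )};
```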
The use of whitespace helps, but it does not entirely eliminate the noise of this construct. Sometimes using temporary variables can clarify:
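One possible way to do this, sketched with the illustrative %meals data and a hypothetical temporary named $breakfast_ref:

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Name the inner reference first, then slice it without the block.
my $breakfast_ref   = $meals{breakfast};
my ($entree, $side) = @$breakfast_ref{qw( entree side )};
```

The temporary costs one extra line, but the slice no longer needs the disambiguation block.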
perldoc perldsc, the data structures cookbook, gives copious examples of the various types of data structures available in Perl, along with their syntax.
Perl's expressivity extends to nested data structures. When you attempt to write to a component of a nested data structure, Perl will create the path through the data structure to that piece if it does not exist:
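A minimal sketch of this behavior with arrays, using the illustrative name @aoaoaoa. A single assignment creates every intermediate level:

```perl
use strict;
use warnings;

my @aoaoaoa;
$aoaoaoa[0][0][0][0] = 'nested deeply';

# Perl created the intermediate array references on demand.
my $innermost = $aoaoaoa[0][0][0];    # an array reference holding one string
```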
After the assignment, this array of arrays of arrays of arrays contains an array reference in an array reference in an array reference. Each array contains one element. Similarly, treating an undefined value as if it were a hash reference in a nested data structure will create intermediary hashes, keyed appropriately:
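The hash case can be sketched the same way, with the illustrative name %hohoh:

```perl
use strict;
use warnings;

my %hohoh;
$hohoh{Robot}{Santa}{Claus} = 'mostly harmful';

# The intermediate hashes now exist under the keys used in the assignment.
my @outer_keys = keys %hohoh;             # ('Robot')
my @inner_keys = keys %{ $hohoh{Robot} }; # ('Santa')
```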
This behavior is autovivification, and it's more often useful than it isn't. Its benefit is in reducing the initialization code of nested data structures. Its drawback is in its inability to distinguish between the honest intent to create missing elements in nested data structures and typos.
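The drawback can be sketched with a hypothetical %config hash. Both a misspelled key and an innocent-looking exists test create entries that nobody asked for:

```perl
use strict;
use warnings;

my %config = ( server => { port => 8080 } );

# Typo: "sever" instead of "server". Perl silently creates a new branch.
$config{sever}{port} = 9090;
my @top_level = sort keys %config;                # ('server', 'sever')

# Even a read-only test vivifies the intermediate hash.
my $has_size     = exists $config{cache}{size};   # false
my $cache_exists = exists $config{cache};         # true: 'cache' was created
```

Neither strict nor warnings reports these cases, because hash keys are only strings.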
The autovivification pragma on the CPAN lets you disable autovivification in a lexical scope for specific types of operations; it's worth your time to consider this in large projects, or projects with multiple developers.
The complexity of Perl 5's dereferencing syntax combined with the potential for confusion with multiple levels of references can make debugging nested data structures difficult. Two good options exist for visualizing them.
The core module Data::Dumper can stringify values of arbitrary complexity into Perl 5 code:
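A minimal sketch, assuming an illustrative structure named $complex. Setting $Data::Dumper::Sortkeys is optional and only makes the key order stable between runs:

```perl
use strict;
use warnings;
use Data::Dumper;

my $complex = {
    numbers => [ 1, 2, 3 ],
    letters => { a => 'b', c => 'd' },
};

# Sort hash keys so that repeated dumps are easy to compare.
local $Data::Dumper::Sortkeys = 1;

# Dumper() returns a string of Perl code that rebuilds the structure.
print Dumper( $complex );
```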
This is useful for identifying what a data structure contains, what you should access, and what you accessed instead. Data::Dumper can dump objects as well as function references (if you set $Data::Dumper::Deparse to a true value).
While Data::Dumper is a core module and prints Perl 5 code, it also produces verbose output. Some developers prefer the YAML::XS or JSON modules for debugging. You have to learn a different format to understand their output, but that output can be much easier to read and to understand.
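As a sketch of the JSON approach, this example assumes JSON::PP, the pure-Perl implementation bundled with Perl 5.14 and later, standing in for the JSON module from the CPAN. The structure $complex is illustrative:

```perl
use strict;
use warnings;
use JSON::PP;

my $complex = {
    numbers => [ 1, 2, 3 ],
    letters => { a => 'b', c => 'd' },
};

# pretty() indents the output; canonical() sorts hash keys.
my $json = JSON::PP->new->pretty->canonical;
print $json->encode( $complex );
```

JSON cannot represent code references or blessed objects without extra configuration, so this suits plain data best.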
Perl 5's memory management system of reference counting has one drawback apparent to user code. Two data structures which hold references to each other form a circular reference that Perl cannot destroy on its own. Consider a biological model, where each entity has two parents and can have children:
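A minimal sketch of such a model, assuming each person is a plain hash reference with illustrative mother, father, and children keys:

```perl
use strict;
use warnings;

my $alice  = { mother => '',     father => '',      children => [] };
my $robert = { mother => '',     father => '',      children => [] };
my $cianne = { mother => $alice, father => $robert, children => [] };

# Each parent now refers to the child, and the child refers to each parent.
push @{ $alice->{children}  }, $cianne;
push @{ $robert->{children} }, $cianne;
```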
Because both $alice and $robert contain an array reference which contains $cianne, and because $cianne is a hash reference which contains $alice and $robert, Perl can never decrease the reference count of any of these three people to zero. It doesn't recognize that these circular references exist, and it can't manage the lifespan of these entities.
You must either break the cycle manually (by clearing the children of $alice and $robert or the parents of $cianne), or take advantage of a feature called weak references. A weak reference is a reference which does not increase the reference count of its referent. Weak references are available through the core module Scalar::Util. Import its weaken() function and use it on a reference to prevent the reference count from increasing:
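A sketch of weakening the child's references to its parents, assuming the same illustrative hash-based model of people:

```perl
use strict;
use warnings;
use Scalar::Util 'weaken';

my $alice  = { mother => '',     father => '',      children => [] };
my $robert = { mother => '',     father => '',      children => [] };
my $cianne = { mother => $alice, father => $robert, children => [] };

push @{ $alice->{children}  }, $cianne;
push @{ $robert->{children} }, $cianne;

# The child's links to its parents no longer count toward their
# reference counts, so the cycle cannot keep the parents alive.
weaken( $cianne->{mother} );
weaken( $cianne->{father} );
```

Scalar::Util also provides isweak(), which reports whether a given reference has been weakened.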
With this accomplished, $cianne will retain references to $alice and $robert, but those references will not by themselves prevent Perl's garbage collector from destroying those data structures. You rarely have to use weak references if you design your data structures correctly, but they're useful in a few situations.
While Perl is content to process data structures nested as deeply as you can imagine, the human cost of understanding these data structures, the relationships between their pieces, and the syntax required to access them can be high. Beyond two or three levels of nesting, consider whether modeling various components of your system with classes and objects will allow you to operate on your data within encapsulation boundaries.
Sometimes bundling data with behaviors appropriate to that data can clarify code.