Hashes

A hash is a first-class Perl data structure which associates string keys with scalar values. Just as the name of a variable corresponds to something which holds a value, so does a hash key refer to something which contains a value. Think of a hash like a contact list: use the names of your friends to look up their phone numbers. Other languages call hashes tables, associative arrays, dictionaries, or maps.

Hashes have two important properties: they store one scalar per unique key and they provide no specific ordering of keys. Keep that latter property in mind. Though it has always been true in Perl, Perl 5.18 made it impossible to ignore: the order of hash keys now varies from one run of a program to the next.

Declaring Hashes

Hashes use the % sigil. Declare a lexical hash with:

A hash starts out empty. You could write my %favorite_flavors = ();, but that's redundant.

Hashes use the scalar sigil $ when accessing individual elements and curly braces { } for keyed access:
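As a minimal sketch of declaration and keyed access, assuming the %favorite_flavors name used above and some illustrative names and flavors:

```perl
use strict;
use warnings;

# Declare an empty lexical hash; the keys and values below are illustrative.
my %favorite_flavors;

# Keyed access uses the $ sigil and curly braces.
$favorite_flavors{Gabi}    = 'Raspberry chocolate';
$favorite_flavors{Annette} = 'French vanilla';

my $gabi_flavor = $favorite_flavors{Gabi};
```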

Assign a list of keys and values to a hash in a single expression:

Hashes store pairs of keys and values. Perl will warn you if you assign an odd number of elements to a hash. Idiomatic Perl often uses the fat comma operator (=>) to associate values with keys, as it makes the pairing more visible:
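A short sketch of both forms, using illustrative data. The two hashes end up with identical contents; the fat comma only changes how the pairs read:

```perl
use strict;
use warnings;

# Plain commas: the pairing is implicit in the position of each element.
my %with_commas = (
    'Gabi',    'Raspberry chocolate',
    'Annette', 'French vanilla',
);

# Fat commas: each key is visibly tied to its value, and the
# bareword on the left is quoted automatically.
my %with_fat_commas = (
    Gabi    => 'Raspberry chocolate',
    Annette => 'French vanilla',
);
```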

The fat comma operator acts like the regular comma and also automatically quotes the previous bareword (barewords). The strict pragma will not warn about such a bareword--and if you have a function with the same name as a hash key, the fat comma will not call the function:

The key of this hash will be name and not Leonardo. To call the function, make the function call explicit:
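The difference might look like this, assuming a hypothetical name() function and an illustrative address:

```perl
use strict;
use warnings;

# Hypothetical function which shares its name with a hash key.
sub name { 'Leonardo' }

# The fat comma quotes the bareword; name() is NOT called here.
my %quoted = ( name => '1123 Fib Place' );

# Explicit parentheses force the function call, so the key is 'Leonardo'.
my %called = ( name() => '1123 Fib Place' );
```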

Assign an empty list to empty a hash (you may occasionally see undef %hash, but that's a little ugly):
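For instance, with an illustrative hash:

```perl
use strict;
use warnings;

my %favorite_flavors = ( Gabi => 'Raspberry chocolate' );
my $count_before     = keys %favorite_flavors;

# Assigning an empty list removes every key/value pair.
%favorite_flavors = ();
my $count_after = keys %favorite_flavors;
```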

Hash Indexing

To access an individual hash value, use a key (a keyed access operation):

In this example, $name contains a string which is also a key of the hash. As with accessing an individual element of an array, the hash's sigil has changed from % to $ to indicate keyed access to a scalar value.
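A sketch of such a lookup, assuming an %addresses hash with illustrative contents:

```perl
use strict;
use warnings;

# Illustrative data.
my %addresses = (
    Leonardo => '1123 Fib Place',
    Utako    => 'Cantor Hotel, Room 1',
);

# $name holds a string which is also a key of the hash.
my $name    = 'Leonardo';
my $address = $addresses{$name};
```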

You may also use string literals as hash keys. Perl quotes barewords automatically according to the same rules as fat commas:

Even Perl builtins get the autoquoting treatment:

The unary plus (unary_coercions) turns what would be a bareword (shift) subject to autoquoting rules into an expression. As this implies, you can use an arbitrary expression--not only a function call--as the key of a hash:
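The following sketch puts the autoquoting rules side by side. The function names and data are illustrative assumptions, and the 'shift' key exists only to make the autoquoted builtin visible:

```perl
use strict;
use warnings;

my %addresses = (
    Leonardo => '1123 Fib Place',
    shift    => 'stored under the literal key "shift"',
);

# A bareword inside the braces is autoquoted, like 'Leonardo'.
my $quoted = $addresses{Leonardo};

# The builtin name is autoquoted too: the key is the string 'shift'.
sub literal_shift_key { return $addresses{shift} }

# Unary plus turns the bareword into an expression, so shift runs on @_.
sub key_from_argument { return $addresses{+shift} }

# Any expression which produces a string can serve as a key.
sub key_from_expression {
    my $name = shift;
    return $addresses{ ucfirst lc $name };
}
```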

Hash keys can only be strings. Anything that evaluates to a string is an acceptable hash key. Perl will go so far as to coerce (coercion) any non-string into a string. For example, if you use an object as a hash key, you'll get the stringified version of that object instead of the object itself:
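To see the stringification, consider this sketch with a hypothetical Some::Class object:

```perl
use strict;
use warnings;

# Some::Class is a hypothetical class used only for illustration.
my $object = bless {}, 'Some::Class';

my %registry;
$registry{$object} = 'registered';

# The stored key is a plain string such as 'Some::Class=HASH(0x...)',
# not a reference; you cannot call methods on it.
my ($key)  = keys %registry;
my $is_ref = ref $key ? 1 : 0;
```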

Hash Key Existence

The exists operator returns a boolean value to indicate whether a hash contains the given key:

Using exists instead of accessing the hash key directly avoids two problems. First, it does not check the boolean nature of the hash value; a hash key may exist with a value even if that value evaluates to a boolean false (including undef):

Second, exists avoids autovivification (autovivification) within nested data structures (nested_data_structures).

If a hash key exists, its value may be undef. Check that with defined:
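A sketch contrasting exists, defined, and plain boolean tests, with illustrative keys:

```perl
use strict;
use warnings;

my %address_book = (
    Leonardo  => '1123 Fib Place',
    Anonymous => undef,    # the key exists, but the value is undefined
    Blank     => '',       # the key exists, but the value is false
);

my $has_leonardo  = exists  $address_book{Leonardo}  ? 1 : 0;
my $has_anonymous = exists  $address_book{Anonymous} ? 1 : 0;
my $anon_defined  = defined $address_book{Anonymous} ? 1 : 0;
my $blank_is_true = $address_book{Blank}             ? 1 : 0;
my $has_blank     = exists  $address_book{Blank}     ? 1 : 0;
my $has_missing   = exists  $address_book{Warnie}    ? 1 : 0;
```

Only exists distinguishes a key with a false or undefined value from a key which is not in the hash at all.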

Accessing Hash Keys and Values

Hashes are aggregate variables, but their pairwise nature is unique. Perl allows you to iterate over the keys of a hash, over the values of a hash, or over pairs of keys and values. The keys operator produces a list of hash keys:

The values operator produces a list of hash values:

The each operator produces a two-element list of a key and its value every time you call it, until it has visited every pair:
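The three operators might be used like this. The data is illustrative, and the sort calls are there only to make the results predictable:

```perl
use strict;
use warnings;

my %addresses = (
    Leonardo => '1123 Fib Place',
    Utako    => 'Cantor Hotel, Room 1',
);

# keys and values return lists; sorting removes any dependence on order.
my @names  = sort keys   %addresses;
my @places = sort values %addresses;

# each returns one (key, value) pair per call and an empty list at the end.
my %copy;
while ( my ($name, $place) = each %addresses ) {
    $copy{$name} = $place;
}
```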

Unlike arrays, there is no obvious ordering to these lists. The ordering depends on the internal implementation of the hash, the particular version of Perl you are using, the size of the hash, and a random factor. Even so, the order of hash items is consistent between keys, values, and each. Modifying the hash may change the order, but you can rely on that order if the hash remains the same. However, even if two hashes have the same keys and values, you cannot rely on the iteration order between those hashes being the same. They may have been constructed differently or have had elements removed. In Perl 5.18, even if they were constructed the same way, you cannot depend on the same iteration order between them.

Each hash has only a single iterator for the each operator. You cannot reliably iterate over a hash with each more than once; if you begin a new iteration while another is in progress, the former will end prematurely and the latter will begin partway through the hash. During such iteration, beware not to call any function which may itself try to iterate over the hash with each.

In practice this occurs rarely. Reset a hash's iterator with keys or values in void context when you need it:
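A sketch of an abandoned iteration followed by a reset, using an illustrative hash:

```perl
use strict;
use warnings;

my %items = ( a => 1, b => 2, c => 3 );

# Start an iteration and abandon it after one pair.
my ($first_key) = each %items;

# keys in void context resets the hash's single iterator.
keys %items;

# This loop now starts from the beginning and visits every pair.
my $visited = 0;
while ( my ($key, $value) = each %items ) {
    $visited++;
}
```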

Hash Slices

A hash slice is a list of keys or values of a hash indexed in a single operation. To initialize multiple elements of a hash at once:

This is equivalent to the initialization:

... except that the hash slice initialization does not replace the existing contents of the hash.
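A sketch of both forms, assuming a %cats hash with illustrative names. One key is set beforehand to show that the slice leaves existing contents alone:

```perl
use strict;
use warnings;

# An existing entry which the slice assignment must not remove.
my %cats = ( Petunia => 1 );

# Hash slice: the @ sigil with curly braces assigns several keys at once.
@cats{qw( Jack Brad Mars Grumpy )} = (1) x 4;

# List assignment to the whole hash would replace its contents instead.
my %fresh_cats = map { $_ => 1 } qw( Jack Brad Mars Grumpy );
```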

Hash slices also allow you to retrieve multiple values from a hash in a single operation. As with array slices, the sigil of the hash changes to @ to indicate list context. The use of the curly braces indicates keyed access and makes the fact that you're working with a hash unambiguous:
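For example, with illustrative data:

```perl
use strict;
use warnings;

my %addresses = (
    Leonardo => '1123 Fib Place',
    Utako    => 'Cantor Hotel, Room 1',
    Warnie   => '41 Example Lane',
);

my @buyers = qw( Leonardo Utako );

# The values come back in the order of the keys in the slice.
my @buyer_addresses = @addresses{ @buyers };
```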

Hash slices make it easy to merge two hashes:

This is equivalent to looping over the contents of %canada_addresses manually, but is much shorter. Note that this relies on the iteration order of the hash remaining consistent between keys and values. Perl guarantees this, but only because these operations occur on the same hash and because nothing modifies the hash between the keys and values operations.

What if the same key occurs in both hashes? The hash slice approach always overwrites existing key/value pairs in %addresses. If you want other behavior, looping is more appropriate.
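A sketch of the merge, assuming the %addresses and %canada_addresses names from the text with illustrative contents. Utako appears in both hashes to show the overwrite:

```perl
use strict;
use warnings;

my %addresses = (
    Leonardo => '1123 Fib Place',
    Utako    => 'Cantor Hotel, Room 1',
);

my %canada_addresses = (
    Utako  => '17 Maple Street',
    Pierre => '3 Rue Exemple',
);

# keys and values walk the same unmodified hash, so the pairs line up.
@addresses{ keys %canada_addresses } = values %canada_addresses;
```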

The Empty Hash

An empty hash contains no keys or values. It evaluates to a false value in a boolean context. A hash which contains at least one key/value pair evaluates to a true value in boolean context even if all of the keys or all of the values or both would themselves evaluate to boolean false values.

In scalar context, a hash evaluates to a string which represents the ratio of full buckets in the hash--internal details about the hash implementation that you can safely ignore. (As of Perl 5.26, a hash in scalar context evaluates to its number of keys instead.) In a boolean scalar context, an empty hash evaluates to a false value, so remember that instead of the ratio details.

In list context, a hash evaluates to a list of key/value pairs similar to the list produced by the each operator. However, you cannot iterate over this list the same way you can iterate over the list produced by each. This loop will never terminate:

You can loop over the list of keys and values with a for loop, but the iterator variable will get a key on one iteration and its value on the next, because Perl will flatten the hash into a single list of interleaved keys and values.
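A sketch of the boolean and list context behavior. The hashes are illustrative, and the last one has a single pair so that the flattened order is predictable:

```perl
use strict;
use warnings;

my %empty;
my %falsy = ( 0 => undef );    # one pair; both key and value are false

my $empty_truth = %empty ? 'true' : 'false';
my $falsy_truth = %falsy ? 'true' : 'false';

# In list context the hash flattens into interleaved keys and values.
my %pair = ( flavor => 'vanilla' );
my @flattened;
for my $item (%pair) {
    push @flattened, $item;    # first the key, then its value
}
```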

Hash Idioms

Because each key exists only once in a hash, assigning the same key to a hash multiple times stores only the most recent value associated with that key. This behavior has advantages! For example, to find unique elements of a list:

Using undef with a hash slice sets the values of the hash to undef. This idiom is the cheapest way to perform set operations with a hash.
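A sketch of the idiom with an illustrative list. The final sort is there only to make the result predictable:

```perl
use strict;
use warnings;

my @items = qw( apple pear apple fig pear apple );

# The slice creates one key per distinct item; the values stay undef.
my %uniq;
undef @uniq{ @items };

my @uniques = sort keys %uniq;
```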

Hashes are also useful for counting elements, such as IP addresses in a log file:

The initial value of a hash value is undef. The postincrement operator (++) treats that as zero. This in-place modification of the value increments an existing value for that key. If no value exists for that key, Perl creates a value (undef) and immediately increments it to one, as the numification of undef produces the value 0.
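A sketch of the counting idiom, assuming the IP addresses have already been extracted from the log into an illustrative list:

```perl
use strict;
use warnings;

# Illustrative input; in real code these would come from parsing a log file.
my @log_ips = qw( 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 );

# The first ++ on a new key turns undef into 1; later ones add to it.
my %ip_addresses;
$ip_addresses{$_}++ for @log_ips;
```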

This strategy provides a useful caching mechanism to store the result of an expensive operation with little overhead:

This orcish maneuver (or-cache, if you like puns spelled out) returns the value from the hash, if it exists. Otherwise, it calculates, caches, and returns the value. The defined-or assignment operator (//=) evaluates its left operand. If that operand is not defined, the operator assigns to the lvalue the value of its right operand. In other words, if there's no value in the hash for the given key, this function will call create_user() with the key and update the hash.

Perl 5.10 introduced the defined-or and defined-or assignment operators. Prior to 5.10, most code used the boolean-or assignment operator (||=) for this purpose. Unfortunately, some valid values evaluate to a false value in boolean context, so evaluating the definedness of values is almost always more accurate. This lazy orcish maneuver tests for the definedness of the cached value, not truthiness. You may still see code with the pre-5.10 behavior. When you do, consider whether the defined-or operator makes more sense.
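A sketch of the caching idiom. Here create_user() is a hypothetical stand-in for the expensive operation, fetch_user() is an assumed wrapper name, and the counter exists only to show how often the expensive call runs:

```perl
use strict;
use warnings;

my %user_cache;
my $create_calls = 0;

# Hypothetical stand-in for an expensive operation.
sub create_user {
    my $id = shift;
    $create_calls++;
    return { id => $id };
}

# Orcish maneuver: compute and cache only when no defined value exists.
sub fetch_user {
    my $id = shift;
    $user_cache{$id} //= create_user($id);
    return $user_cache{$id};
}

my $first  = fetch_user(42);
my $second = fetch_user(42);    # served from the cache
```

With ||= in place of //=, a cached value of 0 or the empty string would trigger the expensive call again on every lookup.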

If your function takes several arguments, use a slurpy hash (parameter_slurping) to gather key/value pairs into a single hash as named function arguments:

This approach allows you to set default values:

... or include them in the hash initialization, as later assignments take precedence over earlier assignments:
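Both ways of supplying defaults might look like this. The function names, parameter names, and default values are illustrative assumptions:

```perl
use strict;
use warnings;

# Defaults applied after gathering the named arguments.
sub make_sundae {
    my %parameters = @_;
    $parameters{flavor}  //= 'Vanilla';
    $parameters{topping} //= 'fudge';
    return "$parameters{flavor} with $parameters{topping}";
}

# Defaults listed first; pairs from @_ come later and override them.
sub make_sundae_inline {
    my %parameters = (
        flavor  => 'Vanilla',
        topping => 'fudge',
        @_,
    );
    return "$parameters{flavor} with $parameters{topping}";
}

my $default = make_sundae();
my $custom  = make_sundae_inline( flavor => 'Lemon Burst' );
```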

Locking Hashes

Because hash keys are plain strings, often written as barewords, they offer little typo protection compared to the function and variable name protection offered by the strict pragma. The little-used core module Hash::Util can make hashes safer.

To prevent someone from accidentally adding a hash key you did not intend (whether as a typo or from untrusted user input), use the lock_keys() function to restrict the hash to its current set of keys. Any attempt to add a new key to the hash will raise an exception. Similarly you can lock or unlock the existing value for a given key in the hash (lock_value() and unlock_value()) and make or unmake the entire hash read-only with lock_hash() and unlock_hash().

This is lax security; anyone can use the appropriate unlocking functions to work around the locking. Yet it does protect against typos and other accidental behavior.
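A sketch of key locking with Hash::Util, using an illustrative %config hash. The eval block captures the exception which a mistyped key raises:

```perl
use strict;
use warnings;
use Hash::Util qw( lock_keys unlock_keys );

my %config = ( host => 'localhost', port => 8080 );

# Restrict the hash to its current set of keys.
lock_keys(%config);

# A typo in a key name now raises an exception instead of adding a key.
my $typo_allowed = eval { $config{prot} = 9090; 1 } ? 1 : 0;
my $error        = $@;

# The values of existing keys are still writable.
$config{port} = 9090;

# Anyone can unlock the hash again, so this guards only against accidents.
unlock_keys(%config);
$config{timeout} = 30;
```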
