PowerShell hash tables as accumulators

Here I will attempt to share some of what I think I have come to know about using hash tables in PowerShell as counters and accumulators by using the += operator.

By “accumulator” I mean a mechanism that collects information about objects as they are encountered.

In PowerShell, when you use the += operator to add an entry to a hash table, it operates according to these rules:

* If the key for the hash table entry does not exist, it will create that key in the hash table, with the specified value.

* If the key already exists, the specified value will be added to the value of the existing key, according to the rules for using the += operator on the object type of the existing value.
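The two rules above can be seen in a minimal sketch like the following. The key names and values are made up for illustration; they are not taken from the examples later in this post.

```powershell
# Illustrative only: 'txt' and 'log' are hypothetical keys.
$hash = @{}

# Key 'txt' does not exist yet, so it is created with the value 10.
$hash['txt'] += 10

# Key 'txt' now holds a number, so += performs numeric addition.
$hash['txt'] += 5

# Key 'log' does not exist yet, so it is created holding a one-element array.
$hash['log'] += @('a.log')

# Key 'log' now holds an array, so += appends to that array.
$hash['log'] += @('b.log')

$hash['txt']         # 15
$hash['log'].Count   # 2
```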

The simplest form is a counter. In this example, I do a recursive directory search of the home directory. Initially the hash table is empty. The first time a particular extension is encountered in the objects from the pipeline, a new hash table entry is created and its value incremented to 1. As subsequent files with that extension are encountered, the value of the existing key for that extension is incremented again.

$hash = @{}

get-childitem $home -recurse |
foreach-object {$hash[$_.extension]++}

$hash

The next form is a numeric accumulator. It does the same directory search, but instead of counting the items per extension it sums the total bytes of files with that extension:

$hash = @{}
gci $home -recurse |%{
$hash[$_.extension] += $_.length
}

$hash

Next, an example of accumulating a collection:

$hash = @{}
gci $home -recurse |%{
$hash[$_.extension] += @($_.name)
}

$hash

The value of each key will be a collection of the file names that have that extension.
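To see all three forms side by side with predictable results, here is a small sketch that runs them against in-memory objects instead of the file system. The $files sample data is hypothetical and only stands in for get-childitem output, and the [pscustomobject] syntax assumes PowerShell 3.0 or later.

```powershell
# Hypothetical sample data standing in for Get-ChildItem output.
$files = @(
    [pscustomobject]@{ Name = 'a.txt'; Extension = '.txt'; Length = 100 }
    [pscustomobject]@{ Name = 'b.txt'; Extension = '.txt'; Length = 250 }
    [pscustomobject]@{ Name = 'c.log'; Extension = '.log'; Length = 40 }
)

$count = @{}
$bytes = @{}
$names = @{}

foreach ($f in $files) {
    $count[$f.Extension]++                # counter
    $bytes[$f.Extension] += $f.Length     # numeric accumulator
    $names[$f.Extension] += @($f.Name)    # collection accumulator
}

$count['.txt']             # 2
$bytes['.txt']             # 350
$names['.txt'] -join ','   # a.txt,b.txt
```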

All of these examples can also be accomplished with measure-object or group-object, but those approaches will be much slower and much more memory intensive. The more objects being handled, the more pronounced the difference becomes.
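For comparison, this is roughly what the group-object and measure-object version looks like. It is only a sketch of one way to write it, run against hypothetical in-memory sample data ($files) rather than a real directory tree, and it makes no attempt to measure speed or memory use.

```powershell
# Hypothetical sample data standing in for Get-ChildItem output.
$files = @(
    [pscustomobject]@{ Name = 'a.txt'; Extension = '.txt'; Length = 100 }
    [pscustomobject]@{ Name = 'b.txt'; Extension = '.txt'; Length = 250 }
    [pscustomobject]@{ Name = 'c.log'; Extension = '.log'; Length = 40 }
)

# Group-Object collects every object per extension before anything is
# reported; Measure-Object then sums Length within each group.
$report = $files | Group-Object Extension | ForEach-Object {
    [pscustomobject]@{
        Extension = $_.Name
        Count     = $_.Count
        Bytes     = ($_.Group | Measure-Object Length -Sum).Sum
        Names     = @($_.Group | ForEach-Object { $_.Name })
    }
}

$report
```

Note that each group holds on to the full original objects, whereas the hash table versions keep only a number or a name per file.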

Some notes on syntax, resolution, and unintended consequences:
Hash tables are not arrays, and their entries are not object properties, but PowerShell allows you to treat them syntactically as either one. That is to say, a hash table entry can be resolved either with the array index operator (square brackets), as you would an element of an array, or with the dot operator, as you would a property of an object.

$hash = @{
key1 = 1
key2 = 2
key3 = 3
}

Both of these will now produce “1”.

$hash['key1']
$hash.key1
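Both notations also work when the key is held in a variable or is not a plain identifier, which matters for the accumulators above because file extensions start with a dot. A small sketch, using a made-up table:

```powershell
# Illustrative table; the keys and values are hypothetical.
$hash = @{ '.txt' = 2; 'key1' = 1 }
$key  = '.txt'

$hash[$key]      # 2, index notation with a variable
$hash.$key       # 2, dot notation with a variable
$hash.'.txt'     # 2, dot notation with a quoted key name
```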

It also allows you to shoot yourself in the foot.

Hash tables do have a few properties of their own; the most commonly used are keys, values, and count. Keys and values return a list of the key names and their values, respectively. Count returns the number of entries in the table. Since dot notation can be used to resolve key values, there is the potential for a name conflict between the properties of a hash table and its key names.

$hash = @{
key1 = 1
key2 = 2
key3 = 3
}

Write-host "There are $($hash.count) entries in the hash table"

There are 3 entries in the hash table




$hash = @{
key1 = 1
key2 = 2
key3 = 3
count = 'monkeywrench'
}

Write-host "There are $($hash.count) entries in the hash table"

There are monkeywrench entries in the hash table

If you’re creating hash table keys dynamically from incoming data, and there is any chance a value in the data could result in a key that matches a property name of a hash table, that property becomes inaccessible through the dot operator. The dot operator will resolve to the value of the hash table key with that name, not the property of the hash table object.

To avoid the unintended consequences of having your script encounter data that would create a key name matching one of those property names, use getenumerator() to find key names, values and counts in your script.

#Produces a list of key names, use in place of $hash.keys
$hash.getenumerator() | select -expand name

#Produces a list of key values, use in place of $hash.values
$hash.getenumerator() | select -expand value

#Produces a count of the entries, use in place of $hash.count
@($hash.getenumerator()).count
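
As a quick check, here is a sketch that runs these against a table whose keys deliberately collide with the property names. The table contents are made up. The last line uses the psbase member, which is not part of the approach described above; it is shown only as another way I believe you can reach the underlying properties.

```powershell
# Hypothetical table with keys that collide with Hashtable property names.
$hash = @{
    key1  = 1
    key2  = 2
    count = 'monkeywrench'
    keys  = 'oops'
}

$hash.count      # monkeywrench (the key's value, not the entry count)

# The enumerator yields one entry per key, regardless of the key names.
$names   = $hash.getenumerator() | select -expand name
$entries = @($hash.getenumerator()).count
$entries         # 4

# Alternative (an addition, not from the text above): psbase bypasses key lookup.
$hash.psbase.Count   # 4
```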
                

9 responses to “PowerShell hash tables as accumulators”

  1. thats a great little trick, i love it.

  2. I like it. Very neat trick.

  3. I like the monkey preventer, It is helpful.

    This is partly why PSObject is much better and safer. A hash is useful for easily defining a PSObject.

    Good blog.

    • You might well want to create a PSObject from a hash created this way, after you’re done populating it. You could do the same thing with a PSObject that this is doing with a hash table, using some if-then logic and add-member, but the performance hit from the overhead could make it impractical if it’s going to have to scale to handling tens or hundreds of thousands of objects.

      • Not really.

        Add-Member uses more overhead than a hash declaration.
        New-Object PSObject -property $myhash
        is a very efficient mechanism, if we build the hash in one shot:

        $myhash=@{Name='';Prop1='';prop2=''…etc}

        I generally do this at the beginning of the loop and then assign the values as I gather information. This helps to guarantee that the objects are homogeneous. I then end the loop with the object generation.

        Of course sometimes all we need is a simple name=value list. In that case the object overhead would be an issue.

      • “Building the hash in one shot” implies you already know what the key names will be. The advantage of using this method is that the key names don’t have to be known in advance. They will be created as they are needed.

  4. Pingback: https://mjolinor.wordpress.com/2012/01/29/powershell-hash-tables-as-accumulators/ | becknspace

  5. Excellent article. Thanks Mjolinor. Just what I needed

  6. Pingback: Powershell hash tables as accumulators | The Powershell Workbench | AA Tech Blog
