Another take on using the += operator

In this post (http://powershell.org/wp/2013/09/16/powershell-performance-the-operator-and-when-to-avoid-it/), Dave Wyatt discusses the performance problems of using the += operator to populate arrays and to concatenate strings repeatedly. I can’t argue with his points on string concatenation, or with his proposed solution of using a StringBuilder, but there is a much easier way to build that array.

When he calls the StringBuilder’s Append method, Dave is rightly careful to send its output to $null so it doesn’t pollute the pipeline, and his script includes a comment to that effect. So there’s a perfectly good pipeline right there that isn’t being used at all.
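To see why that suppression matters once you start capturing a loop’s output, here is a minimal sketch (the variable names $sb, $leaky, and $clean are only illustrative). StringBuilder.Append() returns the StringBuilder itself, so an unsuppressed call writes an extra object to the pipeline on every iteration:

```powershell
# StringBuilder.Append() returns the StringBuilder instance, so every
# unsuppressed call writes that object to the output stream.
$sb = New-Object System.Text.StringBuilder

# Without suppression: each iteration emits the StringBuilder AND the string.
$leaky = for ($i = 0; $i -lt 3; $i++) {
    $sb.Append("x")
    "Array Element $i"
}

# With suppression: assigning to $null discards Append()'s return value.
$clean = for ($i = 0; $i -lt 3; $i++) {
    $null = $sb.Append("x")
    "Array Element $i"
}

$leaky.Count   # 6 (three references to the StringBuilder plus three strings)
$clean.Count   # 3 (only the strings)
```

Casting the call to [void] or piping it to Out-Null are other common ways to discard the value; assigning to $null is the form used in both scripts here.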

When you use assignment to send the output of an expression to a variable on the left-hand side (what’s referred to as “indirection” in the specification documents), PowerShell automatically creates an array for you if the expression returns more than one object. Rather than creating a generic collection or list, adding individual elements to it, and then converting it to an array, you can simply write the elements to the pipeline, use indirection to send them to your array variable, and let PowerShell do what it does.
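A quick way to see this behavior, as a sketch: capture the output of a few small for loops and check what PowerShell stored. One caveat worth knowing is that the automatic array only appears when there are two or more objects; a single object is stored as-is, and no output at all leaves the variable $null, which is why the @() array subexpression operator is often used to force an array.

```powershell
# Several output objects are collected into an object[] array.
$many = for ($i = 0; $i -lt 3; $i++) { "Array Element $i" }
$many.GetType().FullName      # System.Object[]
$many.Count                   # 3

# Caveat: a single output object is stored as-is, not wrapped in an array.
$one = for ($i = 0; $i -lt 1; $i++) { "Array Element $i" }
$one.GetType().FullName       # System.String

# No output at all leaves the variable equal to $null.
$none = for ($i = 0; $i -lt 0; $i++) { "Array Element $i" }
$null -eq $none               # True

# Wrapping the loop in @() always yields an array, even for 0 or 1 items.
$alwaysArray = @(for ($i = 0; $i -lt 1; $i++) { "Array Element $i" })
$alwaysArray.Count            # 1
```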

So, rather than doing this:


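# Accumulate into a StringBuilder and a generic List, then convert the List
$stringBuilder = New-Object System.Text.StringBuilder
$list = New-Object System.Collections.Generic.List[System.String]

for ($i = 0; $i -lt 10000; $i++)
{
    # Append() returns the StringBuilder, so discard it to keep the pipeline clean
    $null = $stringBuilder.Append("Line $i`r`n")
    $list.Add("Array Element $i")
}

$outputString = $stringBuilder.ToString()
$array = $list.ToArray()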

You can just do this:

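$stringBuilder = New-Object System.Text.StringBuilder

# Everything the loop writes to the pipeline is collected into $array
$array =
    for ($i = 0; $i -lt 10000; $i++)
    {
        # Discard Append()'s return value so it doesn't end up in $array
        $null = $stringBuilder.Append("Line $i`r`n")
        # This string goes to the pipeline and becomes an array element
        "Array Element $i"
    }

$outputString = $stringBuilder.ToString()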

That is, you output your array elements to the pipeline and use indirection to send them back to your array variable, which PowerShell automatically creates as an array.
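One difference from the List version: the captured array is a System.Object[], while List[String].ToArray() returns a System.String[]. If the element type matters to later code, a type constraint on the variable should convert the collected output. A minimal sketch, with $typed as an illustrative name:

```powershell
# The pipeline-captured array is a plain object[] ...
$array = for ($i = 0; $i -lt 3; $i++) { "Array Element $i" }
$array.GetType().FullName        # System.Object[]

# ... while a [string[]] type constraint converts the collected output to string[].
[string[]]$typed = for ($i = 0; $i -lt 3; $i++) { "Array Element $i" }
$typed.GetType().FullName        # System.String[]
```

The constraint stays on the variable, so later assignments to $typed are converted to string[] as well.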

Here’s the performance test:



Write-Host "Using += operators:"

$outputString = ""
$array = @()

Measure-Command {
    for ($i = 0; $i -lt 10000; $i++)
    {
        $outputString += "Line $i`r`n"
        $array += "Array Element $i"
    }
} | Select-Object TotalMilliseconds | Format-List

Write-Host "Using StringBuilder and List: (array)"

$stringBuilder = New-Object System.Text.StringBuilder
$list = New-Object System.Collections.Generic.List[System.String]

Measure-Command {
    for ($i = 0; $i -lt 10000; $i++)
    {
        $stringBuilder.Append("Line $i`r`n")
        $list.Add("Array Element $i")
    }

    $outputString = $stringBuilder.ToString()
    $array = $list.ToArray()
} | Select-Object TotalMilliseconds | Format-List

Write-Host "Using StringBuilder and indirection (array)"

Measure-Command {
    $stringBuilder = New-Object System.Text.StringBuilder

    $array =
        for ($i = 0; $i -lt 10000; $i++)
        {
            $null = $stringBuilder.Append("Line $i`r`n")
            "Array Element $i"
        }

    $outputString = $stringBuilder.ToString()
} | Select-Object TotalMilliseconds | Format-List

And the result:

Using += operators:

TotalMilliseconds : 8212.8541

Using StringBuilder and List: (array)

TotalMilliseconds : 124.5424

Using StringBuilder and indirection (array)

TotalMilliseconds : 91.4757

The indirection code is much simpler, and in this test it also ran faster than the version using the intermediate generic collection (about 91 ms versus 125 ms, compared with more than 8 seconds for +=).


4 responses to “Another take on using the += operator”

  1. That’s a good point! We recommend that people don’t accumulate objects into arrays at all when writing a function, preferring to output the objects as they’re created, one at a time. This gives the caller the option to stream them via the pipeline, or to store the results to a variable (which PowerShell will then convert to an array, if needed). I hadn’t considered using that same approach in a local block of code, back when I wrote that post.

    I’m a little surprised that the performance difference was so marked between building your own List and letting PowerShell do it (since it’s probably using List or ArrayList under the hood anyway). I suspect it’s something to do with the overhead of how PowerShell finds the List.Add() method at execution time.

  2. Thanks!
    Was going to ping you before I posted that, but couldn’t find any contact information.

  3. Pingback: Arrays and generic collections in Powershell | The Powershell Workbench

  4. Perfect explanation, thanks! (still valid in 2017 ^^)
