Disappointed in speed of PowerShell when processing text files
Last Post 18 May 2008 04:27 PM by bruceatk. 3 Replies.
bruceatk
New Member
Posts:12

--
13 May 2008 03:06 PM  

I've been learning PowerShell for a couple of months now. I am a long-time user of batch/scripting languages, and I have been looking into PowerShell as the "one" language for most of my command-line/scripting work. A recent task has shown me a weakness in PowerShell that will keep me from using it for some very common things that I have to do.

I have to convert several spreadsheets into an 80-column card format for input into a legacy system. I figured I would do it in PowerShell. It was easy to do and I liked the simplicity of the command line that I ended up with. The command took about 13 seconds for 4000 records when sending the output to the console. When I redirected the output to a file it still took about 11 seconds. When I do the same thing in VB or even VBScript it runs virtually instantaneously, so it's not even worth measuring. Scale that up to several files appended together, adding up to several hundred thousand lines, and the difference is so large that I can't use PowerShell for the task.

import-csv pat_delete.csv | `
    where-object{write-host -separator '' `
        "12 4" `
        $_.id_number `
        ($_.tcode+"      ").substring(0,6) `
        'CP' `
        $_.pcode `
        $_.pcode}

The input file is a comma-delimited file and it needs to be reformatted. Am I doing something wrong? I am used to getting a tremendous speed improvement when I redirect to a file. I tried using Out-File and I didn't get much of an improvement; it only drops it to 10 seconds.

It makes a task that should take under a minute take over 15 minutes. I have an Intel Core 2 Duo workstation and the 4000-line test file is only 200k.

Bruce

bruceatk
New Member
Posts:12

--
16 May 2008 01:21 AM  

I guess I'm the only one interested in this problem. :)

I was very disappointed and I found it difficult to believe that it could be that slow. Since, in my experience, so many commands are very quick, I figured there had to be a PowerShell solution to speed it up.

Thanks to the Measure-Command cmdlet, I timed the different pieces and determined that the time was all going to write-host. I looked at how I could eliminate the write-host command and settled on the following:

$ParsedFile = import-csv pat_delete.csv
$OutFile = @()
foreach ($x in $ParsedFile)
    {$OutFile += `
    "12 4"+`
    $x.id_number+`
    ($x.tcode+"      ").substring(0,6)+`
    (get-date $x.tdate -uFormat %Y%m%d)+`
    'CP'+`
    $x.pcode+`
    $x.pcode
    }
Set-Content -value $OutFile TestOut.txt

I couldn't run it on the same workstation, and the one I'm on now (AMD 6400 X2) is slower for this script: the original script, against the same 4000 records, runs in 21.754 seconds here.

The updated script, not using write-host, now runs in 3.8 seconds.

Original using write-host:                 TotalSeconds : 21.7544045
Modified using foreach with Set-Content:   TotalSeconds : 3.8205268
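
For anyone wanting to repeat this kind of comparison, here is a minimal sketch of timing the two pieces separately with Measure-Command. The sample rows are made up to stand in for pat_delete.csv (only the field names come from the script above), and the absolute numbers will depend on the host and machine, so treat it as a way to compare rather than as a benchmark.

```powershell
# Made-up sample rows standing in for pat_delete.csv (assumption: only the
# field names id_number, tcode and pcode are taken from the script above).
$rows = 1..200 | ForEach-Object {
    New-Object PSObject -Property @{
        id_number = "{0:D6}" -f $_
        tcode     = "AB"
        pcode     = "X1"
    }
}

# Piece 1: build each line and send it to the console with Write-Host.
$hostTime = Measure-Command {
    foreach ($r in $rows) {
        Write-Host ("12 4" + $r.id_number + ($r.tcode + "      ").Substring(0, 6) + 'CP' + $r.pcode + $r.pcode)
    }
}

# Piece 2: build the same line but discard it, so only string building is timed.
$buildTime = Measure-Command {
    foreach ($r in $rows) {
        $null = "12 4" + $r.id_number + ($r.tcode + "      ").Substring(0, 6) + 'CP' + $r.pcode + $r.pcode
    }
}

# Report both timings side by side.
"Write-Host  : {0:N3} s" -f $hostTime.TotalSeconds
"String only : {0:N3} s" -f $buildTime.TotalSeconds
```

Measure-Command returns a TimeSpan, which is where the TotalSeconds figures quoted above come from.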

I also tried the faster version with where-object in place of foreach, but it took 4.1 seconds. If someone has already figured all this out I would love a link to it.

It's not instantaneous but it certainly is much better.  It goes to show it doesn't hurt to try doing things in different ways.  You might be surprised.
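
One more variation that may be worth trying, offered as a sketch rather than a measured result: PowerShell arrays are fixed size, so `$OutFile +=` builds a new array on every pass through the loop. Assigning the output of the foreach statement directly to a variable avoids that. The helper name Format-CardLine and the sample rows are hypothetical, the date field is left out for brevity, and the output goes to the temp folder instead of the current directory.

```powershell
# Hypothetical helper (not part of the original script): builds one card line.
# PadRight(6).Substring(0, 6) pads or truncates tcode to exactly six characters,
# the same effect as ($x.tcode + "      ").Substring(0, 6).
function Format-CardLine($rec) {
    "12 4" + $rec.id_number + ([string]$rec.tcode).PadRight(6).Substring(0, 6) + 'CP' + $rec.pcode + $rec.pcode
}

# Made-up sample data standing in for: Import-Csv pat_delete.csv
$rows = @"
id_number,tcode,pcode
000123,AB,X1
000456,LONGCODE,Y2
"@ | ConvertFrom-Csv

# Collect the loop output in one assignment instead of growing an array with +=.
$lines = foreach ($x in $rows) { Format-CardLine $x }

# Write everything in a single call, as in the script above.
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'TestOut.txt'
Set-Content -Path $outPath -Value $lines
```

Whether this beats the 3.8 seconds above would need to be timed on the real 4000-record file.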

Thanks,

Bruce

halr9000
Basic Member
Posts:303

--
16 May 2008 02:05 PM  
Wow, that's an amazing difference. I have to admit, I would not have done it the way you did at first. Also, instead of write-host, if you use write-output, then you can always run the whole pipeline through one of the out-* cmdlets to change the ultimate destination.

What about something like this?

$parsed = import-csv $file
$parsed | foreach-object {
"12 4" + $_.id_number + " "*6 + $_.tcode
} | out-file $outputfile # or leave pipeline off to output to screen

P.S. I saw this thread when you posted it and meant to reply but just forgot. :(
bruceatk
New Member
Posts:12

--
18 May 2008 04:27 PM  

Using out-file from the pipeline is much faster than redirecting write-host, but not as fast as saving the output into a variable and then writing it.

I poked around and found some different options mentioned in "PowerShell in Action". $host.ui.WriteLine has been the fastest so far: 3.3 seconds outputting to the screen and 2.1 seconds to a file. That makes sense, since Write-Host wraps the $host.ui API.
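
For reference, a minimal sketch of what the $host.ui.WriteLine approach might look like. The single sample row is made up, and the line layout is copied from the first script in this thread (without the date field); the exact script that produced the timings above is not shown here, so this is an assumption about its shape.

```powershell
# Made-up sample row standing in for: Import-Csv pat_delete.csv
$rows = @"
id_number,tcode,pcode
000123,AB,X1
"@ | ConvertFrom-Csv

foreach ($x in $rows) {
    # Build the card line as a single string first.
    $line = "12 4" + $x.id_number + ($x.tcode + "      ").Substring(0, 6) + 'CP' + $x.pcode + $x.pcode
    # Write it straight through the host UI instead of going via Write-Host.
    $Host.UI.WriteLine($line)
}
```

Like Write-Host, this writes to the host and not to the pipeline, so the lines cannot be piped on to Out-File or Set-Content.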

Now that the speed is acceptable, I'm running into issues with redirection that don't make sense to me. I can't redirect the output in PowerShell, but I can from cmd.exe. I will start a new thread on it.

Bruce
