Monday, February 9, 2009

PowerShell<->Perl: Reading UNICODE files

Problem Statement:
Recently, I faced a tricky situation at work. I had a CSV file generated by a PowerShell script, and I was trying to read it and plot a graph using the Perl Tk module. To my surprise, none of the values from the CSV file were plotted on the graph! Why would this happen?

Reason:

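After some digging, I found that this was because PowerShell had written the CSV file in "Unicode" format, which in PowerShell means UTF-16. Perl opens such a file without complaint, but by default it reads the content as plain bytes and so can't make sense of it. This is because Unicode itself is a character set, not a character encoding: the file on disk is always stored in some specific encoding such as UTF-16 or UTF-8, and Perl has to be told which one. Hence the issue!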
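
A quick way to confirm this kind of problem is to look at the first few bytes of the file for a byte order mark (BOM). The following is only a rough sketch: the helper name guess_encoding_from_bom and the sample bytes are made up for illustration, and a file without a BOM will simply come back as 'unknown'.

```perl
use strict;
use warnings;

# Hypothetical helper: map the leading bytes of a file to a likely encoding.
# Only the most common BOMs are checked; this is not a full detector.
sub guess_encoding_from_bom {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /^\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /^\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /^\xFE\xFF/;
    return 'unknown';
}

# Illustrative byte strings (assumed sample data, not taken from a real file).
my $utf16le_sample = "\xFF\xFE" . "N\x00a\x00";    # BOM, then "Na" in UTF-16LE
my $utf8_sample    = "\xEF\xBB\xBF" . "Na";        # BOM, then "Na" in UTF-8
my $plain_sample   = "Name,Value";                 # no BOM at all

print guess_encoding_from_bom($utf16le_sample), "\n";
print guess_encoding_from_bom($utf8_sample),    "\n";
print guess_encoding_from_bom($plain_sample),   "\n";
```

To use this on a real file, one would open it with the "<:raw" layer, read the first three bytes with read(), and pass them to the helper.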

Resolution:
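In such cases, one needs to tell Perl which encoding the file uses by adding an I/O layer to the open call. For a UTF-8 file, we open it as:
open(FH, "<:utf8", "filepath");
OR
open(FH, "<:encoding(UTF-8)", "filepath");

Note that what PowerShell calls "Unicode" is UTF-16 (little-endian, with a byte order mark), so for such a file the layer would be "<:encoding(UTF-16)" instead. Thus, Unicode text files are read exactly the same way that other files are read: by specifying a text encoding.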
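
Here is a minimal sketch of the whole round trip, assuming the file is UTF-16 with a byte order mark (which is what PowerShell's "Unicode" encoding writes). The sample rows, the temporary file, and the helper name read_csv_rows are made up for illustration; the ":raw" layer is added first so that Windows line-ending translation does not interfere with the two-byte characters.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file the way a "Unicode" (UTF-16LE + BOM) writer would.
my ($tmp_fh, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);
close $tmp_fh;

open(my $out, '>:raw:encoding(UTF-16LE)', $path) or die "Cannot write $path: $!";
print {$out} "\x{FEFF}";                              # byte order mark
print {$out} "Name,Value\n", "cpu,42\n", "mem,17\n";  # assumed sample data
close $out;

# Hypothetical helper: read a simple CSV file in the given encoding and
# return one array reference per row. The naive split on commas is only
# suitable for files without quoted fields.
sub read_csv_rows {
    my ($file, $encoding) = @_;
    open(my $in, "<:raw:encoding($encoding)", $file)
        or die "Cannot open $file: $!";
    my @rows;
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;             # strip LF or CRLF line endings
        push @rows, [ split /,/, $line ];
    }
    close $in;
    return @rows;
}

# "UTF-16" (without LE/BE) uses the BOM to pick the byte order and drops it.
my @rows = read_csv_rows($path, 'UTF-16');
print join('|', @$_), "\n" for @rows;
```

For real-world CSV data with quoted fields, a proper parser such as the Text::CSV module from CPAN would be a safer choice than split.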

2 comments:

jsnover said...

BTW - Export-CSV has an -Encoding switch so you can get ascii if you want it.
Export-Csv
(0)-Path | -PSPath (String)
-InputObject (PSObject) (ByValue)
(1)[-Delimiter (Char)]
[-Confirm | -cf ]
[-Encoding (ASCII | BigEndianUnicode | Default | OEM | Unicode | UTF32 | UTF7 | UTF8)]
[-Force ]
[-NoClobber ]
[-NoTypeInformation | -NTI ]
[-WhatIf | -wi ]

Export-Csv

Experiment! Enjoy! Engage!

Jeffrey Snover [MSFT]
Windows Management Partner Architect
Visit the Windows PowerShell Team blog at: http://blogs.msdn.com/PowerShell
Visit the Windows PowerShell ScriptCenter at: http://www.microsoft.com/technet/scriptcenter/hubs/msh.mspx

Chetan Giridhar said...

That looks good...Thanks Jeff!