Archive for August 6th, 2012

New in PowerShell 3: Parse HTML without IE object (unless a local file)

Remember how in PowerShell v1 and v2 we used to have to create Internet Explorer object each time we wanted to parse HTML page? This kind of works but has a few inconveniences such as having to insert Start-Sleep every now and then because IE can be busy and fail if you request too much from it too quickly.

In PowerShell v3, for web pages, things become much easier. Just do:

$p = Invoke-WebRequest "https://dmitrysotnikov.wordpress.com"

And $p.ParsedHtml.body will let you iterate though all web page elements!

However, there is a scenario in which you will have to revert to the old IE ways – local files. If the HTML file is on your local disk, $p will not have the ParsedHtml property. And you will have to use the IE COM object like you did in earlier versions of PowerShell:

$ie = new-object -com "InternetExplorer.Application"
# The easiest way to accomodate for slowness of IE
Start-Sleep -Seconds 1
$ie.Navigate("D:\SavedPage.htm")
# The easiest way to accomodate for slowness of IE
Start-Sleep -Seconds 1
$ParsedHtml = $ie.Document

Happy scripting!


My Recent Tweets

Legal

The posts on this blog are provided “as is” with no warranties and confer no rights. The opinions expressed on this site are mine and mine alone, and do not necessarily represent those of my employer - WSO2 or anyone else for that matter. All trademarks acknowledged.

© 2007-2014 Dmitry Sotnikov

August 2012
M T W T F S S
« Apr   Nov »
 12345
6789101112
13141516171819
20212223242526
2728293031  

%d bloggers like this: