New in PowerShell 3: Parse HTML without IE object (unless a local file)

Remember how in PowerShell v1 and v2 we had to create an Internet Explorer object each time we wanted to parse an HTML page? This kind of works, but it has a few inconveniences, such as having to insert Start-Sleep every now and then, because IE can be busy and fail if you request too much from it too quickly.

In PowerShell v3, for web pages, things become much easier. Just do:

$p = Invoke-WebRequest "https://dmitrysotnikov.wordpress.com"

And $p.ParsedHtml.body will let you iterate through all the web page elements!
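As a rough sketch of what that iteration can look like, the function below lists the text and target of every link on a page. It assumes Windows PowerShell 3.0 to 5.1 on a machine where Internet Explorer is available (ParsedHtml is not populated otherwise). The function name Get-PageAnchor and the output property names are made up for this illustration; getElementsByTagName, innerText and href come from the IE DOM.

```powershell
# Hypothetical helper: list the links found in a page's parsed DOM.
function Get-PageAnchor {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Uri
    )

    $p = Invoke-WebRequest -Uri $Uri

    # ParsedHtml exposes the IE DOM, so the usual DOM calls are available.
    $p.ParsedHtml.getElementsByTagName("a") | ForEach-Object {
        [pscustomobject]@{
            Text = $_.innerText
            Href = $_.href
        }
    }
}
```

Calling Get-PageAnchor "https://dmitrysotnikov.wordpress.com" should then emit one object per anchor element, which you can pipe to Where-Object or Select-Object as usual.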

However, there is one scenario in which you will have to revert to the old IE ways: local files. If the HTML file is on your local disk, $p will not have the ParsedHtml property, and you will have to use the IE COM object like you did in earlier versions of PowerShell:

$ie = New-Object -ComObject "InternetExplorer.Application"
# The easiest way to accommodate the slowness of IE: give it a moment to start
Start-Sleep -Seconds 1
$ie.Navigate("D:\SavedPage.htm")
# Same trick again: give IE a moment to load the page
Start-Sleep -Seconds 1
$ParsedHtml = $ie.Document
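If the fixed one-second sleeps turn out to be too short or too long, one alternative is to poll the browser object until it reports that it has finished loading. The sketch below is only one way to do that: Wait-IEReady and its parameters are hypothetical names, while Busy and ReadyState are properties of the InternetExplorer.Application object (a ReadyState of 4 means the document is complete).

```powershell
# Hypothetical helper: wait until IE has finished loading, or give up after a timeout.
function Wait-IEReady {
    param(
        [Parameter(Mandatory = $true)]
        $Browser,
        [int] $TimeoutSeconds = 30
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

    # ReadyState 4 is READYSTATE_COMPLETE
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        if ((Get-Date) -gt $deadline) { return $false }
        Start-Sleep -Milliseconds 100
    }
    return $true
}

# Example use (needs Windows with Internet Explorer installed):
# $ie = New-Object -ComObject "InternetExplorer.Application"
# $ie.Navigate("D:\SavedPage.htm")
# if (Wait-IEReady -Browser $ie) { $ParsedHtml = $ie.Document }
```

The function returns $false on timeout instead of throwing, so the caller can decide whether to retry. Remember to call $ie.Quit() once you are done with the document, otherwise the IE process stays around.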

Happy scripting!

7 Responses to “New in PowerShell 3: Parse HTML without IE object (unless a local file)”


  1. Dave Carnahan, August 7, 2012 at 4:19 am

    ok – great!
    Have you tried this with FTP (file transfers) as well?

    Thanks!

  2. Other (@dvsbobloblaw), April 18, 2013 at 9:46 pm

    What about sites that don’t have a valid certificate? Say for example, I wanted to get the information from my printers that have a web interface.

  3. Anonymous, November 14, 2013 at 8:33 am

    Help please!
    When I use the Invoke-WebRequest cmdlet, sometimes it cannot parse the HTML either, even though I am sure it is an online site and not a local file.
    For example: Invoke-WebRequest http://windows.microsoft.com

  4. Brian Scholer, February 27, 2014 at 2:12 am

    Invoke-WebRequest, under the hood, still requires and uses Internet Explorer. In fact, if IE is not available (like on Server Core), or you don’t have permissions (using an account like NETWORK SERVICE or LOCAL SERVICE), then you must call it with -UseBasicParsing, which returns raw HTML instead of a parsed DOM.
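To illustrate the -UseBasicParsing route described in the comment above, here is a minimal sketch that fetches a page without the IE-based parser and still pulls out its link targets. Get-PageLinkBasic is a made-up name for this example. It relies on the basic response object still exposing Content (the raw HTML) and a Links collection, even though ParsedHtml is not available in this mode.

```powershell
# Hypothetical helper: fetch a page without the IE DOM parser and return link targets.
function Get-PageLinkBasic {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Uri
    )

    # -UseBasicParsing skips the IE-based DOM parsing, so there is no ParsedHtml here.
    $r = Invoke-WebRequest -Uri $Uri -UseBasicParsing

    # $r.Content holds the raw HTML; Links is still extracted in basic mode.
    $r.Links | ForEach-Object { $_.href }
}
```

For example, Get-PageLinkBasic "https://dmitrysotnikov.wordpress.com" should work under accounts or on servers where IE cannot be used, at the cost of not having a DOM to walk.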


© 2007-2014 Dmitry Sotnikov
