Remember how in PowerShell v1 and v2 we had to create an Internet Explorer object each time we wanted to parse an HTML page? This kind of works, but it has a few inconveniences, such as having to insert Start-Sleep every now and then because IE can be busy and fail if you request too much from it too quickly.
In PowerShell v3, for web pages, things become much easier. Just do:
$p = Invoke-WebRequest "https://dmitrysotnikov.wordpress.com"
And $p.ParsedHtml.body
will let you iterate through all web page elements!
However, there is a scenario in which you will have to revert to the old IE ways – local files. If the HTML file is on your local disk, $p
will not have the ParsedHtml
property. And you will have to use the IE COM object like you did in earlier versions of PowerShell:
$ie = New-Object -ComObject "InternetExplorer.Application"
# The easiest way to accommodate IE's slowness: pause while it starts up
Start-Sleep -Seconds 1
$ie.Navigate("D:\SavedPage.htm")
# Pause again so the page has time to load
Start-Sleep -Seconds 1
$ParsedHtml = $ie.Document
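If the fixed one-second pauses turn out to be too short or too long, one alternative is to poll IE until it reports that it is idle. The sketch below is only an illustration: Wait-IEReady is a hypothetical helper name that is not part of PowerShell, and it assumes the object passed in exposes the Busy and ReadyState properties, as the InternetExplorer.Application COM object does.

```powershell
# Hypothetical helper (not a built-in cmdlet): waits until the browser object
# is no longer busy and reports ReadyState 4 (READYSTATE_COMPLETE).
function Wait-IEReady {
    param(
        [Parameter(Mandatory)] $Browser,
        [int] $TimeoutSeconds = 30
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        # Give up instead of looping forever if the page never finishes loading
        if ((Get-Date) -gt $deadline) { return $false }
        Start-Sleep -Milliseconds 100
    }
    return $true
}

# Usage on a machine with Internet Explorer installed:
# $ie = New-Object -ComObject "InternetExplorer.Application"
# $ie.Navigate("D:\SavedPage.htm")
# if (Wait-IEReady $ie) { $ParsedHtml = $ie.Document }
# $ie.Quit()
```

The helper returns $false on timeout, so the caller decides whether to retry or give up. Calling $ie.Quit() at the end keeps stray IE processes from piling up.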
Happy scripting!
ok – great!
Have you tried this with FTP (file transfers) as well?
Thanks!
Yes, it supports FTP just fine. See http://technet.microsoft.com/en-us/library/hh849901.aspx
What about sites that don’t have a valid certificate? Say for example, I wanted to get the information from my printers that have a web interface.
I don’t think this cmdlet is checking certificates. I might be wrong though.
Help, please!
When I use the Invoke-WebRequest cmdlet, sometimes it cannot parse the HTML either, even though I am sure it is an online site and not a local file.
For example: Invoke-WebRequest http://windows.microsoft.com
Not sure what’s going on with that one… Maybe redirections are affecting the cmdlet.
Invoke-WebRequest, under the hood, still requires and uses Internet Explorer. In fact, if IE is not available (like on Server Core), or you don’t have permissions (using an account like NETWORK SERVICE or LOCAL SERVICE), then you must call it with -UseBasicParsing, which returns raw HTML instead of a parsed DOM.
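As a rough illustration of working without the parsed DOM, the sketch below pulls the page title out of the raw HTML string that -UseBasicParsing leaves in the Content property. Get-HtmlTitle is a hypothetical helper name, not a built-in cmdlet, and a regular expression is only a crude stand-in for a real HTML parser.

```powershell
# Hypothetical helper (not a built-in cmdlet): extracts the <title> text from raw HTML.
function Get-HtmlTitle {
    param([Parameter(Mandatory)][string] $Html)
    # (?is) = ignore case, and let "." match newlines inside the title
    $match = [regex]::Match($Html, '(?is)<title[^>]*>(.*?)</title>')
    if ($match.Success) { $match.Groups[1].Value.Trim() } else { $null }
}

# Usage (needs network access; no IE engine involved):
# $p = Invoke-WebRequest "https://dmitrysotnikov.wordpress.com" -UseBasicParsing
# Get-HtmlTitle $p.Content
```

With -UseBasicParsing the response also has a Links collection, so simple tasks such as listing every href do not need the DOM.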