Parsing LinkedIn html pages with PowerShell

A couple of weeks ago we posted a job opening on LinkedIn (were looking for a person to be in charge of our Jelastic‘s professional services), and it turned out that while LinkedIn jobs attract a lot of applications, the site itself does not make it easy to process them afterwards. You get CVs in email, and they also post a list of applicants with email addresses, phone numbers, titles, etc. – but there is no way to export the list to, say, Excel. In our case, we really wanted to have the data exported, so we could jointly work on a shared spreadsheet and everyone involved could grade each applicant and add notes to the table.

Being a PowerShell guy, I wrote the script below that does the scraping for me. :) Basically, I just saved the page with the list of applicants to my local disk and found that in their html, each applicant information is contained in vcard element, which has class name with LinkedIn URL and the actual name, and then elements with email and phone number:

So all my script has to do is: create an IE object and then use it to find the corresponding fields, then create custom objects from them, add them to the collection, and export it to CSV. Here’s the code – hope it helps you solve similar tasks when other sites do not provide good export capabilities:

$ie = new-object -com "InternetExplorer.Application"

# The easiest way to accomodate for slowness of IE
Start-Sleep -Seconds 1

$ie.Navigate("D:\Temp\LinkedIn.htm")

# The easiest way to accomodate for slowness of IE
Start-Sleep -Seconds 1

$doc = $ie.Document

# Get a collection of vcard elements
$cards = $doc.body.getElementsByClassName("vcard")

# This will be our collection of parsed objects
$processesCards = @()

# Iterate through the collection
for ($i=0; $i -lt $cards.length; $i++) {

 $itm = $cards.item($i)

 # Get the 'name' element that has the applicant name and URL
 $name = $itm.getElementsByClassName("name").item(0).
                       getElementsByTagName("a").item(0)

 # If you want you can output the name to the screen 
 # so you know where you are
 $name.outerText

 # Get the phone number and email address
 $phone = $itm.getElementsByClassName("phone").item(0)
 $email = `
   $itm.getElementsByClassName("trk-applicant-email").item(0)

 # Below is PowerShell v3 notation. 
 # In v2, replace '[pscustomobject]' with 
 # 'new-object psobject -Property' 
 $obj = [pscustomobject] @{"name"=$name.outerText; 
                           "url"=$name.href; 
                           "email"=$email.outerText; 
                           "phone"= $phone.outerText }

 $processesCards += $obj

}

# Export to CSV - which you can open in Excel
$processesCards | Export-Csv D:\Temp\linkedin.csv 
About these ads

1 Response to “Parsing LinkedIn html pages with PowerShell”


  1. 1 Coast to Coast May 25, 2013 at 6:32 pm

    This slim and light tablet comes with specifications that provides high computing power.

    The built quality on the XPS is also a solid one with aluminum carbon
    chassis, giving it a sturdy, premium look. if you’re under 120 lbs, unless you’re really short you are NOT overweight.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s




My Recent Tweets

RSS My company’s blog

  • Meet our iPad2 Winner, Bruce Burke
    Last month we ran our first sweeps contest and received over 30,000 entries in just 4 weeks! Below is a screenshot of the Facebook entries: After announcing the winner, Bruce Burke, I decided to get in touch and find out more about him and how he is using Jelastic for his projects. Hi Bruce, thanks [...]The post Meet our iPad2 Winner, Bruce Burke appeared fi […]
  • MongoDB Master Slave Replication
    As we’ve already told you in our previous post about MySQL master-slave replication the database replication offers various benefits depending on its type and the options you choose, but the common benefit of replication is the availability of data when and where it is needed.  As a result, your customers will experience improved availability of replicated d […]
  • Integration with NetBeans IDE
    Like millions of developers out there we really love NetBeans IDE, which lets you quickly and easily develop Java desktop, mobile, and web applications, while also providing great tools for PHP developers. That’s why we have created a Jelastic plugin for this platform. With the new Jelastic plugin for NetBeans IDE, you can work with your development, [...]Th […]
  • New Version of Jelastic – 1.9.1 Launched
    Today we announced the launch of a major new version of Jelastic. The new version, 1.9.1, features a CRON scheduler, the ability to schedule database backups, new notifications about running out of resources and the latest versions of software stacks (including PostgreSQL 9.2.4). The newly launched Jelastic 1.9.1 includes: CRON job scheduler, Scheduled datab […]
  • Jelastic Released Commercially by innofield!
    Switzerland is well know for chocolate, their army knives and creating fabulous watches. Thanks to innofield,  the Swiss will forever be known as the providers of the first Swiss based PaaS solution with their Flow App Engine (powered by Jelastic). This week, innofield came out of beta and launched commercially with Jelastic 1.9.1. “As Platform-as-a-Service […]
  • Play 1 vs Play 2 Framework
    Today’s guest post comes to you from our friend and user, Dane Marcelo, JArchitect product manager. He points out some interesting differences between the Play 1 and the Play 2 frameworks. So, let’s dive into this great post! Play is an open source web application framework, written in Scala and Java, which follows the model–view–controller (MVC) architectur […]
  • Cloud Software Stacks Market Share: April 2013
    It’s that time where we can share with you the updated statistics on databases, Java and PHP application servers as well as Java and PHP version popularity. Last month was hot here at Jelastic: we launched Jelastic in the Netherlands with the most technically advanced hoster in the country – info.nl and in Switzerland with our very [...]The post Cloud Softwa […]

Legal

The posts on this blog are provided “as is” with no warranties and confer no rights. The opinions expressed on this site are mine and mine alone, and do not necessarily represent those of my former employer - Quest Software, or my current employer - Jelastic or anyone else for that matter. All trademarks acknowledged.

© 2007-2013 Dmitry Sotnikov

Pages

April 2012
M T W T F S S
« Mar   Aug »
 1
2345678
9101112131415
16171819202122
23242526272829
30  

Follow

Get every new post delivered to your Inbox.

Join 65 other followers

%d bloggers like this: