How to Use Nokogiri to Scrape Image URLs in Ruby
Written on August 1st, 2017 by Jeremy Nelson
Nokogiri is a powerful set of tools for scraping data from websites. When used strategically, it can help you carry out competitive research, audit your websites, check dynamic data sets (e.g. stock prices, eBay prices) and a lot more. I’ve found it easy to use, and it requires only a minimal understanding of Ruby to get value out of it, which is perfect for me because I’m not a professional web developer.
I’ll take you through a scenario of how I managed to use Nokogiri to scrape common web data and how it has helped me work smarter.
One of the items I was tasked with was auditing dozens of different image galleries across just as many websites. Doing this one by one was painstaking and took forever, especially when you have to click image by image and look through each caption. The process was repetitive and seemed like something that could be automated.
For this to work, you need Ruby installed as well as the Nokogiri gem.
The first step was to get something working in Ruby. Once I could do it for one hotel, I could extend it to the others, and from there I’d make it web-based with Rails.
The Ruby script below, when run, uses Nokogiri to output the image URL based on the href attribute being present on an image element with the CSS class of Gallery.
So there you have it. Pulling image URLs with Nokogiri.