Scrapebox has been notorious for scraping Google and data harvesting since it was released. It continues to be one of the best tools for doing essential tasks that would otherwise take hours.
However, Scrapebox is commonly referred to as a spam link building tool because it also provides mass blog comment functionality. This has given the software a bad rap in the SEO scene, but what many don’t realize is how effective Scrapebox can be for white hat SEO. You can use Scrapebox to speed up some of the things that take forever to do manually. Scrapebox comes out-of-the-box with great features that you simply can’t get in any other tool.
1. Basic Scrapebox Tasks
Scrapebox has dozens of basic tasks that it can do built right into the software, or accomplished through one of its many free plugins. These tasks may not seem like much but when doing day-to-day SEO work, they can be incredibly helpful.
– Check Domain and Page Authority
Scrapebox provides a free add-on that utilizes the Moz API for checking domain metrics like Domain Authority and Page Authority. For white hat link building, these metrics are still very important as they help you decide which sites are worthwhile for guest post prospecting and other legitimate outreach.
– Scrape Emails/Phone Numbers
Don’t forget Scrapebox’s email and phone number grabber. This simple tool works great for scraping leads off small business directories, websites, communities or anywhere that your customers can be found online.
– Check If Indexed
A backlink will only benefit your search engine rankings if it has been crawled and indexed. When you’re building links through legitimate outreach, this generally won’t be an issue. However, Scrapebox lets you bulk check whether your backlinks have been indexed in Google, which can be useful when you’re building a large number of links.
– Remove Duplicates/Trim
Scrapebox lets you clean up sites you have scraped by trimming them down to their root domain and removing duplicates. These are simple little tools, but when you need them they can save you an immense amount of time.
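To make the trim-and-dedupe idea concrete, here is a minimal Python sketch of the same cleanup done outside the tool. It is not Scrapebox's own code; the function names `trim_to_root` and `trim_and_dedupe` are made up for illustration, and it assumes "root" means the scheme plus host (so `www.example.com` and `example.com` are still treated as different sites).

```python
from urllib.parse import urlsplit


def trim_to_root(url):
    """Return scheme://host/ for a URL, or None if it has no scheme or host."""
    parts = urlsplit(url.strip())
    if not parts.scheme or not parts.netloc:
        return None
    # Host names are case-insensitive, so lowercase them before comparing.
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def trim_and_dedupe(urls):
    """Trim each URL to its root and drop duplicates, keeping first-seen order."""
    seen = set()
    roots = []
    for url in urls:
        root = trim_to_root(url)
        if root and root not in seen:
            seen.add(root)
            roots.append(root)
    return roots
```

Calling `trim_and_dedupe` on a harvested list collapses every page from the same site into a single root entry, which is usually what you want before checking metrics per domain.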
– Outbound Link Checker
This add-on allows you to easily view the number of outbound links from a list of URLs. You can export the results and use the data for link building opportunities on low OBL pages.
– Alive Check
Useful for checking which pages are still live in a list of URLs. Alive Check can also be utilized for expired domain checking by scanning indexed domains for 404 errors.
– Dofollow Test
A dofollow backlink passes its authority onto your website and is considerably more valuable in terms of SEO. Scrapebox provides a free add-on for dofollow/nofollow testing.
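For readers curious what a dofollow/nofollow test boils down to, the following Python sketch checks the `rel` attribute of links that point at your site. It is an illustration only, not how the Scrapebox add-on is implemented; `LinkRelParser` and `classify_backlinks` are hypothetical names, the HTML is assumed to be already downloaded, and page-level signals such as a robots meta tag are ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkRelParser(HTMLParser):
    """Collect links to one host and label each as dofollow or nofollow."""

    def __init__(self, target_host):
        super().__init__()
        self.target_host = target_host.lower()
        self.results = []  # list of (href, "dofollow" or "nofollow")

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        host = urlsplit(href).hostname or ""
        if host != self.target_host:
            return  # not a link to the site we care about
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        kind = "nofollow" if "nofollow" in rel_tokens else "dofollow"
        self.results.append((href, kind))


def classify_backlinks(html, target_host):
    """Return (href, kind) pairs for every link in html that points at target_host."""
    parser = LinkRelParser(target_host)
    parser.feed(html)
    return parser.results
```

Feeding a page's HTML and your own host name to `classify_backlinks` returns one entry per backlink found, so a link that is missing from the result was not on the page at all.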
– Link Extractor
Similar to the Outbound Link Checker, the Link Extractor add-on not only checks how many internal and external links exist on a page but it also allows you to export those links and save them to a file.
– Vanity Name Checker
Expired web 2.0 properties can be used to build your own free private blog network. The vanity name checker works similarly to Alive Check, scanning a list of web 2.0 domain names and checking for 404 errors to see if they have expired.
– Social Checker
Checks a list of URLs for social metrics such as Facebook, LinkedIn, Pinterest and Google +1. You can export to a file and use this for analysis of your own websites or competitors.
– Page Scanner
Use custom footprints to scan and extract data from a list of URLs.
– Google Competition Finder
The Google Competition Finder checks the number of pages indexed in Google for a list of keywords. This is a simplistic approach to keyword research and provides relatively accurate competition ratings.
– Anchor Text Checker
Check a list of backlinks and see which keywords are used as the anchor text for your links. Anchor text diversity is very important to natural link building and this add-on makes it easy to evaluate your pre-existing backlinks.
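Once the anchor texts are exported, judging diversity is mostly a counting exercise. The sketch below assumes you already have the anchors as a plain list of strings; `anchor_text_shares` is a hypothetical helper, not part of Scrapebox.

```python
from collections import Counter


def anchor_text_shares(anchors):
    """Return (anchor, count, share) tuples, most common anchor first."""
    # Normalize case and whitespace so "White Hat SEO" and "white hat seo" match.
    normalized = [a.strip().lower() for a in anchors if a and a.strip()]
    total = len(normalized)
    counts = Counter(normalized)
    return [(text, n, n / total) for text, n in counts.most_common()]
```

If one exact-match keyword accounts for most of the list, that is the kind of skew worth correcting with more branded or generic anchors.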
– Social Account Scraper
The Social Account Scraper scans a list of URLs and retrieves the social accounts for those businesses. This add-on finds published profiles on Twitter, Facebook, LinkedIn, Google+ and Pinterest.
– Google META Scraper
Easily scrape the META details from Google search results. You can enter any keyword or even a list of keywords and this add-on will retrieve the titles, descriptions and URLs for the Google search results.
– Whois Scraper
Export the registrant names, emails and domain creation data for a list of URLs. Whois details can be used for competitor research, outreach and a variety of other tasks.
– Google Cache Extractor
Find the exact Google cache date for a list of URLs and easily export this data to a file. The cache date is the day that Google last crawled the page and stored a copy of its content.
– Google Image Scraper
Need a large number of images for link building or other purposes? The Google Image Scraper will scrape hundreds to thousands of relevant images.
– Malware and Phishing Filter
Trim your URLs by removing sites that have malware or phishing detected. You can also use this add-on for contacting websites with malware and offering a fix in exchange for a link back to your site.
– Alexa Rank Checker
Alexa is a metric that represents the traffic a website receives. A lower Alexa rank means a website has a large audience and a steady flow of traffic. You can use Alexa ranks to target sites with the potential to drive inbound traffic.
– DupRemove
Remove duplicates in massive URL lists that contain up to 180 million lines. If you’re working with extremely large lists of websites, DupRemove can remove duplicates without crashing the software.
– TDNAM Scraper
This add-on scrapes the GoDaddy database for domains that are soon to expire in the TDNAM $5 closeout auctions. You can scrape by keyword and domain extension. After scraping, you can easily export the domains back to Scrapebox and check Domain Authority, Page Authority, indexed status, Alexa rank, social metrics and other key data.
– Sitemap Scraper
If you need to retrieve an entire site’s internal URL list, the Sitemap scraper makes this an easy task.
– Mass URL Shortener
Easily shorten a massive list of URLs using services like TinyURL.com.
– YouTube Downloader
Download relevant videos from YouTube, Vimeo, DailyMotion and other video sites. This add-on also retrieves video metrics such as the number of views, likes, dislikes, video upload date, the category it’s published in and more.
– Broken Links Checker
Check a list of URLs for broken links. A broken link is an outbound link that potentially once worked but now points to a page resulting in a 404 or similar error. Broken link checking can be used for link building by recommending the site owner fix the downed link and replace it with a link of your own.
2. Scrapebox Scraping Techniques
Exporting Google’s results in bulk, for whatever keywords you want, is what Scrapebox does best.
You can use the Scrapebox “harvester” to effectively scrape websites that are indexed in Google based off any footprint and keywords you provide.
A footprint is a single keyword (or a keyword phrase) that you want to be present on every site that you are scraping. For instance, WordPress has a footprint of “Powered by WordPress” that can be found at the bottom of millions of blogs that use WordPress as their content management system. Therefore, “Powered by WordPress” is an example of a pretty good footprint for scraping WordPress sites.
A keyword list is a long list of keyword phrases that will combine with your footprint to perform searches in Google. Scrapebox uses these keywords to scrape Google’s index. The more keywords you have to combine with your footprint, the more results you will be able to scrape from Google. If one of your keywords was “white hat SEO” and your footprint was “Powered by WordPress”, Scrapebox would combine the two to perform a search that looks like: “Powered by WordPress” “white hat SEO”
You should use keywords that you would expect to find on the type of sites you are scraping. If you want to scrape niche related sites only, you would use a list of keywords that are related to your own niche.
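The footprint-plus-keyword combination described above can be sketched in a few lines of Python. This only illustrates how the search strings are formed; it is not Scrapebox's internal logic, and `build_queries` is a hypothetical name. Footprints are passed in exactly as you would type them (including their own quotes), and each keyword is wrapped in quotes as in the example search.

```python
def build_queries(footprints, keywords):
    """Pair every footprint with every keyword to form Google search strings."""
    queries = []
    for footprint in footprints:
        for keyword in keywords:
            keyword = keyword.strip()
            if keyword:  # skip blank lines in the keyword list
                queries.append(f'{footprint} "{keyword}"')
    return queries
```

With one footprint and a list of 500 niche keywords this yields 500 distinct searches, which is why a longer keyword list translates directly into more harvested results.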
Scrape sites that accept guest posts
To demonstrate how you might use Scrapebox for white hat scraping, consider scraping sites that accept guest posts.
First you need to decide on a few footprints that would be good for scraping sites for guest blogging. Here are some examples:
- allintitle: guest post guidelines
- allintitle: guest post requirements
- allintitle: guest post submission form
- allintitle: submit guest post
You can then use niche related keywords in your keyword list, which will combine with these footprints and scrape niche sites that accept guest posts.
Scrape sites to manually build links on
Directories and communities with open registration are still viable link building assets. Although less effective today and often requiring a tiered/pyramid link building structure to be fully effective, these sites are excellent targets for manual link building.
With so many different varieties of content and ways it can be reformatted, there are lots of directories you can utilize. Below you’ll find some examples of how you can search for these directories using footprints in Google.
Footprints:
- allintitle: submit share article
- allintitle: upload share powerpoint
- allintitle: upload share pdf document
- allintitle: upload share infographic
- allintitle: upload share audio
– You can change the content/directory type such as searching for “videos” instead of “articles.”
– You can also change the keywords such as searching “submit” instead of “upload.”
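Swapping the verb and the content type by hand gets tedious quickly, so here is a small Python sketch that generates the variations for you. The function name `footprint_variants` and the default template are assumptions modeled on the footprints listed above; adjust the template to whatever pattern you are testing.

```python
from itertools import product


def footprint_variants(verbs, content_types,
                       template="allintitle: {verb} share {content}"):
    """Build one footprint per (verb, content type) combination."""
    return [
        template.format(verb=verb, content=content)
        for verb, content in product(verbs, content_types)
    ]
```

For example, passing the verbs "submit" and "upload" with the content types "article" and "video" produces four footprints that you can paste straight into the harvester.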
Think Outside the Box
Scrapebox is very effective at what it does, particularly with scraping websites. However, you really have to think outside the box (pun intended) to get the most value out of this tool.
While the above scraping techniques are great and enough to keep you busy for quite some time, don’t hesitate to try out new things and think up innovative uses for it. Scrapebox is considered the Swiss army knife of SEO tools and it’s really something you can apply in unique and creative ways that are exclusive to your own needs.