Video 2 – How to Find Powerful Blogs for Link Building


2. How to Find Powerful Blogs for Link Building

by Karl Adamsas

Today you're going to learn how to automatically create a list of potential link partners, how to identify the type of blogs we want links on and how to find them at scale.

We defined a legit blog as one with organic traffic, but we need to drill down further. 

What does our ideal site look like and what type of site is more likely to work with us?

We don't want to build links on:

  • Private blog networks
  • Free blogging platforms
  • Social platforms

These are too easy to get. We're looking for quality, not quantity.

Attributes we do want to see in a potential link partner: 

  • Evidence of visitors interacting with the site (e.g. comments and social signals)
  • Bloggers who are trying to monetise their sites through advertising (e.g. affiliate deals)
  • Bloggers who are open to working with us (e.g. they offer a media kit)
  • WordPress platform
  • Relevance

What Not To Do

Most outreach tutorials will teach you to search for phrases like:

  • keyword + “submit a guest post”
  • keyword + “guest post”
  • keyword + “guest post by”
  • keyword + “accepting guest posts”
  • keyword + “guest post guidelines”

This will definitely get you results and this strategy has a place in link building, but it's a bit dated now.

You're going to get the same results as your competitors and all your guest posts will be labelled as sponsored.

How We Approach Outreach

The best way to approach outreach coming into 2019 is to find relevant, high-quality blogs and offer to buy advertising from them. We focus on finding suitable blogs first, then try to convince them to take our client's link on our terms.

This strategy works on the premise that most blogs are trying to monetise their traffic and a lot of the time these bloggers are open to discussing new advertising options, or even bending their own rules for the right price.

Searching for "keyword" + "guest post" is still a valid strategy for finding the low hanging fruit, but you're going to need to go the extra mile if you want to push ahead of your competitors.

Our strategy is more time-consuming and your conversion rate will be lower, but you're going to find more unique and more powerful links this way. We just have to scale it much further to find decent numbers of links.

What we want to do is identify footprints of the aforementioned qualities so that we can find them in Google.

To find suitable blogs at scale, we're going to use software called ScrapeBox.

ScrapeBox is a piece of software that performs searches in Google and records all the results for us.

We feed it key phrases and it automatically combines them to make massive lists of queries and searches them in Google for us. We use it to find link opportunities on a massive scale.

We're going to need a couple of specific elements to make up our search to get the best results.

Instead of using "keyword" + "guest post", the elements that we're going to feed into ScrapeBox are:

  • the keyword footprint
  • the relevance footprint

The Keyword Footprint 

We obviously want relevant keywords for SEO, which means we want blogs that are relevant to your industry.

The best way to do this is with long tail phrases. Head terms are just too general, and long tail phrases will give us a much stronger indication that a blog is specialised and relevant.

There are many different ways that we can come up with a reliable list of keywords for this step, and I'll show you some more in a different lesson, but for now, I'm just going to show you what we call the SEMrush Method.

The SEMrush Method

We're going to use SEMrush to find all the long tail phrases that a site ranks for. 

Go to Google, grab any of the top 10 blogs, enter it into SEMrush and then just grab all the keywords that it ranks for.

From here, we want to eliminate any of the keywords which are too short.

Depending on your definition of long tail, you can eliminate anything with one, two or three words, or whatever you feel is too short (or too long) to be long tail. To do this, I use the following formula in the cell next to each keyword to count the number of words in it:

=IF(LEN(TRIM(A1))=0,0,LEN(TRIM(A1))-LEN(SUBSTITUTE(A1," ",""))+1)
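If you would rather do this step in a script than in a spreadsheet, the same word-count filter can be sketched in Python. This is only an illustration and not part of the workflow in the video: the function names and the 4-to-8-word range are assumptions, so adjust them to your own definition of long tail.

```python
def word_count(phrase: str) -> int:
    """Count whitespace-separated words; an empty cell counts as 0."""
    return len(phrase.split())


def filter_long_tail(keywords, min_words=4, max_words=8):
    """Keep only phrases whose word count is within the chosen range.

    min_words and max_words are illustrative defaults, not values
    taken from the tutorial.
    """
    return [kw for kw in keywords if min_words <= word_count(kw) <= max_words]


if __name__ == "__main__":
    sample = ["dog food", "best grain free dog food for puppies", ""]
    print(filter_long_tail(sample))
```

The sample keywords are made up; in practice you would load the list exported from SEMrush.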

Now that we have a decent list of keywords, we need to move on to what we call the relevance footprint. 

The Relevance Footprint

The relevance footprint is just that. It's an indication of relevance or quality.

In this tutorial, we're going to use what I call the WordPress method.

In this strategy, we're specifically looking for blogs on blogging platforms, mainly WordPress. The bloggers who are running WordPress blogs are our best targets and that's who we're trying to find, but we'll still turn up a lot of non-WordPress sites with this method.

WordPress is super common and is used by the vast majority of small to medium-sized bloggers. These are the guys who are going to be willing to work with us, who are going to have the most specialised blogs, and who are going to give you the best prices.

When you start talking to full-time bloggers or blogs that are run by teams of people, you're going to start paying top dollar for your links.

The smaller bloggers are generally more responsive to our emails and will be the most lenient about our requirements, such as not labelling a post as sponsored.

This list of footprints is designed to scrape up blogging platforms:

  • "Leave a Reply"
  • "leave a comment"
  • "add comment"
  • "comment here"
  • "all fields are required"
  • "notify me of new comments via email"
  • "fields with" "are required"
  • "This site uses Akismet to reduce spam"
  • "Blog Archive"
  • "Filed Under"
  • "tagged with"
  • "Save my name, email, and website in this browser for the next time I comment"
  • "No content on this site may be reused in any fashion without written permission"
  • "By using this form you agree with the storage and handling of your data by this website"

We now have our keyword footprints and our relevance footprints. 

We scraped up a massive list of long tail keywords using SEMrush, and we have our list of WordPress footprints designed to help us find WordPress blogs.

So now we want to combine all this together in ScrapeBox.

First of all, grab your list of long tail keywords, then just paste it in the top left window of ScrapeBox.

I normally wrap my keywords in quotes (it's always worth experimenting with leaving this step out).

Now grab a notepad doc, put all your WordPress footprints in it and save it to your desktop.

We have 7,091 keywords from SEMrush and here we have our list of WordPress footprints.

ScrapeBox is going to take this list of WordPress footprints and combine every footprint with every one of our 7,091 SEMrush keywords, which gives us a huge list of search queries.

14 WordPress Footprints x 7,091 SEMrush Keywords = 99,274 Keywords
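To make the arithmetic concrete, here is a minimal Python sketch of the same "every footprint with every keyword" combination. It only illustrates the idea and is not how ScrapeBox works internally; `build_queries` and the sample data are hypothetical.

```python
from itertools import product


def build_queries(keywords, footprints, quote_keywords=True):
    """Pair every keyword with every footprint to form one search query each.

    Footprints are expected to already carry their own quotes,
    e.g. '"Leave a Reply"'. Keywords are wrapped in quotes by default.
    """
    queries = []
    for keyword, footprint in product(keywords, footprints):
        kw = f'"{keyword}"' if quote_keywords else keyword
        queries.append(f"{kw} {footprint}")
    return queries


if __name__ == "__main__":
    keywords = ["grain free dog food for puppies", "homemade dog treats without flour"]
    footprints = ['"Leave a Reply"', '"Filed Under"']
    for query in build_queries(keywords, footprints):
        print(query)
```

The number of queries is always the number of keywords multiplied by the number of footprints, which is why 7,091 keywords and 14 footprints produce 99,274 queries.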

So, if we just press this button here and then navigate to where our WordPress footprints are, you can see that it jumped up to 99,274 keywords: it has combined our two lists together.

In this box here we can decide how many results we're going to scrape per search. I usually get the first 20 results in Google. You can scrape up to 1,000 if you wish, but that's really not necessary, as you get a lot of junk after the first few pages.

Then you just hit start harvesting. Make sure Google is selected, then hit start.

Because this is almost 100,000 keywords, it's going to take a really long time, and I would normally leave it running overnight. For the sake of this tutorial, we'll just let it run for a few hours, collect a few results and I'll show you how we clean up the list.

I let ScrapeBox run for a couple of hours and we have just over 37,000 results.

Now we want to get this list down as much as we can and clean it up.

First step: remove the duplicate URLs, which takes us down to 24,000.

The next thing we want to take out are all the free blogging platforms and social networks (e.g. BlogSpot, Reddit, Facebook, Weebly, etc.). If you click on the Remove button and choose "remove URLs containing...", you can give it keywords and it will delete any URL that contains one of those words.

This is a list of words we usually remove from most projects. If you give your scrape a quick scan, you should see anything else that might need removing.

  • BlogSpot
  • Weebly
  • Pinterest
  • Facebook
  • Reddit
  • .gov
  • .edu

For this project we're only interested in .com domains. So, we'll choose "remove URLs not containing" and enter .com, which takes us down to 7,000 results.
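The three clean-up steps above (remove duplicates, remove URLs containing unwanted words, keep only URLs containing .com) can also be sketched in Python if you export the scrape to a text file. This is a rough sketch under assumptions: `clean_urls` is a hypothetical helper, and the matching is a simple substring check, which is how the steps are described here rather than a claim about ScrapeBox's exact behaviour.

```python
BLOCKED_WORDS = ["blogspot", "weebly", "pinterest", "facebook", "reddit", ".gov", ".edu"]


def clean_urls(urls, blocked=BLOCKED_WORDS, must_contain=".com"):
    """Drop exact duplicates, blocked platforms and anything without '.com'."""
    seen = set()
    cleaned = []
    for url in urls:
        if url in seen:
            continue  # exact duplicate URL
        seen.add(url)
        lowered = url.lower()
        if any(word in lowered for word in blocked):
            continue  # free blogging platform, social network, .gov or .edu
        if must_contain not in lowered:
            continue  # not a .com URL
        cleaned.append(url)
    return cleaned


if __name__ == "__main__":
    scraped = [
        "https://example.com/post",
        "https://example.com/post",
        "https://foo.blogspot.com/a",
        "https://site.org/b",
    ]
    print(clean_urls(scraped))
```

Because it is a plain substring match, a URL such as example.org/best.com-deals would still pass the .com check, so a quick manual scan of the result is still worthwhile.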

The Page Scanner 

This step is optional. I would usually use this on a much bigger list of sites, but it can be very helpful. We're going to use an add-on called the Page Scanner.

The Page Scanner scans each one of the URLs that we've scraped up and looks for any of these keywords.

This is a list of words that tells us a site is actively looking for advertisers. It gives us a list of quick wins: the sites that are looking for advertisers are the ones we would contact first. I just start the Page Scanner and leave it for a few minutes.

  • guest post
  • sponsored
  • advertise
  • advertising
  • promoted
  • promoted content
  • media kit
  • affiliate

Once the Page Scanner has finished, just export the results to your desktop. Now we'll grab that list of sites that are looking for advertisers and we'll put them back into ScrapeBox.

From here, remove duplicate domains and we're down to 735 suitable sites.
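For illustration, the two ideas in this step (checking a page's text for advertiser keywords, then keeping one URL per domain) might look like this in Python. It is a simplified sketch, not a reimplementation of the Page Scanner: the function names are hypothetical, the page text is assumed to have been downloaded already, and URLs are assumed to include http:// or https://.

```python
from urllib.parse import urlparse

AD_KEYWORDS = [
    "guest post", "sponsored", "advertise", "advertising",
    "promoted", "promoted content", "media kit", "affiliate",
]


def matched_keywords(page_text, keywords=AD_KEYWORDS):
    """Return the keywords found in the page text (case-insensitive)."""
    text = page_text.lower()
    return [kw for kw in keywords if kw in text]


def unique_domains(urls):
    """Keep the first URL seen for each domain, ignoring a leading 'www.'."""
    seen = set()
    kept = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            kept.append(url)
    return kept
```

A page would go on the quick-wins list when `matched_keywords` returns at least one keyword for it.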

Trust Flow with Majestic 

Now grab your list of URLs and export it to your desktop.

Go to Majestic and run your list through their bulk URL checker. All we're interested in here is the Trust Flow and the Citation Flow.

The reason we use Majestic for this step is because we can get the results very quickly. We can get the Domain Authority with ScrapeBox, but it takes a very long time.

In our first tutorial, we said that these quality metrics are pretty useless for judging a site's quality.

What I said was, sites with a high Trust Flow aren't necessarily good, but sites with a low Trust Flow are usually very bad. 

We're going to use Majestic to eliminate all the low Trust Flow sites. We're also going to get rid of some of the super high Trust Flow sites because those sites are very unlikely to respond to us.

We'll filter out anything with a Trust Flow of less than five and sort by Trust Flow, highest to lowest.

The sites with the highest Trust Flow (Amazon, TripAdvisor, YouTube, eBay, etc.) can be removed because they're never going to respond to us. We'll then remove anything with a Trust Flow above 35.
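The Trust Flow window described above (drop anything below 5, drop anything above 35, sort highest to lowest) can be expressed as a small filter. This is a sketch under assumptions: the `url` and `trust_flow` keys are hypothetical and will not match the column names in a real Majestic export without renaming.

```python
def filter_by_trust_flow(rows, min_tf=5, max_tf=35):
    """Keep rows with min_tf <= Trust Flow <= max_tf, sorted highest first.

    Each row is a dict with hypothetical 'url' and 'trust_flow' keys.
    """
    kept = [row for row in rows if min_tf <= row["trust_flow"] <= max_tf]
    return sorted(kept, key=lambda row: row["trust_flow"], reverse=True)


if __name__ == "__main__":
    rows = [
        {"url": "https://tiny-blog.com", "trust_flow": 3},
        {"url": "https://niche-blog.com", "trust_flow": 20},
        {"url": "https://huge-retailer.com", "trust_flow": 90},
    ]
    for row in filter_by_trust_flow(rows):
        print(row["url"], row["trust_flow"])
```

The thresholds are parameters so you can loosen or tighten the window for a different project.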

Check Their Organic Traffic

Now we're left with a list of 462 sites, but we want to make sure that these sites all have traffic.

We're going to jump over to Ahrefs and check all of our URLs in the batch analysis tool. This will give us an estimate of their current organic traffic. We can only check 200 sites at a time, so if you have a really big list, it's the perfect job for a virtual assistant.
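If you are preparing the list for someone else to paste in, splitting it into groups of 200 up front saves time. A minimal sketch, where `batches` is a hypothetical helper and the URLs are placeholders:

```python
def batches(items, size=200):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    sites = [f"https://site{i}.com" for i in range(462)]
    for number, batch in enumerate(batches(sites), start=1):
        print(f"Batch {number}: {len(batch)} sites")
```

With the 462 sites from this tutorial, that works out to two full batches of 200 and a final batch of 62.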

Now that we've run that list through Ahrefs, we can do a little bit more cleaning up and eliminate anything without enough traffic.

We're left with a list of 420 suitable sites.

So, each one of the blogs on our list:

  • Contains one of our long tail keywords
  • Is on a blogging platform like WordPress
  • Is looking for advertisers
  • Has traffic

There are plenty of steps that I could have scaled up to get a much bigger list of blogs, but that's a pretty decent start.


Now I’d like to hear from you.

Perhaps you have a question about something you read?

Let me know by leaving a comment below…