Scientific Blogging: Web scraping to optimize sharing

Posted by Anant December 28, 2022

GOAL

Scientific Blogging: Have you ever noticed how some blog posts take off like an intergalactic rocket ship while others… don’t? Using web scraping and basic analysis, you can now isolate some of the differences and optimize future performance.

INGREDIENTS

Import.io
Excel
SharedCount.com
Textalyser.net

PERSONNEL

Growth Hacker

TIME

30 Minutes

OVERVIEW

I’m terrible at predicting the virality of my posts. Sometimes an article I’ve poured hours of writing and editing into will flop while the post I threw together before work drives thousands of visits.

At first, I was just frustrated, but then I remembered: working smart always gets better results than working hard. So I decided to analyze the performance of my current posts in order to optimize my future writing. I wanted to know what day of the week to publish, what headlines to use, and what topics to cover.

STEP ONE

First, you’ll need to scrape your blog to find your post titles, authors, categories, tags, publication dates, and URLs. To do this, we’ll use a web scraping tool. Personally, I’m a big fan of http://www.import.io because it’s free and super easy to use.

After you’ve installed the program, navigate to your blog, select crawler mode, and choose the multiple rows option. Then, you’ll need to highlight all rows of content on your blog index (a row would be a one-post block). Once it’s accurately identifying rows, it will ask you to highlight columns, and you’ll add, one by one, your post title, author name, publication date, URL, and any other information you can find. URL will be the only semi-difficult field to add – just select “html” instead of “text” when you highlight your post’s link, and it will give you the information you need.

After verifying the crawler on at least five pages, you can save this API template.

STEP TWO

Once you’ve trained http://www.import.io, you will run the crawler on your website. First, you’ll need to create a list of your archive page URLs. Open Excel, and create a column that lists numbers from 1 to 50. Then, looking at your website’s URL structure, create a simple formula that combines those numbers with your website. (For my site, the formula might look like this: =“http://www.natedesmond.com/blog/”&A2&“/”) This will give you a full list of archive page URLs for your website.

Then open the crawler, paste in the list of URLs, and run the crawler you created in step one.

Once it finishes, you can export this list of all your blog’s metadata to Excel.

STEP THREE

So now you have all your post metadata, but how do you know if a post performed well? To measure performance, we’ll pull social statistics for each post.

Going to http://SharedCount.com, you’ll upload your list of URLs (which is in the export from http://Import.io). SharedCount will then run an automatic analysis to see how each post performed on various social networks.

After it finishes running, simply export from SharedCount and use a vlookup formula (https://exceljet.net/excel-functions/excel-vlookup-function) to combine your social count data with your post metadata.

STEP FOUR

Now the real fun starts. You’ve got your data in hand, and you’re ready to start finding key insights.

Here are a few data points you can pull:

Popular Authors

To see which author writes the most popular posts, simply add a pivot table (https://exceljet.net/excel-tips/pivot-tables) with one column for the author’s name and one column for the average social score. To make sure your data is statistically significant, you can also add a third column that counts the total posts per author.

Best Day of the Week

Using a day-of-week formula (https://exceljet.net/formula/get-day-of-week-name-from-date), add an additional column that shows which day of the week each article was published. Then create another pivot table showing the day of the week, social score, and the number of posts.

Best Topics

Using a basic text analyzer (http://textalyser.net/), see which words occur most commonly in your titles. Then, for the most popular words, add another column to your Excel table that checks whether each title contains that word (it might look something like this: =IF(ISNUMBER(SEARCH(“wordtocheckfor”,A2)),”Contains”, “Doesn’t Contain”) ) Then just run another pivot table to see whether particular topics increase or decrease the popularity of a post.

SCIENTIFIC BLOGGING CONCLUSION

Now you have actionable steps that can help you make your future posts more likely to become popular. Of course, virality is never guaranteed, but you can increase your chances significantly. Apply your new knowledge.

Recipe courtesy of Nate Desmond (http://www.natedesmond.com)

Ready To Grow Your Startup?TVicon

Get the strategies, motivation, and in-depth interview with all the details every week!

Categories

Popular Blog Posts

26 Simple Websites that Prove Innovation Doesn’t Have...
Unleashing the Power of Unbounce: Georgiana Laudi’s Tips...
Learn How to Grow in a Regulated Market...
Learn How to Develop Internal Data and Analytics...
Discover Alistair Croll’s Data-Driven Approach to Business Growth...

Share On

Are you an
entrepreneur
who is trying
to grow a
startup?TVicon

Get the strategies, motivation, and in-depth interview with all the details every week!