Thursday 21 March 2013

Recovering from a Hacked Website — Part III, Off-page SEO

Image: the relative importance of on-page and off-page SEO.
This is the third part of a series of articles where I share the steps I took to recover my website, Celtnet Recipes, from a combination of years of neglect and being hacked.

In the first part I discussed how I removed the malware and hacked content and then protected my site from further attacks.

In the second part I went through how I updated my web pages, making them cleaner and faster to load whilst ensuring I had unique content and unique, meaningful meta tags (on-page SEO, in other words).

In this part I will go through some of the off-page SEO efforts that I have been undertaking on the site.

In a nutshell, off-page SEO means getting links to your site and promoting it through social media and content sites like Google+, Twitter, Facebook, StumbleUpon and Pinterest. I use all of these, either manually or through sharing my RSS feeds with them.

In terms of links, I started with RSS feeds. New content was automatically pushed to one RSS feed, and older content, along with essentially static content, was pushed through another. These feeds were published to feed aggregator services and were automatically pushed out to Twitter and Facebook.
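To give a flavour of what that involves, here is a minimal sketch of how a 'new content' feed can be generated in PHP; the database table, field names and connection details are placeholders rather than my actual schema:

<?php
// Minimal sketch of a 'new content' RSS feed (table and field names are hypothetical)
header('Content-Type: application/rss+xml; charset=UTF-8');

$db = new mysqli('localhost', 'user', 'password', 'recipes_db');
$result = $db->query("SELECT title, url, summary, added FROM recipes ORDER BY added DESC LIMIT 20");

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<rss version="2.0"><channel>';
echo '<title>Celtnet Recipes - New Recipes</title>';
echo '<link>http://www.celtnet.org.uk/recipes/</link>';
echo '<description>Newly added recipes</description>';

while ($row = $result->fetch_assoc()) {
    echo '<item>';
    echo '<title>' . htmlspecialchars($row['title']) . '</title>';
    echo '<link>' . htmlspecialchars($row['url']) . '</link>';
    echo '<description>' . htmlspecialchars($row['summary']) . '</description>';
    echo '<pubDate>' . date(DATE_RSS, strtotime($row['added'])) . '</pubDate>';
    echo '</item>';
}

echo '</channel></rss>';

Once a feed like this is in place, the aggregator services take care of the rest of the pushing.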

For Google+ I semi-automated publishing by loading my RSS feeds into Google Reader and then publishing any new content to Google+ by hand (Google+ does not like duplicate content on your Google+ pages).

That took care of the social media. Though article directories are not valued as highly by Google as they once were, they can still be a useful source of in-bound links, so I started an article campaign for each page as I updated it.

Also, if a page or section of my site reached a milestone, marked an important event or could be tied to something in the news, I put out a press release about it.

Next I began guest blogging (by far one of the best ways of getting in-bound links with real Google 'juice' these days). I also started commenting on various blogs to get my name seen and to raise my site's profile.

Next I hired a few gigs on Fiverr to get some of this work done for me. There are scam artists there, but there are also some real bargains to be had. The Warrior Forum is also a good source of information and of potential hires for outsourcing this work.

Within a month of starting, my income was beginning to increase again, and I was able to use that increase to hire two SEO companies with different approaches to begin link-building for me. We're just at the start of this process, but it's looking promising and I hope to give the two companies more URLs next month.

This is going to be a long slog, but I can already see improvements in the SERPs (and, more importantly, in income). As long as this improvement continues, in three months I will be back where I was before the trouble started, and in six months I might even be able to live off the income from my website for the first time...

But I am trying not to think about that yet, as there is still a considerable amount of remedial work to be done, and in the meantime I still need to get the day-to-day work of adding content and updating old content going.

In the next article I will go into detail about a new aspect of SEO related to content ownership, rich snippets and Google+.

Tuesday 19 March 2013

Recovering from a Hacked Website — Part II, On-page SEO

Image: SEO text with an arrow pointing upwards. SEO optimization of web pages.
In the first part of this series on recovering from a hacked website I detailed how I discovered and removed the badware from my site and then hardened my code so that it could not easily be hacked again.

This is the second part of the article series, detailing how I improved the site's on-page SEO.

The truth is, even had my site not been compromised, because I was using it as a learning platform for programming and site development, the look and feel had become dated and much of the code was bloated.

Also, as I had moved to cached pages to improve response times and reduce the load on my MySQL databases, I had geo-location scripts that were no longer working. The geo-location scripts themselves were causing problems too, as they were also hitting my MySQL instance. I'm UK-based, but almost half my visitors were from North America, so I needed to serve slightly different content to the two audiences.

My first solution was to move to a static, file-based geo-location system that did not use the database. This freed up MySQL resources but increased server load, which was not an optimal solution. Then I came across Google's JavaScript APIs. They had already solved the problem, and by integrating this with some nifty JavaScript to load my ads and other content I could have cached pages that remained dynamic for some content without the need to bring in external IFrames... sorted!
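To give an idea of the shape of that first attempt, a file-based lookup can be as simple as the sketch below (the ip-ranges.csv file and its column layout are placeholders, not my actual data file):

<?php
// Sketch of a file-based geo lookup (hypothetical ip-ranges.csv: start_ip,end_ip,country)
function country_from_ip($ip, $csvFile = 'ip-ranges.csv')
{
    $ipNum = sprintf('%u', ip2long($ip)); // numeric form of the visitor's IP address
    $fh = fopen($csvFile, 'r');
    while (($row = fgetcsv($fh)) !== false) {
        if ($ipNum >= $row[0] && $ipNum <= $row[1]) {
            fclose($fh);
            return $row[2]; // e.g. 'GB' or 'US'
        }
    }
    fclose($fh);
    return 'GB'; // default to serving the UK version
}

// Serve slightly different content blocks to North American visitors
$country = country_from_ip($_SERVER['REMOTE_ADDR']);
$isNorthAmerican = in_array($country, array('US', 'CA'));

The catch, as I said, is that every request now scans a file on the web server, which is why the JavaScript approach won out.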

OK, one headache removed. What about page design and removal of bloat?

I needed to keep some of my original headers, but wanted to move to a cleaner design and cleaner, more readable fonts. I took Wikipedia as my inspiration, then dusted off my CSS files and stripped out my base three column format definition.

I had designed that in CSS a few years ago to give me a header, left column (1), central column (2), right column (3) and a footer. There was a nasty hack in there for older Internet Explorer versions, but a quick internet search showed me how to fix that.

Now I had a three-column definition with no IE hack, one which still kept the useful feature of having the central (main) column's content come first, right after the header, making it more visible to search engines. So my columns came in the order 2, 1, 3 in the page code, but with the CSS they all appeared in the right places.

I now had the base CSS file defined, and I kept the main header from the original files, with a few tweaks.

I added my new font definitions and some extra definitions for headers and for the appearance of the left and right columns. I now had a functioning CSS file that was clean and about a third of the size of my previous version. I also had a separate CSS definition for printing that omitted the left and right sidebar content and stripped out the ads from the printed page. I updated this based on my display CSS and, again, the file size came down.

Now that I had my CSS, I used an online tool to strip out the extraneous spaces, reducing the file size further, then cached a compressed copy and called this from my web pages. Clean, compressed CSS and a huge space saving.
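As a rough sketch of the caching step (the file names and cache lifetime here are arbitrary, for illustration only), the compressed copy can be built once and then served to any browser that accepts gzip:

<?php
// css.php - serve a cached, compressed copy of the minified stylesheet
// (file names here are hypothetical)
$css = 'style.min.css';
$gz  = 'style.min.css.gz';

// Build the gzipped copy once, or rebuild it if the CSS has changed
if (!file_exists($gz) || filemtime($gz) < filemtime($css)) {
    file_put_contents($gz, gzencode(file_get_contents($css), 9));
}

header('Content-Type: text/css');
header('Cache-Control: max-age=604800'); // let browsers cache the file for a week

if (isset($_SERVER['HTTP_ACCEPT_ENCODING'])
        && substr_count($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')) {
    header('Content-Encoding: gzip');
    readfile($gz);
} else {
    readfile($css);
}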

For the main web pages themselves I also added page compression using the following PHP code in the PHP section before the main HTML code:

// Compress the output only if the browser says it accepts gzip
if (isset($_SERVER['HTTP_ACCEPT_ENCODING']) && substr_count($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')) {
    ob_start("ob_gzhandler");
} else {
    ob_start();
}


This is completed with the PHP code:

ob_end_flush();

at the very bottom of the file.

Now the pages would be sent compressed to any web browser that would accept compressed pages. A bandwidth and speed win.

The next problem was a little trickier. As my site had grown, I had lost sight of what I had been doing with the content. I had my main Celtnet Recipes home page (http://www.celtnet.org.uk/recipes/) and beneath that were pages for the regions of the world I had recipes for, for the various countries I had recipes for, and for ingredients and cookery techniques.

To increase my chances of the links being indexed I had set a limit of 100 recipe links per page, with additional recipes on succeeding pages. This was fine whilst my site was growing, but eventually I had hundreds of recipes for some pages and thousands for others. This meant that the content of my home page for that country/region/ingredient was being replicated hundreds and hundreds of times. Though the links were different on each page, the main text often made up more of the content than the links did, which meant massive duplicate content and duplicated <title> and meta description tags in the page headers.

This, of course, is VERY BAD SEO. So I updated my headers to include each page's position in the listing: the first page was the 'Home' page, then came the '2nd', '3rd', '4th' pages and so on. This meant that the title and description tags for each page were different.

For the duplicate content problem, I added a short description of what the page was about to the top of each page. This was essentially the same for every page in a listing, but it included the page number.

Next I wrapped my main text content in PHP so that it would only display on the home page, then added some boilerplate to all succeeding pages of the form 'This page is a continuation of my recipes listing from Britain, the nth page in fact. For more information about Britain and its cuisine, please visit the <home page>.'

or words to that effect. The overall effect was to make the text content of the continuation pages smaller than the link-list content below it, so that search engines would see each page as unique content beneath the home page for that entry.
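Here is a minimal sketch of how that comes together in a listing page template; the variable names, URLs and wording are illustrative rather than my actual code:

<?php
// Illustrative pagination handling for a country/region/ingredient listing page
// (names, text and URLs here are examples only)
$page    = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$country = 'Britain';
$homeUrl = '/recipes/britain.php'; // hypothetical home page for this listing

// Unique <title> and meta description for every page in the listing
$title       = ($page == 1) ? "$country Recipes"
                            : "$country Recipes - Page $page";
$description = ($page == 1) ? "Traditional recipes from $country, with background on the cuisine."
                            : "Page $page of the $country recipes listing.";

echo '<title>' . htmlentities($title) . "</title>\n";
echo '<meta name="description" content="' . htmlentities($description) . '">' . "\n";

// Full introductory text only on the first page; short boilerplate on the rest
if ($page == 1) {
    echo '<p>Full introduction to ' . $country . ' and its cuisine goes here.</p>';
} else {
    echo '<p>This page is a continuation of my recipes listing for ' . $country .
         ', page ' . $page . ' of the listing in fact. For more information about ' .
         $country . ' and its cuisine, please visit the <a href="' . $homeUrl .
         '">home page</a>.</p>';
}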

Now I had pages with cleaner HTML and neater CSS that were more readable for humans and carried far less duplicated content than before. With compression and caching they also loaded faster.

Next I validated the code as far as I could. This did reveal a few glaring problems that I fixed, but because I had some third party code and Web 2.0 buttons, the pages did not validate completely. But I did not worry about that as the main content was fine.

To get some extra load time benefits I played around with asynchronous JavaScript loading for some of the ads and some of my own JavaScript (the main page search runs off JavaScript and I also have JavaScript that adds a copyright notice to any text copied from the site).

Again, this brought load time benefits.

From a very bad start I had significantly improved my web site's appearance, on-page SEO and load times. Of course, replicating this throughout the site was (and is) a big issue and it's still on-going. But it's giving me a chance to completely overhaul the content and to make things a lot better overall.

This is very much a work in progress, but it's getting there.

Next time I will be talking a little about something that we all hate, but have to do... off-page SEO.

Monday 11 March 2013

Recovering from a Hacked Website — Part I, Defending your Site


About 18 months ago my website was hacked quite badly, and this series of articles is about how I recovered (or am recovering) from that, with a few insights and tips along the way as to how you can avoid what happened to me in the first place.

I've been on the web since about 2003 and in 2004 I began my site, Celtnet (http://www.celtnet.org.uk). About a year later I started a new section on the site, Celtnet Recipes (http://www.celtnet.org.uk/recipes/). Initially I added a bunch of Welsh Recipes there to go with the various Welsh Legends and lists of Celtic gods I was adding to the main section of my site.

The site was growing and needed to be more dynamic, so I moved over to PHP on top of a MySQL database as my content delivery platform and added a forum as well as an article directory.

Over the years the recipe section grew, initially just with those recipes that I found interesting. Then I started on a personal project to add recipes from each and every country in Europe, and then from each and every country in Africa, to the site. All the while I was also adding traditional and British recipes.

By 2010 the British and African recipe sections had grown into some of the largest on the web and I was getting lots and lots of visitors. This was converting into quite a decent income.

Image: Neo from the Matrix stopping bullets. How to make your website bullet (and hacker) proof.

However, though I was expanding the site and updating the code quite often, it was still really only a hobby site for me. I had begun some limited SEO and was working on the in-bound links to my site, but nothing serious.

Then, in 2011, things went a bit awry in my life and I lost interest in pretty much everything. The site was left ticking over, and in 2012 it was hacked through a vulnerability in the phpBB forum system. My rankings in Google began to tumble, and it was not really until February 2013 that I began to take notice.

By that time things had gone very wrong indeed. All the most popular sections of the site had been compromised and I was nowhere to be seen in the searches that I had previously been most popular for.

Some of the damage was historic: I had cleaned out the bad code, but the Google spiders had stopped coming and I had pharmaceutical and adult spam all over the place (the kind of hack where the file headers are compromised so that Google's spiders, and visitors arriving via Google referrals, see different content from someone coming to the site directly).

As I say, I had been running my site as a hobby site, so I quickly had to learn how to harden my site and in the process I decided to give my site an overhaul to improve my SEO and to get more links.

The first thing was to go over all the code with a fine-tooth comb and to remove some of the riskier JavaScript I was running. This meant that two of my advertisers had to go, as they relied on public-domain code that was just too easily compromised (this had been responsible for some of the exploits: redirects to spam sites).

Next I found that some of my SQL query code had not really been optimized. The main database had grown so big that queries were failing, so I went through all my SQL and optimized everything. That got the site back up and running again at optimal speed and queries were running once more.

But to reduce the overall load on the server I decided to cache my pages, also in the hope of making page loads faster.
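A file-based page cache along these lines only needs a handful of lines of PHP. The sketch below shows the general shape; the cache directory and lifetime are arbitrary choices for illustration:

<?php
// Simple file-based page cache (illustrative; paths and lifetime are arbitrary)
$cacheDir  = '/var/cache/celtnet/';  // hypothetical cache location
$cacheFile = $cacheDir . md5($_SERVER['REQUEST_URI']) . '.html';
$cacheLife = 3600;                   // serve cached copies for up to an hour

// Serve the cached copy if it exists and is still fresh
if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $cacheLife) {
    readfile($cacheFile);
    exit;
}

// Otherwise build the page as normal, capturing the output as we go
ob_start();
// ... normal page generation (database queries, templates) goes here ...
echo '<html><body>Generated page content</body></html>';
$html = ob_get_contents();
ob_end_flush();                      // send the page to the visitor

file_put_contents($cacheFile, $html); // and keep a copy for next time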

But this exposed another vulnerability: as the site runs on PHP, SQL injection and the insertion of malicious code into the cache became a real possibility.

So, every page that required a variable from a GET or POST request had htmlentities wrapped around that variable to prevent malicious code being inserted.

For example, say I require a variable, timeStamp, to be passed to the script.

My original code defined this as:

$timeStamp = $_GET['ts'];

but the new header code defined:

$timeStamp = htmlentities($_GET['ts']);

so that characters with special meaning in HTML, such as '<', '>' and '&', are encoded as harmless HTML entities.

Next I put checks around every variable to ensure that it was valid. So, if I expected a variable to be numeric only, I checked for that, and if it had to follow a specific format, I checked for that too.
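For example, a numeric ID or a timestamp can simply be rejected or replaced if it does not match the expected pattern. The sketch below is illustrative; only timeStamp comes from the earlier example, the rest of the names are made up:

<?php
// Illustrative input checks: refuse anything that is not in the expected format
$recipeId = isset($_GET['id']) ? $_GET['id'] : '';
if (!ctype_digit($recipeId)) {
    // Not a plain positive integer, so refuse to run any query with it
    header('HTTP/1.1 400 Bad Request');
    exit('Invalid recipe id');
}

$timeStamp = isset($_GET['ts']) ? $_GET['ts'] : '';
if (!preg_match('/^\d{10}$/', $timeStamp)) {
    // Expecting a ten-digit Unix timestamp and nothing else
    $timeStamp = time(); // fall back to 'now' rather than trusting the input
}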

The most painful part was the searches, of course. But there I made a list of all potentially malicious characters and common malicious code strings, and I stripped those from any user input before performing any searches or caching any pages.
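A scrub of that kind looks roughly like the sketch below; the blacklist shown here is a shortened, illustrative one rather than my full list:

<?php
// Strip known-dangerous strings and characters from search input before it is
// used in a query or written into a cached page (illustrative blacklist only)
function scrub_search_input($query)
{
    $badStrings = array('<script', 'UNION SELECT', 'DROP TABLE', '--', ';');
    $query = str_ireplace($badStrings, '', $query);

    // Keep only letters, numbers, spaces, hyphens and apostrophes
    $query = preg_replace("/[^A-Za-z0-9 '\-]/", '', $query);

    return trim($query);
}

$searchTerm = isset($_GET['q']) ? scrub_search_input($_GET['q']) : '';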

Next I deployed BadBehavior throughout my site. This stopped the majority of known malicious bots from even getting to see any of my pages and malicious attacks dropped considerably (server load also dropped as a result so it was a win-win).

Now I checked the permissions on all my directories and tightened or changed them where necessary. Then I updated all third-party software to the latest versions and, where there were problems with any of it, changed over to something else.

With the site suitably protected and hardened, the next step was to undo some of the damage done to the overall SEO.

I will detail those steps in the next article...
