Recovering from a Hacked Website, Part II: On-page SEO
In the first part of this series on recovering from a hacked website I detailed how I discovered and removed the badware from my site, and then hardened my code so that it could not easily be hacked again.
This is the second part of the series, detailing how I improved the site's on-page SEO.
The truth is, even if my site had not been compromised, the look and feel had become dated and much of the code was bloated, because I had been using the site as a learning platform for programming and site development.
Also, since I had moved to cached pages to improve response times and reduce the load on my MySQL databases, my geo-location scripts were no longer working; indeed, the geo-location scripts were themselves causing problems, as they too were hitting the MySQL instance. I'm UK-based, but almost half my visitors are from North America, so I needed to serve slightly different content to the two audiences.
My first solution was to move to a static, file-based geo-location system that did not use the database. This freed up MySQL resources but increased server load, so it was not an optimal solution. Then I came across Google's JavaScript APIs. They had already solved the problem, and by integrating them with some nifty JavaScript to load my ads and other content I could have cached pages that remained dynamic where they needed to be, without bringing in external iframes... sorted!
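As a minimal sketch of the idea (the element id, the lookupCountryCode() stub and the placeholder strings are all hypothetical; the real lookup comes from Google's JavaScript APIs), a cached page can carry a block like this and still show different content to the two audiences:

<div id="geo-slot">Loading regional content...</div>
<script type="text/javascript">
// Stub for illustration only: the real version asks Google's JavaScript APIs
// for the visitor's location instead of returning a fixed country code.
function lookupCountryCode() { return 'GB'; }

// Because this runs in the visitor's browser, the cached HTML never changes.
var country = lookupCountryCode();
var slot = document.getElementById('geo-slot');
slot.innerHTML = (country == 'US' || country == 'CA')
    ? 'North American ads and content go here'
    : 'UK and rest-of-world ads and content go here';
</script>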
OK, one headache removed. What about page design and removal of bloat?
I needed to keep some of my original headers, but wanted to move to a cleaner design and cleaner, more readable fonts. I took Wikipedia as my inspiration, then dusted off my CSS files and stripped out my base three-column format definition.
I had designed that in CSS a few years ago to give me a header, left column (1), central column (2), right column (3) and footer. There was a nasty hack in there for older Internet Explorer versions, but a quick internet search showed me how to fix that.
Now I had a three-column definition with no IE hack which still kept the useful feature of having the central (main) column's content come first in the markup, right after the header, making it more visible to search engines. So my columns appeared in the page code in the order 2, 1, 3, but with the CSS applied they all rendered in the right places.
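In outline, with made-up id names, the body of each page now looks something like this, with the stylesheet rather than the markup putting the columns back into left-centre-right order on screen:

<div id="header">Site header, kept (with tweaks) from the original design</div>

<!-- Column 2, the main content, comes first in the source, straight after the header -->
<div id="col-main">Main recipe content: the part I most want the search engines to see</div>

<!-- Columns 1 and 3 come later in the source; the CSS floats them into place visually -->
<div id="col-left">Left-hand column: navigation</div>
<div id="col-right">Right-hand column: related links and ads</div>

<div id="footer">Footer</div>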
I now had the base CSS file defined, keeping the main header from the original files with a few tweaks.
I added my new font definitions, plus some extra definitions for headers and for the appearance of the left and right columns. The result was a clean, functioning CSS file around a third of the size of my previous version. I also had a separate print stylesheet that omitted the left and right sidebar content from the printed page and stripped out the ads; I rebuilt this from the new display CSS and, again, the file size came down.
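Wiring the two stylesheets up is just a question of the media attribute on the link tags in the page head; something along these lines, with placeholder file names:

<link rel="stylesheet" type="text/css" href="/css/main.css" media="screen" />
<link rel="stylesheet" type="text/css" href="/css/print.css" media="print" />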
Now that I had my CSS, I used an online tool to strip out the extraneous spaces, reducing the file size further, then I cached a compressed copy and called this from my web pages. Clean and compressed CSS: a huge space saving.
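For anyone who would rather have PHP do the compress-and-cache step than handle it by hand, a sketch along these lines would work; the file paths are placeholders and this is an illustration rather than the exact script I use:

<?php
// Serve a minified stylesheet, gzip-compressing it once and caching the result
// so later requests just stream the cached copy. Paths are placeholders.
$css  = '/path/to/main.min.css';
$gzip = '/path/to/cache/main.min.css.gz';

// (Re)build the compressed copy if it is missing or older than the stylesheet
if (!file_exists($gzip) || filemtime($gzip) < filemtime($css)) {
    file_put_contents($gzip, gzencode(file_get_contents($css), 9));
}

header('Content-Type: text/css');
header('Cache-Control: public, max-age=604800');   // let browsers keep it for a week

if (isset($_SERVER['HTTP_ACCEPT_ENCODING'])
        && substr_count($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')) {
    header('Content-Encoding: gzip');
    readfile($gzip);                 // browser accepts gzip: send the cached copy
} else {
    readfile($css);                  // otherwise fall back to the plain file
}
?>

Done this way, the stylesheet link in the page head would point at this script rather than at the raw .css file.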
For the main web pages themselves I also added page compression, using the following PHP code in the PHP section before the main HTML code:
// Gzip the page output when the browser says it can handle it
if (isset($_SERVER['HTTP_ACCEPT_ENCODING'])
        && substr_count($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')) {
    ob_start("ob_gzhandler");   // buffer and compress the output
} else {
    ob_start();                 // buffer only, send uncompressed
}
This is completed with the PHP code:
ob_end_flush();
at the very bottom of the file, which flushes the buffered (and possibly compressed) output to the browser.
Now the pages would be sent compressed to any web browser that would accept compressed pages. A bandwidth and speed win.
The next problem was a little trickier. As my site had grown, I had lost sight of what I had been doing with the content. I had my main Celtnet Recipes home page (http://www.celtnet.org.uk/recipes/) and beneath that were pages for the regions of the world I had recipes for, for the various countries I had recipes for, and for the ingredients and cookery techniques I also had recipes for.
To increase the chances of the links being indexed I had set a limit of 100 recipe links per page, with additional recipes on the succeeding pages. This was fine whilst my site was growing, but eventually some listings had hundreds of recipes and others thousands. This meant that the content of my home page for each country, region, ingredient and so on was being replicated hundreds and hundreds of times across those paginated pages. Though the links were different on each page, the main text often amounted to more content than the links, which meant massive duplicate content and duplication of the <title> and meta description tags in the page headers.
This, of course, is VERY BAD SEO. So I updated my headers to include the page's position in the sequence: the first page was the 'Home' page, then came the '2nd', '3rd', '4th' pages and so on. This meant that the title and description tags for each page were different.
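In the header PHP this comes down to something like the sketch below; the topic string, wording and variable names are made up for illustration, not lifted from the real pages:

<?php
// Sketch: $topic would really come from the country/region/ingredient being listed
$topic = 'Welsh Recipes';
$page  = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;

if ($page == 1) {
    $title = "$topic - Celtnet Recipes";
    $desc  = "All my $topic, with notes on the history and cookery of the region.";
} else {
    $title = "$topic, page $page - Celtnet Recipes";
    $desc  = "Page $page of my $topic listing, with 100 more recipe links.";
}

echo '<title>' . htmlspecialchars($title) . '</title>' . "\n";
echo '<meta name="description" content="' . htmlspecialchars($desc) . '" />' . "\n";
?>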
For the duplicate content problem, I added a short definition of what the page was all about to the top of each page. This was essentially the same for each page, but it did include the page number.
Next I wrapped my main text content in PHP so that it would only display on the home page, then added some boilerplate to all succeeding pages of the form 'This page is a continuation of my recipes listing from Britain, the nth page in fact. For more information about Britain and its cuisine, please visit the home page.', or words to that effect. The overall effect was to make the text content of those pages smaller than the link-list content below it, so that the search engines would see each page as unique content sitting below the home page for that entry.
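In outline it looks something like this, with placeholder values standing in for what the real pages pull from the listing itself:

<?php
// Sketch: show the full introduction only on the first page of a listing;
// later pages get a short pointer back to it instead. Values are placeholders.
$page       = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$region     = 'Britain';
$home_url   = 'http://www.celtnet.org.uk/recipes/';   // hypothetical listing home page
$intro_html = '<p>The full write-up about the region and its cuisine...</p>';

if ($page == 1) {
    echo $intro_html;      // the full introduction, on the home page only
} else {
    echo '<p>This page is a continuation of my recipes listing from '
       . htmlspecialchars($region) . ' (page ' . $page . '). For more about '
       . htmlspecialchars($region) . ' and its cuisine, please visit the '
       . '<a href="' . htmlspecialchars($home_url) . '">home page</a>.</p>';
}
?>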
Now I had pages with cleaner HTML and neater CSS that were more readable for humans and more distinct in terms of content than they had been before. With compression and caching they also loaded faster.
Next I validated the code as far as I could. This did reveal a few glaring problems that I fixed, but because I had some third party code and Web 2.0 buttons, the pages did not validate completely. But I did not worry about that as the main content was fine.
To get some extra load time benefits I played around with asynchronous JavaScript loading for some of the ads and some of my own JavaScript (the main page search runs off JavaScript and I also have JavaScript that adds a copyright notice to any text copied from the site).
Again, this brought load time benefits.
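For my own scripts the pattern is roughly the one below (the file name is a placeholder): the script element is created from JavaScript, so the browser fetches it without blocking the rest of the page:

<script type="text/javascript">
// Load the copyright-notice script (placeholder path) without blocking rendering
(function () {
    var s = document.createElement('script');
    s.type = 'text/javascript';
    s.async = true;
    s.src = '/js/copy-notice.js';
    var first = document.getElementsByTagName('script')[0];
    first.parentNode.insertBefore(s, first);
}());
</script>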
From a very bad start I had significantly improved my website's appearance, on-page SEO and load times. Of course, replicating this throughout the site was (and is) a big job, and it's still ongoing. But it's giving me a chance to completely overhaul the content and to make things a lot better overall.
This is very much a work in progress, but it's getting there.
Next time I will be talking a little about something that we all hate, but have to do... off-page SEO.