May
16
Web analytics tools are nice for showing you how many people are visiting your site, but their value can go well beyond that. Here is one simple way to use your web analytics package to gain information you can act on immediately to improve your site and generate more traffic.
There are two reports in most analytics packages that show how people found a specific page on search engines.
Keyword Phrase Report - This report shows the entire phrases that searchers typed into the search box to find your site. “Blue Widgets” and “Red Widgets” would be examples.
Keywords Report - This report lists the individual words searchers used to find your page. “Widgets”, “Blue”, and “Red” would be examples.
I will explain one way to get immediately actionable information out of your Keywords Report.
On the surface, especially at the site level, this report may not tell you anything you don’t already know. If your site is about widgets, “Widgets” will of course be at the top of the list day after day, likely followed by the other primary keywords your site generally targets.
Filter the Keywords Report down to the single page level.
Pick a specific content page to target and improve, and filter the Keywords Report to only show keywords for that one page. Ideally, pick a page that gets decent long-tail traffic. If you can, open up the report in a window right beside the webpage itself so you can easily compare the two side by side.
Look at the words people are using to find the page. You will probably see a long tail: a handful or two of words that a lot of people use, trailing off to lots of words that only a few people use.
Start by focusing on your top 5 words. Look at how you use them on the page. Check the title tag, headings, and links. Usually this won’t be too surprising. The top 5 words are likely prominent in these places.
Keep going down the list until you start seeing keywords that you don’t prominently use. What are they, and how do they relate to the page? This is where the opportunity is hiding.
Are groups of people finding the page using a word you didn’t expect or target?
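The filtering in the last few steps is easy to do by eye, but if you can export the single-page Keywords Report it can also be scripted. Below is a minimal Java sketch, assuming the report has been loaded into a map of keyword to visit count (in report order) and that you paste in the page's title, headings, and link text yourself. The class name `KeywordGaps` and the sample data are hypothetical. It lists the report keywords that never appear in that prominent text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: flag Keywords Report entries missing from a page's prominent text.
class KeywordGaps {

    // Lowercases text and turns every run of non-alphanumeric characters into one space.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
    }

    /** Returns keywords, in report order, that appear nowhere in the prominent text. */
    static List<String> findGaps(Map<String, Integer> keywordVisits, String prominentText) {
        // Pad with spaces so " word " only matches whole words.
        String haystack = " " + normalize(prominentText) + " ";
        List<String> gaps = new ArrayList<>();
        for (String keyword : keywordVisits.keySet()) {
            String word = normalize(keyword);
            if (!word.isEmpty() && !haystack.contains(" " + word + " ")) {
                gaps.add(keyword);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Keywords Report for one page, sorted by visits (made-up numbers).
        Map<String, Integer> report = new LinkedHashMap<>();
        report.put("widgets", 420);
        report.put("blue", 130);
        report.put("install", 95);
        report.put("cheap", 40);

        String title = "Blue Widgets - Buying Guide";
        String headings = "Choosing Blue Widgets | Widget Sizes";
        String links = "Widget accessories";

        System.out.println(findGaps(report, title + " " + headings + " " + links));
    }
}
```

The comparison is exact whole-word matching only, so "widget" and "widgets" count as different words; loosen it if your report mixes singular and plural forms.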
Take Action!
- Re-write your title, headings, links, and content to include some of these newly found keywords. However, don’t remove or de-emphasize the other more important words. If that doesn’t make sense for this page and these words, then:
- Consider creating new pages to target these specific keywords.
Not every page will have these opportunities staring you in the face. Pick 10 pages to go through, and I bet you’ll find a couple of good opportunities for improvement.
Don’t force these changes into any page where they don’t make sense. In my experience, the best page modifications not only include these keywords for the search engines’ sake, but also make the page better for human visitors. After all, you’re now speaking their language.
Feb
12
The Two Key Factors for Ranking Well in Search
Filed Under Google, SEO
It’s been said that Google takes over 100 different considerations into account when figuring out where to return your page in a search result. Even though that sounds like a lot, all those 100+ factors can be boiled down into two core concepts. To rank well, you need to maximize both.
Ranking = Strength + Relevance
Page Strength
I know a lot of SEOs love to say PageRank is dead. I disagree, and I see evidence of it every day. Sure, PageRank is not as important as it used to be, but it’s certainly not dead.
PageRank is a measure of how important a page is on the Internet as a whole. Every page that links to yours is essentially counted as a vote of confidence for your page. Links from important pages like the New York Times count more than links from lesser pages like your brother’s personal blog about his dog. To over-simplify things greatly, Google’s PageRank score is a sum of the value of all those votes in comparison to all the other pages on the web.
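To make the voting idea concrete, here is a sketch of the PageRank calculation as originally published: the “random surfer” power iteration with a damping factor, commonly set to 0.85. It is not Google’s current production ranking, which is not public, and the four-page link graph is hypothetical. Each page’s score is split evenly among the pages it links to, so a link from a high-scoring page passes along more than a link from a low-scoring one.

```java
import java.util.Arrays;

// Sketch of the classic PageRank power iteration on a tiny hypothetical link graph.
class PageRankSketch {

    /**
     * links[i] lists the pages that page i links to.
     * Returns one score per page; the scores always sum to 1.
     */
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);                  // start with equal scores
        for (int iter = 0; iter < iterations; iter++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);  // baseline "random jump" share
            for (int from = 0; from < n; from++) {
                if (links[from].length == 0) {
                    // A page with no outgoing links spreads its vote over every page.
                    for (int to = 0; to < n; to++) {
                        next[to] += damping * rank[from] / n;
                    }
                } else {
                    // Each outgoing link passes an equal share of this page's score.
                    double share = damping * rank[from] / links[from].length;
                    for (int to : links[from]) {
                        next[to] += share;
                    }
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // 0 = a popular news page, 1 = your page, 2 = your brother's dog blog, 3 = another blog
        int[][] links = {
            {1},     // news page links to your page
            {0},     // your page links back to the news page
            {1},     // dog blog links to your page
            {0, 2}   // another blog links to the news page and the dog blog
        };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

In this tiny graph, page 3 has no inbound links, so it keeps only the baseline (1 - 0.85) / 4 share, while your page and the news page, which link to each other and collect the remaining links, end up with most of the score.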
Other factors add to a page’s strength as well, such as:
- Is the page new?
- Has the page and/or domain been around for many years?
- Is the domain registered for a long term between renewals, like 5 years?
- Is the domain or page penalized for not following the webmaster guidelines?
- Is the domain selling links?
- Is the page located in Wikipedia? (OK - maybe not that one)
I know I’m missing some factors, but you get the general idea about what might play into strength.
What’s clear, though, is that your page is never going to rank well on strength alone.
Ranking well needs one additional key factor…relevance.
Query Relevance
If strength were all that was required, a site like www.google.com would rank #1 for all queries…it has a PageRank of ten out of ten. That obviously would not make sense.
Relevance is a measure of how related a page is to what the user has searched for. Your brother Joe’s blog about his dog, even though not a strong site, may very well be the most relevant destination for people searching for “joe dog blog”.
Relevance factors include:
On-Page:
- Query Terms in the title?
- Query Terms in header tags?
- Query Terms in body text?
- Query Terms in outgoing link text?
Off-Page:
- Query Terms in incoming link text?
- Query Terms in “on page factors” of the pages linking in?
It’s possible for a page to rank well on strong relevance while having a low PageRank, which is why some people say PageRank is dead. (That doesn’t mean PageRank is dead; it just means there are other factors that can overcome a low PageRank.)
Burn the Candle at Both Ends
Strength and Relevancy are simple high level concepts that can be easily explained and understood by both technical and non-technical users alike.
Some tasks, like link building, will have an impact on both.
For a strong ranking, your strategy should be to maximize both strength and relevancy. Don’t turn a blind eye to either one.
Dec
13
Cut-off Your Head; Grow a Longer Tail
Filed Under SEO
A couple months back, I decided to take a calculated risk with my most popular page; I cut off my head. It’s worked out for the better, and is an interesting case study. Let me explain…
A page on one of my sites was ranking a very solid #1 for a single head keyword. That’s great, right? Maybe.
Let’s say my site is about Widgets. This one head keyword, let’s call it WidgMaster, was bringing in a decent amount of traffic.
While the page was certainly related to the WidgMaster variant of Widgets, it was also very relevant to learning about widgets in general. As evidence, the page currently ranks on page 3 of Google for the much broader “widgets” term.
In terms of traffic, the page was successful, but unnecessarily pigeon-holed into a very small niche.
It’s worth noting that this page is linked to more heavily than any of the other internal pages on this site, having been on the front page of Slashdot once, and digg twice. It has more “link juice” than any other page on the site, and pulls in three times as much traffic as the homepage.
I wanted to see if I could make the page pull in a more general audience. I was looking for more “Widget” focused long tail traffic, as described very well recently in “Deep Links, Longtail Keywords, and Why you Should Love Them Both.”
I performed a simple and minor, but crucial, strategic change. I figured the chance of success was 50/50, and there was certainly potential for traffic loss.
I reordered the words in the page title.
Learn about WidgMaster Installation and Configuration - Widgets
Became…
Learn about Widget Installation and Configuration - WidgMaster
Almost immediately, I lost the #1 spot for WidgMaster, which I had held solidly for over a year. The page now ranks around #6 on Google, high enough to still get a trickle of traffic, but less than 10% of what was previously coming in for that keyword.
Interestingly, and in support of my initial hypothesis, the traffic levels stayed about the same.
I was hoping for an increase, but the resulting wash isn’t all bad.
The page now ranks much better for more general long-tail “Widget” phrases. The resulting visitors are less often WidgMaster-focused “one hit wonders”, and much more interested in sticking around my site after their initial landing.
This strategy is not right for all situations and pages; it is much more usable for “short head” terms than “tall head” terms. Still, it’s a method you should consider experimenting with on a small scale and keeping in your SEO Bag o’ Tricks.
Dec
11
Introduction to SOLR “Enterprise Search”
Filed Under Lucene
SOLR bills itself as an open source enterprise search engine. I would not go as far as to call it “enterprise,” but certainly believe SOLR is a nice value added wrapper around the already powerful Lucene package. It may become “enterprise class,” but it has a ways to go. That being said, it’s certainly good to aim high!
Raw Lucene search engines have a few issues that SOLR addresses very nicely, including:
- Platform Lock-in (Java)
- Indexing requires custom Java coding
- Existing document update issues
- Index Replication
- IndexSearcher warming (fake automated queries to prepopulate the cache)
SOLR runs as a Java web application. The nightly version of SOLR that I downloaded came bundled with Jetty, making it very easy to run, test, and even deploy in a light-duty production role.
Instead of being a Java library that you call directly from your Java code, SOLR works as a web application: you index documents by POSTing them over HTTP, and you query using HTTP GET requests. This HTTP interaction means your application does not need to be written in Java. It can be in any language that can send and request data over HTTP.
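Here is a minimal Java sketch of that HTTP interaction using only the JDK, so it translates easily to other languages. It assumes the classic XML update format (an `<add><doc>…</doc></add>` message POSTed to the `/update` handler, followed by `<commit/>`) and the `/select` query handler at the example server’s default address, `http://localhost:8983/solr`. The `id` and `title` field names are hypothetical and must match your schema. Newer Solr releases put a core name in the path (for example `/solr/mycore/update`) and default to JSON responses, so adjust `BASE` and keep the `wt=xml` parameter if you want XML back.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: index and query SOLR over plain HTTP. BASE and the field names are assumptions.
class SolrHttpSketch {

    static final String BASE = "http://localhost:8983/solr";

    /** Escapes the five XML special characters. */
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Builds an XML add message for one document with hypothetical "id" and "title" fields. */
    static String buildAddXml(String id, String title) {
        return "<add><doc>"
                + "<field name=\"id\">" + xmlEscape(id) + "</field>"
                + "<field name=\"title\">" + xmlEscape(title) + "</field>"
                + "</doc></add>";
    }

    /** Builds a GET URL for a simple query, asking for an XML response. */
    static String buildQueryUrl(String base, String query) {
        return base + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8) + "&wt=xml";
    }

    /** POSTs an XML message (an add, or a commit) to the update handler; returns the HTTP status. */
    static int postXml(String base, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(base + "/update").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    /** Fetches the raw response body for a query. */
    static String query(String base, String q) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(buildQueryUrl(base, q)).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Requires a running SOLR instance at BASE.
        postXml(BASE, buildAddXml("SKU-1", "Moonlight Sonata"));
        postXml(BASE, "<commit/>");
        System.out.println(query(BASE, "title:moonlight"));
    }
}
```

Only `main` needs a running server; the two `build…` helpers are plain string functions.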
When it comes to setting up your index, instead of coding your field information into Java, you set it up in an XML file. This file, among other things, lists the document fields and the document primary key needed to maintain your index.
If you are familiar with Lucene, my mention of the primary key may have piqued your interest. For those not familiar with Lucene, it does not have any notion of primary keys or document updates. To update a document in Lucene, you first need to locate and delete the previous version of the document through a rather indirect process. When provided with a primary key, SOLR handles that process automatically for you.
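The update problem is easier to see with a toy model. The sketch below is not Lucene: it is a tiny in-memory inverted index (the class `KeyedIndex` is hypothetical) that remembers a primary key for each document, so an update can find the old version, remove its terms, and then add the new version. That delete-then-add sequence is the bookkeeping SOLR does for you. Later Lucene releases also added `IndexWriter.updateDocument(Term, Document)`, which deletes the documents matching a key term and adds the new one in a single call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory inverted index (NOT Lucene) illustrating update-by-primary-key.
class KeyedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();   // term -> doc ids
    private final Map<String, Set<String>> termsByDoc = new HashMap<>(); // doc id -> its terms

    /** Adds the document, or replaces the one already stored under primaryKey. */
    void update(String primaryKey, String text) {
        delete(primaryKey);                           // step 1: remove the previous version, if any
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        for (String t : terms) {                      // step 2: index the new version
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(primaryKey);
        }
        termsByDoc.put(primaryKey, terms);
    }

    /** Removes every posting that belongs to primaryKey. */
    void delete(String primaryKey) {
        Set<String> old = termsByDoc.remove(primaryKey);
        if (old == null) return;
        for (String t : old) {
            Set<String> ids = postings.get(t);
            ids.remove(primaryKey);
            if (ids.isEmpty()) postings.remove(t);
        }
    }

    /** Returns the sorted ids of documents containing the term. */
    Set<String> search(String term) {
        return new TreeSet<>(postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of()));
    }

    public static void main(String[] args) {
        KeyedIndex index = new KeyedIndex();
        index.update("sku-1", "Moonlight Sonata for piano");
        index.update("sku-1", "Moonlight Sonata for guitar"); // same key: replaces, no duplicate
        System.out.println(index.search("piano"));             // []
        System.out.println(index.search("guitar"));            // [sku-1]
    }
}
```

Without the primary key lookup in `update`, the second call would leave the “piano” version searchable alongside the new one, which is exactly the duplicate-document problem raw Lucene users have to manage by hand.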
The SOLR example included with the nightly build has a fairly simple script and sample documents to show how indexing works. Run the example to index the samples, then run searches through the admin interface. To use the search results in your own application, query the same URL that shows up in the admin interface, and parse the XML response.
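Parsing the XML response needs nothing beyond the JDK’s DOM parser. This sketch assumes the standard SOLR XML response shape: a `<result>` element carrying a `numFound` attribute and one `<doc>` per hit, each with named child elements such as `<str name="title">`. The class name `SolrXmlParse` and the sample response are hypothetical. Multi-valued fields come back wrapped in an `<arr>` element, and this sketch simply concatenates their text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Sketch: pull hit counts and field values out of a SOLR XML response.
class SolrXmlParse {

    private static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML response", e);
        }
    }

    /** Reads the numFound attribute of the result element (0 if there is none). */
    static int numFound(String responseXml) {
        Element result = (Element) parse(responseXml).getElementsByTagName("result").item(0);
        return result == null ? 0 : Integer.parseInt(result.getAttribute("numFound"));
    }

    /** Returns the value of one named field from every doc element, in order. */
    static List<String> fieldValues(String responseXml, String fieldName) {
        NodeList docs = parse(responseXml).getElementsByTagName("doc");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < docs.getLength(); i++) {
            NodeList children = docs.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && fieldName.equals(((Element) child).getAttribute("name"))) {
                    values.add(child.getTextContent());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String sample = "<response><lst name=\"responseHeader\"><int name=\"status\">0</int></lst>"
                + "<result name=\"response\" numFound=\"2\" start=\"0\">"
                + "<doc><str name=\"id\">SKU-1</str><str name=\"title\">Moonlight Sonata</str></doc>"
                + "<doc><str name=\"id\">SKU-2</str><str name=\"title\">Fur Elise</str></doc>"
                + "</result></response>";
        System.out.println(numFound(sample) + " hits: " + fieldValues(sample, "title"));
    }
}
```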
As an initial experiment, I have used SOLR to index product information on my sheet music site. It’s used for the sheet music search engine as well as the related products query. My search engine implementation is only “quick hack” quality (I have not implemented next/previous page links yet), but the related products feature is more polished.
Dec
11
High Performance Lucene Indexing
Filed Under Lucene
In some instances, indexing into RAM rather than directly to disk can greatly increase indexing performance. Here’s one way to do it. You may need to increase the Java JVM memory parameters with the arguments -Xms128M -Xmx256M, of course modifying the sizes to fit your needs. Tweaking the foldCount size will affect how much memory is required, because it sets how large the RAMDirectory is allowed to grow in terms of the number of Lucene Documents it can hold. Each time the foldCount is reached, and again when indexing is complete, the RAM index is flushed to disk.
Lucene Example Code: RAM to Disk
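// Assumes analyzer (an Analyzer), indexDir (the on-disk index location) and
// rs (a JDBC ResultSet holding the rows to index) are defined elsewhere.
int foldCount = 500000; // max documents buffered in RAM before flushing to disk
int indexSize = 0;      // documents currently in the RAM index
int count = 0;          // total documents indexed
try {
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
    IndexWriter writer = new IndexWriter(indexDir, analyzer, true);
    writer.mergeFactor = 100000;
    while (rs.next()) {
        Document doc = new Document();
        // add fields built from the current rs row here
        ramWriter.addDocument(doc);
        count++;
        indexSize++;
        if (indexSize == foldCount) {
            foldToDisk(ramDir, ramWriter, writer);
            // a new writer with create=true empties the RAM index for the next batch
            ramWriter = new IndexWriter(ramDir, analyzer, true);
            indexSize = 0;
        }
    }
    foldToDisk(ramDir, ramWriter, writer); // flush whatever is left
    writer.optimize();
    writer.close();
} catch (IOException e) {
    e.printStackTrace();
} catch (SQLException e) {
    e.printStackTrace();
}

public static void foldToDisk(RAMDirectory ramDir, IndexWriter ramWriter, IndexWriter writer) throws IOException {
    ramWriter.close();
    Directory dirA[] = new Directory[1];
    dirA[0] = ramDir;
    System.out.print(".");
    writer.addIndexes(dirA); // merge the RAM segments into the on-disk index
    System.out.println(".");
}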
Dec
11
Multi-Field Lucene Example
Filed Under Lucene
Storing and searching more than one field is very easy to do in Lucene, and it can make your Lucene search engine much more powerful!
Tip: If you’re not already familiar with how to index and search single field documents, this is intended to build on our Simple Lucene Example.
Lucene Example Code: Multi-Field Documents
Create a Lucene document with more than one field
String content = "This is the example text I want to have Lucene index";
Document doc = new Document();
doc.add(Field.Keyword("keyword","Java"));
doc.add(Field.Text("title","My Document Title"));
doc.add(Field.Text("content",content));
You would then add the document to the index like normal.
Create a Lucene MultiFieldQueryParser
String fields[] = {"keyword","title","content"};
String queryString = "Java";
Query query = null;
try {
query = MultiFieldQueryParser.parse(queryString,fields,new StandardAnalyzer());
} catch (ParseException e) {
System.out.println("Lucene ParseException: " + e.getMessage());
e.printStackTrace();
}
Read the additional fields from the returned hits
int hitCount = hits.length();
for(int i=0; (i < hitCount && i < 10); i++){
Document doc = hits.doc(i);
System.out.println(doc.get("keyword") + ", " + doc.get("title") + ", " + doc.get("content"));
}
That’s it!
That’s all you need to do to go from single-field to multi-field Lucene documents and searching.
Dec
11
Simple Lucene Example
Filed Under Lucene
Lucene is a great core for a Java search engine. Here is simple Lucene example code to index single-field data, along with a very basic search function. Together they make a simple Java search engine. In this example code, each block catches the thrown exceptions so you can see what is thrown. In a real-world Lucene implementation, you may handle this differently.
Lucene Example Code: Steps to Index the data
- Create a new Lucene index using an IndexWriter
- Create a Lucene Document
- Add the Lucene document to the index
- Optimize and close the index
Create a new Lucene index using an IndexWriter
String indexPath = "/path/to/whereYou/wantThe/IndexStored";
IndexWriter writer = null;
try {
// Make a lucene writer and create new Lucene index with arg3 = true
writer = new IndexWriter(indexPath, new StandardAnalyzer(), true);
} catch (IOException e) {
System.out.println("IOException opening Lucene IndexWriter: " + e.getMessage());
}
Create a Lucene document
String content = "This is the example text I want to have Lucene index";
Document doc = new Document();
doc.add(Field.Text("content",content));
Add the document to the index
try {
writer.addDocument(doc);
} catch (IOException e) {
System.out.println("IOException adding Lucene Document: " + e.getMessage());
}
Optimize and close the IndexWriter
try {
writer.optimize();
writer.close();
} catch (IOException e) {
System.out.println("IOException closing Lucene IndexWriter: " + e.getMessage());
}
Lucene Example Code: Steps to Search the Lucene Index
Open a Lucene IndexSearcher
IndexSearcher indexSearcher = new IndexSearcher(indexPath);
If you are using the Lucene search engine from a web page, you should store and reuse the same IndexSearcher for each query. The Lucene IndexSearcher caches information to make queries after the first one faster. Reusing the IndexSearcher also takes it easy on the Java garbage collector, improving performance and memory utilization. Not reusing the IndexSearcher is a common mistake and a cause of frustration for many first-time Lucene users. For use on the web, here is some simple JSP code to store the IndexSearcher in an application attribute and reuse it for future page loads.
indexSearcher = (IndexSearcher) application.getAttribute("searcher");
if(indexSearcher == null){
indexSearcher = new IndexSearcher(indexPath);
application.setAttribute("searcher",indexSearcher);
}
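The JSP snippet checks for the attribute and then sets it, so two requests arriving at the same moment on a cold start can each open their own IndexSearcher. One way to avoid that is to keep a small thread-safe holder in application scope and let it create the searcher exactly once. The sketch below is generic Java, and the `SharedInstance` class is hypothetical: in a web application, the factory would be `() -> new IndexSearcher(indexPath)` wrapped to handle its IOException, and the holder itself would be stored once, for example from a `ServletContextListener`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical holder: creates an expensive object (e.g. an IndexSearcher) once and shares it.
class SharedInstance<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile makes the fully built object visible to all threads

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T result = instance;
        if (result == null) {                 // fast path: no locking once created
            synchronized (this) {
                result = instance;
                if (result == null) {         // re-check while holding the lock
                    result = factory.get();
                    instance = result;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger opened = new AtomicInteger();
        // Stand-in for opening an IndexSearcher; counts how often the factory runs.
        SharedInstance<Object> searcher = new SharedInstance<>(() -> {
            opened.incrementAndGet();
            return new Object();
        });
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(searcher::get); // simulate concurrent page loads
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("factory calls: " + opened.get()); // 1
    }
}
```

The `volatile` field plus the second check inside `synchronized` is the standard double-checked locking idiom; after the first call, `get()` returns without taking the lock.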
Construct a Lucene Query
String queryString = "example";
Query query = null;
try {
query = QueryParser.parse(queryString, "content", new StandardAnalyzer());
} catch (ParseException e) {
System.out.println("Lucene ParseException: " + e.getMessage());
e.printStackTrace();
}
Have Lucene perform the Search
Hits hits = null;
try {
hits = indexSearcher.search(query);
} catch (IOException e) {
System.out.println("Lucene Searching Exception: " + e.getMessage());
}
Display the top Lucene Hits
int hitCount = hits.length();
for(int i=0; (i < hitCount && i < 10); i++){
Document doc = hits.doc(i);
System.out.println(doc.get("content"));
}
That’s it!
Those are the pieces needed to create a simple, single-field Lucene search engine in Java. In terms of the try/catch blocks and variables, you’d probably implement things in a more combined manner, but the samples on this page are designed to stand on their own, at least to some degree.
Want a more powerful Search Engine?
I also have a Multi-Field Search Engine Example if you want to get a little bit more powerful.
Dec
11
On-Page SEO is not about Optimization
Filed Under SEO
When performing on-site optimization, it’s important to keep your eye on the ball.
Here’s a hint: SEO is not really about code and content optimization.
Seriously, many of us can’t see the forest for the trees. Stop looking at the trees, and start enjoying the beauty of the forest.
What do I mean by that? Most of the SEO discussion and articles about on-page optimization are looking at it backwards.
It’s not about performing the optimizations that the search engines are looking for. You know the drill…
- Optimized and Unique Title Tag
- Proper use of header, strong, list tags, etc.
- Link to related content using keyword rich anchor text
- Build content silos around a tightly focused niche
- etc. etc.
While that may work, you’re going through the motions but missing the entire point.
Here’s the point you may be missing.
It’s NOT about performing optimizations.
Search engines don’t want to return highly optimized pages. They really don’t.
It IS about making quality sites.
Search engines want to return high quality pages that are relevant to search terms.
So, on a page-by-page basis, how do you make a high quality site?
- Optimized and Unique Title Tag
- Proper use of header, strong, list tags, etc.
- Link to related content using keyword rich anchor text
- Build content silos around a tightly focused niche
- etc. etc.
Yes. That’s the same list you already saw.
So what’s different? Your perspective, the process, and the end result.
If you look at it from the pure SEO side, it’s entirely possible to “check-off” all the items on your list without really increasing, and possibly even reducing, the quality of the page for actual users.
If you look at it from the page quality side, but with an eye on SEO guided principles, you end up with a highly optimized quality page, because that was your goal. The more I look at things from this perspective, the more I see how SEO and page quality go hand-in-hand.
Google and (some of) the other search engines know this. They don’t pick their on-page weighting factors at random. They choose and weight them because they are generally indicators of higher quality pages for their search results.
Once your mental paradigm makes the shift, you will find yourself thinking less and less about optimizing pages, and more and more about how to actually make your pages better for users.
It only makes logical sense that building high quality pages is a better long-term strategy than just trying to optimize content for the sake of optimizing content.
Stop giving search engine spiders optimized pages. Take the next step and start giving them the high quality pages they are really looking for in the first place.
Dec
5
MSN Comes Clean on Fake Search Traffic
Filed Under Microsoft
Microsoft has finally officially commented on the fake traffic they’ve been sending to websites recently.
Recently I commented about how Microsoft Live Search has been stuffing our log files with bot traffic pretending to be human.
Basically they say, “we’ve been doing bad stuff for the last EIGHT months that screws up your metrics and disregards internet respect standards, but it’s OK, because we are going to stop soon.”
If anything, their statements strengthen my mistrust.
At least now I know how far back I can’t trust any traffic from MSN. They offer no solutions for how to scrub our log files (which I’ve already spent time trying to do myself.)
Maybe the real problem is one of scale. Google and Yahoo probably do this too, but it’s different. Google and Yahoo (to a lesser extent) send real traffic volumes. Anything like this that Google and Yahoo do ends up being noise that goes unnoticed.
With MSN…these queries are more than just noise; they can end up being the bulk of MSN traffic coming in. In one case, the bogus queries continually use a keyword phrase that I am trying to optimize for and measure results around.
The only reason I even noticed this problem is that I saw a large spike in traffic for that phrase, but could not find any ranking change driving the increase. The bogus Microsoft traffic was enough to cut through the Yahoo and Google traffic for that phrase, and led me down the path to discovering this problem.
At least Microsoft has commented. They have not apologized or offered any solutions. In my mind they’ve only gone as far as saying they are playing dirty pool (but plan to stop soon!) The damage is done. I’ve said my piece. What are your thoughts on the matter?
Oct
18
Is Microsoft Live Search stuffing our log files?
Filed Under Microsoft, SEO
On one of the large sites I SEO and monitor traffic for, we’ve seen a large bump in search.live.com traffic over the past few months. The keywords used don’t always make sense for the page being landed on, and the keyword phrase is always a single word.
I’ve assumed (shame on me?) that it was a bug somewhere related to search.live.com that is only allowing the first keyword through to our WebTrends Analytics tool. Andrew Urquhart has commented on his similar theory for the problem.
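If your analytics tool lets you export raw referrers, the pattern described above (search.live.com referrals whose search phrase is a single word) is easy to flag. The sketch below is a hypothetical helper, not an official filter: it only marks referrers that match that pattern by host name and `q` parameter, and it cannot tell a bot from a real person who typed one word, so treat its output as a list to investigate rather than traffic to delete. The sample referrer URLs are made up.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: flag search.live.com referrers whose search phrase is one word.
class LiveReferrerCheck {

    /** Returns the decoded q= parameter of a URI, or null if there is none. */
    static String searchPhrase(URI uri) {
        String raw = uri.getRawQuery();
        if (raw == null) return null;
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("q")) {
                return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    /** True if the referrer host is search.live.com and its search phrase is a single word. */
    static boolean isSingleWordLiveReferrer(String referrer) {
        try {
            URI uri = new URI(referrer);
            if (!"search.live.com".equalsIgnoreCase(uri.getHost())) return false;
            String phrase = searchPhrase(uri);
            if (phrase == null || phrase.trim().isEmpty()) return false;
            return phrase.trim().split("\\s+").length == 1;
        } catch (URISyntaxException | IllegalArgumentException e) {
            return false; // malformed referrers are left alone
        }
    }

    public static void main(String[] args) {
        List<String> referrers = List.of(
                "http://search.live.com/results.aspx?q=widgets",
                "http://search.live.com/results.aspx?q=blue+widget+sizes",
                "http://www.google.com/search?q=widgets");
        for (String r : referrers) {
            System.out.println(isSingleWordLiveReferrer(r) + "  " + r);
        }
    }
}
```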
For yesterday’s stats, MSN had a significantly large volume of traffic that just didn’t seem to make any sense. If it’s wrong, it’s no longer statistically background noise, but rather seriously impactful data that puts the legitimacy of the log data into question.
Looking for answers, I directed my browser over to WebMasterWorld to check out the MSN forum. The Strange Referrer Activity post was right near the top. Among other things, this post on Sept. 5th 2007 was interesting, and a major cause for concern:
Thanks for all the feedback on this thread.
First, we appreciate the concerns and issues that have been raised and apologize for any incovenience this might have caused.
Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your conerns, we would request that you do not actively block the IP addreses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.Please keep the feedback and thoughts coming as we will use this to help improve this process and make sure that it impacts your sites as little as possible.
thanks
- msndude (msd)
That seems to confirm that at least some of this may be bogus traffic. How much is bogus? We have no easy way to know…
That should be very concerning to any webmaster who values the credibility of their log data. If nothing else, it puts thousands of historical search.live.com referrals from the past few months into question.
It concerns me that:
- Microsoft does not seem to care about compromising the validity of log files for websites on a global scale.
- Microsoft is making it look like websites are getting human traffic from search.live.com when it may in fact be a Microsoft bot. I’ve heard the phrase “fake it until you make it”, but that should not be a valid search engine market share tactic!
- There is no obvious and/or official way to correctly cleanse the corrupted log files of this bogus traffic. (Not to mention the time I’d need to spend doing it.)
- Microsoft has mostly been silent on this issue.
- The bot is ignoring the robots.txt rules, and does not identify itself as a bot.
I don’t follow MSN very much since they don’t send very much traffic, so I don’t really know where to look for these things, but I’ve found no official mention of this other than the WebmasterWorld post.
Frankly, this seems to me to be very unethical of Microsoft on a number of levels. For one thing, many website operators probably think they actually ARE getting an increase in traffic from search.live.com.
If you run a Website, this should concern you greatly. We need answers and solutions. Please comment if you have either.