Social is bad for search, and search is bad for social

November 13th, 2011

In the last two years, the concept of “social” inputs to web search has been heavily promoted. We show that social inputs to search encourage spamming to the point that search quality degrades. These attempts to pollute search are filling the “social” world with junk. An entire ecosystem has come into being to assist with search engine social spamming. Fighting this ecosystem is possible, but not easy.

Full paper: “Social is bad for search, and search is bad for social”.

Now available – new Firefox plug-in for improving search results

August 18th, 2011

We are releasing an initial version of SearchRater, our plug-in for Firefox. This plug-in adds SiteTruth ratings to search results for Google, Bing, Yahoo, Blekko, and DuckDuckGo.

Click here to go to the installation page. After installing the plug-in, try searching for some heavily spammed terms, such as “New York locksmith” or “discount drugs”.  We think you’ll like what you see.

(This is an experimental version of the plug-in. Please report any problems with comments here.)

Report on Google AdSense ads

April 10th, 2011

Report on AdSense advertising domains for the 60-day period beginning 2011-02-09.

Site ownership verified.

Normal:             1055   9.3%
Blocked:              16   0.1%
Non Commercial:      242   2.1%
TOTAL:              1313  11.6%

Site ownership identified but not verified.

Normal:             2741  24.1%
Blocked:               9   0.1%
No Location:         506   4.5%
Non Commercial:      229   2.0%
TOTAL:              3485  30.7%

Not rated.

No Website:          254   2.2%
Non Commercial:     2194  19.3%
TOTAL:              2448  21.6%

Site ownership unknown or questionable.

Blocked:              67   0.6%
Negative Info:         6   0.1%
No Location:        4036  35.5%
TOTAL:              4109  36.2%

Total advertised domains reported: 11355


SiteTruth collects data on Google AdSense ads, and measures the quality of the advertisers using SiteTruth’s usual methods. This data is collected by our AdRater browser plug-in, which rates ads as they appear. We use this data to monitor advertiser, not user, behavior.  Above are the results for a recent 60-day period. There is an English-language bias to this data, as our plug-in is offered only in English.

We saw 11355 different domains promoted via AdSense ads. These are the domains linked to by the ads, not the domain on which the ad appeared.

12% of sites get our highest rating, down from 14% in 2008. 36% of sites receive our “site unknown or questionable” rating. This percentage has held constant since we first did this analysis in 2008. 22% of sites are not rated. These are typically blogs, or sites where we don’t see commercial activity or ads. This is twice what we saw three years ago.
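The group percentages can be rechecked from the counts in the report. The following is a minimal sketch in Python; the dictionary layout and the function names are ours, chosen for illustration, and are not part of SiteTruth's software.

```python
# Counts copied from the report above. The group keys and this data
# layout are illustrative assumptions, not SiteTruth's own format.
REPORT = {
    "verified": {"Normal": 1055, "Blocked": 16, "Non Commercial": 242},
    "identified_not_verified": {
        "Normal": 2741, "Blocked": 9, "No Location": 506, "Non Commercial": 229,
    },
    "not_rated": {"No Website": 254, "Non Commercial": 2194},
    "unknown_or_questionable": {
        "Blocked": 67, "Negative Info": 6, "No Location": 4036,
    },
}


def group_totals(report):
    """Sum the per-category counts within each ownership group."""
    return {group: sum(rows.values()) for group, rows in report.items()}


def group_shares(report):
    """Return each group's share of all reported domains, in percent."""
    totals = group_totals(report)
    grand_total = sum(totals.values())
    return {
        group: round(100.0 * count / grand_total, 1)
        for group, count in totals.items()
    }


if __name__ == "__main__":
    for group, share in group_shares(REPORT).items():
        print(f"{group}: {share}%")
```

The group totals sum to the 11355 domains stated in the report, and the shares match the TOTAL lines of the four groups.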

Entries marked “No Website” reflect domains that no longer have live web sites. There’s a considerable amount of churn in AdSense web sites. Less than a third of the web sites we’ve ever seen referenced in an AdSense ad are still live today.

“Blocked” sites are those with “robots.txt” files which forbid us from examining their contents. These are quite rare.
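For readers unfamiliar with how a crawler decides that a site is “blocked”, here is a minimal sketch using Python's standard urllib.robotparser module. The user agent name "ExampleRater" and the helper is_blocked are hypothetical; this is not SiteTruth's crawler code.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# A robots.txt that forbids all crawlers from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A robots.txt with an empty Disallow rule, which permits everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    # "ExampleRater" is a made-up user agent name for illustration.
    print(is_blocked(BLOCK_ALL, "ExampleRater", "http://example.com/page"))
    print(is_blocked(ALLOW_ALL, "ExampleRater", "http://example.com/page"))
```

A site that serves the first file would be counted as “Blocked” by any crawler that honors robots.txt; a site that serves the second would not.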

In summary, the nature of the Google AdSense customer base has not changed much in recent years.

Outage – March 8, 2011

March 8th, 2011

Sitetruth.com was inaccessible due to a denial of service attack on our hosting provider’s routing servers from 0624 PST to 1155 PST today. The system is now functioning normally.

Google makes “real time results” more prominent. Their spam problem gets worse.

February 11th, 2011


Google recently made “real time results” more prominent in search results. So what are the “real time results”? When we search for “Google search spam”, we are shown, as a news item, an article from someone who just discovered how to use the “-” flag in Google search. We then get Twitter spam about that news item.

This is yet another failure of “crowdsourcing”. It’s too easy to spam.

SiteTruth search plug-in

January 19th, 2011

We now offer a Firefox/Internet Explorer search plug-in for SiteTruth. This plug-in makes SiteTruth search easily available in your browser toolbar.

Install the SiteTruth search plug-in.

Press coverage of web spam and Google

January 11th, 2011

Web search spam is now a public issue. Coverage has moved from Search Engine Watch to TechDirt to the New York Observer to the Atlantic to the New York Times. Google’s decline in search quality is now widely recognized. The popular press has difficulty pinpointing the problem, but the unhappiness of users, even when not well articulated, comes through clearly.

We, of course, can fix this. That’s what SiteTruth is all about. Find the business behind the web site, use automated due diligence to rate that business based on hard information about the business, and use that rating to move the less legitimate businesses down in search results. We use information obtained from reliable sources such as corporate registrations, the U.S. Securities and Exchange Commission, Dun and Bradstreet, and the Better Business Bureau. Those sources are difficult to spam. That’s our patented technology.
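The last step, using a rating to move less legitimate businesses down in search results, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the tier names, their ordering, and the rerank function are hypothetical and do not describe SiteTruth's patented method.

```python
from dataclasses import dataclass

# Hypothetical rating tiers, best first. These labels are illustrative
# and are not SiteTruth's actual rating scale.
RATING_ORDER = {
    "verified": 0,
    "identified": 1,
    "not_rated": 2,
    "questionable": 3,
}


@dataclass(frozen=True)
class Result:
    domain: str
    rating: str  # one of the keys of RATING_ORDER


def rerank(results):
    """Order results by rating tier, best tier first.

    sorted() is stable, so results within the same tier keep the
    order the search engine originally gave them.
    """
    return sorted(results, key=lambda r: RATING_ORDER[r.rating])


if __name__ == "__main__":
    original = [
        Result("a.example", "questionable"),
        Result("b.example", "verified"),
        Result("c.example", "identified"),
        Result("d.example", "verified"),
    ]
    for result in rerank(original):
        print(result.domain, result.rating)
```

Using a stable sort means the rating only breaks the search engine's ordering across tiers, not within a tier, so relevance ranking is preserved among equally rated businesses.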

Our position is clear: if a web site is selling something, which includes sites with advertising, the legitimacy of the business behind the web site matters.

“Places” spam – the new front in the spam wars.

December 27th, 2010

On October 27, 2010, Google released a major change in their primary search engine. For the first time, results from the “Google Places” system, previously confined to map-related searches, were merged into Google’s main search results. Search results now contain more information about local businesses, and those search results appear prominently, near the top of web search results.

Spamming Google search results is easier and cheaper since the merger of Google Places results into web search. In only two months, effective techniques for spamming Google Places have come into wide use. Search quality as perceived by users is deteriorating. Industry sources are critical of Google’s inability to deal with the problem.

We have issued a white paper on this topic: “Places” spam – the new front in the spam wars.

SiteTruth maps of business locations

September 12th, 2010

When SiteTruth locates a business, it now maps its location. This feature is experimental, and feedback would be appreciated.

We’ve also added one-click access to Securities and Exchange Commission filings for a business.

SiteTruth now accessing U.S. Securities and Exchange Commission data

September 3rd, 2010

SiteTruth is now using data from the U.S. Securities and Exchange Commission to help identify companies. We are now able to provide a valid business address for larger companies that may not be listed in “Yellow Pages” type directories.  We now provide some basic financial data for some companies. This gives web users an idea of the size of the company they’re dealing with.