Archive for the ‘Notes on site rating’ Category

Phony version of Ad Limiter

Thursday, April 9th, 2015

There is a phony version of Ad Limiter for Firefox being distributed. The only valid source for Ad Limiter for Firefox is “https://addons.mozilla.org/en-US/firefox/addon/ad-limiter/”. If your add-on came from some other source, it is probably fake. Our own sites, sitetruth.com and adlimiter.com, redirect you to that link if you’re using Firefox. (Google Chrome users are sent to the Google Chrome store. This attack does not seem to affect them.)

The current version of Ad Limiter is 2.0. Phony versions have strange version numbers, such as “1009.99.597”. If you’re using Ad Limiter on Firefox and the version isn’t 2.0, please remove it, then reinstall it directly from the Mozilla Add-Ons site using the link above.

If you have been affected in any way by this problem, please contact us at info@sitetruth.com. We would like to hear from you.

Software upgrade

Monday, March 30th, 2015

The SiteTruth system has just been upgraded to Python 3. There was a brief outage between 1145 and 1200 PDT. This is in preparation for changes which will result in more accurate business identification.

Improvements to ratings

Sunday, November 16th, 2014

We’re refreshing SiteTruth. Over the coming months, our results will become more accurate.

  • We now recognize and check Better Business Bureau seals from more BBB regions. Due to some changes at BBB, we were not recognizing some major markets, including New York City. That’s fixed. Sites with a BBB rating of B or better will get a green checkmark. Sites with a BBB rating of C or worse will get a red do-not-enter icon. Fake seals will negatively affect ratings.
  • SiteTruth now recognizes major Internet content delivery and network management systems, notably Cloudflare and Imperva. Previously, we sometimes identified sites served through those networks as being the networks themselves; many sites were mis-located at Cloudflare’s corporate headquarters in San Francisco. That’s fixed. More technical details are in our paper “Who am I talking to?”, and a minimal detection sketch follows this list.
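
That sketch, in Python, shows one way a Cloudflare front can be recognized. It is not SiteTruth’s production detector; it relies only on two response headers that Cloudflare edges normally set (“Server: cloudflare” and “CF-RAY”), and a real detector would combine several such signals.

    # Minimal sketch: guess whether a domain is fronted by Cloudflare by
    # inspecting HTTP response headers.  Cloudflare edges normally answer
    # with "Server: cloudflare" and add a "CF-RAY" header.
    import urllib.request

    def fronted_by_cloudflare(domain):
        """Return True if the response headers suggest a Cloudflare edge."""
        req = urllib.request.Request("https://" + domain, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                server = resp.headers.get("Server", "").lower()
                return server == "cloudflare" or "CF-RAY" in resp.headers
        except OSError:
            # Unreachable sites and HTTP errors count as "not detected" here.
            return False

    if __name__ == "__main__":
        print(fronted_by_cloudflare("www.cloudflare.com"))   # expected: True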

More to come.

Over 4,000,000 site ratings

Saturday, September 21st, 2013

The SiteTruth system has rated new web sites over 4,000,000 times.  Users with our Ad Limiter and Ad Rater plug-ins drive the rating system, so each of those ratings is for a site that someone saw in a search result or an ad. Once a site has been rated by request, the rating is good for a month.  This is a count of sites rated or re-rated, not requests to SiteTruth. (That number is much higher.)
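
To make the counting rule concrete, here is a small illustrative sketch (not SiteTruth’s internal code) of the one-month lifetime, assuming 30 days for “a month”: a cached rating is reused while it is fresh, and only an expired rating triggers a re-rating that adds to the count above.

    # Illustrative only: a cached rating is reused for 30 days; re-rating an
    # expired entry is what adds to the count of sites rated or re-rated.
    from datetime import datetime, timedelta

    RATING_LIFETIME = timedelta(days=30)   # "good for a month", assumed as 30 days

    def needs_rerating(rated_at, now=None):
        """True if a site's cached rating has expired and should be redone."""
        now = now or datetime.utcnow()
        return now - rated_at >= RATING_LIFETIME

    # A rating from six weeks ago is stale; one from last week is still good.
    print(needs_rerating(datetime.utcnow() - timedelta(weeks=6)))   # True
    print(needs_rerating(datetime.utcnow() - timedelta(weeks=1)))   # False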

Rating number 4,000,000 is for Hardisty’s Homewares in Santa Rosa, California.

The trouble with online advertising

Sunday, March 17th, 2013

“The best minds of my generation are thinking about how to make people click ads. That sucks.”
Jeff Hammerbacher, Facebook

Facebook Confirms 83 Million Fake Accounts

Thursday, August 2nd, 2012

Facebook now admits they have at least 83 million fake accounts. That’s where all those fake “Likes” come from.

This is why social signals won’t improve search. As we wrote last year, Social is bad for search, and search is bad for social. If it can be spammed, it will be spammed. Only hard data that advertisers can’t manipulate can clean up search; that is the SiteTruth approach.

Social is bad for search, and search is bad for social

Sunday, November 13th, 2011

In the last two years, the concept of “social” inputs to web search has been heavily promoted. We show that social inputs to search encourage spamming to the point that search quality degrades. These attempts to pollute search are filling the “social” world with junk. An entire ecosystem has come into being to assist with search engine social spamming. Fighting this ecosystem is possible, but not easy.

Full paper: “Social is bad for search, and search is bad for social”.

Report on Google AdSense ads

Sunday, April 10th, 2011

Report on AdSense advertising domains for the 60-day period beginning 2011-02-09.

Site ownership verified.

  Normal:            1055    9.3%
  Blocked:             16    0.1%
  Non Commercial:     242    2.1%
  TOTAL:             1313   11.6%

Site ownership identified but not verified.

  Normal:            2741   24.1%
  Blocked:              9    0.1%
  No Location:        506    4.5%
  Non Commercial:     229    2.0%
  TOTAL:             3485   30.7%

Not rated.

  No Website:         254    2.2%
  Non Commercial:    2194   19.3%
  TOTAL:             2448   21.6%

Site ownership unknown or questionable.

  Blocked:             67    0.6%
  Negative Info:        6    0.1%
  No Location:       4036   35.5%
  TOTAL:             4109   36.2%

Total advertised domains reported: 11355


SiteTruth collects data on Google AdSense ads, and measures the quality of the advertisers using SiteTruth’s usual methods. This data is collected by our Ad Rater browser plug-in, which rates ads as they appear. We use this data to monitor advertiser, not user, behavior. Above are the results for a recent 60-day period. There is an English-language bias to this data, as our plug-in is offered only in English.

We saw 11355 different domains promoted via AdSense ads. These are the domains linked to by the ads, not the domains on which the ads appeared.

12% of sites get our highest rating, down from 14% in 2008. 36% of the advertised domains fall into the “site ownership unknown or questionable” category; that percentage has held constant since we first did this analysis in 2008. 21% of sites are not rated. These are typically blogs, or sites where we see no commercial activity or ads; that is twice the fraction we saw three years ago.

Entries marked “No Website” reflect domains that no longer have live web sites. There’s a considerable amount of churn in AdSense web sites. Less than a third of the web sites we’ve ever seen referenced in an AdSense ad are still live today.

“Blocked” sites are those with “robots.txt” files which forbid us from examining their contents. These are quite rare.
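
As an illustration of how such a check can be done with Python’s standard library, here is a small sketch; the “SiteTruth” user-agent string is an assumption for the example, not necessarily what our crawler actually sends.

    # Sketch of a robots.txt check using only the standard library.
    from urllib.robotparser import RobotFileParser

    def crawl_allowed(domain, user_agent="SiteTruth"):
        """True unless the site's robots.txt forbids this user agent."""
        rp = RobotFileParser("http://" + domain + "/robots.txt")
        rp.read()   # fetch and parse; a missing robots.txt means "allowed"
        return rp.can_fetch(user_agent, "http://" + domain + "/")

    if __name__ == "__main__":
        print(crawl_allowed("example.com"))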

In summary, the nature of the Google AdSense customer base has not changed much in recent years.

Press coverage of web spam and Google

Tuesday, January 11th, 2011

Web search spam is now a public issue. Coverage has moved from Search Engine Watch to TechDirt to the New York Observer to the Atlantic to the New York Times. Google’s decline in search quality is now widely recognized. The popular press has difficulty pinpointing the problem, but the unhappiness of users comes through clearly, even when it is not precisely articulated.

We, of course, can fix this. That’s what SiteTruth is all about. Find the business behind the web site, use automated due diligence to rate that business based on hard information about it, and use that rating to move less legitimate businesses down in search results. We use information obtained from reliable sources such as corporate registrations, the U.S. Securities and Exchange Commission, Dun and Bradstreet, and the Better Business Bureau. Those sources are difficult to spam. That’s our patented technology.
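
As a rough, purely illustrative reading of that approach (not SiteTruth’s actual rating rules), the combination of signals might be sketched like this; the inputs stand for facts pulled from the hard sources named above.

    # Conceptual sketch only, not SiteTruth's code: combine hard business data
    # into a simple traffic-light style rating.
    def rate_business(ownership_verified, ownership_identified, bbb_grade=None):
        """Illustrative rating from verified business facts.

        ownership_verified   -- ownership confirmed via SEC filings, corporate
                                registrations, or similar hard sources
        ownership_identified -- a business was identified, but not verified
        bbb_grade            -- Better Business Bureau letter grade, if any
        """
        if bbb_grade and bbb_grade[0] >= "C":
            return "red"        # BBB grade in the C range or worse
        if ownership_verified:
            return "green"      # verified, with no negative information
        if ownership_identified:
            return "yellow"     # identified but not verified
        return "red"            # ownership unknown or questionable

    print(rate_business(ownership_verified=True, ownership_identified=True,
                        bbb_grade="A+"))   # "green"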

Our position is clear: if a web site is selling something, which includes sites with advertising, the legitimacy of the business behind the web site matters.

“Places” spam – the new front in the spam wars.

Monday, December 27th, 2010

On October 27, 2010, Google released a major change in their primary search engine. For the first time, results from the “Google Places” system, previously confined to map-related searches, were merged into Google’s main search results. Search results now contain more information about local businesses, and those search results appear prominently, near the top of web search results.

Spamming Google search results is easier and cheaper since the merger of Google Places results into web search. In only two months, effective techniques for spamming Google Places have come into wide use. Search quality as perceived by users is deteriorating. Industry sources are critical of Google’s inability to deal with the problem.

We have issued a white paper on this topic.

Full paper: “Places” spam – the new front in the spam wars.