Teh Xiggeh

Google Error Message

Posted in Search Engines by Xiggeh on July 14, 2006

There’s been quite a lot of talk about an error message discovered on a Google repository server, which has been confirmed as real by Matt Cutts. Here’s the error message in question:

pacemaker-alarm-delay-in-ms-overall-sum 2341989
pacemaker-alarm-delay-in-ms-total-count 7776761
cpu-utilization 1.28
cpu-speed 2800000000
timedout-queries_total 14227
num-docinfo_total 10680907
avg-latency-ms_total 3545152552
num-docinfo_total 10680907
num-docinfo-disk_total 2200918
queries_total 1229799558
e_supplemental=150000 --pagerank_cutoff_decrease_per_round=100 --pagerank_cutoff_increase_per_round=500 --parents=12,13,14,15,16,17,18,19,20,21,22,23 --pass_country_to_leaves --phil_max_doc_activation=0.5 --port_base=32311 --production --rewrite_noncompositional_compounds --rpc_resolve_unreachable_servers --scale_prvec4_to_prvec --sections_to_retrieve=body+url+compactanchors --servlets=ascorer --supplemental_tier_section=body+url+compactanchors --threaded_logging --nouse_compressed_urls --use_domain_match --nouse_experimental_indyrank --use_experimental_spamscore --use_gwd --use_query_classifier --use_spamscore --using_borg
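If you want to poke at the message yourself, here’s a minimal Python sketch that splits it into the stats (the “name value” lines) and the settings (the --flag list). It assumes the flags follow the convention of Google’s gflags library, where --name switches a boolean on and --noname switches it off. The helper names parse_stats and parse_flags are my own.

```python
def parse_stats(text):
    """Parse 'name value' lines into a dict of floats (a repeated name keeps its last value)."""
    stats = {}
    for line in text.strip().splitlines():
        name, value = line.split()
        stats[name] = float(value)
    return stats


def parse_flags(text):
    """Parse '--name=value', '--name' and '--noname' flags into a dict.

    Assumes the gflags-style convention: a bare '--name' sets a boolean to True
    and '--noname' sets it to False. Without the flag definitions we can't tell
    a real 'no' prefix from a flag whose name just happens to start with 'no'.
    """
    flags = {}
    for token in text.split():
        if not token.startswith("--"):
            continue  # skip fragments such as the truncated first item
        body = token[2:]
        if "=" in body:
            name, value = body.split("=", 1)
            flags[name] = value          # keep values as strings
        elif body.startswith("no"):
            flags[body[2:]] = False      # '--nouse_x' -> use_x = False
        else:
            flags[body] = True           # '--use_x'   -> use_x = True
    return flags


settings = parse_flags("e_supplemental=150000 --port_base=32311 --production "
                       "--nouse_experimental_indyrank --use_spamscore")
print(settings)
```

By that reading, --nouse_experimental_indyrank and --nouse_compressed_urls mean those two features are switched off on this box. The first item (e_supplemental=150000) is cut off at the start, so the sketch skips it.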

How revealing is that! Unfortunately Google say they’ve put procedures in place to stop it happening again, but I’m determined to enjoy it while it lasts.

Here’s my take on the whole deal:

pacemaker-alarm: I don’t know what this is, but it sounds like a system Google have in place to keep everything up and running. It’s an alarm, so it has a trigger. And it’s a pacemaker, so it may be triggered to prevent request timeouts. If these numbers are right, the average alarm delay is 0.301ms (the overall sum divided by the total count), and if each alarm corresponds to a single query, it fires on about 0.6% of queries.

cpu-utilization: I assume this is a number in the same format as your standard *nix load average. It could be averaged over 1 minute, 5 minutes, 15 minutes, or some other window Google considers important, but whatever the timescale, it means that on average 1.28 processes were running or waiting for the CPU.
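If cpu-utilization really is a load figure, you can see the same kind of number on any Unix box. In Python, os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only). Dividing by the CPU count is a common way to judge whether a figure like 1.28 means the machine is busy; whether Google’s number is per-CPU or per-box is anyone’s guess.

```python
import os

# os.getloadavg() is Unix-only. It returns the 1-, 5- and 15-minute load
# averages: the average number of processes running or waiting to run
# (Linux also counts processes in uninterruptible sleep).
one, five, fifteen = os.getloadavg()
cpus = os.cpu_count() or 1  # cpu_count() can return None

print(f"load: {one:.2f} {five:.2f} {fifteen:.2f} on {cpus} CPU(s)")
print(f"1-minute load per CPU: {one / cpus:.2f}")
```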

cpu-speed: 2800000000 works out to be 2.8GHz – could this be a Pentium 4 with the 533MHz FSB? Or maybe Intel Xeon 2.8GHz. It’s not likely to be AMD (yet).

timedout-queries: Well, it looks like queries can time out, so maybe that’s not what the pacemaker alarm is for. On this server 14,227 queries have timed out, which works out to about 0.0012% of all queries.

num-docinfo_total: Most likely the total number of documents stored on this particular box. If we divide it by num-docinfo-disk_total we get 4.85, so it looks like the average document size is 4.85KiB. Makes sense to me, and I assume Google are still compressing documents in the repository.

avg-latency-ms_total: Despite the name, could this be the total latency across all queries? If so, dividing by queries_total gives an average of 2.88ms per query.

queries_total: Fucking hell! 1.2 billion queries, presumably on that box alone.
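Most of the figures in my notes above are just ratios of the raw counters, so here’s a short Python sketch to reproduce them. The interpretations baked into it are guesses: it assumes each pacemaker alarm belongs to one query, that avg-latency-ms_total is a running sum rather than an average, and that the docinfo ratio means anything at all (the counter names don’t give it a unit).

```python
# Raw counters copied from the leaked message.
counters = {
    "pacemaker-alarm-delay-in-ms-overall-sum": 2341989,
    "pacemaker-alarm-delay-in-ms-total-count": 7776761,
    "timedout-queries_total": 14227,
    "num-docinfo_total": 10680907,
    "num-docinfo-disk_total": 2200918,
    "avg-latency-ms_total": 3545152552,
    "queries_total": 1229799558,
}


def derived_figures(c):
    """Ratios behind the figures quoted in the post (interpretations are guesses)."""
    queries = c["queries_total"]
    alarms = c["pacemaker-alarm-delay-in-ms-total-count"]
    return {
        # average delay per pacemaker alarm, in ms
        "alarm_delay_ms": c["pacemaker-alarm-delay-in-ms-overall-sum"] / alarms,
        # share of queries that raised an alarm (assumes at most one per query)
        "alarm_rate": alarms / queries,
        # share of queries that timed out
        "timeout_rate": c["timedout-queries_total"] / queries,
        # ratio read in the post as an average document size in KiB
        "docinfo_ratio": c["num-docinfo_total"] / c["num-docinfo-disk_total"],
        # treats avg-latency-ms_total as a running sum of per-query latency
        "latency_ms": c["avg-latency-ms_total"] / queries,
    }


figures = derived_figures(counters)
print(f"alarm delay:   {figures['alarm_delay_ms']:.3f} ms")
print(f"alarm rate:    {figures['alarm_rate']:.2%}")
print(f"timeout rate:  {figures['timeout_rate']:.4%}")
print(f"docinfo ratio: {figures['docinfo_ratio']:.2f}")
print(f"avg latency:   {figures['latency_ms']:.2f} ms")
```

Run as-is, it prints 0.301 ms, 0.63%, 0.0012%, 4.85 and 2.88 ms, which match the numbers quoted in the post.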

Now, all that stuff above seems to be debug values returned from the server. The next lot seems to be settings on the box:

pagerank_cutoff: Increase and decrease per round? Now, we know Google needs to go through 40-60 iterations of the PageRank algorithm to get vaguely accurate figures, and documents would need to be pulled from the repository (which is what this box does). These flags could set the maximum increase or decrease in PR per iteration, which would certainly prevent drastic swings between iterations. What’s interesting are the values, “100” and “500”. The PR values we see run from 0 to 10 (which we know is just a fluffy number that’s almost meaningless), but Google patents and research papers refer to the Internet having a total PR of 1. So could these values be percentages?
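For the curious, here’s what “rounds” of PageRank look like: a minimal power-iteration sketch in Python using the normalised formulation, where every page’s score is a share of a total of 1. The 0.85 damping factor is the value suggested in the original PageRank paper and is an assumption here; nothing in the message says what Google actually uses, and the sketch makes no attempt to model the pagerank_cutoff flags.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_rounds=200):
    """Power-iteration PageRank; links maps each page to the pages it links to.

    Scores are normalised so they sum to 1 after every round.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for rounds in range(1, max_rounds + 1):
        # every page gets the same "random jump" share
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # split this page's rank evenly across its outlinks
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling page with no outlinks: spread its rank over all pages
                for t in pages:
                    new[t] += damping * rank[page] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break  # converged
    return rank, rounds


rank, rounds = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
print(rounds, {p: round(r, 3) for p, r in rank.items()})
```

On this three-page toy graph, page “a” ends up with the most PR because both other pages link to it, and the scores still add up to 1.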

parents: Every good server needs a parent, and every good system needs a backup. It appears this box has 12 sequentially-numbered parent servers, probably to assign workloads and retrieve documents.
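A quick sanity check on that parents value in Python: split on commas and see whether the IDs really are consecutive. It reads nothing into what the IDs actually refer to.

```python
parents_flag = "12,13,14,15,16,17,18,19,20,21,22,23"  # value of --parents

parent_ids = [int(p) for p in parents_flag.split(",")]
# the IDs are consecutive if the list equals a range starting at the first ID
consecutive = parent_ids == list(range(parent_ids[0], parent_ids[0] + len(parent_ids)))
print(len(parent_ids), consecutive)  # 12 True
```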

port_base: Well I tried a port scan (sorry Google), but I really didn’t expect to find anything. Could servers in the GooglePlexi communicate in the 32000 port range?

production: Obviously, as we’re using it. It looks like Google can switch servers between testing and production at the click of a mouse.

rewrite_noncompositional_compounds: Non-Compositional Compounds (NCC) are phrases such as “hot dog” or “cold turkey” that make absolutely no sense when split up. When a computer is expected to understand the meaning of a document and find related key terms, NCCs need to be extracted and treated differently.
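Nothing in the message says how Google rewrites these, but one simple approach is to glue a known compound into a single token before indexing, so “hot dog” never gets treated as “hot” plus “dog”. Here’s a minimal Python sketch of that idea. The phrase list and the join_compounds helper are hypothetical, and a real system would have to learn its compounds from data rather than hard-code two of them.

```python
# Hypothetical phrase list, hard-coded for illustration only.
NON_COMPOSITIONAL = {("hot", "dog"), ("cold", "turkey")}


def join_compounds(tokens, compounds=NON_COMPOSITIONAL):
    """Merge known two-word non-compositional compounds into single tokens."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in compounds:
            out.append("_".join(pair))  # e.g. "hot dog" -> "hot_dog"
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


print(join_compounds("he quit smoking cold turkey and ate a hot dog".split()))
```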

scale_prvec4_to_prvec: I’m still working on the math on this one. Vectors scare me and I spent too much time on IRC at school.

sections_to_retrieve: Appears to be a list of document elements to return, either when a user requests a cached document, or when the indexer requests documents for scoring.

supplemental_tier_section: As above, but for supplemental documents?
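Assuming the + in body+url+compactanchors simply separates section names, picking those sections out of a stored document might look something like this Python sketch. The document layout and the select_sections helper are made up for illustration; I have no idea how the repository actually stores pages.

```python
def select_sections(document, spec):
    """Return only the sections named in a '+'-separated spec like 'body+url'."""
    wanted = spec.split("+")
    # keep the order given in the spec and ignore names the document lacks
    return {name: document[name] for name in wanted if name in document}


# Hypothetical stored document, split into named sections.
doc = {
    "url": "http://www.example.com/",
    "title": "Example Domain",
    "body": "This domain is for use in illustrative examples.",
    "compactanchors": ["example", "example domain"],
}
print(select_sections(doc, "body+url+compactanchors"))
```

With that spec, the title is dropped and only the body, URL and compact anchors come back.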

servlets=ascorer: I don’t know what the ‘ascorer’ is, but it’s live. What begins with ‘a’ that Google would want to score? Hah, what doesn’t Google want to score 😉

threaded_logging: That’s some serious logging going on, although I’d probably do the same.

nouse_experimental_indyrank: I can’t think what “IndyRank” might be, what it would rank or how. I want to know though 🙂

use_experimental_spamscore: No surprises there. The “bad data push” (uh huh) caused a helluva ruckus with spam in the results, and the problem is slowly going away. This appears to be our knight in shining armour.

use_gwd: Google Web Directory?

use_query_classifier: I believe this is related to Google’s OneBox results (e.g. health, stocks, companies, and so forth)

use_spamscore: Obviously this is the older method of identifying spam pages. Didn’t work so great now, did it?

phil_max_doc_activation: Maximum document activation? Not sure on this one. I looked up some Phils at Google. We have a Phil Winterbottom who’s published some papers on Plan 9 from Bell Labs, a distributed system built from terminals, CPU servers and file servers, but that doesn’t fit.

Now that I’ve finished writing this, I’ve found another good interpretation of this error message over at Stuntdubl. It’s interesting that he’s taken the values to relate to the cached document on the server, whereas I assumed they were values related to the server itself.

What are your thoughts? Am I tapping away at the vague truth, or am I on the wrong track completely?

And can I have a job at Google Ireland please? 🙂


13 Responses to 'Google Error Message'


  1. Stuart said,

    I’ve covered my own theories on the mysterious ‘Indyrank’ on my site here:
    No idea how close to the mark I am though 😀

  2. Xiggeh said,

    I think you mean http://www.modernlifeisrubbish.co.uk/google-indyrank-theories.asp

  3. Stuart said,

    I do. Certification of my buffoonery, were it needed.

  4. […] Detlev’s explanation of the error is by far the best I’ve seen, and I would guess that he has came closest to correct of what the error actually meant. Another pretty good guestimation of the errors is available from Teh Xiggeh. […]

  5. Had I seen this page, I wouldn’t have felt the need to spend the time to write up my own blathering article about this. By the way, it seems to me that vectors are often used in machine learning contexts in order to be able to categorize a certain parameter space with simpler boundaries, and possibly to separate different categories in cases where fewer dimensions wouldn’t cut it.

  6. xiggeh said,

    Wesley – that’s exactly why vectors scare me. Categories are one thing, we also know Google remember what you were doing X years ago with domain ownership, Google accounts and pages crawled. Imagine what else we know, and what we don’t know, and that’s a lot of vectors! Multi-dimensional arrays confused me at school for a while, I know I’m going to struggle with this one.

  7. […] There’s been plenty of speculation posted to the blogosphere on the recently discovered cryptic Google error message; my favorites being from Wesley Tanaka and from Teh Xiggeh. […]

  8. […] We noticed the “using_borg” string appear in the Google error message last week, and now the phrase has come back again through another Google leak. To quote Garett Rogers at ZDNet: “When checking out Google’s impressive second quarter on Google Finance today, I stumbled across something that leads me to believe they are testing “version 2” of Google Finance” […]

  9. […] [+] Understanding Search Engine Technology [+] Google Error Message […]

  10. […] There’s been plenty of speculation posted to the blogosphere on the recently discovered cryptic Google error message; my favorites being from Wesley Tanaka and from Teh Xiggeh. […]

  11. DougieFresh said,

    IndyRank? Hmmm…..Maybe it’s the speed at which the page downloads?…….Or its placement in the final ranking?

  12. Intimately, the article is in reality the greatest topic on the best registry cleaner in 2009. I agree with your conclusions and will thirstily look forward to your future updates. Saying thanks will not just be enough, for the wonderful clarity in your writing. I will instantly grab your rss feed to stay abreast of any updates.

  13. The racing games are more popular among the children. Aside from
    that, Look at the bright side, one does not have to waste time at home doing nothing.

    The promise of desktops and video gaming as tutors was clearly recognized in the eighties
    when there was a national push to have laptops in to the classrooms.
