Teh Xiggeh


Every Child Matters

Posted in Privacy & Rights by Xiggeh on July 19, 2006

What a nice, heart-warming sound bite from the UK government. Every child matters. But what exactly does it mean when the government tries to put it into practice? A lot of money, a lot of child surveillance, and not a lot of results.

The UK government keep a whole range of databases holding information about children and young people. This information is shared between health, law enforcement, youth justice, social care and education agencies. Information about the children, and about their families, can be shared without the consent of the child or the parents. Even information about other children, collected from a child, can be shared without the consent of the child it refers to.

Why would the government want to keep such data? To identify children ‘at risk’. In principle that’s a good thing. When I was 16 a friend killed himself while drunk and on drugs. If that could have been prevented by someone recognising he was at risk, then great. I don’t think this is the right way to go about it, however.

And hold on, the government have redefined the meaning of ‘at risk’ in the ‘Every Child Matters’ green paper. Instead of meaning at risk of significant harm from abuse or neglect, it now means at risk of social exclusion, of missing out on services or education, or of committing crime. So because little Jimmy chose not to take his GCSEs, or his mum forgot to apply for 30 pence a week Children’s Socks Tax Credits, Jimmy is ‘at risk’. If Jimmy got caught stealing sweets at 12, he’s flagged for life. This is starting to sound a little crazy, but it gets worse.

The data held about children is also used for a ‘predictive agenda’: identifying children from an early age who agencies believe may commit a crime later in life. The criteria used to make this judgement include poverty, getting bored easily, being a victim of bullying, truanting, having a parent with mental health problems and living in a deprived area. So if Jimmy’s dad worked for Rover, or the family lives in a lower-income postcode, or Jimmy gets bored a lot at school or is repeatedly bullied for wearing the wrong jeans, he is considered likely to commit crime and is monitored. That, to me, seems crazy. Jimmy most likely gets bored at school because he’s very intelligent and the curriculum doesn’t stimulate his mind. His dad doesn’t have much money because Rover laid him off. And so on.

Who else shares this data? Well, Connexions, the support service for children and teenagers, logs everything from the moment a child walks in the door. That data is shared with government agencies and ties into the other databases mentioned at the beginning of this post, with or without the child’s consent. If, without thinking, Jimmy said “I think my dad is an alcoholic” because he drinks beer on Saturdays… well, I’ll let you imagine the rest of that scenario.

And just to top it all off, these databases will also be used to monitor youngsters who smoke regularly, youngsters who drink alcohol, oh, and children not eating their five portions of fruit and veg a day.

In a day and age when children aren’t allowed to play conkers, British Bulldog or with paper aeroplanes because they’re too dangerous, teachers aren’t allowed to put suncream on students because they’re presumably all child molesters anyway, and children have to wear jumpers in 36°C heat in case their pasty-white skin goes a funny colour, well, I think the country’s gone mad.

Can u fix my PC pls?lolz!1

Posted in Funny Stories by Xiggeh on July 17, 2006

To cut a long story short, I drove my sister up to London today. She suddenly realised she couldn’t afford the petrol, and I don’t get paid until tomorrow, so I had to pretend I’d forgotten my wallet at the petrol station. We got to London and then came the question: oh, while you’re here, could you set up my new computer?

My normal rule is that I don’t help with computer questions. If it’s a friend, I charge a small fee; if it’s family, it’s free. A new computer isn’t hard to set up, so I said sure. I should’ve learnt my lesson from clients by now, because what she actually meant was:

  • Copy all the files off her old 486
  • Copy all the files, e-mails and bookmarks off her old iMac
  • Remove all the old hardware, cables, etc. (1.5 hrs, it was a mess!)
  • Set up a second-hand Dell machine
  • Remove all the adware and keyloggers
  • Install MS Office
  • Install a dial-up modem
  • Rewire the phones in the house
  • Pick up the broadband kit from the Post Office
  • Uninstall the dial-up modem
  • Install the broadband kit
  • Rewire the phones in the house again
  • Set up a network
  • Install a printer
  • Test the printer
  • Order new cartridges

So the whole thing took 13 hours, including the driving. I didn’t get a cool drink, I got half a sandwich for lunch, and temperatures were approaching the 40s. I’m thrashed.

But my mate’s just cooked me dinner, I’m about to have a cold shower and several very cold lagers and retire for the day. Looks like I’ll be going back to work tomorrow for a holiday.

At least I don’t have to meet up with the family again until Christmas…

Microsoft vs. Google

Posted in Search Engines by Xiggeh on July 14, 2006

According to an article at El Reg, Google have finally woken the beast by threatening to encroach on Microsoft’s territory. Kevin Turner, Microsoft’s COO, is quoted as saying:

“Enterprise search is our business, it’s our house and Google is not going to take that business. Those people are not going to be allowed to take food off our plate, because that is what they are intending to do.”

Not allowed? Them’s fighting words, partner.

But hold on, Microsoft doesn’t own the enterprise search business. In fact, they barely dent the market; there are clear leaders, and Microsoft isn’t one of them. Could Microsoft be using Google as an excuse to become a serious contender? Or is it all bluster?

Google Error Message

Posted in Search Engines by Xiggeh on July 14, 2006

There’s been quite a lot of talk about an error message discovered on a Google repository server, which has been confirmed as real by Matt Cutts. Here’s the error message in question:

pacemaker-alarm-delay-in-ms-overall-sum 2341989
pacemaker-alarm-delay-in-ms-total-count 7776761
cpu-utilization 1.28
cpu-speed 2800000000
timedout-queries_total 14227
num-docinfo_total 10680907
avg-latency-ms_total 3545152552
num-docinfo_total 10680907
num-docinfo-disk_total 2200918
queries_total 1229799558
e_supplemental=150000 --pagerank_cutoff_decrease_per_round=100 --pagerank_cutoff_increase_per_round=500 --parents=12,13,14,15,16,17,18,19,20,21,22,23 --pass_country_to_leaves --phil_max_doc_activation=0.5 --port_base=32311 --production --rewrite_noncompositional_compounds --rpc_resolve_unreachable_servers --scale_prvec4_to_prvec --sections_to_retrieve=body+url+compactanchors --servlets=ascorer --supplemental_tier_section=body+url+compactanchors --threaded_logging --nouse_compressed_urls --use_domain_match --nouse_experimental_indyrank --use_experimental_spamscore --use_gwd --use_query_classifier --use_spamscore --using_borg”

How revealing is that! Unfortunately Google say they’ve put procedures in place to stop it happening again, but I’m determined to enjoy it while it lasts.

Here’s my take on the whole deal:

pacemaker-alarm: I don’t know what this is, but it sounds like a system Google have in place to keep everything up and running. It’s an alarm, so it has a trigger. And it’s a pacemaker, so it may be triggered to prevent request timeouts. If these numbers are right, the average alarm delay works out to 0.301ms, and the alarm fires on roughly 0.6% of queries.

cpu-utilization: I assume this is a number in the same format as your standard *nix load average. It could be averaged over 1 minute, 5 minutes, 15 minutes, or some interval Google considers important, but whatever the timescale, there are 1.28 processes waiting in the run queue on average.

cpu-speed: 2800000000 works out to be 2.8GHz – could this be a Pentium 4 with the 533MHz FSB? Or maybe Intel Xeon 2.8GHz. It’s not likely to be AMD (yet).

timedout-queries: Well, it looks like queries can time out, so maybe that’s not what the pacemaker alarm is for. On this server 14,227 queries have timed out, which works out to about 0.0012% of total queries.

num-docinfo_total: Most likely the total number of documents stored on this particular box. If we do some number crunching with other values in the message, the average document size comes out at around 4.85KiB. That makes sense to me, and I assume Google are still compressing documents in the repository.

avg-latency-ms_total: Could this be the average latency per query? In that case it works out to 2.88ms per query.

queries_total: Fucking hell! 1.2 billion queries, presumably on that box alone.
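
For anyone who wants to check the number crunching above, here’s a quick Python sketch that reproduces those figures from the raw counters in the dump. The variable names are mine, not Google’s, and the interpretation of each counter is still just a guess:

# Counters copied straight from the error dump above; the names are my own.
alarm_delay_ms_sum = 2341989      # pacemaker-alarm-delay-in-ms-overall-sum
alarm_count        = 7776761      # pacemaker-alarm-delay-in-ms-total-count
timedout_queries   = 14227        # timedout-queries_total
latency_ms_sum     = 3545152552   # avg-latency-ms_total
queries_total      = 1229799558   # queries_total

# Average delay per pacemaker alarm, and how often the alarm fires per query.
print("avg alarm delay: %.3f ms" % (alarm_delay_ms_sum / alarm_count))     # ~0.301 ms
print("alarm rate:      %.2f %%" % (100.0 * alarm_count / queries_total))  # ~0.63 %

# Timeouts and latency as a fraction of all queries served by this box.
print("timeout rate:    %.4f %%" % (100.0 * timedout_queries / queries_total))  # ~0.0012 %
print("avg latency:     %.2f ms" % (latency_ms_sum / queries_total))            # ~2.88 ms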

Now all that stuff above seems to be debug values returned from the server. The next lot seems to be settings on the box:

pagerank_cutoff: Increase and decrease per round? Now we know Google needs to go through 40-60 iterations of the PageRank algorithm to get vaguely accurate figures, and documents would need to be pulled from the repository (which is what this box does). This could relate to the maximum increase or decrease in PR per iteration, which would certainly prevent drastic variations between iterations. What’s interesting are the values – “100” and “500”. Now, we see PR values of 1-10 (which we know is just a fluffy number that’s almost meaningless), while Google’s patents and research papers refer to the Internet having a total PR of 1. So could these values be percentages?
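
Just to illustrate what a per-round cutoff could look like (this is purely my guess at the idea, not anything Google has published), here’s a toy Python power-iteration PageRank where each page’s score is only allowed to move by a capped amount per round. The graph, the damping factor and the cutoff values are all made up for the example:

# Toy sketch only: power-iteration PageRank with a cap on how far each
# page's score may move per round. All values here are invented; I have no
# idea what units the real pagerank_cutoff_* flags use.
DAMPING = 0.85
CUTOFF_INCREASE = 0.05   # max upward move per round (made-up)
CUTOFF_DECREASE = 0.01   # max downward move per round (made-up)

links = {                # tiny made-up web: page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

pr = {page: 1.0 / len(links) for page in links}   # start with a uniform score

for _ in range(60):      # 40-60 rounds, as mentioned above
    new_pr = {}
    for page in links:
        incoming = sum(pr[src] / len(outs)
                       for src, outs in links.items() if page in outs)
        target = (1 - DAMPING) / len(links) + DAMPING * incoming
        # Clamp the change so no page swings too far in a single round.
        delta = max(-CUTOFF_DECREASE, min(CUTOFF_INCREASE, target - pr[page]))
        new_pr[page] = pr[page] + delta
    pr = new_pr

print(pr)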

parents: Every good server needs a parent, and every good system needs a backup. It appears this box has 12 sequentially-numbered parent servers, probably to assign workloads and retrieve documents.

port_base: Well I tried a port scan (sorry Google), but I really didn’t expect to find anything. Could servers in the GooglePlexi communicate in the 32000 port range?

production: Obviously, as we’re using it. It looks like Google can switch servers between testing and production at the click of a mouse.

rewrite_noncompositional_compounds: Non-compositional compounds (NCCs) are phrases such as “hot dog” or “cold turkey” that make absolutely no sense when split up. When a computer is expected to understand the meaning of a document and find related key terms, NCCs need to be extracted and treated differently.
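
To make that a bit more concrete, here’s a rough sketch of what rewriting text into compound-aware tokens might look like. The compound list and the underscore-joined convention are my own invention, not Google’s actual behaviour:

# Rough sketch only: fold known non-compositional compounds into single
# tokens so they get indexed and scored as one term. The compound list and
# the joined-token convention are invented for this example.
NON_COMPOSITIONAL = {("hot", "dog"), ("cold", "turkey")}

def rewrite_compounds(tokens):
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in NON_COMPOSITIONAL:
            out.append("_".join(pair))   # treat "hot dog" as one term
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

print(rewrite_compounds("i ate a hot dog and quit cold turkey".split()))
# ['i', 'ate', 'a', 'hot_dog', 'and', 'quit', 'cold_turkey']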

scale_prvec4_to_prvec: I’m still working on the math on this one. Vectors scare me and I spent too much time on IRC at school.

sections_to_retrieve: Appears to be a list of document elements to return, either when a user requests a cached document, or when the indexer requests documents for scoring.

supplemental_tier_section: As above, but for supplemental documents?

servlets=ascorer: I don’t know what the ‘ascorer’ is, but it’s live. What begins with ‘a’ that Google would want to score? Hah, what doesn’t Google want to score 😉

threaded_logging: That’s some serious logging going on, although I’d probably do the same.

nouse_experimental_indyrank: I can’t think what “IndyRank” might be, what it would rank or how. I want to know though 🙂

use_experimental_spamscore: No surprises there. The “bad data push” (uh huh) caused a helluva ruckus with spam in the results, and the problem is slowly going away. This appears to be our knight in shining armour.

use_gwd: Google Web Directory?

use_query_classifier: I believe this is related to Google’s OneBox results (e.g. health, stocks, companies, and so forth).

use_spamscore: Obviously this is the older method of scoring spammer pages. Didn’t work so well, did it?

phil_max_doc_activation: Maximum document activation? Not sure on this one. I looked up some Phils at Google. We have a Phil Winterbottom, who’s published some papers on Plan 9 from Bell Labs – a distributed system built from terminals, CPU servers and file servers – but that doesn’t fit.

And now that I’ve finished writing this, I’ve found another good interpretation of this error message over at Stuntdubl. It’s interesting that he’s taken the values to relate to the cached document on the server, whereas I assumed they related to the server itself.

What are your thoughts? Am I tapping away at the vague truth, or am I on the wrong track completely?

And can I have a job at Google Ireland please? 🙂

Unplug the keyboard …

Posted in Funny Stories by Xiggeh on July 14, 2006

A client phoned up a few weeks ago because she was having problems with her keyboard: apparently nothing she pressed had any effect. I asked her if it was plugged in properly, to which she replied (rather annoyed), “Yes yes yes, of course”. I thought the best way to check was to ask her to unplug the keyboard and plug it back in firmly.

“It’s quite tough,” she said. “I can’t unplug the keyboard.”

“Pull a bit harder,” I said. “These things can get a bit stuck over time.”

“Oh I’ve managed it. Where do I put all the wires?”

When I sent an engineer over that afternoon, it turned out she had pulled the wire out of the back of the keyboard instead of unplugging it from the PC. That’s a new one on me.
