07 March 2007

Gargoyle - South African Blog Search

Blogging off the back of Mnr. Stopforth. And trying to add value and not rehash - practice what I preach yeah? Introducing Gargoyle. A South African / African blog search engine. Very interesting.

Here's my 10c (smallest coin we've got I gather). Nice idea - especially when you consider the volume out there. There's just too much - both competitively and in terms of finding what you want. I've always been an advocate of an African conglomeration of content - both for our Social Media Ranking Tools, our sites and our bloggers.


I'm honoured to be included (Cowboys, All Scrubbed Up & Paki's Corner). And wonder how that happened? Someone has just GOT to write a little bot which hops between links of SA blogs when it finds a post containing suspiciously South African keywords. Like...

* South Africa. Duh
* Kief
* Doos
* Jeek (can ANYONE else actually be using this?)
* Biltong
* Danny K (ye gods)
* Muti
* 27
* TBG

THAT would build you a search index of uncompromising quality. :)
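For what it's worth, the bot is not much code. Here is a rough sketch in Python, and everything in it is my own assumption rather than anything Gargoyle does: the `fetch` callback (hand it a URL, get back the page text and its outgoing links), the `min_hits` threshold and the function names are all made up for illustration. It only follows links from pages that look suitably South African.

```python
import re
from collections import deque

# Keyword list taken from the post; matching is case-insensitive, whole words only.
SA_KEYWORDS = ["south africa", "kief", "doos", "jeek", "biltong",
               "danny k", "muti", "27", "tbg"]
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in SA_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def sa_keyword_hits(text):
    """Return the set of distinct keywords found in the text."""
    return {m.group(1).lower() for m in _PATTERN.finditer(text)}

def crawl(seeds, fetch, min_hits=2):
    """Breadth-first hop between blogs, starting from the seed URLs.

    fetch(url) is a hypothetical callback returning (text, links).
    A page is kept, and its links followed, only if it contains at
    least min_hits distinct keywords.
    """
    seen = set(seeds)
    queue = deque(seeds)
    found = []
    while queue:
        url = queue.popleft()
        text, links = fetch(url)
        if len(sa_keyword_hits(text)) >= min_hits:
            found.append(url)
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return found
```

Requiring two distinct keywords is an arbitrary guard against every page that happens to mention "27". A real bot would also need polite fetching, robots.txt handling and a page limit, none of which is shown here.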

Anyway, distractions aside, it looks like Mnr. Peter Hart-Davis most likely took the networking route - and got the first 20-30 that sprang to mind.

Some comments on the engine itself. Mike is right. It's blindingly fast - helps to be on a local server. The next PING OFF starts to suggest itself... Search Engines.

It only seems to link to the actual article on the blog some of the time. Which is odd. I searched for '27 Dinner' for instance - nice keyword, used in many local blogs one would think. It gets all the keyword appearances but only sometimes links through to the target article.

What really blows my non-mathematical hair back is the explanation of the algorithm. THIS is a noble open source gesture - giving away your index maths. And a decimal point nightmare. Looks cool anyway...

page
segment = 20070307173548
digest = ce2eb00f9a21d89504df31d7b6282ee5
url = http://www.chilibean.co.za/2007/01/28/the-27dinner-on-27-january-2007/
title = The 27dinner on 27 January 2007 chilibean
lastModified = 1173281740000
primaryType = text
subType = html
boost = 0.5

score for query: 27 dinner
0.17441347 = sum of:
  0.17236301 = sum of:
    0.03751689 = weight(url:27^4.0 in 177), product of:
      0.26228118 = queryWeight(url:27^4.0), product of:
        4.0 = boost
        5.2312036 = idf(docFreq=4)
        0.012534456 = queryNorm
      0.14304072 = fieldWeight(url:27 in 177), product of:
        1.0 = tf(termFreq(url:27)=1)
        5.2312036 = idf(docFreq=4)
        0.02734375 = fieldNorm(field=url, doc=177)
    0.0016185167 = weight(content:27 in 177), product of:
      0.022446524 = queryWeight(content:27), product of:
        1.7907857 = idf(docFreq=155)
        0.012534456 = queryNorm
      0.07210545 = fieldWeight(content:27 in 177), product of:
        4.1231055 = tf(termFreq(content:27)=17)
        1.7907857 = idf(docFreq=155)
        0.009765625 = fieldNorm(field=content, doc=177)
    0.13322762 = weight(title:27^1.5 in 177), product of:
      0.11558324 = queryWeight(title:27^1.5), product of:
        1.5 = boost
        6.1474943 = idf(docFreq=1)
        0.012534456 = queryNorm
      1.1526551 = fieldWeight(title:27 in 177), product of:
        1.0 = tf(termFreq(title:27)=1)
        6.1474943 = idf(docFreq=1)
        0.1875 = fieldNorm(field=title, doc=177)
  0.0012596376 = weight(content:dinner in 177), product of:
    0.03381178 = queryWeight(content:dinner), product of:
      2.697507 = idf(docFreq=62)
      0.012534456 = queryNorm
    0.0372544 = fieldWeight(content:dinner in 177), product of:
      1.4142135 = tf(termFreq(content:dinner)=2)
      2.697507 = idf(docFreq=62)
      0.009765625 = fieldNorm(field=content, doc=177)
  7.908132E-4 = weight(content:"27 dinner"~2147483647 in 177), product of:
    0.05625831 = queryWeight(content:"27 dinner"~2147483647), product of:
      4.4882927 = idf(content: 27=155 dinner=62)
      0.012534456 = queryNorm
    0.014056824 = fieldWeight(content:"27 dinner" in 177), product of:
      0.3207052 = tf(phraseFreq=0.10285183)
      4.4882927 = idf(content: 27=155 dinner=62)
      0.009765625 = fieldNorm(field=content, doc=177)
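For the decimal-point brave, the dump can be checked with a few lines of Python. This is only a sketch of the arithmetic as the dump itself lays it out (it reads like a Lucene-style score explanation, but that is my guess), not Gargoyle's actual code. Every number is copied from the listing above. The helper name `term_weight` is my own invention, and taking tf as the square root of the term frequency is inferred from the figures (the square root of 17 is 4.1231...).

```python
import math

QUERY_NORM = 0.012534456  # queryNorm, identical on every line of the dump

def term_weight(idf, tf, field_norm, boost=1.0, query_norm=QUERY_NORM):
    """weight = queryWeight * fieldWeight, as the dump breaks it down."""
    query_weight = boost * idf * query_norm   # boost * idf * queryNorm
    field_weight = tf * idf * field_norm      # tf * idf * fieldNorm
    return query_weight * field_weight

# One entry per matched clause; tf is sqrt(frequency), per the dump's figures.
url_27         = term_weight(5.2312036, math.sqrt(1), 0.02734375, boost=4.0)
content_27     = term_weight(1.7907857, math.sqrt(17), 0.009765625)
title_27       = term_weight(6.1474943, math.sqrt(1), 0.1875, boost=1.5)
content_dinner = term_weight(2.697507, math.sqrt(2), 0.009765625)
phrase         = term_weight(4.4882927, math.sqrt(0.10285183), 0.009765625)

# The three "27" clauses form the inner sum; the outer sum adds the rest.
inner = url_27 + content_27 + title_27
total = inner + content_dinner + phrase

print(f"inner sum: {inner:.8f}")
print(f"total:     {total:.8f}")
```

The totals agree with the dump's 0.17236301 and 0.17441347 to within rounding. The takeaway for non-mathematical hair: the single '27' in the title, with its 1.5 boost and high idf, contributes about three quarters of the whole score, while seventeen mentions in the content barely register.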



Yowsers.

The cached version of each result is also interesting. Whereas Google tends to strip some of the more offending formatting out of a cached page - Gargoyle leaves it in, style sheet and all. If this is a proper cache (i.e. a stored version of the page on the server) surely it's going to become a bit of a space nightmare fairly shortly? Well, at least space is not as expensive as bandwidth in this country.

I could blog more. But a man's gotta eat. Thanks for the listing Peter and I look forward to seeing this thing grow. I'll see if I can remember to check tomorrow and see if your bot has picked up this new post :)