Gargoyle - South African Blog Search

Blogging off the back of Mnr. Stopforth. And trying to add value and not rehash - practice what I preach yeah? Introducing Gargoyle. A South African / African blog search engine. Very interesting.

Here's my 10c (smallest coin we've got I gather). Nice idea - especially when you consider the volume out there. There's just too much - both competitively and in terms of finding what you want. I've always been an advocate of an African conglomeration of content - both for our Social Media Ranking Tools, our sites and our bloggers.


I'm honoured to be included (Cowboys, All Scrubbed Up & Paki's Corner). And wonder how that happened? Someone has just GOT to write a little bot which hops between links of SA blogs when it finds a post containing suspiciously South African keywords. Like...

* South Africa. Duh
* Kief
* Doos
* Jeek (can ANYONE else actually be using this?)
* Biltong
* Danny K (ye gods)
* Muti
* 27
* TBG

THAT would build you a search index of uncompromising quality. :)
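For the curious, here is a minimal sketch of what such a bot might look like. It is a toy, not anything Gargoyle actually runs: the names `keyword_hits` and `crawl`, the `min_hits` threshold and the tiny in-memory "web" of pages are all made up for illustration, and a real bot would fetch and parse live pages instead.

```python
import re
from collections import deque

# The "suspiciously South African" keywords from the list above.
KEYWORDS = ["south africa", "kief", "doos", "jeek", "biltong",
            "danny k", "muti", "27", "tbg"]

# One case-insensitive pattern that matches any keyword as a whole word.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def keyword_hits(text):
    """Return the set of distinct keywords found in the text."""
    return {match.group(1).lower() for match in PATTERN.finditer(text)}


def crawl(pages, start, min_hits=2):
    """Breadth-first hop between pages, following links only from pages
    that contain at least `min_hits` distinct keywords.

    `pages` is a hypothetical stand-in for the web: url -> (text, links).
    """
    index = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = pages.get(url, ("", []))
        if len(keyword_hits(text)) < min_hits:
            continue  # does not look local, so do not index or follow it
        index.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Made-up pages, purely for demonstration.
pages = {
    "a": ("Kief biltong at the 27 dinner", ["b", "c"]),
    "b": ("My trip to Paris", ["d"]),
    "c": ("Muti and TBG in South Africa", ["a"]),
    "d": ("Biltong doos", []),
}
print(crawl(pages, "a"))
```

Page "d" would qualify on keywords but is never reached, because the only link to it sits on a page that does not look local. That is the obvious weakness of hopping by content alone.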

Anyway, distractions aside, it looks like Mnr. Peter Hart-Davis most likely took the networking route - and got the first 20-30 that sprang to mind.

Some comments on the engine itself. Mike is right. It's blindingly fast - helps to be on a local server. The next PING OFF starts to suggest itself... Search Engines.

It only seems to link to the actual article on the blog some of the time, which is odd. I searched for '27 Dinner' for instance - nice keyword, used in many local blogs one would think. It gets all the keyword appearances but only sometimes links through to the target article.

What really blows my non-mathematical hair back is the explanation of the algorithm. THIS is a noble open source gesture - giving away your index maths. And a decimal point nightmare. Looks cool anyway...

page
segment = 20070307173548
digest = ce2eb00f9a21d89504df31d7b6282ee5
url = http://www.chilibean.co.za/2007/01/28/the-27dinner-on-27-january-2007/
title = The 27dinner on 27 January 2007 chilibean
lastModified = 1173281740000
primaryType = text
subType = html
boost = 0.5

score for query: 27 dinner
0.17441347 = sum of:
  0.17236301 = sum of:
    0.03751689 = weight(url:27^4.0 in 177), product of:
      0.26228118 = queryWeight(url:27^4.0), product of:
        4.0 = boost
        5.2312036 = idf(docFreq=4)
        0.012534456 = queryNorm
      0.14304072 = fieldWeight(url:27 in 177), product of:
        1.0 = tf(termFreq(url:27)=1)
        5.2312036 = idf(docFreq=4)
        0.02734375 = fieldNorm(field=url, doc=177)
    0.0016185167 = weight(content:27 in 177), product of:
      0.022446524 = queryWeight(content:27), product of:
        1.7907857 = idf(docFreq=155)
        0.012534456 = queryNorm
      0.07210545 = fieldWeight(content:27 in 177), product of:
        4.1231055 = tf(termFreq(content:27)=17)
        1.7907857 = idf(docFreq=155)
        0.009765625 = fieldNorm(field=content, doc=177)
    0.13322762 = weight(title:27^1.5 in 177), product of:
      0.11558324 = queryWeight(title:27^1.5), product of:
        1.5 = boost
        6.1474943 = idf(docFreq=1)
        0.012534456 = queryNorm
      1.1526551 = fieldWeight(title:27 in 177), product of:
        1.0 = tf(termFreq(title:27)=1)
        6.1474943 = idf(docFreq=1)
        0.1875 = fieldNorm(field=title, doc=177)
  0.0012596376 = weight(content:dinner in 177), product of:
    0.03381178 = queryWeight(content:dinner), product of:
      2.697507 = idf(docFreq=62)
      0.012534456 = queryNorm
    0.0372544 = fieldWeight(content:dinner in 177), product of:
      1.4142135 = tf(termFreq(content:dinner)=2)
      2.697507 = idf(docFreq=62)
      0.009765625 = fieldNorm(field=content, doc=177)
  7.908132E-4 = weight(content:"27 dinner"~2147483647 in 177), product of:
    0.05625831 = queryWeight(content:"27 dinner"~2147483647), product of:
      4.4882927 = idf(content: 27=155 dinner=62)
      0.012534456 = queryNorm
    0.014056824 = fieldWeight(content:"27 dinner" in 177), product of:
      0.3207052 = tf(phraseFreq=0.10285183)
      4.4882927 = idf(content: 27=155 dinner=62)
      0.009765625 = fieldNorm(field=content, doc=177)



Yowsers.
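For fellow non-mathematicians, the decimal points are less scary than they look. The dump reads like the explain output of Lucene-style TF-IDF scoring, though that is my assumption and not something the site states. Each weight is queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), and tf appears to be the square root of the term frequency (the square root of 17 is 4.1231..., which matches the dump). Here is a rough Python sketch that re-adds the numbers. The helper names `tf` and `term_weight` are mine, and the idf and fieldNorm values are copied straight from the dump because the index size needed to recompute them isn't shown.

```python
import math

QUERY_NORM = 0.012534456  # queryNorm, copied from the dump


def tf(freq):
    """Term-frequency factor: square root of the raw frequency
    (assumed from the dump, e.g. sqrt(17) = 4.1231...)."""
    return math.sqrt(freq)


def term_weight(freq, idf, field_norm, boost=1.0, query_norm=QUERY_NORM):
    """weight = queryWeight * fieldWeight, as laid out in the dump."""
    query_weight = boost * idf * query_norm
    field_weight = tf(freq) * idf * field_norm
    return query_weight * field_weight


# All inputs below are copied from the dump for document 177.
url_27 = term_weight(1, 5.2312036, 0.02734375, boost=4.0)
content_27 = term_weight(17, 1.7907857, 0.009765625)
title_27 = term_weight(1, 6.1474943, 0.1875, boost=1.5)
content_dinner = term_weight(2, 2.697507, 0.009765625)
phrase = term_weight(0.10285183, 4.4882927, 0.009765625)

total = url_27 + content_27 + title_27 + content_dinner + phrase

print(f"title weight: {title_27:.8f}  (dump says 0.13322762)")
print(f"total score:  {total:.8f}  (dump says 0.17441347)")
```

Most of the score comes from the single "27" in the title: it is the rarest match (docFreq=1) and the title is a short field, so its fieldNorm is far larger than the one for the page content. Seventeen mentions in the body barely register by comparison.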

The cached version of each result is also interesting. Whereas Google tends to strip some of the more offending formatting out of a cached page - Gargoyle leaves it in, style sheet and all. If this is a proper cache (i.e. a stored version of the page on the server), surely it's going to become a bit of a space nightmare fairly shortly? Well, at least space is not as expensive as bandwidth in this country.

I could blog more. But a man's gotta eat. Thanks for the listing Peter and I look forward to seeing this thing grow. I'll see if I can remember to check tomorrow and see if your bot has picked up this new post :)

Comments

  1. Hi Andy, thanks for the feedback, one pounds away on something and it is difficult to get any kind of distance from what one is working on.

    I seeded both tools initially from the 27Dinner attendees and I have sort of randomly branched out from there. Very much a manual process at present.

    As I mentioned on Mike's blog I think refinement will be the key to the quality of the results. They are very much 'works in progress' and the intention is to encourage a 'participatory culture' where possible. Particularly with regard to the Knowledge bank which I think has great potential.

    Thanks again
    Peter Hart-Davis

  2. Peter - it's the right way to do things mate - collaboration. Muti seems to be making a good run of that theory at the moment. Perhaps something on your site to allow easy, categorised submission of ideas? (something I feel Muti is missing - it's more than just a contact us form...)

    With your apparent coding skills - would love to know if you could code the jokebot I spoke about. How to index African content is a tricky one, seriously speaking... People host everywhere, so you couldn't go on domains. Content would be the way - and some way to spider through it.

    Will check out the knowledge bank... Keep it up and let me know how I can help...

  3. Hi again, good ideas, unfortunately my implementation skills are far better than my coding skills. I will see if I can track down something to implement the jokebot :)

  4. Ah yes, I suffer from the same problem.

