Spidering del.icio.us data

Christoph Erhardt, a student I advise, is currently working on a new tool
for browsing tag clouds.
For this he downloaded about 1,700 bookmarked web sites, each with its tags, from
del.icio.us. Since the official del.icio.us API only gives you
access to your own bookmarks and tags, he had to parse the HTML of the
web site instead.
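A minimal sketch of that kind of HTML parsing, using only Python's standard `html.parser`. The markup it expects (tag links of the form `<a href="/tag/python">python</a>`) and the names `TagLinkParser` and `extract_tags` are hypothetical. The post does not say what the pages looked like or which parser library the student used.

```python
from html.parser import HTMLParser


class TagLinkParser(HTMLParser):
    """Collect the text of <a> elements whose href points at a tag page.

    ASSUMPTION: tag links look like <a href="/tag/python">python</a>.
    The real del.icio.us markup may well have differed.
    """

    def __init__(self, tag_prefix="/tag/"):
        super().__init__()
        self.tag_prefix = tag_prefix
        self.tags = []
        self._in_tag_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            href = dict(attrs).get("href") or ""
            self._in_tag_link = href.startswith(self.tag_prefix)

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_tag_link = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_tag_link and text:
            self.tags.append(text)


def extract_tags(html):
    """Return the tag names found in one page, in document order."""
    parser = TagLinkParser()
    parser.feed(html)
    parser.close()
    return parser.tags


sample = ('<p><a href="/tag/python">python</a> <a href="/about">about</a> '
          '<a href="/tag/web">web</a></p>')
print(extract_tags(sample))  # ['python', 'web']
```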
He discovered the following limitations:

  • In the summary for a web site, del.icio.us only shows tags that are used
    by at least two people. If you also want tags that have been assigned
    by only one person, you have to look at the tags each individual
    person has assigned.
  • Sometimes you only get "There is no del.icio.us history for this url"
    when accessing a bookmark. If you wait a little bit and reload, it usually
    works again.
  • One web site had been tagged by over 24,000 people. This always led to
    a timeout when loading the detail page for this bookmark. This might be
    a problem with the parser library, though.
  • After about 400 requests, del.icio.us decides that is enough: you get an
    HTTP 999 error code until you change your IP address.
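The first limitation means per-bookmark tag counts have to be rebuilt from the individual users' tag lists. A minimal sketch, assuming those lists have already been scraped into a dictionary from (hypothetical) user names to tags; `count_tags` and `summary_view` are illustrative names, and the two-user threshold is taken from the observation above.

```python
from collections import Counter


def count_tags(per_user_tags):
    """Count how many users assigned each tag to one bookmark.

    per_user_tags maps a (hypothetical) user name to that user's tags.
    Converting to a set means each person counts at most once per tag.
    """
    counts = Counter()
    for tags in per_user_tags.values():
        counts.update(set(tags))
    return counts


def summary_view(counts, min_users=2):
    """Mimic the summary page: keep only tags used by >= min_users people."""
    return {tag: n for tag, n in counts.items() if n >= min_users}


per_user = {
    "alice": ["python", "web"],
    "bob": ["python", "scraping"],
    "carol": ["web", "python"],
}
counts = count_tags(per_user)
print(sorted(summary_view(counts).items()))  # [('python', 3), ('web', 2)]
# Tags the summary page would hide, because only one person assigned them:
print(sorted(t for t, n in counts.items() if n == 1))  # ['scraping']
```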
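The other three limitations suggest a cautious fetch loop: reload when the spurious "no history" page appears, set a timeout so a huge bookmark page cannot stall the spider, and stop outright on HTTP 999. This is a sketch under assumptions, not the student's code: the retry count and wait time are guesses, and `http_get`, `fetch_with_retry` and `RateLimited` are hypothetical names.

```python
import time
import urllib.error
import urllib.request

NO_HISTORY = "There is no del.icio.us history for this url"


class RateLimited(Exception):
    """Raised when the server answers with HTTP 999."""


def http_get(url, timeout=30):
    """Download a page as text.

    The timeout bounds each blocking socket operation, so a stalled
    download raises instead of hanging the spider.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 999:
            raise RateLimited(url) from err
        raise


def fetch_with_retry(url, fetch=http_get, retries=3, wait=5.0, sleep=time.sleep):
    """Fetch url, waiting and reloading while the 'no history' page comes back.

    ASSUMPTION: retries and wait are guesses; the post only says to
    "wait a little bit and reload". RateLimited is not caught here,
    since retrying from the same IP would not help.
    """
    for attempt in range(retries + 1):
        page = fetch(url)
        if NO_HISTORY not in page:
            return page
        if attempt < retries:
            sleep(wait * (attempt + 1))  # linear backoff between reloads
    return None  # still no history after all retries
```

Injecting `fetch` and `sleep` keeps the retry logic testable without touching the network. `RateLimited` is deliberately left to propagate, since by the post's account only a new IP address clears the 999 response.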