The other day, while organising the Freebase gathering in Melbourne, Australia, I found myself wanting a list of all the Freebase users in Melbourne and, ideally, their email addresses.

Well, the first part is easy enough. Here’s the MQL for it:

{
  "query":[{
    "location":"Melbourne, Australia",
    "name":null,
    "type":"/freebase/user_profile"
  }]
}

The email address, however, proved impossible. Freebase doesn't store email addresses anywhere publicly available, and I guess I can see why. The potential for spam is even greater than with an address published on an ordinary website, because spammers wouldn't even have to crawl the site. They could just programmatically ask for a list of email addresses:

{
  "query":[{
    "name":null,
    "type":"/internet/email_address"
  }]
}

Or, if they were actually trying to market something to people who cared, say women in a certain age group:

{
  "query":[{
    "name":null,
    "type":"/internet/email_address",
    "person":{
      "sex":"female",
      "a:date_of_birth>":1970,
      "b:date_of_birth<":1980
    }
  }]
}

(The code is an example only: untested, untestable, and almost certainly wrong in the date constraints, because I don't think dates work like that.)

Now I, for one, would actually prefer spam about drugs for menstrual cramps to spam offering me a larger penis, but most people seem to disagree, and would consider it a huge invasion of privacy for spammers to be able to search on their personal details.

On the other hand, there are some valid reasons why you might want to store email addresses in Freebase. Contacting Freebase users is one: I would quite like other Freebasers to be able to email me if they want to get in touch privately, and I expect that many of those Melbourne users would've liked to know about the upcoming meeting. You'd want the visibility of a user's email address to be configurable by the user, and private by default, but it's by no means an obscure use case.
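To make the "configurable, and private by default" idea concrete, here is a minimal sketch in Python. It is entirely hypothetical: Freebase has no such feature, and the `Visibility` levels, the `UserProfile` fields and the `visible_email` function are illustrative names of my own, not part of any Freebase API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    """Who may see a user's email address (hypothetical levels)."""
    PRIVATE = "private"      # only the owner
    LOGGED_IN = "logged_in"  # any signed-in user
    PUBLIC = "public"        # anyone, including anonymous API queries


@dataclass
class UserProfile:
    name: str
    email: str
    # Private by default: the user has to opt in to sharing.
    email_visibility: Visibility = Visibility.PRIVATE


def visible_email(profile: UserProfile, requester: Optional[str]) -> Optional[str]:
    """Return the profile's email if `requester` may see it, else None.

    `requester` is the requesting user's name, or None for an anonymous query.
    """
    if requester == profile.name:
        return profile.email  # owners can always see their own address
    if profile.email_visibility is Visibility.PUBLIC:
        return profile.email
    if profile.email_visibility is Visibility.LOGGED_IN and requester is not None:
        return profile.email
    return None


if __name__ == "__main__":
    alice = UserProfile("alice", "alice@example.com")
    bob = UserProfile("bob", "bob@example.com", Visibility.LOGGED_IN)
    print(visible_email(alice, "bob"))  # None: alice never opted in
    print(visible_email(bob, "alice"))  # bob@example.com
    print(visible_email(bob, None))     # None: an anonymous script gets nothing
```

The design choice that matters is the default: an anonymous bulk query, like the spammer's queries above, gets nothing back unless each user has explicitly opted in.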

Another reason is that some types might fundamentally be about email. I was modelling CPAN Authors a while ago, and one of the very few data points recorded about contributors to CPAN is their email address. That information is already available in a public database of sorts, so there’s no real expectation of privacy there. Or what about mailing lists? A model for “mailing list” might reasonably want to store “posting address”, “subscription address”, and “admin address”. Or it might be nice to store email contacts for political representatives, so that their constituents can email them about issues of importance.
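Addresses like a mailing list's are public by design, so they wouldn't need the privacy controls a user's own address would. Here is a rough sketch of such a type in Python, plus a helper that builds the MQL-style read query that would fetch it back. The `/internet/mailing_list` type id, the property names and the `mql_query_for_list` helper are all hypothetical, chosen only to mirror the three addresses mentioned above.

```python
import json
from dataclasses import dataclass


@dataclass
class MailingList:
    """Hypothetical "mailing list" type: every address here is meant to be public."""
    name: str
    posting_address: str       # where members send messages
    subscription_address: str  # where people send subscribe/unsubscribe requests
    admin_address: str         # reaches the list owner(s)


def mql_query_for_list(name: str) -> str:
    """Build an MQL-style read query (as JSON) asking for one list's addresses.

    The type id and property names are hypothetical; they mirror MailingList.
    """
    query = {
        "query": [{
            "type": "/internet/mailing_list",  # hypothetical type id
            "name": name,
            "posting_address": None,  # serialised as null, i.e. "fill this in"
            "subscription_address": None,
            "admin_address": None,
        }]
    }
    return json.dumps(query, indent=2)


if __name__ == "__main__":
    print(mql_query_for_list("example-list"))
```

As in the queries earlier in this post, each `null` marks a value you are asking Freebase to fill in.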

I'm sure there are a thousand reasons to want to record email addresses in Freebase, and there's only one reason not to… but it's a biggie. I wonder what solutions to this problem will emerge.
