freebasing

Notes on Freebase, the free database of everything

Archive for September, 2007

The email issue

The other day, while organising the Melbourne, Australia Freebase gathering, I found myself wanting to know all the Freebase users in Melbourne and, ideally, their email addresses.

Well, the first part is easy enough. Here’s the MQL for it:

{
  "query":[{
    "location":"Melbourne, Australia",
    "name":null,
    "type":"/freebase/user_profile"
  }]
}
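
To run a query like this from a script, the envelope has to be serialised and passed to Freebase's MQL read service. Here is a minimal Python sketch of that step. The endpoint URL and the helper name `build_mqlread_url` are assumptions for illustration, not something taken from the Freebase docs, and the code only builds the request URL without touching the network:

```python
import json
from urllib.parse import urlencode, parse_qs, urlparse

# Assumed location of the MQL read service; check the API docs before relying on it.
MQLREAD_URL = "http://www.freebase.com/api/service/mqlread"


def build_mqlread_url(query):
    """Wrap an MQL query in the {"query": ...} envelope and URL-encode it."""
    envelope = {"query": query}
    return MQLREAD_URL + "?" + urlencode({"query": json.dumps(envelope)})


melbourne_users = [{
    "location": "Melbourne, Australia",
    "name": None,  # None is serialised as JSON null: "fill this in for me"
    "type": "/freebase/user_profile",
}]

url = build_mqlread_url(melbourne_users)
print(url)
```

The Python `None` values stand in for the `null` placeholders in the MQL, which is how the query asks for a property to be returned rather than matched.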

The email address, however, proved impossible. Freebase doesn’t store email addresses anywhere publicly available, and I guess I can see why. The potential for spam is greater even than putting an email address on an ordinary website, because the spammers wouldn’t even have to crawl the site. They could just programmatically ask for a list of email addresses:

{
  "query":[{
    "name":null,
    "type":"/internet/email_address"
  }]
}

Or, if they were actually trying to market something to people who cared, say women of a certain age group:

{
  "query":[{
    "name":null,
    "type":"/internet/email_address",
    "person": {
        "sex":"female",
        "a:date_of_birth>":1970,
        "b:date_of_birth<":1980
    }
  }]
}

(Code is an example only, untested and untestable, and almost certainly syntactically incorrect in the date constraints because I don’t think dates work like that.)

Now I, for one, would actually prefer to get spam about drugs for menstrual cramps rather than spam offering me a larger penis, but most other people seem to disagree, and consider it a huge invasion of privacy for spammers to be able to search on their personal details.

On the other hand, there are some valid reasons why you might want to store email addresses in Freebase. Contacting Freebase users is one reason: I would quite like other Freebasers to be able to email me if they want to get in touch privately, and I expect that many of those Melbourne users would’ve liked to know about the upcoming meeting. You’d want the publicity of a user’s email address to be configurable by them, and private by default, but it’s by no means an obscure use case.
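
The "configurable, and private by default" idea can be sketched in a few lines of Python. Everything here (the `UserProfile` class, its field names, the example addresses) is hypothetical and only illustrates the default, not anything Freebase actually implements:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserProfile:
    """Hypothetical profile record; field names are illustrative only."""
    name: str
    email: Optional[str] = None
    email_public: bool = False  # private unless the user opts in

    def visible_email(self) -> Optional[str]:
        """Return the email only if the owner has chosen to publish it."""
        return self.email if self.email_public else None


private_user = UserProfile(name="alice", email="alice@example.org")
public_user = UserProfile(name="bob", email="bob@example.org", email_public=True)

print(private_user.visible_email())  # None
print(public_user.visible_email())   # bob@example.org
```

The important design choice is that the flag defaults to `False`, so a profile created without any thought about privacy never leaks its address to a query.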

Another reason is that some types might fundamentally be about email. I was modelling CPAN Authors a while ago, and one of the very few data points recorded about contributors to CPAN is their email address. That information is already available in a public database of sorts, so there’s no real expectation of privacy there. Or what about mailing lists? A model for “mailing list” might reasonably want to store “posting address”, “subscription address”, and “admin address”. Or it might be nice to store email contacts for political representatives, so that their constituents can email them about issues of importance.

I’m sure there are a thousand reasons to want to record email addresses in Freebase, and there’s only one reason not to… but it’s a biggie. I wonder what solutions to this problem will emerge?

  • 2 Comments
  • Filed under: Data modeling

Melbourne (Australia) gathering

    I’m in the early stages of planning a Melbourne Freebase meetup/gathering/hack/play/etc session, to be held at Horse Bazaar in the CBD, probably on October 9th. More details to follow, but if you happen to be in Melbourne and are interested in attending, drop me a line at skud@infotrope.net (or comment here).

  • 0 Comments
  • Filed under: Community

Other Freebase blogs

    Just a few quick links to other blogs you should be reading if you’re interested in Freebase.

    • First up, The Freebase Blog is the official company blog. That’s where you’ll find Metaweb staff posting about various Freebase topics, from user group meetings to data modelling and FAQs.
    • Freebasics is a blog by one of the Metaweb guys, Rob (I presume Robert Cook but I’m not certain), and is also a general news/views/ideas blog. Unfortunately it hasn’t been updated lately.
    • Perl Goddess is a blog by synedra aka Kirsten Jones, a super-smart woman who I had the privilege of meeting while I was in the Bay Area last month. She’s just started working for Applied Minds on Freebase-related stuff, and has been blogging about it. I’m also hoping to get her to write some articles for freebasing and turn it into a bit of a group blog.

    And finally, you can see who else is talking about Freebase at Freebase blogs and coverage, a help topic in the system itself.

  • 0 Comments
  • Filed under: Community

“I KNOW LETS MAKE A TYPE FOR UNICODE CHARACTER!!!1”

    Too many types. Must sleep now.

  • 0 Comments
  • Filed under: Uncategorized

    I just set up an IRC channel to discuss Freebase. It’s #freebase on irc.freenode.net.

    When I joined the channel I found manveru there, and he said he’d seen me around. More specifically, he alerted me to this:

    Featured contributors

    Yay me!

  • 0 Comments
  • Filed under: Community

Events future and past

    This is interesting. Freebase user Peter von Stackelberg has been working up a whole domain full of types related to Future Studies and Forecasting. I came across his Event type while considering historical events.

    I’m not any kind of formally trained historian, but I do enjoy history and I’m quite interested in it. When I consider historical events, I want to know when they happened, where they happened, who was involved, and whether they were part of any larger event. (For instance, the Battle of Trafalgar occurred during the Napoleonic Wars; the death of Horatio Nelson occurred during the Battle of Trafalgar.) I started out modelling these in Historical Event (skud’s types); see Battle of Trafalgar or Nuremberg Trials for an example of my type in use.

    It seems like Peter’s forecasting event has some of the same criteria, especially the “when” aspects and the ability to nest events within each other. However, he’s added fields such as “supports trend” and “opposes trend” for use in forecasting, his particular field of interest.

    Meanwhile, I’m considering adding “commemorative event” (e.g. Trafalgar 200 for the 200th anniversary of the Battle of Trafalgar) and “historic site”, possibly subtyping that using National Historic Site or Protected Site, or co-typing with something like Listed Site.

    All this makes me think that what we really need is an “Event” type which we can use as part of “Historic Event” or “Forecasting Event”. This will require further consideration.
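
As a rough illustration of that idea, here is a Python sketch with a shared `Event` core and two specialisations. The class and field names are made up for the example, and Python subclassing is only standing in for Freebase's way of combining types on one topic, so treat it as a thinking aid rather than a proposed schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Shared core: what, when, where, and which larger event it belongs to."""
    name: str
    date: Optional[str] = None
    location: Optional[str] = None
    part_of: Optional["Event"] = None

    def ancestors(self) -> List[str]:
        """Names of the enclosing events, nearest first."""
        names = []
        parent = self.part_of
        while parent is not None:
            names.append(parent.name)
            parent = parent.part_of
        return names


@dataclass
class HistoricEvent(Event):
    participants: List[str] = field(default_factory=list)


@dataclass
class ForecastingEvent(Event):
    supports_trend: List[str] = field(default_factory=list)
    opposes_trend: List[str] = field(default_factory=list)


wars = HistoricEvent(name="Napoleonic Wars")
trafalgar = HistoricEvent(name="Battle of Trafalgar", date="1805-10-21", part_of=wars)
nelson = HistoricEvent(name="Death of Horatio Nelson", date="1805-10-21",
                       part_of=trafalgar, participants=["Horatio Nelson"])

print(nelson.ancestors())  # ['Battle of Trafalgar', 'Napoleonic Wars']
```

The nesting lives entirely in the shared `Event` part, so a forecasting event could be nested inside a historic one (or vice versa) without either specialisation knowing about the other.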

  • 0 Comments
  • Filed under: Data modeling

New tool: RSS feed of new types

    I find myself wondering what everyone else on Freebase is up to, and to my mind, the best way to get a feel for that is by seeing what new types are created.

    The creation of a new type in the main domain hierarchy indicates a new area to build out content, while new user types show what people are messing around with, and where the nexuses (nexi? nexes?) of experimentation are.

    In both cases, if it’s something you’re interested in, then seeing it pop up on an RSS feed will trigger you to go check it out, join any discussion that’s going on, and take part in the work at hand.

    To that end, I’ve built an RSS feed of new Freebase types. It lists those in the main namespace as well as public user types.

    Subscribe to the “New Types” feed

    You can also see the 10 newest types in the sidebar of this blog.
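
For the curious, a feed like this could be put together roughly as follows. This is not the code behind the actual feed: the function name `types_to_rss`, the shape of the input records, the sample type ids, and the `/view` link pattern are all assumptions made for the sketch, and it leaves out the step of fetching the new types from Freebase in the first place.

```python
import xml.etree.ElementTree as ET


def types_to_rss(types, title="New Freebase types", link="http://www.freebase.com/"):
    """Render a list of {"id", "name", "timestamp"} dicts as an RSS 2.0 string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Recently created types"

    # Newest first, so readers see the latest types at the top of the feed.
    for t in sorted(types, key=lambda t: t["timestamp"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = t["name"]
        # Assumed URL pattern for viewing a type; adjust to the real one.
        ET.SubElement(item, "link").text = link.rstrip("/") + "/view" + t["id"]
        ET.SubElement(item, "guid", isPermaLink="false").text = t["id"]
    return ET.tostring(rss, encoding="unicode")


# Made-up sample records standing in for a real query result.
sample = [
    {"id": "/user/example/default_domain/lighthouse", "name": "Lighthouse",
     "timestamp": "2007-09-01T10:00:00"},
    {"id": "/user/example/default_domain/tall_ship", "name": "Tall Ship",
     "timestamp": "2007-09-03T08:30:00"},
]

feed = types_to_rss(sample)
print(feed)
```

Sorting on the timestamp string works here only because ISO-style timestamps sort the same way alphabetically as they do chronologically.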

  • 1 Comment
  • Filed under: Tools

    I recently spent a year working for a real estate search engine company that dealt with property listings in a number of different countries, ranging from England to Indonesia. While I was there I learnt a few things about location data which I think have some bearing on Freebase.

    The most important thing is that concepts of location are culturally dependent. In Australia, like the US, we have States. In Canada it’s provinces. In the UK it’s counties. In Fiji, they don’t have any such administrative divisions, just a bunch of islands. If you’ve ever tried to order something online to ship to another country you’ll know what it’s like. “State? I don’t have a state! And what’s this zipcode thing?”

    When the original version of the real estate search database was designed — long before my time — the people involved were only really thinking about Australia. They decided that every listing would have a suburb, a state, and a postcode of 4 digits. Obviously this soon started breaking when the company started spreading into other countries. When the new database design was made, just about the only thing they found common among all the culturally diverse ideas of location was this: Locations may contain, or be contained by, other locations. You can see this reflected in Freebase’s Location type — along with a few other attributes, such as “adjoins” and “area”.
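
That one common rule is small enough to sketch in Python. The `Location` class below is a made-up illustration of "contains / contained by", not the schema of the real estate database or of Freebase's Location type:

```python
class Location:
    """A place that may contain, or be contained by, other places."""

    def __init__(self, name, contained_by=None):
        self.name = name
        self.contained_by = contained_by
        self.contains = []
        if contained_by is not None:
            contained_by.contains.append(self)

    def path(self):
        """Names from this location outwards to the top-level container."""
        names, here = [], self
        while here is not None:
            names.append(here.name)
            here = here.contained_by
        return names


australia = Location("Australia")
victoria = Location("Victoria", contained_by=australia)
melbourne = Location("Melbourne", contained_by=victoria)

uk = Location("United Kingdom")
england = Location("England", contained_by=uk)
kent = Location("Kent", contained_by=england)

print(melbourne.path())  # ['Melbourne', 'Victoria', 'Australia']
print(kent.path())       # ['Kent', 'England', 'United Kingdom']
```

Nothing in the class says "state", "county", or "postcode"; those culturally specific labels can be layered on top without changing the containment structure underneath.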

    It’s only when you start getting into culturally specific ideas of location that you see things like “capital city” or “postal code” or “governor” — attributes that reflect anything other than the pure geometry of the space.

    I’ve been messing around a bit with Australia-specific location types, which include “Location” or, when appropriate, “Administrative Division”. You can see them here: Australian State, Australian Territory, and Australian Municipality (which I might rename to Local Government Area; I’m not sure).

  • 5 Comments
  • Filed under: Data modeling

So, it’s a Freebase blog.

    I wanted somewhere to blog about Freebase, Metaweb, and the like, so here it is.

    I’ll be your host for this blog. My name’s Kirrily Robert, commonly known as Skud, and you can find me on Freebase under that name. My interests so far on Freebase have included tall ships and Australian location data.

  • 0 Comments
  • Filed under: Uncategorized