Contacts in the Browser video

05-27-2010

We put together a video where I explain how Contacts works with your browser. Enjoy!

(Original at Vimeo)

How Firefox Contacts Auto-Discovery Works

04-22-2010

In version 0.3 of Firefox Contacts, I wanted to explore how the web browser could gather more information about a person to help create a personalized contact page.

What the software now does is this: when a contact page is loaded, either through the "Contacts" management screen or by loading a person: URL, Firefox invokes a set of discovery modules on it. Each discovery module inspects the person and decides whether it can add anything. In many cases this fires off secondary network requests, which, when they complete, can start the whole ball rolling again. The result is that small pieces of information trickle into the browser as more links are found.
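The "ball rolling again" behavior amounts to re-running the discovery modules until none of them turns up anything new. Here is a minimal, synchronous Python sketch of that idea. It assumes each discovery module is a plain function that takes a person (modelled here as a dict of field name to list of values) and returns any values it found. The names `run_discovery`, `merge_new`, `fake_webfinger` and `fake_hcard` are hypothetical. The real extension is asynchronous JavaScript driven by network callbacks.

```python
from typing import Callable, Dict, List

# A person is modelled as field name -> list of values (an assumption for this sketch).
Person = Dict[str, List[str]]
Discoverer = Callable[[Person], Person]


def merge_new(person: Person, found: Person) -> bool:
    """Add any values from `found` that `person` lacks; report whether anything changed."""
    changed = False
    for field, values in found.items():
        existing = person.setdefault(field, [])
        for value in values:
            if value not in existing:
                existing.append(value)
                changed = True
    return changed


def run_discovery(person: Person, discoverers: List[Discoverer], max_rounds: int = 10) -> Person:
    """Re-run every discoverer until a full round adds nothing new (or max_rounds is hit)."""
    for _ in range(max_rounds):
        changed = False
        for discover in discoverers:
            if merge_new(person, discover(person)):
                changed = True
        if not changed:
            break
    return person


# Two hypothetical, offline discoverers standing in for network-backed modules.
def fake_webfinger(person: Person) -> Person:
    return {"urls": ["https://" + e.split("@", 1)[1] + "/profile" for e in person.get("emails", [])]}


def fake_hcard(person: Person) -> Person:
    if "https://example.com/profile" in person.get("urls", []):
        return {"photos": ["https://example.com/bob.png"]}
    return {}


bob = run_discovery({"emails": ["bob@example.com"]}, [fake_webfinger, fake_hcard])
```

In this run, the webfinger stand-in adds a URL, and the hCard stand-in then uses that URL to add a photo. The second round finds nothing new, so the loop stops. The `max_rounds` cap guards against modules that keep producing fresh values.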

To get specific, here's the list of discovery modules that are currently supported:

Webfinger
Webfinger works by converting an email address to an HTTP URL, loading that URL, and looking for Links to other resources. It can be supported on any website by the simple addition of a metadata file at the root of the website. Contacts runs through the list of links it finds through Webfinger and adds them to the "urls" property of the person.
Google Social Graph
Google has a little-known project called the Social Graph API. It is a specialized index of the web that identifies links between "people pages". Their page does a more thorough job of explaining it than I can, but the basic idea is that it identifies links between profile pages and provides a way to search them. It works on public pages only.
HCard
The HCard importer works by loading any page that Contacts thinks might contain personal data marked up in the HCard microformat. A page qualifies in one of two ways. Either it was a Link with a rel of http://microformats.org/profile/hcard discovered through webfinger, or it is in the "urls" list of the record and belongs to a site known to be an HCard provider. Right now I've got digg, twitter, status.net, blogger, and linkedin tagged, but that list could certainly grow. The HCard parser is built into Firefox and should be able to parse all of the metadata defined in the spec, though not all of it is visualized on the contact page yet.
Proprietary interfaces: Gravatar, Flickr, Yelp, and Amazon
With these importers, I wanted to demonstrate how a proprietary interface could work. In most cases, they simply traverse the list of email addresses and ask the site whether there is an account with that address. In the case of Yelp, the search only works if you are logged in, because Yelp requires an authenticated session to perform email-based discovery. That is an important point: In every case, the agent performing the discovery is you, not some search engine or social networking site. The browser is the ultimate point of trust for any user-to-user display of data, and by connecting directly from the browser, we can avoid a bunch of privacy-eroding disclosures.
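The Webfinger step can be sketched as a handful of pure functions. This sketch assumes the host-meta/LRDD/XRD flow of the Webfinger drafts of the time. The domain's `/.well-known/host-meta` XRD carries a `rel="lrdd"` Link whose `template` gets the percent-encoded `acct:` URI substituted for `{uri}`, and the resulting per-user XRD lists the Links. Modern WebFinger (RFC 7033) uses `/.well-known/webfinger` and JSON instead. The function names are hypothetical, HTTPS is an assumption, and the network fetches are left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# XRD 1.0 namespace used by host-meta and per-user descriptors.
XRD = "{http://docs.oasis-open.org/ns/xri/xrd-1.0}"


def host_meta_url(email: str) -> str:
    """Well-known host-meta location for the domain part of an email address."""
    domain = email.rsplit("@", 1)[1]
    return f"https://{domain}/.well-known/host-meta"


def lrdd_template(host_meta_xml: str) -> Optional[str]:
    """Return the template of the first rel="lrdd" Link in a host-meta document, if any."""
    root = ET.fromstring(host_meta_xml)
    for link in root.iter(XRD + "Link"):
        if link.get("rel") == "lrdd" and link.get("template"):
            return link.get("template")
    return None


def describe_url(template: str, email: str) -> str:
    """Fill the {uri} slot with the percent-encoded acct: URI for the address."""
    return template.replace("{uri}", urllib.parse.quote("acct:" + email, safe=""))


def url_records(user_xrd: str) -> List[Dict[str, str]]:
    """Turn every Link with an href into a {"rel", "value"} record for the person's urls list."""
    root = ET.fromstring(user_xrd)
    return [
        {"rel": link.get("rel", ""), "value": link.get("href")}
        for link in root.iter(XRD + "Link")
        if link.get("href")
    ]
```

Keeping the fetches out of these functions means each step can be checked against a saved XRD document. In the extension, each returned record would feed the "urls" property, which is where the HCard discoverer looks for pages to load.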

Want to hack on it?

If you would like to experiment with automatic discovery of data, you can start right away. Here are a couple of ideas:

Create an HCard-capable page
status.net and Google Profiles are HCard-enabled today. If you create a status.net account and associate it with your record (e.g. by adding its URL to your contact), the HCard data on that profile page will automatically be added to your record.
Webfinger-enable your site
If you operate a website and host an email address at that site, you can webfinger-enable yourself by simply creating a host-meta file that points to your user directory and placing an XRD file there. You can put anything you want into the XRD file. (This is an interesting enough idea that I'll have to write a HOWTO for it - unless somebody else out there has already done it?) You can password-protect the XRD file if you only want to disclose it to your friends.
Create links between your pages
If you have, say, a Facebook page and a public page somewhere else, you can add a link on each of them that points to the other. That will allow people that can view your page to reach the other page automatically -- so, for example, your Facebook friends can find your Yelp or Netflix reviews.

Our goal is to create rich representations of people on the web that protect privacy and enhance our ability to connect and share with each other. If you have other ideas for how we can do that, please share them here or on our discussion list.

Facebook Graph API and Firefox Contacts

04-21-2010

I just finished a first cut at integrating the new Facebook Graph API with Firefox Contacts!

The integration had these steps:

  1. I set up a Facebook Application for Firefox Contacts. This is fairly standard Facebook Platform stuff, and nothing has changed here. This gave me an Application ID and an Application Secret (about which more below).
  2. I created Contacts Importer and Discoverer subclasses for Facebook. The Importer subclass gets the user's friend list into the Firefox people list, and the Discoverer subclass renders information about a person that can be retrieved through the Graph API.
  3. I implemented the OAuth 2.0 permission flow in Firefox. This involved a certain amount of Firefox-internals hacking, because the OAuth flow depends on "web redirects," which are requests from a web server to send the web client off to another page. I had to intercept one of these redirections inside my extension to grab an identifier from the URL, and then trigger additional network steps to complete the transaction.
    An aside: The OAuth 2.0 flow involves an "application secret", which makes sense if I'm proving that I am some web service. When I am acting as a user-agent, it doesn't make sense at all, and involves putting something called a "secret" into client-side code, which means that it's not a secret in any sense. Unifying the user's authentication context with the user-agent's authentication context (that is, letting the browser do what the user can do) would fix this.
  4. I then retrieve the user's friend list, and, when requested to do so by the People discovery system, retrieve an arbitrary user's profile data and render it!
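Step 3 comes down to two small operations. The first recognizes the redirect to the callback URL and pulls the authorization code out of its query string. The second builds the request that exchanges that code for an access token. The sketch below assumes a hypothetical callback URL, and assumes the token endpoint and parameter names Facebook documented for the Graph API at the time. Treat both as assumptions. Intercepting the redirect inside Firefox and making the actual network call are not shown.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical callback registered with the Facebook application; the extension watches for it.
REDIRECT_URI = "https://example.com/contacts-callback"
# Token endpoint as documented for the Graph API at the time (treat as an assumption).
TOKEN_ENDPOINT = "https://graph.facebook.com/oauth/access_token"


def code_from_redirect(url: str, redirect_uri: str = REDIRECT_URI) -> Optional[str]:
    """If `url` is the redirect back to our callback, pull out the authorization code."""
    parsed = urlparse(url)
    if f"{parsed.scheme}://{parsed.netloc}{parsed.path}" != redirect_uri:
        return None  # not our redirect; let the browser carry on
    codes = parse_qs(parsed.query).get("code")
    return codes[0] if codes else None


def token_request_url(code: str, client_id: str, client_secret: str,
                      redirect_uri: str = REDIRECT_URI) -> str:
    """Second leg of the flow: exchange the code (plus the app 'secret') for an access token."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "client_secret": client_secret,
        "code": code,
    })
    return f"{TOKEN_ENDPOINT}?{query}"
```

The `client_secret` argument is the crux of the aside above. Whatever value is passed here has to ship inside the extension's client-side code.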

All in all, it took about seven hours of steady hacking to get it all working. That's much faster than I've been able to do anything with the old Facebook Platform API. Congrats to Bret and the team for that!

Released yet?

I'll be releasing 0.3 tomorrow morning, with Facebook and Yahoo! support, as well as people-in-the-awesomebar and the much improved discovery experience.

Until then, sources are available at hg.mozilla.org/labs/people, as always!

Contacts in the Browser - 0.2 released

03-31-2010

Firefox Contacts 0.2 has been released! This is mostly an infrastructure upgrade release; the database schema was changed to support a more dynamic way of interacting with contact services. I also added support for Portable Contacts, Webfinger and HCard import, which bring the "web of contacts" much closer to reality.

Contacts 0.2 - Tech Overview

Database

We made a big change to the database schema for this release. In 0.1, we combined data from multiple services into a single document and saved that into the database. While that made it simple to read, we had no ability to remove data from just one service, or to see where data had come from.

In 0.2, we save all the data from each service, with a service name label. These documents are flattened into a single record when the object is read, instead of when it is written. That means we can remove and refresh data on a per-service basis.

Old system:

{
  documents: {
    default: {
      displayName: "John Doe",
      name: {
        given: "John",
        family: "Doe"
      },
      emails: [
        {type: "work", value: "john@work.com"},
        {type: "home", value: "john@home.com"}
      ]
    }
  }
}

New system:

{
  documents: {
    gmail: {
      name: {
        given: "J",
        family: "Doe"
      },
      emails: [
        {type: "work", value: "john@work.com"}
      ]
    },
    native: {
      displayName: "John Doe",
      emails: [
        {type: "home", value: "john@home.com"}
      ]
    }
  }
}
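Flattening at read time, per-service disconnect and "where did this data come from?" all fall out of keeping one document per service. Here is a minimal Python sketch over the record shape shown above. The merge policy is an assumption, not necessarily what Contacts does: list fields are unioned in service order, and the first service to supply a scalar wins. The function names are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def flatten(record: Record) -> Record:
    """Merge the per-service documents at read time into one view of the person."""
    merged: Record = {}
    for doc in record["documents"].values():
        for key, value in doc.items():
            if isinstance(value, list):
                bucket = merged.setdefault(key, [])
                for item in value:
                    if item not in bucket:
                        bucket.append(item)
            else:
                merged.setdefault(key, value)  # assumption: first service to supply a scalar wins
    return merged


def sources(record: Record, key: str) -> List[str]:
    """'Where did this data come from?': services whose document contains `key`."""
    return [name for name, doc in record["documents"].items() if key in doc]


def disconnect(record: Record, service: str) -> None:
    """Per-service disconnect: drop only that service's document."""
    record["documents"].pop(service, None)


john = {"documents": {
    "gmail": {"name": {"given": "J", "family": "Doe"},
              "emails": [{"type": "work", "value": "john@work.com"}]},
    "native": {"displayName": "John Doe",
               "emails": [{"type": "home", "value": "john@home.com"}]},
}}
```

Because nothing is merged on write, disconnecting Gmail removes exactly the Gmail-supplied name and work address, and the next `flatten` reflects that.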

With this new capability in the database, we were able to add these new features:

  • Per service "refresh" and "disconnect"
  • In the Contact View pane, a "where did this data come from?" view
  • A per-service refresh timestamp

Importers

Two new importers were finished for this release: LinkedIn and Plaxo.

The LinkedIn importer is a bit complicated, since LinkedIn puts a CAPTCHA in front of contact downloads. We spent some time trying to fit the CAPTCHA flow into the Contacts user experience but didn't get it working. Instead, we take advantage of the fact that LinkedIn uses a server-side validity timer for the CAPTCHA: the user answers the question on LinkedIn, then runs the import again. Messy and not very satisfying, but it does work.

Plaxo is a technically interesting case because Plaxo is a pure Portable Contacts provider. We implemented a Generic Portable Contacts import module, and the Plaxo module is a trivial subclass of that.
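"A trivial subclass" can be made concrete with a small sketch. It assumes the Portable Contacts convention of addressing the authenticated user's contacts at `<base>/@me/@all`, with the contacts returned in the JSON response's `entry` array. Authentication and fetching are omitted. The class names and the base URL are hypothetical; the URL is a placeholder, not Plaxo's real endpoint.

```python
import json
from typing import Any, Dict, List


class PortableContactsImporter:
    """Generic Portable Contacts importer: everything a PoCo provider needs (auth omitted)."""

    name = "portablecontacts"
    base_url = ""  # set by subclasses

    def contacts_url(self) -> str:
        # PoCo addresses the authenticated user's full contact list as <base>/@me/@all.
        return self.base_url.rstrip("/") + "/@me/@all"

    def parse(self, body: str) -> List[Dict[str, Any]]:
        # A multi-contact PoCo response carries the contacts in its "entry" array.
        return json.loads(body).get("entry", [])


class PlaxoImporter(PortableContactsImporter):
    """The provider-specific subclass is just configuration."""

    name = "plaxo"
    base_url = "https://poco.example.com/contacts"  # hypothetical placeholder endpoint
```

The entries come back already in PoCo form, which is also the format of the JSON blob, so no field mapping is needed.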

We fixed some international character issues in the Mac Native address book importer by going to UTF-16 everywhere.

Discoverers

The Discoverer system was completed for this release. A discoverer has a simple API that takes a person and returns a record of new data about that person. We wrote a number of discoverers to demo the system:

  • Webfinger: The webfinger protocol allows for lookup of a service directory for a user from an email address. We follow the procedure and create one or more "url" records, annotating each with the "rel" of the link.
  • HCard: The HCard specification is a microformat to embed contact information in a webpage. We use the Firefox built-in implementation of HCard to parse the entire page and grab the embedded data -- and, FYI, we made a decision to only grab links that have a "rel" of "me". To decide which links to examine for HCard data, the discoverer scans the "urls" list for links with a "rel" of http://microformats.org/profile/hcard and loads those.
  • Yelp, Amazon, Flickr, and Gravatar: These are all simple search-by-email lookups. In each case, the site exposes a proprietary search URL, which we query with each of the contact's email addresses.
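The HCard discoverer's two filters can be sketched in plain Python. The first picks candidate pages from the "urls" list by their hCard profile rel. The second keeps only links whose `rel` includes "me". This is not Firefox's built-in microformat parser. It uses the standard library's `html.parser` and only covers the rel="me" part, not full hCard parsing. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List

HCARD_PROFILE = "http://microformats.org/profile/hcard"


def hcard_candidates(urls: List[Dict[str, str]]) -> List[str]:
    """Pick the entries from a person's urls list that advertise the hCard profile rel."""
    return [u["value"] for u in urls if u.get("rel") == HCARD_PROFILE]


class RelMeParser(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel tokens include "me"."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()  # rel is a space-separated token list
        if "me" in rels and attrs.get("href"):
            self.links.append(attrs["href"])


def rel_me_links(html: str) -> List[str]:
    parser = RelMeParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Splitting `rel` into tokens matters because a link like `rel="me nofollow"` should still count.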

Contacts View

Relatively simple changes to the contacts view in this release. The most interesting new bit is the "Where did this data come from?" view, which just exposes the data structure we create in the database schema change. Also, lots of fiddly CSS changes to get the vertical heights right everywhere.

Email Autocomplete

Other than changes to support the new database, the only change in the auto-completion module was support for more than one email address per contact, which was a much-requested feature.

Contacts in the Browser

03-19-2010

I'm happy to announce the launch of the Contacts in the Browser project! You can read about it at the Mozilla Labs blog. I've been working on the extension for the last couple months and am glad to get it out in public where people can play with it.

I'll use this space to talk a bit more about the technical underpinnings of the extension.

In essence, Contacts is a local database with some specialized logic to handle duplicate detection, an API for importing records, a browser overlay for form autocompletion, and a security-limited API to query records from web content. I'll take each piece in turn.

The database

The internal database is a Javascript wrapper on the Firefox Storage Service. The Firefox service, in turn, wraps a SQLite embedded database library. We program the database using standard SQL, so we have your standard model of tables, indexes, and queries in there.

Our data model is effectively schema-less. Each person is represented as a GUID and a JSON blob. The people table is declared like this:

CREATE TABLE people (id INTEGER PRIMARY KEY, guid TEXT UNIQUE NOT NULL, json TEXT NOT NULL);

We then have a data-driven scheme to create index tables from the JSON representation, so we have some additional tables and indexes that look like this:

CREATE TABLE displayName (id INTEGER PRIMARY KEY, person_id INTEGER NOT NULL, val TEXT NOT NULL COLLATE NOCASE);
CREATE INDEX displayName_person_id ON displayName (person_id);
CREATE INDEX displayName_val ON displayName (val);

It's the responsibility of the application logic to update the index tables, but we have some helper methods for that. Right now we're indexing on displayName, givenName, familyName, and emails. Callers (who must have chrome-level privileges; i.e. be an extension, not web content) can insert records with the add() method, or update records with update().
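Here is a sketch of the GUID-plus-JSON table with data-driven index tables, using Python's built-in `sqlite3` rather than the Firefox Storage Service. The table layout follows the SQL above. The helper names `open_db`, `index_values`, `add` and `find` are hypothetical. Indexing only `displayName` and `emails` keeps the example short. Committing is left to the caller, since batching commits is the point of the next paragraph.

```python
import json
import sqlite3
import uuid
from typing import Any, Dict, List

# Fields that get their own index table (the post lists displayName, givenName, familyName, emails).
INDEX_FIELDS = ("displayName", "emails")


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, guid TEXT UNIQUE NOT NULL, json TEXT NOT NULL)")
    for field in INDEX_FIELDS:  # table names come from the constant tuple above, not user input
        conn.execute(f"CREATE TABLE {field} (id INTEGER PRIMARY KEY, person_id INTEGER NOT NULL,"
                     " val TEXT NOT NULL COLLATE NOCASE)")
        conn.execute(f"CREATE INDEX {field}_person_id ON {field} (person_id)")
        conn.execute(f"CREATE INDEX {field}_val ON {field} (val)")
    return conn


def index_values(person: Dict[str, Any], field: str) -> List[str]:
    """Scalar -> [value]; PoCo-style list of {"value": ...} -> list of values."""
    value = person.get(field)
    if value is None:
        return []
    if isinstance(value, list):
        return [v["value"] if isinstance(v, dict) else v for v in value]
    return [value]


def add(conn: sqlite3.Connection, person: Dict[str, Any]) -> str:
    """Store the JSON blob, then keep every index table in step with it."""
    guid = str(uuid.uuid4())
    person_id = conn.execute("INSERT INTO people (guid, json) VALUES (?, ?)",
                             (guid, json.dumps(person))).lastrowid
    for field in INDEX_FIELDS:
        conn.executemany(f"INSERT INTO {field} (person_id, val) VALUES (?, ?)",
                         [(person_id, v) for v in index_values(person, field)])
    return guid


def find(conn: sqlite3.Connection, field: str, value: str) -> List[Dict[str, Any]]:
    if field not in INDEX_FIELDS:
        raise ValueError(f"{field} is not indexed")
    rows = conn.execute(f"SELECT p.json FROM people p JOIN {field} i ON i.person_id = p.id"
                        " WHERE i.val = ?", (value,))
    return [json.loads(r[0]) for r in rows]
```

Because the `val` columns are declared `COLLATE NOCASE`, lookups such as `find(conn, "displayName", "john doe")` match regardless of case.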

Working with sqlite is sometimes great and sometimes a bit of extra work. We learned early on that we had to be careful with our transactions, because committing a transaction on a laptop hard drive can take 10 milliseconds or more. In an early version, we were opening a transaction for each new person record, so importing an address book with 1000 contacts was taking upwards of 20 seconds. We eventually combined the import into a single transaction, which cut the runtime by about 100x.
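The transaction lesson can be shown with Python's `sqlite3` module. The sketch contrasts the slow pattern (one BEGIN/COMMIT per contact) with a single transaction around the whole import. It opens the connection with `isolation_level=None` so that the explicit BEGIN/COMMIT statements are the only transactions. The function names are hypothetical. The single-transaction version also rolls back as a unit if any row fails.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[str, str]  # (guid, json blob)


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions, so BEGIN/COMMIT below are exactly what runs.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS people"
                 " (id INTEGER PRIMARY KEY, guid TEXT UNIQUE NOT NULL, json TEXT NOT NULL)")
    return conn


def import_one_transaction_per_record(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """The slow pattern: one commit (and typically one disk sync) per contact."""
    for row in rows:
        conn.execute("BEGIN")
        conn.execute("INSERT INTO people (guid, json) VALUES (?, ?)", row)
        conn.execute("COMMIT")


def import_single_transaction(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """The fast pattern: all inserts share one transaction, so the commit cost is paid once."""
    conn.execute("BEGIN")
    try:
        conn.executemany("INSERT INTO people (guid, json) VALUES (?, ?)", rows)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # all-or-nothing: a failed import leaves no partial data
        raise
```

To see the difference, point `open_db` at a file on disk and time both functions over a thousand rows with `time.perf_counter()`. The gap depends on the drive, so no figures are claimed here.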

The most interesting feature of a contact database isn't bulk import, though -- it's de-duplication. It is very common for users to have many repeated copies of contact data scattered all over their computers and the web. We would like Contacts to ultimately help with this problem, which means we need to be able to merge and combine data from multiple sources.

Our current implementation has a pretty trivial union algorithm. It simply compares the email addresses and full displayName of each person against the whole list, and merges the records if it finds a match. This has several problems:

  1. It could miss two records for the same person that happen not to have a value in common, either because they are data islands or because of trivial differences in their displayName (e.g. Bob and Robert)
  2. It could combine two people that happen to share a data value (e.g. a couple that uses the same e-mail address)
  3. By merging the records, it loses the attribution of data back to its source. That makes it harder to implement refresh and to explain to the user what happened.
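Here is a deliberately naive sketch of the union rule described above. It merges two records if they share any email address or have an identical displayName. It simplifies each person to a displayName and a list of email strings (PoCo actually stores `{type, value}` objects), and the function names are hypothetical. The sample data reproduces problems 1 to 3 from the list.

```python
from typing import Any, Dict, List

Person = Dict[str, Any]  # {"displayName": str, "emails": [str, ...]} in this sketch


def same_person(a: Person, b: Person) -> bool:
    """Match on any shared email address, or an identical displayName."""
    emails_a = {e.lower() for e in a.get("emails", [])}
    emails_b = {e.lower() for e in b.get("emails", [])}
    if emails_a & emails_b:
        return True
    return bool(a.get("displayName")) and a.get("displayName") == b.get("displayName")


def merge_into(target: Person, other: Person) -> None:
    """Union the emails; keep the target's displayName. The source of each value is not recorded."""
    known = {e.lower() for e in target["emails"]}
    for email in other.get("emails", []):
        if email.lower() not in known:
            target["emails"].append(email)
            known.add(email.lower())
    if not target.get("displayName") and other.get("displayName"):
        target["displayName"] = other["displayName"]


def dedupe(people: List[Person]) -> List[Person]:
    merged: List[Person] = []
    for person in people:
        for existing in merged:
            if same_person(existing, person):
                merge_into(existing, person)
                break
        else:
            merged.append({**person, "emails": list(person.get("emails", []))})
    return merged


sample = [
    {"displayName": "Bob Smith", "emails": ["bob@work.com"]},       # problem 1: stays apart from...
    {"displayName": "Robert Smith", "emails": ["rob@home.com"]},    # ...this record
    {"displayName": "Alice Jones", "emails": ["family@home.com"]},  # problem 2: merged with Carl
    {"displayName": "Carl Jones", "emails": ["family@home.com"]},   # problem 3: Carl's name is lost
]
```

`dedupe(sample)` yields three records. Bob and Robert stay separate, while Alice and Carl collapse into a single "Alice Jones" with no trace that Carl ever existed. It is also a single pass, so a later record that links two already-separate entries will not join them.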

About the JSON blob

We needed to pick a representation format for the user data that we put into our JSON blob. We settled on Portable Contacts. We have a bunch of thoughts about how to construct a representation system that allows multiple schemas to co-exist, but we needed to pick one to start with, and PoCo (as it is known by its fans) hits all the right points of openness, adoption, and extensibility.

That said, our current use of PoCo is probably not quite right. We have a notion of multiple documents inside the JSON blob, but we don't have a clean mapping for which service provided what. The next release of Contacts will contain a revision to the schema to handle service attribution better. (Current work is on the Mozilla wiki.)

The Importer system

The second major piece of Contacts is the Importer system. A generic ImporterBackend object is provided as the parent class for implementations of an Importer, which is registered with the PeopleImporterSvc service.

The contract of Importer.import() is pretty simple: it takes a progress function and a completion callback, and does whatever is necessary to get some contacts into the database. Importers are encouraged to call People.add() only once, with all of their records, since it runs much faster that way. They should also provide feedback to the user through the callback functions. In our implementation, the messages passed through those functions are rendered into the Contact Manager user interface.
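Here is a Python analogue of that contract, not the extension's JavaScript. A base class drives the progress and completion callbacks and hands the whole batch to a single `add()` call, and subclasses only supply `fetch()`. The method is named `import_contacts` because `import` is a Python keyword. `StaticImporter` and `RecordingPeople` are hypothetical stand-ins for a real importer and the People service, and the `add(records)` signature is an assumption.

```python
from typing import Callable, Dict, List, Optional

Progress = Callable[[str], None]
Completion = Callable[[Optional[Exception]], None]


class ImporterBackend:
    """Python analogue of the base class: subclasses only supply fetch()."""

    name = "base"

    def fetch(self) -> List[Dict]:
        raise NotImplementedError

    def import_contacts(self, people, progress: Progress, completion: Completion) -> None:
        try:
            progress(f"Contacting {self.name}...")
            records = self.fetch()
            progress(f"Saving {len(records)} contacts...")
            people.add(records)  # one call for the whole batch, so one transaction
            completion(None)
        except Exception as err:  # report failures through the same channel the UI listens on
            completion(err)


class StaticImporter(ImporterBackend):
    """Hypothetical importer that 'fetches' a fixed list, standing in for Gmail, Twitter, etc."""

    name = "static"

    def __init__(self, records: List[Dict]) -> None:
        self.records = records

    def fetch(self) -> List[Dict]:
        return list(self.records)


class RecordingPeople:
    """Test double for the People service that remembers each add() call."""

    def __init__(self) -> None:
        self.calls: List[List[Dict]] = []

    def add(self, records: List[Dict]) -> None:
        self.calls.append(records)
```

Keeping the callback plumbing in the base class means every importer reports progress the same way, and the Contact Manager only has to render strings and a final status.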

We did a couple importer implementations to get a feel for how they would work.

  • The Gmail importer was quite easy, because Gmail provides a simple POST interface to retrieve contact book data. We made a bad decision, in hindsight, to use the VCard export mode instead of the PoCo one. That will probably change in the next release.
  • The Twitter importer is interesting because it uses the Twitter API instead of going through the user-facing website. That required us to integrate with the Password Manager to get the user's username and password. We should also provide a form for the user to provide their username and password interactively if they would prefer not to save the credentials in the Password Manager. The Twitter API doesn't give us the user's email address, so it's a case where the person de-duplication logic is important (and, in its current form, usually insufficient).
  • The last one we did was a native (OS-level) address book. The MacOS address book API is well-documented and has been stable since MacOS 10.2, so we picked that one and ran with it. The Windows API is a bit more complicated, since there were some changes in the Vista timeframe, and we'll get to it later. For this one, we had to write some native C code and compile it into a binary extension library.

We also did a Gravatar importer. Since we wrote this one, I've come to think that this is actually our first instance of a new object, which I'm calling a Discoverer. This is an object that, given a person or a piece of personal data, retrieves some other chunk of personal data. In the case of Gravatar, we examine the list of email addresses to determine if any of them are associated with a Gravatar, and, if so, we return the image URL of the Gravatar to the People service.
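Here is a sketch of that Gravatar check as a discoverer. It uses Gravatar's classic addressing scheme: the MD5 hex digest of the trimmed, lower-cased address, with `d=404` requesting a 404 instead of a default image when no avatar exists. Check Gravatar's current documentation for other accepted hashes. The existence check is injected as a function so the logic runs offline, and in practice it might be an HTTP request looking for a 200. The function names and the `{"photos": [...]}` output shape are assumptions.

```python
import hashlib
from typing import Callable, List, Optional


def gravatar_url(email: str, size: int = 80) -> str:
    """Gravatar image URL: MD5 of the trimmed, lower-cased address.
    d=404 asks Gravatar for a 404 instead of a default image when no avatar exists."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"https://www.gravatar.com/avatar/{digest}?s={size}&d=404"


def discover_gravatar(emails: List[str], exists: Callable[[str], bool]) -> Optional[dict]:
    """Return a photos record for the first address that has a Gravatar, else None.
    `exists` is injected so the logic stays testable offline."""
    for email in emails:
        url = gravatar_url(email)
        if exists(url):
            return {"photos": [{"value": url}]}
    return None
```

Normalizing the address before hashing is what lets "Bob@Example.com " and "bob@example.com" find the same avatar.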

Discoverers will perform their discover() method on a person and return one or more records containing new data about that person, which can then be merged into the person record. We can do this automatically on all contacts, or interactively when we view a single contact.

There are a lot of interesting possibilities for discovery. We can do service-specific discovery, like Gravatar and Flickr, or generic discovery protocols like WebFinger and the Google Social Graph API. Because the discovery object can run in the user's web context, it can be used for search into restricted social networks, such as to discover a Facebook page for a contact. Coming soon.

The form autocompletion overlay

The extension includes the PeopleAutoCompleteSearch object, which implements the Firefox Autocomplete interface to provide form autocompletion.

One limitation of that interface is that we only get access to the form name. We don't get the type or the rel attribute, so we can't detect email fields in every case. But, since those fields are frequently named "email", "e-mail", "recipient", or "recipients", we look for those names, and pop up the autocompletion if we find a match.
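The name-based heuristic is small enough to show directly. The four names come from the paragraph above. Matching case-insensitively on the whole name, after trimming whitespace, is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Optional

# Field names the post says the overlay looks for.
EMAIL_FIELD_NAMES = {"email", "e-mail", "recipient", "recipients"}


def is_email_field(field_name: Optional[str]) -> bool:
    """Guess whether a form field holds email addresses, using only its name."""
    if not field_name:
        return False
    return field_name.strip().lower() in EMAIL_FIELD_NAMES
```

An exact match misses names like "user_email". A substring match would catch those, but it would also fire on unrelated fields that merely contain "email" in their names.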

The Content API

The last piece of the extension is a Javascript API. We dynamically extend the navigator Javascript object by using a XUL injector technique, which is an advanced bit of Firefox hackery. What we do is watch for a state that indicates that the page is loaded, and inject a new function into the page before the rest of the page runs. The function is created dynamically and returned as a closure, so we can restrict the scope of access to internal data.

What this means is that when the page calls navigator.people.find(), we can check with the Permission Manager to see whether the user has granted privileges to the Contacts system, and then check the internal Contacts database to see if field-level permissions have been saved. If the permissions are all there, we run the query and return. If they're not, we can pop up a XUL-based modal dialog that puts the user through a permission flow to grant (or deny) access to the contact database.
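The shape of that gated query can be sketched in Python. A factory captures the database and the permission checks in a closure, and returns only a `find` function that refuses unapproved sites and trims results to the permitted fields. All names here are hypothetical, and the permission dialog is replaced by an exception. In JavaScript, a closure keeps its captured data out of reach of page scripts. Python closures can be introspected (via `__closure__`), so this mirrors the structure of the design, not its security property.

```python
from typing import Any, Callable, Dict, List, Set

Person = Dict[str, Any]


def make_find(db: List[Person],
              site_allowed: Callable[[str], bool],
              allowed_fields: Callable[[str], Set[str]]) -> Callable[[str, Dict[str, Any]], List[Person]]:
    """Build the find() handed to page content; db and the checks are captured, not exposed as fields."""

    def find(origin: str, query: Dict[str, Any]) -> List[Person]:
        if not site_allowed(origin):
            # The extension would show its permission dialog here instead of failing outright.
            raise PermissionError(f"{origin} has not been granted access to contacts")
        fields = allowed_fields(origin)
        if not set(query) <= fields:
            raise PermissionError("query uses fields the site may not see")
        matches = (p for p in db if all(p.get(k) == v for k, v in query.items()))
        # Return copies trimmed to the permitted fields only.
        return [{k: v for k, v in p.items() if k in fields} for p in matches]

    return find
```

Checking the query's own fields matters as much as trimming the results. Without that check, a site could learn a hidden value (say, a phone number) just by asking whether any contact matches it.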

What's Next?

I think there is a huge range of exciting applications enabled by getting people into the browser.

Address book functionality is only the beginning, though it is an exciting step. I imagine email, phone, and physical address auto-completion and hotlinking everywhere. I want websites to stop asking for my credentials, or even an OAuth ticket, at other sites, simply to get access to my friend list.

There are lots of basic address book capabilities that still need doing: groups, multiple values for image, disambiguation based on common nicknames, per-service refresh (with timestamps to keep track of how stale data is), and bulk edits.

In a future release, I will be adding support for hashed email access. In nearly every social networking setup task, there is no need for me to disclose my peers' real email addresses -- instead, I just need to disclose a stable, unique token for them. If a site could simply retrieve the hash of every email address in my friends group, they could discover all the people that have an account on the website, without actually knowing their names or addresses.
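Here is a sketch of how hashed-email matching could work. The browser sends only tokens, and the site hashes its own members' addresses and intersects the two sets. The post doesn't name a hash function. SHA-256 over the trimmed, lower-cased address is an assumption of this sketch, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable, List, Set


def email_token(email: str) -> str:
    """Stable token for an address: SHA-256 of the trimmed, lower-cased email (hash choice assumed)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def tokens_for_friends(friend_emails: Iterable[str]) -> List[str]:
    """Browser side: what the site receives instead of real addresses."""
    return [email_token(e) for e in friend_emails]


def members_among(friend_tokens: Iterable[str], member_emails: Iterable[str]) -> Set[str]:
    """Site side: hash its own members' addresses and intersect with the tokens it was given."""
    return set(friend_tokens) & {email_token(e) for e in member_emails}
```

Note that an unsalted hash only hides addresses the site doesn't already know or can't guess. That is exactly why matching against existing members works, and also why it reveals nothing about non-members beyond their count.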

The discovery system has some exciting potential applications: by creating a user interface for web-based people discovery systems, we can help decrease the isolation of personal data into islands scattered across the web.

Much work remains to be done! Thanks to everybody who's provided positive feedback on the initial release.
