Storage Talk

June 29, 2004

Unfortunately my ankle was fractured pretty badly and it was important I have
surgery on Wednesday. Unfortunately this precluded my flying to Norway for GUADEC
on Saturday. I actually proposed to my orthopaedic surgeon that I fly to Norway on Saturday.
He gave me a look that was darker than oil at midnight, and went back to what he was doing
without saying anything. Some people tell me I should have interpreted this as a “sounds ok”.
However, he later said some things about our goal being to “reduce the chance of having
arthritis in the ankle for the rest of your life”. That scared me into behaving.

There’s a more formalish storage paper for the occasion here. But honestly, I think the speaking notes are
more informative for getting at the soul of the material. In my experience that’s often true
of talks vs accompanying papers. So I’m including my speaking notes here. I blame oxycontin for any incoherent bits. They’re a little random but I hope you press through
because some of the good stuff is near the middle/end ;-). Maybe I’ll do sketches on whiteboards for all the places I was going to do live sketches and take pictures, but for now the notes are all booooring woooords. Unfortunately in many cases the sketches are the meat of the thing, but I think you can get some idea what I’m talking about from the text. I’ve fleshed it out past the notes in some places where it was totally incomprehensible:

  • Storage is designed to support a
    more general user experience than just “find files more easily”.
    Storage isn’t a silver bullet, but it can serve as a toolkit for
    making new user experiences easier to extend across the desktop. In
    the process it helps dissolve the application/desktop boundary a
    little.

The Experience

  1. Intro:
    Related to many existing systems

    1. Wiki –
      anybody can edit or work with information. Information is not super
      formal to start with, but can become “formalized”. Unlike a wiki,
      allow for rich in-place editing and better tie-in to the OS for
      noticing changes and tracking “change threads” (which are
      themselves often communication).

    2. Whiteboard –
      support quick informal live collaborations. Don’t force things into
      a particular “format” or medium but allow people to mix it up.
      Share a space with lots of presence information, etc. Also envision
      this working when people are in the same place.

    3. Groupware –
      handle objects that people need to deal with to get their job done.
      People, teams, projects, tasks, deadlines. These are more central
      to knowledge workers than even documents. Like groupware, track
      threads of communication, but don’t tie people down to text
      messages. Let them respond with people, projects, tasks, etc.
      Rather than “posting to lists” you just append items to a topic
      in the (or a) central store.

    4. Bugzilla –
      tasks, and schedules, process, status, owner, etc. Track more
      interesting metadata in a way that people can shape to their
      organization.

  2. Build “objects people care
    about”

    1. This is more about what gets
      built on top of Storage, but it’s a major part of the overall
      experience. The file manager (atop the filesystem) is about
      managing formal documents and folders to group documents in large
      concrete chunks. The <some name here> (atop storage) should
      focus on objects that fill people’s daily lives.

    2. People, Projects, Teams, Tasks,
      Messages, Topics, Discussions, Managers, Proposals, etc, etc, etc
      (and yes, Documents too) are objects people care about. Many others
      that are specific to particular industries and job roles. Some of
      these objects currently live in specialized applications like
      evolution, and most of these will still be handled primarily
      through a specialized interface. <sketch the two specialized
      interfaces>.

    3. It’s usually a good idea to have
      specialized tools for targeting specific use cases.

    4. OTOH, although we work on text
      documents mostly in the office suite, we still expose common
      operations to the base OS (the filemanager mostly in this case).
      How can we extend the set of useful things that can be done with
      information across the information boundary? In a less generic
      sense, can we build support for the objects people deal with on
      a day to day basis more deeply into the OS? It doesn’t have to
      be done by a universal component system, but base libraries like
      storage can make it easier to support the important “one off”
      optimizations in the base OS (such as for projects).

  3. Support informal work

    1. Most office applications are
      focused on producing deliverables: formal documents. But
      deliverables are the exception. Most knowledge workers spend most
      of their time processing, sharing, and extending information not
      producing deliverables. We want to build interfaces that allow for
      some degree of information soup. <sketch the process flow for
      organizing SubsByTheInch2005>

    2. Informal work can eventually turn
      into formal deliverables. Make this process as convenient as
      possible.

  4. Information is information, don’t
    force large chunks

    1. We currently have odd
      granularities of information. “Files” in the case of “formal
      documents” (but since we don’t have informal constructs, many
      things are pushed into this).

  5. Access items within large bodies
    of information

    1. The storage “research-y”
      solution to this is object reference using human language phrases.

    2. This aspect of storage still
      interests me, and has been where most of the work has gone until
      now… but it is more researchy because it may turn out to be
      technically infeasible (jury is still out ;-). As such, other
      parts of storage are not predicated on it.

  6. Provide the components for
    collaboration

    1. If storage is the physics, social
      interaction is the chemistry. Storage needs to provide some very
      basic structures that will give rise (when people, environments,
      tasks, etc. are thrown into the mix) to social interactions. Rather
      than trying to control things rigidly, as traditional computer
      environments have done, we allow social behaviors to regulate
      things more (as things normally work outside the computer world).

    2. Presence information is the
      substrate for coordinating social interactions. Who is where and
      doing what is the most relevant context for social interactions.

    3. Access by multiple
      threads/computers/people. Rather than “versioning” documents
      and the associated problems (e.g. merging is a nearly insoluble
      UI problem) we allow “live” (or at least effectively live) access
      to documents.

    4. Fine granularity. If we have
      access from multiple places, the temptation is to use locking of
      “documents”. Even inside formal documents, however, this will
      greatly limit collaborative ability. If we have rich fine grained
      presence information, combined with very fine grained data access,
      we can provide the ability to socially manage interactions rather
      than requiring “forced” lockouts.

  7. Track information flow

    1. E-mail showed the importance of
      threads of communication between people. An e-mail
      thread morphs into a task (like a bug), which morphs into a few
      more tasks (which might have discussions associated with them),
      which turns into a full fledged project with an associated team,
      which eventually produces a policy document. All this stays tied
      together. <show interface idea>
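
One way to picture "all this stays tied together" is as items that remember what they grew out of. The Python sketch below is only an illustration under that assumption: the `Item` class, its `derived_from` link, and the `lineage` helper are invented here and are not part of Storage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical item that records which item it grew out of."""
    kind: str
    title: str
    derived_from: Optional["Item"] = None


def lineage(item: Item) -> list[str]:
    """Walk back from an item to the start of its thread, oldest first."""
    chain = []
    current = item
    while current is not None:
        chain.append(f"{current.kind}: {current.title}")
        current = current.derived_from
    return list(reversed(chain))


# An e-mail thread morphs into a task, then a project, then a policy document.
mail = Item("e-mail thread", "Printer keeps jamming")
task = Item("task", "Fix the printer", derived_from=mail)
project = Item("project", "Replace office printers", derived_from=task)
policy = Item("document", "Printer purchasing policy", derived_from=project)

for step in lineage(policy):
    print(step)
```

Starting from the policy document, the loop prints the whole chain back to the original e-mail thread. A real store would presumably allow several parents and children per item; a single link keeps the sketch short.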

A Brief History (aka excuse):

  • Storage was initially
    implemented as project Gargamel by a team of Stanford CS (and one
    EE, and yours truly) students as a senior project: Brian Quistorf,
    James Farwell, Khalil Bey, and Josh Radel. It gets to a nice demo-able
    point before they graduate.

  • It gets even more
    finished as I work on it after graduation while not looking for
    work. Web page is written, screenshots made, etc.

  • I foolishly decide to
    rewrite the NL parser (and lose the old CVS history when importing
    to cvs.gnome.org). I get sidetracked writing the NL parser.

  • Slashdot etc hit.
    Lots of developer interest, but I’m snowed for other reasons and
    don’t successfully get development moving with other people. Plus I
    still have to finish the NL rewrite before things will function
    again.

  • The summer is
    completely crazy, and I stop working on Storage for 8 months.

  • Today: NL rewrite is
    now done. It’s a much stronger foundation, but the semantic grammar
    is still small. However, even with the small grammar it can do very
    sophisticated (correct) interpretations of phrases like “songs
    that aren’t by ‘John Lennon’ but have the word ‘love’ in them”.
    This would be very difficult to parse with a traditional “naive”
    scavenging search interpretation. Marco is also contributing to
    Storage, as well as some other Epiphany dudes. Things are starting
    to pick up, and I’m determined to not kill storage by bottlenecking
    again. I’m looking for a “project manager”.

What’s there today:

Non-NL

  • storage-store
    – manages the postgresql server, handles notification

  • libstorage
    – GObject interface to store items

  • libstorage-translators
    – serializes / deserializes data streams from / to storage items

  • GnomeVFS module
    – automatically invokes translators on
    read/write into the store allowing existing GNOME apps to use the
    store like a normal filesystem
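
To make the translator idea a bit more concrete, here is a small Python sketch of a translator pair. It is purely illustrative: the `Item` class, the `text_to_item` and `item_to_text` functions, and the attribute names are hypothetical and are not the libstorage-translators API. The point is only that a stream written by an application is deserialized into a store item with attributes, and serialized back into a stream on read.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a store item: a type plus attributes."""
    item_type: str
    attributes: dict = field(default_factory=dict)


def text_to_item(stream: bytes) -> Item:
    """Deserialize a plain-text stream into an item (the write path)."""
    text = stream.decode("utf-8")
    lines = text.splitlines()
    return Item(
        item_type="text-document",
        attributes={
            "title": lines[0] if lines else "",  # first line doubles as a title
            "body": text,
            "word_count": len(text.split()),
        },
    )


def item_to_text(item: Item) -> bytes:
    """Serialize an item back into the stream an application expects (the read path)."""
    return item.attributes["body"].encode("utf-8")


original = "Shopping list\nmilk\neggs\n".encode("utf-8")
item = text_to_item(original)
round_trip = item_to_text(item)
print(item.attributes["title"], item.attributes["word_count"], round_trip == original)
```

Because the item keeps the original body, an existing application reading through the VFS layer would get back exactly what it wrote, while the extra attributes become available for searching.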

NL

  • PET
    – parses sentences into Head-driven Phrase Structure Grammar (HPSG)
    trees, by Dr. Ulrich Callmeier.

  • libmrs
    – interface to the Minimal Recursion ‘Semantics’ information in
    the HPSG tree

  • libmrs-converters
    – translates MRS into a more meaningful XML statement using a
    client chosen semantic grammar

  • libstorage-nl
    – translates using storage-specific semantic grammar into the
    intermediate XML form, and then to an SQL query
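
As a rough illustration of that last step, here is a minimal Python sketch of turning an intermediate query form into SQL, using the “songs that aren’t by ‘John Lennon’ but have the word ‘love’ in them” phrase from earlier in these notes. Everything in it is an assumption made for illustration: the `Condition` class stands in for the intermediate XML form, the `songs` table with `artist` and `title` columns is a made-up schema, “in them” is read as “in the title”, and SQLite is used instead of PostgreSQL so the example runs on its own. It is not how libstorage-nl actually does it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Condition:
    """One constraint from the (hypothetical) intermediate form."""
    column: str           # attribute being tested, e.g. "artist"
    op: str               # "equals" or "contains"
    value: str
    negated: bool = False


# Column names come from this whitelist, never from user text.
ALLOWED_COLUMNS = {"artist", "title"}


def to_sql(conditions: list[Condition]) -> tuple[str, list[str]]:
    """Build a parameterized SELECT over the made-up songs table."""
    clauses, params = [], []
    for cond in conditions:
        if cond.column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {cond.column}")
        if cond.op == "equals":
            clause = f"{cond.column} = ?"
            params.append(cond.value)
        elif cond.op == "contains":
            clause = f"{cond.column} LIKE ?"
            params.append(f"%{cond.value}%")
        else:
            raise ValueError(f"unknown operator: {cond.op}")
        clauses.append(f"NOT ({clause})" if cond.negated else clause)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT title FROM songs WHERE {where} ORDER BY title", params


# What a parser might hand over for the example phrase.
conditions = [
    Condition("artist", "equals", "John Lennon", negated=True),
    Condition("title", "contains", "love"),
]
sql, params = to_sql(conditions)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE songs (title TEXT, artist TEXT)")
db.executemany("INSERT INTO songs VALUES (?, ?)", [
    ("Love Me Do", "The Beatles"),
    ("Love", "John Lennon"),
    ("Imagine", "John Lennon"),
    ("Yesterday", "The Beatles"),
])
matches = [row[0] for row in db.execute(sql, params)]
print(sql)
print(matches)
```

The negation is the part a keyword-matching search tends to get wrong: the words “John Lennon” appear in the phrase, but the structured form records that they must not match.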

What’s in the near future:

  • Currently libstorage,
    the VFS module, and some translators directly access the postgresql
    server. This is undesirable: it means permissions on a shared store
    would have to be enforced using a collection of SQL views, it means
    locking becomes very tricky, and it means that libstorage and other
    things link directly against postgresql libraries (though this could
    be addressed by gnome-db).

  • Support for NL
    searches in select non-English languages (probably Spanish first,
    but perhaps Japanese). Storage is built on a “language neutral
    framework”, but grammar engineering is a very
    difficult task. Some of the availability of NL searches will depend
    on what the linguistics community produces and distributes freely.

  • A nifty collaborative application to provide a test bed for the
    collaboration/locking framework. <sketch collaborative
    whiteboard/wiki design> (also shows informal work) Ideas? 😉

<demo NL search interface>

<show NL slides and explain basic NL
process>