NDB Entities and Keys

Objects in the Datastore are known as entities. Each entity is an instance of ndb.Model or, more likely, of an application-defined subclass of ndb.Model. Each entity is identified by a key that is unique within the application's Datastore. (If you're new to App Engine but not to web application development in general, defining a model class is similar to defining a schema in SQL, except that there is no need to issue CREATE TABLE commands.)

Overview

An entity model class defines one or more properties possessed by entities of that class. For example:

from google.appengine.ext import ndb

class Account(ndb.Model):
  username = ndb.StringProperty()
  userid = ndb.IntegerProperty()
  email = ndb.StringProperty()

Each entity is identified by a key, unique within the application's Datastore. In its simplest form, a key consists of a kind and an identifier. The kind is normally the name of the model class to which the entity belongs ("Account" in the example above), but can be changed to some other string by overriding the class method _get_kind(). The identifier may be either a key name string assigned by the application or an integer numeric ID generated automatically by the Datastore.

An entity's key can designate another key as its parent. As a shorthand for "an entity's key's parent", people usually say "an entity's parent"; depending on context, they might mean the parent key itself or the entity that has that key. An entity without a parent is a root entity. An entity's parent, parent's parent, and so on recursively are its ancestors. The entities in the Datastore thus form a hierarchical key space similar to the hierarchical directory structure of a file system. The sequence of entities beginning with a root entity and proceeding from parent to child, leading to a given entity, constitutes that entity's ancestor path.

The complete key identifying an entity thus consists of a sequence of kind-identifier pairs specifying its ancestor path and terminating with the pair for the entity itself. The constructor method for class Key accepts such a sequence of kinds and identifiers and returns an object representing the key for the corresponding entity. For example, a revision of a message that "belongs to" an owner might have a key that looks like this:

rev_key = ndb.Key('Account', 'Sandy', 'Message', 'greeting', 'Revision', '2')

Notice that the entity's kind is designated by the last kind-name in the list. (You might think that the string '2' is a strange key value: why not use a number? You can use numeric IDs, but it's a little tricky. If you want to know the details, please see Using Numeric Key IDs.)

For a root entity, the ancestor path is empty and the key consists solely of the entity's own kind and identifier:

sandy_key = ndb.Key('Account', 'Sandy')

Alternatively, you can use the model class object itself, rather than its name, to specify the entity's kind:

sandy_key = ndb.Key(Account, 'Sandy')

The class is converted to its name string in the actual key. You can also use the named parameter parent to designate any entity in the ancestor path directly. Thus, the following key specifications are all equivalent:

k1 = ndb.Key('Account', 'Sandy', 'Message', 'greetings', 'Revision', '2')
k2 = ndb.Key(Revision, '2', parent=ndb.Key('Account', 'Sandy', 'Message', 'greetings'))
k3 = ndb.Key(Revision, '2', parent=ndb.Key(Account, 'Sandy', Message, 'greetings'))
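The pairing of kinds and identifiers described above can be sketched in plain Python. This is only an illustration: SketchKey is a hypothetical stand-in written for this example, not ndb.Key, and it models nothing beyond the flat path and the parent relationship.

```python
class SketchKey(object):
    """Hypothetical stand-in for a key; NOT the real ndb.Key."""

    def __init__(self, *flat, **kwargs):
        parent = kwargs.get('parent')
        if not flat or len(flat) % 2 != 0:
            raise ValueError('expected one or more kind/identifier pairs')
        # Group the flat argument list into (kind, identifier) pairs.
        pairs = tuple(zip(flat[0::2], flat[1::2]))
        # A parent key contributes the leading part of the ancestor path.
        self.pairs = (parent.pairs if parent is not None else ()) + pairs

    def kind(self):
        # The entity's own kind is the last kind in the path.
        return self.pairs[-1][0]

    def id(self):
        return self.pairs[-1][1]

    def parent(self):
        # A root key has no parent.
        if len(self.pairs) == 1:
            return None
        flat = [item for pair in self.pairs[:-1] for item in pair]
        return SketchKey(*flat)

    def __eq__(self, other):
        return isinstance(other, SketchKey) and self.pairs == other.pairs

    def __hash__(self):
        return hash(self.pairs)


k1 = SketchKey('Account', 'Sandy', 'Message', 'greetings', 'Revision', '2')
k2 = SketchKey('Revision', '2',
               parent=SketchKey('Account', 'Sandy', 'Message', 'greetings'))
```

In this sketch k1 and k2 compare equal because both reduce to the same sequence of pairs, which mirrors the equivalence claimed for the real keys above.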

Creating Entities

You create an entity by calling the constructor method for its model class. You can specify the entity's properties to the constructor with keyword arguments:

sandy = Account(username='Sandy',
                userid=123,
                email='sandy@gmail.com')

This creates an object in your program's main memory; it will be gone as soon as the process terminates. To store the object as a persistent entity in the Datastore, use the put() method. This returns a key for retrieving the entity from the Datastore later:

sandy_key = sandy.put()

Alternatively, instead of supplying the property values directly to the constructor, you can set them manually after creation:

sandy = Account()
sandy.username = 'Sandy'
sandy.userid = 123
sandy.email = 'sandy@gmail.com'

The convenience method populate() allows you to set several properties in one operation:

sandy = Account()
sandy.populate(username='Sandy',
               userid=123,
               email='sandy@gmail.com')

However you choose to set the entity's properties, the property types (in this case, StringProperty and IntegerProperty) enforce type checking. For example:

bad = Account(username='Sand', userid='not integer') # Raises an exception
sandy.username = 42 # Raises an exception

Retrieving Entities from Keys

Given an entity's key, you can retrieve the entity from the Datastore:

sandy = sandy_key.get()

The Key methods kind() and id() recover the entity's kind and identifier from the key:

kindString = rev_key.kind() # returns "Revision"
ident = rev_key.id() # returns "2"

The parent() method returns a key representing the parent entity:

greeting_key = rev_key.parent()

You can also use an entity's key to obtain an encoded string suitable for embedding in a URL:

urlString = rev_key.urlsafe()

This produces a result like agVoZWxsb3IPCxIHQWNjb3VudBiZiwIM which can later be used to reconstruct the key and retrieve the original entity:

rev_key = ndb.Key(urlsafe=urlString)
revision = rev_key.get()

Note: The URL-safe string looks cryptic, but it is not encrypted! It can easily be decoded to recover the original entity's kind and identifier:

key = ndb.Key(urlsafe=urlString)
kindString = key.kind()
ident = key.id()
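To see that no secret is involved, the sample string quoted above can be decoded with nothing but the standard library. The sketch below does not parse the serialized key; it only checks that the kind name is readable in the decoded bytes. The variable names are chosen for this example.

```python
import base64

# Sample URL-safe string quoted in the text above.
url_string = 'agVoZWxsb3IPCxIHQWNjb3VudBiZiwIM'

# Plain URL-safe Base64 decoding; no password or secret key is needed.
# (A string whose length is not a multiple of four would first need its
# '=' padding restored; this sample does not.)
raw = base64.urlsafe_b64decode(url_string.encode('ascii'))

# The kind name sits in the decoded bytes in clear text.
kind_is_visible = b'Account' in raw
print(kind_is_visible)
```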

If you use such URL-safe keys, don't use sensitive data such as email addresses as entity identifiers. (A possible solution would be to use the MD5 hash of the sensitive data as the identifier. This stops third parties, who can see the encoded keys, from using them to harvest email addresses, though it doesn't stop them from independently generating their own hash of a known email address and using it to check whether that address is present in the Datastore.)
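A minimal sketch of the hashing idea, using only the standard library. The helper name hashed_identifier and the normalization step (trimming whitespace and lowercasing) are assumptions made for this example; the text above only suggests using an MD5 hash as the identifier.

```python
import hashlib


def hashed_identifier(email):
    """Return the hex MD5 digest of a normalized email address.

    Hypothetical helper: the normalization is an assumption of this sketch.
    """
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode('utf-8')).hexdigest()


# The digest, not the address, would be used as the key name,
# e.g. ndb.Key('Account', key_name).
key_name = hashed_identifier('sandy@gmail.com')

# The limitation noted above: anyone who already knows an address
# can compute the same digest and probe for it.
guess = hashed_identifier(' Sandy@Gmail.com ')
same = (guess == key_name)
```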

Updating Entities

To update an existing entity, just retrieve it from the Datastore, modify its properties, and store it back again:

sandy = sandy_key.get()
sandy.email = 'sandy@gmail.co.uk'
sandy.put()

(You can ignore the value returned by put() in this case, since an entity's key doesn't change when you update it.)

Deleting Entities

When an entity is no longer needed, you can remove it from the Datastore with the key's delete() method:

sandy.key.delete()

Note that this is an operation on the key, not on the entity itself. It always returns None.

Operations on Multiple Keys or Entities

Because each get() or put() operation invokes a separate remote procedure call (RPC), issuing many such calls inside a loop is an inefficient way to process a collection of entities or keys at once. The following methods are faster:

list_of_keys = ndb.put_multi(list_of_entities)
list_of_entities = ndb.get_multi(list_of_keys)
ndb.delete_multi(list_of_keys)

Advanced note: These methods interact correctly with the context and caching; they don't correspond directly to specific RPC calls.

Expando Models

Sometimes you don't want to declare your properties ahead of time. A special model subclass, Expando, changes the behavior of its entities so that any attribute assigned (as long as it doesn't start with an underscore) is saved to the Datastore. For example:

class Mine(ndb.Expando):
  pass

e = Mine()
e.foo = 1
e.bar = 'blah'
e.tags = ['exp', 'and', 'oh']
e.put()

This writes an entity to the Datastore with a foo property with integer value 1, a bar property with string value 'blah', and a repeated tags property with string values 'exp', 'and', and 'oh'. The properties are indexed, and you can inspect them using the entity's _properties attribute:

print e._properties
{'foo': GenericProperty('foo'), 'bar': GenericProperty('bar'),
'tags': GenericProperty('tags', repeated=True)}

An Expando created by getting a value from the Datastore has properties for all property values that were saved in the Datastore.
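The attribute-capturing behavior can be pictured with a small plain-Python class. SketchExpando is a hypothetical stand-in written for this example; it is not how ndb.Expando is implemented, and it stores values in a dictionary rather than in the Datastore.

```python
class SketchExpando(object):
    """Hypothetical stand-in for dynamic properties; NOT ndb.Expando."""

    def __init__(self, **values):
        # Bypass __setattr__ so the bookkeeping dict is not itself recorded.
        object.__setattr__(self, '_values', {})
        for name, value in values.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        if name.startswith('_'):
            # Underscore names stay ordinary Python attributes.
            object.__setattr__(self, name, value)
        else:
            # Everything else is recorded as a dynamic property.
            self._values[name] = value

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self.__dict__['_values'][name]
        except KeyError:
            raise AttributeError(name)

    def to_dict(self):
        return dict(self._values)


e = SketchExpando()
e.foo = 1
e.bar = 'blah'
e.tags = ['exp', 'and', 'oh']
e._scratch = 'not recorded'
```

Here e.to_dict() contains foo, bar, and tags but not _scratch, matching the underscore rule stated earlier.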

An application can add predefined properties to an Expando subclass:

class FlexEmployee(ndb.Expando):
  name = ndb.StringProperty()
  age = ndb.IntegerProperty()

e = FlexEmployee(name='Sandy', location='SF')

This gives e a name attribute with value 'Sandy', an age attribute with value None, and a dynamic attribute location with value 'SF'.

To create an Expando subclass whose properties are unindexed, set _default_indexed = False in the subclass definition:

class Specialized(ndb.Expando):
  _default_indexed = False

e = Specialized(foo='a', bar=['b'])
print e._properties
{'foo': GenericProperty('foo', indexed=False),
 'bar': GenericProperty('bar', indexed=False, repeated=True)}

You can also set _default_indexed on an Expando entity. In this case it will affect all properties assigned after it was set.

Another useful technique is querying an Expando kind for a dynamic property. A query like

FlexEmployee.query(FlexEmployee.location == 'SF')

won't work, as the class doesn't have a property object for the location property. Instead, use GenericProperty, the class Expando uses for dynamic properties:

FlexEmployee.query(ndb.GenericProperty('location') == 'SF')

Model Hooks

NDB offers a lightweight hooking mechanism. By defining a hook, an application can run some code before or after certain types of operation; for example, a Model might run some function before every get(). A hook function runs when using the synchronous, asynchronous, and multi versions of the appropriate method. For example, a "pre-get" hook would apply to all of get(), get_async(), and get_multi(). There are pre-RPC and post-RPC versions of each hook.

Hooks can be useful for

query caching
auditing Datastore activity per-user
mimicking database triggers

The following example shows how to define hook functions:

from google.appengine.ext import ndb

class Friend(ndb.Model):
  name = ndb.StringProperty()

  def _pre_put_hook(self):
    # inform someone they have a new friend
    pass

  @classmethod
  def _post_delete_hook(cls, key, future):
    # inform someone they have lost a friend
    pass

f = Friend()
f.name = 'Carole King'
f.put() # _pre_put_hook is called
fut = f.key.delete_async() # _post_delete_hook not yet called
fut.get_result() # _post_delete_hook is called

If you use post- hooks with asynchronous APIs, the hooks are triggered by calling check_result(), get_result() or yielding (inside a tasklet) an async method's future. Post hooks do not check whether the RPC was successful; the hook runs regardless of failure.

All post- hooks have a Future argument at the end of the call signature. This Future object holds the result of the action. You can call get_result() on this Future to retrieve the result; you can be sure that get_result() won't block, since the Future is complete by the time the hook is called.

Raising an exception during a pre-hook prevents the request from taking place. Although hooks are triggered inside *_async methods, you cannot pre-empt an RPC by raising tasklets.Return in a pre-RPC hook.
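The way a pre-hook exception prevents the operation can be sketched without NDB. SketchModel below is a hypothetical stand-in, not ndb.Model: its "Datastore" is a list, and its post hook receives a plain value rather than a Future. It does not model asynchronous calls or RPC failures.

```python
class SketchModel(object):
    """Hypothetical stand-in for a model with put hooks; NOT ndb.Model."""

    _store = []  # pretend Datastore shared by all instances

    def _pre_put_hook(self):
        pass

    def _post_put_hook(self, result):
        pass

    def put(self):
        self._pre_put_hook()              # an exception here aborts the write
        SketchModel._store.append(self)   # the pretend write
        result = len(SketchModel._store)
        self._post_put_hook(result)
        return result


events = []


class Friend(SketchModel):
    def __init__(self, name):
        self.name = name

    def _pre_put_hook(self):
        if not self.name:
            raise ValueError('a friend needs a name')

    def _post_put_hook(self, result):
        events.append('stored %s' % self.name)


Friend('Carole King').put()
try:
    Friend('').put()      # rejected by the pre-hook, never stored
except ValueError:
    events.append('rejected')
```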

Using Numeric Key IDs

A key is a series of kind-ID pairs. You want to make sure each entity has a key that is unique within its application and namespace. An application can create an entity without specifying an ID; the Datastore then generates a numeric ID automatically. If an application picks some numeric IDs "by hand" and also lets the Datastore generate some IDs automatically, the Datastore might choose IDs that the application has already used. To avoid this, the application should "reserve" the range of numbers it will use to choose IDs (or use string IDs to avoid this issue entirely).

To "reserve" a range of IDs, an application can use a model class' allocate_ids() class method. The method can be used in two ways: to allocate a specified number of IDs, or to reserve a given range of IDs. To allocate 100 IDs for a given model class (say MyModel), use the form:

first, last = MyModel.allocate_ids(100)

To allocate 100 IDs for entities with parent key p:

first, last = MyModel.allocate_ids(100, parent=p)

The returned values, first and last, are the first and last ID (inclusive) allocated. An application can use these to construct keys as follows:

keys = [ndb.Key(MyModel, numeric_id) for numeric_id in range(first, last+1)]

These keys are guaranteed not to have been returned previously by the Datastore's internal ID generator, nor will they be returned by future calls to the internal ID generator. However, allocate_ids() does not check whether the IDs returned are present in the Datastore; it only interacts with the ID generator.

An alternate form lets you allocate all IDs up to a given maximum value:

first, last = MyModel.allocate_ids(max=N)

This form ensures that all IDs less than or equal to N are considered allocated. The return values, first and last, indicate the range of IDs that were reserved by this operation. It is not an error to try to reserve IDs that were already allocated; in that case, first indicates the first ID not yet allocated and last is the last ID allocated. (In that case, first > last and first > N.)
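The return values of both forms can be illustrated with a toy model of the ID generator. SketchIdAllocator is hypothetical and assumes a simple counter starting at 1; the real Datastore generator need not hand out IDs sequentially, so only the relationship between first, last, and N is meant to carry over.

```python
class SketchIdAllocator(object):
    """Hypothetical counter-based model of the ID generator; NOT NDB."""

    def __init__(self):
        self._next = 1  # first ID not yet allocated

    def allocate_ids(self, size=None, max=None):
        # 'max' mirrors the keyword used in the text above.
        if (size is None) == (max is None):
            raise TypeError('pass exactly one of size or max')
        first = self._next
        if size is not None:
            last = first + size - 1
        elif max >= first:
            last = max
        else:
            # Everything up to max is already allocated; nothing new
            # is reserved, so first > last and first > max.
            last = first - 1
        self._next = last + 1
        return first, last


gen = SketchIdAllocator()
a = gen.allocate_ids(100)       # reserve a block of 100 IDs
b = gen.allocate_ids(max=150)   # reserve everything up to 150
c = gen.allocate_ids(max=120)   # already allocated: empty range
```

With this counter model, a is (1, 100), b is (101, 150), and c is (151, 150), which shows the first > last and first > N case described above.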

An application cannot call allocate_ids() in a transaction.
