Django dynamic model fields


Question

I'm working on a multi-tenanted application in which some users can define their own data fields (via the admin) to collect additional data in forms and report on the data. The latter bit makes JSONField not a great option, so instead I have the following solution:

class CustomDataField(models.Model):
    """
    Abstract specification for arbitrary data fields.
    Not used for holding data itself, but metadata about the fields.
    """
    site = models.ForeignKey(Site, default=settings.SITE_ID)
    name = models.CharField(max_length=64)

    class Meta:
        abstract = True

class CustomDataValue(models.Model):
    """
    Abstract specification for arbitrary data.
    """
    value = models.CharField(max_length=1024)

    class Meta:
        abstract = True

Note how CustomDataField has a ForeignKey to Site - each Site will have a different set of custom data fields, but use the same database. Then the various concrete data fields can be defined as:

class UserCustomDataField(CustomDataField):
    pass

class UserCustomDataValue(CustomDataValue):
    custom_field = models.ForeignKey(UserCustomDataField)
    user = models.ForeignKey(User, related_name='custom_data')

    class Meta:
        unique_together=(('user','custom_field'),)

This leads to the following use:

custom_field = UserCustomDataField.objects.create(name='zodiac', site=my_site) #probably created in the admin
user = User.objects.create(username='foo')
user_sign = UserCustomDataValue(custom_field=custom_field, user=user, value='Libra')  # the model field is named `value`
user.custom_data.add(user_sign) #actually, what does this even do?

But this feels very clunky, particularly with the need to manually create the related data and associate it with the concrete model. Is there a better approach?

Options that have been pre-emptively discarded:

  • Custom SQL to modify tables on-the-fly. Partly because this won't scale and partly because it's too much of a hack.
  • Schema-less solutions like NoSQL. I have nothing against them, but they're still not a good fit. Ultimately this data is typed, and the possibility exists of using a third-party reporting application.
  • JSONField, as listed above, as it's not going to work well with queries.

Answer

As of today, there are four available approaches, two of them requiring a certain storage backend:

  1. Django-eav (the original package is no longer maintained, but it has some thriving forks)

This solution is based on the Entity-Attribute-Value data model: essentially, it uses several tables to store dynamic attributes of objects. The good parts of this solution are that it:

* uses several pure and simple Django models to represent dynamic fields, which makes it simple to understand and database-agnostic;
* allows you to effectively attach/detach dynamic attribute storage to a Django model with simple commands like:

        eav.unregister(Encounter)
        eav.register(Patient)

* **[nicely integrates with the Django admin](https://github.com/mvpdev/django-eav/blob/master/eav/admin.py)**;
* is, at the same time, really powerful.

Downsides:

* Not very efficient. This is more of a criticism of the EAV pattern itself, which requires manually merging the data from a column format to a set of key-value pairs in the model.
* Harder to maintain. Maintaining data integrity requires a multi-column unique key constraint, which may be inefficient on some databases.
* You will need to select [one of the forks](https://github.com/mvpdev/django-eav/network), since the official package is no longer maintained and there is no clear leader.
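
To make the first two downsides concrete, here is a minimal sketch of the EAV layout using only Python's standard-library `sqlite3` module. It is not django-eav's actual schema: the table names, column names and the `load_entity` helper are hypothetical. It shows the multi-column unique constraint and the manual merge of value rows back into one dictionary per object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE attribute_value (
        entity_id    INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL REFERENCES attribute(id),
        value        TEXT NOT NULL,
        -- one value per (object, attribute) pair
        UNIQUE (entity_id, attribute_id)
    );
""")

conn.executemany(
    "INSERT INTO attribute (id, name) VALUES (?, ?)",
    [(1, "city"), (2, "country")],
)
conn.executemany(
    "INSERT INTO attribute_value (entity_id, attribute_id, value) VALUES (?, ?, ?)",
    [(1, 1, "New York"), (1, 2, "USA")],
)


def load_entity(conn, entity_id):
    """Merge the value rows of one object into a {name: value} dictionary."""
    rows = conn.execute(
        "SELECT a.name, v.value "
        "FROM attribute_value v JOIN attribute a ON a.id = v.attribute_id "
        "WHERE v.entity_id = ?",
        (entity_id,),
    )
    return dict(rows)


print(load_entity(conn, 1))
```

Every read of an object needs this join-and-merge step, which is where the efficiency cost of the pattern comes from.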

The usage is pretty straightforward:

    import eav
    from django.db.models import Q
    # Model classes live in eav.models in the original package; forks may differ.
    from eav.models import Attribute, EnumGroup, EnumValue
    from app.models import Patient, Encounter

    eav.register(Encounter)
    eav.register(Patient)
    Attribute.objects.create(name='age', datatype=Attribute.TYPE_INT)
    Attribute.objects.create(name='height', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='weight', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='city', datatype=Attribute.TYPE_TEXT)
    Attribute.objects.create(name='country', datatype=Attribute.TYPE_TEXT)

    yes = EnumValue.objects.create(value='yes')
    no = EnumValue.objects.create(value='no')
    unknown = EnumValue.objects.create(value='unknown')
    ynu = EnumGroup.objects.create(name='Yes / No / Unknown')
    ynu.enums.add(yes)
    ynu.enums.add(no)
    ynu.enums.add(unknown)

    Attribute.objects.create(name='fever', datatype=Attribute.TYPE_ENUM,
                             enum_group=ynu)

    # Once a model is registered with EAV,
    # you can set its EAV attributes when creating it:

    Patient.objects.create(name='Bob', eav__age=12,
                           eav__fever=no, eav__city='New York',
                           eav__country='USA')

    # You can filter queries based on their EAV fields:

    query1 = Patient.objects.filter(Q(eav__city__contains='Y'))
    query2 = Q(eav__city__contains='Y') | Q(eav__fever=no)

  2. Hstore, JSON or JSONB fields in PostgreSQL

PostgreSQL supports several more complex data types. Most are supported via third-party packages, but in recent years Django has adopted them into django.contrib.postgres.fields.

HStoreField:

Django-hstore was originally a third-party package, but Django 1.8 added HStoreField as a built-in, along with several other PostgreSQL-supported field types.

This approach is good in the sense that it lets you have the best of both worlds: dynamic fields and a relational database. However, hstore is [not ideal performance-wise](http://archives.postgresql.org/pgsql-performance/2011-05/msg00263.php), especially if you are going to end up storing thousands of items in one field. It also only supports strings for values.

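    # app/models.py
    from django.contrib.postgres.fields import HStoreField
    from django.db import models

    class Something(models.Model):
        name = models.CharField(max_length=32)
        # HStoreField is imported directly; it is not an attribute of django.db.models.
        # default=dict is an assumption so rows created without data start as {}.
        data = HStoreField(db_index=True, default=dict)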

In Django's shell you can use it like this:

    >>> instance = Something.objects.create(
    ...     name='something',
    ...     data={'a': '1', 'b': '2'}
    ... )
    >>> instance.data['a']
    '1'
    >>> empty = Something.objects.create(name='empty')
    >>> empty.data
    {}
    >>> empty.data['a'] = '1'
    >>> empty.save()
    >>> Something.objects.get(name='something').data['a']
    '1'

You can issue indexed queries against hstore fields:

    # equivalence
    Something.objects.filter(data={'a': '1', 'b': '2'})

    # subset by key/value mapping
    Something.objects.filter(data__a='1')

    # subset by list of keys
    Something.objects.filter(data__has_keys=['a', 'b'])

    # subset by single key
    Something.objects.filter(data__has_key='a')
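
Because hstore values are always strings, typed custom fields have to be converted on the way in and out. The following is a minimal sketch of that conversion in plain Python. The `FIELD_TYPES` mapping and both helper functions are hypothetical and not part of Django or PostgreSQL.

```python
# Hypothetical per-field type metadata, e.g. loaded from a field-definition table.
FIELD_TYPES = {"age": int, "height": float, "city": str}


def to_hstore(data):
    """Convert typed values to the str -> str mapping that hstore can store."""
    return {key: str(value) for key, value in data.items()}


def from_hstore(stored):
    """Cast stored strings back to their declared types (unknown keys stay str)."""
    return {key: FIELD_TYPES.get(key, str)(value) for key, value in stored.items()}


stored = to_hstore({"age": 12, "height": 1.5, "city": "New York"})
print(stored)
print(from_hstore(stored))
```

Note that numeric comparisons on the database side would still see strings, which is one reason typed, reportable data is an awkward fit for hstore.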

JSONField:

JSON/JSONB fields support any JSON-encodable data type, not just key/value pairs, and also tend to be faster and (for JSONB) more compact than hstore. Several packages implement JSON/JSONB fields, including django-pgfields, but as of Django 1.9, JSONField is built in and uses JSONB for storage. JSONField is similar to HStoreField, and may perform better with large dictionaries. It also supports types other than strings, such as integers, booleans and nested dictionaries.

    # app/models.py
    from django.contrib.postgres.fields import JSONField
    from django.db import models

    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = JSONField(db_index=True)

Creating in the shell:

    >>> instance = Something.objects.create(
    ...     name='something',
    ...     data={'a': 1, 'b': 2, 'nested': {'c': 3}}
    ... )

Indexed queries are nearly identical to HStoreField, except nesting is possible. Complex indexes may require manual creation (or a scripted migration).

    >>> Something.objects.filter(data__a=1)
    >>> Something.objects.filter(data__nested__c=3)
    >>> Something.objects.filter(data__has_key='a')
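
As a rough mental model of what a lookup such as `data__nested__c=3` matches, the sketch below walks a double-underscore path through a plain Python dictionary. This is only an illustration: the `resolve` and `matches` helpers are hypothetical, and Django evaluates these lookups in the database rather than in Python.

```python
_MISSING = object()  # sentinel for "path does not exist"


def resolve(data, path):
    """Follow a 'key__subkey' path through nested dictionaries."""
    current = data
    for key in path.split("__"):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches(data, path, expected):
    """Return True if the value found at `path` equals `expected`."""
    return resolve(data, path) == expected


row = {"a": 1, "b": 2, "nested": {"c": 3}}
print(matches(row, "a", 1))
print(matches(row, "nested__c", 3))
print(matches(row, "nested__d", 3))
```
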

  3. Django MongoDB

Or other NoSQL Django adaptations -- with them you can have fully dynamic models.

NoSQL Django libraries are great, but keep in mind that they are not 100% Django-compatible. For example, to migrate to Django-nonrel from standard Django you will need to replace ManyToMany with [ListField](https://stackoverflow.com/questions/3877246/django-nonrel-on-google-app-engine-implications-of-using-listfield-for-manytom), among other things.

Check out this Django MongoDB example:

    from django.db import models
    from djangotoolbox.fields import DictField

    class Image(models.Model):
        exif = DictField()
    ...

    >>> image = Image.objects.create(exif=get_exif_data(...))
    >>> image.exif
    {u'camera_model' : 'Spamcams 4242', 'exposure_time' : 0.3, ...}

You can even create [embedded lists](http://django-mongodb.org/topics/embedded-models.html) of any Django models:

    from django.db import models
    from djangotoolbox.fields import EmbeddedModelField, ListField

    class Container(models.Model):
        stuff = ListField(EmbeddedModelField())

    class FooModel(models.Model):
        foo = models.IntegerField()

    class BarModel(models.Model):
        bar = models.CharField()
    ...

    >>> Container.objects.create(
    ...     stuff=[FooModel(foo=42), BarModel(bar='spam')]
    ... )

  4. Django-mutant: Dynamic models based on syncdb and South-hooks

Django-mutant implements fully dynamic Foreign Key and m2m fields, and is inspired by the incredible but somewhat hackish solutions by [Will Hardy](http://dynamic-models.readthedocs.org/en/latest/index.html) and Michael Hall.

All of these are based on Django South hooks which, according to [Will Hardy's talk at DjangoCon 2011](http://blip.tv/djangocon-europe-2011/wednesday-1415-will-hardy-5311186) (watch it!), are nevertheless robust and tested in production (relevant source code).

The first to implement this was [Michael Hall](http://mhall119.com/2011/02/fun-with-django-meta-classes-and-dynamic-models/).

Yes, this is magic: with these approaches you can achieve fully dynamic Django apps, models and fields with any relational database backend. But at what cost? Will the stability of the application suffer under heavy use? These are the questions to be considered. You need to be sure to maintain a proper [lock](https://stackoverflow.com/questions/1123200/how-to-lock-a-critical-section-in-django) so that simultaneous database-altering requests do not conflict.
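
A minimal sketch of the locking idea, assuming a single process in which every schema change goes through one helper: a `threading.Lock` serializes the `ALTER TABLE` statements so that concurrent requests cannot interleave them. The `add_column` helper and the `test` table are hypothetical, SQLite stands in for the real database, and a multi-process deployment would need a database-level or distributed lock instead.

```python
import sqlite3
import threading

schema_lock = threading.Lock()
# check_same_thread=False lets threads share the connection; the lock serializes access.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY)")


def add_column(name, sql_type="TEXT"):
    """Add a column unless it already exists; only one thread alters at a time."""
    # Identifiers cannot be bound as SQL parameters, so validate before formatting.
    if not name.isidentifier() or not sql_type.isalpha():
        raise ValueError("unsafe column definition")
    with schema_lock:
        existing = {row[1] for row in conn.execute("PRAGMA table_info(test)")}
        if name not in existing:
            conn.execute(f'ALTER TABLE test ADD COLUMN "{name}" {sql_type}')


# Five concurrent "requests" all try to add the same dynamic field.
threads = [threading.Thread(target=add_column, args=("foo",)) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

columns = [row[1] for row in conn.execute("PRAGMA table_info(test)")]
print(columns)
```

Without the lock, two threads could both see that the column is missing and both try to add it, and the second `ALTER TABLE` would fail.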

If you are using Michael Hall's library, your code will look like this:

    from dynamo import models

    test_app, created = models.DynamicApp.objects.get_or_create(
        name='dynamo'
    )
    test, created = models.DynamicModel.objects.get_or_create(
        name='Test',
        verbose_name='Test Model',
        app=test_app
    )
    foo, created = models.DynamicModelField.objects.get_or_create(
        name='foo',
        verbose_name='Foo Field',
        model=test,
        field_type='dynamiccharfield',
        null=True,
        blank=True,
        unique=False,
        help_text='Test field for Foo',
    )
    bar, created = models.DynamicModelField.objects.get_or_create(
        name='bar',
        verbose_name='Bar Field',
        model=test,
        field_type='dynamicintegerfield',
        null=True,
        blank=True,
        unique=False,
        help_text='Test field for Bar',
    )