root/dev/: django-instances-graph-1.1.2.dev0 metadata and description
``django_instances_graph`` provides a way to get the whole graph of instances starting from one
| author | Stephane "Twidi" Angel |
| author_email | s.angel@twidi.com |
| license | BSD |
======================
django_instances_graph
======================
Purpose
=======
The purpose of the ``django_instances_graph`` library is to create a graph of django model instances
with their relations. This graph can then be serialized via pickle, updated manually, and/or used
to create a duplicate of the data used to fill it.
How it works
============
The whole logic is contained in two classes, explicitly named ``Graph`` and ``Instance``.
A few points to see how these two classes are tied together:
- every ``Instance`` has a ``uuid`` (which is NOT the primary key of the tied django model instance)
- every ``Instance`` holds all its simple values in its ``fields`` attribute (a dict)
- a ``Graph`` has an ``instances`` attribute, holding all its instances. It's a dict with the
``uuid`` as keys, and the matching ``Instance`` as values.
- a ``Graph`` has a ``relations`` attribute.
What we call a relation in this library is a direct link between two instances.
We are closer to the way the database sees things than to the way Django does: there is no notion
of "many to many" here, because a "many to many" relation is simply a set of relations between
entries in a "through" table and the entries on both sides of the "many to many".
To summarize, all relations in the database, in Django, and therefore in this library, are
"foreign keys".
So we store the relations in a dict, with keys being the uuid of the instances declaring the
relation. Then the values are also a dict, with keys being the name of the foreign key relation.
Then as values we have sets, with the UUIDs of the instances on the other side of the relation.
And to make this complete and more usable, we also store the relation on the other side.
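
To make the shape of this storage concrete, here is a minimal sketch in plain Python. It is not the
library's code: the ``relations`` variable and the ``toy_add_relation`` helper are hypothetical
names used only to illustrate the nested "uuid, then accessor name, then set of uuids" layout and
the mirrored entry on the other side.

.. code-block:: python

    from collections import defaultdict
    from uuid import uuid4

    # Hypothetical stand-in: source uuid -> accessor name -> set of target uuids
    relations = defaultdict(lambda: defaultdict(set))


    def toy_add_relation(source_uuid, accessor_name, target_uuid, reverse_name):
        """Store the link on the declaring side, then mirror it on the other side."""
        relations[source_uuid][accessor_name].add(target_uuid)
        relations[target_uuid][reverse_name].add(source_uuid)


    book_uuid, author_uuid = uuid4(), uuid4()
    toy_add_relation(book_uuid, 'author', author_uuid, reverse_name='books')

    assert relations[book_uuid]['author'] == {author_uuid}
    assert relations[author_uuid]['books'] == {book_uuid}

Using sets as the innermost values means that adding the same link twice changes nothing.
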
Creating a graph is as simple as creating an instance, asking the graph to "serialize" it, as seen
in further sections.
THE API
=======
For the examples, we will use these models:

.. testsetup::

    # This prepares the django environment to run all the "testcode" in this file
    # This is not visible in the rendered README
    import os
    os.environ['DJANGO_SETTINGS_MODULE'] = 'tests.settings'

    import django
    django.setup()

    from django.db import connection
    connection.creation.create_test_db(verbosity=0)

    from tests.models import Author, Book, Tag, Translation

.. code:: python

    from django.db import models


    class Tag(models.Model):
        name = models.CharField(max_length=10)


    class Author(models.Model):
        name = models.CharField(max_length=10)


    class Book(models.Model):
        name = models.CharField(max_length=10)
        author = models.ForeignKey(Author, related_name='books', on_delete=models.CASCADE)
        tags = models.ManyToManyField(Tag, related_name='books')
        translator = models.ManyToManyField(Author, through="Translation",
                                            related_name="translated_books")


    class Translation(models.Model):
        book = models.ForeignKey(Book, related_name="translations", on_delete=models.CASCADE)
        author = models.ForeignKey(Author, related_name="translations", on_delete=models.CASCADE)
        lang = models.CharField(max_length=2)

Also note that all the examples in this documentation are guaranteed to work: they are tested
in the ``tests/test_readme_examples.py`` file, and they can be run one after the other.
Creating objects
++++++++++++++++
Creating a graph
----------------
This is as simple as:
.. code-block:: python
from django_instances_graph import Graph
graph = Graph()
The ``Graph`` constructor doesn't expect any argument.
Creating an instance
--------------------
First thing to know: if you don't pass an existing graph to the ``Instance`` constructor, a new one
will be created, and will be accessible from the ``graph`` attribute of the created ``Instance``.
So when creating many ``Instance`` objects, be careful to always pass a ``Graph``, the best being
to create it beforehand.
The main argument to the ``Instance`` constructor is the ``source``.
You can pass it a Django model:
.. code-block:: python
from django_instances_graph import Instance
author_instance = Instance(Author) # or ``source=Author``
assert author_instance.pk is None
In this case no primary key (accessible via the ``pk`` attribute of the created ``Instance``) will
be saved in the new object.
You can also pass it a Django model instance that is not in database, and it will have the same
effect as passing a model (except for one big detail that we will see later):
.. code-block:: python
from django_instances_graph import Instance
author = Author(name='john')
author_instance = Instance(author) # or ``source=author``
assert author_instance.pk is None
And of course, you can pass a Django model instance from the database:
.. code-block:: python
from django_instances_graph import Instance
author = Author.objects.create(name='john')
author_instance = Instance(author) # or ``source=author``
assert author_instance.pk == author.pk
In this case, the primary key is saved in the newly created object.
Retrieving an instance from the graph
-------------------------------------
Each instance is created with a ``UUID`` (version 4).
Then you can retrieve the instance using it:
.. code-block:: python
from django_instances_graph import Graph, Instance
graph = Graph()
author = Author.objects.create(name='john')
author_instance = Instance(author, graph=graph)
author_uuid = author_instance.uuid
# later
author_instance = graph.get_instance(author_uuid)
assert author_instance.pk == author.pk
Note that you can set the uuid yourself:
.. code-block:: python
from uuid import uuid4
from django_instances_graph import Graph, Instance
graph = Graph()
author_uuid = uuid4()
author = Author.objects.create(name='john')
author_instance = Instance(author, graph=graph, uuid=author_uuid)
assert author_instance.uuid == author_uuid
If the instance has a pk, you can also retrieve it via its model + pk:
.. code-block:: python
from django_instances_graph import Graph, Instance
graph = Graph()
author = Author.objects.create(name='john')
author_instance = Instance(author, graph=graph)
author_uuid = author_instance.uuid
# later
author_instance = graph.get_instance(Author, author.pk)
assert author_instance.uuid == author_uuid
Trying to get an instance that does not exist in the graph will raise a ``KeyError``.
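
One way to picture the two lookup styles is a registry with two indexes. The sketch below is plain
Python and only an illustration: ``ToyRegistry`` and its attributes are hypothetical names, and
nothing here claims to mirror how ``Graph`` is actually implemented.

.. code-block:: python

    from uuid import uuid4


    class ToyRegistry:
        """Hypothetical two-way index: by uuid, and by (model name, pk)."""

        def __init__(self):
            self.by_uuid = {}
            self.by_model_pk = {}

        def add(self, record, model_name, pk=None, uuid=None):
            uuid = uuid or uuid4()
            self.by_uuid[uuid] = record
            if pk is not None:
                # Only records with a primary key can be found by model + pk
                self.by_model_pk[(model_name, pk)] = uuid
            return uuid

        def get(self, *key):
            # One argument: a uuid. Two arguments: a model name and a pk.
            if len(key) == 1:
                return self.by_uuid[key[0]]
            return self.by_uuid[self.by_model_pk[key]]


    registry = ToyRegistry()
    john_uuid = registry.add({'name': 'john'}, 'Author', pk=1)
    assert registry.get(john_uuid) is registry.get('Author', 1)

Because both indexes are plain dicts, a missing entry naturally surfaces as a ``KeyError`` in this
sketch.
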
Checking if an instance exists in the graph
-------------------------------------------
You can do the check by uuid or by model + pk:
.. code-block:: python
from uuid import uuid4
from django_instances_graph import Graph, Instance
graph = Graph()
uuid = uuid4()
assert graph.has_instance(uuid) is False
author = Author.objects.create(name='john')
author_instance = Instance(author, graph=graph, uuid=uuid)
assert author_instance.uuid == uuid
assert graph.has_instance(uuid) is True
assert graph.has_instance(Author, author.pk) is True
Saving simple fields in instances
---------------------------------
Each ``Instance`` object has a ``fields`` dictionary to hold the values of its simple fields (simple
fields are all the fields that are not relations to another model: ``CharField``,
``IntegerField``...):
.. code-block:: python
from django_instances_graph import Graph, Instance
graph = Graph()
author = Instance(Author, graph=graph)
author.fields['name'] = 'john'
assert graph.get_instance(author.uuid).fields['name'] == 'john'
We'll see later that these fields can be automatically filled during the serializing process.
Adding a simple relation
------------------------
The serialization process adds the relations itself, but you can also create them manually.
Remember that a relation is a relation between two instances, via a ``ForeignKey`` or
``OneToOneField`` (which is a sort of ``ForeignKey``)
For example we have a ``ForeignKey`` between ``Book`` and ``Author``, so we can do this:
.. code-block:: python
from django_instances_graph import Graph, Instance
graph = Graph()
author = Instance(Author, graph=graph)
book = Instance(Book, graph=graph)
book.add_relation('author', author)
# or book.add_relation(accessor_name='author', target=author)
# or graph.add_relation(book, 'author', author)
# or graph.add_relation(source=book, accessor_name='author', target=author)
The ``add_relation`` method will check that ``author`` is a correct field type, and that the target
is from the expected model (or a ``ValueError`` will be raised)
This relation could have been added in the opposite way (using the ``related_name`` as the accessor
name):
.. code-block:: python
author.add_relation('books', book)
Adding a M2M relation
---------------------
To add a ``ManyToMany`` relation, we have to distinguish two cases: either the relation has an
auto-created "through" model, or not.
In the first case, it's easy, because the ``Instance`` class has an ``add_direct_m2m_relation``
method, used this way:
.. code-block:: python
tag1 = Instance(Tag, graph=graph)
tag2 = Instance(Tag, graph=graph)
book.add_direct_m2m_relation('tags', [tag1, tag2])
# or book.add_direct_m2m_relation(accessor_name='tags', targets=[tag1, tag2])
# or graph.add_direct_m2m_relation(source=book, accessor_name='tags', targets=[tag1, tag2])
When a ``ManyToMany`` "entry" is created, what happens in Django is that there is a "through"
model in the middle that has two ``ForeignKey``: one on each side, ie in our case, one to the
``Book`` model, and one to the ``Tag`` model.
The ``add_direct_m2m_relation`` creates the relation in the graph from this "through" model to the
source and the targets. So in our example, we have two "through" entries and it will create 4
relations:
- "through1" to "book"
- "through1" to "tag1"
- "through2" to "book"
- "through2" to "tag2"
Note that the ``add_direct_m2m_relation`` method also returns ``Instance`` objects of the "through"
model, one for each target, in the same order. And it's also important to know that it will not
replace the existing relations, but will just add the specified ones.
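
The four relations listed above can be reproduced with a small plain-Python sketch. The
``toy_add_direct_m2m`` helper and the ``relations`` dict are hypothetical; the accessor names
``book`` and ``tag`` are assumed to be the foreign keys of the auto-created "through" model.

.. code-block:: python

    from uuid import uuid4

    book, tag1, tag2 = uuid4(), uuid4(), uuid4()
    relations = {}  # "through" uuid -> accessor name -> set of target uuids


    def toy_add_direct_m2m(source, targets):
        """Create one "through" entry per target, with a foreign key to each side."""
        throughs = []
        for target in targets:
            through = uuid4()
            relations[through] = {'book': {source}, 'tag': {target}}
            throughs.append(through)
        return throughs  # same order as ``targets``


    through1, through2 = toy_add_direct_m2m(book, [tag1, tag2])

    assert relations[through1] == {'book': {book}, 'tag': {tag1}}
    assert relations[through2] == {'book': {book}, 'tag': {tag2}}
    assert sum(len(links) for links in relations.values()) == 4
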
This is more complicated when the "through" model is manually defined in the definition of the
``ManyToManyField`` because there are, in general, additional fields, and the graph cannot "guess"
them, so the whole work has to be done manually.
We can see this in an example with the ``translator`` ``ManyToManyField``:
.. code-block:: python
translator1 = Instance(Author, graph=graph)
translator2 = Instance(Author, graph=graph)
# for book => translator1
translation1 = Instance(Translation, graph=graph)
translation1.add_relation('book', book)
translation1.add_relation('author', translator1)
# for book => translator2
translation2 = Instance(Translation, graph=graph)
translation2.add_relation('book', book)
translation2.add_relation('author', translator2)
Retrieving relations from the graph
-----------------------------------
What we want is not to retrieve the relation itself, but the targets of the relation:
To get the author of the book:
.. code-block:: python
author2 = list(book.get_relation_targets('author'))[0]
# or author2 = list(graph.get_relation_targets(source=book, accessor_name='author'))[0]
assert author2 is author
Of course here we have a ``ForeignKey`` so we *should* only have one entry. But we could also have
zero, and in this case an ``IndexError`` will be raised.
It is also possible to get all the books for an author:
.. code-block:: python
books = author.get_relation_targets('books')
assert books == {book}
Here we can see why it makes sense that this method returns a collection (in fact, it's a ``set``,
so it is not ordered).
Note that what is returned are ``Instance`` objects, not instances of the django model.
And to retrieve the targets of a ``ManyToMany``:
.. code-block:: python
tags = book.get_m2m_relation_targets('tags')
# or tags = graph.get_m2m_relation_targets(source=book, accessor_name='tags')
assert tags == {tag1, tag2}
And to get all the books for a tag:
.. code-block:: python
books = tag1.get_m2m_relation_targets('books')
assert books == {book}
Contrary to ``add_direct_m2m_relation``, this method works for both auto-created "through" models
and manually defined ones, because we just want the targets (that's why there is no "direct" in the
name of this method).
If a manual "through" was defined, to get the "through" entries, simply use the
``get_relation_targets`` method. With our previous example, it should be:
.. code-block:: python
translations = book.get_relation_targets('translations')
assert translations == {translation1, translation2}
Removing relations from the graph
---------------------------------
To remove a direct relation:
.. code-block:: python
book.remove_relation('author', author)
# or graph.remove_relation(book, 'author', author)
assert book.get_relation_targets('author') == set()
And for a ``ManyToMany``:
.. code-block:: python
book.remove_m2m_relation('tags', [tag1, tag2])
# or graph.remove_m2m_relation(book, 'tags', [tag1, tag2])
assert book.get_m2m_relation_targets('tags') == set()
Note that this will not remove all the existing relations, but only the specified ones.
To remove all the relations, you can do, for a direct relation:
.. code-block:: python
book.add_relation('author', author) # just to have something
book.clear_relation('author')
# or graph.clear_relation(book, 'author')
assert book.get_relation_targets('author') == set()
And for a ``ManyToMany``:
.. code-block:: python
book.add_direct_m2m_relation('tags', [tag1, tag2]) # just to have something
book.clear_m2m_relation('tags')
# or graph.clear_m2m_relation(book, 'tags')
assert book.get_m2m_relation_targets('tags') == set()
``remove_m2m_relation`` and ``clear_m2m_relation`` accept a ``remove_through_instances`` argument,
defaulting to ``True``, that removes from the graph the "through" entries of the removed relations.
It can be useful to set it to ``False`` with a manual "through" model, when there are other fields
or other relations going from or to it.
Also note that these two methods return these "through" instances (even if they are removed from the
graph, they still exist as ``Instance`` objects).
Serialization
+++++++++++++
Serialization is the process of converting a django model instance into an ``Instance`` of a
graph, saving its fields and relations.
Serializing an instance
-----------------------
Now that we know how to create the graph, instances and relations manually, let's see how to do it
automatically.
First, we can serialize just an instance, not saved in database.
.. code-block:: python
from django_instances_graph import Instance
author = Author(name='john')
author_instance = Instance(author, serialize=True)
assert author_instance.fields['name'] == 'john'
assert author_instance.serialized is True
We can also do it in two steps if for example we used a model as the ``Instance`` source.
.. code-block:: python
from django_instances_graph import Instance
author_instance = Instance(Author)
author_instance.serialize(Author(name='john'))
assert author_instance.fields['name'] == 'john'
assert author_instance.serialized is True
Yes, ``serialize`` expects an instance of the django model: even when an instance of such a model
is passed to the ``Instance`` constructor, it is *not* kept in the ``Instance`` object.
Serializing a graph
-------------------
Serializing an instance is ok, but it's not really what this library is about. We want to
serialize the whole graph, from a starting point.
Let's see how to auto-create the instances and relations from the database.
If we want all the objects related to a book in the database, it's as simple as setting the
``serialize`` argument to ``True`` when creating an instance:
.. code-block:: python
from django_instances_graph import Instance
author = Author.objects.create(name='author1')
tag1 = Tag.objects.create(name='tag1')
tag2 = Tag.objects.create(name='tag2')
tag3 = Tag.objects.create(name='tag3')
book1 = Book.objects.create(name='book 1', author=author)
book1.tags.set([tag1, tag2])
book2 = Book.objects.create(name='book 2', author=author)
book2.tags.set([tag1, tag3])
book_instance = Instance(book1, serialize=True)
assert book_instance.pk == book1.pk
assert book_instance.fields['name'] == 'book 1'
author_instance = list(book_instance.get_relation_targets('author'))[0]
assert author_instance.fields['name'] == 'author1'
tag_instances = book_instance.get_m2m_relation_targets('tags')
assert set(t.fields['name'] for t in tag_instances) == {'tag1', 'tag2'}
Note that we didn't set the ``graph`` argument, so we can get it back using ``book_instance.graph``.
But it could of course have been defined before and passed to the ``Instance`` constructor, as
seen before.
What is done by passing ``serialize=True``:
- all the simple fields are saved in ``book_instance.fields``
- all the relations from "book1" to any related model are created
- every related object gets its own ``Instance`` in the graph, also serialized: its simple fields,
  but its relations too
- this is done recursively until there are no more relations to follow

So we'll have the book, its author and its tags. But we'll also have the other books of the author,
and their tags too, and all the books for all the tags.
Maybe that is what you want, but there is a good chance that it is not.
For this, let's introduce what we call "boundaries".
Defining boundaries
-------------------
In our example, we just want to serialize the book, its relations to an author and to its tags.
So, the boundaries are:
- the "author" ``ForeignKey`` from the ``Book`` model
- the "tag" ``ForeignKey`` from the "through" model between the ``Book`` and ``Tag`` models
To define boundaries, a new class inheriting from ``Graph`` must be defined, and its
``is_relation_boundary`` method must be overridden:

.. code-block:: python

    from django_instances_graph import Graph
    from django_instances_graph.utils import is_through_model


    class BookGraph(Graph):

        def is_relation_boundary(self, instance, accessor_name, field, field_type):
            # The book author is a boundary
            if instance.model is Book and accessor_name == 'author':
                return True

            # The tag of a book <=> tag through is a boundary. We don't block on the m2m field
            # because we don't want the through entries to be the boundaries, but the tags
            # book --- [not boundary ] --- through model ---- [ boundary] --- tag
            if is_through_model(Book, 'tags', instance.model) and accessor_name == 'tag':
                return True

            return False

Now we can do the serialization and check that the boundaries are correctly set:

.. code-block:: python

    graph = BookGraph()
    Instance(book1, serialize=True, graph=graph)

    assert graph.get_instance(Author, author.pk).is_boundary
    for tag in book1.tags.all():
        assert graph.get_instance(Tag, tag.pk).is_boundary

What is done when an ``Instance`` is marked as boundary:
- it has a ``is_boundary`` attribute set to ``True``
- the ``Instance`` is created in the graph, and if it's created automatically by the serialization
  of another model, only its simple fields will be serialized
- if it is not serialized, no relations are created in the graph starting from it
This will be used in the deserializing process, for example when we want to duplicate a graph, as
we'll see below.
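
To picture how boundaries stop the recursion, here is a toy traversal in plain Python. It is a
simplified sketch, not the library's algorithm: the ``links`` dict and the ``collect`` function are
hypothetical, and the "through" entries between books and tags are left out to keep it short. The
data mirrors the two books of the serialization example.

.. code-block:: python

    from collections import deque

    # Toy adjacency: each node lists the nodes it is related to, in both directions
    links = {
        'book1': ['author', 'tag1', 'tag2'],
        'book2': ['author', 'tag1', 'tag3'],
        'author': ['book1', 'book2'],
        'tag1': ['book1', 'book2'],
        'tag2': ['book1'],
        'tag3': ['book2'],
    }


    def collect(start, is_boundary=lambda source, target: False):
        """Breadth-first walk: boundary nodes are kept in the result but not expanded."""
        seen, boundaries = {start}, set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for target in links[node]:
                if target in seen:
                    continue
                seen.add(target)
                if is_boundary(node, target):
                    boundaries.add(target)
                else:
                    queue.append(target)
        return seen, boundaries


    # Without boundaries, everything reachable is collected
    everything, _ = collect('book1')
    assert everything == set(links)

    # With every relation starting from book1 treated as a boundary, the walk stops there
    limited, boundaries = collect('book1', lambda source, target: source == 'book1')
    assert limited == {'book1', 'author', 'tag1', 'tag2'}
    assert boundaries == {'author', 'tag1', 'tag2'}
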
Note that when creating ``Instance`` objects manually (ie not by creating just one and letting the
graph create the others during the serialization process), it is possible to mark them as
boundaries too:
.. code-block:: python
graph = BookGraph()
author_instance = Instance(Author, graph=graph, is_boundary=True)
assert graph.get_instance(author_instance.uuid).is_boundary
Deserialization
+++++++++++++++
The deserialization is the process of converting ``Instance`` objects of a graph, and their
relations, into real django model instances, saved in database.
Deserializing an instance
-------------------------
If the ``Instance`` objects have primary keys, the objects in database will be updated. In the
other case, they will be created.
First, with an ``Instance`` that does not come from the database:
.. code-block:: python
graph = BookGraph()
author_instance = Instance(Author, graph=graph)
author_instance.fields['name'] = 'john'
author = author_instance.deserialize()
assert isinstance(author, Author)
assert author.pk is not None
assert author.name == 'john'
# The instance has a pk now
assert author_instance.pk == author.pk
Note that you can pass the django model instance that will hold the deserialized data:
.. code-block:: python
graph = BookGraph()
author_instance = Instance(Author, graph=graph)
author_instance.fields['name'] = 'john'
blank_author = Author()
author = author_instance.deserialize(blank_author)
assert isinstance(author, Author)
assert author is blank_author
assert author.pk is not None
assert author.name == 'john'
And now with an existing object from the database:
.. code-block:: python
graph = BookGraph()
original_author = Author.objects.create(name='john')
author_instance = Instance(original_author, graph=graph, serialize=True)
# later
author_instance.fields['name'] = 'peter'
author = author_instance.deserialize()
assert author.pk == original_author.pk
assert author.name == 'peter'
Deserializing a graph
---------------------
This is the most interesting part of this library. It allows, for example:
- to create objects and their relations first, then save everything in database at the end
  (which is not possible with django model instances, as the instances must be saved before
  relations can be created between them)
- to extract some data from the database and duplicate it

Start by creating some instances, not in database, and their relations:
.. code-block:: python
graph = BookGraph()
# Two things to notice:
# - We pass ``serialize=True`` to save the fields
# - We set the boundaries manually as the boundaries can only be defined automatically
# in the full serialization process from django model instances, which we don't have here
author_instance = Instance(Author(name='john'), graph=graph, serialize=True,
is_boundary=True)
book_instance = Instance(Book(name='my book'), graph=graph, serialize=True)
tag1_instance = Instance(Tag(name='tag1'), graph=graph, serialize=True, is_boundary=True)
tag2_instance = Instance(Tag(name='tag2'), graph=graph, serialize=True, is_boundary=True)
book_instance.add_relation('author', author_instance)
book_instance.add_direct_m2m_relation('tags', [tag1_instance, tag2_instance])
Now we can deserialize the whole graph by simply deserializing one ``Instance``:
.. code-block:: python
book = book_instance.deserialize()
assert book.author.name == 'john'
assert set(book.tags.values_list('name', flat=True)) == {'tag1', 'tag2'}
It's done, the whole graph is saved in database.
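
The constraint this works around is that a row must exist before another row can point to it. The
plain-Python sketch below shows the general idea of saving foreign key targets first. It is only an
illustration under that assumption: ``records`` and ``toy_save`` are hypothetical, and the order
actually used by the library is not described here.

.. code-block:: python

    from itertools import count

    next_pk = count(1)
    saved_order = []

    # Toy records: no primary key yet, and the names of their foreign key targets
    records = {
        'book': {'pk': None, 'fk': ['author']},
        'author': {'pk': None, 'fk': []},
    }


    def toy_save(name):
        """Save the foreign key targets first, then the record itself."""
        record = records[name]
        for target in record['fk']:
            if records[target]['pk'] is None:
                toy_save(target)
        if record['pk'] is None:
            record['pk'] = next(next_pk)
            saved_order.append(name)
        return record['pk']


    toy_save('book')
    assert saved_order == ['author', 'book']
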
Duplicating a graph
-------------------
Duplicating a graph means creating new objects and relations in database, identical to the ones in
the graph but, obviously, with different primary keys.
It's very simple, as the ``Graph`` class provides a ``clear_pks`` method.
So, following the previous deserialization just above, we can do:
.. code-block:: python
graph.clear_pks()
book2 = book_instance.deserialize()
assert book2.pk != book.pk
assert book2.author.name == 'john'
assert set(book2.tags.values_list('name', flat=True)) == {'tag1', 'tag2'}
# 1 author for both, because as a boundary the author is not deserialized if it exists in db
assert book2.author_id == book.author_id
# and same for the tags
book1_tags = set(book.tags.values_list('pk', flat=True))
book2_tags = set(book2.tags.values_list('pk', flat=True))
assert book1_tags == book2_tags
Note that between the call to ``clear_pks`` and ``deserialize``, it is possible to update the graph.
For example:
.. code-block:: python
graph.clear_pks()
# Change a field
book_instance.fields['name'] = 'new book'
# Remove a relation
book_instance.remove_m2m_relation('tags', [tag2_instance])
# And add another
tag3_instance = Instance(Tag(name='tag3'), graph=graph, serialize=True, is_boundary=True)
book_instance.add_direct_m2m_relation('tags', [tag3_instance])
book3 = book_instance.deserialize()
assert book3.name == 'new book'
assert book3.author.name == 'john'
assert set(book3.tags.values_list('name', flat=True)) == {'tag1', 'tag3'}
Overriding
++++++++++
There are two kinds of overriding that we'll see in this section: class inheritance, to override
the ``Graph`` and ``Instance`` classes, and value overriding, to change the values and relations
saved in the graph during the serialization, or retrieved from it during the deserialization.
Class inheritance
-----------------
Inherit from Graph
^^^^^^^^^^^^^^^^^^
You can easily inherit from the ``Graph`` class if you want to change its behaviour. But in this
case, don't forget to always create your ``Graph`` instance manually and pass it to each
``Instance`` because if an instance has no ``graph`` passed to its constructor, it will create one
using the default ``Graph`` class.
What you can do by creating your own ``Graph`` subclass:
- define a default class to use for instances, instead of ``Instance`` (which is the default):

  .. code-block:: python

      from django_instances_graph import Graph, Instance


      class MyDefaultInstance(Instance):
          pass


      class MyGraph(Graph):
          default_instance_class = MyDefaultInstance


      graph = MyGraph()
      instance = Instance(Author, graph=graph)
      assert isinstance(instance, MyDefaultInstance)

- define which subclass of ``Instance`` to use for the model you want. If there is no ``Instance``
  class defined for a model, the one defined in ``default_instance_class`` will be used.

  .. code-block:: python

      class AuthorInstance(Instance):
          pass


      class BookInstance(Instance):
          pass


      class MyGraph(Graph):
          default_instance_class = MyDefaultInstance
          instance_classes_for_models = {
              Author: AuthorInstance,
              Book: BookInstance,
          }


      graph = MyGraph()
      book_instance = Instance(Book, graph=graph)
      assert isinstance(book_instance, BookInstance)
      author_instance = Instance(Author, graph=graph)
      assert isinstance(author_instance, AuthorInstance)
      tag_instance = Instance(Tag, graph=graph)
      assert isinstance(tag_instance, MyDefaultInstance)

Another way to do this, if you "own" your models, is to add a ``graph_instance_class`` attribute
to your model, setting it to the subclass of ``Instance`` you want. Of course you cannot do this on
models you don't own (ie from external applications).
Inherit from Instance
^^^^^^^^^^^^^^^^^^^^^
You may want to inherit from ``Instance`` for example to change the ``__repr__`` method, or to
override the ``override_value_to_serialize`` and ``override_value_to_deserialize`` methods, as we'll
see in the next sub-section.
Value overriding
----------------
(For a full example of how to override the serialized or deserialized value for all kinds of fields,
you can check the ``OverrideInstance`` class in the ``tests/models.py`` file)
Serialization
^^^^^^^^^^^^^
During the serialization you may want to change on the fly the simple values to save in the
``fields`` attribute of an ``Instance``, or its relations to other objects.
You have one entry point for this on the ``Instance`` class, where you can add your own logic in a
subclass. You return simple values for simple fields, and a django model instance (or list of) for
relations (because when serializing, we convert values from the django model instances to our own
``Instance`` objects, and here we just intercept the values from the django model instances).
Here is an example if we want to change a simple value, a relation, and a many-to-many relation:

.. code-block:: python

    from django_instances_graph import Instance
    from django_instances_graph.utils import get_through_info


    class BookInstance(Instance):

        def override_value_to_serialize(self, model_instance, field, accessor_name, field_type,
                                        value):
            # ``value`` is the value got from the serialized book (accessible via
            # ``model_instance``), but you can also get it by calling ``super``:
            value = super().override_value_to_serialize(model_instance, field, accessor_name,
                                                        field_type, value)

            # For a simple field
            if accessor_name == 'name':
                # Add a number to the book's name
                return '%s (%s)' % (value, 123)

            # For a simple relation
            elif accessor_name == 'author':
                # Change the author
                value = Author(name='new author')

            # For a many-to-many
            elif accessor_name == 'tags':
                # Add a new tag: a list of instances from the "through" model is expected
                through_model = get_through_info(Book, 'tags')[0]
                value = list(value) + [
                    through_model(
                        book=model_instance,
                        tag=Tag(name='new tag'),
                    )
                ]

            # always return the original value if you don't change it
            return value


    # This should be defined in the model declaration
    Book.graph_instance_class = BookInstance

    author = Author.objects.create(name='john')
    book = Book.objects.create(name='my book', author=author)
    book.tags.set([Tag.objects.create(name='tag1'), Tag.objects.create(name='tag2')])

    instance = Instance(book, serialize=True)

    assert instance.fields['name'] == 'my book (123)'
    assert list(instance.get_relation_targets('author'))[0].fields['name'] == 'new author'
    tag_instances = instance.get_m2m_relation_targets('tags')
    assert set(tag.fields['name'] for tag in tag_instances) == {'tag1', 'tag2', 'new tag'}

Don't forget to also do it on reverse relations to avoid surprises. For example if you return a
different value for a ``OneToOneField``, but the original instance for this field is serialized,
the final relation may not be the one you expect.
Deserialization
^^^^^^^^^^^^^^^
Deserialization is the process of converting ``Instance`` objects and their relations from a
``Graph`` into real django model instances.
There is an entry point on the ``Instance`` class where you can change on the fly the values and
relations that will be used instead of the ones on the ``Graph``.
Note that this method is expected to return simple values for simple fields, and ``Instance``
objects for targets of relations.
Here is an example if we want to change a simple value, a relation, and a many-to-many relation:

.. code-block:: python

    from django_instances_graph import Graph, Instance


    class BookInstance(Instance):

        def override_value_to_deserialize(self, field, accessor_name, field_type, value,
                                          model_instance):
            # ``value`` is the value got from the deserialized book (accessible via
            # ``self``), but you can also get it by calling ``super``:
            value = super().override_value_to_deserialize(field, accessor_name, field_type, value,
                                                          model_instance)

            # For a simple field
            if accessor_name == 'name':
                # Add a number to the book's name
                return '%s (%s)' % (value, 456)

            # For a simple relation
            elif accessor_name == 'author':
                self.clear_relation('author')  # it's important, but note that it changes the graph
                value = Instance(Author(name='the author'), graph=self.graph, serialize=True)

            # For a many-to-many
            elif accessor_name == 'tags':
                # Add a new tag: a list of ``Instance`` from the "through" model is expected
                value = value + self.add_direct_m2m_relation('tags', [
                    Instance(Tag(name='the tag'), graph=self.graph, serialize=True),
                ])

            # always return the original value if you don't change it
            return value


    # This should be defined in the model declaration
    Book.graph_instance_class = BookInstance

    graph = Graph()
    book_instance = Instance(Book(name='my book'), graph=graph, serialize=True)
    author_instance = Instance(Author(name='john'), graph=graph, serialize=True)
    book_instance.add_relation('author', author_instance)
    book_instance.add_direct_m2m_relation('tags', [
        Instance(Tag(name='tag1'), graph=graph, serialize=True),
        Instance(Tag(name='tag2'), graph=graph, serialize=True),
    ])

    book = book_instance.deserialize()

    assert book.name == 'my book (456)'
    assert book.author.name == 'the author'
    assert set(book.tags.values_list('name', flat=True)) == {'tag1', 'tag2', 'the tag'}

Other goodies
+++++++++++++
Serializing the graph
---------------------
What? Serializing the graph? But we just did it above!!
No, we serialized the instances of the graph, but this is about serializing a ``Graph`` object.
What is the purpose of this?
Let's imagine you want to duplicate a graph many times and, because it may be costly, you do it
asynchronously by using, for example, celery.
But before doing the duplication, which is a deserialization, you must fill the graph, which is
done by fetching data from the database.
What if we could serialize the state of the graph and simply store it, pass it around, or do
whatever we want with it?
Thanks to ``pickle``, we can: the ``Graph`` and ``Instance`` objects are "pickle-ready".
So, to serialize a ``Graph`` object:
.. code-block:: python
# Reset the instance class used by the book as it cannot be pickled from the readme file!
Book.graph_instance_class = None
import pickle
graph = Graph()
author = Author.objects.create(name='john')
book = Book.objects.create(name='my book', author=author)
book.tags.set([Tag.objects.create(name='tag1'), Tag.objects.create(name='tag2')])
instance = Instance(book, serialize=True, graph=graph)
pickled_graph = pickle.dumps(graph)
# later, we get back the pickled graph, and we must also have the book's primary key
book_pk = book.pk # in practice, we get it another way
graph = pickle.loads(pickled_graph)
book_instance = graph.get_instance(Book, book_pk)
assert book_instance.fields['name'] == 'my book'
author_instance = list(book_instance.get_relation_targets('author'))[0]
assert author_instance.fields['name'] == 'john'
tag_instances = book_instance.get_m2m_relation_targets('tags')
assert set(t.fields['name'] for t in tag_instances) == {'tag1', 'tag2'}
If you want to add some data to the serialized graph, simply override the ``__getstate__`` and
``__setstate__`` methods in your subclass.
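As a rough illustration of the two pickle hooks involved, the sketch below uses a stand-in class,
``ToyGraph``, rather than the real ``Graph`` (whose internal state is not described here). The
``label`` attribute and the ``extra`` state key are hypothetical names chosen for the example.

.. code-block:: python

    import pickle


    class ToyGraph:
        """Stand-in for a graph class; NOT the real ``Graph`` implementation."""

        def __init__(self):
            self.instances = {}

        def __getstate__(self):
            # State the base class chooses to pickle
            return {'instances': self.instances}

        def __setstate__(self, state):
            self.instances = state['instances']


    class LabelledGraph(ToyGraph):
        """Subclass adding one extra piece of data to the pickled state."""

        def __init__(self, label=None):
            super().__init__()
            self.label = label

        def __getstate__(self):
            state = super().__getstate__()
            state['extra'] = {'label': self.label}  # hypothetical extra data
            return state

        def __setstate__(self, state):
            extra = state.pop('extra', {})
            super().__setstate__(state)
            self.label = extra.get('label')


    labelled = LabelledGraph(label='books of john')
    labelled.instances['some-uuid'] = {'name': 'my book'}
    restored = pickle.loads(pickle.dumps(labelled))

The important part is that each override calls ``super()`` first, so the state handled by the
parent class is left untouched and the extra data travels next to it.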
Cloning a graph
---------------
Say you created a graph with objects from the database, and you want to manipulate it while
still keeping the original graph.
For this, you can call the ``clone`` method of a ``Graph`` object, which creates a new graph
holding the same instances and relations as the original one. Then, you can keep the original and
update the clone (or the reverse if you want).
.. code-block:: python
cloned_graph = graph.clone()
assert cloned_graph is not graph
assert {inst.uuid for inst in cloned_graph.instances.values()} == \
{inst.uuid for inst in graph.instances.values()}
book_instance = cloned_graph.get_instance(Book, book_pk)
assert book_instance.fields['name'] == 'my book'
author_instance = list(book_instance.get_relation_targets('author'))[0]
assert author_instance.fields['name'] == 'john'
tag_instances = book_instance.get_m2m_relation_targets('tags')
assert set(t.fields['name'] for t in tag_instances) == {'tag1', 'tag2'}
Cloning uses ``pickle`` under the hood, so overriding ``__getstate__`` and ``__setstate__`` in your
``Graph`` subclass, as described above, also lets you add more information to the cloned data.
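The general technique of cloning through a pickle round trip can be sketched with the standard
library alone. ``ToyGraph`` below is a stand-in written for this explanation, and its ``clone``
method is only an assumption about the approach: the real ``Graph.clone`` may do more than this.

.. code-block:: python

    import pickle


    class ToyGraph:
        """Stand-in class, not the library's ``Graph``."""

        def __init__(self):
            self.instances = {}   # uuid -> dict of simple field values
            self.relations = {}   # uuid -> {accessor name: set of uuids}

        def clone(self):
            # A pickle round trip yields a deep, independent copy
            return pickle.loads(pickle.dumps(self))


    original = ToyGraph()
    original.instances['book-uuid'] = {'name': 'my book'}
    original.relations['book-uuid'] = {'author': {'author-uuid'}}

    cloned = original.clone()
    cloned.instances['book-uuid']['name'] = 'renamed'

Changing the clone leaves the original untouched, because nothing is shared between the two
objects after the round trip.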
About through models
--------------------
We talked a lot about the "through" model, which is the model sitting between the two sides of a
"many to many" relation.
When this "through" model is manually created and defined in the ``ManyToManyField``, you already
have all the information you may need.
But for auto-created "through" models, it's not so easy.
For this, and because we use this a lot in the code, we provide two utility functions in
``django_instances_graph.utils``:
- ``is_through_model(source_model, accessor_name, through_model)``
This function simply tells if a given model is the "through" model for a relation:
.. code-block:: python
from django.apps import apps
from django_instances_graph.utils import is_through_model
# Get the "through" model from django
books_tags_through_model = apps.get_model('tests', 'Book_tags')
# In the normal direction
assert is_through_model(Book, 'tags', Author) is False
assert is_through_model(Book, 'tags', books_tags_through_model) is True
# But also in the reverse direction
assert is_through_model(Tag, 'books', Author) is False
assert is_through_model(Tag, 'books', books_tags_through_model) is True
- ``get_through_info(model, accessor_name)``
This function returns some useful information about the "through" model for the given many-to-many
relation:
.. code-block:: python
from django_instances_graph.utils import get_through_info
through_model, through_source_field, through_target_field, target_model, \
target_accessor_name = get_through_info(Book, 'tags')
assert through_model is books_tags_through_model
# the field on the through model which links to the source model (passed in argument)
assert through_source_field.name == 'book'
# the field on the through model which links to the target model (on the other side of the m2m)
assert through_target_field.name == 'tag'
# the django model on the other side of the m2m relation
assert target_model is Tag
# the name of the attribute on the target model to access the m2m field
assert target_accessor_name == 'books'
For ``through_source_field`` and ``through_target_field``, use ``.name`` to get the accessor
name from the "through" model to the model, as seen in the example above, and
``.remote_field.get_accessor_name()`` to get the accessor name from the model to the "through" model:
.. code-block:: python
assert through_source_field.remote_field.get_accessor_name() == 'Book_tags+'
assert through_target_field.remote_field.get_accessor_name() == 'Book_tags+'
As you can see, this name ends with ``+``, so you cannot use it to access "through" entries
from the django model instances (this is only true for auto-created "through" models, though), but
you can use it to get relations from the ``Instance`` objects:
.. code-block:: python
through_instances = list(book_instance.get_relation_targets('Book_tags+'))
assert through_instances[0].model is books_tags_through_model
tag_instance = list(tag_instances)[0]
through_instances = list(tag_instance.get_relation_targets('Book_tags+'))
assert through_instances[0].model is books_tags_through_model
Of course all of this works on the other side too:
.. code-block:: python
through_model, through_source_field, through_target_field, target_model, \
target_accessor_name = get_through_info(Tag, 'books')
assert through_model is books_tags_through_model
assert through_source_field.name == 'tag'
assert through_target_field.name == 'book'
assert target_model is Book
assert target_accessor_name == 'tags'
Extensions
==========
Some extensions are currently being written. They will all be available as python packages
prefixed with ``dig-`` (d.i.g for Django Instances Graph).
What you'll soon be able to use:
- ``dig-duplicate``
an extension of what is currently possible about duplicating a graph, but with references to
parent objects, and merge capabilities between two duplicates
- ``dig-visualization``
an extension to have a visual representation of a graph, using ``graphviz``
Installation
============
The ``django_instances_graph`` package is only available on the Magency private pypi server.
.. code-block:: sh
pip install -i https://login:password@pypi.magency.ninja/some/index django_instances_graph
Development
===========
The code can be found on the `Magency mtp-back-modules repository
<https://gitlab.com/magency/products/mtp-back-modules/-/tree/master/django-instances-graph>`_
Install the required packages:
.. code-block:: sh
pip install -i https://login:password@pypi.magency.ninja/some/index -e .[dev]
To run tests, simply launch the ``runtests.sh`` script.
And for pylint:
.. code-block:: sh
PYTHONPATH="$PYTHONPATH:." pylint django_instances_graph tests
When ready, update the version in ``setup.cfg`` then create the package:
.. code-block:: sh
./setup.py sdist bdist_wheel
You can now upload it to ``devpi``:
.. code-block:: sh
devpi use https://login:password@pypi.magency.ninja
devpi login yourlogin
devpi use yourlogin/dev
devpi upload dist/django_instances_graph-VERSION*
Support
=======
python>=3.6
django>=2.2
django_instances_graph
======================
Purpose
=======
The purpose of the ``django_instances_graph`` library is to create a graph of django model instances
with their relations. This graph can then be serialized via pickle, updated manually, and/or used
to create a duplicate of the data used to fill it.
How it works
============
The whole logic is contained in two classes, explicitly named ``Graph`` and ``Instance``.
A few points to see how these two classes are tied together:
- every ``Instance`` has a ``uuid`` (which is NOT the primary key of the tied django model instance)
- every ``Instance`` holds all its simple values in its ``fields`` attribute (a dict)
- a ``Graph`` has an ``instances`` attribute, holding all its instances. It's a dict with the
``uuid`` as keys, and the matching ``Instance`` as values.
- a ``Graph`` has a ``relations`` attribute.
What we call a relation in this library is a direct link between two instances.
This is closer to the way a database works than to the way Django presents it: there is no notion of
"many to many" here, because a "many to many" relation is simply a set of entries in a "through"
table, each entry being linked to one entry on each side of the "many to many".
To summarize, all relations in the database, in Django, and therefore in this library, are "foreign keys".
So we store the relations in a dict whose keys are the uuids of the instances declaring the
relation. Each value is itself a dict whose keys are the names of the foreign key relations,
and whose values are sets holding the uuids of the instances on the other side of the relation.
To make this complete and more usable, we also store the relation on the other side.
Creating a graph is as simple as creating an instance and asking the graph to "serialize" it, as
shown in later sections.
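To picture the storage of relations described in this section, here is a small sketch using plain
dicts and sets. It only mirrors the shape of the data and is not the library's code: the
``add_relation`` helper and the reverse accessor name ``'books'`` are assumptions made for the
example.

.. code-block:: python

    from collections import defaultdict
    from uuid import uuid4

    # relations[source_uuid][accessor_name] -> set of target uuids
    relations = defaultdict(lambda: defaultdict(set))


    def add_relation(source, accessor_name, target, reverse_name):
        """Store the link on the declaring side and on the other side."""
        relations[source][accessor_name].add(target)
        relations[target][reverse_name].add(source)


    book_uuid, author_uuid = uuid4(), uuid4()
    add_relation(book_uuid, 'author', author_uuid, reverse_name='books')

With both directions stored, finding the author of a book and finding the books of an author are
the same kind of lookup.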
THE API
=======
For the examples, we will use these models:
.. testsetup::
# This prepares the django environment to run all the "testcode" in this file
# This is not visible in the rendered README
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'tests.settings'
import django
django.setup()
from django.db import connection
connection.creation.create_test_db(verbosity=0)
from tests.models import Author, Book, Tag, Translation
.. code:: python

    from django.db import models


    class Tag(models.Model):
        name = models.CharField(max_length=10)


    class Author(models.Model):
        name = models.CharField(max_length=10)


    class Book(models.Model):
        name = models.CharField(max_length=10)
        author = models.ForeignKey(Author, related_name='books', on_delete=models.CASCADE)
        tags = models.ManyToManyField(Tag, related_name='books')
        translator = models.ManyToManyField(Author, through="Translation",
                                            related_name="translated_books")


    class Translation(models.Model):
        book = models.ForeignKey(Book, related_name="translations", on_delete=models.CASCADE)
        author = models.ForeignKey(Author, related_name="translations", on_delete=models.CASCADE)
        lang = models.CharField(max_length=2)

Also note that all the examples in this documentation are guaranteed to work: they are tested
in the ``tests/test_readme_examples.py`` file, and they can be run one after the other.
Creating objects
++++++++++++++++
Creating a graph
----------------
This is as simple as:
.. code-block:: python
from django_instances_graph import Graph
graph = Graph()
The ``Graph`` constructor doesn't expect any argument.
Creating an instance
--------------------
First thing to know: if you don't pass an existing graph to the ``Instance`` constructor, a new one
will be created, and will be accessible from the ``graph`` attribute of the created ``Instance``.
So when creating many ``Instance`` objects, be careful to always pass a ``Graph``, ideally one
created beforehand.
The main argument to the ``Instance`` constructor is the ``source``.
You can pass it a Django model:
.. code-block:: python
from django_instances_graph import Instance
author_instance = Instance(Author) # or ``source=Author``
assert author_instance.pk is None
In this case no primary key (accessible via the ``pk`` attribute of the created ``Instance``) will
be saved in the new object.
You can also pass it a Django model instance that is not in the database, and it will have the same
effect as passing a model (except for one big detail that we will see later):
.. code-block:: python
from django_instances_graph import Instance
author = Author(name='john')
author_instance = Instance(author) # or ``source=author``
assert author_instance.pk is None
And of course, you can pass a Django model instance from the database:
.. code-block:: python
from django_instances_graph import Instance
author = Author.objects.create(name='john')
author_instance = Instance(author) # or ``source=author``
assert author_instance.pk == author.pk
In this case, the primary key is saved in the newly created object.
Retrieving an instance from the graph
-------------------------------------
Each instance is created with a ``UUID`` (version 4).
Then you can retrieve the instance using it:
.. code-block:: python
from django_instances_graph import Graph, Instance
graph = Graph()
author = Author.objects.create(name='john')
author_instance = Instance(author, graph=graph)
author_uuid = author_instance.uuid
# later
author_instance = graph.get_instance(author_uuid)
assert author_instance.pk == author.pk
Note that you can set the uuid yourself:
.. code-block:: python
from uuid import uuid4
from django_instances_graph import Graph, Instance
graph = Graph()
author_uuid = uuid4()
author = Author.objects.create(name='john')
author_instance = Instance(author, graph=graph, uuid=author_uuid)
assert author_instance.uuid == author_uuid
If the instance has a pk, you can also retrieve it via its model + pk:
.. code-block:: python
from django_instances_graph import Graph, Instance
graph = Graph()
author = Author.objects.create(name='john')
author_instance = Instance(author, graph=graph)
author_uuid = author_instance.uuid
# later
author_instance = graph.get_instance(Author, author.pk)
assert author_instance.uuid == author_uuid
Trying to get an instance that does not exist in the graph will raise a ``KeyError``.
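One way to picture these two lookup styles is a registry keeping two indexes. The ``ToyRegistry``
class below is a hypothetical stand-in written for this explanation; it does not reflect how
``Graph`` is actually implemented.

.. code-block:: python

    from uuid import uuid4


    class ToyRegistry:
        """Hypothetical two-index registry: by uuid, and by (model name, pk)."""

        def __init__(self):
            self.by_uuid = {}
            self.by_model_pk = {}

        def add(self, model_name, pk=None):
            uuid = uuid4()
            self.by_uuid[uuid] = (model_name, pk)
            if pk is not None:
                # Only entries with a primary key can be found by model + pk
                self.by_model_pk[(model_name, pk)] = uuid
            return uuid

        def get(self, *args):
            # One argument: a uuid. Two arguments: model name and pk.
            uuid = args[0] if len(args) == 1 else self.by_model_pk[args]
            return self.by_uuid[uuid]  # raises KeyError when unknown


    registry = ToyRegistry()
    author_uuid = registry.add('Author', pk=1)

    try:
        registry.get(uuid4())
        missing_raised = False
    except KeyError:
        missing_raised = True

An entry without a primary key is only reachable by its uuid, which matches the "if the instance
has a pk" condition stated above.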
Checking if an instance exists in the graph
-------------------------------------------
You can do the check by uuid or by model + pk:
.. code-block:: python
from uuid import uuid4
from django_instances_graph import Graph, Instance
graph = Graph()
uuid = uuid4()
assert graph.has_instance(uuid) is False
author = Author.objects.create(name='john')
author_instance = Instance(author, graph=graph, uuid=uuid)
assert author_instance.uuid == uuid
assert graph.has_instance(uuid) is True
assert graph.has_instance(Author, author.pk) is True
Saving simple fields in instances
---------------------------------
Each ``Instance`` object has a ``fields`` dictionary to hold the values of its simple fields (simple
fields are all the fields that are not relations to another model: ``CharField``, ``IntegerField``...):
.. code-block:: python
from django_instances_graph import Graph, Instance
graph = Graph()
author = Instance(Author, graph=graph)
author.fields['name'] = 'john'
assert graph.get_instance(author.uuid).fields['name'] == 'john'
We'll see later that these fields can be automatically filled during the serializing process.
Adding a simple relation
------------------------
The serializing process will add the relations itself, but you can also create them manually.
Remember that a relation is a link between two instances, via a ``ForeignKey`` or a
``OneToOneField`` (which is a sort of ``ForeignKey``).
For example we have a ``ForeignKey`` between ``Book`` and ``Author``, so we can do this:
.. code-block:: python
from django_instances_graph import Graph, Instance
graph = Graph()
author = Instance(Author, graph=graph)
book = Instance(Book, graph=graph)
book.add_relation('author', author)
# or book.add_relation(accessor_name='author', target=author)
# or graph.add_relation(book, 'author', author)
# or graph.add_relation(source=book, accessor_name='author', target=author)
The ``add_relation`` method checks that ``author`` is a field of the correct type, and that the
target is from the expected model (otherwise a ``ValueError`` is raised).
This relation could have been added in the opposite way (using the ``related_name`` as the accessor
name):
.. code-block:: python
author.add_relation('books', book)
Adding a M2M relation
---------------------
To add a ``ManyToMany`` relation, we have to distinguish two cases: either the relation has an
auto-created "through" model, or not.
In the first case it's easy, because the ``Instance`` class has an ``add_direct_m2m_relation``
method, used this way:
.. code-block:: python
tag1 = Instance(Tag, graph=graph)
tag2 = Instance(Tag, graph=graph)
book.add_direct_m2m_relation('tags', [tag1, tag2])
# or book.add_direct_m2m_relation(accessor_name='tags', targets=[tag1, tag2])
# or graph.add_direct_m2m_relation(source=book, accessor_name='tags', targets=[tag1, tag2])
When a ``ManyToMany`` "entry" is created, what happens in Django is that there is a "through"
model in the middle that has two ``ForeignKey`` fields, one for each side: in our case, one to the
``Book`` model and one to the ``Tag`` model.
The ``add_direct_m2m_relation`` creates the relation in the graph from this "through" model to the
source and the targets. So in our example, we have two "through" entries and it will create 4
relations:
- "through1" to "book"
- "through1" to "tag1"
- "through2" to "book"
- "through2" to "tag2"
Note that the ``add_direct_m2m_relation`` method also returns ``Instance`` objects of the "through"
model, one for each target, in the same order. And it's also important to know that it will not
replace the existing relations, but will just add the specified ones.
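The expansion of a direct many-to-many into "through" entries can be sketched with plain data
structures. This is an illustration only, not the library's code: the ``add_direct_m2m`` helper
and the way entries are stored are assumptions made for the example.

.. code-block:: python

    from uuid import uuid4

    relations = {}  # through uuid -> {field name: target uuid}


    def add_direct_m2m(source, source_field, targets, target_field):
        """Create one "through" entry per target, each with two foreign keys."""
        through_uuids = []
        for target in targets:
            through = uuid4()
            relations[through] = {source_field: source, target_field: target}
            through_uuids.append(through)
        return through_uuids  # same order as ``targets``


    book, tag1, tag2 = uuid4(), uuid4(), uuid4()
    through1, through2 = add_direct_m2m(book, 'book', [tag1, tag2], 'tag')

Two targets give two "through" entries and four foreign key links in total, as in the list above.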
This is more complicated when the "through" model is manually defined in the definition of the
``ManyToManyField`` because there are, in general, additional fields, and the graph cannot "guess"
them, so the whole work has to be done manually.
We can see this in an example with the ``translator`` ``ManyToManyField``:
.. code-block:: python
translator1 = Instance(Author, graph=graph)
translator2 = Instance(Author, graph=graph)
# for book => translator1
translation1 = Instance(Translation, graph=graph)
translation1.add_relation('book', book)
translation1.add_relation('author', translator1)
# for book => translator2
translation2 = Instance(Translation, graph=graph)
translation2.add_relation('book', book)
translation2.add_relation('author', translator2)
Retrieving relations from the graph
-----------------------------------
What we want is not to retrieve the relation itself, but the targets of the relation:
To get the author of the book:
.. code-block:: python
author2 = list(book.get_relation_targets('author'))[0]
# or author2 = list(graph.get_relation_targets(source=book, accessor_name='author'))[0]
assert author2 is author
Of course here we have a ``ForeignKey`` so we *should* only have one entry. But we could also have
zero, and in this case an ``IndexError`` will be raised.
It is also possible to get all the books for an author:
.. code-block:: python
books = author.get_relation_targets('books')
assert books == {book}
Here we can see why this method returns a collection (in fact it's a ``set``, so it is not
ordered).
Note that what is returned are ``Instance`` objects, not instances of the django model.
And to retrieve the targets of a ``ManyToMany``:
.. code-block:: python
tags = book.get_m2m_relation_targets('tags')
# or tags = graph.get_m2m_relation_targets(source=book, accessor_name='tags')
assert tags == {tag1, tag2}
And to get all the books for a tag:
.. code-block:: python
books = tag1.get_m2m_relation_targets('books')
assert books == {book}
Unlike ``add_direct_m2m_relation``, this method works for both auto-created "through" models and
manually defined ones, because we only want the targets (this is why there is no "direct" in the
name of this method).
If a manual "through" model was defined, use the ``get_relation_targets`` method to get the
"through" entries. With our previous example, it would be:
.. code-block:: python
translations = book.get_relation_targets('translations')
assert translations == {translation1, translation2}
Removing relations from the graph
---------------------------------
To remove a direct relation:
.. code-block:: python
book.remove_relation('author', author)
# or graph.remove_relation(book, 'author', author)
assert book.get_relation_targets('author') == set()
And for a ``ManyToMany``:
.. code-block:: python
book.remove_m2m_relation('tags', [tag1, tag2])
# or graph.remove_m2m_relation(book, 'tags', [tag1, tag2])
assert book.get_m2m_relation_targets('tags') == set()
Note that this will not remove all the existing relations, but only the specified ones.
To remove all the relations, you can do, for a direct relation:
.. code-block:: python
book.add_relation('author', author) # just to have something
book.clear_relation('author')
# or graph.clear_relation(book, 'author')
assert book.get_relation_targets('author') == set()
And for a ``ManyToMany``:
.. code-block:: python
book.add_direct_m2m_relation('tags', [tag1, tag2]) # just to have something
book.clear_m2m_relation('tags')
# or graph.clear_m2m_relation(book, 'tags')
assert book.get_m2m_relation_targets('tags') == set()
``remove_m2m_relation`` and ``clear_m2m_relation`` accept a ``remove_through_instances`` argument,
``True`` by default, that removes from the graph the "through" entries of the removed relations.
It can be useful to set it to ``False`` with a manual "through" model when there are other fields or
other relations going from or to it.
Also note that these two methods return these "through" instances (even if they are removed from the
graph, they still exist as ``Instance`` objects).
Serialization
+++++++++++++
Serialization is the process of converting a django model instance into an ``Instance`` of a
graph, saving its fields and relations.
Serializing an instance
-----------------------
Now that we know how to create the graph, instances and relations manually, let's see how to do it
automatically.
First, we can serialize just an instance, not saved in database.
.. code-block:: python
from django_instances_graph import Instance
author = Author(name='john')
author_instance = Instance(author, serialize=True)
assert author_instance.fields['name'] == 'john'
assert author_instance.serialized is True
We can also do it in two steps if for example we used a model as the ``Instance`` source.
.. code-block:: python
from django_instances_graph import Instance
author_instance = Instance(Author)
author_instance.serialize(Author(name='john'))
assert author_instance.fields['name'] == 'john'
assert author_instance.serialized is True
Yes, ``serialize`` expects an instance of the django model, because when such an instance
is passed to the ``Instance`` constructor, it is *not* saved in the ``Instance`` object.
Serializing a graph
-------------------
Serializing an instance is ok, but it's not really what this library is about. We want to
serialize the whole graph, from a starting point.
Let's see how to auto-create the instances and relations from the database.
If we want all the objects related to a book in the database, it's as simple as setting the
``serialize`` argument to ``True`` when creating an instance:
.. code-block:: python
from django_instances_graph import Instance
author = Author.objects.create(name='author1')
tag1 = Tag.objects.create(name='tag1')
tag2 = Tag.objects.create(name='tag2')
tag3 = Tag.objects.create(name='tag3')
book1 = Book.objects.create(name='book 1', author=author)
book1.tags.set([tag1, tag2])
book2 = Book.objects.create(name='book 2', author=author)
book2.tags.set([tag1, tag3])
book_instance = Instance(book1, serialize=True)
assert book_instance.pk == book1.pk
assert book_instance.fields['name'] == 'book 1'
author_instance = list(book_instance.get_relation_targets('author'))[0]
assert author_instance.fields['name'] == 'author1'
tag_instances = book_instance.get_m2m_relation_targets('tags')
assert set(t.fields['name'] for t in tag_instances) == {'tag1', 'tag2'}
Note that we didn't set the ``graph`` argument, so we can get it back using ``book_instance.graph``.
But it could of course have been defined before and passed to the ``Instance`` constructor, as
seen before.
What is done by passing ``serialize=True``:
- all the simple fields are saved in ``book_instance.fields``
- all the relations from "book1" to any related model are created
- every related object gets its own ``Instance`` in the graph, which is serialized too, i.e. both its
  simple fields and its relations
- this is done recursively until there are no more relations to follow.
So we'll have the book, its author, and its tags. But we'll also have the other books of the author,
and their tags too, and all the books for all the tags.
Maybe that is what you want, but there is a good chance that it is not.
For this, let's introduce what we call "boundaries".
Defining boundaries
-------------------
In our example, we just want to serialize the book, its relations to an author and to its tags.
So, the boundaries are:
- the "author" ``ForeignKey`` from the ``Book`` model
- the "tag" ``ForeignKey`` from the "through" model between the ``Book`` and ``Tag`` models
To define boundaries, a new class inheriting from ``Graph`` must be defined, and its
``is_relation_boundary`` method must be overridden:
.. code-block:: python

    from django_instances_graph import Graph
    from django_instances_graph.utils import is_through_model


    class BookGraph(Graph):
        def is_relation_boundary(self, instance, accessor_name, field, field_type):
            # The book author is a boundary
            if instance.model is Book and accessor_name == 'author':
                return True
            # The tag of a book <=> tag through is a boundary. We don't block on the m2m field
            # because we don't want the through entries to be the boundaries, but the tags
            # book --- [not boundary ] --- through model ---- [ boundary] --- tag
            if is_through_model(Book, 'tags', instance.model) and accessor_name == 'tag':
                return True
            return False

Now we can do the serialization and check that the boundaries are correctly set:
.. code-block:: python

    graph = BookGraph()
    Instance(book1, serialize=True, graph=graph)
    assert graph.get_instance(Author, author.pk).is_boundary
    for tag in book1.tags.all():
        assert graph.get_instance(Tag, tag.pk).is_boundary

What is done when an ``Instance`` is marked as boundary:
- it has a ``is_boundary`` attribute set to ``True``
- the ``Instance`` is created in the graph, and if it is created automatically by the serialization
  of another model, only its simple fields are serialized
- if it is not serialized, no relations starting from it are created in the graph
This will be used in the deserializing process, for example when we want to duplicate a graph, as
we'll see below.
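The effect of boundaries on the recursive walk can be illustrated without Django. The sketch below
is a simplified model, not the library's algorithm: ``links`` is a hand-written adjacency map that
skips the "through" entries, and ``is_boundary`` is a hypothetical predicate standing in for
``is_relation_boundary``.

.. code-block:: python

    # node -> {accessor name: [related nodes]}
    links = {
        'book1': {'author': ['author1'], 'tags': ['tag1', 'tag2']},
        'author1': {'books': ['book1', 'book2']},
        'book2': {'author': ['author1'], 'tags': ['tag1', 'tag3']},
        'tag1': {'books': ['book1', 'book2']},
        'tag2': {'books': ['book1']},
        'tag3': {'books': ['book2']},
    }


    def is_boundary(node, accessor_name):
        """Hypothetical rule: stop after a book's author and after its tags."""
        return node.startswith('book') and accessor_name in ('author', 'tags')


    def collect(start, use_boundaries):
        seen, boundaries, todo = {start}, set(), [start]
        while todo:
            node = todo.pop()
            for accessor_name, targets in links[node].items():
                for target in targets:
                    if target in seen:
                        continue
                    seen.add(target)
                    if use_boundaries and is_boundary(node, accessor_name):
                        boundaries.add(target)  # kept, but not followed further
                    else:
                        todo.append(target)
        return seen, boundaries


    everything, _ = collect('book1', use_boundaries=False)
    bounded, boundary_nodes = collect('book1', use_boundaries=True)

Without boundaries the walk starting from ``book1`` reaches all six nodes; with them it stops at
the author and the two tags of ``book1``, which are kept in the result but not explored.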
Note that when creating ``Instance`` objects manually (i.e. not by creating a single one and letting
the graph create the others during the serialization process), it is possible to mark them as
boundaries too:
.. code-block:: python
graph = BookGraph()
author_instance = Instance(Author, graph=graph, is_boundary=True)
assert graph.get_instance(author_instance.uuid).is_boundary
Deserialization
+++++++++++++++
The deserialization is the process of converting ``Instance`` objects of a graph, and their
relations, into real django model instances, saved in database.
Deserializing an instance
-------------------------
If the ``Instance`` objects have primary keys, the matching objects in the database are updated.
Otherwise, new objects are created.
With an instance that does not come from the database:
.. code-block:: python
graph = BookGraph()
author_instance = Instance(Author, graph=graph)
author_instance.fields['name'] = 'john'
author = author_instance.deserialize()
assert isinstance(author, Author)
assert author.pk is not None
assert author.name == 'john'
# The instance has a pk now
assert author_instance.pk == author.pk
Note that you can pass the django model instance that will hold the deserialized data:
.. code-block:: python
graph = BookGraph()
author_instance = Instance(Author, graph=graph)
author_instance.fields['name'] = 'john'
blank_author = Author()
author = author_instance.deserialize(blank_author)
assert isinstance(author, Author)
assert author is blank_author
assert author.pk is not None
assert author.name == 'john'
And now with an existing object from the database:
.. code-block:: python
graph = BookGraph()
original_author = Author.objects.create(name='john')
author_instance = Instance(original_author, graph=graph, serialize=True)
# later
author_instance.fields['name'] = 'peter'
author = author_instance.deserialize()
assert author.pk == original_author.pk
assert author.name == 'peter'
Deserializing a graph
---------------------
This is the most interesting part of this library. It allows, for example:
- to create objects and their relations first, then save everything in the database at the end
  (which is not possible with django model instances, as the instances must be saved before
  relations can be created between them)
- to extract some data from the database and duplicate it
Start by creating some instances, not in the database, and their relations:
.. code-block:: python
graph = BookGraph()
# Two things to notice:
# - We pass ``serialize=True`` to save the fields
# - We set the boundaries manually as the boundaries can only be defined automatically
# in the full serialization process from django model instances, which we don't have here
author_instance = Instance(Author(name='john'), graph=graph, serialize=True,
is_boundary=True)
book_instance = Instance(Book(name='my book'), graph=graph, serialize=True)
tag1_instance = Instance(Tag(name='tag1'), graph=graph, serialize=True, is_boundary=True)
tag2_instance = Instance(Tag(name='tag2'), graph=graph, serialize=True, is_boundary=True)
book_instance.add_relation('author', author_instance)
book_instance.add_direct_m2m_relation('tags', [tag1_instance, tag2_instance])
Now we can deserialize the whole graph by simply deserializing one ``Instance``:
.. code-block:: python
book = book_instance.deserialize()
assert book.author.name == 'john'
assert set(book.tags.values_list('name', flat=True)) == {'tag1', 'tag2'}
It's done, the whole graph is saved in database.
Duplicating a graph
-------------------
Duplicating a graph means creating new objects and relations in the database, identical to the ones
in the graph but, obviously, with different primary keys.
The ``Graph`` class provides a ``clear_pks`` method for this.
So, following the deserialization just above, we can do:
.. code-block:: python
graph.clear_pks()
book2 = book_instance.deserialize()
assert book2.pk != book.pk
assert book2.author.name == 'john'
assert set(book2.tags.values_list('name', flat=True)) == {'tag1', 'tag2'}
# 1 author for both, because as a boundary the author is not deserialized if it exists in db
assert book2.author_id == book.author_id
# and same for the tags
book1_tags = set(book.tags.values_list('pk', flat=True))
book2_tags = set(book2.tags.values_list('pk', flat=True))
assert book1_tags == book2_tags
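The create-or-update rule and the role of ``clear_pks`` can be pictured with a toy in-memory
table. Everything below is an assumption made for illustration (the ``ToyInstance`` class, the
``save`` and ``clear_pks`` helpers, and the choice to keep the primary keys of boundary
instances); it is not the library's implementation.

.. code-block:: python

    import itertools

    table = {}                    # pk -> field values, a stand-in for a db table
    next_pk = itertools.count(1)


    class ToyInstance:
        def __init__(self, fields, is_boundary=False):
            self.pk = None
            self.fields = fields
            self.is_boundary = is_boundary


    def save(instance):
        """Create a row when there is no pk, update the existing row otherwise."""
        if instance.pk is None:
            instance.pk = next(next_pk)
        table[instance.pk] = dict(instance.fields)
        return instance.pk


    def clear_pks(instances):
        """Assumption: boundary instances keep their pk so they are reused."""
        for instance in instances:
            if not instance.is_boundary:
                instance.pk = None


    author = ToyInstance({'name': 'john'}, is_boundary=True)
    book = ToyInstance({'name': 'my book'})
    first_pks = (save(author), save(book))

    clear_pks([author, book])
    second_pks = (save(author), save(book))

In this toy model the second save creates a new book row while the author row is reused, which is
the behaviour checked by the assertions in the example above.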
Note that between the call to ``clear_pks`` and ``deserialize``, it is possible to update the graph.
For example:
.. code-block:: python
graph.clear_pks()
# Change a field
book_instance.fields['name'] = 'new book'
# Remove a relation
book_instance.remove_m2m_relation('tags', [tag2_instance])
# And add another
tag3_instance = Instance(Tag(name='tag3'), graph=graph, serialize=True, is_boundary=True)
book_instance.add_direct_m2m_relation('tags', [tag3_instance])
book3 = book_instance.deserialize()
assert book3.name == 'new book'
assert book3.author.name == 'john'
assert set(book3.tags.values_list('name', flat=True)) == {'tag1', 'tag3'}
Overriding
++++++++++
There are two kinds of overriding covered in this section: class inheritance, to override
the ``Graph`` and ``Instance`` classes, and changing the values and relations saved in the graph
during the serialization or retrieved from it during the deserialization.
Class inheritance
-----------------
Inherit from Graph
^^^^^^^^^^^^^^^^^^
You can easily inherit from the ``Graph`` class if you want to change its behaviour. But in this
case, don't forget to always create your ``Graph`` instance manually and pass it to each
``Instance``, because an instance that has no ``graph`` passed to its constructor will create one
using the default ``Graph`` class.
What you can do by creating your own ``Graph`` subclass:
- define a default class to use for instances, instead of ``Instance`` (which is the default):

  .. code-block:: python

      from django_instances_graph import Graph, Instance

      class MyDefaultInstance(Instance):
          pass

      class MyGraph(Graph):
          default_instance_class = MyDefaultInstance

      graph = MyGraph()
      instance = Instance(Author, graph=graph)
      assert isinstance(instance, MyDefaultInstance)

- define which subclass of ``Instance`` to use for the model you want. If there is no ``Instance``
  class defined for a model, the one defined in ``default_instance_class`` will be used.

  .. code-block:: python

      class AuthorInstance(Instance):
          pass

      class BookInstance(Instance):
          pass

      class MyGraph(Graph):
          default_instance_class = MyDefaultInstance
          instance_classes_for_models = {
              Author: AuthorInstance,
              Book: BookInstance,
          }

      graph = MyGraph()
      book_instance = Instance(Book, graph=graph)
      assert isinstance(book_instance, BookInstance)
      author_instance = Instance(Author, graph=graph)
      assert isinstance(author_instance, AuthorInstance)
      tag_instance = Instance(Tag, graph=graph)
      assert isinstance(tag_instance, MyDefaultInstance)

Another way to do this, if you "own" your models, is to add a ``graph_instance_class`` attribute
to your model, setting it to the subclass of ``Instance`` you want. Of course, you cannot do this on
models you don't own (i.e. from external applications).
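
The three ways of choosing an instance class can be pictured with the following self-contained sketch. It is not the library's code: ``ToyInstance``, ``ToyGraph`` and ``get_instance_class`` are hypothetical names, and the lookup order (graph mapping first, then the model attribute, then the default class) is an assumption made for illustration only.

```python
class ToyInstance:
    """Hypothetical stand-in for ``Instance``, dispatching to a subclass in ``__new__``."""

    def __new__(cls, model, graph):
        # Only dispatch when the base class itself is called
        if cls is ToyInstance:
            cls = graph.get_instance_class(model)
        return super().__new__(cls)

    def __init__(self, model, graph):
        self.model = model
        self.graph = graph


class ToyGraph:
    """Hypothetical stand-in for ``Graph``."""

    default_instance_class = ToyInstance
    instance_classes_for_models = {}

    def get_instance_class(self, model):
        # Assumed order: per-graph mapping, then model attribute, then default
        mapped = self.instance_classes_for_models.get(model)
        if mapped is not None:
            return mapped
        return getattr(model, 'graph_instance_class', None) or self.default_instance_class


# Plain classes standing in for django models
class Author: pass
class Book: pass
class Tag: pass

class MyDefaultInstance(ToyInstance): pass
class AuthorInstance(ToyInstance): pass
class BookInstance(ToyInstance): pass

# Attribute set on a model we "own"
Book.graph_instance_class = BookInstance

class MyGraph(ToyGraph):
    default_instance_class = MyDefaultInstance
    instance_classes_for_models = {Author: AuthorInstance}

graph = MyGraph()
author_instance = ToyInstance(Author, graph)  # found in the graph mapping
book_instance = ToyInstance(Book, graph)      # found on the model attribute
tag_instance = ToyInstance(Tag, graph)        # falls back to the default class
```

Dispatching in ``__new__`` is what lets a call to the base class return a subclass, which matches the behaviour shown in the examples above.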
Inherit from Instance
^^^^^^^^^^^^^^^^^^^^^
You may want to inherit from ``Instance`` for example to change the ``__repr__`` method, or to
override the ``override_value_to_serialize`` and ``override_value_to_deserialize`` methods, as we'll
see in the next sub-section.
Value overriding
----------------
(For a full example of how to override the serialized or deserialized value for all kinds of fields,
you can check the ``OverrideInstance`` class in the ``tests/models.py`` file)
Serialization
^^^^^^^^^^^^^
During serialization, you may want to change on the fly the simple values saved in the
``fields`` attribute of an ``Instance``, or its relations to other objects.
You have one entry point for this on the ``Instance`` class, where you can add your own logic in a
subclass. You return simple values for simple fields, and a django model instance (or list of) for
relations (because when serializing, we convert values from the django model instances to our own
``Instance`` objects, and here we just intercept the values from the django model instances).
Here is an example if we want to change a simple value, a relation, and a many-to-many relation:

.. code-block:: python

    from django_instances_graph import Instance
    from django_instances_graph.utils import get_through_info

    class BookInstance(Instance):

        def override_value_to_serialize(self, model_instance, field, accessor_name, field_type,
                                        value):
            # ``value`` is the value got from the serialized book (accessible via
            # ``model_instance``), but you can also get it by calling ``super``:
            value = super().override_value_to_serialize(model_instance, field, accessor_name,
                                                        field_type, value)

            # For a simple field
            if accessor_name == 'name':
                # Add a number to the book's name
                return '%s (%s)' % (value, 123)

            # For a simple relation
            elif accessor_name == 'author':
                # Change the author
                value = Author(name='new author')

            # For a many-to-many
            elif accessor_name == 'tags':
                # Add a new tag: a list of instances from the "through" model is expected
                through_model = get_through_info(Book, 'tags')[0]
                value = list(value) + [
                    through_model(
                        book=model_instance,
                        tag=Tag(name='new tag'),
                    )
                ]

            # always return the original value if you don't change it
            return value

    # This should be defined in the model declaration
    Book.graph_instance_class = BookInstance

    author = Author.objects.create(name='john')
    book = Book.objects.create(name='my book', author=author)
    # Direct assignment to a many-to-many is not allowed on the supported django versions
    book.tags.set([Tag.objects.create(name='tag1'), Tag.objects.create(name='tag2')])

    instance = Instance(book, serialize=True)
    assert instance.fields['name'] == 'my book (123)'
    assert list(instance.get_relation_targets('author'))[0].fields['name'] == 'new author'
    tag_instances = instance.get_m2m_relation_targets('tags')
    assert set(tag.fields['name'] for tag in tag_instances) == {'tag1', 'tag2', 'new tag'}

Don't forget to also do it on reverse relations to avoid surprises. For example if you return a
different value for a ``OneToOneField``, but the original instance for this field is serialized,
the final relation may not be the one you expect.
Deserialization
^^^^^^^^^^^^^^^
Deserialization is the process of converting ``Instance`` objects and their relations from a
``Graph`` into real django model instances.
There is an entry point on the ``Instance`` class where you can change on the fly the values and
relations that will be used instead of the ones in the ``Graph``.
Note that the expected return value of this method is simple values for simple fields, and
``Instance`` objects for targets of relations.
Here is an example if we want to change a simple value, a relation, and a many-to-many relation:

.. code-block:: python

    from django_instances_graph import Graph, Instance
    from django_instances_graph.utils import get_through_info

    class BookInstance(Instance):

        def override_value_to_deserialize(self, field, accessor_name, field_type, value,
                                          model_instance):
            # ``value`` is the value got from the deserialized book (accessible via
            # ``self``), but you can also get it by calling ``super``:
            value = super().override_value_to_deserialize(field, accessor_name, field_type, value,
                                                          model_instance)

            # For a simple field
            if accessor_name == 'name':
                # Add a number to the book's name
                return '%s (%s)' % (value, 456)

            # For a simple relation
            elif accessor_name == 'author':
                self.clear_relation('author')  # it's important, but note that it changes the graph
                value = Instance(Author(name='the author'), graph=self.graph, serialize=True)

            # For a many-to-many
            elif accessor_name == 'tags':
                # Add a new tag: a list of ``Instance`` from the "through" model is expected
                value = value + self.add_direct_m2m_relation('tags', [
                    Instance(Tag(name='the tag'), graph=self.graph, serialize=True),
                ])

            # always return the original value if you don't change it
            return value

    # This should be defined in the model declaration
    Book.graph_instance_class = BookInstance

    graph = Graph()
    book_instance = Instance(Book(name='my book'), graph=graph, serialize=True)
    author_instance = Instance(Author(name='john'), graph=graph, serialize=True)
    book_instance.add_relation('author', author_instance)
    book_instance.add_direct_m2m_relation('tags', [
        Instance(Tag(name='tag1'), graph=graph, serialize=True),
        Instance(Tag(name='tag2'), graph=graph, serialize=True),
    ])

    book = book_instance.deserialize()
    assert book.name == 'my book (456)'
    assert book.author.name == 'the author'
    assert set(book.tags.values_list('name', flat=True)) == {'tag1', 'tag2', 'the tag'}

Other goodies
+++++++++++++
Serializing the graph
---------------------
What? Serializing the graph? But we just did it above!!
No, we serialized the instances of the graph, but this is about serializing a ``Graph`` object.
What is the purpose of this?
Let's imagine you want to duplicate many times a graph, and because it may be costly, you do it
asynchronously by using, for example, celery.
But before doing the duplicate, which is a deserialization, you must fill the graph. Which is done
by fetching data from the database.
What if we could serialize the state of the graph and simply store it, or pass it around, or do
whatever we want with it?
Thanks to ``pickle``, we made the ``Graph`` and ``Instance`` objects "pickle-ready".
So, to serialize a ``Graph`` object:
.. code-block:: python
# Reset the instance class used by the book as it cannot be pickled from the readme file!
Book.graph_instance_class = None
import pickle
graph = Graph()
author = Author.objects.create(name='john')
book = Book.objects.create(name='my book', author=author)
book.tags.set([Tag.objects.create(name='tag1'), Tag.objects.create(name='tag2')])
instance = Instance(book, serialize=True, graph=graph)
pickled_graph = pickle.dumps(graph)
# later, we get back the pickled graph, and we must also have the book's primary key
book_pk = book.pk # in practice, we get it another way
graph = pickle.loads(pickled_graph)
book_instance = graph.get_instance(Book, book_pk)
assert book_instance.fields['name'] == 'my book'
author_instance = list(book_instance.get_relation_targets('author'))[0]
assert author_instance.fields['name'] == 'john'
tag_instances = book_instance.get_m2m_relation_targets('tags')
assert set(t.fields['name'] for t in tag_instances) == {'tag1', 'tag2'}
If you want to add some data to the serialized graph, simply override the ``__getstate__`` and
``__setstate__`` methods in your subclass.
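
As a minimal sketch of that pattern, the example below uses a hypothetical ``ToyGraph`` stand-in rather than the real ``Graph`` class (whose actual pickled state is not shown here), and a hypothetical ``label`` attribute as the extra data to carry along.

```python
import pickle


class ToyGraph:
    """Hypothetical stand-in for ``Graph`` with an explicit pickled state."""

    def __init__(self):
        self.instances = {}
        self.relations = {}

    def __getstate__(self):
        return {'instances': self.instances, 'relations': self.relations}

    def __setstate__(self, state):
        self.instances = state['instances']
        self.relations = state['relations']


class LabelledGraph(ToyGraph):
    """Subclass adding one extra piece of data to the pickled state."""

    def __init__(self, label=None):
        super().__init__()
        self.label = label

    def __getstate__(self):
        # Start from the parent's state and add our own key
        state = super().__getstate__()
        state['label'] = self.label
        return state

    def __setstate__(self, state):
        # Take our own key out before handing the rest to the parent
        state = dict(state)
        self.label = state.pop('label', None)
        super().__setstate__(state)


graph = LabelledGraph(label='nightly export')
graph.instances['some-uuid'] = {'name': 'my book'}

restored = pickle.loads(pickle.dumps(graph))
```

Calling ``super()`` in both methods keeps the subclass independent of what the parent class stores in its own state.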
Cloning a graph
---------------
Say you created a graph with objects from the database, and you want to update it to do some
manipulations while still keeping the original graph.
For this, you can call the ``clone`` method of a ``Graph`` object, that will create a new graph
keeping the same instances and relations as the original one. Then, you can keep the original and
update the clone (or the reverse if you want).
.. code-block:: python
cloned_graph = graph.clone()
assert cloned_graph is not graph
assert {inst.uuid for inst in cloned_graph.instances.values()} == \
{inst.uuid for inst in graph.instances.values()}
book_instance = cloned_graph.get_instance(Book, book_pk)
assert book_instance.fields['name'] == 'my book'
author_instance = list(book_instance.get_relation_targets('author'))[0]
assert author_instance.fields['name'] == 'john'
tag_instances = book_instance.get_m2m_relation_targets('tags')
assert set(t.fields['name'] for t in tag_instances) == {'tag1', 'tag2'}
Cloning uses ``pickle`` under the hood, so you can override ``__getstate__`` and ``__setstate__``,
as described above, to add more information to the cloned data.
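
The practical consequence of a pickle-based clone is that the copy is fully independent from the original. Here is a small sketch of the idea, using a hypothetical ``ToyGraph`` stand-in whose ``clone`` method is only assumed to resemble the library's.

```python
import pickle


class ToyGraph:
    """Hypothetical stand-in for ``Graph``, holding plain dicts."""

    def __init__(self):
        self.instances = {}  # uuid -> simple field values
        self.relations = {}  # uuid -> {relation name -> set of target uuids}

    def clone(self):
        # A pickle round-trip yields a deep, independent copy
        return pickle.loads(pickle.dumps(self))


graph = ToyGraph()
graph.instances['book-uuid'] = {'name': 'my book'}
graph.instances['author-uuid'] = {'name': 'john'}
graph.relations['book-uuid'] = {'author': {'author-uuid'}}

cloned = graph.clone()

# Updating the clone leaves the original untouched
cloned.instances['book-uuid']['name'] = 'renamed in the clone'
```

The clone keeps the same uuids and relations as the original, but changing a field in one graph does not affect the other.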
About through models
--------------------
We talked a lot about the "through" model, which is the model between the two sides of a
"many to many" relation.
When this "through" model is manually created and defined in the ``ManyToManyField``, you already
have all the information you may need.
But for auto-created "through" models, it's not so easy.
For this, and because we use this a lot in the code, we provide two utility functions in
``django_instances_graph.utils``:
- ``is_through_model(source_model, accessor_name, through_model)``
This function simply tells if a given model is the "through" model for a relation:
.. code-block:: python
from django.apps import apps
from django_instances_graph.utils import is_through_model
# Get the "through" model from django
books_tags_through_model = apps.get_model('tests', 'Book_tags')
# In the normal direction
assert is_through_model(Book, 'tags', Author) is False
assert is_through_model(Book, 'tags', books_tags_through_model) is True
# But also in the reverse direction
assert is_through_model(Tag, 'books', Author) is False
assert is_through_model(Tag, 'books', books_tags_through_model) is True
- ``get_through_info(model, accessor_name)``
This function returns some useful information about the "through" model for the given many-to-many
relation:
.. code-block:: python
from django_instances_graph.utils import get_through_info
through_model, through_source_field, through_target_field, target_model, \
target_accessor_name = get_through_info(Book, 'tags')
assert through_model is books_tags_through_model
# the field on the through model which links to the source model (passed in argument)
assert through_source_field.name == 'book'
# the field on the through model which links to the target model (on the other side of the m2m)
assert through_target_field.name == 'tag'
# the django model on the other side of the m2m relation
assert target_model is Tag
# the name of the attribute on the target model to access the m2m field
assert target_accessor_name == 'books'
For ``through_source_field`` and ``through_target_field``, use ``.name`` to get the accessor
name from the "through" model to the model, as seen in the example above, and
``.remote_field.get_accessor_name()`` to get the accessor name from the model to the "through" model:
.. code-block:: python
assert through_source_field.remote_field.get_accessor_name() == 'Book_tags+'
assert through_target_field.remote_field.get_accessor_name() == 'Book_tags+'
As you can see, this name ends with ``+``, so you cannot use it to access "through" entries
from the django model instances (it's only true for auto-created "through" models, though), but
you can use it to get relations from the ``Instance`` objects:
.. code-block:: python
through_instances = list(book_instance.get_relation_targets('Book_tags+'))
assert through_instances[0].model is books_tags_through_model
tag_instance = list(tag_instances)[0]
through_instances = list(tag_instance.get_relation_targets('Book_tags+'))
assert through_instances[0].model is books_tags_through_model
Of course all of this works on the other side too:
.. code-block:: python
through_model, through_source_field, through_target_field, target_model, \
target_accessor_name = get_through_info(Tag, 'books')
assert through_model is books_tags_through_model
assert through_source_field.name == 'tag'
assert through_target_field.name == 'book'
assert target_model is Book
assert target_accessor_name == 'tags'
Extensions
==========
Some extensions are currently being written. They will all be available as python packages
prefixed with ``dig-`` (d.i.g for Django Instances Graph).
What you'll soon be able to use:
- ``dig-duplicate``
an extension of what is currently possible about duplicating a graph, but with references to
parent objects, and merge capabilities between two duplicates
- ``dig-visualization``
an extension to have a visual representation of a graph, using ``graphviz``
Installation
============
The ``django_instances_graph`` package is only available on the Magency private pypi server.
.. code-block:: sh
pip install -i https://login:password@pypi.magency.ninja/some/index django_instances_graph
Development
===========
The code can be found on the `Magency mtp-back-modules repository
<https://gitlab.com/magency/products/mtp-back-modules/-/tree/master/django-instances-graph>`_
Install the required packages:
.. code-block:: sh
pip install -i https://login:password@pypi.magency.ninja/some/index -e .[dev]
To run tests, simply launch the ``runtests.sh`` script.
And for pylint:
.. code-block:: sh
PYTHONPATH="$PYTHONPATH:." pylint django_instances_graph tests
When ready, update the version in ``setup.cfg`` then create the package:
.. code-block:: sh
./setup.py sdist bdist_wheel
You can now upload it to ``devpi``:
.. code-block:: sh
devpi use https://login:password@pypi.magency.ninja
devpi login yourlogin
devpi use yourlogin/dev
devpi upload dist/django_instances_graph-VERSION*
Support
=======
python>=3.6
django>=2.2