Datastore Data Modeling

Introduction: Different Data Structures for Different Needs

Understanding the Entity Model

An entity in Datastore mode is a structured data object consisting of a unique key and a set of properties. Unlike Native mode documents, which are often modeled with deeply nested objects and arrays, an entity's properties are usually kept relatively flat. Think of an entity as a structured piece of information designed for efficient key-based access with strong consistency guarantees. In Datastore mode, entities are organized into kinds, similar to Native mode collections. However, entities are designed around indexed properties and ancestor relationships rather than flexible nesting. This means you work with entities as discrete units that can be linked through hierarchical keys rather than embedding related data as nested objects.

The power of the entity model comes from ancestor keys and entity groups. Instead of storing related information as nested fields within a single document, you create entities with hierarchical keys that establish parent-child relationships. For example, a blog post entity can have comment entities as descendants, where each comment's key includes the post's key as an ancestor. All entities sharing a common root ancestor form an entity group, which provides strong consistency guarantees and supports transactions with full ACID properties.

This approach provides predictable consistency and performance. In Native mode, embedding comments inside a blog post means the post and all of its comments must fit within the 1 MiB document limit, and changing one embedded comment means rewriting the comments array. With Datastore mode, each comment is a separate entity, so you can update individual comments independently, and comments do not count against the post's size limit (each entity has its own). The entity group model ensures that when you query for a post and its comments, you see a consistent snapshot of the data, which is important for transactional applications.
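To make the ancestor relationship concrete, here is a minimal sketch that models keys as plain Python tuples of (kind, identifier) pairs. It does not use the Datastore client library; the helper names `parent_path`, `is_ancestor`, and `entity_group_root` are hypothetical and exist only for illustration.

```python
from typing import Optional, Tuple

# A key path is modeled as a tuple of (kind, identifier) pairs, root first.
KeyPath = Tuple[Tuple[str, str], ...]


def parent_path(path: KeyPath) -> Optional[KeyPath]:
    """Return the path of the direct parent, or None for a root entity."""
    return path[:-1] if len(path) > 1 else None


def is_ancestor(candidate: KeyPath, path: KeyPath) -> bool:
    """True if `candidate` is a strict prefix of `path`."""
    return len(candidate) < len(path) and path[:len(candidate)] == candidate


def entity_group_root(path: KeyPath) -> KeyPath:
    """The first element of the path identifies the entity group."""
    return path[:1]


post = (("BlogPost", "post_001"),)
comment = (("BlogPost", "post_001"), ("Comment", "comment_001"))
other_post = (("BlogPost", "post_002"),)

print(parent_path(comment) == post)        # True
print(is_ancestor(post, comment))          # True
print(is_ancestor(other_post, comment))    # False
print(entity_group_root(comment) == entity_group_root(post))  # True
```

In this model the post and its comment share the same root element, which is what places them in the same entity group.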

Same Data, Two Approaches: A Blog Post Example

To understand the practical difference between these modes, let's look at how you would represent the same blog post in both Native mode and Datastore mode. Imagine you're building a blogging platform where each post has an author, content, tags, comments, and view counts.

In Native mode, you structure this as a single document with nested objects and arrays. The blog post document contains a nested author object, a comments array of comment objects, and a metadata object. All information lives together in one document. Here's how this looks:

```python
from datetime import datetime, timezone

from google.cloud import firestore


def create_firestore_native_blog_document():
    """How blog data looks in Native mode (nested)"""
    db = firestore.Client()

    document = {
        'title': 'Introduction to NoSQL',
        'content': 'NoSQL databases provide flexible schemas...',
        'author': {
            'name': 'Alice Johnson',
            'email': 'alice@example.com',
            'bio': 'Software engineer'
        },
        'tags': ['nosql', 'databases', 'tutorial'],
        # SERVER_TIMESTAMP is not supported inside array elements,
        # so embedded comments carry client-side timestamps.
        'comments': [
            {
                'id': 'comment_001',
                'author': 'Bob',
                'text': 'Great post!',
                'likes': 5,
                'timestamp': datetime.now(timezone.utc)
            },
            {
                'id': 'comment_002',
                'author': 'Carol',
                'text': 'Very helpful',
                'likes': 3,
                'timestamp': datetime.now(timezone.utc)
            }
        ],
        'metadata': {
            'views': 150,
            'published': True,
            # Replaced by the server with its own time at write
            'created_at': firestore.SERVER_TIMESTAMP
        }
    }

    db.collection('posts').document('post_001').set(document)
    return document
```

Notice how everything is contained in a single document with nested structures. To display this blog post, you make one read that retrieves the entire document.

Now let's see the same data in Datastore mode using the entity model. The example passes a GCP project ID explicitly when creating the client. Both Firestore modes operate within GCP projects, and both client libraries can infer the project from the environment when the project argument is omitted, so passing it is a choice made here for clarity. In production, you would use your actual project ID, but for examples we'll use 'test-project':

```python
from datetime import datetime, timezone

from google.cloud import datastore


def create_datastore_blog_entities():
    """How the same blog data looks in Datastore mode (separate entities)"""
    client = datastore.Client(project='test-project')

    # Create the blog post entity with flat properties
    post_key = client.key('BlogPost', 'post_001')
    post = datastore.Entity(key=post_key)
    post.update({
        'title': 'Introduction to NoSQL',
        'content': 'NoSQL databases provide flexible schemas...',
        'author_name': 'Alice Johnson',
        'author_email': 'alice@example.com',
        'author_bio': 'Software engineer',
        'tags': ['nosql', 'databases', 'tutorial'],
        'views': 150,
        'published': True,
        'created_at': datetime.now(timezone.utc)
    })

    # Create comment entities with ancestor keys
    comment_1_key = client.key('BlogPost', 'post_001', 'Comment', 'comment_001')
    comment_1 = datastore.Entity(key=comment_1_key)
    comment_1.update({
        'author': 'Bob',
        'text': 'Great post!',
        'likes': 5,
        'timestamp': datetime.now(timezone.utc)
    })

    comment_2_key = client.key('BlogPost', 'post_001', 'Comment', 'comment_002')
    comment_2 = datastore.Entity(key=comment_2_key)
    comment_2.update({
        'author': 'Carol',
        'text': 'Very helpful',
        'likes': 3,
        'timestamp': datetime.now(timezone.utc)
    })

    # Save all entities to Datastore in one batch
    client.put_multi([post, comment_1, comment_2])

    return post, [comment_1, comment_2]
```

The Datastore version separates data into multiple entities. The post has flat properties like author_name and author_email instead of nested objects. Each comment is a separate entity with an ancestor key linking it to the post. The key client.key('BlogPost', 'post_001', 'Comment', 'comment_001') creates a hierarchical relationship.

Notice the API differences: Native mode provides firestore.SERVER_TIMESTAMP, a sentinel value that Firestore's server replaces with the actual server time when writing. The Datastore client has no equivalent sentinel, so you use a client-side timestamp such as datetime.now(timezone.utc).

In Native mode, displaying the blog post requires one read returning the complete document. In Datastore mode, you need two operations: a lookup for the post entity and an ancestor query for its comments. However, if you need to update a single comment, Native mode requires rewriting the embedded comments array, while Datastore mode lets you update just that comment entity.
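The mapping between the two shapes can be sketched as a small transformation over plain dictionaries. This is an illustrative sketch only: `split_post_document` is a hypothetical helper, it assumes the field names used in the blog example (author, comments, metadata), and it produces key paths as tuples rather than real client keys.

```python
from typing import Any, Dict, List, Tuple

KeyPath = Tuple[Tuple[str, str], ...]


def split_post_document(
    post_id: str, document: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[Tuple[KeyPath, Dict[str, Any]]]]:
    """Turn a nested post document into flat post properties plus
    one (key path, properties) pair per embedded comment."""
    post_props: Dict[str, Any] = {}
    comments: List[Tuple[KeyPath, Dict[str, Any]]] = []

    for field, value in document.items():
        if field == "comments":
            # Each embedded comment becomes a child of the post.
            for comment in value:
                props = {k: v for k, v in comment.items() if k != "id"}
                path = (("BlogPost", post_id), ("Comment", comment["id"]))
                comments.append((path, props))
        elif field == "author":
            # The nested author object becomes prefixed flat properties.
            for k, v in value.items():
                post_props[f"author_{k}"] = v
        elif field == "metadata":
            # Metadata fields are promoted to top-level properties.
            post_props.update(value)
        else:
            post_props[field] = value

    return post_props, comments


nested = {
    "title": "Introduction to NoSQL",
    "author": {"name": "Alice Johnson", "email": "alice@example.com"},
    "tags": ["nosql", "databases"],
    "comments": [{"id": "comment_001", "author": "Bob", "text": "Great post!"}],
    "metadata": {"views": 150, "published": True},
}

post_props, comment_entities = split_post_document("post_001", nested)
print(post_props)
print(comment_entities)
```

Each key path lists the same kind and identifier pairs, in the same order, that the Datastore example passes to client.key().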

Querying Entities: Datastore Mode's Indexed Approach

A key difference between modes is how queries work. Native mode lets you query nested fields using dot notation. Datastore mode queries work on flat entity properties and require those properties to be indexed. While single-property queries use automatic built-in indexes, more complex queries require composite indexes that you define explicitly. Here's how query patterns work in Datastore mode:

```python
from google.cloud import datastore


def datastore_query_examples():
    """Common Datastore mode query patterns"""
    client = datastore.Client(project='test-project')

    # Query by author email (single property)
    query_by_author = client.query(kind='BlogPost')
    query_by_author.add_filter('author_email', '=', 'alice@example.com')

    # Query published posts with high views (requires composite index)
    query_popular = client.query(kind='BlogPost')
    query_popular.add_filter('published', '=', True)
    query_popular.add_filter('views', '>', 100)

    # Query by tag (array property)
    query_by_tag = client.query(kind='BlogPost')
    query_by_tag.add_filter('tags', '=', 'nosql')

    # Ancestor query to get all comments for a post
    post_key = client.key('BlogPost', 'post_001')
    query_comments = client.query(kind='Comment', ancestor=post_key)

    # The queries are only built here; iterate query.fetch() to run one
    return {
        'by_author': query_by_author,
        'popular': query_popular,
        'by_tag': query_by_tag,
        'comments': query_comments
    }
```

The first query searches by a property value using an automatic index. Unlike Native mode where you could query author.email, this Datastore model stores the author email as a flat, top-level property.

The second query combines an equality filter with an inequality filter on a different property, which requires a composite index defined in index.yaml. Native mode can serve many multi-field queries from its automatic single-field indexes, though it too needs a composite index for some filter combinations.

The third query demonstrates array filtering. When you store tags as an array, Datastore mode indexes each value automatically, so an equality filter matches any entity whose array contains that value.

The fourth query shows ancestor queries, unique to Datastore mode. By specifying ancestor=post_key, you retrieve all comments belonging to that post with strong consistency. Native mode doesn't have entity groups or ancestor queries.
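The matching rules behind the array filter and the ancestor query can be illustrated with a small in-memory model. This is a sketch of the semantics only, not of how Datastore evaluates queries internally; `matches_equality` and `ancestor_query` are hypothetical helpers, and entities are modeled as (key path, properties) pairs.

```python
from typing import Any, Dict, List, Tuple

KeyPath = Tuple[Tuple[str, str], ...]
Entity = Tuple[KeyPath, Dict[str, Any]]


def matches_equality(props: Dict[str, Any], name: str, value: Any) -> bool:
    """Equality on an array property matches if any element equals value."""
    stored = props.get(name)
    if isinstance(stored, list):
        return value in stored
    return name in props and stored == value


def ancestor_query(entities: List[Entity], kind: str, ancestor: KeyPath) -> List[Entity]:
    """Return entities of `kind` whose key path starts with `ancestor`."""
    n = len(ancestor)
    return [
        (path, props)
        for path, props in entities
        if path[-1][0] == kind and path[:n] == ancestor
    ]


entities: List[Entity] = [
    ((("BlogPost", "post_001"),), {"tags": ["nosql", "tutorial"]}),
    ((("BlogPost", "post_002"),), {"tags": ["sql"]}),
    ((("BlogPost", "post_001"), ("Comment", "comment_001")), {"author": "Bob"}),
    ((("BlogPost", "post_002"), ("Comment", "comment_009")), {"author": "Dan"}),
]

tagged = [path for path, props in entities if matches_equality(props, "tags", "nosql")]
comments = ancestor_query(entities, "Comment", (("BlogPost", "post_001"),))

print(tagged)                                   # only post_001
print([props["author"] for _, props in comments])  # ['Bob']
```

The comment under post_002 is excluded because its key path does not begin with the post_001 key.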

Updating Entities: Property-Level Operations

Updating entities in Datastore mode differs from Native mode's atomic operators. Native mode provides operators like firestore.Increment() and firestore.ArrayUnion() for atomic updates. Datastore mode typically uses read-modify-write patterns or transactions. Here's how update operations work in Datastore mode:

```python
from datetime import datetime, timezone

from google.cloud import datastore


def datastore_update_examples():
    """Common Datastore mode update patterns"""
    client = datastore.Client(project='test-project')
    post_key = client.key('BlogPost', 'post_001')

    # Increment a view counter (read-modify-write, not atomic)
    post = client.get(post_key)
    post['views'] = post.get('views', 0) + 1
    client.put(post)

    # Add a comment (create new entity with ancestor key)
    comment_key = client.key('BlogPost', 'post_001', 'Comment', 'comment_003')
    comment = datastore.Entity(key=comment_key)
    comment.update({
        'author': 'Eve',
        'text': 'Excellent!',
        'likes': 0,
        'timestamp': datetime.now(timezone.utc)
    })
    client.put(comment)

    # Add a tag to the tags array
    post = client.get(post_key)
    current_tags = post.get('tags', [])
    if 'beginner-friendly' not in current_tags:
        current_tags.append('beginner-friendly')
        post['tags'] = current_tags
        client.put(post)

    # Update within a transaction for consistency
    with client.transaction():
        post = client.get(post_key)
        post['views'] += 1
        client.put(post)
```

The first operation increments a counter using read-modify-write, different from Native mode's atomic Increment(). Without a transaction, two clients that read the same value can overwrite each other's update.

The second operation adds a comment by creating a new entity rather than using ArrayUnion(). This entity can be managed independently without affecting the post entity's size.

The third operation manages arrays by explicitly checking for duplicates and modifying the list in memory, which is more manual than Native mode's ArrayUnion().

The fourth operation wraps the read and the write in a transaction, so the increment is applied with full ACID properties. The limit of 25 entity groups per transaction belonged to the legacy Cloud Datastore and no longer applies in Firestore in Datastore mode.
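The race condition in the plain read-modify-write increment can be reproduced with a toy in-memory store. This sketch does not use the Datastore API: `InMemoryStore`, `put_if_version`, and `increment_with_retry` are hypothetical names, and the version check only approximates the conflict detection that a real transaction provides.

```python
class InMemoryStore:
    """Toy versioned store; stands in for a database, not the Datastore API."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        return self._data.get(key, (0, 0))

    def put(self, key, value):
        version, _ = self.get(key)
        self._data[key] = (version + 1, value)

    def put_if_version(self, key, value, expected_version):
        """Write only if nobody else wrote since we read."""
        version, _ = self.get(key)
        if version != expected_version:
            return False
        self._data[key] = (version + 1, value)
        return True


def increment_with_retry(store, key, max_attempts=5):
    """Re-read and retry until the conditional write succeeds."""
    for _ in range(max_attempts):
        version, value = store.get(key)
        if store.put_if_version(key, value + 1, version):
            return True
    return False


# Unsafe: two clients read 150 before either writes, so one update is lost.
unsafe = InMemoryStore()
unsafe.put("views", 150)
_, seen_by_a = unsafe.get("views")
_, seen_by_b = unsafe.get("views")
unsafe.put("views", seen_by_a + 1)
unsafe.put("views", seen_by_b + 1)
lost_update_result = unsafe.get("views")[1]

# Safe: client A's stale write is rejected, and A retries on fresh data.
safe = InMemoryStore()
safe.put("views", 150)
version_a, value_a = safe.get("views")      # client A reads
increment_with_retry(safe, "views")         # client B increments first
stale_write_ok = safe.put_if_version("views", value_a + 1, version_a)
increment_with_retry(safe, "views")         # client A retries
safe_result = safe.get("views")[1]

print(lost_update_result)  # 151
print(stale_write_ok)      # False
print(safe_result)         # 152
```

Two increments on the unsafe path leave the counter at 151, while the version-checked path reaches 152 because the stale write is rejected and retried.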

Choosing Your Firestore Mode: Entity vs Document Model

Now that you understand both modes, you need to know when to choose each one. The decision depends on your data structure, access patterns, and consistency requirements.

You should use Native mode when your data is naturally hierarchical with nested structures like product catalogs with embedded specifications, user profiles with preferences, or content management systems. Native mode is better when you need flexible query patterns that might change over time or when you want to minimize database operations to retrieve related information. The document model shines when you want atomic update operators for nested fields and arrays without complex read-modify-write patterns.

You should use Datastore mode when your data access patterns are well-defined and centered around key-based lookups. If you're building session management systems or inventory systems with predictable lookups, Datastore mode excels. It's particularly useful when you need transactions that span several related entities, or when you need to maintain backward compatibility with Cloud Datastore for migration scenarios.

The trade-offs involve several dimensions. Native mode offers flexibility with deep nesting but has document size limits. Datastore mode offers predictable performance for key-based access and supports transactions across multiple entity groups but requires composite indexes for complex queries and explicit relationship management through ancestor keys.

Here's a comparison of how the approaches differ:

```python
def compare_firestore_modes():
    """Compare operations in both Firestore modes"""
    native_mode = {
        'get_post_with_comments': 1,  # Single read gets everything
        'add_comment': 'ArrayUnion operation',
        'update_counter': 'Atomic Increment',
        'find_by_nested_field': 'Direct query with dot notation',
        'consistency': 'Strong consistency for all operations',
        'indexes': 'Automatic for most queries'
    }

    datastore_mode = {
        'get_post_with_comments': 2,  # Entity lookup + ancestor query
        'add_comment': 'Create new entity',
        'update_counter': 'Transaction or read-modify-write',
        'find_by_nested_field': 'Store as flat properties',
        'consistency': 'Strong consistency within entity groups',
        'indexes': 'Composite indexes for multi-property queries'
    }

    return native_mode, datastore_mode
```

This shows the practical differences. Native mode retrieves posts with comments in one read, uses simple atomic operations, and queries nested fields directly. Datastore mode requires multiple operations but provides independent entity management, transactional updates across entity groups, and explicit relationship control through ancestor keys.

Summary: Understanding Your Firestore Options

You now understand two different ways to use Google Cloud Firestore. Native mode's document model excels at representing hierarchical data with nested objects and arrays, offering flexible queries without requiring pre-configured indexes for most use cases. Datastore mode's entity model provides predictable performance for key-based lookups with strong consistency guarantees within entity groups, making it ideal for transactional applications with well-defined access patterns. Neither mode is universally better than the other; the right choice depends on your data structure, access patterns, and consistency requirements.

In the upcoming practice exercises, you'll work with code that demonstrates both modes side by side. You'll see how to transform Native mode's nested document structure into Datastore mode's entity structure with ancestor relationships. You'll practice creating entities with indexed properties, querying with filters and ancestor queries, and updating entities within transactions. These exercises will help you develop intuition for recognizing when the entity model provides advantages over the document model, a skill that will guide your Firestore mode choices in real-world applications.