Models in DynamoBao are defined using YAML configuration files. These definitions are used to generate the corresponding model classes with all necessary fields, indexes, and relationships.
Basic Structure
models:
  ModelName:
    modelPrefix: "x" # Required: short prefix (1-4 characters) for the model. It must be unique and cannot change later. It is not user visible.
    tableType: "standard" # Optional: "standard" (default) or "mapping"
    iterable: false # Optional: "true" or "false" (default). See notes on iteration.
    iterationBuckets: 10 # Optional: number of buckets for iteration. Default is 10.
    searchable: false # Optional: see "Searchable Models" for the object form.
    fields: {} # Required: field definitions
    primaryKey: {} # Required: primary key configuration
    indexes: {} # Optional: secondary indexes
    uniqueConstraints: {} # Optional: unique constraints
Field Types
Fields can be defined with various types and options:
Basic Fields
fields:
  stringField:
    type: StringField
    required: true # Optional: makes the field required
  integerField:
    type: IntegerField
    defaultValue: 0 # Optional: default value
  floatField:
    type: FloatField
  booleanField:
    type: BooleanField
    defaultValue: false
  binaryField:
    type: BinaryField # Stores Buffer or Uint8Array data
  stringSetField:
    type: StringSetField # Stores a set of string values
    maxStringLength: 100 # Optional: maximum length of individual strings
    maxMemberCount: 20 # Optional: maximum number of items in the set
Date and Time Fields
fields:
  dateTime:
    type: DateTimeField # Stores any date/time value
  createdAt:
    type: CreateDateField # Automatically sets creation timestamp
  modifiedAt:
    type: ModifiedDateField # Automatically updates on modification
  ttl:
    type: TtlField # Time-to-live field for automatic deletion
UlidField
The UlidField is a special field that automatically generates ULID values. It is recommended to use this field for the primary key of your model.
fields:
  postId:
    type: UlidField
    autoAssign: true
ULIDs are 128-bit values that are lexicographically sortable and can be generated in parallel. They are designed to be collision resistant and are compatible with the UUID format.
They are especially useful for DynamoDB since they can be used as sort keys and sort by the date they were created. This makes the default sort order when an id is used feel more natural than being purely random or requiring a secondary index to sort by creation date.
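To make the sort-by-creation-date property concrete, here is a minimal sketch of the timestamp portion of a ULID: per the ULID format, the first 10 characters encode a 48-bit millisecond timestamp in Crockford base32. The helper name encodeUlidTime is hypothetical and this is not DynamoBao's implementation; it only illustrates why plain string comparison follows creation order.

```javascript
// Crockford base32 alphabet used by ULIDs (omits I, L, O, U).
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Hypothetical helper: encode a millisecond timestamp as the
// 10-character time prefix of a ULID.
function encodeUlidTime(ms) {
  let remaining = ms;
  let out = "";
  for (let i = 0; i < 10; i++) {
    const digit = remaining % 32;
    out = CROCKFORD[digit] + out; // most significant digit ends up first
    remaining = (remaining - digit) / 32;
  }
  return out;
}

const earlier = encodeUlidTime(Date.UTC(2024, 0, 1));
const later = encodeUlidTime(Date.UTC(2024, 0, 2));

// Because the alphabet is in ascending character order, comparing the
// encoded strings gives the same result as comparing the timestamps.
console.log(earlier < later);
```

The remaining 16 characters of a real ULID are random, which is what allows IDs to be generated in parallel without coordination.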
StringSetField
The StringSetField stores a set of unique string values. In JavaScript, this field works with Set objects and is backed by DynamoDB's string set type.
fields:
  tags:
    type: StringSetField
    maxStringLength: 50 # Optional: maximum length of individual strings
    maxMemberCount: 10 # Optional: maximum number of items in the set
Important notes:
- StringSetField cannot be used in indexes (partition key, sort key, or GSI)
- Empty sets are stored as null in DynamoDB
- When loading from DynamoDB, missing or null values return an empty Set()
- Can be used in filter expressions with $contains and $size operations
Usage example:
const user = new User({
  name: "John Doe",
  tags: new Set(["admin", "premium", "beta"]),
});
// You can also set with an array (duplicates will be removed)
user.tags = ["admin", "premium", "beta"];
// The field will always return a Set object
console.log(user.tags instanceof Set); // true
// Direct mutation operations (recommended)
user.tags.add("vip"); // Add a tag
user.tags.delete("beta"); // Remove a tag
user.tags.clear(); // Clear all tags
await user.save(); // Changes are automatically tracked and saved
// Need to refresh index projections after manual data fixes?
await user.save({ forceReindex: true });
// Filter examples
const premiumUsers = await User.scan({
  filter: {
    tags: { $contains: "premium" },
  },
});
const usersWithManyTags = await User.scan({
  filter: {
    tags: { $size: { $gt: 3 } },
  },
});
VersionField
The VersionField is a special field that automatically generates a new larger version every time the object is saved. It is recommended to use this field for the version of your model. Note that this is a string, not a number.
fields:
  version:
    type: VersionField
CounterField
The CounterField is a special field that allows you to increment and decrement a counter atomically. This is useful for creating counters that can be incremented/decremented across users without having to worry about race conditions.
fields:
  counter:
    type: CounterField
TtlField
The TtlField is a special field that allows you to automatically delete an object after a certain amount of time, which is useful for data that should expire. The field must be named ttl.
fields:
  ttl:
    type: TtlField
RelatedField
The RelatedField is a special field that allows you to reference another model. This is useful for creating relationships between models.
fields:
  userId:
    type: RelatedField
    model: User
    required: true
This field will automatically generate a getter method on the model that allows you to query for the related model. Two of these fields can be used to create a many-to-many relationship. See the Mapping Tables section for more information.
Complete Field Type Reference
- StringField: Text data
- IntegerField: Whole numbers
- FloatField: Decimal numbers
- BooleanField: True/false values
- BinaryField: Binary data (Buffer or Uint8Array)
- DateTimeField: Date and time values
- CreateDateField: Automatic creation timestamp
- ModifiedDateField: Automatic modification timestamp
- UlidField: Unique identifiers (often used for IDs)
- VersionField: Automatic version tracking for optimistic locking
- CounterField: Atomic counters that can be incremented/decremented
- RelatedField: References to other models
- TtlField: Time-to-live field for automatic item deletion (field must be named ttl)
Field Options
Common options that can be applied to fields:
fields:
  exampleField:
    type: StringField
    required: true # Field must have a value
    defaultValue: "test" # Default value if none provided
Primary Key Configuration
Every model requires a primary key configuration:
primaryKey:
  partitionKey: "userId" # Required: field to use as partition key
  sortKey: "createdAt" # Optional: field to use as sort key
The primary key's partitionKey and sortKey can never change once the model is created. If you need to change them, you will need to delete the object and create a new object with the new partitionKey and sortKey. With this in mind, it is recommended to choose a partitionKey and sortKey that will not change.
In addition to specifying a field as a partitionKey or sortKey, you can also specify the modelPrefix for either the partitionKey or sortKey. This is useful if you want to query all the objects of a particular model. However, if you expect the model to grow large, you should not use the modelPrefix as the partitionKey, since all objects of the model will then be stored in the same partition.
If the sortKey is not specified, it will use the modelPrefix as the sortKey. This is handled automatically and not something you need to specify.
Indexes
You can add up to 5 secondary indexes (gsi1, gsi2, gsi3, gsi4, gsi5) to a model. Much like a primary key, each index has a partitionKey and sortKey. However, unlike a primary key, you can change the partitionKey and sortKey at any time.
Indexes incur additional costs, so you should only add them if you need them. In particular, if you are creating an index to support an infrequent query, test the query with a filter first to see whether the index is necessary before adding it.
Secondary indexes enable additional query patterns:
indexes:
  tagsForPost:
    partitionKey: "postId" # Field to use as partition key
    sortKey: "tagId" # Field to use as sort key
    indexId: "gsi1" # GSI identifier (gsi1, gsi2, etc.)
  # Reference to primary key
  postsForTag: primaryKey
Adding the primaryKey to the indexes is a shorthand that gives the primary key a name and lets you query by it. It does not create a new index. If you plan to query by the primary key, it is recommended to name it this way.
Iterating Over All Objects for a Model
DynamoBao supports full-model iteration for tasks like data migration, backfills, and reporting. Iteration is opt-in because it adds a per-row write to a dedicated GSI on every save, which doubles the write cost of the model.
To enable, set iterable: true:
models:
  User:
    modelPrefix: "u"
    iterable: true # opt in
    fields:
      # ...
Once enabled, the iterateAll() method on the model class returns an async generator that pages through every row. Iterable models are also a prerequisite for searchAll/searchBucket (see Searchable Models below).
Cost and Performance Considerations
Each create, update, or delete on an iterable model writes one extra row to the iteration GSI, which doubles the write cost. This is worth it for User- or Post-shaped models where reads and migrations matter; it is not worth it for high-volume logging tables, event streams, or anything you're sure you'll only access by primary key.
Handling Large Models with Iteration Buckets
To prevent "hot partition" issues on large iterable models, DynamoBao splits the iteration index across multiple partitions. The number of partitions is set by iterationBuckets, which defaults to 10 when iterable: true.
models:
  User:
    modelPrefix: "u"
    iterable: true
    iterationBuckets: 20 # Override the default of 10 for a very large model
    fields:
      # ...
The iterateAll method handles iterating over all buckets automatically. For parallel processing, you can iterate a specific partition using the iterateBucket(bucketNum) method.
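As a rough illustration of per-bucket parallelism, the sketch below runs one worker per bucket with Promise.all. It assumes iterateBucket(bucketNum) is an async generator that yields arrays (batches) of items, mirroring the searchBucket examples in this document, and that the bucket count is exposed as iterationBuckets on the model class; verify both against your generated models. FakeUser and processAllBuckets are hypothetical names used only so the sketch runs without DynamoDB.

```javascript
// Process every bucket of an iterable model concurrently and return
// the total number of items handled.
// Assumption: model.iterateBucket(n) yields arrays of items.
async function processAllBuckets(model, handleItem) {
  const counts = await Promise.all(
    Array.from({ length: model.iterationBuckets }, async (_, bucketNum) => {
      let processed = 0;
      for await (const batch of model.iterateBucket(bucketNum)) {
        for (const item of batch) {
          await handleItem(item);
          processed += 1;
        }
      }
      return processed;
    }),
  );
  return counts.reduce((sum, n) => sum + n, 0);
}

// Hypothetical in-memory stand-in for a generated model class.
const FakeUser = {
  iterationBuckets: 3,
  rows: [["a", "b"], ["c"], []],
  async *iterateBucket(bucketNum) {
    if (this.rows[bucketNum].length > 0) yield this.rows[bucketNum];
  },
};

processAllBuckets(FakeUser, async (item) => console.log("visited", item))
  .then((total) => console.log("total items:", total));
```

Running all buckets at once consumes more read capacity in a short window; on a small table it may be gentler to process a few buckets at a time.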
Choosing the number of buckets:
- 1: Use for small models or when parallel iteration is not needed. All items are in one partition.
- 10 (default): Good for most models, providing a solid baseline for scalability.
- 20+: For very large models or high-throughput parallel processing scenarios.
A good candidate for iteration isn't too large and doesn't get written to frequently. For instance, a User model may be a good candidate, since you typically have far fewer users than posts, messages, or other objects in the system. Users also don't get written to as frequently as other objects, so they are a good fit for iteration. Since iteration defaults to using 10 buckets, write capacity is automatically distributed, but you should still be mindful of very high-velocity writes (e.g. bulk importing users without rate limiting).
It is not recommended to use the modelPrefix as the primaryKey's partitionKey, since it cannot be changed later. The iterable feature provides a safer and more flexible way to enable iteration.
Searchable Models
Models can declare a searchable block to auto-populate a _searchText attribute on every save. The framework concatenates the listed string fields, normalizes the result (lowercased by default, punctuation stripped, whitespace collapsed), and stores it on the row. You can then query against it from inside any filter, or use the dedicated searchAll/searchBucket API on iterable models for parallel cross-bucket search.
models:
  Post:
    modelPrefix: "p"
    iterable: true
    searchable:
      fields: [title, body, summary] # Required: must be StringField
      caseSensitive: false # Optional, default false
      minTermLength: 1 # Optional, default 1 (no token filtering)
      dedupe: false # Optional, default false
    fields:
      title: { type: StringField }
      body: { type: StringField }
      summary: { type: StringField }
      # ...
Codegen validates that every name in searchable.fields is defined on the model and is a StringField. Mismatches fail the build with a clear error.
How _searchText is computed
- Read each listed field, skipping null/undefined values.
- Concatenate the values with a single space.
- Lowercase unless caseSensitive: true.
- Replace Unicode non-alphanumeric characters (/[^\p{L}\p{N}\s]/gu) with spaces.
- Collapse whitespace.
- If dedupe: true: tokenize, dedupe in first-seen order, rejoin.
- If minTermLength > 1: drop tokens shorter than that, rejoin.
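The steps above can be sketched as a single function. This is an illustrative reimplementation written from the description, not the library's actual code; the name computeSearchText is hypothetical, and trimming leading/trailing whitespace is an assumption inferred from the 'Hello, World!' to 'hello world' example later in this document.

```javascript
// Hypothetical sketch of the _searchText pipeline described above.
function computeSearchText(
  values,
  { caseSensitive = false, dedupe = false, minTermLength = 1 } = {},
) {
  // Skip null/undefined values and join with a single space.
  let text = values.filter((v) => v !== null && v !== undefined).join(" ");
  // Lowercase unless caseSensitive is set.
  if (!caseSensitive) text = text.toLowerCase();
  // Replace anything that is not a Unicode letter, number, or whitespace.
  text = text.replace(/[^\p{L}\p{N}\s]/gu, " ");
  // Collapse whitespace runs (trim is an assumption, see lead-in).
  text = text.replace(/\s+/g, " ").trim();
  // Optional: drop repeated tokens, keeping first-seen order.
  if (dedupe) text = [...new Set(text.split(" "))].join(" ");
  // Optional: drop tokens shorter than minTermLength.
  if (minTermLength > 1) {
    text = text.split(" ").filter((t) => t.length >= minTermLength).join(" ");
  }
  return text;
}

console.log(computeSearchText(["Hello, World!", null, "hello again"]));
// "hello world hello again"
console.log(computeSearchText(["Hello, World!", "hello again"], { dedupe: true }));
// "hello world again"
```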
Save-time behavior
- _searchText is recomputed only when at least one source field is in the update (or on create, or with forceReindex: true). Updates that touch unrelated fields leave _searchText alone, so there is no extra index churn.
- Untouched source fields are backfilled from the current row, so you never lose part of the searchable text just because you only updated one field.
- If the recomputed value is empty (all sources cleared), _searchText is REMOVEd and the row drops out of search results.
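The decision logic above can be sketched as a small pure function. This is a hypothetical illustration of the described behavior, not the library's code: planSearchTextUpdate and its return shape are invented for the example, and the inline normalize covers only the default options.

```javascript
// Simplified normalizer (default options only) for this sketch.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Decide what a save should do to _searchText.
// currentRow: the stored item, or null on create.
// changes: the fields being written by this save.
function planSearchTextUpdate(currentRow, changes, searchFields, { forceReindex = false } = {}) {
  const isCreate = currentRow === null;
  const touchesSource = searchFields.some((f) => f in changes);
  // Unrelated update: leave _searchText alone.
  if (!isCreate && !touchesSource && !forceReindex) return { action: "skip" };

  // Backfill untouched source fields from the current row.
  const merged = { ...(currentRow ?? {}), ...changes };
  const text = normalize(
    searchFields
      .map((f) => merged[f])
      .filter((v) => v !== null && v !== undefined)
      .join(" "),
  );
  // Empty result: remove the attribute so the row leaves search results.
  return text === "" ? { action: "remove" } : { action: "set", value: text };
}

const row = { title: "Hi", body: "There", status: "draft" };
console.log(planSearchTextUpdate(row, { status: "live" }, ["title", "body"]));
console.log(planSearchTextUpdate(row, { title: "Hello!" }, ["title", "body"]));
```

In the second call only title is updated, yet body is still part of the recomputed text because it is read back from the current row.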
Eventual consistency
searchAll and searchBucket read from the iter_search_index GSI. DynamoDB GSIs are eventually consistent, so a row whose _searchText was just updated may briefly:
- still appear under its old value (false positive: row matches the old terms for a few seconds after the update); or
- not yet appear under its new value (false negative: row doesn't match the new terms until the index catches up).
The same window applies to iterateAll({ filter: ... }). If you need stronger guarantees for a destructive bulk operation, re-read each item via find() (strongly consistent base read) and re-check the predicate yourself before acting.
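A minimal sketch of that re-check pattern follows. The helper actOnConfirmed and its callback names are hypothetical; in real code, reread would wrap a strongly consistent find() on your model and act would be the destructive operation. The demo uses an in-memory Map so it runs without DynamoDB.

```javascript
// Re-read each candidate and re-check the predicate before acting.
//   reread(candidate) -> fresh item, or null if it no longer exists
//   matches(item)     -> predicate re-evaluated on the fresh item
//   act(item)         -> the destructive operation
async function actOnConfirmed(candidates, { reread, matches, act }) {
  const acted = [];
  for (const candidate of candidates) {
    const fresh = await reread(candidate);
    if (fresh && matches(fresh)) {
      await act(fresh);
      acted.push(fresh);
    }
  }
  return acted;
}

// Demo: the index returned three candidates, but row 2 has since changed
// and row 3 has been deleted, so only row 1 is acted on.
const store = new Map([
  [1, { id: 1, text: "alice in wonderland" }],
  [2, { id: 2, text: "bob" }],
]);

actOnConfirmed([{ id: 1 }, { id: 2 }, { id: 3 }], {
  reread: async (c) => store.get(c.id) ?? null,
  matches: (item) => item.text.includes("alice"),
  act: async (item) => store.delete(item.id),
}).then((acted) => console.log("acted on ids:", acted.map((i) => i.id)));
```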
Searching iterable models
// Single-term search across all buckets
for await (const batch of Post.searchAll(["alice"])) {
  for (const post of batch) console.log(post.title);
}
// Multi-term: AND by default, $or also supported
for await (const batch of Post.searchAll(["iphone", "review"])) { /* ... */ }
for await (const batch of Post.searchAll(["alice", "bob"], { operator: "$or" })) { /* ... */ }
// Phrase as a single term (multi-word substring)
for await (const batch of Post.searchAll(["foo bar"])) { /* ... */ }
// Parallel fan-out across buckets
const buckets = await Promise.all(
  Array.from({ length: Post.iterationBuckets }, async (_, b) => {
    const out = [];
    for await (const batch of Post.searchBucket(b, ["alice"])) out.push(...batch);
    return out;
  }),
);
// Helper: split a free-form query string into terms (honors quoted phrases)
const terms = Post.tokenizeSearchQuery('"hello world" foo'); // => ["hello world", "foo"]
// Cap total results. Default limit is 100; pass Infinity for unbounded.
const { items, cursor } = await Post.searchAll(["alice"], { limit: 25 });
// Pagination: pass the cursor back to get the next page. cursor === null
// when the search is exhausted.
const next = await Post.searchAll(["alice"], { limit: 25, cursor });
// Drain everything (admin / cleanup scripts):
let cur = null, all = [];
do {
  const page = await Post.searchAll(["alice"], { limit: 100, cursor: cur });
  all.push(...page.items);
  cur = page.cursor;
} while (cur);
searchAll returns { items, cursor }. items is never longer than limit. cursor is an opaque string you pass back on the next call to resume; null means the search is exhausted.
Other options:
- parallel: true (default) fans out across every unexhausted bucket concurrently per round; faster wall-clock at the cost of more concurrent capacity. Pass parallel: false for a sequential walk if you want predictable per-bucket-then-objectId order or you're hitting throttling on a small table.
- maxQueriesPerBucket: 50 (default) bounds DynamoDB Query roundtrips per call per bucket. With 5 buckets that's up to 250 Queries per call worst-case. For sparse-match searches this caps capacity per call: if the cap is hit without filling limit, the call returns whatever it found plus a non-null cursor pointing at the next page so the caller can continue with another call.
- limit: Infinity opts out of the cap. Combine with cursor chaining to drain.
The cursor encodes which buckets are unexhausted and where they left off, the bucket scope it was generated for, plus a hash of the search predicate (terms + operator + searchConfig). Resume validates all of these:
- different terms / operator / searchConfig: throws.
- different model: throws.
- different tenant: throws (cursors are tenant-scoped).
- cursor from searchAll passed to searchBucket (or vice versa): throws with "Cursor scope mismatch". Use the same API call that generated the cursor.
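To show the general idea of a self-validating cursor, here is a hedged sketch that embeds a scope and a predicate hash and checks both on resume. The real cursor format is opaque and is not documented here; encodeCursor, decodeCursor, the payload fields, and the use of SHA-256 with base64 JSON are all assumptions for illustration. Model and tenant checks are omitted for brevity.

```javascript
const { createHash } = require("crypto");

// Hash the search predicate so a cursor can only resume the same search.
function predicateHash(terms, operator) {
  return createHash("sha256")
    .update(JSON.stringify({ terms, operator }))
    .digest("hex");
}

// Hypothetical cursor: per-bucket positions plus scope and predicate hash.
function encodeCursor({ scope, positions, terms, operator }) {
  const payload = { scope, positions, hash: predicateHash(terms, operator) };
  return Buffer.from(JSON.stringify(payload)).toString("base64");
}

// Validate the cursor against the current call before resuming.
function decodeCursor(cursor, { scope, terms, operator }) {
  const payload = JSON.parse(Buffer.from(cursor, "base64").toString("utf8"));
  if (payload.scope !== scope) throw new Error("Cursor scope mismatch");
  if (payload.hash !== predicateHash(terms, operator)) {
    throw new Error("Cursor predicate mismatch");
  }
  return payload.positions;
}

const cursor = encodeCursor({
  scope: "searchAll",
  positions: { 0: "key-a", 3: "key-b" }, // only unexhausted buckets
  terms: ["alice"],
  operator: "$and",
});
console.log(decodeCursor(cursor, { scope: "searchAll", terms: ["alice"], operator: "$and" }));
```

Hashing the predicate rather than storing it keeps the cursor small while still rejecting a resume with different terms.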
searchAll accepts an options.filter that's combined with the search predicate, but DynamoDB only allows the index-level filter to reference attributes that are projected onto the GSI. The iter_search_index projects only _searchText (plus the index/base keys); anything else throws a clear error before the round-trip:
Filter on 'iter_search_index' references attribute(s) that aren't projected: [status]. ...
For non-projected attributes, filter the hydrated batch in JS:
for await (const batch of Post.searchAll(["alice"])) {
  for (const post of batch) {
    if (post.status === "active") yieldOrCollect(post);
  }
}
Searchable + non-iterable models
iterable and searchable are independent flags. A model that's searchable without iterable: true is fully supported and useful. The typical use case is partition-scoped substring search: comments under a post, items under a user, anything where you already have a known partition key and want to narrow within it by text content.
How it works mechanically:
- _searchText is populated on every save (the save-time logic doesn't depend on iterable).
- The row is not in iter_search_index (no _iter_pk/_iter_sk keys), so there is no extra GSI write per save.
- All user-defined GSIs use Projection: ALL, so _searchText rides along on whichever index you query.
- searchAll/searchBucket are unavailable (they require iterable: true for the bucketed fan-out). At codegen time you'll get a warning if searchable is configured without iterable: true, in case you forgot.
Two patterns for searching:
Filter on _searchText directly via the standard filter API. The value is auto-normalized using the model's searchConfig so user input matches what was stored. The third arg to queryByIndex is the sort-key condition (pass null if none); the fourth is options:
const { items } = await Post.queryByIndex(
  "byUser",
  userId,
  null,
  { filter: { _searchText: { $contains: "Hello, World!" } } },
);
// 'Hello, World!' is auto-normalized to 'hello world' before being sent
// to DynamoDB, matching what _searchText looks like on disk.
This works on any standard filter context: query, queryByIndex, iterateAll({ filter: ... }), and condition expressions in update/delete. Operators supported on _searchText: $contains, $beginsWith, $eq, $ne, $in, $exists. Values are auto-normalized for string-comparison ops; $exists doesn't take a value.
Or filter post-fetch in JS when you want the normalization but not the index-side filter (e.g., post-fetch ranking):
const term = Post.normalizeSearchTerm(userQuery);
const { items } = await Post.queryByIndex("byUser", userId);
const matches = items.filter((p) => p._dyData._searchText?.includes(term));
Multilingual support
The default normalization preserves any Unicode letter (Chinese, Japanese, Korean, Cyrillic, Arabic, Devanagari, etc.). A searchAll call with a CJK term matches rows whose _searchText contains that term, including as a substring of a longer run of characters.
The dedupe and minTermLength options use whitespace tokenization, which doesn't fit languages without inter-word spacing (Chinese, Japanese). Leave them at their defaults (dedupe: false, minTermLength: 1) for CJK content; the rest of normalization Just Works.
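A small sketch of why whitespace tokenization is a poor fit for such text: a sentence without inter-word spaces becomes a single token, so there is nothing meaningful to dedupe or filter by length. The tokenize helper is a stand-in written for this example, not the library's tokenizer.

```javascript
// Whitespace tokenization, the approach described for dedupe/minTermLength.
const tokenize = (text) => text.split(/\s+/).filter(Boolean);

const english = tokenize("the quick brown fox");
const chinese = tokenize("今天天气很好"); // "The weather is nice today", no spaces

console.log(english.length); // 4
console.log(chinese.length); // 1
```

Because the whole sentence is one token, a minTermLength greater than its length would drop the entire sentence rather than individual short words.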
Adopting search on existing data
When you enable searchable for the first time on a model that already has rows, those rows lack _searchText. After running bao-update-table to add the new GSI, run:
npx bao-rebuild-search-text Post
This walks every row of the model with iterateAll and re-saves with forceReindex: true so the save-time computation populates _searchText.
Migration: enabling search on existing tables
Iteration uses the legacy iter_index GSI (HASH _iter_pk, RANGE _iter_sk, KEYS_ONLY projection) by default. Search needs the new iter_search_index (same keys, but INCLUDE [_searchText]). To enable search:
1. Bump dynamo-bao: no breakage, iteration still uses iter_index.
2. Add the GSI: npx bao-update-table. This adds iter_search_index alongside the legacy index. DynamoDB auto-backfills the GSI's key entries; rows that already lack _searchText won't be searchable until you do step 4.
3. Flip the config: set db.iterationIndexName: "iter_search_index" in your dynamo-bao config file (e.g., dynamo-bao.config.js). After this, iterateAll and searchAll both query the new index.
4. Backfill existing rows: npx bao-rebuild-search-text <ModelName>. This walks every row and re-saves with forceReindex: true, populating _searchText. Skip if the model is empty or you only need to search rows newer than today.
5. (Optional, later) Drop the legacy iter_index GSI manually to save the duplicate write cost. Verify iteration and search both work first.
You can stop after step 2 if you only want to add the new GSI infrastructure without flipping the runtime over yet; both indexes will be populated by saves until step 5.
Unique Constraints
You can specify up to 3 unique constraints (uc1, uc2, uc3) for a model.
uniqueConstraints:
  uniqueEmail:
    field: "email"
    uniqueConstraintId: "uc1" # Unique constraint identifier
Unique constraints are implemented using Dynamo transactions, so they incur significant costs. Only use them where you really need them (e.g. to prevent duplicate emails, or map to a globally unique external id). Primary keys are already unique, so you may be able to use them instead, though keep in mind that they can't be changed later.
Creating a unique constraint will also enable fast lookups for that field. Before creating an index on a field that has a unique constraint, consider whether you can use the unique constraint instead.
Mapping Tables
Mapping tables are used to create many-to-many relationships between models. They are defined using tableType: "mapping" and typically contain RelatedFields to the models being connected.
Basic Structure
ModelName:
  tableType: "mapping" # Identifies this as a mapping table
  modelPrefix: "xx" # Required: 2-character prefix
  fields:
    firstModelId: # Related field to first model
      type: RelatedField
      model: FirstModel
      required: true
    secondModelId: # Related field to second model
      type: RelatedField
      model: SecondModel
      required: true
    createdAt: # Optional: tracking fields
      type: CreateDateField
  primaryKey: # Define how records are stored
    partitionKey: firstModelId
    sortKey: secondModelId
Real Example: TaggedPost
Here's a complete example of a mapping table that connects Tags to Posts:
TaggedPost:
  tableType: "mapping"
  modelPrefix: "tp"
  fields:
    tagId:
      type: RelatedField
      model: Tag
      required: true
    postId:
      type: RelatedField
      model: Post
      required: true
    createdAt:
      type: CreateDateField
  primaryKey:
    partitionKey: tagId
    sortKey: postId
  indexes:
    postsForTag: primaryKey # Query posts for a tag
    tagsForPost: # Query tags for a post
      partitionKey: postId
      sortKey: tagId
      indexId: gsi1
    recentPostsForTag: # Query posts for a tag by date
      partitionKey: tagId
      sortKey: createdAt
      indexId: gsi2
Generated Methods
When you create a mapping table, the following methods are automatically generated on the related models:
For the Tag model:
// Get all posts for a tag
async getPosts(mapSkCondition=null, limit=null, direction='ASC', startKey=null)
// Get posts for a tag ordered by creation date
async getRecentPosts(mapSkCondition=null, limit=null, direction='ASC', startKey=null)
For the Post model:
// Get all tags for a post
async getTags(mapSkCondition=null, limit=null, direction='ASC', startKey=null)
For many-to-many relationships, the methods are called get rather than query because it's only possible to query on the mapping table, not the tags/posts themselves. In practice, you end up paging through these objects in the order of the mapping table rather than filtering them, so it's more like a get than a query. For one-to-many relationships, we add query methods to the related model since it's possible to query the related model objects.
Best Practices
- Naming Convention: Name mapping tables by combining the two model names (e.g., TaggedPost, UserGroup)
- Required Fields: Make the relationship fields required to ensure data integrity
- Tracking Fields: Include createdAt if you need to track when relationships were created
- Indexes: Create indexes for querying from both directions
- Sort Keys: Consider using createdAt as a sort key in secondary indexes for time-based queries
Common Index Patterns
For a mapping table connecting ModelA and ModelB, you will want to use the primaryKey to relate one model to the other. Then you will also want to create an index to query the reverse (assuming you want to be able to query the reverse side of the relationship).
indexes:
  # Primary access pattern
  bsForA: primaryKey # Query Bs for an A
  # Reverse lookup
  asForB: # Query As for a B
    partitionKey: modelBId
    sortKey: modelAId
    indexId: gsi1
Complete Example
models:
  Post:
    modelPrefix: "p"
    fields:
      postId:
        type: UlidField
        autoAssign: true
      userId:
        type: RelatedField
        model: User
        required: true
      title:
        type: StringField
        required: true
      content:
        type: StringField
      createdAt:
        type: CreateDateField
    primaryKey:
      partitionKey: postId
    indexes:
      postsForUser:
        partitionKey: userId
        sortKey: createdAt
        indexId: gsi1
    iterable: true
Generated Features
The model generator will automatically create:
- Field definitions with validation
- Primary key configuration
- Secondary index queries
- Related field getters (e.g., getUser() for a userId field)
- Unique constraint validators
- Query methods for indexes
- Mapping table relationship helpers
Best Practices
- Use clear, descriptive names for models and fields
- Always include a createdAt field for tracking
- Use UlidField for ID fields with autoAssign: true
- Define indexes based on your query patterns
- Use mapping tables for many-to-many relationships
- Keep model prefixes unique and short (1-4 characters)