Model Definitions

Models in DynamoBao are defined using YAML configuration files. These definitions are used to generate the corresponding model classes with all necessary fields, indexes, and relationships.

Basic Structure

models:
  ModelName:
    modelPrefix: "x" # Required: short, unique prefix (1-4 characters) for the model. It cannot change later and is not user-visible.
    tableType: "standard" # Optional: "standard" (default) or "mapping"
    iterable: false # Optional: true or false (default). See "Iterating Over All Objects for a Model".
    iterationBuckets: 10 # Optional: number of buckets for iteration. Default is 10.
    searchable: false # Optional: see "Searchable Models" for the object form.
    fields: {} # Required: field definitions
    primaryKey: {} # Required: primary key configuration
    indexes: {} # Optional: secondary indexes
    uniqueConstraints: {} # Optional: unique constraints
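The modelPrefix rules above (1-4 characters, unique across models) are easy to check in a script before running codegen. The following is a minimal sketch, assuming the YAML file has already been parsed into a plain object with a top-level models key; validateModelPrefixes is a hypothetical helper, not part of DynamoBao's API.

```javascript
// Hypothetical pre-codegen check for the documented modelPrefix rules.
// Assumes the YAML has already been parsed into a plain object.
function validateModelPrefixes(config) {
  const seen = new Map(); // prefix -> name of the model that claimed it
  for (const [name, model] of Object.entries(config.models ?? {})) {
    const prefix = model.modelPrefix;
    if (typeof prefix !== "string" || prefix.length < 1 || prefix.length > 4) {
      throw new Error(`${name}: modelPrefix must be a 1-4 character string`);
    }
    if (seen.has(prefix)) {
      throw new Error(`${name}: modelPrefix "${prefix}" is already used by ${seen.get(prefix)}`);
    }
    seen.set(prefix, name);
  }
  return true;
}

// Example: two models with distinct prefixes pass the check.
validateModelPrefixes({
  models: { User: { modelPrefix: "u" }, Post: { modelPrefix: "p" } },
});
```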

Field Types

Fields can be defined with various types and options:

Basic Fields

fields:
  stringField:
    type: StringField
    required: true # Optional: makes the field required

  integerField:
    type: IntegerField
    defaultValue: 0 # Optional: default value

  floatField:
    type: FloatField

  booleanField:
    type: BooleanField
    defaultValue: false

  binaryField:
    type: BinaryField # Stores Buffer or Uint8Array data

  stringSetField:
    type: StringSetField # Stores a set of string values
    maxStringLength: 100 # Optional: maximum length of individual strings
    maxMemberCount: 20 # Optional: maximum number of items in the set

Date and Time Fields

fields:
  dateTime:
    type: DateTimeField # Stores any date/time value

  createdAt:
    type: CreateDateField # Automatically sets creation timestamp

  modifiedAt:
    type: ModifiedDateField # Automatically updates on modification

  ttl:
    type: TtlField # Time-to-live field for automatic deletion

UlidField

The UlidField is a special field that automatically generates ULID values. It is recommended to use this field for the primary key of your model.

fields:
  postId:
    type: UlidField
    autoAssign: true

ULIDs are 128-bit identifiers that sort lexicographically by creation time and can be generated independently in parallel. They are designed to be collision resistant and are compatible with UUIDs (both are 128 bits).

They are especially useful for DynamoDB because, when a ULID is used as a sort key, items sort by the time they were created. The default order when querying by ID therefore feels natural, instead of being purely random or requiring a secondary index to sort by creation date.
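To make the sorting property concrete, here is a minimal ULID generator following the public ULID layout: a 48-bit millisecond timestamp encoded as 10 Crockford base32 characters, followed by 80 random bits as 16 more characters. It is an illustration of the format only, not DynamoBao's generator, and it omits the spec's optional monotonic mode, so two ULIDs created in the same millisecond are ordered randomly.

```javascript
const { randomBytes } = require("crypto");

// Crockford's base32 alphabet used by ULIDs (no I, L, O, or U).
const ENCODING = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encode the millisecond timestamp as 10 base32 characters (50 bits >= 48).
function encodeTime(ms) {
  let out = "";
  for (let i = 0; i < 10; i++) {
    out = ENCODING[ms % 32] + out;
    ms = Math.floor(ms / 32);
  }
  return out;
}

// Encode 80 random bits as 16 base32 characters (5 bits per character).
function encodeRandom() {
  let bits = 0n;
  for (const byte of randomBytes(10)) bits = (bits << 8n) | BigInt(byte);
  let out = "";
  for (let i = 0; i < 16; i++) {
    out = ENCODING[Number(bits & 31n)] + out;
    bits >>= 5n;
  }
  return out;
}

function ulid(ms = Date.now()) {
  return encodeTime(ms) + encodeRandom();
}
```

Because the alphabet is in ascending ASCII order and the timestamp is fixed-width and comes first, plain string comparison orders ULIDs by creation time, which is exactly what a DynamoDB sort key compares.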

StringSetField

The StringSetField stores a set of unique string values. In JavaScript, this field works with Set objects and is backed by DynamoDB's string set type.

fields:
  tags:
    type: StringSetField
    maxStringLength: 50 # Optional: maximum length of individual strings
    maxMemberCount: 10 # Optional: maximum number of items in the set

Important notes:

  • StringSetField cannot be used in indexes (partition key, sort key, or GSI)
  • Empty sets are stored as null in DynamoDB
  • When loading from DynamoDB, missing or null values return an empty Set()
  • Can be used in filter expressions with $contains and $size operations

Usage example:

const user = new User({
  name: "John Doe",
  tags: new Set(["admin", "premium", "beta"]),
});

// You can also set with an array (duplicates will be removed)
user.tags = ["admin", "premium", "beta"];

// The field will always return a Set object
console.log(user.tags instanceof Set); // true

// Direct mutation operations (recommended)
user.tags.add("vip"); // Add a tag
user.tags.delete("beta"); // Remove a tag
user.tags.clear(); // Clear all tags

await user.save(); // Changes are automatically tracked and saved

// Need to refresh index projections after manual data fixes?
await user.save({ forceReindex: true });

// Filter examples
const premiumUsers = await User.scan({
  filter: {
    tags: { $contains: "premium" },
  },
});

const usersWithManyTags = await User.scan({
  filter: {
    tags: { $size: { $gt: 3 } },
  },
});
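The storage rules in the notes above (arrays are accepted and deduplicated, empty sets become null, null loads as an empty Set, optional length and count limits) can be summarized in two small functions. These are hypothetical helpers written to mirror the documented behavior, not DynamoBao's internal code; the null-for-empty rule exists because DynamoDB does not allow empty sets.

```javascript
// Hypothetical helpers mirroring the documented StringSetField rules.
function toStoredStringSet(value, { maxStringLength, maxMemberCount } = {}) {
  // Accept a Set or an array; duplicates collapse automatically.
  const set = new Set(value ?? []);
  for (const member of set) {
    if (typeof member !== "string") {
      throw new TypeError("StringSetField members must be strings");
    }
    if (maxStringLength !== undefined && member.length > maxStringLength) {
      throw new RangeError(`"${member}" exceeds maxStringLength ${maxStringLength}`);
    }
  }
  if (maxMemberCount !== undefined && set.size > maxMemberCount) {
    throw new RangeError(`set has ${set.size} members; maxMemberCount is ${maxMemberCount}`);
  }
  // DynamoDB cannot store an empty set, so an empty set is stored as null.
  return set.size === 0 ? null : [...set];
}

// Loading: a missing or null attribute becomes an empty Set.
function fromStoredStringSet(stored) {
  return new Set(stored ?? []);
}
```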

VersionField

The VersionField is a special field that automatically generates a new, larger version value every time the object is saved. Use it as your model's version field, for example to support optimistic locking. Note that the value is a string, not a number.

fields:
  version:
    type: VersionField

CounterField

The CounterField is a special field that can be incremented and decremented atomically. This is useful for counters that many users or processes update concurrently, since atomic updates avoid race conditions such as lost increments.

fields:
  counter:
    type: CounterField
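In DynamoDB, atomic counters are conventionally written with an ADD update expression, which the database applies server-side. This page does not describe how CounterField is implemented internally, so the sketch below only builds DocumentClient-style UpdateItem parameters for such a write; the table name, key attributes, and the buildCounterUpdate helper are hypothetical, and only the ADD semantics are DynamoDB's own.

```javascript
// Builds UpdateItem parameters for an atomic increment (or a decrement, with a
// negative delta). Names are hypothetical; ADD semantics come from DynamoDB.
function buildCounterUpdate(tableName, key, attribute, delta) {
  if (!Number.isInteger(delta)) throw new TypeError("delta must be an integer");
  return {
    TableName: tableName,
    Key: key,
    // ADD adds delta to the stored number on the server. If the attribute
    // does not exist yet, DynamoDB treats its starting value as 0.
    UpdateExpression: "ADD #attr :delta",
    ExpressionAttributeNames: { "#attr": attribute },
    ExpressionAttributeValues: { ":delta": delta },
    ReturnValues: "UPDATED_NEW",
  };
}
```

Because DynamoDB applies ADD itself, two concurrent increments both take effect; a read-modify-write in application code could silently lose one of them.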

TtlField

The TtlField is a special field that lets DynamoDB automatically delete an object after a certain time, using DynamoDB's time-to-live (TTL) feature. The field must be named ttl.

fields:
  ttl:
    type: TtlField
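DynamoDB's TTL feature reads a Number attribute holding a Unix epoch timestamp in seconds. This page doesn't say which JavaScript values TtlField accepts, so the hypothetical ttlEpochSeconds helper below only shows the conversion DynamoDB ultimately needs.

```javascript
// Hypothetical helper: DynamoDB TTL expects Unix epoch time in seconds.
function ttlEpochSeconds(secondsFromNow, now = new Date()) {
  return Math.floor(now.getTime() / 1000) + secondsFromNow;
}

// Example: expire one day after a fixed start time.
const start = new Date("2024-01-01T00:00:00Z");
const expiresAt = ttlEpochSeconds(24 * 60 * 60, start);
```

DynamoDB deletes expired items in the background, typically within a few days of expiry, and until then they still appear in reads. Treat ttl as a cleanup mechanism rather than an exact deadline.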

RelatedField

The RelatedField is a special field that allows you to reference another model. This is useful for creating relationships between models.

fields:
  userId:
    type: RelatedField
    model: User
    required: true

This field automatically generates a getter method on the model that loads the related object (e.g., getUser() for a userId field). Two of these fields can be combined in a mapping table to create a many-to-many relationship; see the Mapping Tables section for more information.
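The getUser()-for-userId convention can be pictured as follows. This is a hypothetical sketch: the naming rule is extrapolated from that single example, and the fake User.find below is synchronous for brevity, whereas a real DynamoDB lookup would be asynchronous.

```javascript
// Hypothetical: derive the getter name from the field name ("userId" -> "getUser").
function relatedGetterName(fieldName) {
  const base = fieldName.endsWith("Id") ? fieldName.slice(0, -2) : fieldName;
  return "get" + base.charAt(0).toUpperCase() + base.slice(1);
}

// Hypothetical: attach a getter that loads the related object by id.
function addRelatedGetter(ModelClass, fieldName, RelatedModel) {
  ModelClass.prototype[relatedGetterName(fieldName)] = function () {
    const id = this[fieldName];
    // Assumes RelatedModel.find(id) loads the referenced object.
    return id == null ? null : RelatedModel.find(id);
  };
}
```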

Complete Field Type Reference

  • StringField: Text data
  • IntegerField: Whole numbers
  • FloatField: Decimal numbers
  • BooleanField: True/false values
  • BinaryField: Binary data (Buffer or Uint8Array)
  • StringSetField: Sets of unique strings (JavaScript Set)
  • DateTimeField: Date and time values
  • CreateDateField: Automatic creation timestamp
  • ModifiedDateField: Automatic modification timestamp
  • UlidField: Unique identifiers (often used for IDs)
  • VersionField: Automatic version tracking for optimistic locking
  • CounterField: Atomic counters that can be incremented/decremented
  • RelatedField: References to other models
  • TtlField: Time-to-live field for automatic item deletion (field must be named ttl)

Field Options

Common options that can be applied to fields:

fields:
  exampleField:
    type: StringField
    required: true # Field must have a value
    defaultValue: "test" # Default value if none provided

Primary Key Configuration

Every model requires a primary key configuration:

primaryKey:
  partitionKey: "userId" # Required: field to use as partition key
  sortKey: "createdAt" # Optional: field to use as sort key

The primary key's partitionKey and sortKey can never change once the model is created. To change an object's key values, you must delete the object and create a new object with the new partitionKey and sortKey values. With this in mind, choose a partitionKey and sortKey whose values will not change.

In addition to specifying a field as the partitionKey or sortKey, you can specify the modelPrefix for either one. This is useful if you want to query all objects of a particular model. However, if you expect the model to grow large, do not use the modelPrefix as the partitionKey, since all objects of the model would then be stored in the same partition.

If the sortKey is not specified, it will use the modelPrefix as the sortKey. This is handled automatically and not something you need to specify.
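Because key values are fixed once an object exists, application code that wants to "move" an object can detect the case up front and fall back to delete-and-recreate. changedPrimaryKeyFields below is a hypothetical helper, not part of DynamoBao.

```javascript
// Hypothetical guard: returns the key fields whose values differ between the
// saved object and the proposed new values.
function changedPrimaryKeyFields(primaryKey, before, after) {
  // Compare Dates by timestamp rather than by object identity.
  const comparable = (v) => (v instanceof Date ? v.getTime() : v);
  return [primaryKey.partitionKey, primaryKey.sortKey]
    .filter(Boolean) // sortKey is optional
    .filter((field) => comparable(before[field]) !== comparable(after[field]));
}
```

A non-empty result means the change cannot be saved in place.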

Indexes

You can add up to 5 secondary indexes (gsi1, gsi2, gsi3, gsi4, gsi5) to a model. Much like a primary key, each index has a partitionKey and sortKey. However, unlike a primary key, you can change the partitionKey and sortKey at any time.

Indexes incur additional costs, so only add them when you need them. In particular, if an index would only support an infrequent query, first try the query with a filter instead, to see whether the index is actually necessary.

Secondary indexes enable additional query patterns:

indexes:
  tagsForPost:
    partitionKey: "postId" # Field to use as partition key
    sortKey: "tagId" # Field to use as sort key
    indexId: "gsi1" # GSI identifier (gsi1, gsi2, etc.)

  # Reference to primary key
  postsForTag: primaryKey

Setting an index to primaryKey is shorthand that gives the primary key a name so you can query by it; it does not create a new index. If you plan to query by the primary key, it is recommended to name it this way.
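The two kinds of index entries can be told apart when processing a parsed indexes block: the primaryKey shorthand parses to the string "primaryKey", while a real GSI names an indexId from gsi1 to gsi5. The resolveIndexes function below is a hypothetical sketch; it also assumes each indexId may back only one named index, which this page implies but does not state.

```javascript
// Hypothetical resolution of an `indexes` block parsed from YAML.
const ALLOWED_GSIS = new Set(["gsi1", "gsi2", "gsi3", "gsi4", "gsi5"]);

function resolveIndexes(indexes, primaryKey) {
  const used = new Set();
  const resolved = {};
  for (const [name, def] of Object.entries(indexes)) {
    if (def === "primaryKey") {
      // Shorthand: a name for the primary key; no new GSI is created.
      resolved[name] = { indexId: null, ...primaryKey };
      continue;
    }
    if (!ALLOWED_GSIS.has(def.indexId)) {
      throw new Error(`${name}: indexId must be one of gsi1-gsi5`);
    }
    if (used.has(def.indexId)) {
      throw new Error(`${name}: ${def.indexId} is already used by another index`);
    }
    used.add(def.indexId);
    resolved[name] = { indexId: def.indexId, partitionKey: def.partitionKey, sortKey: def.sortKey };
  }
  return resolved;
}
```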

Iterating Over All Objects for a Model

DynamoBao supports full-model iteration for tasks like data migration, backfills, and reporting. Iteration is opt-in because it adds a per-row write to a dedicated GSI on every save, which doubles the write cost of the model.

To enable, set iterable: true:

models:
  User:
    modelPrefix: "u"
    iterable: true # opt in
    fields:
      # ...

Once enabled, the iterateAll() method on the model class returns an async generator that pages through every row. Iterable models are also a prerequisite for searchAll/searchBucket (see Searchable Models below).

Cost and Performance Considerations

Each create, update, or delete on an iterable model writes one extra row to the iteration GSI, which doubles the write cost. This is worth it for User- or Post-shaped models where reads and migrations matter, but not for high-volume logging tables, event streams, or anything you're sure you'll only access by primary key.

Handling Large Models with Iteration Buckets

To prevent "hot partition" issues on large iterable models, DynamoBao splits the iteration index across multiple partitions. The number of partitions is set by iterationBuckets, which defaults to 10 when iterable: true.

models:
  User:
    modelPrefix: "u"
    iterable: true
    iterationBuckets: 20 # Override the default of 10 for a very large model
    fields:
      # ...

The iterateAll method handles iterating over all buckets automatically. For parallel processing, you can iterate a specific partition using the iterateBucket(bucketNum) method.
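This page does not say how a row is assigned to a bucket. A common approach, shown here purely as an assumption, is to hash a stable id and take it modulo iterationBuckets, which spreads rows roughly evenly so no single iteration partition gets hot. The sketch uses 32-bit FNV-1a over UTF-16 code units.

```javascript
// Hypothetical bucket assignment; DynamoBao's real scheme may differ.
// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a32(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function bucketFor(id, iterationBuckets = 10) {
  return fnv1a32(id) % iterationBuckets;
}

// Count how many ids land in each bucket, to check the spread.
function bucketCounts(ids, iterationBuckets = 10) {
  const counts = new Array(iterationBuckets).fill(0);
  for (const id of ids) counts[bucketFor(id, iterationBuckets)] += 1;
  return counts;
}
```

Under this kind of scheme, iterating every bucket from 0 to iterationBuckets - 1 visits every row exactly once, which is what lets per-bucket workers run in parallel.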

Choosing the number of buckets:

  • 1: Use for small models or when parallel iteration is not needed. All items in one partition.
  • 10 (default): Good for most models, providing a solid baseline for scalability.
  • 20+: For very large models or high-throughput parallel processing scenarios.

A good candidate for iteration is a model that isn't too large and isn't written to frequently. For instance, a User model may be a good candidate, since you typically have many fewer users than posts, messages, or other objects in the system. Users also aren't written to as frequently as other objects, so the model is a good fit for iteration. Since iteration defaults to 10 buckets, write capacity is automatically distributed, but you should still be mindful of very high-velocity writes (e.g. bulk importing users without rate limiting).

It is not recommended to use the modelPrefix as the primaryKey's partitionKey, since it cannot be changed later. The iterable feature provides a safer and more flexible way to enable iteration.

Searchable Models

Models can declare a searchable block to auto-populate a _searchText attribute on every save. The framework concatenates the listed string fields, normalizes the result (lowercased by default, punctuation stripped, whitespace collapsed), and stores it on the row. You can then query against it from inside any filter, or use the dedicated searchAll/searchBucket API on iterable models for parallel cross-bucket search.

models:
  Post:
    modelPrefix: "p"
    iterable: true
    searchable:
      fields: [title, body, summary] # Required: each must be a StringField
      caseSensitive: false # Optional, default false
      minTermLength: 1 # Optional, default 1 (no token filtering)
      dedupe: false # Optional, default false
    fields:
      title: { type: StringField }
      body: { type: StringField }
      summary: { type: StringField }
      # ...

Codegen validates that every name in searchable.fields is defined on the model and is a StringField. Mismatches fail the build with a clear error.

How _searchText is computed

  1. Read each listed field, skip null/undefined values.
  2. Concat with a single space.
  3. Lowercase unless caseSensitive: true.
  4. Strip Unicode non-alphanumeric characters (/[^\p{L}\p{N}\s]/gu) → spaces.
  5. Collapse whitespace.
  6. If dedupe: true: tokenize, dedupe in first-seen order, rejoin.
  7. If minTermLength > 1: drop tokens shorter than that, rejoin.
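The seven steps above translate almost line for line into a function. This is an illustrative re-implementation written from the list, not DynamoBao's source; details such as trimming leading and trailing spaces in step 5 are assumptions.

```javascript
// Illustrative re-implementation of the documented _searchText steps.
function computeSearchText(row, { fields, caseSensitive = false, minTermLength = 1, dedupe = false }) {
  // 1-2. Read the listed fields, skip null/undefined, join with one space.
  let text = fields
    .map((name) => row[name])
    .filter((value) => value !== null && value !== undefined)
    .join(" ");
  // 3. Lowercase unless caseSensitive.
  if (!caseSensitive) text = text.toLowerCase();
  // 4. Replace anything that isn't a Unicode letter, number, or whitespace.
  text = text.replace(/[^\p{L}\p{N}\s]/gu, " ");
  // 5. Collapse runs of whitespace (and trim the ends).
  text = text.replace(/\s+/g, " ").trim();
  let tokens = text === "" ? [] : text.split(" ");
  // 6. Optional: keep only the first occurrence of each token.
  if (dedupe) tokens = [...new Set(tokens)];
  // 7. Optional: drop tokens shorter than minTermLength.
  if (minTermLength > 1) tokens = tokens.filter((t) => t.length >= minTermLength);
  return tokens.join(" ");
}
```

An empty return value corresponds to the case where _searchText is removed from the row. Note that JavaScript's length counts UTF-16 code units, which matters for minTermLength on characters outside the Basic Multilingual Plane.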

Save-time behavior

  • _searchText is recomputed only when at least one source field is in the update (or on create, or with forceReindex: true). Updates that touch unrelated fields leave _searchText alone — no extra index churn.
  • Untouched source fields are backfilled from the current row, so you never lose part of the searchable text just because you only updated one field.
  • If the recomputed value is empty (all sources cleared), _searchText is REMOVEd and the row drops out of search results.
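The three save-time rules can be expressed as a small decision function. planSearchTextUpdate is a hypothetical sketch: normalize stands in for the full _searchText computation, and the returned action names are invented for illustration.

```javascript
// Hypothetical sketch of the save-time rules for _searchText.
function planSearchTextUpdate({ current = {}, changes, searchFields, isCreate = false, forceReindex = false, normalize }) {
  const touched = searchFields.some((f) => Object.prototype.hasOwnProperty.call(changes, f));
  // Updates that don't touch a source field leave _searchText alone.
  if (!isCreate && !forceReindex && !touched) return { action: "skip" };
  // Backfill untouched source fields from the current row.
  const merged = { ...current, ...changes };
  const joined = searchFields
    .map((f) => merged[f])
    .filter((v) => v !== null && v !== undefined)
    .join(" ");
  const value = normalize(joined);
  // An empty result means the attribute is removed and the row leaves search.
  return value === "" ? { action: "remove" } : { action: "set", value };
}
```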

Eventual consistency

searchAll and searchBucket read from the iter_search_index GSI. DynamoDB GSIs are eventually consistent, so a row whose _searchText was just updated may briefly:

  • still appear under its old value (false positive: row matches the old terms for a few seconds after the update); or
  • not yet appear under its new value (false negative: row doesn't match the new terms until the index catches up).

The same window applies to iterateAll({ filter: ... }). If you need stronger guarantees for a destructive bulk operation, re-read each item via find() (strongly consistent base read) and re-check the predicate yourself before acting.

Searching iterable models

// Single-term search across all buckets (returns up to the default limit of 100)
const { items: posts } = await Post.searchAll(["alice"]);
for (const post of posts) console.log(post.title);

// Multi-term: AND by default, $or also supported
const { items: reviews } = await Post.searchAll(["iphone", "review"]);
const { items: either } = await Post.searchAll(["alice", "bob"], { operator: "$or" });

// Phrase as a single term (multi-word substring)
const { items: phrase } = await Post.searchAll(["foo bar"]);

// Parallel fan-out across buckets: one searchBucket call per bucket
const buckets = await Promise.all(
  Array.from({ length: Post.iterationBuckets }, async (_, b) => {
    const { items: bucketItems } = await Post.searchBucket(b, ["alice"]);
    return bucketItems;
  }),
);

// Helper: split a free-form query string into terms (honors quoted phrases)
const terms = Post.tokenizeSearchQuery('"hello world" foo'); // => ["hello world", "foo"]

// Cap total results. Default limit is 100; pass Infinity for unbounded.
const { items, cursor } = await Post.searchAll(["alice"], { limit: 25 });

// Pagination: pass the cursor back to get the next page. cursor === null
// when the search is exhausted.
const next = await Post.searchAll(["alice"], { limit: 25, cursor });

// Drain everything (admin / cleanup scripts):
let cur = null, all = [];
do {
  const page = await Post.searchAll(["alice"], { limit: 100, cursor: cur });
  all.push(...page.items);
  cur = page.cursor;
} while (cur);

searchAll returns { items, cursor }. items is never longer than limit. cursor is an opaque string you pass back on the next call to resume; null means the search is exhausted.
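The tokenizeSearchQuery helper in the example splits on whitespace while keeping double-quoted phrases together. The tokenizeQuery function below is a hypothetical re-implementation of only that documented behavior; the real helper may also normalize terms or treat unbalanced quotes differently.

```javascript
// Hypothetical: whitespace separates terms; double quotes group a phrase.
function tokenizeQuery(query) {
  const terms = [];
  const re = /"([^"]*)"|(\S+)/g; // a quoted phrase, or a run of non-space chars
  let match;
  while ((match = re.exec(query)) !== null) {
    const term = (match[1] ?? match[2]).trim();
    if (term) terms.push(term); // skip empty phrases like ""
  }
  return terms;
}
```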

Other options:

  • parallel: true (default) fans out across every unexhausted bucket concurrently per round; faster wall-clock at the cost of more concurrent capacity. Pass parallel: false for a sequential walk if you want predictable per-bucket-then-objectId order or you're hitting throttling on a small table.
  • maxQueriesPerBucket: 50 (default) bounds DynamoDB Query roundtrips per call per bucket. With 5 buckets that's up to 250 Queries per call worst-case. For sparse-match searches this caps capacity per call — if we hit the cap without filling limit, the call returns whatever it found plus a non-null cursor pointing at the next page so the caller can continue with another call.
  • limit: Infinity opts out of the cap. Combine with cursor chaining to drain.

The cursor encodes which buckets are unexhausted and where they left off, the bucket scope it was generated for, plus a hash of the search predicate (terms + operator + searchConfig). Resume validates all of these:

  • different terms / operator / searchConfig → throws.
  • different model → throws.
  • different tenant → throws (cursors are tenant-scoped).
  • cursor from searchAll passed to searchBucket (or vice versa) → throws with "Cursor scope mismatch". Use the same API call that generated the cursor.
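One way to build a cursor with these properties is to serialize the resume state together with the scope, model, tenant, and a hash of the predicate, then re-check all of them on resume. The format below (base64-encoded JSON with a SHA-256 predicate hash) is an assumption for illustration, not DynamoBao's actual cursor format; a production format might also sign the payload, which this sketch does not.

```javascript
const { createHash } = require("crypto");

// Hash of everything that defines "the same search".
function predicateHash({ terms, operator, searchConfig }) {
  return createHash("sha256").update(JSON.stringify({ terms, operator, searchConfig })).digest("hex");
}

// Hypothetical cursor: base64 JSON with resume state plus validation fields.
function encodeCursor({ model, tenant, scope, predicate, buckets }) {
  const payload = { model, tenant, scope, predicate: predicateHash(predicate), buckets };
  return Buffer.from(JSON.stringify(payload), "utf8").toString("base64");
}

function decodeCursor(cursor, { model, tenant, scope, predicate }) {
  const payload = JSON.parse(Buffer.from(cursor, "base64").toString("utf8"));
  if (payload.scope !== scope) throw new Error("Cursor scope mismatch");
  if (payload.model !== model) throw new Error("Cursor was generated for a different model");
  if (payload.tenant !== tenant) throw new Error("Cursor was generated for a different tenant");
  if (payload.predicate !== predicateHash(predicate)) {
    throw new Error("Cursor was generated for a different search");
  }
  return payload.buckets; // per-bucket resume positions for unexhausted buckets
}
```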

searchAll accepts an options.filter that's combined with the search predicate, but DynamoDB only allows the index-level filter to reference attributes that are projected onto the GSI. The iter_search_index projects only _searchText (plus the index/base keys) — anything else throws a clear error before the round-trip:

Filter on 'iter_search_index' references attribute(s) that aren't projected: [status]. ...

For non-projected attributes, filter the hydrated batch in JS:

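const { items } = await Post.searchAll(["alice"]);
// Filtering happens after the limit is applied, so a page may hold fewer
// than `limit` matches; keep following the cursor if you need more.
const activePosts = items.filter((post) => post.status === "active");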
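The pre-flight projection check described above amounts to collecting every attribute name a filter references and comparing the result against the index's projected attributes. This is a hypothetical sketch; it assumes filters are plain objects in which $-prefixed keys are logical operators (such as $and, $or, $not) and all other keys are attribute names.

```javascript
// Hypothetical: collect the attribute names a filter object references.
function referencedAttributes(filter) {
  const attrs = new Set();
  const walk = (node) => {
    if (Array.isArray(node)) {
      node.forEach(walk);
    } else if (node !== null && typeof node === "object") {
      for (const [key, value] of Object.entries(node)) {
        if (key.startsWith("$")) walk(value); // logical operator: descend
        else attrs.add(key); // attribute name: its value holds comparisons
      }
    }
  };
  walk(filter);
  return attrs;
}

// Hypothetical: throw before any round-trip if an attribute isn't projected.
function assertProjected(indexName, filter, projected) {
  const missing = [...referencedAttributes(filter)].filter((name) => !projected.has(name));
  if (missing.length > 0) {
    throw new Error(
      `Filter on '${indexName}' references attribute(s) that aren't projected: [${missing.join(", ")}].`,
    );
  }
}
```

The projected set passed in should include the index and base key attributes as well as _searchText.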

Searchable + non-iterable models

iterable and searchable are independent flags. A model that's searchable without iterable: true is fully supported and useful — typical use case is partition-scoped substring search: comments under a post, items under a user, anything where you already have a known partition key and want to narrow within it by text content.

How it works mechanically:

  • _searchText is populated on every save (the save-time logic doesn't depend on iterable).
  • The row is not in iter_search_index (no _iter_pk/_iter_sk keys), so no extra GSI write per save.
  • All user-defined GSIs use Projection: ALL, so _searchText rides along on whichever index you query.
  • searchAll/searchBucket are unavailable (they require iterable: true for the bucketed fan-out). At codegen time you'll get a warning if searchable is configured without iterable: true, in case you forgot.

Two patterns for searching:

Filter on _searchText directly via the standard filter API. The value is auto-normalized using the model's searchConfig so user input matches what was stored. The third arg to queryByIndex is the sort-key condition (pass null if none); the fourth is options:

const { items } = await Post.queryByIndex(
  "byUser",
  userId,
  null,
  { filter: { _searchText: { $contains: "Hello, World!" } } },
);
// 'Hello, World!' is auto-normalized to 'hello world' before being sent
// to DynamoDB, matching what _searchText looks like on disk.

This works on any standard filter context: query, queryByIndex, iterateAll({ filter: ... }), and condition expressions in update/delete. Operators supported on _searchText: $contains, $beginsWith, $eq, $ne, $in, $exists. Values are auto-normalized for string-comparison ops; $exists doesn't take a value.
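The auto-normalization rule for _searchText filters can be sketched as a transform over one attribute's condition object. normalizeSearchCondition and the normalize function below are hypothetical; normalize follows steps 3-5 of the _searchText computation with the default caseSensitive: false, and whether DynamoBao also applies dedupe or minTermLength to filter values is not stated on this page.

```javascript
// Hypothetical sketch of filter auto-normalization on _searchText.
const NORMALIZED_OPS = new Set(["$contains", "$beginsWith", "$eq", "$ne", "$in"]);

function normalizeSearchCondition(condition, normalize) {
  const out = {};
  for (const [op, value] of Object.entries(condition)) {
    if (op === "$exists") out[op] = value; // takes no string value
    else if (op === "$in") out[op] = value.map(normalize);
    else if (NORMALIZED_OPS.has(op)) out[op] = normalize(value);
    else throw new Error(`Operator ${op} is not supported on _searchText`);
  }
  return out;
}

// Default normalization: lowercase, strip punctuation, collapse whitespace.
const normalize = (s) =>
  s.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, " ").replace(/\s+/g, " ").trim();
```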

Or filter post-fetch in JS when you want the normalization but not the index-side filter (e.g., post-fetch ranking):

const term = Post.normalizeSearchTerm(userQuery);
const { items } = await Post.queryByIndex("byUser", userId);
const matches = items.filter((p) => p._dyData._searchText?.includes(term));

Multilingual support

The default normalization preserves any Unicode letter (Chinese, Japanese, Korean, Cyrillic, Arabic, Devanagari, etc.). searchAll(["苹果"]) matches rows whose _searchText contains 苹果 (including substrings like 苹果手机).

The dedupe and minTermLength options use whitespace tokenization, which doesn't fit languages without inter-word spacing (Chinese, Japanese). Leave them at their defaults (dedupe: false, minTermLength: 1) for CJK content; the rest of normalization Just Works.
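Both points can be checked directly with the character filter from step 4: CJK and Cyrillic letters survive normalization, while an unspaced Chinese sentence comes out of whitespace tokenization as a single token, so dedupe and minTermLength would have nothing useful to work on. stripPunctuation is a hypothetical name for steps 4-5.

```javascript
// Steps 4-5 of the normalization: keep Unicode letters/digits, collapse spaces.
const stripPunctuation = (s) =>
  s.replace(/[^\p{L}\p{N}\s]/gu, " ").replace(/\s+/g, " ").trim();

// The full-width exclamation mark is punctuation, so it is removed.
const stored = stripPunctuation("苹果手机很好用！");

// Whitespace tokenization sees one "word" because the text has no spaces.
const tokens = stored.split(" ");
```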

Adopting search on existing data

When you make a model searchable for the first time and it already has rows, those rows lack _searchText. After running bao-update-table to add the new GSI, run:

npx bao-rebuild-search-text Post

This walks every row of the model with iterateAll and re-saves with forceReindex: true so the save-time computation populates _searchText.

Migration: enabling search on existing tables

Iteration uses the legacy iter_index GSI (HASH _iter_pk, RANGE _iter_sk, KEYS_ONLY projection) by default. Search needs the new iter_search_index (same keys, but INCLUDE [_searchText]). To enable search:

  1. Bump dynamo-bao — no breakage, iteration still uses iter_index.
  2. Add the GSI: npx bao-update-table — this adds iter_search_index alongside the legacy index. DynamoDB auto-backfills the GSI's key entries; rows that already lack _searchText won't be searchable until you do step 4.
  3. Flip the config: set db.iterationIndexName: "iter_search_index" in your dynamo-bao config file (e.g., dynamo-bao.config.js). After this, iterateAll and searchAll both query the new index.
  4. Backfill existing rows: npx bao-rebuild-search-text <ModelName> — walks every row and re-saves with forceReindex: true, populating _searchText. Skip if the model is empty or you only need to search rows newer than today.
  5. (Optional, later) drop the legacy iter_index GSI manually to save the duplicate write cost. Verify iteration and search both work first.

You can stop after step 2 if you only want to add the new GSI infrastructure without flipping the runtime over yet — both indexes will be populated by saves until step 5.
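For step 3, the config change might look like the fragment below. Only db.iterationIndexName and its value come from this page; the CommonJS export style and the surrounding object shape are assumptions, so merge the setting into your existing config file rather than replacing it.

```javascript
// dynamo-bao.config.js (sketch). Only db.iterationIndexName comes from the
// migration steps above; keep your existing settings alongside it.
const config = {
  db: {
    // Step 3: point iterateAll and searchAll at the new GSI.
    iterationIndexName: "iter_search_index",
  },
};

module.exports = config;
```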

Unique Constraints

You can specify up to 3 unique constraints (uc1, uc2, uc3) for a model.

uniqueConstraints:
  uniqueEmail:
    field: "email"
    uniqueConstraintId: "uc1" # Unique constraint identifier

Unique constraints are implemented using DynamoDB transactions, so they incur significant costs. Only use them where you really need them (e.g. to prevent duplicate emails, or to map to a globally unique external id). Primary keys are already unique, so you may be able to use them instead, though keep in mind that they can't be changed later.

Creating a unique constraint will also enable fast lookups for that field. Before creating an index on a field that has a unique constraint, consider whether you can use the unique constraint instead.
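A common DynamoDB pattern for unique constraints is a TransactWriteItems call that writes the item and a marker row keyed by the constrained value, with a condition that fails if the marker already exists. This page does not document DynamoBao's internals, so the sketch below only builds the request parameters, and the pk/sk attribute names and the "_uc#..." key format are invented for illustration.

```javascript
// Hypothetical unique-constraint create: the item and a marker row are written
// atomically; if either condition fails, the whole transaction is cancelled.
function buildUniqueCreate(tableName, item, constraintId, field) {
  const value = item[field];
  if (value == null) throw new Error(`${field} is required for ${constraintId}`);
  return {
    TransactItems: [
      {
        Put: {
          TableName: tableName,
          Item: item,
          // Fail if an item with this primary key already exists.
          ConditionExpression: "attribute_not_exists(pk)",
        },
      },
      {
        Put: {
          TableName: tableName,
          // Marker row whose key is derived from the constrained value.
          Item: { pk: `_uc#${constraintId}#${value}`, sk: "_uc", ownerPk: item.pk },
          // Fail if another object already holds this value.
          ConditionExpression: "attribute_not_exists(pk)",
        },
      },
    ],
  };
}
```

This is where the cost comes from: DynamoDB bills transactional writes at twice the capacity of standard writes, and this transaction writes two items, so a create costs roughly four times a plain put for small items.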

Mapping Tables

Mapping tables are used to create many-to-many relationships between models. They are defined using tableType: "mapping" and typically contain RelatedFields to the models being connected.

Basic Structure

ModelName:
  tableType: "mapping" # Identifies this as a mapping table
  modelPrefix: "xx" # Required: 2-character prefix
  fields:
    firstModelId: # Related field to first model
      type: RelatedField
      model: FirstModel
      required: true
    secondModelId: # Related field to second model
      type: RelatedField
      model: SecondModel
      required: true
    createdAt: # Optional: tracking fields
      type: CreateDateField
  primaryKey: # Define how records are stored
    partitionKey: firstModelId
    sortKey: secondModelId

Real Example: TaggedPost

Here's a complete example of a mapping table that connects Tags to Posts:

TaggedPost:
  tableType: "mapping"
  modelPrefix: "tp"
  fields:
    tagId:
      type: RelatedField
      model: Tag
      required: true
    postId:
      type: RelatedField
      model: Post
      required: true
    createdAt:
      type: CreateDateField
  primaryKey:
    partitionKey: tagId
    sortKey: postId
  indexes:
    postsForTag: primaryKey # Query posts for a tag
    tagsForPost: # Query tags for a post
      partitionKey: postId
      sortKey: tagId
      indexId: gsi1
    recentPostsForTag: # Query posts for a tag by date
      partitionKey: tagId
      sortKey: createdAt
      indexId: gsi2

Generated Methods

When you create a mapping table, the following methods are automatically generated on the related models:

For the Tag model:

// Get all posts for a tag
async getPosts(mapSkCondition=null, limit=null, direction='ASC', startKey=null)

// Get posts for a tag ordered by creation date
async getRecentPosts(mapSkCondition=null, limit=null, direction='ASC', startKey=null)

For the Post model:

// Get all tags for a post
async getTags(mapSkCondition=null, limit=null, direction='ASC', startKey=null)

For many-to-many relationships, the generated methods are named get rather than query because only the mapping table can be queried, not the tags or posts themselves. In practice, you page through the related objects in the mapping table's order rather than filtering them, so the operation behaves more like a get than a query. For one-to-many relationships, query methods are added to the related model, since the related model's objects can be queried directly.

Best Practices

  1. Naming Convention: Name mapping tables by combining the two model names (e.g., TaggedPost, UserGroup)
  2. Required Fields: Make the relationship fields required to ensure data integrity
  3. Tracking Fields: Include createdAt if you need to track when relationships were created
  4. Indexes: Create indexes for querying from both directions
  5. Sort Keys: Consider using createdAt as a sort key in secondary indexes for time-based queries

Common Index Patterns

For a mapping table connecting ModelA and ModelB, you will want to use the primaryKey to relate one model to the other. Then you will also want to create an index to query the reverse (assuming you want to be able to query the reverse side of the relationship).

indexes:
  # Primary access pattern
  bsForA: primaryKey # Query Bs for an A

  # Reverse lookup
  asForB: # Query As for a B
    partitionKey: modelBId
    sortKey: modelAId
    indexId: gsi1
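The reason for this pairing can be seen in a toy in-memory model: the primary key groups rows by modelAId (answering bsForA), and the GSI regroups the same rows by modelBId (answering asForB). MappingTable is a hypothetical illustration of the access patterns, not how DynamoBao stores data.

```javascript
// Hypothetical in-memory stand-in for a mapping table and its two access paths.
class MappingTable {
  constructor() {
    this.rows = []; // each row: { modelAId, modelBId }
  }

  add(modelAId, modelBId) {
    this.rows.push({ modelAId, modelBId });
  }

  // Primary key (partition modelAId, sort modelBId): "bsForA".
  bsForA(modelAId) {
    return this.rows.filter((r) => r.modelAId === modelAId).map((r) => r.modelBId).sort();
  }

  // GSI (partition modelBId, sort modelAId): "asForB".
  asForB(modelBId) {
    return this.rows.filter((r) => r.modelBId === modelBId).map((r) => r.modelAId).sort();
  }
}
```

Without the reverse index, answering asForB would require scanning every row, which is what the GSI avoids.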

Complete Example

models:
  Post:
    modelPrefix: "p"
    fields:
      postId:
        type: UlidField
        autoAssign: true
      userId:
        type: RelatedField
        model: User
        required: true
      title:
        type: StringField
        required: true
      content:
        type: StringField
      createdAt:
        type: CreateDateField
    primaryKey:
      partitionKey: postId
    indexes:
      postsForUser:
        partitionKey: userId
        sortKey: createdAt
        indexId: gsi1
    iterable: true

Generated Features

The model generator will automatically create:

  1. Field definitions with validation
  2. Primary key configuration
  3. Secondary index queries
  4. Related field getters (e.g., getUser() for a userId field)
  5. Unique constraint validators
  6. Query methods for indexes
  7. Mapping table relationship helpers

Best Practices

  1. Use clear, descriptive names for models and fields
  2. Always include a createdAt field for tracking
  3. Use UlidField for ID fields with autoAssign: true
  4. Define indexes based on your query patterns
  5. Use mapping tables for many-to-many relationships
  6. Keep model prefixes unique and short (1-4 characters)