Core Concepts
Understanding Synth's data structures is key to using it effectively.
Tree
The Tree is the top-level container for an AST:
typescript
interface Tree {
meta: TreeMetadata // Metadata about the source
root: NodeId // Root node ID (always 0)
nodes: BaseNode[] // All nodes in a flat array
strings: Map<string, number> // String interning pool
}
interface TreeMetadata {
language: string // 'markdown', 'javascript', 'html', etc.
source: string // Original source code
created: number // Timestamp
modified: number // Timestamp
data?: Record<string, unknown> // Custom metadata
}Why Arena-Based Storage?
Synth stores all nodes in a flat array instead of a traditional tree structure:
typescript
// Traditional tree (slow)
{
type: 'root',
children: [
{ type: 'heading', children: [...] }, // Pointer chasing!
{ type: 'paragraph', children: [...] }
]
}
// Synth's arena storage (fast!)
nodes: [
{ id: 0, type: 'root', children: [1, 2] },
{ id: 1, type: 'heading', parent: 0, children: [3] },
{ id: 2, type: 'paragraph', parent: 0, children: [4] },
{ id: 3, type: 'text', parent: 1, children: [] },
{ id: 4, type: 'text', parent: 2, children: [] },
]Benefits:
- Cache-friendly: Contiguous memory layout
- O(1) access: Direct array indexing by ID
- No pointer chasing: 3-10x faster traversal
- Serializable: Easy to JSON.stringify/parse
BaseNode
Every node in Synth, regardless of language, uses the same interface:
typescript
interface BaseNode {
/** Unique ID within the tree (array index) */
id: NodeId
/** Node type - language-specific but consistent */
type: string
/** Source location (optional) */
span?: Span
/** Parent node ID (null for root) */
parent: NodeId | null
/** Child node IDs */
children: NodeId[]
/** Language-specific data */
data?: Record<string, unknown>
}The type Field
Node types are consistent within each language:
typescript
// Markdown types
'root' | 'heading' | 'paragraph' | 'text' | 'emphasis' | 'strong' |
'link' | 'image' | 'code' | 'codeBlock' | 'list' | 'listItem' |
'blockquote' | 'horizontalRule' | 'html' | 'table' | ...
// JavaScript types (ESTree-compatible)
'Program' | 'FunctionDeclaration' | 'VariableDeclaration' |
'Identifier' | 'Literal' | 'CallExpression' | 'MemberExpression' | ...
// HTML types
'document' | 'doctype' | 'element' | 'text' | 'comment' | 'cdata'
// JSON types
'Object' | 'Array' | 'Property' | 'String' | 'Number' | 'Boolean' | 'Null'The data Field
Language-specific data goes in data:
typescript
// Markdown heading
{
id: 1,
type: 'heading',
data: {
depth: 2 // ## = depth 2
}
}
// HTML element
{
id: 5,
type: 'element',
data: {
tagName: 'div',
attributes: { class: 'container', id: 'main' },
selfClosing: false
}
}
// JavaScript function
{
id: 10,
type: 'FunctionDeclaration',
data: {
id: 'myFunction',
async: true,
generator: false
}
}Span (Source Location)
The span tells you where in the source code a node came from:
typescript
interface Span {
start: Position
end: Position
}
interface Position {
line: number // 1-based line number
column: number // 0-based column number
offset: number // 0-based byte offset from start
}Getting Source Text
Use span to extract the original source:
typescript
function getSourceText(tree: Tree, node: BaseNode): string {
if (!node.span) return ''
return tree.meta.source.slice(
node.span.start.offset,
node.span.end.offset
)
}
// Example
const tree = parse('# Hello **World**')
const heading = tree.nodes[1]
console.log(heading.span)
// {
// start: { line: 1, column: 0, offset: 0 },
// end: { line: 1, column: 17, offset: 17 }
// }
console.log(getSourceText(tree, heading))
// "# Hello **World**"NodeId
Node IDs are simple numbers (array indices):
typescript
type NodeId = number
// Access a node by ID
const node = tree.nodes[nodeId]
// Or use the helper function
import { getNode } from '@sylphx/synth'
const node = getNode(tree, nodeId)Why Numbers Instead of Objects?
- Performance: No reference counting, no GC pressure
- Serialization: JSON-safe without custom serializers
- Memory: 8 bytes vs 64+ bytes for object references
- WASM-ready: Works with WebAssembly's linear memory
Tree Operations
Getting Nodes
typescript
import { getNode, getRoot, getChildren, getParent } from '@sylphx/synth'
// Get any node by ID
const node = getNode(tree, 5)
// Get the root node
const root = getRoot(tree)
// Get all children as BaseNode[]
const children = getChildren(tree, nodeId)
// Get parent node
const parent = getParent(tree, nodeId)Modifying Trees
typescript
import { addNode, updateNode, removeNode } from '@sylphx/synth'
// Add a new node
const newId = addNode(tree, {
type: 'paragraph',
parent: 0,
children: []
})
// Update a node
updateNode(tree, newId, {
data: { custom: 'value' }
})
// Remove a node
removeNode(tree, newId)Universal AST Benefits
Because all languages use BaseNode, you can:
1. Write Generic Tools
typescript
// Works for ANY language!
function countNodeTypes(tree: Tree): Map<string, number> {
const counts = new Map<string, number>()
for (const node of tree.nodes) {
counts.set(node.type, (counts.get(node.type) || 0) + 1)
}
return counts
}2. Share Traversal Code
typescript
// Same traversal for all languages
import { traverse } from '@sylphx/synth'
function findAllTextNodes(tree: Tree): BaseNode[] {
const results: BaseNode[] = []
traverse(tree, {
enter: (ctx) => {
if (ctx.node.type === 'text' || ctx.node.type === 'Literal') {
results.push(ctx.node)
}
}
})
return results
}3. Build Cross-Language Tools
typescript
// Lint any language
import { createRule } from '@sylphx/synth-lint'
const noEmptyBlocks = createRule({
name: 'no-empty-blocks',
check: (ctx) => {
if (ctx.node.children.length === 0 &&
['BlockStatement', 'codeBlock', 'element'].includes(ctx.node.type)) {
return { message: 'Empty block detected' }
}
}
})Next Steps
- Traversal - All traversal methods
- Query Index - O(1) node lookups
- Plugins - Transform and visitor plugins