---
title: "Custom Markdown Parsing"
description: "Learn how to implement custom Markdown parsing in Tiptap Editor with this comprehensive guide."
canonical_url: "https://tiptap.dev/docs/editor/markdown/advanced-usage/custom-parsing"
---

# Custom Markdown Parsing

Learn how to implement custom Markdown parsing in Tiptap Editor with this comprehensive guide.

This guide walks you through implementing custom Markdown parsing in Tiptap Editor. By the end, you'll be able to turn [Tokens](../glossary#token) into [Tiptap JSON](../glossary#tiptap-json).

Extensions can provide custom parsing logic to handle specific Markdown tokens. This is done through the `parseMarkdown` handler.

## Creating and understanding a parse handler

A parse handler receives a Markdown token from MarkedJS and returns Tiptap JSON content that the editor can consume.
Besides the token, the handler receives a `helpers` object with utility functions.

Use these helpers to create nodes, apply marks, or parse the child MarkedJS tokens found in `token.tokens`.

```typescript
const MyHeading = Node.create({
  name: 'customHeading',
  // ...

  markdownTokenName: 'heading', // Token type to handle (optional, default is the extension name)
  parseMarkdown: (token, helpers) => {
    return {
      type: 'heading',
      attrs: { level: token.depth },
      content: helpers.parseInline(token.tokens || []),
    }
  },
})
```

In this example, MarkedJS passes a `heading` token to the Markdown manager, which hands it to the parse handler.
The handler transforms the token into a Tiptap node of type `heading`.

The `level` attribute is taken from the token's `depth`, and the inline content is parsed with `helpers.parseInline()`, because headings can only contain inline text and marks.

> **Important**: Attributes on tokens can vary depending on how the [Tokenizer](../glossary#tokenizer) is configured.
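
Because token attributes can vary, it can help to read them defensively. The following is a minimal, self-contained sketch that runs a heading handler against a hand-written token. The `MarkdownToken`, `JSONNode`, and `ParseHelpers` types and the `stubHelpers` object are simplified stand-ins defined for this example, not the types exported by Tiptap, and the token only approximates what MarkedJS emits.

```typescript
// Simplified stand-in types for illustration; not Tiptap's exported types.
type MarkdownToken = { type: string; depth?: number; text?: string; tokens?: MarkdownToken[] }
type JSONNode = { type: string; attrs?: Record<string, unknown>; text?: string; content?: JSONNode[] }
type ParseHelpers = { parseInline: (tokens: MarkdownToken[]) => JSONNode[] }

// Stub helper: maps plain text tokens to text nodes (the real helper also resolves marks).
const stubHelpers: ParseHelpers = {
  parseInline: tokens => tokens.map(t => ({ type: 'text', text: t.text ?? '' })),
}

// Same shape as the handler above, with a guard in case `depth` is missing or out of range.
const parseHeading = (token: MarkdownToken, helpers: ParseHelpers): JSONNode => {
  const depth = typeof token.depth === 'number' ? token.depth : 1
  return {
    type: 'heading',
    attrs: { level: Math.min(Math.max(depth, 1), 6) },
    content: helpers.parseInline(token.tokens ?? []),
  }
}

const headingToken: MarkdownToken = {
  type: 'heading',
  depth: 2,
  tokens: [{ type: 'text', text: 'Hello' }],
}

const headingJSON = parseHeading(headingToken, stubHelpers)
// headingJSON: { type: 'heading', attrs: { level: 2 }, content: [{ type: 'text', text: 'Hello' }] }
```

Clamping the level to the range 1 to 6 is a design choice made for this sketch; your schema may accept a different range.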

## Parse Helper Functions

As described in the section above, the `helpers` object provides utility functions for parsing child tokens or creating nodes and marks.
Let us go through each helper and see how they can be used.

### Parse inline-level child tokens with `helpers.parseInline(tokens)`

This helper takes a list of tokens and **tries** to parse them as inline content (text nodes with marks).
It does not verify that the tokens are actually inline tokens, so make sure to pass only inline tokens.

The function returns `TiptapJSON[]` that can be used as the `content` of a Tiptap Node.

```typescript
parseMarkdown: (token, helpers) => {
  const content = helpers.parseInline(token.tokens || [])

  return {
    type: 'paragraph',
    content,
  }
}
```

### Parse block-level child tokens with `helpers.parseChildren(tokens)`

Similar to `parseInline()`, but parses tokens as block-level content (e.g., list items, blockquotes, code blocks and more).
It does not verify that the tokens are actually block-level tokens, so make sure to pass only block-level tokens.

The function returns `TiptapJSON[]` that can be used as the `content` of a Tiptap Node.

```typescript
parseMarkdown: (token, helpers) => {
  // Parse nested block content (e.g., list items)
  const content = helpers.parseChildren(token.tokens || [])

  return {
    type: 'blockquote',
    content,
  }
}
```

### Parsing Marks with `helpers.parseInline()` and `helpers.applyMark()`

Use `helpers.applyMark()` to apply a mark to content:

```typescript
const Bold = Mark.create({
  name: 'bold',

  markdownTokenName: 'strong',
  parseMarkdown: (token, helpers) => {
    const content = helpers.parseInline(token.tokens || []) // parse the inline content inside the mark
    return helpers.applyMark('bold', content) // apply the 'bold' mark to the parsed content
  },
})
```
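
Conceptually, applying a mark means that every text node in the content ends up carrying that mark in the final document. The sketch below illustrates that end result with a hypothetical `addMarkToTextNodes` function. It is not how `helpers.applyMark()` is implemented, and the value that helper returns may have a different shape.

```typescript
type MarkJSON = { type: string; attrs?: Record<string, unknown> }
type InlineJSON = { type: string; text?: string; marks?: MarkJSON[]; content?: InlineJSON[] }

// Hypothetical helper: returns copies of the nodes with `markType` added to every text node.
function addMarkToTextNodes(markType: string, nodes: InlineJSON[]): InlineJSON[] {
  return nodes.map(node => {
    if (node.type === 'text') {
      const marks = node.marks ?? []
      // Avoid adding the same mark twice.
      const alreadyMarked = marks.some(mark => mark.type === markType)
      return alreadyMarked ? node : { ...node, marks: [...marks, { type: markType }] }
    }
    // Recurse into non-text nodes that carry their own content.
    return node.content ? { ...node, content: addMarkToTextNodes(markType, node.content) } : node
  })
}

const boldContent = addMarkToTextNodes('bold', [
  { type: 'text', text: 'World' },
  { type: 'text', text: '!', marks: [{ type: 'italic' }] },
])
// boldContent[0].marks -> [{ type: 'bold' }]
// boldContent[1].marks -> [{ type: 'italic' }, { type: 'bold' }]
```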

## HTML Parsing in Markdown

When Markdown contains HTML, it's parsed using your extensions' existing `parseHTML` methods.

```markdown
# Regular Markdown

<custom-component data-foo="bar">
  <p>This HTML is parsed by your extensions</p>
</custom-component>

More **Markdown** here.
```
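
For the `<custom-component>` element above to be recognized, some extension has to declare a matching `parseHTML` rule. A minimal sketch of such a configuration follows. The `customComponent` name, the `foo` attribute, and the `group` and `content` values are hypothetical, and the object is written as a plain config that you would pass to `Node.create()`.

```typescript
// Plain config object; in an editor you would pass it to Node.create() from '@tiptap/core'.
const customComponentConfig = {
  name: 'customComponent', // hypothetical node name
  group: 'block',
  content: 'block+',

  addAttributes() {
    return {
      foo: {
        default: null,
        // Read the attribute from the parsed DOM element.
        parseHTML: (element: { getAttribute(name: string): string | null }) =>
          element.getAttribute('data-foo'),
      },
    }
  },

  parseHTML() {
    // Match <custom-component> elements found in the Markdown source.
    return [{ tag: 'custom-component' }]
  },

  // renderHTML and other config omitted
}
```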

## Defining the Markdown token name

Token names do not always map one-to-one to your node or mark names. In that case, use `markdownTokenName` to specify which token type should be parsed into your node or mark.

```typescript
const CustomBold = Mark.create({
  name: 'bold',
  // ...

  markdownTokenName: 'strong', // Match 'strong' tokens when parsing
  parseMarkdown: (token, helpers) => { /* ... */ },
  renderMarkdown: (node, helpers) => { /* ... */ },
})
```

This is useful when:

- Markdown token names differ from your node names
- Multiple Markdown tokens map to the same node type
- One node type can be serialized to multiple Markdown formats
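
The lookup rule can be sketched as follows: a handler is registered under `markdownTokenName` when it is set, and under the extension `name` otherwise. The `ExtensionLike` type and the `buildHandlerMap` function are hypothetical and only illustrate the mapping, not the internals of the Markdown manager.

```typescript
type ExtensionLike = { name: string; markdownTokenName?: string }

// Hypothetical: index extensions by the token type they handle.
function buildHandlerMap(extensions: ExtensionLike[]): Map<string, ExtensionLike> {
  const map = new Map<string, ExtensionLike>()
  for (const ext of extensions) {
    // Fall back to the extension name when no token name is given.
    map.set(ext.markdownTokenName ?? ext.name, ext)
  }
  return map
}

const handlerMap = buildHandlerMap([
  { name: 'bold', markdownTokenName: 'strong' },
  { name: 'paragraph' },
])
// handlerMap.get('strong')?.name    -> 'bold'
// handlerMap.get('paragraph')?.name -> 'paragraph'
// handlerMap.has('bold')            -> false
```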

## Fallback Parsing

If no extension handles a specific token type, the MarkdownManager provides fallback parsing for common tokens:

- `paragraph` → `{ type: 'paragraph' }`
- `heading` → `{ type: 'heading', attrs: { level } }`
- `text` → `{ type: 'text', text }`
- `html` → Parsed using extensions' `parseHTML` methods

You can override this by providing your own handler for these token types.
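
The precedence described above can be pictured as a simple dispatch: use an extension handler when one exists, otherwise fall back to a default. This is a hedged sketch with a hypothetical `parseToken` function and simplified types; it leaves out the `html` fallback and does not reflect the actual MarkdownManager code.

```typescript
type Token = { type: string; depth?: number; text?: string }
type NodeJSON = { type: string; attrs?: Record<string, unknown>; text?: string }
type Handler = (token: Token) => NodeJSON

// Hypothetical dispatch: prefer an extension handler, then fall back to built-in defaults.
function parseToken(token: Token, handlers: Record<string, Handler>): NodeJSON | null {
  const handler = handlers[token.type]
  if (handler) return handler(token)

  switch (token.type) {
    case 'paragraph':
      return { type: 'paragraph' }
    case 'heading':
      return { type: 'heading', attrs: { level: token.depth ?? 1 } }
    case 'text':
      return { type: 'text', text: token.text ?? '' }
    default:
      return null // unknown token: nothing to emit in this sketch
  }
}

// A custom handler for `heading` takes precedence over the fallback.
const customHandlers: Record<string, Handler> = {
  heading: token => ({ type: 'customHeading', attrs: { level: token.depth ?? 1 } }),
}

const viaFallback = parseToken({ type: 'heading', depth: 3 }, {})
const viaHandler = parseToken({ type: 'heading', depth: 3 }, customHandlers)
// viaFallback?.type -> 'heading'
// viaHandler?.type  -> 'customHeading'
```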

## Debug Parsing

Log tokens to understand what MarkedJS produces:

```typescript
const markdown = '# Hello **World**'
const tokens = editor.markdown.instance.lexer(markdown)
console.log(JSON.stringify(tokens, null, 2))
```
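
Raw JSON output gets long quickly. A small helper that prints only the token types, indented by nesting depth, can make the structure easier to scan. The `outlineTokens` function and `LexerToken` type below are hypothetical, and the sample tokens are hand-written to approximate the lexer output for the Markdown above.

```typescript
type LexerToken = { type: string; text?: string; tokens?: LexerToken[] }

// Build an indented outline of token types to make nesting easy to scan.
function outlineTokens(tokens: LexerToken[], depth = 0): string[] {
  const lines: string[] = []
  for (const token of tokens) {
    lines.push('  '.repeat(depth) + token.type)
    lines.push(...outlineTokens(token.tokens ?? [], depth + 1))
  }
  return lines
}

// Hand-written tokens approximating the lexer output for '# Hello **World**'.
const sampleTokens: LexerToken[] = [
  {
    type: 'heading',
    tokens: [
      { type: 'text', text: 'Hello ' },
      { type: 'strong', tokens: [{ type: 'text', text: 'World' }] },
    ],
  },
]

const outline = outlineTokens(sampleTokens).join('\n')
// heading
//   text
//   strong
//     text
```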

### Parse tokens in isolation

```typescript
const token = {
  type: 'heading',
  depth: 1,
  tokens: [{ type: 'text', text: 'Hello' }],
}

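// Stub only the helpers your handler actually uses
const helpers = {
  parseInline: () => [{ type: 'text', text: 'Hello' }],
  // ... other helpers
}

// `config` holds the object that was passed to Node.create() / Mark.create()
const result = myExtension.config.parseMarkdown(token, helpers)
console.log(result)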
```

## Performance Considerations

### Lazy Parsing

For large documents, consider parsing on demand:

```typescript
let cachedJSON = null

function getJSON() {
  if (!cachedJSON) {
    cachedJSON = editor.markdown.parse(largeMarkdownString)
  }
  return cachedJSON
}
```
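
The cache above never notices when the Markdown source changes. One possible refinement is to remember the source string alongside the result and re-parse only when it differs. The `createCachedParser` wrapper below is a hypothetical sketch; the stand-in parser only counts calls so the behavior can be observed without an editor.

```typescript
// Hypothetical wrapper: re-parse only when the Markdown source changes.
function createCachedParser<T>(parse: (markdown: string) => T) {
  let cache: { source: string; result: T } | undefined

  return (markdown: string): T => {
    if (!cache || cache.source !== markdown) {
      cache = { source: markdown, result: parse(markdown) }
    }
    return cache.result
  }
}

// Stand-in parser that counts calls; in an editor you would pass
// `md => editor.markdown.parse(md)` instead.
let parseCalls = 0
const getJSONFor = createCachedParser((md: string) => {
  parseCalls += 1
  return { type: 'doc', length: md.length }
})

getJSONFor('# Title')
getJSONFor('# Title') // same source, served from the cache
getJSONFor('# Title 2') // source changed, parsed again
// parseCalls -> 2
```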

### Incremental Updates

Instead of re-parsing the entire document on each change, update specific sections:

```typescript
editor.commands.insertContentAt(position, newMarkdown, { contentType: 'markdown' })
```

## Examples

### Custom Heading Parser

Let's build a custom heading parser for a `customHeading` extension that will extract the heading level and also generate a unique ID for each heading.

```typescript
import { Node } from '@tiptap/core'

const CustomHeading = Node.create({
  name: 'customHeading',

  // ... other config

  parseMarkdown: (token, helpers) => {
    const level = token.depth || 1 // we can get the heading level from the token

    // Add custom attributes
    return {
      type: 'customHeading',
      attrs: {
        level,
        id: `heading-${Math.random().toString(36).slice(2, 10)}`, // Random ID; changes on every parse
      },
      content: helpers.parseInline(token.tokens || []), // parse the inline content of the heading token
    }
  },
})
```
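
IDs based on `Math.random()` differ every time the same Markdown is parsed, which breaks anchor links. If stable IDs matter, one alternative is to derive them from the heading text. The sketch below uses a hypothetical `createHeadingIdGenerator` helper; inside `parseMarkdown` you could feed it the heading's raw text, assuming the token exposes it as `token.text`.

```typescript
// Hypothetical helper: derive stable, unique IDs from heading text.
function createHeadingIdGenerator() {
  const seen = new Map<string, number>()

  return (text: string): string => {
    const slug =
      text
        .toLowerCase()
        .trim()
        .replace(/[^a-z0-9]+/g, '-') // collapse non-alphanumerics into dashes
        .replace(/^-+|-+$/g, '') || 'heading' // trim dashes; fall back if nothing is left

    const count = seen.get(slug) ?? 0
    seen.set(slug, count + 1)
    // The first occurrence keeps the plain slug; repeats get a numeric suffix.
    return count === 0 ? slug : `${slug}-${count}`
  }
}

const nextId = createHeadingIdGenerator()
const ids = [nextId('Getting Started'), nextId('Getting started!'), nextId('API')]
// ids -> ['getting-started', 'getting-started-1', 'api']
```

Create one generator per parse run so the duplicate counter starts fresh for each document.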

### Custom YouTube Embed Parser

Let's create a custom parser for a `youtube` token that turns the token into a `youtubeEmbed` node with the appropriate embed attributes.

```typescript
import { Node } from '@tiptap/core'

const YoutubeEmbed = Node.create({
  name: 'youtubeEmbed',
  atom: true, // this is a self-contained node

  // ... other config

  parseMarkdown: (token) => {
    // These attributes are read from the youtube token.
    // We assume that a custom tokenizer provides these tokens
    // from Markdown syntax like: ![youtube](videoId?start=60&width=800&height=450)
    const videoId = token.videoId || ''
    const start = token.start || 0
    const width = token.width || 560
    const height = token.height || 315

    // Because this is an atom node, we don't require the helpers
    // to parse any children, as this node is self-contained.
    return {
      type: 'youtubeEmbed',
      attrs: {
        videoId,
        start,
        width,
        height,
      },
    }
  },
})
```
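
The parser above relies on a custom tokenizer to produce `youtube` tokens. As a rough idea of what that tokenizer's matching logic might look like, here is a standalone sketch. The `tokenizeYoutube` function and `YoutubeToken` type are hypothetical; the `type` and `raw` fields follow the convention MarkedJS tokens use, and wiring the function into the tokenizer pipeline is not shown.

```typescript
type YoutubeToken = {
  type: 'youtube'
  raw: string
  videoId: string
  start?: number
  width?: number
  height?: number
}

// Hypothetical matcher for: ![youtube](videoId?start=60&width=800&height=450)
function tokenizeYoutube(src: string): YoutubeToken | undefined {
  const match = /^!\[youtube\]\(([^)?\s]+)(?:\?([^)\s]*))?\)/.exec(src)
  if (!match) return undefined

  const token: YoutubeToken = { type: 'youtube', raw: match[0], videoId: match[1] }

  // Parse the optional query string by hand; skip unknown keys and non-numeric values.
  for (const pair of (match[2] ?? '').split('&')) {
    const [key, value] = pair.split('=')
    const num = Number(value)
    if (value === undefined || value === '' || Number.isNaN(num)) continue
    if (key === 'start' || key === 'width' || key === 'height') token[key] = num
  }

  return token
}

const youtubeToken = tokenizeYoutube('![youtube](abc123?start=60&width=800&height=450)')
// youtubeToken?.videoId -> 'abc123'
// youtubeToken?.start   -> 60
// youtubeToken?.width   -> 800
// youtubeToken?.height  -> 450
```

Missing options stay `undefined` on the token, so the defaults in `parseMarkdown` above still apply.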
