Import DOCX via REST API

Available in Start plan · Beta · v2.25.0

The DOCX import API converts .docx files into Tiptap's JSON format. It uses the same conversion service as the editor extension and produces identical content output.

The REST API is the better choice when you need server-side processing, access to footnote or endnote data (which the extension does not surface), or raw JSON without schema filtering. If you want the content loaded directly into a Tiptap editor with headers and footers auto-applied, use the editor extension instead.

Review the Postman collection

You can also experiment with the Document Conversion API by heading over to our Postman Collection.

Import DOCX

POST /v2/convert/import/docx

The /v2/convert/import/docx endpoint converts .docx files into Tiptap's JSON format. POST a document to this endpoint and use the body parameters listed below to customize how different document elements are handled during conversion.

Example (cURL)

curl -X POST "https://api.tiptap.dev/v2/convert/import/docx" \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "X-App-Id: YOUR_APP_ID" \
    -F "file=@/path/to/your/file.docx" \
    -F 'imageUploadConfig={"url":"https://your-image-upload-endpoint.com","headers":{"Authorization":"Bearer your-upload-token"}}' \
    -F "prosemirrorNodes={\"nodeKey\":\"nodeValue\"}" \
    -F "prosemirrorMarks={\"markKey\":\"markValue\"}"

Subscription required

This endpoint requires a valid Tiptap subscription. For more details, review our pricing page.

Required headers

Name	Description
`Authorization`	The JWT token to authenticate the request. Example: `Bearer your-jwt-token`
`X-App-Id`	The Convert App-ID from the Convert settings page: https://cloud.tiptap.dev/v2/cloud/convert

Body

Name	Type	Description
`file`	`File`	The file to convert
`imageUploadConfig`	`JSON string`	A JSON object configuring image uploads. Contains `url` (required), and optionally `headers`, `method`, and `queryParams`. See the editor extension docs for the full schema.
`prosemirrorNodes`	`Object string`	Custom node mapping for the conversion. See Custom node and mark mapping.
`prosemirrorMarks`	`Object string`	Custom mark mapping for the conversion. See Custom node and mark mapping.
`verbose`	`string \| number`	Controls the level of diagnostic output during the import process. This is especially useful for debugging or for getting more insight into what happens during conversion. See Import verbose output.
`extractCssStyles`	`"true" \| "false"`	Experimental. When `"true"`, the response includes a `cssStyles` field with the DOCX style catalog translated to CSS-compatible key/value pairs. Defaults to `"false"`. See CSS style extraction for the output shape and caveats.
`placeholders`	`JSON string`	Opt-in translation of Word `PAGE` / `NUMPAGES` field codes in headers and footers to canonical `{page}` / `{numpages}` text tokens. Send `{}` (empty object) to enable with wire defaults, `{ "page"?: string, "total"?: string }` to enable with a rename map (e.g. `{"total":"pages"}` makes `NUMPAGES` come back as `{pages}`), or `false` to explicitly disable. Omitted (default) keeps Word's cached display value as plain text. Mirrors the editor's `Pages.configure({ placeholders })` option; the editor extension forwards it here automatically. See Page-number fields.
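
The body parameters above can also be assembled programmatically. The following is a minimal TypeScript sketch, assuming a runtime with global fetch, FormData, and Blob (Node 18+ or a browser). The helper names buildImportForm and importDocx and the ImportOptions shape are hypothetical and not part of any Tiptap package; only the endpoint, header names, and form field names come from this page. Sending placeholders: false as the string "false" is an assumption about the wire encoding.

```typescript
// Hypothetical option bag; the form field names match the Body table.
interface ImportOptions {
  imageUploadConfig?: { url: string; headers?: Record<string, string> };
  verbose?: number;
  extractCssStyles?: boolean;
  placeholders?: { page?: string; total?: string } | false;
}

// Build the multipart body. Structured values are sent as JSON strings.
function buildImportForm(file: Blob, fileName: string, options: ImportOptions = {}): FormData {
  const form = new FormData();
  form.append("file", file, fileName);
  if (options.imageUploadConfig !== undefined) {
    form.append("imageUploadConfig", JSON.stringify(options.imageUploadConfig));
  }
  if (options.verbose !== undefined) {
    form.append("verbose", String(options.verbose));
  }
  if (options.extractCssStyles !== undefined) {
    form.append("extractCssStyles", options.extractCssStyles ? "true" : "false");
  }
  if (options.placeholders !== undefined) {
    // Assumption: `false` is encoded as the string "false".
    form.append("placeholders", JSON.stringify(options.placeholders));
  }
  return form;
}

// Send the request with the two required headers.
async function importDocx(form: FormData, token: string, appId: string): Promise<unknown> {
  const response = await fetch("https://api.tiptap.dev/v2/convert/import/docx", {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "X-App-Id": appId },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`DOCX import failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

The Content-Type header is left unset on purpose so that fetch can generate the multipart boundary itself.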

Import verbose output

The import API accepts a verbose parameter that controls the level of diagnostic output during the import process. This is especially useful for debugging or for getting more insight into what happens during conversion.

The verbose parameter is a bitmask number that determines which types of log messages are emitted. The following levels are available:

Value	Level	Description
1	`log`	General informational logs
2	`warn`	Warnings
4	`error`	Errors

Verbose bitmask

You can combine levels by adding their values together. For example, verbose: 3 will enable both log (1) and warn (2) messages.
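
The bitmask arithmetic can be sketched in TypeScript as follows. The helper names verboseMask and hasLevel are hypothetical; only the numeric values come from the table above. The sketch uses bitwise OR, which gives the same result as addition for distinct levels and stays correct if a level is listed twice.

```typescript
// Level values as listed in the table above.
const VerboseLevel = { log: 1, warn: 2, error: 4 } as const;
type VerboseLevelName = keyof typeof VerboseLevel;

// Combine levels into the number sent as `verbose`.
function verboseMask(...levels: VerboseLevelName[]): number {
  return levels.reduce((mask, level) => mask | VerboseLevel[level], 0);
}

// Check whether a mask includes a given level.
function hasLevel(mask: number, level: VerboseLevelName): boolean {
  return (mask & VerboseLevel[level]) !== 0;
}
```

For example, verboseMask("log", "warn") evaluates to 3, and passing all three levels yields 7.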

When verbose is set, the response includes a logs property alongside data. It contains info, warn, and error arrays, each holding the entries recorded at that level.

{
  "data": {
    "content": {
      // Tiptap JSON
    }
  },
  "logs": {
    "info": [],
    "warn": [
      {
        "message": "Image file not found in media files",
        "fileName": "image1.gif",
        "availableMediaFiles": []
      }
    ],
    "error": [
      {
        "message": "Image upload failed: General error",
        "fileName": "image1.gif",
        "url": "https://your-image-upload-endpoint.com",
        "error": "Unable to connect. Is the computer able to access the url?",
        "context": "uploadImage general error"
      }
    ]
  }
}
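
A consumer might flatten the logs object into readable lines before surfacing it to a user or a log sink. The sketch below is an illustration only: the LogEntry and ImportLogs types are inferred from the sample response above rather than taken from a published schema, and summarizeLogs is a hypothetical helper.

```typescript
// Shape inferred from the sample response; fields other than `message` vary per entry.
interface LogEntry {
  message: string;
  [key: string]: unknown;
}

interface ImportLogs {
  info?: LogEntry[];
  warn?: LogEntry[];
  error?: LogEntry[];
}

// Turn the logs object into one line per entry, ordered info, warn, error.
function summarizeLogs(logs: ImportLogs | undefined): string[] {
  const lines: string[] = [];
  for (const level of ["info", "warn", "error"] as const) {
    for (const entry of logs?.[level] ?? []) {
      // Append the file name when the entry carries one.
      const file = typeof entry.fileName === "string" ? ` (${entry.fileName})` : "";
      lines.push(`[${level}] ${entry.message}${file}`);
    }
  }
  return lines;
}
```

Because logs is only present when verbose is set, the helper accepts undefined and returns an empty list in that case.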

Headers & Footers in the response

The import API extracts headers and footers from .docx files. When the document contains header or footer content, additional fields are included in the data object of the response.

Response fields

Field	Type	Description
`content`	`Object`	The main document body as Tiptap JSON
`header`	`Object \| null`	Default header content as Tiptap JSON
`footer`	`Object \| null`	Default footer content as Tiptap JSON
`headerFirstPage`	`Object \| null`	First page header (when "Different First Page" is enabled in Word)
`footerFirstPage`	`Object \| null`	First page footer (when "Different First Page" is enabled in Word)
`headerOdd`	`Object \| null`	Odd page header (when "Different Odd & Even Pages" is enabled in Word)
`footerOdd`	`Object \| null`	Odd page footer (when "Different Odd & Even Pages" is enabled in Word)
`headerEven`	`Object \| null`	Even page header (when "Different Odd & Even Pages" is enabled in Word)
`footerEven`	`Object \| null`	Even page footer (when "Different Odd & Even Pages" is enabled in Word)

Example response

{
  "data": {
    "content": {
      "type": "doc",
      "content": [
        {
          "type": "paragraph",
          "content": [{ "type": "text", "text": "Document body content..." }]
        }
      ]
    },
    "header": {
      "type": "doc",
      "content": [
        {
          "type": "paragraph",
          "content": [{ "type": "text", "text": "Default Header" }]
        }
      ]
    },
    "footer": {
      "type": "doc",
      "content": [
        {
          "type": "paragraph",
          "content": [
            { "type": "text", "text": "Page " },
            { "type": "text", "text": "{page}" },
            { "type": "text", "text": " of " },
            { "type": "text", "text": "{numpages}" }
          ]
        }
      ]
    },
    "headerFirstPage": {
      "type": "doc",
      "content": [
        {
          "type": "paragraph",
          "content": [
            {
              "type": "text",
              "text": "Title Page Header",
              "marks": [{ "type": "bold" }]
            }
          ]
        }
      ]
    },
    "footerFirstPage": null,
    "headerOdd": null,
    "footerOdd": null,
    "headerEven": null,
    "footerEven": null
  }
}

Fields that are null indicate the document does not define content for that specific header or footer slot.
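
If you render pages yourself, you need to decide which header slot applies to a given page. The TypeScript sketch below shows one way to do that. The fallback order (first-page slot, then odd or even slot, then the default header) is an assumption made for illustration and is not documented behaviour of the API or the editor extension; headerForPage is a hypothetical helper. The same pattern would apply to the footer fields.

```typescript
type TiptapDoc = { type: string; content?: unknown[] };

// Header fields from the response table; each may be null or absent.
interface HeaderData {
  header?: TiptapDoc | null;
  headerFirstPage?: TiptapDoc | null;
  headerOdd?: TiptapDoc | null;
  headerEven?: TiptapDoc | null;
}

// ASSUMPTION: the fallback order below is illustrative, not documented behaviour.
function headerForPage(data: HeaderData, pageNumber: number): TiptapDoc | null {
  if (pageNumber === 1 && data.headerFirstPage) {
    return data.headerFirstPage;
  }
  const paritySlot = pageNumber % 2 === 0 ? data.headerEven : data.headerOdd;
  // Fall back to the default header when the odd/even slot is empty.
  return paritySlot ?? data.header ?? null;
}
```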

Page-number fields

By default, Word PAGE and NUMPAGES field codes inside headers and footers are imported as the cached display value Word stored when the document was last opened (e.g. the literal text 1). Pass placeholders={} on the request to translate those fields into canonical text tokens instead — {page} for PAGE and {numpages} for NUMPAGES. The cached numeric values are dropped when the field instruction is recognized.

curl -X POST "https://api.tiptap.dev/v2/convert/import/docx" \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "X-App-Id: YOUR_APP_ID" \
    -F "file=@/path/to/your/file.docx" \
    -F "placeholders={}"

Downstream consumers (such as the Pages extension) substitute {page} and {numpages} based on whatever token names they're configured to recognize. To make the wire format match a custom token registry, pass placeholders as a JSON object with page and/or total keys:

curl -X POST "https://api.tiptap.dev/v2/convert/import/docx" \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "X-App-Id: YOUR_APP_ID" \
    -F "file=@/path/to/your/file.docx" \
    -F 'placeholders={"page":"page","total":"total"}'

The example above maps NUMPAGES to {total} (Pages' default name for the total-pages field), so the editor preview substitutes natively without any config. The wire field mirrors Pages.configure({ placeholders }) directly, and the editor extension forwards it here automatically when the Pages extension is installed.
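
For consumers that do not use the Pages extension, token substitution can be done by walking the header or footer JSON and replacing tokens inside text nodes. The following is a minimal sketch under that assumption; fillPageTokens and the JsonNode type are hypothetical names, and the approach is plain string replacement rather than the mechanism Pages uses internally.

```typescript
interface JsonNode {
  type: string;
  text?: string;
  content?: JsonNode[];
  [key: string]: unknown;
}

// Return a copy of the tree with `{name}` tokens replaced in every text node.
// Tokens without an entry in `values` are left untouched.
function fillPageTokens(node: JsonNode, values: Record<string, string | number>): JsonNode {
  const copy: JsonNode = { ...node };
  if (typeof node.text === "string") {
    copy.text = node.text.replace(/\{(\w+)\}/g, (match: string, name: string) =>
      Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match,
    );
  }
  if (node.content) {
    copy.content = node.content.map((child) => fillPageTokens(child, values));
  }
  return copy;
}
```

Called with { page: 2, numpages: 10 } on a footer whose text nodes read "Page ", "{page}", " of ", "{numpages}", the copy reads "Page 2 of 10" while the original tree is left unchanged. If you renamed the tokens via placeholders, use the renamed keys in values.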

Footnotes & Endnotes in the response

The import API extracts footnotes and endnotes from .docx files. When the document contains footnote or endnote content, additional fields are included in the data object of the response.

Response fields

Field	Type	Description
`footnotes`	`Object`	An object keyed by footnote ID, where each value is a Tiptap JSON document (`{ type: "doc", content: [...] }`)
`endnotes`	`Object`	An object keyed by endnote ID, where each value is a Tiptap JSON document (`{ type: "doc", content: [...] }`)

Documents without footnotes or endnotes return empty objects ({}), so existing integrations are unaffected.

Inline references in the document body are represented as footnoteReference and endnoteReference nodes, each carrying a noteId attribute that links to the corresponding entry in footnotes or endnotes.

Example response

{
  "data": {
    "content": {
      "type": "doc",
      "content": [
        {
          "type": "paragraph",
          "content": [
            { "type": "text", "text": "This sentence has a footnote" },
            { "type": "footnoteReference", "attrs": { "noteId": "1" } },
            { "type": "text", "text": " and an endnote" },
            { "type": "endnoteReference", "attrs": { "noteId": "1" } },
            { "type": "text", "text": "." }
          ]
        }
      ]
    },
    "footnotes": {
      "1": {
        "type": "doc",
        "content": [
          {
            "type": "paragraph",
            "content": [{ "type": "text", "text": "This is the footnote content." }]
          }
        ]
      }
    },
    "endnotes": {
      "1": {
        "type": "doc",
        "content": [
          {
            "type": "paragraph",
            "content": [{ "type": "text", "text": "This is the endnote content." }]
          }
        ]
      }
    },
    "header": null,
    "footer": null,
    "headerFirstPage": null,
    "footerFirstPage": null,
    "headerOdd": null,
    "footerOdd": null,
    "headerEven": null,
    "footerEven": null
  }
}
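
The link between reference nodes and note bodies can be resolved with a short tree walk. The sketch below is illustrative: the NoteNode type and the helper names are hypothetical, and it assumes the default footnoteReference node name (if you remap it via prosemirrorNodes, pass your own type name instead).

```typescript
interface NoteNode {
  type: string;
  text?: string;
  attrs?: { noteId?: string };
  content?: NoteNode[];
}

// Collect noteIds in document order for a given reference node type.
function collectNoteIds(node: NoteNode, refType: string, found: string[] = []): string[] {
  const id = node.attrs?.noteId;
  if (node.type === refType && id !== undefined) {
    found.push(id);
  }
  for (const child of node.content ?? []) {
    collectNoteIds(child, refType, found);
  }
  return found;
}

// Flatten a Tiptap JSON document to plain text.
function plainText(node: NoteNode): string {
  return (node.text ?? "") + (node.content ?? []).map((child) => plainText(child)).join("");
}

// Pair each footnote reference in the body with the text of its note.
function listFootnotes(data: { content: NoteNode; footnotes?: Record<string, NoteNode> }): string[] {
  return collectNoteIds(data.content, "footnoteReference").map((id) => {
    const note = data.footnotes?.[id];
    return `${id}: ${note ? plainText(note) : "(missing)"}`;
  });
}
```

The same walk works for endnotes by collecting endnoteReference nodes and looking them up in endnotes.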

CSS style extraction (experimental)

Experimental feature

extractCssStyles is opt-in and experimental. The shape of the response may change without notice. Pin exact client versions if you depend on it, and read the CSS injection documentation for the full list of supported selectors, CSS properties, and known limitations.

Pass extractCssStyles=true on the request to have the response include a cssStyles field. The field contains the DOCX default-style catalog translated to CSS-compatible key/value pairs, keyed by CSS selector.

Request

curl -X POST "https://api.tiptap.dev/v2/convert/import/docx" \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "X-App-Id: YOUR_APP_ID" \
    -F "file=@/path/to/your/file.docx" \
    -F "extractCssStyles=true"

Response field

Field	Type	Description
`cssStyles`	`Object`	An object keyed by CSS selector, where each value is an object of CSS properties. Empty (`{}`) when the document has no extractable default styles. The field is omitted from the response entirely when `extractCssStyles` is not set or is `"false"`.

Supported selectors

The extractor maps DOCX named styles to this fixed set of CSS selectors. Styles that don't match the map are not converted.

p, h1, h2, h3, h4, h5, h6, blockquote, ul li, ol li, strong, em, u, s, a, code

Supported CSS properties

Each selector above can carry any subset of the following, depending on what the DOCX style defined.

color, fontSize, fontFamily, fontWeight, fontStyle, textDecoration, backgroundColor, textAlign, marginTop, marginBottom, lineHeight

Example response

{
  "data": {
    "content": {
      "type": "doc",
      "content": [
        {
          "type": "paragraph",
          "content": [{ "type": "text", "text": "Document body..." }]
        }
      ]
    },
    "cssStyles": {
      "p": {
        "fontSize": "16px",
        "fontFamily": "Calibri",
        "lineHeight": 1.15,
        "marginBottom": "8pt"
      },
      "h1": {
        "fontSize": "28px",
        "fontWeight": "bold",
        "color": "#1F3864",
        "marginTop": "24pt",
        "marginBottom": "6pt"
      },
      "a": {
        "color": "#0563C1",
        "textDecoration": "underline"
      },
      "blockquote": {
        "fontStyle": "italic",
        "color": "#595959"
      }
    }
  }
}

Notes on the output:

Values are strings or numbers. Pixel-based properties (fontSize) are emitted as "Npx" strings; point-based spacing (marginTop, marginBottom) is emitted as "Npt"; unitless multipliers (lineHeight under the auto line rule) are emitted as plain numbers.
Style inheritance is resolved. Child DOCX styles include everything they inherit from their parents (via basedOn); the output is flattened and you do not have to walk a chain.
Localized documents are supported. Named styles in non-English Word builds are mapped back to their canonical selector before emission.
Missing fields mean "not defined". A selector only appears when at least one property could be extracted. If the DOCX has no style catalog (or every named style falls outside the 16-selector map), cssStyles is {}.
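
If you consume cssStyles outside the editor, one option is to serialize it into a stylesheet string. The sketch below shows that conversion under stated assumptions: the helper names are hypothetical, the optional scope prefix (for example ".tiptap") is only an example, and values are emitted exactly as received, so font family names containing spaces would still need quoting by the caller.

```typescript
type CssStyles = Record<string, Record<string, string | number>>;

// Convert a camelCase property name to its CSS form, e.g. fontSize -> font-size.
function toKebabCase(property: string): string {
  return property.replace(/[A-Z]/g, (letter) => `-${letter.toLowerCase()}`);
}

// Serialize the cssStyles object into CSS rules, optionally prefixed by a scope selector.
function cssStylesToStylesheet(styles: CssStyles, scope = ""): string {
  return Object.entries(styles)
    .map(([selector, declarations]) => {
      const body = Object.entries(declarations)
        .map(([property, value]) => `  ${toKebabCase(property)}: ${value};`)
        .join("\n");
      const scoped = scope ? `${scope} ${selector}` : selector;
      return `${scoped} {\n${body}\n}`;
    })
    .join("\n\n");
}
```

An empty cssStyles object produces an empty string, so the result can be injected unconditionally.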

Custom node and mark mapping

You can override the default node and mark types used during import by specifying them in the body of your request within prosemirrorNodes and prosemirrorMarks respectively. Provide these if your editor uses custom nodes or marks and you want the imported JSON to use them.

For example, if your schema uses a custom node type called textBlock instead of the default paragraph, you can include "{\"textBlock\":\"paragraph\"}" in the request body.

You can similarly adjust headings, lists, marks like bold or italic, and more.
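
When building the request in code, the mappings are plain objects that get serialized to JSON strings before being added to the form. A minimal sketch, assuming a hypothetical helper named mappingFields; the field names are the ones documented on this page, and the example mapping is the textBlock one from above.

```typescript
type TypeMapping = Record<string, string>;

// Serialize optional node and mark mappings into the two form fields.
// Empty or missing mappings are skipped so the defaults stay in effect.
function mappingFields(nodes?: TypeMapping, marks?: TypeMapping): Record<string, string> {
  const fields: Record<string, string> = {};
  if (nodes && Object.keys(nodes).length > 0) {
    fields["prosemirrorNodes"] = JSON.stringify(nodes);
  }
  if (marks && Object.keys(marks).length > 0) {
    fields["prosemirrorMarks"] = JSON.stringify(marks);
  }
  return fields;
}
```

Calling mappingFields({ textBlock: "paragraph" }) returns an object whose prosemirrorNodes value is the string {"textBlock":"paragraph"}, ready to append to a multipart form.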

Default nodes

Name	Description
`paragraph`	Defines which ProseMirror type is used for paragraph conversion
`heading`	Defines which ProseMirror type is used for heading conversion
`blockquote`	Defines which ProseMirror type is used for blockquote conversion
`codeblock`	Defines which ProseMirror type is used for codeblock conversion
`bulletlist`	Defines which ProseMirror type is used for bulletList conversion
`orderedlist`	Defines which ProseMirror type is used for orderedList conversion
`listitem`	Defines which ProseMirror type is used for listItem conversion
`hardbreak`	Defines which ProseMirror type is used for hardbreak conversion
`horizontalrule`	Defines which ProseMirror type is used for horizontalRule conversion
`table`	Defines which ProseMirror type is used for table conversion
`tablecell`	Defines which ProseMirror type is used for tableCell conversion
`tableheader`	Defines which ProseMirror type is used for tableHeader conversion
`tablerow`	Defines which ProseMirror type is used for tableRow conversion
`image`	Defines which ProseMirror type is used for image conversion
`footnoteReference`	Defines which ProseMirror type is used for footnote reference conversion
`endnoteReference`	Defines which ProseMirror type is used for endnote reference conversion

Default marks

Name	Description
`bold`	Defines which ProseMirror mark is used for bold conversion
`italic`	Defines which ProseMirror mark is used for italic conversion
`underline`	Defines which ProseMirror mark is used for underline conversion
`strikethrough`	Defines which ProseMirror mark is used for strikethrough conversion
`link`	Defines which ProseMirror mark is used for link conversion
`code`	Defines which ProseMirror mark is used for code conversion

Support & Limitations

For a detailed breakdown of which document features are supported at each stage of the pipeline, see the Supported features matrix.
