Question 1

Does it work on scanned PDFs?

Accepted Answer

No. Scanned PDFs consist of page images with no embedded text layer, and PDF.js can only extract text that is explicitly encoded in the PDF structure. To get text out of a scanned document, run it through an OCR tool first; this tool does not currently perform OCR.
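
As a rough illustration of how a scanned file could be recognized, the sketch below checks whether any extracted text item contains visible characters. It assumes items shaped like the ones PDF.js returns from getTextContent() (objects with a `str` field); the helper names `hasTextLayer` and `isLikelyScanned` are hypothetical and are not part of this tool or of PDF.js.

```typescript
// Minimal shape of an extracted text item. PDF.js text items expose their
// string as `str`; other item kinds may not have it, so it is optional here.
interface TextItemLike {
  str?: string;
}

// Hypothetical helper: true when at least one item carries visible text.
function hasTextLayer(items: TextItemLike[]): boolean {
  return items.some((item) => (item.str ?? "").trim().length > 0);
}

// Hypothetical helper: a document whose pages all lack visible text is
// probably a scan (or is empty) and would need OCR instead.
function isLikelyScanned(pages: TextItemLike[][]): boolean {
  return pages.every((items) => !hasTextLayer(items));
}
```

If `isLikelyScanned` returns true for every page of a file, the extracted .txt would be empty, which is the usual sign that OCR is needed.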

Question 2

Is my PDF uploaded to a server?

Accepted Answer

No. PDF.js reads and extracts the text layer entirely inside your browser. Your file never leaves your device and no data is sent to any external server.

Question 3

What text encoding is used in the output file?

Accepted Answer

The downloaded .txt file is encoded in UTF-8, which can represent every Unicode character, so text in any language and special symbols are kept intact. Any modern text editor, code editor, or word processor can open it.
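
To show what UTF-8 encoding means in practice, here is a minimal sketch using the standard TextEncoder and TextDecoder APIs. The function names are hypothetical, and this is not necessarily how the tool builds its download.

```typescript
// Encode a string as UTF-8 bytes. TextEncoder always produces UTF-8.
function encodeUtf8(text: string): Uint8Array {
  return new TextEncoder().encode(text);
}

// Decode UTF-8 bytes back into a string.
function decodeUtf8(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}
```

ASCII characters take one byte each, while accented letters and CJK characters take two or three, so the file size in bytes can be larger than the character count. Round-tripping a string through `encodeUtf8` and `decodeUtf8` returns the original text unchanged.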

Question 4

Can I extract text from a specific page only?

Accepted Answer

Not currently. All pages are extracted at once, and the output is organized page by page, so you can scroll to the section you need and copy only the relevant text. Page-range selection may be added in a future update.
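
The sketch below illustrates one way page-by-page output and a page-range filter could work. The "--- Page N ---" separator and both function names are assumptions made for this example; the tool's actual output format may differ.

```typescript
// Join per-page text into one document, with a header line before each page.
// The header format here is hypothetical.
function formatPages(pages: string[]): string {
  return pages
    .map((text, index) => `--- Page ${index + 1} ---\n${text}`)
    .join("\n\n");
}

// Return only pages first..last (1-based, inclusive).
function selectPageRange(pages: string[], first: number, last: number): string[] {
  if (first < 1 || last < first) {
    throw new RangeError(`Invalid page range ${first}-${last}`);
  }
  // slice() clamps at the end of the array, so a `last` beyond the final
  // page simply returns everything from `first` onward.
  return pages.slice(first - 1, last);
}
```

For example, `formatPages(selectPageRange(pages, 2, 3))` would produce output for pages 2 and 3 only, although the headers would be renumbered from 1 in this simple version.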

Question 5

Why is the extracted text garbled or full of strange characters?

Accepted Answer

PDFs with custom font encodings, symbol fonts, or non-standard character mappings may produce garbled text. This is a known limitation of PDF text extraction: the glyphs are present in the PDF, but the font does not map them to standard Unicode code points.
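
One hedged way to spot this kind of output is to measure how much of the text falls in ranges that rarely appear in ordinary prose. The sketch below counts the Unicode replacement character (U+FFFD) and Private Use Area code points (U+E000 to U+F8FF), which fonts with custom encodings often map to. The function names and the 10% threshold are assumptions, not something the tool is known to do.

```typescript
// Fraction of non-whitespace characters that are U+FFFD or in the
// Private Use Area. Returns 0 for empty or whitespace-only text.
function suspiciousRatio(text: string): number {
  const visible = text.replace(/\s/g, "");
  if (visible.length === 0) return 0;
  const suspicious = visible.match(/[\uFFFD\uE000-\uF8FF]/g);
  return (suspicious ? suspicious.length : 0) / visible.length;
}

// Hypothetical heuristic: flag text when more than `threshold` of its
// visible characters look suspicious. The default of 0.1 is arbitrary.
function looksGarbled(text: string, threshold = 0.1): boolean {
  return suspiciousRatio(text) > threshold;
}
```

This check only catches characters that land in those ranges. Text that is mapped to valid but wrong letters will pass it, so it is a warning signal rather than a reliable test.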

Question 6

Does extracted text preserve bold and italic formatting?

Accepted Answer

No. Plain text output contains only character content. Rich formatting such as bold, italic, font size, and colors is not preserved, and neither is the page layout. All text appears as unstyled UTF-8 characters.

Question 7

Can I extract text from a password-protected PDF?

Accepted Answer

No. The PDF must be unlocked before text can be extracted. Use the Unlock PDF tool to remove the password, then extract the text from the resulting unprotected file.
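
For illustration, a page built on PDF.js could recognize a locked file from the error raised while loading it. The sketch below assumes that PDF.js reports this case with an error whose `name` is "PasswordException", which may vary between versions; the helper names and messages are hypothetical.

```typescript
// True when the value looks like the password error PDF.js raises for
// encrypted files. Checking `name` avoids importing the library here.
function isPasswordError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "PasswordException"
  );
}

// Hypothetical helper: turn a load failure into a user-facing message.
function describeLoadError(err: unknown): string {
  return isPasswordError(err)
    ? "This PDF is password-protected. Unlock it first, then try again."
    : "The PDF could not be opened.";
}
```

A caller would wrap the document-loading call in try/catch and pass the caught value to `describeLoadError`.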

Question 8

Is there a page limit?

Accepted Answer

There is no enforced page limit. Very long PDFs (hundreds of pages) may take a few extra seconds to process in the browser, but every page is still extracted.
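
Processing time grows with page count because pages are typically read one after another. The sketch below shows such a loop against a minimal interface modeled on the PDF.js document API (`numPages`, `getPage`, `getTextContent`). The interfaces, the `extractAllPages` name, and the progress callback are assumptions for illustration, not the tool's actual code.

```typescript
// Minimal stand-ins for the parts of a PDF.js document used here.
interface PageLike {
  getTextContent(): Promise<{ items: Array<{ str?: string }> }>;
}
interface DocLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>; // page numbers are 1-based
}

// Extract text page by page, reporting progress after each page.
async function extractAllPages(
  doc: DocLike,
  onProgress?: (done: number, total: number) => void
): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    // Join the text runs with spaces; real layout handling is more involved.
    pages.push(content.items.map((item) => item.str ?? "").join(" "));
    if (onProgress) onProgress(n, doc.numPages);
  }
  return pages;
}
```

Awaiting each page in turn keeps memory use steady on long documents, and the optional callback lets the page show progress while a large file is being processed.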
