PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
Updated Mar 20, 2026 - Java
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 88+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.
Free open-source web software for signing PDFs (alone or with others), organizing pages, editing metadata, and compressing PDFs.
Use TradeRepublic in the terminal and mass-download all documents.
JavaScript bindings for MuPDF
Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient processing for low-resource systems.
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
Java PDF table extraction & OCR library. Extract structured tables from text-based and scanned PDFs using stream, lattice (OpenCV-style grid detection), and hybrid parsing.
Translate many large PDF reports for free using Python.
A professional CLI workflow for PhD students to extract, analyze, and visualize academic papers into structured Markdown and Obsidian Canvas.
Web content extraction engine backed by Qt WebEngine.
Powerful PDF data extraction library powered by AI vision models. Transform PDFs into structured, validated data using TypeScript, Zod, and AI providers like Scaleway and Ollama.
🚀 Simplify your research workflow with Claude Scholar, the complete configuration for Claude Code in data science, AI, and academic writing.
Extract presentation slides from videos with accurate timestamps
This sample project provides a preview of the PDF Extract API. Using the sample project and this documentation, you will easily be able to integrate the PDF Extract API in your own server-side code.
AI research assistant that extracts structured patterns from papers using RAG, LangGraph, and Claude. Query across your research library with natural language.
Automated document extraction pipeline using AI vision models for invoice and form data capture
Open WebUI tool for extracting text from PDFs and images using Tesseract OCR. Supports text-based and scanned PDFs, multi-language OCR (English + Swedish), fully offline.
Turn a scientific paper PDF into a presentation slide deck. An Antigravity / Claude Code agent skill.
Complete ETL pipeline in Python for extracting data from invoices (PDF), with strict validation using Pydantic and sales-metrics analysis using Pandas. Transforms unstructured documents into strategic insights.