One of the problems that many Open Government Data projects face is the sheer volume of old documents available only in PDF, which is not an open, reusable format. Yesterday, Mozilla announced Tabula, a new tool to help liberate tables trapped in PDFs. Not to be confused with “Tabula”, the Lebanese salad.
The Tabula online demo is just amazing. To use it, simply draw a rectangular selection over a table on a PDF page. That’s it!
There are obviously many limitations, since it’s a very new solution. For example, Tabula only works on text-based PDFs, not scanned documents. However, the core features of Tabula are already great and could help make lots of old documents reusable! Tabula is released under the MIT License.
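To give a rough idea of why a rectangular selection is enough, and why text-based PDFs matter, here is a small Python sketch. It is not Tabula's actual code or algorithm, just an illustration under simple assumptions: the text fragments of a page are already available as hypothetical (x, y, text) tuples, which is only possible when the PDF contains real text rather than a scanned image, and words on the same table row share roughly the same y coordinate. The function names and the sample data are made up for this example.

```python
import csv
import io


def table_from_selection(words, box, row_tolerance=3.0):
    """Group the words inside a rectangle into table rows.

    words: list of (x, y, text) tuples (hypothetical input format).
    box:   (left, top, right, bottom) of the rectangular selection.
    """
    left, top, right, bottom = box
    # Keep only the words whose position falls inside the selection.
    inside = [w for w in words if left <= w[0] <= right and top <= w[1] <= bottom]
    # Read top to bottom, then left to right.
    inside.sort(key=lambda w: (w[1], w[0]))

    rows = []
    row_y = None
    for x, y, text in inside:
        # Start a new row when the word sits clearly below the current row.
        if row_y is None or abs(y - row_y) > row_tolerance:
            rows.append([])
            row_y = y
        rows[-1].append((x, text))

    # Within each row, order the cells by their horizontal position.
    return [[text for _, text in sorted(row)] for row in rows]


def to_csv(rows):
    """Serialize the rows as CSV text, a reusable open format."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample page: a small table plus a footer outside the selection.
sample_words = [
    (50, 100, "Year"), (150, 100, "Budget"),
    (50, 120, "2011"), (150, 121, "1200"),
    (50, 140, "2012"), (150, 140, "1350"),
    (50, 300, "Page 3"),
]
print(to_csv(table_from_selection(sample_words, box=(40, 90, 200, 150))))
```

A scanned document has no such text positions at all, only pixels, which is why it would need OCR before anything like this could work.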
Source: https://github.com/jazzido/tabula