Wikipedia Dump Reader
Education
Description:

This simple program displays the text-only compressed Wikipedia dumps currently available at http://download.wikimedia.org/backup-index.html, generally named something like pages-articles.xml.bz2.

It's fairly usable now, although many rendering issues still occur.

Features include a Qt viewer with basic text markup, link following, the ability to read directly from the .bz2 compressed file (although an index-creation step is needed on first run), a tab-like list of articles loaded in the background by default, a simple but useful keyword search, very light source code, and optional LaTeX rendering.
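
As a rough illustration of reading a dump without unpacking it first, the sketch below streams article titles straight out of the .bz2 file. It is not the program's own method: the program builds an index for fast lookups, and that index format is not described here. The function name `iter_titles` is hypothetical, and the sketch targets Python 3 rather than the Python 2 versions mentioned on this page.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_titles(dump_path):
    """Yield article titles from a pages-articles.xml.bz2 dump.

    Hypothetical helper: it scans the file sequentially and
    decompresses on the fly, so nothing is unpacked to disk.
    """
    with bz2.open(dump_path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Dump elements carry an XML namespace; compare the local name only.
            local_name = elem.tag.rsplit("}", 1)[-1]
            if local_name == "title":
                yield elem.text
            elif local_name == "page":
                elem.clear()  # drop the finished page's children to save memory
```

A sequential scan like this is slow on a full English dump, which is presumably why the program creates an index on first run.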

The code requires PyQt4.

Older versions have been tested on Fedora Core 4 and Kubuntu with PyQt4.1 (Python 2.4, Qt 4.2), and on Ubuntu Gutsy.

See the included README.

Note that the development tree is now hosted on Launchpad. See https://launchpad.net/wikipediadumpreader/

Any comment is welcome.
Last changelog:

10 years ago

Updated to 0.2.10:
- Use a new indexing scheme for the entrylist - articles load faster now
- Upgrade path for old indexing scheme
- UTF-8 fixes for non-ASCII pathnames
- experimental RPM package - feedback welcome at the project website: https://launchpad.net/wikipediadumpreader

(Jul 09: updated the Ubuntu package for Jaunty's Python 2.6 compatibility)

Updated to 0.2.9:
- make it able to load Wiktionary non-uppercased words
- Ability to load a 64-bit module - Thanks to Michael Heide
- added a small UI layout - Thanks to GreenReaper
- Better corrupted files handling

Updated to 0.2.8:
- Sorry: no program changes, but a much friendlier opening dialog
Built a rough Ubuntu package to ease installation for inexperienced users running Ubuntu Gutsy or Hardy


Updated to 0.2.7:
- minor rendering fixes
- a few more macros

Updated to 0.2.6:
- better wikisyntax parsing
- minor bugfixes

Updated to 0.2.5:
- Bugfixes and improvement in rendering.
- Moved the development tree to Launchpad
- optional fontsize

Updated to 0.2.4:
- Optional LaTeX/texvc call to render math. Thanks to Mathieu Beliveau

Updated to 0.2.3:
- Fixed an obvious overflow bug in the index creation code.
Rebuilding the index is necessary, sorry. To force it, delete the two *idx files before running the program, and be patient (index creation for English dumps takes several dozen minutes)
- basic table and footnotes support
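
The forced rebuild described for 0.2.3 could be scripted roughly as follows. This is only a sketch: the changelog says to delete "the two *idx files" but does not say where they live, so the `index_dir` argument and the helper name `remove_index_files` are assumptions.

```python
import glob
import os


def remove_index_files(index_dir):
    """Delete files whose names end in 'idx' in index_dir.

    Hypothetical helper; index_dir is assumed to be the directory
    where the program stored its index files. Returns the removed paths.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(index_dir, "*idx"))):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

The dump itself is left untouched, since its name does not end in "idx"; the program should then recreate the index on its next run.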

Updated to 0.2.1 : fix a bug when reading articles on blocks boundaries
Updated to 0.2.2 : improved wiki rendering for lists and definitions

l1zard

6 years ago

Hi, there is no source in the tar.bz2 file, just a bunch of executable files and some sources for parts needed by the program, but no configure or make file to get this software installed or packaged.

REMF

8 years ago

"experimental RPM package - feedback welcome at the project website : https://launchpad.net/wikipediadumpreader"

fantastic news if this means what i think it means, i.e. that opensuse/mandriva/fedora users will be able to easily install your fantastic program.

thanks

REMF

8 years ago

............ on further development? :)

a PyQt 4.5 version, or something else........

benji2

8 years ago

Hi,
Thanks for your support.
Sadly, i have no real plans except for occasional maintenance or integrating contributors' help.
WikipediaDumpReader should work with any PyQt4 version from 4.1 onward, including 4.5. If you meant a "WebKit" version, i don't have any plans for that, at least until 4.5 is the default in some LTS release. I take compatibility very seriously, as a lot of my users are not bleeding-edge upgraders ;-)

REMF

8 years ago

cheers for the update.

once again my thanks for creating an awesome program.

do you know if there are any easy-to-install suse packages available?

tuxpost

9 years ago

Any suggestion which dump to download at the wikimedia site? These xml dumps seem to be special-interest versions only, 3-8 MB in size.

orivej

9 years ago

*-pages-articles.xml.bz2

tuxpost

9 years ago

Yes, the pages-*. But there are several pages-* for several wikis with different sizes and different content. ;) Or maybe I'm blind.

tuxpost

9 years ago

different pages-articles.xml.bz2

tuxpost

9 years ago

So the last post. ;) enwiki seems to be the dump for the english version, dewiki for the german version and so on.
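
The naming convention worked out in this thread can be written down as a small helper. It is a sketch only: the function name and the `project` and `date` parameters are illustrative, and the date-or-"latest" middle part of the file name is an assumption that the thread itself does not spell out.

```python
def dump_filename(lang, project="wiki", date="latest"):
    """Build a dump file name such as 'enwiki-latest-pages-articles.xml.bz2'.

    lang is a language code ('en', 'de', ...); project and date are
    assumptions about how the remaining parts of the name are formed.
    """
    return "{0}{1}-{2}-pages-articles.xml.bz2".format(lang, project, date)
```

For example, `dump_filename("de")` gives `dewiki-latest-pages-articles.xml.bz2`, the German counterpart of the English enwiki dump.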

orivej

9 years ago

There is a typo in the version 0.2.8 in the file “mathexp.py” on line 33: there is “self.textvc = "texvc"” (texTvc) where it should be “self.texvc = "texvc"”.

REMF

9 years ago

This is the only real linux competitor to the windows based wikitaxi application.

I note you mention something about ubuntu packages, is there any chance you could provide the same convenience for opensuse?

Many thanks

sinosure

9 years ago

Can this reader run under Maemo on the Nokia N800?

It seems that Maemo doesn't have PyQt4 :(

REMF

9 years ago

Hi there,

is there any further news on what will happen next with this excellent program?

forgive my ignorance, but will it work on KDE4, specifically Opensuse 11.1 using KDE 4.1.2?

cheers

benji2

9 years ago

Hi again,
Wikipedia Dump Reader doesn't use any "KDE" features, only PyQt4. Therefore, it should work the same either on KDE 3, 4, or any non-KDE-based environment, as long as PyQt4 is installed.

Regarding future development, i don't have clear plans currently, as it already does what i intended it to do (+ i'm lazy).

Do you think some major feature is missing for convenient use? Maybe the suggested cleaning of non-reachable links should be on my todo list...

REMF

9 years ago

that would be an awesome start, i will give it some more thought and see what i come up with in addition to the link clean up.

many thanks

applegrew

10 years ago

I recently tried to run dumpReader on a dump from en.wiktionary. It gets into an infinite loop whenever there is a redirect, e.g. whenever I try to open the article Garbage, I get the message (in the console) "Garbage" redirects to "garbage", and this message repeats forever and the application hangs. (Even when I try to open garbage, which starts with a small 'g', I get the exact same output and the application hangs again.)
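
One generic way to keep a redirect chain from looping forever is to remember the titles already visited. The sketch below shows that technique only; it is not the fix that went into the program, and the cause of the hang is not established in this thread. The `redirects` mapping and the function name are hypothetical.

```python
def resolve_redirects(title, redirects, max_hops=10):
    """Follow redirects starting at title and return the final title.

    redirects is a hypothetical dict mapping a title to its redirect
    target. The walk stops on a cycle or after max_hops steps.
    """
    seen = {title}
    for _ in range(max_hops):
        target = redirects.get(title)
        if target is None:
            return title  # not a redirect: this is where we end up
        if target in seen:
            return title  # cycle detected: give up instead of looping
        seen.add(target)
        title = target
    return title
```

With `{"Garbage": "garbage"}` the call `resolve_redirects("Garbage", ...)` ends at "garbage", and a title that redirects to itself returns immediately instead of hanging.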

Another note: When I start dumpReader.py
I get the following errors in the console.

dumpReader.py:11: RuntimeWarning: Python C API version mismatch for module bz2: This Python has API version 1013, module bz2 has version 1012.
import bz2
Error while loading math parser

-----------
I have Python 2.5.1 running in Kubuntu Gutsy Gibbon.

benji2

9 years ago

Hi,
Thanks for the report. I first need to get a fresher English dump to trigger the bug; i hope to have time to fix it soon.
Regarding the python error, it's pretty safe to ignore it. If it bothers you, see the included README for why it happens and how to fix it.

REMF

10 years ago

are you working on any further improvements you can tell us about?

Regards

slyfoot

9 years ago

I like this, but can anyone tell me what code I need to add in order to increase the size of the fonts? I'm visually impaired and it's too difficult to see!

benji2

9 years ago

Hi !
I just uploaded version 0.2.5, which eases font size changing. From the README:

Q. Can i change the text size ?
A. Font Size can now be changed, although you will have to manually modify
the program : Edit the "dumpReader.py" file, go to the line which says
"fontSize = 9" and change "9" to whatever point size fits you best.
This will only change the font size of the text area.

Note that i don't put any "preferences" dialog in the application itself, as i don't feel it's yet needed.
Regards,

benji2

9 years ago

Hi,
Sorry for the delay. As I didn't have much time to work on it, i only did minor updates. I guess i may occasionally hack on it, but not very actively. I moved the (source) code to the launchpad code hosting for people who may be interested.

REMF

10 years ago

still in active development. congrats and my thanks.

andrewmin

10 years ago

I love the idea, but I have one suggestion. What about using wget (or curl or whatever) to download the latest version from the repository? Then, you wouldn't have to redownload it manually.

Other than that, great job!

benji2

10 years ago

Hi, thanks.
Regarding your suggestion, it would be great, but it's unfortunately not possible, because there is no way to "update" an already downloaded dump. The only way to get more up-to-date wikipedia data is to delete the old dump (including its index files) and fully download a new one.
Therefore, it's pointless to do that automatically. On the other hand, i'll add a few lines in the README explaining exactly that, so users are not confused when they want fresher data.

Details
version 0.2.10
updated Aug 16 2009
added Aug 29 2007