Recoll

Utilities

Score 78.4%
Description:

Recoll is a personal full text search tool for Unix/Linux.

It is based on the very strong Xapian backend.

It provides an easy to use, feature-rich interface with a Qt GUI.

Most common document types are supported, along with their compressed versions (text, HTML, PDF, DVI, PostScript, OpenOffice, LyX, Scribus, Word/Excel/PPT, AbiWord, KWord, WordPerfect, RTF, DjVu, Gaim logs, maildir and mailbox mail folders including attachments, and miscellaneous media files).

Powerful query facilities are provided, from simple keyword entry to assisted boolean query building with proximity clauses and filtering on file type or location. A Xesam-compatible query language also supports field searches and date filtering.

Multiple character sets are supported. Internal processing and storage uses Unicode UTF-8.

Recoll has few dependencies. No database daemon, Web server, or exotic language/framework is necessary. In the default setup, it only runs on your system when you need it. Indexing can be performed in batch mode or in real time.

Thanks to Xapian, indexing does not tax system resources excessively and searching is very fast.
Last changelog:

5 years ago

Latest 1.19 is 1.19.13: this hopefully fixes the last remaining bug in the multithreading code, which was causing rare but annoying crashes. You definitely want to upgrade to this version if you are running recoll 1.19.

Release 1.19 brings faster indexing for multiprocessors, new results management features (multiple attachment saves, duplicates listing), advanced search history storage, and other performance and usability enhancements. Also, a nice new PPT filter, Python 3 compatibility, and, for Ubuntu users, a Scope for the Dash on Saucy and Trusty.

Release 1.18.1 brings optional case- and diacritics-sensitive searches, complex search history, direct access to hit pages for PDF documents.

Release 1.17.3 brings a number of usability improvements: management of indexing operations from the GUI, filtering on file size, extended directory filtering, Ubuntu Unity Lens, thumbnails in result lists, Okular notes and Gnumeric filters, etc.

Release 1.16.2 brings a long list of small improvements and bug fixes. Image previews, negative directory filtering, anchored searches, more popup menu entries, etc. Please check the release notes for details (http://www.recoll.org/release-1.16.html).

Release 1.15 (.9): Enhanced native Qt 4 user interface (no more Qt 3 compatibility). Switchable table-like display for the results. Direct access to sort functions. Negative directory filtering. Web archive formats.

Release 1.14 (.3): Modification date searches and filtering, arbitrary email header indexing, a new audio tag extractor based on the Mutagen Python library, a new GNU info filter, improved Thunderbird mail indexing, and miscellaneous other improvements and small bug fixes.

Release 1.13 (.04): New class of persistent filters and indexed file types: zip, chm, ics. Improved big text files handling, Firefox visited pages indexing. Quite a few other performance and usability improvements.

Release 1.12: new KDE KIO slave module, collapsing of identical results, context-sensitive F1 help, saving email attachments and other embedded documents to files, and other small improvements and bug fixes.

Release 1.11: easy filtering of results by document type, nicer previews which use HTML when possible, Python programming interface for indexing and searching, better support for the Xesam user query language, new filter framework, better support for arbitrary field indexing and searching.

Release 1.10:
- Created mailing-list to improve support. Check home page.
- Fixed openSUSE 11 compile issues.
- Fixed bug in interpreting email MIME structure, which resulted in base-64 decoding errors.
- Fixed "Prev" button in preview window. It would actually go forward when walking the search terms.
- Allow setting the highlight color for search terms in result list and preview
- Added svg filter
- Ensure that in case the data of a file can't be indexed because of some error, at least the file name is indexed.
- Improve query language to support OR queries of terms with field specifications (e.g. title:someterm OR author:someauthor).
- Fix filename search to split patterns on white space, so that a "*.jpg *.jpeg" search does what's expected. This means you now need to use double quotes if there is actual embedded white space.
- Jump directly to the external editor choice dialog instead of opening preferences when an external viewer is not found.
- Allow stopping indexing through menu action (only works with qt4 for now).
- Create an "indexedmimetypes" configuration variable to allow explicitly restricting the file types which do get indexed.
- Adds support for CJK text, and a GUI configuration tool for the main configuration file.
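
The file name search change in the list above (patterns split on white space, double quotes protecting embedded white space) can be illustrated with a small sketch. This is not Recoll's code: `split_patterns` and `matches_any` are hypothetical helpers built on Python's `shlex` and `fnmatch`, and the sketch assumes shell-like quoting and that a file matches when any one of the patterns matches.

```python
import fnmatch
import shlex


def split_patterns(search):
    """Split a file name search on white space, keeping double-quoted parts whole."""
    return shlex.split(search)


def matches_any(filename, search):
    """Return True if the file name matches at least one pattern.

    OR semantics between patterns are an assumption of this sketch.
    """
    return any(fnmatch.fnmatchcase(filename, p) for p in split_patterns(search))


if __name__ == "__main__":
    print(split_patterns('*.jpg *.jpeg'))
    print(split_patterns('"holiday photo*.jpg"'))
    print(matches_any("cat.jpeg", "*.jpg *.jpeg"))
```

With this splitting, `*.jpg *.jpeg` is treated as two patterns, while `"holiday photo*.jpg"` stays a single pattern containing a space.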

Release 1.9: This release brings a number of small practical improvements: new filters: Wordperfect, Abiword, Kword, jpeg, flac, ogg; better control of disk and memory usage during indexing; improved abstract generation; arbitrary field support; improved qt4 support; and miscellaneous user interface improvements and bug fixes, described in more detail in the Changes file.

stalin2000

3 years ago

Also have a look at the improved Icon Theme and Layout for Recoll:

https://www.linux-apps.com/p/1162008/


darthroe

5 years ago

I prefer Recoll over using KDE's integrated search.


dglent

7 years ago

Here are rpms for Mageia 1 64bits
http://mageia-gr.org/rpm/1/x86_64/recoll-1.17.1-1mgr1.x86_64.rpm
http://mageia-gr.org/rpm/1/x86_64/kio-recoll-1.17.1-1mgr1.x86_64.rpm


stalin2000

8 years ago

If the icons are too big or if you don't like them, here is an alternative, more beautiful icon theme:

http://kde-look.org/content/show.php/Alternative+Icon+Theme+Recoll?content=145669


google01103

8 years ago

running rpmbuild --rebuild on recoll-1.15.2-0.src.rpm results in an error

" /usr/lib64/gcc/x86_64-suse-linux/4.5/../../../../x86_64-suse-linux/bin/ld: cannot find -luuid
collect2: ld returned 1 exit status"

but I do have libuuid-devel, libuuid1 and uuid rpms installed

ps the link on your dl page for 1.15.2 src.rpm actually links to 1.15.0 for openSuse

no biggy since it will soon be in repo
thanks,


medoc

8 years ago

Thanks, I'll take a closer look at this when I'm back in 10 days.
jf


google01103

8 years ago

of course it compiled from source fine

thanks,


google01103

8 years ago

Just compiled and

1) file name column in table view is empty

2) adding size column might be useful

3) in regular view the reason for the indentation (to me) isn't obvious - it seems based on year quarters, but having each document in the indented group indented one more than the previous is excessive

thanks,


medoc

8 years ago

Hi,
About the file name column: if a previous version of recoll was installed, you need a full reindex (recollindex -z). Else, this is a bug, please get in touch with me (jfd@recoll.org).

Size column: right-click on the table header, you should be able to customize the columns to your content (else, see email address above...)

Indentation: this is not intentional, I've seen it happen, it seems to depend on the Qt version. Try to reset the result list paragraph format (in the query preferences, just set it to empty to restore the default). Maybe you can try to update Qt too. If nothing works, please get in touch.

jf


google01103

8 years ago

thanks,

1) file name still not showing in table view after running 'recoll -z', sent email

2) did not notice columns could be added

3) clearing 'paragraph format' resolved indenting

thanks,


google01103

8 years ago

sorry but for clarification title=filename so the filename column is not necessary, correct? Or is filename supposed to equal url? Either way not sure what the purpose of the filename column is now

thanks,


medoc

8 years ago

filename is the short name for the file (without the path). For people who give meaningful names to files it's sometimes actually more interesting than the document title. This depends on local taste and type of document, it was added as a separate field following popular request :)

By the way the command to reset the index would be recollindex -z, not recoll -z, but maybe that's what you did.


medoc

8 years ago

Oh yes, and when a document does not have an internal title (e.g. text/plain), recoll uses the file name as a stand-in, so that in this case filename==title


medoc

8 years ago

Just so as not to leave this thread unclosed: the filename field finally got to work, for not entirely clear reasons, but anyway, all's well that ends well :)


medoc

8 years ago

This is becoming seriously mysterious!

We need to check what happens during indexing.

- Set loglevel to 6 in the config (either from indexing preferences or by editing recoll.conf)
- Create a small text file inside the indexed area, e.g.:
cd
echo atextfile > bogus.txt

- try to index it:
recollindex -i bogus.txt

- You should see in the log the data record created for the file:
:5:../rcldb/rcldb.cpp:1128:Rcl::Db::add: new doc record:
url=file:///home/.../2010telephs.txt
...
filename=2010telephs.txt

If the filename field is not there, this is an indexing issue, else it's a query issue, we'll concentrate on the appropriate area at the next step.
If you need to repeat the test, run "recollindex -e bogus.txt" to erase the index data for the file first (else no reindexing will be performed).
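
The check on the log can also be scripted. What follows is only a rough sketch, assuming the record fields appear one per line as `key=value` after a "new doc record:" message, as in the excerpt above, and that the next log message starts with a colon. `doc_records` and `missing_filename` are hypothetical helper names, not part of Recoll.

```python
MARKER = "new doc record:"


def doc_records(log_text):
    """Collect the key=value fields following each 'new doc record:' log message."""
    records = []
    current = None
    for line in log_text.splitlines():
        if MARKER in line:
            # Start of a new data record
            current = {}
            records.append(current)
        elif current is not None:
            if line.startswith(":"):
                # Assumed start of the next log message: the record is over
                current = None
            elif "=" in line:
                key, _, value = line.partition("=")
                current[key.strip()] = value.strip()
    return records


def missing_filename(log_text):
    """Return the urls of the records which have no filename field."""
    return [r.get("url", "?") for r in doc_records(log_text) if "filename" not in r]


if __name__ == "__main__":
    sample = (
        ":5:../rcldb/rcldb.cpp:1128:Rcl::Db::add: new doc record:\n"
        "url=file:///home/.../2010telephs.txt\n"
        "...\n"
        "filename=2010telephs.txt\n"
    )
    print(doc_records(sample))
    print(missing_filename(sample))
```

An empty list from `missing_filename` would point to a query issue, a non-empty one to an indexing issue.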

I'm leaving for ten days this afternoon, I'll get back to this then, if you're patient enough to still be around :)


google01103

8 years ago

I have no ~/.recoll/fields

I did delete all in that folder except recoll.conf and reindexed (recollindex -z) and I get the same - nothing in the filename column.

I am running 1.5.1

see http://simplest-image-hosting.net/jpeg-0-recoll0

as always, thanks


medoc

8 years ago

(seems I can't reply to the last comment, so replying here).

Normally ALL documents have file names stored as a field in the index. And they also all have titles, because if no internal title (e.g. html <title> or email Subject:) is found, then the file name is copied in there.

So I don't understand why you don't see the file names.

Maybe try to empty ~/.recoll/fields in case there is something weird in there, then retry the recollindex -z (sorry about the repetition).


google01103

8 years ago

so then you are defaulting title to filename and not displaying filename in the cases where the document does not have an internal filename? And since I don't add title to doc's I create I should just delete the filename columns

ps - I did run recollindex -z, as suggested (just posted recoll -z)

pps - would you format the size column (add commas)

thanks,


jamjam

8 years ago

I like the customization options. I added an extra <br> to space out the query results. cool!


jamjam

8 years ago

...it would be nice if the result rows could be rendered with alternating background colors.


molecule-eye

9 years ago

In the search results, it's generally best to use the pdf filename rather than the embedded title info which is rarely accurate (and things will likely never change in this respect). Yes, the filename is listed underneath, but it's not easy to discern.


medoc

9 years ago

Hi,
There is now (1.14) a "filename" field which you can use in a custom result paragraph format, e.g. <b>%(filename)</b>, to display the file name prominently.
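
To give an idea of how a `%(filename)` placeholder in a paragraph format could expand, here is a minimal sketch. It is not Recoll's implementation: `render` is a hypothetical function, and replacing unknown fields with an empty string and HTML-escaping the values are choices made for this example only.

```python
import html
import re

# Matches placeholders of the form %(fieldname)
FIELD = re.compile(r"%\((\w+)\)")


def render(paragraph_format, doc):
    """Replace each %(name) placeholder with the matching field of doc."""
    def expand(match):
        # Unknown fields become an empty string (assumption of this sketch)
        return html.escape(str(doc.get(match.group(1), "")))
    return FIELD.sub(expand, paragraph_format)


if __name__ == "__main__":
    doc = {"filename": "2010telephs.txt", "title": "Phone list"}
    print(render("<b>%(filename)</b> %(title)", doc))
```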


dovidhalevi

9 years ago

Recoll is the only one which does not slow down my old PIII clunker. I do not dare try to run nepomuk and strigi, the kde4 resource hawgs. Those kde4 daemons are crippling.

So ... I am sure I am not the only one who would like to use recoll INSTEAD of the above. I will likely roll my own runner using the CLI and parsing its output when I have time to devote to it. Some .so based backends would be nice, even a library to directly access recoll's/xapian's data.


medoc

9 years ago

Probably the best approach for custom search interfaces would be to use the Python API which can access most, if not all, Recoll functionality (and I'm willing to extend it). Of course there is a .so with a C++ api behind this, but I think that the C++ api is quite unwieldy and it would be better to use the Python one.
(and sorry I did not answer earlier, I rarely look at this page, and don't get email when comments are added).
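
As a rough idea of what a search through the Python API can look like, here is a short sketch. The calls used (`recoll.connect()`, `db.query()`, `query.execute()`, `query.fetchmany()`, and the `url` and `title` document attributes) follow the Python API described in the Recoll manual and may differ between versions; the `search` wrapper itself is a hypothetical helper, and the query string is just the example from the 1.10 changelog.

```python
def search(db, query_string, limit=10):
    """Run a query on an open Recoll database, return (url, title) pairs."""
    query = db.query()
    query.execute(query_string)
    return [(doc.url, doc.title) for doc in query.fetchmany(limit)]


if __name__ == "__main__":
    try:
        # Recoll's Python module, it needs to be installed separately
        from recoll import recoll
    except ImportError:
        print("Recoll Python module not installed")
    else:
        db = recoll.connect()
        for url, title in search(db, "title:someterm OR author:someauthor"):
            print(title, url)
```

Passing the database object in keeps the wrapper easy to try out with a stand-in object when Recoll is not installed.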

jf


dovidhalevi

9 years ago

The recoll runner is posted on this site so give it a try!

I used C++ and I run the recoll CLI. Might try to lift code from recoll sources sometime for direct calls but this might not be a great advantage.

Python scripting is an alternative and is safer since it will not slam plasma in a clinch. However, having to keep interpreters or vm's (Java) resident full time would seem to me too heavy a hit for spot-usage runners or plasmoids.



darthroe May 06 2014 9 excellent
sealbhach Dec 14 2012 9 excellent
dmeyer Oct 14 2012 9 excellent
remix Apr 03 2012 9 excellent
paulus3005 Mar 24 2012 3 bad
cjann Dec 29 2011 9 excellent
seaman123 Sep 23 2011 9 excellent
kerenskyy May 29 2011 9 excellent
lazx888 May 04 2011 9 excellent
google01103 Mar 05 2011 9 excellent
groo Feb 17 2011 9 excellent
yuksing Feb 04 2011 9 excellent
Alesvol Feb 03 2011 9 excellent
LazyKent Feb 02 2011 9 excellent
jamjam Nov 24 2010 9 excellent
Base: 4 x 5.0 Ratings
Details
version 1.19.13
updated May 06 2014
added Jan 29 2007