NAME
mu-index - index e-mail messages stored in Maildirs
SYNOPSIS
mu [common-options] index
DESCRIPTION
mu index is the mu command for scanning the contents of Maildir directories and storing the results in a Xapian database. The data can then be queried using mu-find(1).
Before the first time you run mu index, you must run mu init to initialize the database.
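The order of operations described above can be sketched as follows. This is a minimal illustration, not a recommended setup: the temporary directories and the variable names (demo_root, demo_maildir, demo_muhome) are assumptions chosen so that the sketch does not touch an existing database, and the mu calls are skipped when mu is not installed.

```shell
# Throwaway locations (assumed for illustration); substitute your own paths.
demo_root="$(mktemp -d)"
demo_maildir="$demo_root/Maildir"
demo_muhome="$demo_root/muhome"
mkdir -p "$demo_maildir/cur" "$demo_maildir/new" "$demo_maildir/tmp" "$demo_muhome"

if command -v mu >/dev/null 2>&1; then
    # One-time step: create the database and record the Maildir root.
    mu init --muhome="$demo_muhome" --maildir="$demo_maildir"
    # Every later run: scan the Maildir and update the database.
    mu index --muhome="$demo_muhome" --quiet
else
    echo "mu is not installed; skipping init and index" >&2
fi
```

In everyday use the --muhome option is normally omitted, so that mu falls back to its default locations.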
index understands Maildirs as defined by Daniel Bernstein for qmail(7). In addition, it understands recursive Maildirs (Maildirs within Maildirs) and Maildir++. It also supports VFAT-based Maildirs, which use ! or ; as the separator instead of :.
E-mail messages which are not stored in something resembling a maildir leaf-directory (cur and new) are ignored, as are the cache directories for notmuch and gnus, and any dot-directory.
Symlinks are followed, and the directories can be spread over multiple filesystems; however note that moving files around is much faster when multiple filesystems are not involved. Be careful to avoid self-referential symlinks!
If there is a file called .noindex in a directory, the contents of that directory and all of its subdirectories will be ignored. This can be useful to exclude certain directories from the indexing process, for example directories with spam-messages.
If there is a file called .noupdate in a directory, the contents of that directory and all of its subdirectories will be ignored. This can be useful to speed things up if you have some maildirs that never change.
.noupdate does not affect already-indexed messages: you can still search for them. .noupdate is ignored when you start indexing with an empty database (such as directly after mu init).
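As a small sketch of the two marker files, the following builds a throwaway Maildir tree and marks one folder with .noindex and another with .noupdate. The folder names (inbox, spam, archive-2010) and the variable maildir_root are hypothetical; only the marker file names come from the text above.

```shell
# Hypothetical layout under a temporary directory; substitute your real Maildir.
maildir_root="$(mktemp -d)"
for folder in inbox spam archive-2010; do
    mkdir -p "$maildir_root/$folder/cur" "$maildir_root/$folder/new"
done

# Never index anything under spam/ or its subdirectories.
touch "$maildir_root/spam/.noindex"

# Skip archive-2010/ on later runs; messages already indexed stay searchable.
touch "$maildir_root/archive-2010/.noupdate"

# Show the marker files that are now in place.
find "$maildir_root" \( -name .noindex -o -name .noupdate \) | sort
```

The markers are ordinary empty files, so removing one with rm makes the directory eligible for indexing again on the next run.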
There is also the option --lazy-check, which can greatly speed up indexing; see below for details.
The first run of mu index may take a few minutes if you have a lot of mail (tens of thousands of messages). Fortunately, such a full scan needs to be done only once; after that it suffices to index the changes, which goes much faster. See the PERFORMANCE section below for more information.
The optional ’phase two’ of the indexing-process is the removal of messages from the database for which there is no longer a corresponding file in the Maildir. If you do not want this, you can use -n, --nocleanup.
When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM (e.g., when you press Ctrl-C during the indexing process), it attempts to shut down gracefully: it tries to save and commit data, close the database, and so on. If it receives another signal (e.g., when pressing Ctrl-C once more), mu index will terminate immediately.
INDEX OPTIONS
--lazy-check
in lazy-check mode, mu does not consider messages for
which the time-stamp (ctime) of the directory they reside in
has not changed since the previous indexing run. This is
much faster than the non-lazy check, but won’t update
messages that have changed (rather than having been added or
removed), since merely editing a message does not update the
directory time-stamp. Of course, you can run mu index
occasionally without --lazy-check, to pick up such
messages.
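One way to combine frequent lazy runs with an occasional full check is sketched below. The helper name mu_index_args and the policy of doing the full check during hour 03 are assumptions made for illustration; only the --lazy-check and --quiet options come from this manual.

```shell
# Choose the arguments for mu based on the current hour (two digits, 00-23).
# Assumed policy: one full check during hour 03, lazy checks otherwise.
mu_index_args() {
    hour="$1"
    if [ "$hour" = "03" ]; then
        echo "index --quiet"
    else
        echo "index --quiet --lazy-check"
    fi
}

# Show what would be run at 14:00 and at 03:00.
mu_index_args 14
mu_index_args 03
```

A periodic job could then run mu $(mu_index_args "$(date +%H)"), so that edited messages are picked up once a day while the other runs stay fast.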
--nocleanup
disable the database cleanup that mu does by default
after indexing.
--reindex
perform a complete reindexing of all the messages in the
maildir.
--muhome
use a non-default directory to store and read the database,
write the logs, etc. By default, mu uses the XDG Base
Directory Specification (e.g. on GNU/Linux this defaults to
~/.cache/mu and ~/.config/mu). Earlier
versions of mu defaulted to ~/.mu, which now
requires --muhome=~/.mu.
The environment variable MUHOME can be used as an alternative to --muhome. If both are given, --muhome takes precedence.
COMMON OPTIONS
-d, --debug
makes mu generate extra debug information, useful for
debugging the program itself. By default, debug information
goes to the log file, ~/.cache/mu/mu.log. It can safely be
deleted when mu is not running. When running with the --debug
option, the log file can grow rather quickly. See the note
on logging below.
-q, --quiet
causes mu not to output informational messages and progress
information to standard output, but only to the log file.
Error messages will still be sent to standard error. Note
that mu index is much faster with --quiet, so it is
recommended you use this option when using mu from scripts
etc.
--log-stderr
causes mu to output log messages to standard error, in
addition to sending them to the log file.
--nocolor
do not use ANSI colors. The environment variable
NO_COLOR can be used as an alternative to
--nocolor.
-V, --version
prints mu version and copyright information.
-h, --help
lists the various command line options.
ENCRYPTION
mu index does not decrypt messages, and only the metadata (such as headers) of encrypted messages makes it to the database. mu view and mu4e can decrypt messages, but those work with the message directly and the information is not added to the database.
PERFORMANCE
indexing in ancient times (2009?)
As a non-scientific benchmark, a simple test on the
author’s machine (a Thinkpad X61s laptop using Linux
2.6.35 and an ext3 file system) with no existing database,
and a maildir with 27273 messages:
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
(about 103 messages per second)
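The figure in parentheses is the message count divided by the elapsed wall-clock time (4:24,20, that is 264.20 seconds). A minimal sketch of that arithmetic follows; the helper name msgs_per_sec is an assumption and not part of mu.

```shell
# Messages per second = message count / elapsed seconds, truncated to an integer.
# LC_ALL=C keeps awk's number handling independent of the user's locale.
msgs_per_sec() {
    LC_ALL=C awk -v n="$1" -v s="$2" 'BEGIN { printf "%d\n", n / s }'
}

# 27273 messages in 4 minutes 24.20 seconds of wall-clock time.
msgs_per_sec 27273 264.20   # prints 103
```

Using the wall-clock total rather than the user or system CPU time gives the rate that an observer actually experiences, since it includes time spent waiting on disk I/O.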
A second run, which is the more typical use case when there is a database already, goes much faster:
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
(more than 56818 messages per second)
Note that each test flushes the caches first; a more common use case might be to run mu index when new mail has arrived; the cache may stay quite ’warm’ in that case:
$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
which is more than 30000 messages per second.
indexing in 2012
As of June 2012, we did the same non-scientific benchmark,
this time with an Intel i5-2500 CPU @ 3.30GHz, an ext4 file
system and a maildir with 22589 messages. We start without
an existing database.
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total
(about 813 messages per second)
A second run, which is the more typical use case when there is a database already, goes much faster:
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total
(more than 173000 messages per second)
indexing in 2016
As of July 2016, we did the same non-scientific benchmark,
again with the Intel i5-2500 CPU @ 3.30GHz, an ext4 file
system. This time, the maildir contains 72525 messages.
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
40,34s user 2,56s system 64% cpu 1:06,17 total
(about 1099 messages per second).
indexing in 2022
A few years later and it is June 2022. There’s a lot
more happening during indexing, but indexing became
multi-threaded and machines are faster; e.g. this is with an
AMD Ryzen Threadripper 1950X (16 cores) @ 3.399GHz.
The instructions are a little different since we have a proper repeatable benchmark now. After building,
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
% THREAD_NUM=4 build/lib/tests/bench-indexer -m perf
# random seed: R02Sf5c50e4851ec51adaf301e0e054bd52b
1..1
# Start of bench tests
# Start of indexer tests
indexed 5000 messages in 20 maildirs in 3763ms; 752 μs/message; 1328 messages/s (4 thread(s))
ok 1 /bench/indexer/4-cores
# End of indexer tests
# End of bench tests
Things are again a little faster, even though the indexer does a lot more now (text normalization and pre-generating message s-expressions). A faster machine helps, too!
recent releases
Indexing the same 93000-message mail corpus with the
last few releases:
Quite some variation!
Over time new features / refactoring can change the timings quite a bit. At least for now, the latest code is both the fastest and the most featureful!
EXIT CODE
This command returns 0 upon successful completion, or a non-zero exit code otherwise.
0    success
2    no matches found. Try a different query
11   database schema mismatch. You need to re-initialize mu, see mu-init(1)
19   failed to acquire lock. Some other program has exclusive access to the mu database
99   caught an exception
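A script that runs mu index can translate these codes into readable messages. The sketch below is only an illustration: the function name describe_mu_exit and the shortened message wording are assumptions, while the numeric codes are the ones listed above.

```shell
# Map an exit status to a short description of the codes listed above.
describe_mu_exit() {
    case "$1" in
        0)  echo "success" ;;
        2)  echo "no matches found" ;;
        11) echo "database schema mismatch; re-initialize with mu init" ;;
        19) echo "failed to acquire lock; another program holds the database" ;;
        99) echo "caught an exception" ;;
        *)  echo "unspecified error ($1)" ;;
    esac
}

# Example: describe a lock failure.
describe_mu_exit 19
```

In a script, the status of the last command is available as $?, so running mu index --quiet followed by describe_mu_exit "$?" reports the outcome of that run.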
REPORTING BUGS
Please report bugs at https://github.com/djcb/mu/issues.
AUTHOR
Dirk-Jan C. Binnema <djcb [AT] djcbsoftware.nl>
COPYRIGHT
This manpage is part of mu 1.12.5.
Copyright © 2008-2024 Dirk-Jan C. Binnema. License GPLv3+: GNU GPL version 3 or later https://gnu.org/licenses/gpl.html. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.