I have a file that contains filenames (with their full paths), and I want to search for a word inside all of those files. Some pseudo-code to explain:
    grep keyword
or
    cat files.txt > grep keyword
    cat files.txt | grep keyword
The problem is that I can only get grep to search the filenames, not the contents of the actual files.
Thanks for reading.
-
It has been a long time since I last wrote a bash shell script, but you could store the result of the first grep (the one that finds all the filenames) in an array and iterate over it, running another grep on each file.
A good starting point is the bash scripting guide.
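A minimal sketch of that approach, assuming bash 4 or later (for mapfile) and one path per line in files.txt. It reads the list straight into an array rather than capturing the output of a grep, and the function name search_listed_files is made up for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: search every file named in a list file for a keyword.
# Requires bash 4+ (mapfile). Assumes one path per line in the list file.
search_listed_files() {
  local keyword=$1 list=$2
  local -a files
  local f
  mapfile -t files < "$list"      # one array element per line; -t drops the trailing newline
  for f in "${files[@]}"; do
    [[ -n $f ]] || continue       # skip blank lines
    grep -H -- "$keyword" "$f"    # -H: prefix each match with its filename
  done
}
```

Usage: `search_listed_files keyword files.txt`. Because "$f" is quoted, paths containing spaces are passed to grep intact; the cost is one grep process per file.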
-
    cat files.txt | xargs grep keyword
or
    grep keyword `cat files.txt`
should do the trick.
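A quick way to see what xargs does with its input, using echo as a stand-in for grep (a sketch, not part of the original answer): by default xargs packs the lines it reads into as few command invocations as it can, while -n 1 forces one invocation per line.

```bash
#!/usr/bin/env bash
# echo stands in for grep so the grouping of arguments is visible.
batched() {
  printf '%s\n' a b c | xargs echo        # one echo call with all three args: prints "a b c"
}
one_per_line() {
  printf '%s\n' a b c | xargs -n 1 echo   # -n 1: a separate echo call per arg, one line each
}
```

Running `batched` prints `a b c` on one line; `one_per_line` prints `a`, `b` and `c` on three lines.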
Pitfalls:
- If files.txt contains file names with spaces, either solution will malfunction, because "This is a filename.txt" will be interpreted as four files: "This", "is", "a", and "filename.txt". A good reason why you shouldn't have spaces in your filenames, ever.
  There are ways around this, but none of them is trivial. (find ... -print0 / xargs -0 is one of them.)
- The second (cat) version can produce a very long command line, which fails once it exceeds your system's limit on argument length. The first (xargs) version handles long input automatically; xargs offers several options to control the details.
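A sketch of the find ... -print0 / xargs -0 route, assuming the files can be selected with find instead of being listed in files.txt. The function name search_tree and the '*.txt' pattern are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: grep every *.txt file under a directory.
search_tree() {
  local keyword=$1 dir=$2
  # -print0 ends each path with a NUL byte and xargs -0 splits only on NUL,
  # so spaces (and even newlines) in filenames are passed through intact.
  # xargs still batches the paths into as few grep calls as the limits allow.
  find "$dir" -type f -name '*.txt' -print0 | xargs -0 grep -H -- "$keyword"
}
```

Usage: `search_tree keyword /some/dir`. A file named "This is a filename.txt" reaches grep as a single argument.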
sigjuice: xargs will not spawn a separate process for every line it reads from standard input. It calls grep with as many arguments as fit (say, up to ARG_MAX bytes), so grep is called roughly ceil(total length of the names / ARG_MAX) times.
DevSolar: Correct... I misread the xargs manpage in that regard. Edited. (With GNU xargs, the actual limits can be shown with "xargs --show-limits".)
-
Both of DevSolar's solutions work (tested on Ubuntu Linux), but the xargs version is preferable if there may be many files, since it avoids running into command-line length limits.
So
    cat files.txt | xargs grep keyword
is the way to go.
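The same search can skip cat entirely by redirecting files.txt into xargs' standard input. A minimal sketch (the function name search_no_cat is hypothetical), with the same caveat about spaces and quotes in names:

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 is the keyword, $2 the list file.
# The shell feeds the list file to xargs directly, so no extra cat process runs.
search_no_cat() {
  xargs grep -- "$1" < "$2"
}
```

Usage: `search_no_cat keyword files.txt`, equivalent to `xargs grep keyword < files.txt`.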
DevSolar: Added it to my answer, plus a note that xargs invokes a new process for each file.
pixelbeat: xargs does not spawn a new process for each argument.
sigjuice: You won the "Useless Use of Cat" Award. :-) http://partmaps.org/era/unix/award.html
-
    tr '\n' '\0' <files.txt | LANG=C xargs -r0 grep -F keyword
- tr delimits the names with NUL characters so that spaces are not significant (note the corresponding -0 option to xargs).
- xargs starts a single grep process for a "large" number of files, and -r means it will not start any grep process at all if there are no files.
- LANG=C means grep uses fast byte-based matching routines rather than slower locale-aware ones.
- grep -F means fast fixed-string matching rather than slower regular-expression matching.
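The pipeline above, wrapped in a function so it can be tried against a list that contains a name with spaces. This is a sketch: search_nul is a made-up name, the added `--` is only there so a keyword starting with "-" is not read as an option, and it assumes one path per line with no newlines inside names:

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 is the keyword, $2 the list file (one path per line).
search_nul() {
  local keyword=$1 list=$2
  # tr turns each newline into a NUL, so xargs -0 splits only at line ends;
  # -r skips running grep when the list is empty; -F matches the keyword literally.
  tr '\n' '\0' < "$list" | LANG=C xargs -r0 grep -F -- "$keyword"
}
```

Usage: `search_nul keyword files.txt`. With an empty files.txt it prints nothing and runs no grep at all.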
DevSolar: Doesn't the LANG=C thing fail once filenames aren't 7-bit ASCII?
pixelbeat: Not for fixed strings, no. If you want to grep for things like '[[:upper:]]', then yes.
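A small illustration of that exchange (a sketch; the function names are hypothetical): a fixed string is compared byte for byte, so LANG=C does not change which lines match, while a character class such as [[:upper:]] follows the locale, and under LANG=C only the ASCII letters A to Z count as uppercase.

```bash
#!/usr/bin/env bash
# Fixed-string match: the locale does not affect which lines match.
match_fixed() {
  printf 'ABC\nabc\n' | LANG=C grep -F 'abc'
}
# Character class: membership depends on the locale (ASCII only under C).
match_class() {
  printf 'ABC\nabc\n' | LANG=C grep '[[:upper:]]'
}
```

`match_fixed` prints `abc` and `match_class` prints `ABC`. A non-ASCII capital such as "É" would not match [[:upper:]] under LANG=C.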