René Nyffenegger's collection of things on the web
Recursive find and grep with Perl
I find it unbelievable how hard it is to grep recursively through files on a Windows system. Many
times I have wished I had something like grep and find on Windows. To counter this shortcoming, I have
written a little Perl script that I use when I need to find certain text in a directory tree:
use strict;
use warnings;
use Cwd;
use File::Find;
my $search_pattern=$ARGV[0];
my $file_pattern =$ARGV[1];
find(\&d, cwd);
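sub d {
  my $file = $File::Find::name;
  $file =~ s,/,\\,g;                        # use Windows path separators
  return unless -f $file;
  return unless $file =~ /$file_pattern/;   # matched against the full path
  # explicit block: with 'print ... && return' the message was never printed
  unless (open F, '<', $file) {
    print "couldn't open $file\n";
    return;
  }
  while (<F>) {
    if (my ($found) = m/($search_pattern)/o) {
      print "found $found in $file\n";
      last;
    }
  }
  close F;
}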
It is called as follows:
perl find.pl (?i)sorting html
The first argument is the regular expression. I use (?i) to indicate that I want to search case-insensitively. The second argument
restricts the search to files that contain html in their name (more precisely, in their full path, because the pattern is matched against $File::Find::name).
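As a minimal sketch of what the (?i) prefix does when the pattern arrives as a plain string (as it does via @ARGV), the following compares a pattern with and without it. The helper matches_pattern and the sample lines are made up for this illustration; they are not part of find.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of find.pl): returns 1 if $text matches
# the regular expression passed in as a string, 0 otherwise.
sub matches_pattern {
    my ($pattern, $text) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

# Sample lines, invented for this example.
my @lines = ('Sorting rows', 'quick sorting', 'sorted');

for my $line (@lines) {
    printf "%-15s plain=%d  (?i)=%d\n",
        $line,
        matches_pattern('sorting',     $line),   # case-sensitive
        matches_pattern('(?i)sorting', $line);   # inline case-insensitive flag
}
```

Only the (?i) variant matches 'Sorting rows'; both match 'quick sorting', and neither matches 'sorted'.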
Here's a C++ version of a recursive directory descender.
There is also a very flexible grep for Windows (an exe that lets you recursively
search for patterns within files).
By the way, File::Finder is a wrapper around File::Find.
Improved find by Christopher Hilding
Christopher Hilding sent me a mail:
Hey,
Through your help I have finally got a solution to find in files (grep) on
Windows! However, there was a bug where the file_pattern also matched
directories. I.e. if you did "find.pl searchpattern htm" to find
"searchpattern" in all files containing the word "htm", and the *directory
path* to a file contained "htm" too, it would match as if the *file name* had
contained the word "htm", even if the file did not have it. The reason is that
you match $file (the full path) against the given filepattern, I changed it to
$_ which is just the actual file *name*.
I've also commented the file and created a case against trying to execute
without valid parameters, since yours used to crash ("invalid regex" errors).
It now outputs a help menu. I also changed it so that the file*name* match is
case insensitive by default so you don't have to keep specifying (?i)htm to
match HTM, htm, Htm, etc. Just in case there are variations.
I compiled the filename-matching regex as it won't be changing and may be used
to scan the name of thousands of files depending on what directory you are
scanning from. I also added the option to specify just the search pattern and
omitting a filename to make it search all files.
When it comes to the output of the script, I added a file match counter at the
end. And an extensive search mode where it shows all the matching lines of the
files along with their line numbers. I changed the output format of both search
types to a more advanced output with file numbering and clearer list (both for
the simple list and extended list). Umm, that's all I can remember but I
probably did even more. Enjoy! And thanks for starting me off with a great perl
sample!
Best Regards, Christopher H.
use strict;
use warnings;
use Cwd;
use File::Find;
use File::Basename;
my ($in_rgx,$in_files,$simple,$matches,$cwd);
sub trim($) {
  my $string = shift;
  $string =~ s/[\r\n]+//g;
  $string =~ s/\s+$//;
  return $string;
}
# 1: Get input arguments
if ($#ARGV == 0) { # *** ONE ARGUMENT *** (search pattern)
  ($in_rgx,$in_files,$simple) = ($ARGV[0],".",1);
}
elsif ($#ARGV == 1) { # *** TWO ARGUMENTS *** (search pattern + filename or flag)
  if (($ARGV[1] eq '-e') || ($ARGV[1] eq '-E')) { # extended
    ($in_rgx,$in_files,$simple) = ($ARGV[0],".",0);
  }
  else { # simple
    ($in_rgx,$in_files,$simple) = ($ARGV[0],$ARGV[1],1);
  }
}
elsif ($#ARGV == 2) { # *** THREE ARGUMENTS *** (search pattern + filename + flag)
  ($in_rgx,$in_files,$simple) = ($ARGV[0],$ARGV[1],0);
}
else { # *** HELP *** (either no arguments or more than three)
  print "Usage: ".basename($0)." regexpattern [filepattern] [-E]\n\n" .
        "Hints:\n" .
        "*) If you need spaces in your pattern, put quotation marks around it.\n" .
        "*) To do a case insensitive match, use (?i) preceding the pattern.\n" .
        "*) Both patterns are regular expressions, allowing powerful searches.\n" .
        "*) The file pattern is always case insensitive.\n";
  exit;
}
# 2: Output search header
if ($in_files eq '.') {
  print basename($0).": Searching all files for \"${in_rgx}\"... (".(($simple) ? "simple" : "extended").")\n";
}
else {
  print basename($0).": Searching files matching \"${in_files}\" for \"${in_rgx}\"... (".(($simple) ? "simple" : "extended").")\n";
}
# 3: Traverse directory tree using subroutine 'findfiles'
if ($simple) { print "\n"; }
($matches,$cwd) = (0,cwd);
$cwd =~ s,/,\\,g;
find(\&findfiles, $cwd);
sub findfiles { # 4: Used to iterate through each result
  my $file = $File::Find::name;          # complete path to the file
  $file =~ s,/,\\,g;                     # substitute all / with \
  return unless -f $file;                # process files (-f), not directories
  return unless $_ =~ m/$in_files/io;    # check if file matches input regex
                                         # /io = case-insensitive, compiled once
                                         # $_ = just the file name, no path
  # 5: Open file and search for matching contents
  #    (explicit block: with 'print ... && return' the message was never printed)
  unless (open F, '<', $file) {
    print "\n* Couldn't open ${file}\n\n";
    return;
  }
  if ($simple) { # *** SIMPLE OUTPUT ***
    while (<F>) {
      if (m/($in_rgx)/o) {               # /o = compile regex once
        # file matched!
        $matches++;
        print "---" .                    # begin printing file header
              sprintf("%04d", $matches) . # file number, zero-padded to 4 digits
              "--- ".$file."\n";         # file name, keep original name
                                         # end of file header
        last;                            # go on to the next file
      }
    }
  } # *** END OF SIMPLE OUTPUT ***
  else { # *** EXTENDED OUTPUT ***
    my $found = 0;                       # used to keep track of first match
    my $binary = (-B $file) ? 1 : 0;     # don't show contents if file is bin
    $file =~ s/^\Q$cwd//g;               # remove current working directory
                                         # \Q = quotemeta, escapes string
    while (<F>) {
      if (m/($in_rgx)/o) {               # /o = compile regex once
        # file matched!
        if (!$found) {                   # first matching line for the file
          $found = 1;
          $matches++;
          print "\n---" .                # begin printing file header
                sprintf("%04d", $matches) . # file number, zero-padded to 4 digits
                "--- ".uc($file)."\n";   # file name, converted to uppercase
                                         # end of file header
          if ($binary) {                 # file is binary, do not show content
            print "Binary file.\n";
            last;
          }
        }
        print "[$.]".trim($_)."\n";      # print line number and contents
        #last;                           # uncomment to only show first line
      }
    }
  } # *** END OF EXTENDED OUTPUT ***
  # 6: Close the file and move on to the next result
  close F;
}
# 7: Show search statistics
print "\nMatches: ${matches}\n";
# Search Engine Source: http://www.adp-gmbh.ch/perl/find.html
# Rewritten by Christopher Hilding, Dec 02 2006
# Formatting adjusted to my liking by Rene Nyffenegger, Dec 22 2006
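
The path-versus-name issue that Christopher describes in his mail can be seen in isolation in the following sketch. It assumes a throwaway directory tree created with File::Temp; the directory name html_docs, the two file names and the variable names are invented for this illustration. The file pattern is compiled once with qr//i, which is one way to get a case-insensitive, precompiled regex; it is not necessarily how either script above should be written.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Build a temporary tree with a directory whose name contains "html":
#   <tmp>/html_docs/notes.txt
#   <tmp>/html_docs/index.HTML
my $root = tempdir(CLEANUP => 1);
my $dir  = File::Spec->catdir($root, 'html_docs');
make_path($dir);
for my $name ('notes.txt', 'index.HTML') {
    open my $fh, '>', File::Spec->catfile($dir, $name)
        or die "cannot create $name: $!";
    close $fh;
}

my $file_rx = qr/html/i;   # compiled once, case-insensitive

my (@by_path, @by_name);
find(sub {
    return unless -f $_;                                  # plain files only
    push @by_path, $_ if $File::Find::name =~ $file_rx;   # full path
    push @by_name, $_ if $_ =~ $file_rx;                  # file name only
}, $root);

print "matched by path: @{[ sort @by_path ]}\n";
print "matched by name: @{[ sort @by_name ]}\n";
```

Matching against $File::Find::name picks up both files, because the directory html_docs is part of their path; matching against $_ picks up only index.HTML.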