I have programmed a Perl script which has two input files:

  1. Each line of the first file has a phrase followed by a value in parentheses. Here is an example:

    hello all (0.5)
    hi all (0.63)
    good bye all (0.09)
    
  2. The second file has a list of rules. For example:

    hello all -> salut (0.5)
    hello all -> salut à tous (0.5)
    hi all -> salut (0.63)
    good bye all -> au revoir (0.09)
    good bye -> au revoir  (0.09)
    

The script has to read the second file and, for each line, extract the phrase before the arrow (e.g. for the first line: hello all) and check whether this phrase is present in the first file (in this example it is).

If it is present, the script writes the whole line (e.g. hello all -> salut (0.5)) to the output. So in this example the output file should be:

hello all -> salut (0.5)
hello all -> salut à tous (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)

My idea is to put the contents of the first file into a hash table. Here is my script:

#!/usr/bin/perl

use warnings;

my $vocabFile = "file1.txt";
my %hashFR =();
open my $fh_infile, '<', $InFile or die "Can't open $InFile\n";

while ( my $Ligne = <$fh_infile> ) {
  if ( $Ligne =~ /\(/ ) {
    my ($cle, $valeur) = split /\(/, $Ligne;
    say $cle; 
    $h{$cle}  = $valeur;
  }     
}

My question now: how do I extract the phrase just before the arrow and look it up in the hash table?

Thank you for your help

3 Answers


You need to use strict. This would cause your program to fail when it encountered undeclared variables like $InFile (I assume you meant to use $vocabFile). I'm going to ignore those types of issues in the code you posted because you can fix them yourself once you turn on strict.

First, a couple of logic issues with your existing code. You don't seem to actually use the numbers in parentheses that you store as your hash values, but if you ever do want to use them, you should probably get rid of the trailing ):

    my ($cle, $valeur) = split /[()]/, $Ligne;

Next, strip leading and trailing whitespace before using a string as a hash key. You may think "foo" and "foo " are the same word, but Perl doesn't.

$cle =~ s/^\s+//;
$cle =~ s/\s+$//;
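
As a minimal sketch of the two points above taken together, the split and the trimming can be wrapped in one small helper. The name parse_vocab_line is hypothetical (it is not from the question), and the sketch assumes each line looks like "phrase (value)":

```perl
use strict;
use warnings;

# Hypothetical helper: split "phrase (value)" into a trimmed key and its value.
# Returns an empty list when the line has no parenthesised value.
sub parse_vocab_line {
    my ($line) = @_;
    chomp $line;
    my ($cle, $valeur) = split /[()]/, $line;
    return unless defined $cle && defined $valeur;
    $cle =~ s/^\s+//;    # strip leading whitespace
    $cle =~ s/\s+$//;    # strip trailing whitespace
    return ($cle, $valeur);
}

my ($cle, $valeur) = parse_vocab_line("hello all (0.5)\n");
print "[$cle] => [$valeur]\n";    # prints "[hello all] => [0.5]"
```

The brackets in the printed output make any stray whitespace around the key visible, which is handy while debugging hash lookups.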

Now, you're already most of the way there. You clearly already know how to read in a file, how to use split, and how to use a hash. You just need to put these all together. Read in the second file:

open my $fh2, "<", "file2" or die "Can't open file2: $!";

while (<$fh2>) {
    chomp;

...get the part before the ->

    my ($left, $right) = split /->/;

...strip leading and trailing whitespace from the key

    $left =~ s/^\s+//;
    $left =~ s/\s+$//;

...and print out the whole line if the key exists in your hash

    print $_, "\n" if exists $hash{$left};
}

...don't forget to close the filehandle when you're done with it

    close $fh2;

(although as amon points out, this is not strictly necessary, especially since we're reading and not writing. There's a nice PerlMonks thread dealing with this topic.)
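
For reference, here is one way the fragments above might be assembled into a single script. This is only a sketch: the sub names trim, load_vocab and matching_rules are illustrative, and in-memory filehandles stand in for the two input files so that the example runs on its own.

```perl
use strict;
use warnings;

# Trim leading and trailing whitespace (helper name is illustrative).
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Build a hash keyed by the phrase in front of "(value)".
sub load_vocab {
    my ($fh) = @_;
    my %hash;
    while (my $ligne = <$fh>) {
        chomp $ligne;
        my ($cle, $valeur) = split /[()]/, $ligne;
        next unless defined $valeur;    # skip lines without "(value)"
        $hash{ trim($cle) } = $valeur;
    }
    return \%hash;
}

# Return the rule lines whose left-hand side is a key of the hash.
sub matching_rules {
    my ($fh, $hash) = @_;
    my @kept;
    while (my $line = <$fh>) {
        chomp $line;
        my ($left) = split /->/, $line;
        next unless defined $left;      # skip empty lines
        push @kept, $line if exists $hash->{ trim($left) };
    }
    return @kept;
}

# In-memory filehandles stand in for the two input files here.
my $file1 = "hello all (0.5)\nhi all (0.63)\ngood bye all (0.09)\n";
my $file2 = "hello all -> salut (0.5)\ngood bye -> au revoir  (0.09)\n";
open my $fh1, '<', \$file1 or die "Can't open in-memory file: $!";
open my $fh2, '<', \$file2 or die "Can't open in-memory file: $!";

my $hash = load_vocab($fh1);
print "$_\n" for matching_rules($fh2, $hash);    # prints "hello all -> salut (0.5)"
```

With real files, replace the two in-memory opens with opens on the actual file names. Collecting the matches in a list keeps the sub easy to test; printing each line as soon as it matches would work just as well.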

  • This is an incredibly nice answer. +1 all the way! Nitpick: closing isn't that necessary, and die !? is a syntax error ;-) you meant use autodie or die "Can't open file2: $!".
    – amon
    Commented Sep 20, 2013 at 20:39
  • @amon Thank you, and fixed. That's what I get for using the answer box as my compiler ;) Commented Sep 20, 2013 at 20:46

This can be done very straightforwardly by creating a hash directly from the contents of the first file, and then reading each line of the second, checking the hash to see if it should be printed.

use strict;
use warnings;
use autodie;

my %permitted = do {
  open my $fh, '<', 'f1.txt';
  map { /(.+?)\s+\(/, 1 } <$fh>;
};

open my $fh, '<', 'f2.txt';
while (<$fh>) {
  my ($phrase) = /(.+?)\s+->/;
  print if $permitted{$phrase};
}

output

hello all -> salut (0.5)
hello all -> salut à tous (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)
  • Thank you for all your replies. I am now using Borodin's version (thanks to him). @Borodin: how can I change it to make it print the result to a text file?

        use strict;
        use warnings;
        use autodie;

        my $out = "result2.txt";
        open outFile, ">$out" or die $!;

        my %permitted = do {
          open my $fh, '<', 'f1.txt';
          map { /(.+?)\s+\(/, 1 } <$fh>;
        };

        open my $fh, '<', 'f2.txt';
        while (<$fh>) {
          my ($phrase) = /(.+?)\s+->/;
          print if $permitted{$phrase};
        }
        close outFile;

    – Poisson
    Commented Sep 21, 2013 at 13:20
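
The code in the comment above opens an output file but then calls print without a filehandle, so the lines still go to standard output. A possible fix, sketched here under assumptions, is to pass the output handle to print. The sub name write_permitted is illustrative, and in-memory handles replace f2.txt and result2.txt so that the example is self-contained:

```perl
use strict;
use warnings;

# Copy each rule line whose phrase is a key of %$permitted to $out_fh.
# (write_permitted is an illustrative name, not from the answer.)
sub write_permitted {
    my ($in_fh, $out_fh, $permitted) = @_;
    while (my $line = <$in_fh>) {
        my ($phrase) = $line =~ /(.+?)\s+->/;
        next unless defined $phrase;    # line has no "->"
        print {$out_fh} $line if $permitted->{$phrase};
    }
    return;
}

# Demo with in-memory handles; in the real script these would be
#   open my $in_fh,  '<', 'f2.txt';
#   open my $out_fh, '>', 'result2.txt';
my %permitted = ('hello all' => 1, 'hi all' => 1);
my $rules  = "hello all -> salut (0.5)\ngood bye -> au revoir  (0.09)\n";
my $result = '';
open my $in_fh,  '<', \$rules  or die "Can't open input: $!";
open my $out_fh, '>', \$result or die "Can't open output: $!";

write_permitted($in_fh, $out_fh, \%permitted);
close $out_fh or die "Can't close output: $!";

print $result;    # prints "hello all -> salut (0.5)"
```

Closing matters more here than for the input files, because a failed close on an output handle can mean the data was never fully written.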
#!/usr/bin/perl

use strict; use warnings;
use Data::Dumper;

open my $FILE_1, '<', shift @ARGV;
open my $FILE_2, '<', shift @ARGV;

my @file1 = <$FILE_1>;
my @file2= <$FILE_2>;

close $FILE_1;
close $FILE_2;
# Store "segments" from the first file in hash:
my %first_file_hash = map { chomp $_; my ($a) = $_ =~ /^(.*?)\s*\(/; $a => 1 } @file1;

my @result;
# Process file2 content:
foreach my $line (@file2) {
    chomp $line;
    # Retrieve "segment" from the line:
    my ($string) = $line =~ /^(.*?)\s+->/;
    # If it is present in file1, store it for future usage:
    if ($string and $first_file_hash{ $string }) {
        push @result, $line;
    }
}

open my $F, '>', 'output.txt';
print $F join("\n", @result);
close $F;

print "\nDone!\n";

Run as:

perl script.pl file1.txt file2.txt

Cheers!

  • This is a nice answer, but great answers also explain what they are doing instead of just dumping code. There are some issues with your code. One, you aren't using any error handling for open; I suggest using autodie as a remedy. Two, your code is needlessly inefficient: instead of push @result, ..., you could print out that line directly!
    – amon
    Commented Sep 20, 2013 at 20:37
  • @amon - yes, obviously! It is not perfect, but it is not "production" code either. It was just an example. My intention was to outline a solution, focusing on retrieving the data.
    – robert.r
    Commented Sep 20, 2013 at 20:50
  • @amon - one more thing - if anything is not clear in my code feel free to ask! I thought it is almost self-documenting ;)
    – robert.r
    Commented Sep 20, 2013 at 21:01
