NAME
Tie::Hash::Abbrev::BibRefs - match bibliographic references to the original titles
SYNOPSIS
use Tie::Hash::Abbrev::BibRefs;
tie my %hash, 'Tie::Hash::Abbrev::BibRefs',
    preprocess => sub { s/\s+[[:upper:]]:.*// },
    stopwords  => [ qw( a and de del der des di
                        et for für i if in la las
                        of on part Part Pt. Sect.
                        the to und ) ],
    exceptions => { jpn  => 'japan',
                    natl => 'national' };
$hash{'Physical Review B'} = '0163-1829';
print $hash{'Phys. Rev. B: Condens. Matter Mater. Phys.'};
# will print '0163-1829'
DESCRIPTION
This module is an attempt to ease the mapping of often abbreviated bibliographical references to the original titles.
To achieve this, it simplifies the title according to parameterizable rules and stores it as a normalized key.
When accessing the hash, the key given is also normalized and compared to the normalized version of the original title. In addition, each word (words are separated by whitespace) may be abbreviated by specifying only the first few letters.
If more than one matching hash entry is found, the values of all matching entries are compared; as long as they are all equal (or all undef), the lookup is still considered to be successful.
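The matching rule described above can be illustrated with a small standalone sketch. It does not use the module and is not its implementation: `abbrev_matches` and `lookup` are hypothetical helpers, and the sketch assumes that keys are already normalized and that an abbreviated key has the same number of words as the title (the module itself may be more lenient).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): true if every word of
# $abbrev is a case-insensitive prefix of the corresponding word of $title.
sub abbrev_matches {
    my ($abbrev, $title) = @_;
    my @short = split ' ', lc $abbrev;
    my @full  = split ' ', lc $title;
    return 0 if @short != @full;    # simplifying assumption: same word count
    for my $i (0 .. $#short) {
        return 0 if index($full[$i], $short[$i]) != 0;
    }
    return 1;
}

# Hypothetical lookup: returns (1, $value) if at least one entry matches
# and all matching entries share the same value (or are all undef),
# and (0, undef) otherwise.
sub lookup {
    my ($table, $key) = @_;
    my @values = map { $table->{$_} }
                 grep { abbrev_matches($key, $_) } sort keys %$table;
    return (0, undef) unless @values;
    my $first = shift @values;
    for my $value (@values) {
        return (0, undef) if (defined $value xor defined $first);
        return (0, undef) if defined $value && $value ne $first;
    }
    return (1, $first);
}

# Illustrative data only; 'other' is a placeholder value.
my %demo = (
    'physical review'  => '0163-1829',
    'physical reviews' => '0163-1829',
    'physics reports'  => 'other',
);
my ($found, $value) = lookup(\%demo, 'phys rev');
print $found ? "found: $value\n" : "ambiguous or missing\n";
```

Here 'phys rev' matches two entries with the same value, so the lookup succeeds; 'phys re' would also match the third entry, whose value differs, and would therefore fail.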
KEY NORMALIZATION
The process of normalization is implemented as follows:
- Execute any preprocessing code (see the example in "SYNOPSIS" above), which is expected to operate on $_. You can use subroutine references or strings here; strings will be evaluated with eval().
- Split the key into parts (at whitespace).
- Remove any parts contained in the list of stopwords (see the example above).
- Replace any parts contained in the list of exceptions by their corresponding value. If the value is undef, the entire part will be removed. (In the example above, "Jpn" would be replaced by "japan".) This lookup is done case-insensitively.
- Remove any non-word characters at the end of each part or followed by a dash.
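The steps above can be sketched in plain Perl as follows. This is an illustration under stated assumptions, not the module's actual code: `normalize_sketch` is a hypothetical function, it accepts only code references for preprocessing (no strings), and the regular expression in the last step is one possible reading of "at the end of each part or followed by a dash".

```perl
use strict;
use warnings;

# Hypothetical illustration of the normalization steps; not the module's code.
sub normalize_sketch {
    my ($key, %opt) = @_;
    my %stop = map { $_ => 1 } @{ $opt{stopwords} || [] };
    my %exc  = map { lc($_) => $opt{exceptions}{$_} }
               keys %{ $opt{exceptions} || {} };

    # Step 1: run each preprocessing code reference with the key in $_.
    for my $code (@{ $opt{preprocess} || [] }) {
        local $_ = $key;
        $code->();
        $key = $_;
    }

    # Step 2: split the key into parts at whitespace.
    my @parts = split ' ', $key;

    # Step 3: remove stopwords.
    @parts = grep { !$stop{$_} } @parts;

    # Step 4: apply exceptions case-insensitively; an undef value
    # removes the part entirely.
    @parts = map {
        my $lc = lc;
        exists $exc{$lc} ? (defined $exc{$lc} ? $exc{$lc} : ()) : $_
    } @parts;

    # Step 5 (assumed interpretation): strip non-word characters that
    # end a part or directly precede a dash.
    s/\W+(?=-)|\W+\z//g for @parts;

    return join ' ', grep { length } @parts;
}

print normalize_sketch(
    'Phys. Rev. B: Condens. Matter Mater. Phys.',
    preprocess => [ sub { s/\s+[[:upper:]]:.*// } ],
), "\n";
```

With the preprocessing rule from the synopsis, everything from " B:" onward is cut off before the key is split, so only the leading words take part in the comparison.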
ADDITIONAL METHODS
debug
turn debug mode on (when given a true value as argument) or off (when given a false value). Returns the (possibly new) value.
In debug mode, the "find" method will print debug messages to STDERR.
delete_abbrev
my @deleted = tied(%hash)->delete_abbrev('foo','bar');
Will delete all elements on the basis of all unambiguous abbreviations given as arguments and return a (possibly empty) list of all deleted values.
exceptions
get or set the exceptions table for the hash. Expects hash references or undef, which clears the table. Returns a reference to the new exception table.
preprocess
set up the preprocessing code chain for the hash. Any code references or strings will be added to the chain; an undef will clear the chain.
stopwords
get or set the stopwords for the hash. Any arguments given will be added to the list of stopwords. An undef as argument will clear the list of stopwords. The method returns the new list of stopwords (in an unsorted manner).
INTERNAL METHODS
The following methods should usually not be called "from the outside"; the main intention of documenting them is that the author still wants to understand his own module in case changes become necessary later. :o)
exact
expects a key as first and a position as second argument. Returns the position if the given key equals (case-insensitively) the real key stored at that position or undef if not.
find
This is the central method for lookups, used by exists() and FETCH.
It expects a key as its only argument.
Upon success, the method returns an array index at which the corresponding value can be found, or undef otherwise.
normalize
Given a key as its only argument, this method will return the normalized key in scalar context and a three-element list in list context, consisting of
- 0. the "prefix",
- 1. the "search pattern" and
- 2. the "normalized key".
pos
expects a (usually normalized) key as its only argument and returns the position at which this key is stored (if it exists) or at which it should be inserted to keep the keys sorted (if it does not exist yet).
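A position of this kind is typically found with a binary search. The following standalone sketch assumes that the keys are kept in a sorted array compared with `lt`; `insert_pos` is a hypothetical helper and not the module's own method.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-bound binary search over a sorted array.
# Returns the index of $key if it is present, otherwise the index at
# which $key would have to be inserted to keep the array sorted.
sub insert_pos {
    my ($keys, $key) = @_;
    my ($lo, $hi) = (0, scalar @$keys);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($keys->[$mid] lt $key) {
            $lo = $mid + 1;    # $key sorts after the middle element
        }
        else {
            $hi = $mid;        # $key sorts at or before the middle element
        }
    }
    return $lo;
}

my @sorted = qw(acta nature science);
print insert_pos(\@sorted, 'nature'), "\n";    # existing key
print insert_pos(\@sorted, 'phys'),   "\n";    # key not stored yet
```

A caller can tell the two cases apart by checking whether the element at the returned index equals the key.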
startover
expects no arguments and simply resets the iterator for the hash, so that the next call to each() will return the first key/value pair again.
BUGS
None known so far.
AUTHOR
Martin H. Sluka
http://martin.sluka.de/
THANKS TO
Dr. Hermann Schier from the Max Planck Institute for Solid State Research in Stuttgart, Germany, for initiating and underwriting the development of this module and for contributing a lot of ideas.
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
SEE ALSO