Encode::Arabic::Buckwalter - Perl extension for Tim Buckwalter's transliteration of Arabic
$Revision: 1.7 $ $Date: 2003/09/01 11:55:41 $
use Encode::Arabic::Buckwalter; # imports just like 'use Encode' would, plus more
while ($line = <>) {          # Tim Buckwalter's mapping into the Arabic script
    print encode 'utf8', decode 'buckwalter', $line;    # 'buckwalter' alias 'Buckwalter'
}
# shell filter of data, e.g. in *n*x systems instead of viewing the Arabic script proper
% perl -MEncode::Arabic::Buckwalter -pe '$_ = encode "buckwalter", decode "utf8", $_'
Tim Buckwalter's notation is a one-to-one transliteration of the Arabic script for Modern Standard Arabic, using lower ASCII characters to encode the graphemes of the original script. This system has been very popular in Natural Language Processing; however, its applicability is limited by the numerous non-alphabetic codes it involves.
The module takes care of the Encode::Encoding programming interface, while the effective code is Tim Buckwalter's trick:
$encode =~ tr[\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}    # !! no break in true perl !!
              \x{0670}\x{0671}\x{067E}\x{0686}\x{06A4}\x{06AF}]
             [,;?'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJVG];

$decode =~ tr[,;?'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJVG]
             [\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}    # !! no break in true perl !!
              \x{0670}\x{0671}\x{067E}\x{0686}\x{06A4}\x{06AF}];
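The two tr statements above are shown with a line break and a comment for readability, which real Perl would treat as literal characters in the list. As a minimal runnable sketch of the same mapping, assuming nothing beyond core Perl, the lists can be written on single lines. The sub names to_buckwalter and from_buckwalter are hypothetical and are not part of the module's interface.

```perl
use strict;
use warnings;

# Sketch only: these sub names are hypothetical, not the module's API.
# Each tr/// list is kept on one line, because a line break or comment
# inside a list would be taken as literal characters and shift the mapping.

sub to_buckwalter {
    my ($text) = @_;
    $text =~ tr[\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}\x{0670}\x{0671}\x{067E}\x{0686}\x{06A4}\x{06AF}]
               [,;?'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJVG];
    return $text;
}

sub from_buckwalter {
    my ($text) = @_;
    $text =~ tr[,;?'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJVG]
               [\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}\x{0670}\x{0671}\x{067E}\x{0686}\x{06A4}\x{06AF}];
    return $text;
}

# The word "kitab": KAF, TEH, ALEF, BEH.
my $kitab = "\x{0643}\x{062A}\x{0627}\x{0628}";
my $ascii = to_buckwalter($kitab);

binmode STDOUT, ':encoding(UTF-8)';
print "$ascii\n";                        # prints "ktAb"
print from_buckwalter($ascii), "\n";     # prints the Arabic script again
```

Because both lists have the same length and no character repeats, applying one sub after the other returns the original string.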
If the first element in the list passed to use is :xml, an alternative mapping is introduced that suits XML etiquette. This option is there only to replace the reserved characters >&< with OWI while still keeping the notation one-to-one. There is no XML parsing involved, and any markup would get distorted if subjected to decode!
$using_xml = eval q { use Encode::Arabic::Buckwalter ':xml'; decode 'buckwalter', 'OWI' };
$classical = eval q { use Encode::Arabic::Buckwalter; decode 'buckwalter', '>&<' };
# $classical eq $using_xml and $classical eq "\x{0623}\x{0624}\x{0625}"
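The effect of the :xml option can be sketched without the module as the same tr mapping with O, W and I in place of >, & and <. This is an illustration under that assumption only; the sub names below are hypothetical, and how the module switches mappings internally is not shown here.

```perl
use strict;
use warnings;

# Sketch only: hypothetical sub names, not the module's API.
# Same mapping as the classical one, except that the XML-reserved
# characters > & < are replaced by O W I.

sub from_buckwalter_xml {
    my ($text) = @_;
    $text =~ tr[,;?'|OWI}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJVG]
               [\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}\x{0670}\x{0671}\x{067E}\x{0686}\x{06A4}\x{06AF}];
    return $text;
}

sub to_buckwalter_xml {
    my ($text) = @_;
    $text =~ tr[\x{060C}\x{061B}\x{061F}\x{0621}-\x{063A}\x{0640}-\x{0652}\x{0670}\x{0671}\x{067E}\x{0686}\x{06A4}\x{06AF}]
               [,;?'|OWI}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{PJVG];
    return $text;
}

# 'OWI' decodes to the three hamza carriers that '>&<' denotes classically.
my $decoded = from_buckwalter_xml('OWI');
print join(' ', map { sprintf 'U+%04X', ord } split //, $decoded), "\n";
# prints "U+0623 U+0624 U+0625"

# Encoding no longer produces characters that XML reserves.
print to_buckwalter_xml($decoded), "\n";    # prints "OWI"
```

The letters O, W and I do not occur elsewhere in the classical code list, which is why the substitution keeps the notation one-to-one.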
The module exports as if use Encode also appeared in the package. Any other import options are simply delegated to Encode, and the imports are performed properly.
Encode::Arabic, Encode, Encode::Encoding
Tim Buckwalter's Qamus http://www.qamus.org
Buckwalter Arabic Morphological Analyzer http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp
Xerox Arabic Home Page http://www.arabic-morphology.com
Arabeyes Duali Project http://www.arabeyes.org/project.php
Otakar Smrz, http://ckl.mff.cuni.cz/smrz/
eval { '<' . 'smrz' . "\x40" . ( join '.', qw 'ckl mff cuni cz' ) . '>' }
Perl is also designed to make the easy jobs not that easy ;)
Copyright 2003 by Otakar Smrz
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.