I thought I understood how to use Unicode support in Perl, but
evidently not. In the script below, I'm stumped as to:
1) why the regex won't match ''.
2) why the substitution is carried out, but the result isn't in UTF-8,
nor is it UTF-8 re-encoded as UTF-8 (uncomment "#require Encode;" and
"#Encode::decode_utf8($_);" to test this).
TIA
Robin
#!/usr/bin/perl -w
use strict;
use diagnostics -verbose;
#require Encode;
binmode (DATA,":utf8");
binmode (STDOUT,":utf8");
for (<DATA>){
    if (/(<!--\@hidden-->)/gs){
        print "match: ",$1,"\n";
        #Encode::decode_utf8($_);
        s/$1/=AC=AE/gs;
    }elsif(/()/gs){
        print "match: ",$1,"\n";
        s/$1/12/gs;
    }
    print;
}
__DATA__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
<TITLE> A Web Page</TITLE>
</HEAD>
<BODY>
<BLOCKQUOTE>
<H3>=AC=E8=9E=AEnews<FONT COLOR=#FF3300>1</FONT></H3>
... and this is a web page.
<P>
<IMG ALT="A Filler" WIDTH="450" HEIGHT="296">
<P>
hidden marker here -----><FONT
COLOR=#FF3300><!--@hidden--></FONT><------<BR>
</BLOCKQUOTE>
</BODY>
</HTML>
Andrew Mace - 15 Jun 2005 19:54 GMT
Try "use utf8;" - it tells Perl that the source of your script itself
contains UTF-8 characters.
More info: http://perlpod.com/5.9.1/lib/utf8.html
Andrew
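A minimal sketch of what the pragma changes, not taken from the original script. It assumes U+00E9 as a stand-in character (the characters in the posted script did not survive) and writes it as escapes so the example file stays pure ASCII. The variable names are hypothetical.

```perl
use strict;
use warnings;

# The two UTF-8 bytes that encode U+00E9, spelled as escapes.
# Without "use utf8", a literal typed as that character in a UTF-8
# source file looks like this to Perl: two separate characters.
my $as_bytes = "\xC3\xA9";

# utf8::decode() converts the bytes in place into one character,
# which is how the same literal would be seen under "use utf8".
my $as_chars = $as_bytes;
utf8::decode($as_chars);

printf "without use utf8: %d characters\n", length $as_bytes;
printf "with use utf8:    %d character\n",  length $as_chars;
```

The point of the sketch is only that the same source bytes yield strings of different lengths, so a pattern built from one will not line up with data decoded as the other.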
Robin - 15 Jun 2005 20:26 GMT
Thanks, Andrew and Sherm.
I went back to look at perluniintro because I was sure I remembered
reading that the "use utf8" pragma was no longer needed. Right under
where it says this, it continues: "Only one case remains where an
explicit "use utf8" is needed: if your Perl script itself is encoded in
UTF-8...."
*sigh*
Robin
Sherm Pendley - 15 Jun 2005 20:05 GMT
> I thought I understood how to use Unicode support in Perl, but
> evidently not. In the script below, I'm stumped as to:
> 1) why the regex won't match ''.
> 2) why the substitution is carried out, but the result isn't in UTF-8,
> nor is it UTF-8 re-encoded as UTF-8 (uncomment "#require Encode;" and
> "#Encode::decode_utf8($_);" to test this).
The binmode() calls you've included tell Perl that the data coming
from and going to those filehandles is UTF-8 encoded.
But you have UTF-8-encoded text in your code, too. To tell Perl about
that, you need the "use utf8;" pragma.
sherm--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
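The distinction drawn in this thread between decoding the data and decoding the source can be sketched as follows. This is an illustration under stated assumptions, not the original script: the sample input, the character U+00E9, and all variable names are hypothetical, and an in-memory filehandle stands in for DATA.

```perl
use strict;
use warnings;

# Hypothetical sample input: "caf" plus the UTF-8 bytes for U+00E9.
my $raw = "caf\xC3\xA9\n";

# Read it through the :utf8 layer, as binmode(DATA, ":utf8") arranges.
open my $in, '<:utf8', \$raw or die "cannot open in-memory file: $!";
my $line = <$in>;
close $in;

# Pattern as Perl sees it without "use utf8": two separate characters.
my $undecoded = "\xC3\xA9";
# Pattern as Perl sees it with "use utf8": one character.
my $decoded = "\x{E9}";

my $match_undecoded = $line =~ /$undecoded/ ? 1 : 0;
my $match_decoded   = $line =~ /$decoded/   ? 1 : 0;

print "undecoded pattern matches: $match_undecoded\n";
print "decoded pattern matches:   $match_decoded\n";
```

The input line is decoded into characters by the layer, so only the decoded pattern matches it. That is consistent with the advice above: binmode() covers the filehandles, while "use utf8;" covers literals in the script.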