On 22 Jun 2006 at 20:29, Tommy Nordgren wrote:
> On 22 Jun 2006 at 20:15, Sherm Pendley wrote:
>
[quoted text clipped - 18 lines]
> I've already tried that. That was what I was doing when I got
> garbage.
> On 22 Jun 2006 at 20:29, Tommy Nordgren wrote:
>
[quoted text clipped - 18 lines]
> I found the problem: it is necessary to
> 1) use the "use utf8" pragma;
That's only needed if your actual Perl code is UTF-8 encoded, like my
example was. If your UTF-8 data is coming from an external source,
"use utf8" has no effect.
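To illustrate the point about external data, here is a minimal sketch. It assumes the bytes arrive undecoded (as they would from a file or socket opened without an encoding layer); char_count is a hypothetical helper name, not part of any module. The data is turned into characters by Encode's decode, and "use utf8" plays no part in that.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Count characters in a string of raw UTF-8 bytes, e.g. data read from a file.
# char_count is a hypothetical helper used only for this illustration.
sub char_count {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> Perl characters
    return length $chars;
}

my $bytes = "\xC3\xB6";                    # o-umlaut as two UTF-8 bytes
print length($bytes), "\n";                # 2: still undecoded bytes
print char_count($bytes), "\n";            # 1: one character after decoding
```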
sherm--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
>>>>How do I write proper utf 8 characters to a file? I write only
>>>>two characters, and they come out as four
>>>>garbage characters when I view the file in an editor.
The only reason for that can be that you have your editor set to open
files as MacRoman or some non-utf-8 charset. Provided your editor
prefs are set to open as utf-8 or you opt for utf-8 in the open file
dialog you will not get this problem.
> I found the problem: it is necessary to
> 1) use the "use utf8" pragma;
> 2) explicitly write a BOM byte sequence immediately after opening the file.
> Point 2 is where I erred. I expected the BOM to be added automatically
> when opening a file for writing with the utf-8 encoding.
You would need to give an example of what you are doing, but neither
of those things should be necessary, nor should it be necessary to
specify utf-8 when opening the filehandle, as Sherm suggested.
The following script will write "ö", utf8-encoded to "trash.txt" on
the desktop:
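#!/usr/bin/perl
use strict;
use warnings;
# Save this script as UTF-8. Without "use utf8" the literal below stays
# as two raw bytes (C3 B6), which are written to the file unchanged.
my $text = "ö";
my $f = "$ENV{HOME}/Desktop/trash.txt";
open my $fh, '>', $f or die "Cannot open $f: $!";
print $fh $text;
close $fh or die "Cannot close $f: $!";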
If you open the file as utf-8 you will see "ö" and if you open it as
MacRoman you will see "√∂". You could also open it as Traditional
Chinese or Simplified Chinese or many other things and see other
things. UTF-8 byte order is always the same, so there is no need for
a BOM, though some editors might use it as a hint.
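The effect of opening the same bytes under different charsets can be reproduced in Perl. This is only a sketch: code_points is a hypothetical helper, and it assumes the Encode module's "MacRoman" encoding, which ships with standard Perl. It prints code points rather than the characters themselves so the result does not depend on the terminal's own encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode the same bytes under a given charset and
# return the resulting code points as "U+XXXX" strings.
sub code_points {
    my ($charset, $bytes) = @_;
    my $chars = decode($charset, $bytes);
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $chars;
}

my $bytes = encode('UTF-8', "\x{F6}");         # o-umlaut, bytes C3 B6

print code_points('UTF-8',    $bytes), "\n";   # U+00F6 (o-umlaut)
print code_points('MacRoman', $bytes), "\n";   # U+221A U+2202 (square root, partial differential)
```

The bytes on disk are identical in both cases; only the interpretation differs.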
JD
Joel Rees - 23 Jun 2006 15:03 GMT
> If you open the file as utf-8 you will see "ö" and if you open it
> as MacRoman you will see "√∂". You could also open it as
> Traditional Chinese or Simplified Chinese or many other things and
> see other things. UTF-8 byte order is always the same, so there is
> no need for a BOM, though some editors might use it as a hint.
Given that his editor seems to have interpreted the file as utf-8
with the BOM in place and as something else without the BOM, we might
guess that his editor recognizes the BOM.
We could also, of course, guess that his login account is set to
default to something other than utf-8, which is also in keeping with
my experience with Mac OS X when the user has not deliberately messed
around with things.
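For an editor that does rely on the BOM, the script has to write it itself: as reported earlier in the thread, opening the file with a UTF-8 encoding layer does not add one. A minimal sketch follows, assuming a temporary file and two purely illustrative helper names (write_utf8, hex_bytes).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text as UTF-8, optionally preceded by a BOM (U+FEFF).
sub write_utf8 {
    my ($path, $text, $with_bom) = @_;
    open my $fh, '>:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    print {$fh} "\x{FEFF}" if $with_bom;   # the layer does not add this itself
    print {$fh} $text;
    close $fh or die "Cannot close $path: $!";
    return;
}

# Return the file's raw bytes as an uppercase hex string.
sub hex_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                              # slurp the whole file
    my $data = <$fh>;
    close $fh;
    return uc unpack('H*', $data);
}

my (undef, $path) = tempfile(UNLINK => 1);
write_utf8($path, "\x{F6}", 1);            # o-umlaut with BOM
print hex_bytes($path), "\n";              # EFBBBFC3B6
```

Whether the BOM is wanted depends on the editor; UTF-8 itself does not need it.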