> use utf8;
> ...
[quoted text clipped - 9 lines]
> doesn't put the matched text into $matched when there are wide
> characters in $blah
Nor should it, even if the text is plain old ASCII - there's a bug in
the above code that has nothing to do with string encoding.
Take a ten-character ASCII string: 'abcdefghij'. Match it against
'fgh', and $position will be 8, as expected.
The length of $blah is 10 and $position is 8, so the above call to
substr amounts to:
$matched = substr('abcdefghij', -2, 10);
That returns 'ij': everything in the string *after* what was matched
by the regex.
As the docs for substr() state, if the offset (second argument) is
negative, the offset is taken from the end of the string, and if the
combination of offset and length is partially outside of the string,
the portion inside the string is returned. With an offset of -2, it's
obviously impossible to take ten characters beginning two from the
end, so only the remaining two are returned.
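
To see that behaviour in isolation, here is a minimal sketch using the
same ten-character string. The variable names $tail and $middle are
only for illustration and are not from the original code.

```perl
use strict;
use warnings;

my $blah = 'abcdefghij';

# Negative offset: start two characters from the end of the string.
# The requested length of 10 runs past the end, so only the part
# that lies inside the string ('ij') is returned.
my $tail = substr($blah, -2, 10);

# For comparison: a non-negative offset of 5 with a length of 3
# picks out 'fgh'.
my $middle = substr($blah, 5, 3);

print "$tail\n";      # ij
print "$middle\n";    # fgh
```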
But really, why bother with substr() at all? Just parenthesize the
regex, and store the results in a list:
my @matched = ($blah =~ m/(<regex>)/g);
That will return a list of all the strings matching the expression.
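
As a rough illustration, with a made-up sample string and the pattern
\wat standing in for <regex> (both are assumptions, not taken from the
original post):

```perl
use strict;
use warnings;

my $blah = 'cat hat bat';    # hypothetical sample data

# In list context, m//g with one capturing group returns every
# capture from every match, in order.
my @matched = ($blah =~ m/(\wat)/g);

print join(', ', @matched), "\n";    # cat, hat, bat
```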
If, after such a match, you need to know the positions of the matched
strings in $blah, have a look at @- and @+.
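
One thing to watch: @- and @+ only describe the most recent successful
match, so after a list-context /g match they reflect the last match
alone. A sketch of one way around that is to loop over the match in
scalar context and read the offsets each time through. The pattern
and the @spans array here are just for illustration.

```perl
use strict;
use warnings;

my $blah = 'abcdefghij';
my @spans;

# In scalar context, m//g finds one match per iteration.
while ($blah =~ m/(fgh|j)/g) {
    # $-[0] is the offset where the whole match starts;
    # $+[0] is the offset just past where it ends.
    push @spans, [ $1, $-[0], $+[0] ];
}

printf "%s: %d-%d\n", @$_ for @spans;
# fgh: 5-8
# j: 9-10
```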
sherm--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org