sybperl-l Archive
From: "Jayson Pifer" <JPifer at jefco dot com>
Subject: Re: Switch SEPARATOR between different BCP runs
Date: Oct 31 2003 8:52PM
The proper way of doing this is supposed to be with the regex quote
operator, qr//. I tested this method, and it worked with this quick hack
in the BLK module:
------------------------------------------------------------------
sub _readln {
my $sep = shift;
my $ln;
my @d;
if(defined($ln = <IN>)) { ## filehandle name was lost in the archive; <IN> is assumed
chomp $ln;
my $compiled = qr/$sep/; ## Added line
@d = split(/$compiled/, $ln, -1); ## Used $compiled instead of $sep
}
@d;
}
------------------------------------------------------------------
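As a rough sketch of how the qr// approach could avoid recompiling on every line while still allowing the separator to change between runs, the compiled pattern can be cached per separator string. The names split_line and %compiled_for are hypothetical and are not part of BLK.pm; the sub takes the line as an argument instead of reading it from a filehandle so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical cache: one compiled regex per separator string.
my %compiled_for;

# Illustrative stand-in for _readln(): same split logic, but the line is
# passed in rather than read from the bcp file.
sub split_line {
    my ($sep, $ln) = @_;
    chomp $ln;
    # Compile the separator the first time it is seen, then reuse it.
    my $re = $compiled_for{$sep} ||= qr/$sep/;
    return split($re, $ln, -1);
}

my @first  = split_line('\|', "a|b|c\n");    # first run: pipe separator
my @second = split_line('\t', "x\ty\tz\n");  # second run: tab separator

print join(',', @first), "\n";
print join(',', @second), "\n";
```

Whether the cache buys anything over a plain qr// inside the sub would need measuring; it is shown only to make the idea concrete.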
As to performance, I think the jury is still out. My speed gains are
minimal, but I suspect that with a more complex regular expression the
difference would become noticeable. Here are the results on my machine:
------------------------------------------------------------------
Benchmark: timing 5 iterations of Compiled with //o, Compiled with qr//,
Constant, Variable...
Compiled with //o: 43 wallclock secs (27.20 usr + 0.00 sys = 27.20 CPU)
Compiled with qr//: 36 wallclock secs (29.13 usr + 0.00 sys = 29.13 CPU)
Constant: 34 wallclock secs (26.01 usr + 0.00 sys = 26.01 CPU)
Variable: 36 wallclock secs (30.04 usr + 0.00 sys = 30.04 CPU)
------------------------------------------------------------------
And here is the test script to critique....
------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Benchmark;
@ARGV = qw(/usr/dict/words);
my @words = <>;
push @words, @words;
push @words, @words;
push @words, @words;
push @words, @words;
my $var = '[a|B]+?c{1}';
my $var2 = qr/[a|B]c{1}\s+?/; ## Semi-complex regex...could get far worse
timethese( 5, {
'Constant' => sub { for(@words) { split(/[f][o][o]/); } },
'Variable' => sub { for(@words) { split(/$var/); } },
'Compiled with //o' => sub { for(@words) { split(/$var/o); } },
'Compiled with qr//' => sub { for(@words) { split(/$var2/); } },
});
exit;
------------------------------------------------------------------
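One thing to watch in the script above is that $var and $var2 hold different patterns, so the four cases are not splitting on the same thing. A variant that uses one separator pattern for every case might look like the sketch below. The synthetic input, the iteration count, and the pipe separator are all assumptions chosen so that it runs without /usr/dict/words; no timings are claimed for it.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Synthetic input (an assumption), standing in for a bcp data file.
my @lines = ('alpha|beta|gamma|delta|epsilon') x 20_000;

my $pat = '\|';        # separator as a string, as SEPARATOR would supply it
my $re  = qr/$pat/;    # the same separator, precompiled

# Every case splits on the same pattern; only the way it is given differs.
my $results = timethese(50, {
    'Constant'           => sub { for my $l (@lines) { my @f = split(/\|/,    $l, -1) } },
    'Variable'           => sub { for my $l (@lines) { my @f = split(/$pat/,  $l, -1) } },
    'Compiled with //o'  => sub { for my $l (@lines) { my @f = split(/$pat/o, $l, -1) } },
    'Compiled with qr//' => sub { for my $l (@lines) { my @f = split($re,     $l, -1) } },
});
```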
--Jayson
"Scott Zetlan"
Sent by: owner-sybperl-l@peppler.org
10/31/03 08:32 AM
To:
cc: "Sybperl-L Mailing List"
Subject: Re: Switch SEPARATOR between different BCP runs
When I was porting BCP.pm to BLK.pm, I experimented with removing that
/o optimisation. I discovered that on my architecture (puny Sun Ultra
5) it made no difference whatsoever. Since it caused no harm, I left it
in, following the example in BCP.pm (which uses DB-Library calls instead
of CT-Library calls).
Anyone have any empirical evidence of an advantage to leaving the /o in
place? If not, I suggest it be removed from the module entirely.
Scott
Michael Peppler wrote on 10/30/2003, 7:52 PM:
> On Thu, 2003-10-30 at 13:29, Michael Peppler wrote:
> > On Thu, 2003-10-30 at 11:33, Lin, Arthur wrote:
> >
> > >
> > > For the second BCP run I set
> > >
> > > $GP_BLK->config(
> > >     FIELDS => 3,
> > >     BATCH_SIZE => 6000,
> > >     SEPARATOR => '\t');
> > >
> > > die "\tBCP in $ListTable failed\n" unless ( $GP_BLK->run ==
> > > $RowCount );
> > > unlink $ListFile, "$ListFile.err";
> > >
> > > But it fails because it does not take '\t' as a separator.
> > >
> > > Am I doing something wrong here to reset the separator ?
> >
> > No - I think that you've hit a bug in the BLK module. I have to admit
> > that I don't use it myself, and I wrote the original code many years
> > ago...
>
> Right - here's the problem:
>
> The _readln() and _readln_meta() subroutines in BLK.pm, which read a
> line of data from the bcp file and split it based on the separator,
> use a regular expression with the /o switch. This means "compile this
> regular expression once", which is an optimization that works really
> well as long as a single program only uses one type of separator. But
> it means that when the separator changes, perl doesn't realize this, and
> of course things break.
>
> For now you can fix the problem by changing the
> @d = split(/$sep/o, $ln, -1);
> line to
> @d = split(/$sep/, $ln, -1);
> which will force the regular expression to be re-evaluated for each
> line. I haven't benchmarked this to see what this costs in terms of
> performance hit.
>
> BTW - this is the same problem as bug id 410 in the sybperl bug database
> (http://www.peppler.org/cgi-bin/bug.cgi?__state=2&id=410)
>
> Michael
> --
> Michael Peppler Data Migrations, Inc.
> mpeppler@peppler.org http://www.mbay.net/~mpeppler
> Sybase T-SQL/OpenClient/OpenServer/C/Perl developer available for short or
> long term contract positions - http://www.mbay.net/~mpeppler/resume.html
>
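The /o behaviour described in the quoted message can be reproduced outside BLK.pm with a small standalone script. The sub names split_once and split_fresh are made up for this illustration; only the two split() forms come from the thread.

```perl
use strict;
use warnings;

# With /o, $sep is interpolated only the first time this split executes.
sub split_once {
    my ($sep, $ln) = @_;
    return split(/$sep/o, $ln, -1);
}

# Without /o, the current value of $sep is used on every call.
sub split_fresh {
    my ($sep, $ln) = @_;
    return split(/$sep/, $ln, -1);
}

my @a = split_once('\|', 'a|b|c');      # pipe separator: 3 fields
my @b = split_once('\t', "x\ty\tz");    # still splitting on the pipe: 1 field
my @c = split_fresh('\t', "x\ty\tz");   # tab separator honoured: 3 fields

printf "%d %d %d\n", scalar(@a), scalar(@b), scalar(@c);   # prints "3 1 3"
```

The second call is the failing case from the original report: the separator was reset to a tab, but the pattern compiled for the first run is still in effect.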