PEPPLER.ORG
Michael Peppler
Sybase Consulting

sybperl-l Archive


From: "Jayson Pifer" <JPifer at jefco dot com>
Subject: Re: Switch SEPARATOR between different BCP runs
Date: Oct 31 2003 8:52PM

The proper way of doing this is supposed to be with the quote-regex
operator, qr//.  I tested this method and it worked, using this quick
hack in the BLK module:
------------------------------------------------------------------
sub _readln {
    my $sep = shift;
    my $ln;
    my @d;
    if(defined($ln = <IN>)) {  ## <IN> is assumed; the filehandle was lost in the archive
        chomp $ln;
        my $compiled = qr/$sep/;  ## Added line
        @d = split(/$compiled/, $ln, -1); ## Used $compiled instead of $sep
    }
    }
    @d;
}
------------------------------------------------------------------
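
To make the failure and the fix concrete, here is a minimal self-contained sketch.  The names split_with_o and split_with_qr are hypothetical stand-ins for _readln (they take the line as an argument instead of reading from a filehandle), so treat this as an illustration rather than the module's actual code:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical stand-in for the original _readln(): with /o, perl
# interpolates $sep only the first time this split is executed.
sub split_with_o {
    my ($sep, $ln) = @_;
    return split(/$sep/o, $ln, -1);
}

# Same split, but the pattern is rebuilt from $sep on every call.
sub split_with_qr {
    my ($sep, $ln) = @_;
    my $compiled = qr/$sep/;
    return split($compiled, $ln, -1);
}

my @first  = split_with_o('\|', 'a|b|c');      # first run: separator is '|'
my @second = split_with_o('\t', "a\tb\tc");    # second run: separator is TAB
my @fixed  = split_with_qr('\t', "a\tb\tc");   # second run, qr// version

print scalar(@first),  "\n";   # 3
print scalar(@second), "\n";   # 1, because the pattern is still '|'
print scalar(@fixed),  "\n";   # 3
```

The second call returns the whole line as a single field, which is what you would see when the separator is changed between two BCP runs.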


As to performance, I think the jury is still out.  My speed gains are
minimal, but I suspect if the regular expression gets more complex the
performance will make a difference.  Here are the results on my machine:
------------------------------------------------------------------
Benchmark: timing 5 iterations of Compiled with //o, Compiled with qr//, Constant, Variable...
Compiled with //o: 43 wallclock secs (27.20 usr +  0.00 sys = 27.20 CPU)
Compiled with qr//: 36 wallclock secs (29.13 usr +  0.00 sys = 29.13 CPU)
  Constant: 34 wallclock secs (26.01 usr +  0.00 sys = 26.01 CPU)
  Variable: 36 wallclock secs (30.04 usr +  0.00 sys = 30.04 CPU)
------------------------------------------------------------------


And here is the test script to critique....
------------------------------------------------------------------
#!/usr/bin/perl -w

use strict;
use Benchmark;

@ARGV = qw(/usr/dict/words);
my @words = <>;

push @words, @words;
push @words, @words;
push @words, @words;
push @words, @words;

my $var = '[a|B]+?c{1}';
my $var2 = qr/[a|B]c{1}\s+?/;  ## Semi-complex regex...could get far worse

timethese( 5, {
    'Constant'           => sub { for(@words) { split(/[f][o][o]/); } },
    'Variable'           => sub { for(@words) { split(/$var/);      } },
    'Compiled with //o'  => sub { for(@words) { split(/$var/o);     } },
    'Compiled with qr//' => sub { for(@words) { split(/$var2/);     } },
});

exit;
------------------------------------------------------------------
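
One thing worth noting about the script above: the four cases do not split on the same pattern, so the timings are not strictly comparable.  Below is a rough sketch of a variant that holds the pattern fixed across all four cases.  The in-memory sample data, the tab separator and the iteration count are my assumptions, not part of the original test:

```perl
#!/usr/bin/perl -w
use strict;
use Benchmark qw(timethese);

# Small in-memory sample (an assumption; the original read /usr/dict/words).
my @lines = map { join("\t", "f$_", "g$_", "h$_") } 1 .. 2000;

my $sep      = '\t';        # the same separator in every case
my $compiled = qr/$sep/;    # compiled once, outside the timed code

my %case = (
    'Constant' => sub {
        my $n = 0;
        for my $ln (@lines) { my @d = split(/\t/, $ln, -1); $n += @d; }
        return $n;
    },
    'Variable' => sub {
        my $n = 0;
        for my $ln (@lines) { my @d = split(/$sep/, $ln, -1); $n += @d; }
        return $n;
    },
    'Compiled with //o' => sub {
        my $n = 0;
        for my $ln (@lines) { my @d = split(/$sep/o, $ln, -1); $n += @d; }
        return $n;
    },
    'Compiled with qr//' => sub {
        my $n = 0;
        for my $ln (@lines) { my @d = split($compiled, $ln, -1); $n += @d; }
        return $n;
    },
);

# Run each case once and record the field count, so we know all four
# produce the same result before timing them.
my %fields = map { ($_, $case{$_}->()) } keys %case;

timethese(50, \%case);
```

The numbers will of course vary by machine and perl version, so this only makes the comparison fairer; it does not settle the question.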

--Jayson






"Scott Zetlan" 
Sent by: owner-sybperl-l@peppler.org
10/31/03 08:32 AM


        To:
        cc:     "Sybperl-L Mailing List" 
        Subject:        Re: Switch SEPARATOR between different BCP runs


When I was porting BCP.pm to BLK.pm, I experimented with removing that
/o optimisation.  I discovered that on my architecture (puny Sun Ultra
5) it made no difference whatsoever.  Since it caused no harm, I left it
in, following the example in BCP.pm (which uses DB-Library calls instead
of CT-Library calls).

Anyone have any empirical evidence of an advantage to leaving the /o in
place?  If not, I suggest it be removed from the module entirely.

Scott

Michael Peppler wrote on 10/30/2003, 7:52 PM:

 > On Thu, 2003-10-30 at 13:29, Michael Peppler wrote:
 > > On Thu, 2003-10-30 at 11:33, Lin, Arthur wrote:
 > >
 > > > For the second BCP run I set
 > > >
 > > >         $GP_BLK->config(
 > > >                         FIELDS => 3,
 > > >                         BATCH_SIZE => 6000,
 > > >                         SEPARATOR => '\t');
 > > >
 > > >         die "\tBCP in $ListTable failed\n" unless ( $GP_BLK->run == $RowCount );
 > > >         unlink $ListFile, "$ListFile.err";
 > > >
 > > > But it fails because it does not take '\t' as a separator.
 > > >
 > > > Am I doing something wrong here to reset the separator ?
 > >
 > > No - I think that you've hit a bug in the BLK module. I have to admit
 > > that I don't use it myself, and I wrote the original code many years
 > > ago...
 >
 > Right - here's the problem:
 >
 > The _readln() and _readln_meta() subroutines in BLK.pm, which read a
 > line of data from the bcp file and split it based on the separator,
 > use a regular expression with the /o switch. This means - compile this
 > regular expression once, which is an optimization that works really
 > well, as long as a single program only uses one type of separator. But
 > it means that when the separator changes perl doesn't realize this, and
 > of course things break.
 >
 > For now you can fix the problem by changing the
 >     @d = split(/$sep/o, $ln, -1);
 > line to
 >     @d = split(/$sep/, $ln, -1);
 > which will force the regular expression to be re-evaluated for each
 > line. I haven't benchmarked this to see what this costs in terms of
 > performance hit.
 >
 > BTW - this is the same problem as bug id 410 in the sybperl bug database
 > (http://www.peppler.org/cgi-bin/bug.cgi?__state=2&id=410)
 >
 > Michael
 > --
 > Michael Peppler                              Data Migrations, Inc.
 > mpeppler@peppler.org                 http://www.mbay.net/~mpeppler
 > Sybase T-SQL/OpenClient/OpenServer/C/Perl developer available for short or
 > long term contract positions - http://www.mbay.net/~mpeppler/resume.html
 >







