
sybperl-l Archive


From: Michael Peppler <mpeppler at peppler dot org>
Subject: RE: out of memory -- CTlib
Date: Feb 8 2001 10:18PM

Cox, Mark writes:
 > Thanks for the help and suggestions.
 > 
 > --- The scalar(localtime) change made a surprising difference in processing
 > time.

fork/exec is expensive, as you noticed.
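
(For the archives, the change amounts to something like

    print "$y Records processed at ", scalar(localtime), "\n";

in that loop - note the explicit "\n", since localtime, unlike `date`,
doesn't add a newline.)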

 > 
 > --- I ended up increasing the swap size on the Unix box and this solved the
 > memory problem for the moment.
 > 
 > ---  I am looking at writing it out to a file instead of keeping it all in
 > memory but I need to see if this is any better than just reading it directly
 > from the database for each record. The main advantage to keeping all of the
 > info in memory is the enormous decrease in processing time.  Typically 2hrs
 > becomes 10 min.  I will let you know how it goes.

Consider using a dbm file (or something similar) to keep the memory
footprint under control. This still gives you hash-style lookups
(although the "value" part of each entry has to be a plain string),
but the data itself lives on disk, so memory use stays essentially
constant.

You could do something like this (off the top of my head - no
guarantee that it's correct!):

use Fcntl;        # for the O_CREAT / O_RDWR constants
use NDBM_File;

my %hash;

# Tie the hash to an on-disk dbm file - entries live on disk, so memory
# use stays small no matter how many rows you store.
tie(%hash, 'NDBM_File', '/tmp/my_dbm_file', O_CREAT|O_RDWR, 0666);

....

while(my @data = $dbh->ct_fetch) {
    # dbm values must be plain strings, so flatten the row with join()
    $hash{$data[0]} = join('|', @data);
}

....

You could also use the Storable module (get it from CPAN) to
"serialize" the row in @data, so that you can store structured data in
the dbm file and get it back out without join()ing and splitting it
yourself.

Michael


 > -----Original Message-----
 > From: Michael Peppler [mailto:mpeppler@peppler.org]
 > Sent: Thursday, February 08, 2001 11:15 AM
 > To: SybPerl Discussion List
 > Subject: Re: out of memory -- CTlib
 > 
 > 
 > Cox, Mark writes:
 >  > 
 >  > Any suggestions or help would be welcome.
 >  > 
 >  > I am using ct_lib to select large look-up tables from the database
 >  > for feed processing.  I tend to assign all of the info in the
 >  > database into a hash keyed on a specific value in the database and
 >  > then read the file line by line using the key as a quick lookup.
 >  > What I am running into however is that if I try to read in more
 >  > than 100,000 records or so I get an 'Out of Memory!' error.  Is
 >  > there a more efficient way to read in a large number of records
 >  > into a hash table?  Any help or suggestions would be most welcome.
 > 
 > 100,000 records in a hash table is quite a lot. Have you checked with
 > ps or top to see how much memory you are using? Do you have
 > limit/ulimit set?
 > 
 > I don't see any obvious problems with your code.
 >  > 			if (!($y % 10000) && ($y !=0)) {
 >  > 				print "$y Records processed at " , `date`;
 >  > 			}
 > 
 > You can use scalar(localtime) instead of `date`, which avoids a
 > fork()/exec() on every print and should speed things up.
 > 
 > Michael
 > -- 
 > Michael Peppler - Data Migrations Inc. - mpeppler@peppler.org
 > http://www.mbay.net/~mpeppler - mpeppler@mbay.net
 > International Sybase User Group - http://www.isug.com
 > Sybase on Linux mailing list: ase-linux-list@isug.com
 > 

-- 
Michael Peppler - Data Migrations Inc. - mpeppler@peppler.org
http://www.mbay.net/~mpeppler - mpeppler@mbay.net
International Sybase User Group - http://www.isug.com
Sybase on Linux mailing list: ase-linux-list@isug.com