Topic: Nice Test Data File 33MB Approximately 3,173,959 records


JRS

Re: Nice Test Data File 33MB Approximately 3,173,959 records
« Reply #15 on: June 01, 2012, 08:51:28 PM »
Splitting it into 22 million strings is harsh indeed!

I'm happy SB didn't blow up, and I would really like to know how this runs in memory. It's a good test of SB's string and array handling. Windows crashed with 2.9 GB free. I guess a Linux box with 4 to 8 GB of system memory and a decent processor is what's needed to test this.

I think I'll create a large Ubuntu 64 instance on Amazon EC2 to test this. I'll post my results soon.
« Last Edit: June 01, 2012, 10:59:57 PM by JRS »

JRS

Re: Nice Test Data File 33MB Approximately 3,173,959 records
« Reply #16 on: June 02, 2012, 12:26:59 AM »
16 seconds to convert a 151 MB comma-delimited text file to a string array of 19,043,754 elements.

Code:
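' t.bas supplies the t::LoadString helper
IMPORT t.bas

' Read the whole file into one string
s = t::LoadString("worldcitiespop.txt")

' Split on commas only; newlines are not delimiters
SPLITA s BY "," TO a

' Print the highest index of the resulting array
PRINT UBOUND(a),"\n"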

ubuntu@ip-10-176-145-104:~$ time scriba wc.sb
19043754

real   0m16.388s
user   0m10.593s
sys   0m5.796s
ubuntu@ip-10-176-145-104:~$


EC2 Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large

* One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
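
For readers without ScriptBasic, here is a rough Python analogue of the SPLITA run above. It is a sketch, not the author's code: `split_commas`, `expected_pieces` and `benchmark` are hypothetical names, and both the latin-1 encoding and the seven-fields-per-line layout of worldcitiespop.txt are assumptions. Because only commas are delimiters, a seven-field line contributes six commas, and k commas yield k + 1 pieces. Under those assumptions, 3,173,959 lines (the record count in the thread title) give 6 × 3,173,959 + 1 pieces, whose highest zero-based index is 19,043,754, which matches the UBOUND the script printed if the array is zero-based.

```python
import time

def split_commas(text: str) -> list[str]:
    # Mirrors SPLITA s BY ",": only commas delimit, so the newline between two
    # records stays inside the piece joining one line's last field to the next
    # line's first field.
    return text.split(",")

def expected_pieces(lines: int, fields_per_line: int) -> int:
    # Each line has (fields_per_line - 1) commas; k commas produce k + 1 pieces.
    return lines * (fields_per_line - 1) + 1

def benchmark(path: str) -> int:
    # Hypothetical helper: load the whole file, split it, report the timing.
    # latin-1 is an assumed encoding for the city file.
    start = time.perf_counter()
    with open(path, encoding="latin-1") as f:
        text = f.read()
    pieces = split_commas(text)
    elapsed = time.perf_counter() - start
    print(f"highest index {len(pieces) - 1}, {elapsed:.1f} s")
    return len(pieces)

if __name__ == "__main__":
    # Two seven-field records: 12 commas, so 13 pieces.
    sample = "a,b,c,d,e,f,g\nh,i,j,k,l,m,n\n"
    print(len(split_commas(sample)), expected_pieces(2, 7))  # 13 13
    # Highest zero-based index for 3,173,959 seven-field lines.
    print(expected_pieces(3_173_959, 7) - 1)  # 19043754
```

Calling `benchmark("worldcitiespop.txt")` on the real file would print the highest index and the elapsed time. Python keeps each piece as a separate string object, so its memory use will likely differ from ScriptBasic's.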

« Last Edit: June 02, 2012, 02:33:19 AM by JRS »

JRS

Re: Nice Test Data File 33MB Approximately 3,173,959 records
« Reply #17 on: June 02, 2012, 01:23:26 AM »
I ran the string concat/replace benchmark that was posted on the Basic Programming Forum, and there isn't much difference between running it on the EC2 instance and on my laptop.

EC2
ubuntu@ip-10-176-145-104:~/sb$ scriba kentbench.sb
exec.tm.sec   str.length
4 sec      256 KB
18 sec      512 KB

Finished
ubuntu@ip-10-176-145-104:~/sb$

Laptop
jrs@laptop:~/sb/test$ scriba kentbench.sb
exec.tm.sec   str.length
5 sec      256 KB
34 sec      512 KB

Finished
jrs@laptop:~/sb/test$
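
The thread doesn't show kentbench.sb, so the following Python sketch is only a hypothetical workload of the same kind (string concatenation plus replace), not a port of that benchmark. `CHUNK`, `concat_replace` and `time_run` are illustrative names. The point is the scaling: when every step rescans the whole string, doubling the final length roughly quadruples the work, which is consistent with the times above more than doubling (4 to 18 sec and 5 to 34 sec) when going from 256 KB to 512 KB.

```python
import time

# Hypothetical 16-character chunk; kentbench.sb's actual strings are unknown.
CHUNK = "abcdefghefghefgh"

def concat_replace(target_len: int) -> str:
    # Append a chunk, then run replace() over the whole string twice.
    # The two replacements cancel out, so the content stays CHUNK repeated,
    # but each pass still scans the full string: total work grows roughly
    # with the square of the final length.
    s = ""
    while len(s) < target_len:
        s += CHUNK
        s = s.replace("efgh", "EFGH").replace("EFGH", "efgh")
    return s

def time_run(target_len: int) -> float:
    start = time.perf_counter()
    concat_replace(target_len)
    return time.perf_counter() - start

if __name__ == "__main__":
    # Small sizes so the demo finishes quickly; compare how the time
    # changes when the target length doubles.
    for kb in (16, 32):
        print(f"{time_run(kb * 1024):.3f} sec   {kb} KB")
```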

If I had that much memory (4-8 GB) in my laptop, I would expect to see the city file converted to an array in under 20 seconds.


kryton9

Re: Nice Test Data File 33MB Approximately 3,173,959 records
« Reply #18 on: June 02, 2012, 11:53:00 AM »
It is interesting to play with huge data. Just the fact that 150+ MB can be manipulated so quickly is amazing. Just doing a Windows copy of it from my SATA hard disk to my USB 3.0 thumb drive takes quite a few seconds.

JRS

Re: Nice Test Data File 33MB Approximately 3,173,959 records
« Reply #19 on: June 02, 2012, 02:02:59 PM »
The task actually took about 11 seconds to build the array and print its UBOUND, but it took roughly another 5 seconds to release the used memory before returning to the OS command prompt.
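
The figure reported by `time` is wall-clock time for the whole process, teardown included. A minimal Python sketch of timing the two phases separately: `timed_build_and_release` is a hypothetical helper, and CPython's reference-counting cleanup is not ScriptBasic's memory manager, so only the measuring approach carries over, not the numbers.

```python
import time

def timed_build_and_release(text: str) -> tuple[int, float, float]:
    # Phase 1: build the array and compute its highest index (like UBOUND).
    t0 = time.perf_counter()
    pieces = text.split(",")
    upper = len(pieces) - 1
    t1 = time.perf_counter()
    # Phase 2: drop the only reference so CPython reclaims the list and any
    # piece strings that nothing else references.
    del pieces
    t2 = time.perf_counter()
    return upper, t1 - t0, t2 - t1

if __name__ == "__main__":
    # Synthetic input with 1,000,000 comma-separated fields.
    text = ",".join(str(i) for i in range(1_000_000))
    upper, build_s, release_s = timed_build_and_release(text)
    print(upper, f"build {build_s:.2f} s, release {release_s:.2f} s")
```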