June 24, 2024, 12:27:51 AM

News:

Own IWBasic 2.x ? -----> Get your free upgrade to 3.x now.........


Tables and Cardinality

Started by Jim Scott, January 03, 2007, 08:18:49 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Jim Scott

Hi Folks and Happy New Year to all.  Thanks for your help.  I've made some pretty good progress in learning EBasic.

When I apply the following code;

Def pdb as Pointer
Def pTables as Pointer
Def hstmt as Int
Def NumOfCols as Int
Def TotalRows as Int

StorageCount = 0
TotalRows = 0
OpenConsole
pdb=dbConnect("Microsoft Text Driver (*.txt; *.csv)",GetStartPath,"")

If pdb <> Null
Print "Connection To Database Established"
pTables = dbListTables(pdb)
IF pTables <> NULL
PRINT "Tables:"
FOR temp = EACH pTables as STRING
PRINT #temp
PRINT "Cardinality:",dbCardinality(pdb,#temp)
NEXT
ListRemoveAll(pTables,TRUE)
Else
Print "No Tables to list"
ENDIF
hstmt = dbExecSQL(pdb,"SELECT * FROM WhosHereJim.txt")
If hstmt <> Null
NumOfCols = dbGetNumCols(hstmt)
Print "Number of Columns in Dataset = ", NumOfCols

Print "Cardinality of WhosHereJim.txt = ", dbCardinality(pdb,"WhosHereJim.txt")

While dbGetNext(hstmt) <> Null
TotalRows = TotalRows + 1
EndWhile
dbFreeSQL(hstmt)
EndIf
dbDisconnect(pdb)
Else
Print "Connection Not Established"
EndIf
Print "Number of Rows, Including header, = ", TotalRows + 1

Print "Press any key to exit"
Do:Until INKEY$ <> ""
CloseConsole
END


To the attached txt file.  I don't get what I expect.

Here's a copy of my console output when I run the code;

Connection To Database Established
Tables:
Number of Columns in Dataset = 12
Cardinality of WhosHereJim.CSV = 117
Number of Rows, Including header, = 152
Press any key to exit


I expected to see at least "WhosHereJim.txt" and any other text files in the directory.  Clearly, pTables is not coming up NULL since it prints the "Tables:" part.

In addition, I expected the cardinality to be equal to the number of rows counted.  Am I missing something simple here?  If you open the included file, you'll see 152 rows.
Jim Scott

Ionic Wind Support Team

The Microsoft text driver has a lot of limitations.  You can't list the tables for example since there really isn't a database for the Jet engine to use.  Cardinality through SQL won't work correctly either and you will have to count the rows manually in a loop. 

You have to understand that the only thing the text driver has to work with is what is in the file.  Unlike a true database like access, SQL Server, Oracle, etc, there aren't any fields that tells the driver how many records there are or links for relations between tables.  Things like keys and indexes won't work with CSV files either.

In reality the only valid things you can do with a text file are read the records and add records to the end of a table.  Deletes and inserts aren't supported.

Paul
Ionic Wind Support Team

Jim Scott

Well that explains it.  So I guess the best that I can do with a text file is to read it all into memory, sort it, modify, insert and delete rows then delete the old file and rewrite a new file then?   Thanks Paul.
Jim Scott

Ionic Wind Support Team

If I was forced to work with ASCII data in a text file I wouldn't use the database functions myself.  Simpler to design a UDT to hold the records, read each line of the file into memory parsing using INSTR to locate the commas, and making my modifications using the UDT records.

More control that way, and you can write it out anyway you wish.

Another option you probably haven't considered is reading the file with the text driver and creating a temporary access database with the dbCreateMDB function.  Then you can manipulate the records of the temporary access database to your hearts content, using the full power of the database commands.  Afterwords just scroll through each record of the access database writing them out to a file as CSV again.

Paul.
Ionic Wind Support Team

Bruce Peaslee

If you have Access on your computer, you can even define all of the records ahead of time and provide test data. That's what I do. But Access is not cheap.
Bruce Peaslee
"Born too loose."
iTired (There's a nap for that.)
Well, I headed for Las Vegas
Only made it out to Needles

Jim Scott

Quote from: Paul Turley on January 03, 2007, 07:34:50 PM
If I was forced to work with ASCII data in a text file I wouldn't use the database functions myself.  Simpler to design a UDT to hold the records, read each line of the file into memory parsing using INSTR to locate the commas, and making my modifications using the UDT records.

More control that way, and you can write it out anyway you wish.

Another option ... snipped for brevity

Paul.

Hey Paul,

As a learning exercise, I decided to do it two ways with the data I'm working with.  One program gets the data using the Microsoft text driver with a schema.ini file.  The other program gets the data by reading a line at a time and looking for commas and parsing as you suggested.  The following table list the time, in milliseconds, that it takes for each program to get the data from the disk with increasing number of records to retrieve.  The times are the average of 5 runs each.

Records         Direct Line Reads      Using dbCommands
150                       10                           265
300                       16                           318
600                       31                           445
1200                     63                           710
2400                     125                         1225
4800                     250                         2250

For real small files, the time it takes in the Direct Line Reads method is just the disk drive access time.  For the Using dbCommands method, the times don't quite double as I double the number of records to read so there is a possibility that this method would be faster than the Direct Line Reads method for very large files.

It is a whole bunch faster just reading lines from the file and looking for commas than using the db commands.  Also, using the db commands (with the Microsoft text driver) I am having trouble handling some null data (there is a pattern, but I haven't figured out how to fix it yet).  There is much better control just reading the file directly.

Just thought I'd report my findings, and Thanks again for your help.

Jim
Jim Scott

Ionic Wind Support Team

I would imagine it would always be faster.  Emergence code is fairly efficient and you only have one layer of code, namely your own. 

The Microsoft text driver goes through multiple layers ODBC->SQL->Jet->OS just to read a line of text and parse it. Not to mention it has to read and parse the schema.ini file or use defaults located in the registry, which is a bit more overhead.

I've easily beaten Access speeds for sequential operations on binary data as well.  But Access isn't the fastest beast in the world by a long shot.  Reading speed is not really the reason you used a database anyway.  Relational operations on tables, and SQL queries are the primary reasons for chosing a database over just a flat file.

Paul.
Ionic Wind Support Team