IonicWind Software

IWBasic => General Questions => Topic started by: Andy on November 07, 2017, 12:29:09 AM

Title: Reading .reg files
Post by: Andy on November 07, 2017, 12:29:09 AM
This one is just for curiosity.

My program can create (export) .reg files containing the same registry details that regedit would export, and both my program and regedit can read them.

But I can't read any .reg files created by regedit itself, even though Notepad opens them fine.

I can't find any information about this on MSDN.

Does anyone know what format they are stored in?

Here is the simple program to read a file:

string ln
file myfile

openconsole

IF(OPENFILE(myfile, getstartpath + "10.reg", "R") = 0)
    WHILE EOF(myfile) = 0
        IF(READ(myfile,ln) = 0)
            print "ln = ",ln,"    Length is ",len(ln)
        ENDIF
    WEND

    CLOSEFILE myfile
endif

do:until inkey$ <> ""
closeconsole
end


Attached is a simple .reg file

Please rename it from .txt to .reg!

Right click on it and choose EDIT to have a look at it and then try to read it with the above code.



Title: Re: Reading .reg files
Post by: fasecero on November 07, 2017, 11:01:24 AM
It's the encoding. Open the file in Notepad and choose Save As. At the bottom of the dialog you should see the "Encoding" dropdown: select ANSI and save.
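
The same conversion can be done in code rather than through Notepad. Below is a minimal sketch in Python (not IWBasic), assuming the file starts with a UTF-16 byte order mark, as regedit exports normally do. The function name `utf16_to_ansi` and the `cp1252` default are illustrative assumptions; the real "ANSI" code page depends on the Windows locale.

```python
def utf16_to_ansi(raw: bytes, codepage: str = "cp1252") -> bytes:
    """Re-encode UTF-16 text (with BOM) as single-byte 'ANSI' text.

    codepage is an assumption: cp1252 is the Western European default,
    but another Windows locale may use a different code page.
    """
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    text = raw.decode("utf-16")
    # Characters that do not exist in the target code page become "?".
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    sample = b"\xff\xfe" + "Windows Registry Editor Version 5.00\r\n".encode("utf-16-le")
    print(utf16_to_ansi(sample))
```

Note that the conversion is lossy: anything outside the chosen code page is replaced, so it is only safe when the registry data is plain Western text.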

Title: Re: Reading .reg files
Post by: Andy on November 08, 2017, 05:27:09 AM
Yes I realised it was in unicode format eventually (really wish they didn't do that as it's a pain in the you know what!).

I've used the W2S function to convert it, but it only converts the very first line. All other lines with text are displayed as question mark symbols (?).

So that leaves me puzzled again.

And one further question, how can I tell if a file is in unicode format?

Thanks,
Andy.
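
A likely explanation, offered as an assumption since the thread does not confirm it: regedit exports are UTF-16LE, where every line ends in the four bytes 0D 00 0A 00. A byte-oriented line reader splits at 0A, so the trailing 00 lands at the start of the next line and every later line is shifted by one byte. The Python sketch below (not IWBasic; the sample text and the `decode_whole` name are made up for illustration) simulates this in memory and shows that decoding the whole buffer first avoids the problem.

```python
# Simulate a small UTF-16LE .reg file in memory (BOM + encoded text).
text = ("Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        "[HKEY_CURRENT_USER\\Software\\Demo]\r\n")
raw = b"\xff\xfe" + text.encode("utf-16-le")

# Roughly what an ANSI line reader does: split the raw bytes at LF (0x0A).
byte_lines = raw.split(b"\n")


def decode_whole(data: bytes) -> list:
    """Decode the complete buffer as UTF-16 first, then split into lines."""
    return data.decode("utf-16").splitlines()


if __name__ == "__main__":
    # Every chunk after the first begins with a stray 0x00 byte, so its
    # 16-bit characters are misaligned when converted on their own.
    for chunk in byte_lines[:3]:
        print(chunk[:8])
    print(decode_whole(raw))
```

In IWBasic terms this would mean reading the file as raw bytes (or with a wide-character read) and converting the whole buffer, rather than calling W2S on each line returned by READ.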

Title: Re: Reading .reg files
Post by: LarryMc on November 08, 2017, 01:33:36 PM
I found this, Andy:

Quote: "It isn't enough to just determine Unicode vs. ASCII because Unicode itself comes in various flavors (UTF-8, UTF-16BE, UTF-16LE, etc).  The file format that you are reading should define how the text is encoded (or how to determine it from a header, but that is specific to the file type).

For text (and CSV) files, Windows provides an API that you can use to determine if a given byte sequence is Unicode.  The function name is (no surprise) IsTextUnicode (http://msdn.microsoft.com/en-us/library/dd318672). Another thing you should probably do is check for a BOM (Byte Order Mark), which, if present, tells you that the text is for sure Unicode, and what the encoding is (UTF-16BE vs. UTF-16LE, for example).

Raymond Chen wrote an article that gives us insight into how Notepad determines the encoding of a text file:  http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx

You can read more about BOMs from this FAQ on the Unicode organization's web site:  http://unicode.org/faq/utf_bom.html"
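
The BOM check described in the quote can be sketched as follows. This is Python rather than IWBasic, and the names `detect_bom` and `_BOMS` are illustrative, not part of any Windows API. The UTF-32 marks are tested before the UTF-16 ones because the UTF-32LE mark begins with the same two bytes as the UTF-16LE mark.

```python
import codecs
from typing import Optional

# Longest marks first: BOM_UTF32_LE (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(prefix: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if prefix.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    # Only the first four bytes of a file are needed for this check.
    print(detect_bom(b"\xff\xfeW\x00i\x00"))
    print(detect_bom(b"Windows Registry"))
```

A return value of None only means that no BOM is present. The file may still be Unicode, which is where a heuristic such as IsTextUnicode comes in.
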
Title: Re: Reading .reg files
Post by: fasecero on November 08, 2017, 02:37:47 PM
Yep, not an easy task to be honest. Here is a discussion of this topic. They also talk about the BOM and IsTextUnicode.
https://stackoverflow.com/questions/4672659/whats-the-best-way-to-identify-unicode-encoded-text-files-in-windows
Title: Re: Reading .reg files
Post by: fasecero on November 08, 2017, 04:53:31 PM
Man, this one was tough. I couldn't find any example so I made one myself; it may need some revision. You have to use two different files: "10_ansi.txt" and "10_unicode.txt".



$include "windowssdk.inc"

' var
STRING fullpath

' entry point
OPENCONSOLE

fullpath = GETSTARTPATH + "10_ansi.txt" ' or "10_unicode.txt"

IF IsFileUNICODE(fullpath) = 0 THEN
    PRINT "FILE IS ANSI"
    PRINT fullpath
    PRINT

    STRING ln
    FILE myfile

    ' read the same file that was just tested, line by line
    IF(OPENFILE(myfile, fullpath, "R") = 0)
        WHILE EOF(myfile) = 0
            IF READ(myfile, ln) = 0 THEN
                PRINT ln
            ENDIF
        WEND

        CLOSEFILE myfile
    ENDIF
ELSE
    PRINT "FILE IS UNICODE"
    PRINT fullpath
    PRINT

    ' TODO: get unicode data as strings
    ' OPENFILEW/EOFW/READW/CLOSEFILEW
ENDIF

PRINT
PRINT "  Press any key to exit..."
DO:UNTIL INKEY$ <> ""
CLOSECONSOLE
END

' Returns non-zero if the start of the file looks like Unicode text, 0 otherwise
SUB IsFileUNICODE(string path), INT
    HANDLE hFile = CreateFileW(S2W(path), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL)
    INT response = 0

    IF hFile <> INVALID_HANDLE_VALUE THEN
        ' get file size
        LARGE_INTEGER size

        IF GetFileSizeEx(hFile, &size) THEN
            INT filesize = size.QuadPart ' size of the file in bytes
            INT buffsize = 100 ' number of bytes we want to check for ansi or unicode content
            IF buffsize > filesize THEN buffsize = filesize

            ' read only the first buffsize bytes of the file
            pointer buffer = NEW(char,buffsize + 1)
            DWORD NumberOfBytesRead = 0
            ReadFile(hFile, buffer, buffsize, &NumberOfBytesRead, NULL)

            IF NumberOfBytesRead THEN
                ' NULL as the third argument makes IsTextUnicode run all of its tests
                response = IsTextUnicode(buffer, NumberOfBytesRead, NULL)
            ENDIF

            DELETE buffer
        ENDIF

        CloseHandle(hFile)
    ENDIF

    RETURN response
ENDSUB


Just pass any text file to IsFileUNICODE(path) and it should tell you whether the file looks like Unicode or not. IsTextUnicode is a heuristic, so treat the result as a best guess.
Title: Re: Reading .reg files
Post by: jalih on November 08, 2017, 10:23:24 PM
Quote from: fasecero on November 08, 2017, 04:53:31 PM
Man this one was tough. Couldn't find any example so I made one myself - maybe it will need some revision.

Here is my old version written in MiniBASIC

##ifndef WIN32
##define WIN32
##endif

##ifdef WIN32
##define WIN32_LEAN_AND_MEAN
##endif

##include "winsdk\windef.mbi"
##include "winsdk\winbase.mbi"
##include "winsdk\wingdi.mbi"
##include "winsdk\winuser.mbi"


type FILETIME
  UINT64 qwTime
end type

type WIN32_FILE_ATTRIBUTE_DATA
  UINT dwFileAttributes
  FILETIME ftCreationTime
  FILETIME ftLastAccessTime
  FILETIME ftLastWriteTime
  UINT nFileSizeHigh
  UINT nFileSizeLow
end type


WIN32_FILE_ATTRIBUTE_DATA dat

string filter = "Text files|*.txt|All Files|*.*||"
string filename = ChooseFile("Select file",NULL,1,filter,"txt")

GetFileAttributesExA(filename,0,dat)

int64 size64
size64 = dat.nFileSizeHigh
size64 = size64 << 32
size64 |= dat.nFileSizeLow

uint size = size64

pointer buffer = new(string, size)

hFile = fopen(filename, "R")
int bytesread = fread(hFile, buffer, size)
fclose(hFile)

if IsTextUnicode(buffer, size)
  print "Text file is probably Unicode."
else
  print "Text file is probably ANSI."
endif

delete(buffer)


do:until inkey$ <> ""
Title: Re: Reading .reg files
Post by: fasecero on November 09, 2017, 11:58:41 AM
Thank you, it's good to have a reference in case of any trouble. I see that you passed the entire file to IsTextUnicode, whereas I only take the first bytes. If a problem arises, we could either increase the block size or use all of the content. I'm avoiding that for now to gain (hypothetically) some speed.
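
To illustrate the sampling trade-off, here is a rough Python sketch (not IWBasic) that inspects only a short prefix. It is a simplified stand-in and not the algorithm IsTextUnicode uses: it just checks whether NUL bytes sit consistently at odd or even offsets, which is typical of UTF-16 text made mostly of ASCII characters. The names `guess_utf16` and `read_prefix`, the 100-byte default, and the 0.4 threshold are all illustrative assumptions.

```python
from typing import Optional


def guess_utf16(sample: bytes, threshold: float = 0.4) -> Optional[str]:
    """Guess UTF-16 byte order from where the NUL bytes fall in a sample."""
    pairs = len(sample) // 2
    if pairs == 0:
        return None
    even_nuls = sample[0:pairs * 2:2].count(0)  # offsets 0, 2, 4, ...
    odd_nuls = sample[1:pairs * 2:2].count(0)   # offsets 1, 3, 5, ...
    # ASCII-range text in UTF-16LE has its zero bytes at the odd offsets.
    if odd_nuls / pairs >= threshold and even_nuls == 0:
        return "utf-16-le"
    # In UTF-16BE the zero bytes come first, at the even offsets.
    if even_nuls / pairs >= threshold and odd_nuls == 0:
        return "utf-16-be"
    return None


def read_prefix(path: str, size: int = 100) -> bytes:
    """Read at most size bytes from the start of the file."""
    with open(path, "rb") as handle:
        return handle.read(size)


if __name__ == "__main__":
    print(guess_utf16("Hello world".encode("utf-16-le")))
    print(guess_utf16(b"Hello world"))
```

A small prefix keeps the check cheap, but a file whose first bytes are unrepresentative (for example, text outside the ASCII range) can be misjudged, which is the case for enlarging the block or scanning the whole file.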