Reading .reg files

Started by Andy, November 07, 2017, 12:29:09 AM

Andy

November 07, 2017, 12:29:09 AM Last Edit: November 07, 2017, 03:18:18 AM by Andy
This one is just for curiosity.

My program can create (export) .reg files containing the same registry details that regedit exports, and both my program and regedit can read them.

But my program can't read any .reg files created by regedit itself, even though Notepad opens them fine.

I can't find any information on MSDN at all.

Does anyone know what format they are stored in?

Here is the simple program I use to read a file:

string ln
file myfile

openconsole

IF(OPENFILE(myfile, getstartpath + "10.reg", "R") = 0)
    WHILE EOF(myfile) = 0
        IF(READ(myfile,ln) = 0)
            print "ln = ",ln,"    Length is ",len(ln)
        ENDIF
    WEND

    CLOSEFILE myfile
endif

do:until inkey$ <> ""
closeconsole
end


Attached is a simple .reg file

Please rename it from .txt to .reg!

Right click on it and choose EDIT to have a look at it and then try to read it with the above code.



Day after day, day after day, we struck nor breath nor motion, as idle as a painted ship upon a painted ocean.

fasecero

It's the encoding. Open your text file in Notepad and choose Save As; at the bottom of the dialog you should see the "Encoding" dropdown. Select ANSI and save.


Andy

November 08, 2017, 05:27:09 AM #2 Last Edit: November 08, 2017, 05:34:16 AM by Andy
Yes, I eventually realised it was in Unicode format (I really wish they didn't do that, as it's a pain in the you-know-what!).

I've used the W2S function to convert it, but it only converts the very first line; all other lines with text are displayed as question marks (?).

So that leaves me puzzled again.

And one further question: how can I tell if a file is in Unicode format?

Thanks,
Andy.

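A possible explanation for the "only the first line converts" symptom, sketched in Python purely as a language-neutral illustration (it is not IWBasic, and it has not been checked against how IWBasic's READ actually splits lines). Exports from current regedit ("Windows Registry Editor Version 5.00") are normally UTF-16LE with a BOM, so every line break is the four bytes 0D 00 0A 00. If a byte-oriented reader cuts each line at the single byte 0A, the leftover 00 becomes the first byte of the next line, and every later line is shifted by one byte. The function names and the registry key below are hypothetical.

```python
import codecs

# Two lines encoded the way a regedit 5.00 export is assumed to be:
# UTF-16LE with a BOM and CRLF line ends. The key name is made up.
sample = codecs.BOM_UTF16_LE + (
    "Windows Registry Editor Version 5.00\r\n"
    "[HKEY_CURRENT_USER\\Software\\Test]\r\n"
).encode("utf-16-le")


def split_like_ansi_reader(data):
    """Split on the single byte 0x0A, as a byte-oriented line reader might."""
    return data.split(b"\n")


def decode_whole(data):
    """Decode the complete buffer first, then split the text into lines."""
    return data.decode("utf-16").splitlines()


byte_lines = split_like_ansi_reader(sample)
first, second = byte_lines[0], byte_lines[1]

# The first chunk is still aligned on 16-bit boundaries, so it converts.
print(repr(first.decode("utf-16")))

# The second chunk starts with the stray 0x00 left over from the LF,
# so every 16-bit unit is read one byte off and turns into garbage.
print(repr(second.decode("utf-16-le", errors="replace")))

# Dropping that one stray byte restores the alignment.
print(repr(second[1:].decode("utf-16-le")))

# Decoding before splitting avoids the problem altogether.
print(decode_whole(sample))
```

If this is what is happening, the fix is to read the file as raw bytes (or with wide-character file functions) and convert the whole buffer, rather than converting line by line.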

LarryMc

I found this, Andy:
Quote:
"It isn't enough to just determine Unicode vs. ASCII because Unicode itself comes in various flavors (UTF-8, UTF-16BE, UTF-16LE, etc).  The file format that you are reading should define how the text is encoded (or how to determine it from a header, but that is specific to the file type).

For text (and CSV) files, Windows provides an API that you can use to determine if a given byte sequence is Unicode.  The function name is (no surprise) IsTextUnicode (http://msdn.microsoft.com/en-us/library/dd318672). Another thing you should probably do is check for a BOM (Byte Order Mark), which, if present, tells you that the text is for sure Unicode, and what the encoding is (UTF-16BE vs. UTF-16LE, for example).

Raymond Chen wrote an article that gives us insight into how Notepad determines the encoding of a text file:  http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx

You can read more about BOMs from this FAQ on the Unicode organization's web site:  http://unicode.org/faq/utf_bom.html"
LarryMc
Larry McCaughn :)
Author of IWB+, Custom Button Designer library, Custom Chart Designer library, Snippet Manager, IWGrid control library, LM_Image control library
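The BOM check mentioned in the quote can be sketched as follows. This is a minimal illustration in Python (chosen only because it is easy to run anywhere, not because the thread uses it), and the name sniff_bom is hypothetical. It only looks at the first few bytes, so a file without a BOM returns None and still needs a heuristic such as IsTextUnicode.

```python
import codecs

# Checked longest-first, so UTF-32LE (FF FE 00 00) is not mistaken
# for UTF-16LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head):
    """Return the encoding named by a leading BOM, or None if there is none.

    `head` is the first few bytes of the file (four are enough).
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xff\xfeW\x00i\x00"))  # bytes as they appear at the start of a UTF-16LE file
print(sniff_bom(b"REGEDIT4\r\n"))        # plain ANSI text has no BOM
```

In practice the argument would come from opening the file in binary mode and reading its first four bytes.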

fasecero

Yep, not an easy task, to be honest. Here is a discussion of this topic; they also talk about the BOM and IsTextUnicode.
https://stackoverflow.com/questions/4672659/whats-the-best-way-to-identify-unicode-encoded-text-files-in-windows

fasecero

November 08, 2017, 04:53:31 PM #5 Last Edit: November 08, 2017, 05:00:09 PM by fasecero
Man, this one was tough. I couldn't find any example, so I made one myself; it may need some revision. You have to use two different files: "10_ansi.txt" and "10_unicode.txt".



$include "windowssdk.inc"

' var
INT j
STRING fullpath

' entry point
OPENCONSOLE

fullpath = GETSTARTPATH + "10_ansi.txt" ' switch to "10_unicode.txt" to test the other case

IF IsFileUNICODE(fullpath) = 0 THEN
    PRINT "FILE IS ANSI"
    PRINT fullpath
    PRINT

    STRING ln
    FILE myfile

    ' read the same file that was just tested, line by line
    IF(OPENFILE(myfile, fullpath, "R") = 0)
        WHILE EOF(myfile) = 0
            IF READ(myfile, ln) = 0 THEN
                PRINT ln
            ENDIF
        WEND

        CLOSEFILE myfile
    ENDIF
ELSE
    PRINT "FILE IS UNICODE"
    PRINT fullpath
    PRINT

    ' TODO: get unicode data as strings
    ' OPENFILEW/EOFW/READW/CLOSEFILEW
ENDIF

PRINT
PRINT "  Press any key to exit..."
DO:UNTIL INKEY$ <> ""
END

SUB IsFileUNICODE(string path), INT
    HANDLE hFile = CreateFileW(S2W( path), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL)
    INT response = 0

    IF hFile <> INVALID_HANDLE_VALUE THEN
        ' get file size
        LARGE_INTEGER size

        IF GetFileSizeEx(hFile, &size) THEN
            INT filesize = size.QuadPart ' size of the file in bytes
            INT buffsize = 100 ' number of bytes we want to check for ansi or unicode content
            IF buffsize > filesize THEN buffsize = filesize

            pointer buffer = NEW(char,buffsize + 1)
            DWORD NumberOfBytesRead = 0
            ReadFile(hFile, buffer, buffsize, &NumberOfBytesRead, NULL)

            IF NumberOfBytesRead THEN
                response = IsTextUnicode(buffer, NumberOfBytesRead, NULL)
            ENDIF

            DELETE buffer
        ENDIF

        CloseHandle(hFile)
    ENDIF

    RETURN response
ENDSUB


Just pass the path of any text file to IsFileUNICODE(path) and, hopefully, it will tell you whether the file is Unicode or not.

jalih

Quote from: fasecero on November 08, 2017, 04:53:31 PM
Man this one was tough. Couldn't find any example so I made one myself - maybe it will need some revision.

Here is my old version, written in MiniBASIC:

##ifndef WIN32
##define WIN32
##endif

##ifdef WIN32
##define WIN32_LEAN_AND_MEAN
##endif

##include "winsdk\windef.mbi"
##include "winsdk\winbase.mbi"
##include "winsdk\wingdi.mbi"
##include "winsdk\winuser.mbi"


type FILETIME
UINT64 qwTime
end type

type WIN32_FILE_ATTRIBUTE_DATA
  UINT dwFileAttributes
  FILETIME ftCreationTime
  FILETIME ftLastAccessTime
  FILETIME ftLastWriteTime
  UINT nFileSizeHigh
  UINT nFileSizeLow
end type


WIN32_FILE_ATTRIBUTE_DATA dat

string filter = "Text files|*.txt|All Files|*.*||"
string filename = ChooseFile("Select file",NULL,1,filter,"txt")

GetFileAttributesExA(filename,0,dat)

int64 size64
size64 = dat.nFileSizeHigh
size64 = size64 << 32
size64 |= dat.nFileSizeLow

uint size = size64

pointer buffer = new(string, size)

hFile = fopen(filename, "R")
int bytesread = fread(hFile, buffer, size)
fclose(hFile)

if IsTextUnicode(buffer, size)
print "Text file is probably Unicode."
else
print "Text file is probably ANSI."
endif

delete(buffer)


do:until inkey$ <> ""

fasecero

Thank you, it's good to have a reference in case of any trouble. I see that you pass the entire file to IsTextUnicode, whereas I only take the first 100 bytes. If a problem arises, we could either increase the block size or use the whole content. I'm avoiding that for now to gain (hypothetically) some speed.
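To make the sample-size trade-off concrete, here is a rough sketch in Python (again only as a neutral illustration). It is not IsTextUnicode and does not reproduce its tests; it is a much cruder stand-in that checks one thing: in UTF-16LE text made mostly of ASCII characters, nearly every second byte is zero. The name looks_like_utf16le, the default of 100 bytes, and the 0.9 threshold are all assumptions picked for the example.

```python
def looks_like_utf16le(data, sample_size=100):
    """Guess whether `data` is UTF-16LE by inspecting only a short prefix.

    Crude heuristic: count how many of the high bytes (every second byte)
    in the sample are zero. Mostly-ASCII UTF-16LE text scores close to 1.0,
    ANSI text scores close to 0.0.
    """
    sample = data[:sample_size]
    sample = sample[: len(sample) - (len(sample) % 2)]  # whole 16-bit units only
    if not sample:
        return False
    high_bytes = sample[1::2]
    zero_ratio = high_bytes.count(0) / len(high_bytes)
    return zero_ratio > 0.9


unicode_text = "Windows Registry Editor Version 5.00\r\n".encode("utf-16-le")
ansi_text = b"REGEDIT4\r\n[HKEY_CURRENT_USER]\r\n"

print(looks_like_utf16le(unicode_text))
print(looks_like_utf16le(ansi_text))
```

A short prefix keeps the check cheap however large the file is, but it can be fooled when the first bytes are not representative, for example UTF-16 text that opens with a long run of non-Latin characters. Raising sample_size trades speed for a more reliable guess, which is the same choice as between checking 100 bytes and checking the whole file.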