This one is just for curiosity.
I can export .reg files that contain the same registry details as the ones regedit produces, and both my program and regedit can read them.
But I can't read any .reg files created by regedit itself, even though Notepad opens them okay.
I can't find any information about this on MSDN at all.
Does anyone know what format they are stored in?
Here is the simple program to read a file:
string ln
file myfile
openconsole
IF(OPENFILE(myfile, getstartpath + "10.reg", "R") = 0)
    WHILE EOF(myfile) = 0
        IF(READ(myfile,ln) = 0)
            print "ln = ",ln," Length is ",len(ln)
        ENDIF
    WEND
    CLOSEFILE myfile
endif
do:until inkey$ <> ""
closeconsole
end
Attached is a simple .reg file
Please rename it from .txt to .reg!
Right click on it and choose EDIT to have a look at it and then try to read it with the above code.
It's the encoding. Open your text file in Notepad and choose Save As; at the bottom you should see the "Encoding" dropdown. Select ANSI and save.
Yes, I eventually realised it was in Unicode format (really wish they didn't do that as it's a pain in the you know what!).
I've used the W2S function to convert it, but it only converts the very first line; all other lines with text are displayed as question marks (?).
So that leaves me puzzled again.
And one further question, how can I tell if a file is in unicode format?
Thanks,
Andy.
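A likely explanation for the question marks, sketched in Python rather than IWBASIC purely for illustration: regedit writes its "Version 5.00" exports as UTF-16LE, where a line break is the four bytes 0D 00 0A 00. A reader that splits on the single byte 0A leaves the trailing 00 at the front of the next line, so every line after the first is shifted by one byte and no longer converts cleanly. The sample text and variable names below are made up, and whether READ behaves exactly like this byte split is an assumption.

```python
# Sample text as regedit might export it: UTF-16LE with a BOM and CRLF line ends.
text = "Windows Registry Editor Version 5.00\r\n[HKEY_CURRENT_USER\\Software\\Demo]\r\n"
raw = b"\xff\xfe" + text.encode("utf-16-le")

# Splitting on the single byte 0x0A, as a byte-oriented line reader would.
byte_lines = raw.split(b"\n")

# The first chunk still starts on a 16-bit boundary, so it converts.
first = byte_lines[0].decode("utf-16")

# The second chunk begins with the 0x00 left over from the line feed,
# so every byte pair is shifted by one and the text comes out as garbage.
second = byte_lines[1].decode("utf-16-le", errors="replace")

# Decoding the whole buffer first and splitting afterwards keeps the pairs aligned.
lines = raw.decode("utf-16").splitlines()

print(repr(first))
print(repr(second))
print(lines)
```

The practical consequence is that the conversion should happen on the whole file (or on properly aligned 16-bit units), not on lines that were already cut by an ANSI reader.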
I found this, Andy:
Quote: "It isn't enough to just determine Unicode vs. ASCII because Unicode itself comes in various flavors (UTF-8, UTF-16BE, UTF-16LE, etc). The file format that you are reading should define how the text is encoded (or how to determine it from a header, but that is specific to the file type).
For text (and CSV) files, Windows provides an API that you can use to determine if a given byte sequence is Unicode. The function name is (no surprise) IsTextUnicode (http://msdn.microsoft.com/en-us/library/dd318672). Another thing you should probably do is check for a BOM (Byte Order Mark), which, if present, tells you that the text is for sure Unicode, and what the encoding is (UTF-16BE vs. UTF-16LE, for example).
Raymond Chen wrote an article that gives us insight into how Notepad determines the encoding of a text file: http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx
You can read more about BOMs from this FAQ on the Unicode organization's web site: http://unicode.org/faq/utf_bom.html"
Yep, not an easy task to be honest. Here you can read a discussion about this topic; they also talk about the BOM and IsTextUnicode.
https://stackoverflow.com/questions/4672659/whats-the-best-way-to-identify-unicode-encoded-text-files-in-windows
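The BOM check mentioned in the quote can be sketched in a few lines. This is Python for illustration only, and the function name sniff_bom and the returned labels are invented for this example. It only recognises files that actually start with a BOM; a result of None means "no BOM found", not "definitely ANSI".

```python
import codecs

# Longest signatures first: the UTF-32LE BOM (FF FE 00 00) begins with the
# same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(prefix):
    """Return an encoding label if the bytes start with a known BOM, else None."""
    for bom, label in _BOMS:
        if prefix.startswith(bom):
            return label
    return None


print(sniff_bom(b"\xff\xfeW\x00i\x00"))  # bytes of a UTF-16LE file with BOM
print(sniff_bom(b"Windows Registry"))    # plain bytes, no BOM
```

Only the first four bytes of the file are needed for this test. One known ambiguity: a UTF-16LE file whose first character is NUL would be labelled UTF-32LE, which should be rare for text files.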
Man this one was tough. Couldn't find any example so I made one myself - maybe it will need some revision. You have to use two different files: "10_ansi.txt" & "10_unicode.txt"
$include "windowssdk.inc"
' var
INT j
STRING fullpath
' entry point
OPENCONSOLE
fullpath = GETSTARTPATH + "10_ansi.txt" ' "10_ansi.txt" - "10_unicode.txt"
IF IsFileUNICODE(fullpath) = 0 THEN
    PRINT "FILE IS ANSI"
    PRINT fullpath
    PRINT
    STRING ln
    FILE myfile
    ' read the same file that was just tested, line by line
    IF(OPENFILE(myfile, fullpath, "R") = 0)
        WHILE EOF(myfile) = 0
            IF READ(myfile, ln) = 0 THEN
                PRINT ln
            ENDIF
        WEND
        CLOSEFILE myfile
    ENDIF
ELSE
    PRINT "FILE IS UNICODE"
    PRINT fullpath
    PRINT
    ' TODO: get unicode data as strings
    ' OPENFILEW/EOFW/READW/CLOSEFILEW
ENDIF
PRINT
PRINT " Press any key to exit..."
DO:UNTIL INKEY$ <> ""
END
SUB IsFileUNICODE(string path), INT
    HANDLE hFile = CreateFileW(S2W( path), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL)
    INT response = 0
    IF hFile <> INVALID_HANDLE_VALUE THEN
        ' get file size
        LARGE_INTEGER size
        IF GetFileSizeEx(hFile, &size) THEN
            INT filesize = size.QuadPart ' size of the file in bytes
            INT buffsize = 100 ' number of bytes we want to check for ansi or unicode content
            IF buffsize > filesize THEN buffsize = filesize
            pointer buffer = NEW(char,buffsize + 1)
            DWORD NumberOfBytesRead = 0
            ReadFile(hFile, buffer, buffsize, &NumberOfBytesRead, NULL)
            IF NumberOfBytesRead THEN
                response = IsTextUnicode(buffer, NumberOfBytesRead, NULL)
            ENDIF
            DELETE buffer
        ENDIF
        CloseHandle(hFile)
    ENDIF
    RETURN response
ENDSUB
Just pass any text file to IsFileUNICODE(path) and it should tell you whether the file is Unicode or not.
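For the TODO branch in the program above (getting the Unicode data as strings), one possible approach is to choose the decoder from the BOM and only then split the text into lines. The sketch below is Python, not IWBASIC, and is meant only to show the idea. The function name read_text_lines is invented, and treating a file without a BOM as code page 1252 ("ANSI" on Western Windows systems) is an assumption, as is ignoring UTF-32.

```python
import codecs
import os
import tempfile


def read_text_lines(path, ansi_codec="cp1252"):
    """Return the lines of a text file, choosing the decoder from its BOM."""
    with open(path, "rb") as f:
        prefix = f.read(3)  # a UTF-16 BOM is 2 bytes, a UTF-8 BOM is 3
    if prefix.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        codec = "utf-16"      # reads the byte order from the BOM and strips it
    elif prefix.startswith(codecs.BOM_UTF8):
        codec = "utf-8-sig"   # strips the UTF-8 signature
    else:
        codec = ansi_codec    # assumption: no BOM means a legacy code page
    with open(path, "r", encoding=codec, errors="replace") as f:
        return f.read().splitlines()


# Usage: write the same text in both encodings and read each file back.
sample = "Windows Registry Editor Version 5.00\r\n[HKEY_CURRENT_USER\\Software\\Demo]\r\n"
folder = tempfile.mkdtemp()
unicode_path = os.path.join(folder, "10_unicode.txt")
ansi_path = os.path.join(folder, "10_ansi.txt")
with open(unicode_path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + sample.encode("utf-16-le"))
with open(ansi_path, "wb") as f:
    f.write(sample.encode("cp1252"))

print(read_text_lines(unicode_path))
print(read_text_lines(ansi_path))
```

Both calls should yield the same list of lines, which is the behaviour the ANSI and Unicode branches of the program are aiming for. Files without a BOM would still need a heuristic such as IsTextUnicode.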
Quote from: fasecero on November 08, 2017, 04:53:31 PM
Man this one was tough. Couldn't find any example so I made one myself - maybe it will need some revision.
Here is my old version written in MiniBASIC
##ifndef WIN32
##define WIN32
##endif
##ifdef WIN32
##define WIN32_LEAN_AND_MEAN
##endif
##include "winsdk\windef.mbi"
##include "winsdk\winbase.mbi"
##include "winsdk\wingdi.mbi"
##include "winsdk\winuser.mbi"
type FILETIME
    UINT64 qwTime
end type
type WIN32_FILE_ATTRIBUTE_DATA
    UINT dwFileAttributes
    FILETIME ftCreationTime
    FILETIME ftLastAccessTime
    FILETIME ftLastWriteTime
    UINT nFileSizeHigh
    UINT nFileSizeLow
end type
WIN32_FILE_ATTRIBUTE_DATA dat
string filter = "Text files|*.txt|All Files|*.*||"
string filename = ChooseFile("Select file",NULL,1,filter,"txt")
GetFileAttributesExA(filename,0,dat)
int64 size64
size64 = dat.nFileSizeHigh
size64 = size64 << 32
size64 |= dat.nFileSizeLow
uint size = size64
pointer buffer = new(string, size)
hFile = fopen(filename, "R")
int bytesread = fread(hFile, buffer, size)
fclose(hFile)
if IsTextUnicode(buffer, size)
    print "Text file is probably Unicode."
else
    print "Text file is probably ANSI."
endif
delete(buffer)
do:until inkey$ <> ""
Thank you, it's good to have a reference in case of any trouble. I see that you passed the entire file to IsTextUnicode; I just take the first bytes. If a problem arises, we could either increase the 'block' size or use all the content. I'm avoiding this for now to gain (hypothetically) some speed.