How to decipher text encoding

#HOW TO DECIPHER TEXT ENCODING CODE#

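If someone is looking for a 93.9% solution, this works for me: a static class StreamExtension with the extension method

public static string ReadAsString(this Stream stream)

It first reads the stream through a strict UTF-8 reader, which honors a byte order mark if one is present and throws on invalid bytes:

var streamReader = new StreamReader(stream, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true), detectEncodingFromByteOrderMarks: true);

The second most used encoding (6.7%) is ISO-8859-1, so that is the natural choice to fall back on when the UTF-8 read fails.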
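The fragments above can be assembled into a complete method roughly as follows. This is only a sketch: the class name, method signature and the strict UTF-8 reader come from the answer, while the catch block, the rewind of the stream and the use of Encoding.GetEncoding("iso-8859-1") as the fallback are assumptions about how the missing parts fit together. It also assumes the stream is seekable.

```csharp
using System.IO;
using System.Text;

public static class StreamExtension
{
    // Reads the whole stream as text: BOM first, then strict UTF-8,
    // then (assumed fallback) ISO-8859-1 if the bytes are not valid UTF-8.
    public static string ReadAsString(this Stream stream)
    {
        // Remember where we started so we can rewind; requires a seekable stream.
        long startPosition = stream.Position;
        try
        {
            var strictUtf8 = new UTF8Encoding(
                encoderShouldEmitUTF8Identifier: false,
                throwOnInvalidBytes: true);

            // leaveOpen: true so the caller keeps ownership of the stream.
            using (var streamReader = new StreamReader(
                stream, strictUtf8,
                detectEncodingFromByteOrderMarks: true,
                bufferSize: 4096, leaveOpen: true))
            {
                return streamReader.ReadToEnd();
            }
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: rewind and decode as ISO-8859-1,
            // which maps every byte to a character and so never throws.
            stream.Position = startPosition;
            using (var fallbackReader = new StreamReader(
                stream, Encoding.GetEncoding("iso-8859-1"),
                detectEncodingFromByteOrderMarks: false,
                bufferSize: 4096, leaveOpen: true))
            {
                return fallbackReader.ReadToEnd();
            }
        }
    }
}
```

Because ISO-8859-1 accepts any byte sequence, the fallback always "succeeds", even for files that were really written in another codepage such as ibm850. That is why this is a percentage solution and not a detector.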

Thanks for your answers, this is what I've done. The files we receive are from end-users, and they do not have a clue about codepages. The receivers are also end-users; by now this is what they know about codepages: codepages exist, and are annoying.

Open the received file in Notepad and look at a garbled piece of text.

If somebody is called François or something, with your human intelligence you can guess this.

I've created a small app that the user can open the file with, and in which they enter a text that they know will appear in the file when the correct codepage is used.

Loop through all codepages, and display the ones that give a solution with the user-provided text.

If more than one codepage pops up, ask the user to specify more text.
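The codepage loop described here might look something like the sketch below. The names CodepageGuesser, FindMatchingEncodings and AllInstalledEncodings are hypothetical, and the original app's actual code is not shown in the post. The idea is simply to decode the raw bytes with each candidate encoding and keep the ones whose result contains the text the user typed.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CodepageGuesser
{
    // Returns every candidate encoding under which the decoded file
    // contains the text the user expects to see (ordinal comparison).
    public static List<Encoding> FindMatchingEncodings(
        byte[] fileBytes, string knownText, IEnumerable<Encoding> candidates)
    {
        var matches = new List<Encoding>();
        foreach (Encoding candidate in candidates)
        {
            string decoded = candidate.GetString(fileBytes);
            if (decoded.Contains(knownText))
            {
                matches.Add(candidate);
            }
        }
        return matches;
    }

    // Convenience source of candidates: every encoding the runtime reports.
    // On .NET Core / .NET 5+, legacy codepages such as ibm850 are only
    // available after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
    // has been called, and older runtimes may not list them here at all.
    public static IEnumerable<Encoding> AllInstalledEncodings()
    {
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            yield return info.GetEncoding();
        }
    }
}
```

A caller would pass File.ReadAllBytes(path), the user's text, and AllInstalledEncodings(). If the returned list has more than one entry, that is the cue to ask the user for more text, preferably text containing other non-ASCII characters, since many codepages agree on the common ones.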

In our application, we receive text files. When reading, these files sometimes contain garbage, because the files were created in a different/unknown codepage. Is there a way to (automatically) detect the codepage of a text file? The detectEncodingFromByteOrderMarks option on the StreamReader constructor works for UTF8 and other Unicode-marked files, but I'm looking for a way to detect code pages, like ibm850 and windows1252.
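To make the limitation concrete, here is a small sketch of what detectEncodingFromByteOrderMarks does and does not give you. BomProbe and DetectFromBom are hypothetical names used for illustration. StreamReader only switches encoding when it finds a byte order mark; for a file without one it keeps whatever encoding it was handed, so a legacy codepage can never be "detected" this way.

```csharp
using System.IO;
using System.Text;

public static class BomProbe
{
    // Reports the encoding StreamReader settles on after looking for a BOM.
    // If no BOM is present, the supplied fallback is returned unchanged.
    // Note: this advances the stream position; rewind before reading the content.
    public static Encoding DetectFromBom(Stream stream, Encoding fallback)
    {
        using (var reader = new StreamReader(
            stream, fallback,
            detectEncodingFromByteOrderMarks: true,
            bufferSize: 1024, leaveOpen: true))
        {
            // Detection only happens on the first read, so read one character.
            reader.Read();
            return reader.CurrentEncoding;
        }
    }
}
```

For a file that starts with the UTF-16 little-endian mark (bytes FF FE), this reports UTF-16 regardless of the fallback. For an ibm850 or windows1252 file, which carries no mark, it just echoes the fallback back, which is exactly the gap the question is about.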