Export Gedcom File Dialog

Export Format Dialog

Family Historian supports all standard GEDCOM file formats, as well as non-standard ones, for both export and import. All of the formats reproduce ordinary unaccented letters and numbers perfectly. The main difference between the formats is how they handle accented characters and special symbols.

Formats

ANSI

Until recently, ANSI was almost certainly the most commonly used format for GEDCOM files, at least in a Windows environment. The most common format is now probably UTF-8. Despite its popularity, ANSI is not, strictly speaking, an approved standard GEDCOM format. However, most Windows users are likely to find that ANSI is more than adequate for their purposes, and that it allows them to record and use most well-known European accented characters if needed. For sharing information between English-speakers, it may be worth considering, especially if the intended recipient is a Windows user whose software is old and doesn't support UTF-8. If you are planning to send a GEDCOM file to a non-English-speaker, UTF-8 is probably a better choice (as long as the person in question has software which supports UTF-8).
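As a rough illustration of what "most well-known European accented characters" means in practice, the Python sketch below checks whether a piece of text can be stored in ANSI. It assumes that 'ANSI' means the Windows-1252 code page (the usual default on Western European Windows systems); the actual code page depends on the system locale, and the helper name `fits_in_ansi` is hypothetical.

```python
# Assumption: "ANSI" is taken to mean Windows-1252 (cp1252). The real code
# page used on a given PC depends on its Windows locale settings.
def fits_in_ansi(text: str, codepage: str = "cp1252") -> bool:
    """Return True if every character in text exists in the code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ansi("José Müller"))  # Western European accents are covered
print(fits_in_ansi("Dvořák"))       # 'ř' is not part of Windows-1252
```

A name that fails this check is a sign that UTF-8 would be the safer export choice.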

Unicode

Unicode is an approved standard GEDCOM format, and a good option if you want to send your GEDCOM file to someone whose software you know supports it (although UTF-8 is more widely supported).

If you choose this option, the file will be output in UTF-16 encoding, 'Little Endian' (the default for Windows), and with a 'BOM' (see technical note below).

Unicode (without BOM)

Same as 'Unicode' except that a 'BOM' is not output (see technical note below). Only use this option in preference to plain 'Unicode' if you have a particular reason to do so (e.g. you are sending a file to someone who uses software which requires it).

Unicode (Big Endian)

Same as 'Unicode' except that the format is 'Big Endian' (see technical note below). It is possible that some Mac or Unix software might need GEDCOM files to be in this format.

Unicode (Big Endian without BOM)

Same as 'Unicode' except that the format is 'Big Endian' and a 'BOM' is not output (see technical note below). Only use this option if you have a particular reason to do so (e.g. you are sending a file to someone who uses software which requires it).

UTF-8

This is normally the best option to use. Strictly speaking, UTF-8 is also Unicode: 'Unicode' is the name of the character set, and 'UTF-8' is the name of an encoding of it. So this option should arguably be labelled "Unicode with UTF-8 encoding". However, it has become common practice to use the term "UTF-8" as shorthand for this encoding of Unicode. This is an excellent format that is widely supported and, for text made up mostly of unaccented letters and numbers, produces files that are more compact than ordinary Unicode (that is, UTF-16) files.
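To give a feel for the difference in file size, the following Python sketch encodes one sample GEDCOM-style line both ways and compares the byte counts. The sample line and the helper name `encoded_sizes` are made up for illustration; real files will vary, and text in some non-Latin scripts can be larger in UTF-8 than in UTF-16.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes text occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # BOM not counted
    }


sample = "1 NAME José /Müller/"  # hypothetical line: 20 characters, 2 accented
print(encoded_sizes(sample))     # {'utf-8': 22, 'utf-16': 40}
```

In UTF-8, each unaccented letter takes one byte and each of the two accented letters takes two; in UTF-16 every character in this sample takes two bytes.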

UTF-8 (without BOM)

Same as 'UTF-8' except that a 'BOM' is not output (see technical note below). Only use this option in preference to plain 'UTF-8' if you have a particular reason to do so (e.g. you are sending a file to someone who uses software which requires it).

ANSEL

This is an approved standard format for GEDCOM. At the time the GEDCOM spec was written, it was the preferred character set for GEDCOM. It provides better support for accented characters than either ANSI or ASCII, but nowhere near as good as Unicode or UTF-8. However, it is not widely used and not widely supported.

ASCII

This too is an approved standard format for GEDCOM. It is the simplest and least sophisticated of the formats. It provides no support for accented characters at all, and only very limited support for symbols. For example, the dollar sign '$' is supported in ASCII, but the British pound sign '£' is not. If you export your data in this format, you are likely to find that accented characters have been converted into their non-accented equivalents, and symbols like pound signs have become question marks. This format is rarely useful.
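The kind of conversion described above can be approximated in a few lines of Python. This is only an illustrative sketch and not Family Historian's actual export code; the helper name `to_ascii` is hypothetical.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Strip accents where possible; replace anything else non-ASCII with '?'."""
    # Split each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Whatever still has no ASCII equivalent becomes a question mark.
    return stripped.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("José paid £5 and $3"))  # Jose paid ?5 and $3
```

The accented 'é' survives as a plain 'e' and the dollar sign is untouched, but the pound sign is lost, which is why ASCII is a poor choice for anything other than plain English text.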

Technical Note

'BOM' is an acronym which stands for 'Byte Order Mark'. It is a short marker at the very start of a file that identifies the encoding and, for UTF-16, the byte order. BOMs may be used with UTF-8 and Unicode (UTF-16), but are optional. It is normal in Windows to use them, but they can cause problems with some applications that aren't expecting them to be there.
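As a minimal sketch of how a program might recognise a BOM, the Python function below looks at the first bytes of a file's contents. The helper name `detect_bom` is hypothetical, and only the encodings discussed on this page are checked; a file without a BOM simply yields no result, which is why some applications need to be told the encoding in another way.

```python
import codecs

# BOM byte sequences for the encodings covered on this page.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),                    # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16 Little Endian"), # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16 Big Endian"),    # FE FF
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom("0 HEAD".encode("utf-8-sig")))  # utf-8-sig writes a BOM
print(detect_bom("0 HEAD".encode("utf-8")))      # plain utf-8 does not
```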

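'Little Endian' and 'Big Endian' are terms which refer to the order in which the bytes of each multi-byte value are stored within a file. Little Endian stores the least significant byte first; Big Endian stores the most significant byte first.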
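The effect of byte order is easiest to see on a single character. The Python sketch below encodes 'é' (Unicode code point U+00E9) in each of the four UTF-16 combinations of byte order and BOM. The helper name `encode_utf16` is hypothetical and is shown only to illustrate the byte layouts, not how Family Historian writes its files.

```python
import codecs


def encode_utf16(text: str, big_endian: bool = False, with_bom: bool = True) -> bytes:
    """Encode text as UTF-16 with the chosen byte order and optional BOM."""
    codec = "utf-16-be" if big_endian else "utf-16-le"
    bom = b""
    if with_bom:
        bom = codecs.BOM_UTF16_BE if big_endian else codecs.BOM_UTF16_LE
    return bom + text.encode(codec)


print(encode_utf16("é").hex(" "))                                   # ff fe e9 00
print(encode_utf16("é", with_bom=False).hex(" "))                   # e9 00
print(encode_utf16("é", big_endian=True).hex(" "))                  # fe ff 00 e9
print(encode_utf16("é", big_endian=True, with_bom=False).hex(" "))  # 00 e9
```

The two bytes of the character are the same in both cases; only their order changes, and the BOM (when present) tells the reader which order to expect.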

'UTF-16' and 'UTF-8' are alternative encodings for Unicode.