Some files in my Git repository are Unicode tab-separated value (TSV) files. I know that each of them is encoded as either UTF-8 or UTF-16 at generation time.

For my Windows workstation, where I sometimes want to edit them in Excel (don't ask), I want to smudge them to UTF-16, no matter whether they arrive as UTF-8 or UTF-16.

In the other direction, I always want the repository's internal representation to be UTF-8. (I also want diffs to be meaningful, so the same “from anything to UTF-8” conversion applies to the diff attribute.)

Currently, my .gitattributes defines

*.tsv diff=winutf16 filter=winutf16

with the corresponding definitions in .git/config:

[filter "winutf16"]
    clean = iconv -f utf-16 -t utf-8
    smudge = iconv -f utf-8 -t utf-16
    required
[diff "winutf16"]
    textconv = iconv -f utf-16 -t utf-8

Since I know the input is always one of these two encodings, I should be able to detect it with something like -f $(file -b --mime-encoding file.tsv). But that requires a file name, whereas the docs state:

Upon checkout, when the smudge command is specified, the command is fed the blob object from its standard input, and its standard output is used to update the worktree file. Similarly, the clean command is used to convert the contents of worktree file upon checkin.

so all I get is a blob to stdin, once.
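
The closest thing I can come up with is to buffer stdin into a temporary file and inspect that. Here is a rough sketch of what I mean, assuming the UTF-16 files always start with a byte-order mark. The name to_utf8 is made up, and the sketch sniffs the first two bytes with od rather than calling file:

```shell
#!/bin/sh
# Hypothetical clean-filter helper: read the blob from stdin and write UTF-8
# to stdout, whether the input is UTF-8 or UTF-16.
# Assumption: UTF-16 input starts with a byte-order mark (FF FE or FE FF).
to_utf8() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"    # stdin can only be read once, so buffer it first

    # First two bytes as lowercase hex, e.g. "fffe" for a little-endian BOM.
    bom=$(od -An -tx1 -N2 "$tmp" | tr -d ' \n')

    case "$bom" in
        fffe|feff) iconv -f utf-16 -t utf-8 "$tmp" ;;  # BOM tells iconv the byte order
        *)         cat "$tmp" ;;                       # no UTF-16 BOM: pass through as UTF-8
    esac
    status=$?

    rm -f "$tmp"
    return "$status"
}
```

A script file would end with a bare to_utf8 call, and clean (and, with the diff driver's file argument fed to it, textconv) would point at that script instead of at iconv directly. It works in principle, but the temporary file feels clunky.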

Is there a clean way to do this on Windows without installing things beyond what is already implied?
