Questions tagged [encoding]
Encoding is a set of rules used to represent data in a form that can be stored and transmitted to another process or system. Character encoding (e.g. Windows-1252, ISO-8859-1, UTF-8, UTF-16) refers to the way character data is represented as a series of bytes. Binary encoding (e.g. Base64) refers to the way binary data is transformed into a series of characters.
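As a rough illustration of the two senses of "encoding" described above, the following minimal sketch uses only Python's standard library. The sample character and the variable names are chosen for illustration and do not come from any question on this page.

```python
import base64

text = "ö"  # U+00F6, an arbitrary sample character (assumption for illustration)

# Character encoding: the same character maps to different byte sequences.
utf8_bytes = text.encode("utf-8")         # two bytes in UTF-8
latin1_bytes = text.encode("iso-8859-1")  # one byte in ISO-8859-1
cp1252_bytes = text.encode("cp1252")      # one byte in Windows-1252
utf16_bytes = text.encode("utf-16-le")    # two bytes in UTF-16 (little-endian, no BOM)

# Decoding bytes with the wrong character encoding produces garbled text (mojibake).
mojibake = utf8_bytes.decode("cp1252")

# Binary encoding: arbitrary bytes are transformed into ASCII characters and back.
b64_text = base64.b64encode(utf8_bytes).decode("ascii")
round_trip = base64.b64decode(b64_text)

print(utf8_bytes, latin1_bytes, cp1252_bytes, utf16_bytes)
print(mojibake, b64_text, round_trip == utf8_bytes)
```

Many of the questions below come down to the mismatch shown by the mojibake line: bytes written under one character encoding are read back under another.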
140 questions
3
votes
2
answers
1k
views
What is the collation used while comparing Unicode string literals in SQL Server 2019?
My understanding is that the collation for comparing Unicode string literals is determined by the database collation.
My database is using SQL_Latin1_General_CP1_CI_AS collation.
When I compare N'ß' ...
1
vote
1
answer
756
views
Upgrading MySQL utf8mb3 to utf8mb4 - how to replicate documentation behaviour?
I'm looking to upgrade MySQL fields from mb3 to mb4 in a large database and from reading the documentation I understand that the biggest problem is mb3 stores stuff in 3 bytes, but mb4 stores stuff in ...
2
votes
1
answer
160
views
Insert gives error code 1366 even though charset is utf8mb4
I am trying to manually copy a table from one server to another.
I used mysqldump to export the table. The import fails with the following error:
ERROR 1366 (22007) at line 6: Incorrect string value: '...
1
vote
0
answers
60
views
Issue with hash storage resulting from password_verify() output
Consider the following code:
$random_token = random_bytes(32);
$token_hash = password_hash(
$random_token,
PASSWORD_DEFAULT
);
$token_base64 = sodium_bin2base64(
$random_token,
...
2
votes
1
answer
1k
views
Postgres encoding issue - find values that have no equivalent in (client) encoding
I have a PostgreSQL database that internally uses UTF-8 encoding. Some clients that connect to the database use LATIN2 (ISO 8859-2) client encoding. I can't change the ...
1
vote
1
answer
1k
views
How to display non-English character correctly in db2cmd?
On Db2 v11.5.7 on Linux/x86_64 I have a UTF-8 database.
Executing db2 get db cfg for test1 returns:
Database code page = 1208
Database code set = UTF-8
On my Windows 10 computer in ...
2
votes
1
answer
466
views
How can I detect failure in CONVERT( tbl AS utf8)?
MySQL provides the CONVERT(`tbl` AS utf8) expression to transcode text from one charset to another.
Sometimes that conversion cannot succeed, because the destination charset does not include ...
2
votes
1
answer
672
views
How can I detect double-encoded MySQL columns and rows, and validate the repair?
My database provider, bless their hearts, migrated our MySQL databases to another server recently, and introduced double-encoding of UTF-8 data via Latin1 into our text data. Strings like 'emdash—here'...
0
votes
1
answer
40
views
Synapse column store datetime encoding
I have a table in an Azure Synapse database. This table has a clustered columnstore index. It has a datetime column. The minimum value in this column is 2001-03-27 00:00:00 and the maximum is 2022-12-...
0
votes
0
answers
66
views
Create database with specific encoding from shell script
I want to create a database from a shell script. I added this line to my script:
psql -c 'CREATE DATABASE template1 WITH OWNER = postgres ENCODING = 'LATIN1' TABLESPACE = pg_default LC_COLLATE = "...
0
votes
1
answer
965
views
Import postgresql database with Latin encoding
I'm going to upgrade from PostgreSQL 9 to PostgreSQL 11; the encoding used is Latin.
I've exported the database using this command pg_dumpall | gzip > /tmp/db.sql.gz
I extracted the exported ...
1
vote
1
answer
1k
views
Unknown charset: utf8mb4
Context: I send queries to my MariaDB server(Ver 15.1 Distrib 10.3.34-MariaDB) through a Python script and mysql.connector module.
(I don't know which one it is among the 3 in my list:
mysql-...
1
vote
0
answers
149
views
Data in one or more tables corrupt after recovering tables with alter table discard/import tablespace
I accidentally dropped my schema and had to recover all the tables per the steps in this post: Recover schema and the data in it from accidentally deleted schema
However, I notice that all of the ...
0
votes
2
answers
128
views
Where can we find a copy of the "sakila" toy database with uncorrupted city names?
The "sakila" toy database which is available by download as https://downloads.mysql.com/docs/sakila-db.zip from https://dev.mysql.com/doc/index-other.html contains a city table whose city ...
0
votes
2
answers
416
views
How does MySQL store data and its encoding?
I have a few databases in latin1 and I will migrate them to utf8. I have some characters like 'œ' that are utf8 characters.
I want to know how MySQL stores the encoding.
Because if it stores utf8 characters ...
4
votes
2
answers
9k
views
How to handle short UUIDs with Postgres?
I see that many web services (Stripe comes to mind) use a special encoding for their UUIDs. Instead of the usual encoding a44521d0-0fb8-4ade-8002-3385545c3318 they are going to be encoded using a ...
1
vote
1
answer
713
views
Why can't binary data be inserted/displayed as ones and zeroes?
If I have a column of type binary or varbinary, I imagine the data as a sequence of bits. For example, it makes sense to me that 01001 (as a base 2 number) could be a valid value in a binary(5) column....
9
votes
3
answers
4k
views
Is it possible to use OPENROWSET to import fixed width UTF8 encoded files?
I have an example data file with following contents and saved with UTF8 encoding.
oab~opqr
öab~öpqr
öab~öpqr
The format of this file is fixed width with columns 1 to 3 each being allocated 1 ...
1
vote
0
answers
543
views
Convert any string to url valid percent encoding in BigQuery
I am trying to convert any string with any set of special characters into a valid url of the format below.
In BigQuery
Example:
/artwork-v2/-̴̕ι-̶͔͛n̴e̷p̸u̴̒n̵uś̵̥o̵̙̾rt̷͗um̶̹͐-20380
encodes to:
/...
0
votes
1
answer
736
views
How can I know if there is data loss when converting mysql character set(s)?
I'm converting a large (70G) legacy mysql database that is mostly utf8 (but with a sprinkling of other encodings in fields) to one that is (as uniformly as possible) utf8mb4 (utf8mb4_unicode_ci).
The ...
2
votes
1
answer
2k
views
What is the impact of converting latin1/latin1_swedish_ci to utf8mb4/utf8mb4_unicode_ci?
I was facing some issues with character encoding. Those were resolved by updating the CHARACTER and COLLATE for some columns in the table. So my concern is:
Is this conversion safe?
Or can this ...
2
votes
1
answer
526
views
Unable to enter characters in MariaDB
I have just switched from MySQL to MariaDB and am running into a very silly problem: I cannot enter any extended characters in the database (for example, ö ä or å).
The system is utf8, and I tried ...
5
votes
1
answer
3k
views
Msg 6355 "Conversion of one or more characters from XML to target collation impossible" when querying sys.dm_exec_query_plan
I like to find missing indexes on the go, looking at the execution plans!
It can potentially give me an indication where further to look at if I want to improve something that is currently running.
...
2
votes
2
answers
2k
views
How to create a new column in SELECT CASE when string contains Arabic words?
I have a problem with a SELECT CASE WHEN statement. I would like to add a new column in the SELECT CASE WHEN statement; I got the result, but with ????? because I have set the column to Arabic words, I ...
3
votes
1
answer
1k
views
Does a huge key length value for a multibyte column affect the index performance?
When I look at the EXPLAIN results, the key len value is always calculated based on the actual column length multiplied by the maximum number of bytes for the chosen encoding. Say, for a varchar(64) ...
1
vote
1
answer
2k
views
mysqldump dumps different data with and without --no-create-info
mysqldump dumps different representations of data when called with/without --no-create-info.
Test case
First, create the test table and populate it with interesting data.
CREATE TABLE `test` (
`...
0
votes
1
answer
2k
views
How to fix double-encoded UTF8 characters in postgres
I have a dataset (shapefile) with the same problem as the post below:
https://stackoverflow.com/questions/11436594/how-to-fix-double-encoded-utf8-characters-in-an-utf-8-table
"A previous LOAD ...
1
vote
1
answer
3k
views
PostgreSQL pg_dump -E encoding option not working
I have a UTF8 database qdb and I want to back it up to a plain file using the same UTF8 encoding. I am using pg_dump as I don't have pgAdmin working now. I however can't get pg_dump to output a UTF8-...
1
vote
1
answer
2k
views
How to escape special characters in MySQL
When I do select * .. | mysql ... > /tmp/file from a table with text, there are some problematic characters that prevent me from loading it to a different db using copy (postgres) or load into (...
1
vote
0
answers
2k
views
PSQL console encoding
How can I have a working PSQL console using UTF8 encoding under Windows?
I have a Windows server and client. The Postgres 12 database contains tables with content in multiple languages (ex: English, ...
4
votes
2
answers
1k
views
Does MySQL 8 ASCII vs utf8mb4_0900_ai_ci size differ when only using ASCII characters?
If I use only ASCII characters, will VARCHAR (255) with utf8mb4_0900_ai_ci be larger on disk than VARCHAR (255) using ASCII?
2
votes
1
answer
13k
views
Saving images as base64 encoded strings, why is it bad?
I've seen this on one of the production databases I've come across and these images apparently cover a large portion of their DB. After researching a lot I couldn't really find a lot of good answers ...
1
vote
1
answer
6k
views
Compress JSON String Stored in PostgreSQL, such as MessagePack?
JSON strings are currently being stored in a PostgreSQL 11 table in a text field. For example, a row can have the text field asks containing the string:
{"0.000295":1544.2,"0.000324":1050,"0.000325":...
1
vote
0
answers
358
views
MariaDB REGEXP_REPLACE Invalid utf8 byte sequence
I'm working with a fairly old (15 years) database. It started as a MySQL3, was upgraded several times, the frontends had problems with proper encoding, the whole database was converted to utf8mb4 (...
1
vote
2
answers
2k
views
Oracle to T-SQL OPENQUERY special character conversion issues
I'm struggling to figure out where the character encoding issue on my Linked server may be coming from here. The ZPDT_PAT_ALPHA column should have a degrees symbol at the end, as shown by the DUMP.
...
2
votes
2
answers
560
views
charindex thrown off by extended characters
I've got a column with filenames stored in an nvarchar(255). I'm trying to parse out the file extension using
reverse( left( reverse(filename), charindex('.', reverse(filename) ) -1 ) )
This works ...
2
votes
0
answers
160
views
Create/Edit SSIS Derived Column Task From Text/Code for Large ETL
I have to insert a large UTF-8 encoded flat file into a 1252-encoded SQL Server. In order to do so, I am using SSIS to type cast each column. Is there a way, other than copy pasting through all 400 ...
7
votes
1
answer
17k
views
Query to find rows containing ASCII characters in a given range
I am using some scripts from another topic, but the accepted answer isn't working for all my data scenarios. I would have asked my question on the original How to check for Non-Ascii Characters post, ...
4
votes
2
answers
843
views
Different characters, same ASCII code?
I have this query that throws two results:
SELECT id FROM table1 WHERE id like 'nm041033%'
nm0410331
nm0410331
And this slightly different query that throws only one result:
SELECT id FROM table1 ...
3
votes
1
answer
4k
views
Is there a MySQL character set and encoding that will allow for both emojis and accents?
I've got a database of terms that get added to by one group of users, and queried against by another.
I was running into problems when people would query for an emoji in the database and my React app ...
2
votes
2
answers
2k
views
Handling data encoding issues while loading data to SQL from Script (Notepad++)
I'm pretty sure this is not a SQL Server problem.
I already asked a question HERE with an awesome explanation, BUT I still can't explain to the guys where I work that it has nothing to do with SQL ...
20
votes
3
answers
32k
views
PostgreSQL: difference between collations 'C' and 'C.UTF-8'
In PostgreSQL, what is the difference between collations C and C.UTF-8?
Both show up in rows of pg_collation. Is it perhaps the case that C.UTF-8 is the same as C with encoding UTF-8 regardless or ...
0
votes
1
answer
10k
views
How can I search for a hex string in oracle?
There's a database record that is incorrect. The name field displays like this in a browser ( this is incorrect ):
Thcˇodore
And when I look at the record via SQL results, I see:
Thc\xCB\x87odore
I'...
6
votes
2
answers
2k
views
Byte ordering for multibyte characters in SQL Server versus Oracle
I am currently in the process of migrating data from Oracle to SQL Server and I'm encountering an issue trying to validate the data post-migration.
Environment Details:
Oracle 12 - AL32UTF8 ...
1
vote
2
answers
10k
views
Bulk insert not retaining special chars of UTF-8 Encoded txt file Sql Server 2008
I have a stored procedure that bulk imports a text file and inserts it into my database.
CREATE TABLE DBO.TEMP_STORE
(
ID nvarchar(max),
[MONTH] nvarchar(max),
[...
4
votes
3
answers
2k
views
Unicode storage of \u202b RLE and \u202c PDF in a Unicode-aware database?
I'm building a new product for toponyms and in it the Arabic shows kinda like this:
^IArabic^I<202b>ﺰﻤﺑﺎﺑﻮﻳ<202c>^I<202b>ﺞﻫﻭﺮﻳﺓ ﺰﻤﺑﺎﺑﻮﻳ<202c>$
Actually not quite. This is a ...
3
votes
1
answer
4k
views
How to insert a Unicode character verbose into a varchar DB?
I need to insert this character '●' into a VARCHAR column of a MSSQL database with collation set as SQL_Latin1_General_CP1_CI_AS (or at least mock what my Python + Windows MSSQL Driver might have done)...
3
votes
2
answers
4k
views
SQL Server 2019 UTF-8 Support Benefits
I'm already quite comfortable with using COMPRESS() and DECOMPRESS() in an internal forum software for our company (Currently in SQL Server 2017), but trying to make the database as efficient as ...
0
votes
1
answer
70
views
Users running ANSI scripts are having problems with special characters [closed]
Long story short: how can I fix users' environments to make them run our scripts using ANSI encoding?
The problem is, we send them scripts to run on their databases using ANSI encoding.
But some of ...
17
votes
1
answer
12k
views
Error starting SQL Server 2017 service. Error Code 3417
I have SQL Server 2017 installed on my computer. This is what SELECT @@VERSION returns:
Microsoft SQL Server 2017 (RTM-GDR) (KB4293803) - 14.0.2002.14 (X64) Jul 21 2018 07:47:45 Copyright (C) ...