From this, MySQL's LOAD DATA INFILE command works well with a hexadecimal delimiter such as X'01' or, in my case, X'1e'. But the same LOAD DATA INFILE statement fails on MemSQL.
I tried specifying the same \x1e delimiter in various forms, such as:
'0x1e' or 0x1e
X'1e'
'\x1e' or 'x1e'
None of the above works; each form throws either a syntax error or another error like the ones below.
This one suggests the delimiter isn't being resolved correctly:
mysql> load data local infile '/container/data/sf10/region.tbl.hex' into table REGION CHARACTER SET utf8 fields terminated by '\x1e' lines terminated by '\n';
ERROR 1261 (01000): Row 1 doesn't contain data for all columns
This is syntax error:
mysql> load data local infile '/container/data/sf10/region.tbl.hex' into table REGION CHARACTER SET utf8 fields terminated by 0x1e lines terminated by '\n';
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '0x1e lines terminated by '\n'' at line 1
mysql>
The data really is delimited by the non-printable character \x1e, with lines terminated by a regular \n. With cat -A the delimiter characters show up as ^^, so the delimiter itself should be correct.
$ cat -A region.tbl.hex
0^^AFRICA^^lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to $
1^^AMERICA^^hs use ironic, even requests. s$
Is there a correct way to use hex values as a delimiter? I can't find this information in the documentation.
For comparison, the hex delimiter (0x1e) works fine on MySQL:
mysql> load data local infile '/tmp/region.tbl.hex' into table region CHARACTER SET utf8 fields terminated by 0x1e lines terminated by '\n';
Query OK, 5 rows affected (0.01 sec)
Records: 5 Deleted: 0 Skipped: 0 Warnings: 0
MemSQL has supported hex delimiters since version 6.7, in the form shown in the last code block of your question. Prior to that, you would need the literal quoted 0x1e character in your SQL string, which is annoying to do from a CLI. If you're on an older version, you may need to upgrade.
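On an older version, one workaround (a sketch, not an official recipe; the database name and connection options are placeholders) is to let the shell inject the literal 0x1e byte into the statement, for example with bash's $'\x1e' quoting:
mysql --local-infile=1 your_database -e "load data local infile '/container/data/sf10/region.tbl.hex' into table REGION character set utf8 fields terminated by '"$'\x1e'"' lines terminated by '\n'"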
I am working on a Sybase ASE database and would like to know the character encoding (UTF8 or ASCII or whatever) used by the database.
What's the command to show which character encoding the database uses?
The command you're looking for is actually a system stored procedure:
1> sp_helpsort
2> go
... snip ...
Sort Order Description
------------------------------------------------------------------
Character Set = 190, utf8
Unicode 3.1 UTF-8 Character Set
Class 2 Character Set
Sort Order = 50, bin_utf8
Binary sort order for the ISO 10646-1, UTF-8 multibyte encodin
g character set (utf8).
... snip ...
From this output we see this particular ASE dataserver has been configured with a default character set of utf8 and default sort order of binary (bin_utf8). This means all data is stored as utf8 and all indexing/sort operations are performed using a binary sort order.
Keep in mind that ASE can perform character set conversions (for reads and writes) based on the client's character set configuration, though whether a given conversion succeeds depends on the character sets involved (e.g., a client connecting with utf8 may find that many characters cannot be converted for storage in a dataserver whose default character set is iso_1).
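For example, the client character set can be chosen explicitly at connect time; this is only a sketch (server name, login, and password are placeholders), but it shows where the client side of the conversion is configured:
isql -S MYASE -U sa -P secret -J utf8
If a character has no mapping between the client and server character sets, the conversion fails as described above.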
With a query:
select
cs.name as server_character_set,
cs.description as character_set_description
from
master..syscharsets cs left outer join
master..sysconfigures cfg on
cs.id = cfg.value
where
cfg.config = 131
Example output:
server_character_set character_set_description
utf8 Unicode 3.1 UTF-8 Character Set
I am trying to import two CSV files, one with nodes and the other with relationships. In the relationships file, I have a column whose data needs to be split. I tried using --array-delimiter='|' but I get an error. I was hoping someone could help me with this issue.
node.csv
identifier:ID,name:LABEL
1,apple
2,ball
3,cat
rel.csv
source_ids:START_ID,target_ids:END_ID,type:TYPE,data
1,2,connection,
2,3,relation,test1|test2
3,1,connection,test1
1,3,relation,test4|test3|tet6
If I use the command below, I get data with | in it; it does not split the data field.
neo4j-admin import --verbose --ignore-extra-columns=true --nodes C:/Users/Sam/Documents/node.csv --relationships C:/Users/Sam/Documents/rel.csv.
I get the result in the format:
{
"data": "test4|test3|tet6"
}
What I want is :
{
"data": ["test4","test3","tet6"]
}
When I try:
neo4j-admin import --verbose --ignore-extra-columns=true --array-delimiter= "|" --nodes C:/Users/Sam/Documents/node.csv --relationships C:/Users/Sam/Documents/rel.csv.
I get an error:
Invalid value for option '--array-delimiter': cannot convert '' to char (java.lang.IllegalArgumentException: Unsupported character '')
[picocli WARN] Could not format 'Maximum memory that neo4j-admin can use for various data structures and caching to improve performance. Values can be plain numbers, like 10000000 or e.g. 20G for 20 gigabyte, or even e.g. 70%.' (Underlying error: Conversion = '.'). Using raw String: '%n' format strings have not been replaced with newlines. Please ensure to escape '%' characters with another '%'.
Thanks,
Sam
You have an extraneous space character in your command line.
Change:
--array-delimiter= "|"
to:
--array-delimiter="|"
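With the space removed and everything else kept exactly as in your command, the invocation looks like this:
neo4j-admin import --verbose --ignore-extra-columns=true --array-delimiter="|" --nodes C:/Users/Sam/Documents/node.csv --relationships C:/Users/Sam/Documents/rel.csv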
To define an array type, append [] to the type. Example:
source_ids:START_ID,target_ids:END_ID,type:TYPE,data:[]
Refer to the documentation.
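Applied to the rel.csv above, and using the explicit string[] form (an assumption that the data column holds strings), the file would become:
source_ids:START_ID,target_ids:END_ID,type:TYPE,data:string[]
1,2,connection,
2,3,relation,test1|test2
3,1,connection,test1
1,3,relation,test4|test3|tet6
With --array-delimiter="|", the data values are then split on | during import.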
I am trying to import a backup of a Firebase Real-Time Database from Google Cloud Storage into BigQuery and getting the following error:
Invalid field name "Name ". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long. Table: tabletest
I have tried a second dataset that returns the following error:
Invalid field name "-Kq4_0dsRwKfOGGxGoQv". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long. Table: tabletest
This is the second dataset:
{"users":{"someUserID_abc":{"tests":{"-Kq4_0dsRwKfOGGxGoQv":"07/02/2019 19:44:22","-Kq4_vUQTQ3b6gqMkfRL":"07/02/2019 19:48:20","-Kq4a84n9WMu3NGiE4qW":"07/02/2019 19:53:36"}}}}
In my initial (very large) dataset, there are lots of keys that were uniquely generated by Firebase and usually start with "-", which seems to trigger the BigQuery error.
My settings on the Create table screen are:
Source
Create table from: Google Cloud Storage
Select file from GCS bucket: myproject-backups/2019-07-03T02:23:34Z_myproject_data.json.gz
File format: JSON (Newline delimited)
Destination
Project name: myproject
Dataset name: database
Table type: Native table
Table name: tabletest
Schema
Auto detect
(checked) Schema and input parameters
After I click the "Create table" button on the "Create table" screen, I get the error above.
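For reference, those settings expressed as a bq CLI load would be roughly as follows (a sketch only; the project, dataset, table, and bucket names are taken from the settings above):
bq load --project_id=myproject --source_format=NEWLINE_DELIMITED_JSON --autodetect database.tabletest gs://myproject-backups/2019-07-03T02:23:34Z_myproject_data.json.gz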
I have tried different dataset and table names, adjusted the table type, and looked through each of the following for answers, without any luck:
https://github.com/metabase/metabase/issues/4087
https://cloud.google.com/bigquery/docs/schemas
https://github.com/bxparks/bigquery-schema-generator/issues/14
https://github.com/metabase/metabase/pull/4707
https://github.com/metabase/metabase/issues/3997
loading data from a datastore backup into big query throws invalid field name error
Based on here, I am guessing the error is that "Name" has a space in it (and might be from the database?), and I have lots of random keys generated by Firebase (which start with a leading "-"). It's a very large dataset, so I can't even unzip and open the initial gzip file without the program freezing (even a plain text editor).
EDIT:
I created the following script to gunzip the file, "clean" the data for all keys, and re-gzip it:
# remove spaces, replace leading dashmarks (replace - with the word 'dashmark'), add an underscore before numbers
file=$"myfile.json.gz"
gunzip "$file"
sed -e "s/Name /Name/g" -e "s/-/dashmark/g" -e "s/{\"\([0-9]+\)/{\"_\1/g" -e "s/,\"\([0-9]+\)/,\"_\1/g" -e "s/,\"\"/\,\"_\"/g" -e "s/{\"\"/\{\"_\"/g" < "${file%.gz}" | gzip -c > "${file%.gz}.gz"
This seems to eliminate the "Invalid field name..." errors, but introduces the following error:
Error while reading data, error message: Failed to parse JSON:
Unexpected token; Could not parse value; Could not parse value; Could
not parse value; Could not parse value; Could not parse value; Could
not parse value; Could not parse value; Could not parse value; Could
not parse value; Could not parse value; Could not parse value; Could
not parse value; Parser terminated before end of string
I am admittedly new to sed, so perhaps I typo'd or miswrote the script in a way that caused an invalid JSON object to be created?
Does anyone know how to remove spaces/special characters from all keys in the .gzip (or any other way to resolve this error so I can import the Firebase RTDB backup into BigQuery)?
If you copied the error message exactly, it appears that there is a space or some other whitespace character in the string "Name ". That whitespace character is invalid, as the error message is telling you. You'll have to dig through your data to figure out where exactly that invalid character is coming from.
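As a sketch (using the file name from your question; on some systems gunzip -c can replace zcat), you can inspect the compressed backup without fully extracting it, for example to look for keys that contain spaces:
zcat 2019-07-03T02:23:34Z_myproject_data.json.gz | head -c 2000
zcat 2019-07-03T02:23:34Z_myproject_data.json.gz | grep -o '"[^"]* [^"]*":' | sort -u | head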
This error can also be printed when you run
bq mk --table --schema xyz.json
And the JSON file is not present in the folder from which you ran the command.
BigQuery error in mk operation: Invalid field name "xyz.json". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 300 characters long.
It's really a file-not-found error, but the message makes you think there's a problem with your schema JSON.
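So it is worth confirming the schema file path before suspecting the schema itself; a sketch (mydataset.mytable is a placeholder, and the --schema form mirrors the command above):
ls xyz.json && bq mk --table --schema xyz.json mydataset.mytable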
I'm trying to sum a column in a medium-sized data file (15M rows), but I get the following error:
$> q -Ht 'select sum(value) from datafile.txt'
Error('field larger than field limit (131072)'
My search led to links suggesting a change to the default field size used by Python's csv parsing (csv.field_size_limit()); however, after checking with awk I verified that my file has no large fields.
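A check along those lines (a sketch, assuming tab-separated fields since -t is used above) reports the longest field in the file:
awk -F'\t' '{ for (i = 1; i <= NF; i++) if (length($i) > max) max = length($i) } END { print max }' datafile.txt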
never forget: cleanse your data before processing
I found that my data file is full of product names containing single and double quotes (single quotes for possessive names, and double quotes to represent inches). This causes the Python CSV parser to treat delimiters inside a quoted field as literal characters within the field, so the field keeps growing past the limit.
Do this:
sed s:\"::g data.txt > tmp ; sed s:\'::g tmp > data.txt
Terrible, terrible single/double quotes in data.
For some reason, the code below breaks at the \copy stage in psql as supplied with Greenplum:
\set tmp1 public.tmp1
DROP TABLE IF EXISTS :tmp1;
CREATE TABLE :tmp1 (new_id varchar(255), old_id BIGINT) DISTRIBUTED BY (old_id);
\echo :tmp1
\copy :tmp1 FROM 'file1.csv' WITH DELIMITER '|' HEADER CSV;
ERROR: syntax error at or near ":"
LINE 1: COPY :tmp1 FROM STDIN WITH DELIMITER '|' HEADER CSV;
How can you use a variable table name with the copy command in psql?
I don't think this has anything to do with Greenplum or an old PostgreSQL version. The \copy meta-command does not expand variables, as documented here:
Unlike most other meta-commands, the entire remainder of the line is always taken to be the arguments of \copy, and neither variable interpolation nor backquote expansion are performed in the arguments.
To work around this, you can build, store, and execute the command in multiple steps. Replace your \copy ... line with:
\set cmd '\\copy ' :tmp1 ' FROM ''file1.csv'' WITH DELIMITER ''|'' HEADER CSV'
:cmd
With this approach, you need to double (escape) the \ and ' characters within the embedded meta-command. Keep in mind that \set concatenates all arguments after the variable name into its value, so you need to quote the spaces between the arguments.
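For example (a small sketch with a made-up variable name), unquoted whitespace between the arguments is lost during concatenation, while quoted spaces survive:
\set demo one two
\echo :demo
onetwo
\set demo 'one ' 'two'
\echo :demo
one two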