Escape characters within clickhouse client terminal - character-encoding

I'm using the clickhouse CLI in an xterm terminal with a bash shell on Red Hat EL6.
The output is unreadable due to terminal escape sequences.
For instance:
SELECT count(*)
FROM system.tables
ââcount()ââ
â 35 â
âââââââââââ
1 rows in set. Elapsed: 0.002 sec.
Things get better when I use the --format=PrettySpace option, but any NULL values are still unreadable:
SELECT DISTINCT ont_index
FROM port_status_events
WHERE isNull(ont_index) OR (ont_index < 2)
ORDER BY ont_index ASC NULLS FIRST
ââont_indexââ
â á´ºáµá´¸á´¸ â
â 0 â
â 1 â
âââââââââââââ
3 rows in set. Elapsed: 0.003 sec. Processed 11.57 thousand rows, 23.13 KB (3.50 million rows/s., 6.99 MB/s.)
Is there a way to tell the client I'm using a different type of terminal?

As @Thomas Dickey correctly said, this has nothing to do with terminal escape sequences; it is a character encoding problem.
I changed my PuTTY settings to UTF-8 and everything works correctly now.
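For anyone wondering where the runs of â come from, here is a small Python sketch. It assumes the terminal was decoding the client's UTF-8 output as Latin-1 (the original PuTTY setting is not stated, so that part is a guess), and the helper name misread is made up for illustration.

```python
# The Pretty formats draw tables with Unicode box-drawing characters
# and render NULL as small superscript letters; both are multi-byte in UTF-8.
border = "\u250c\u2500count()\u2500\u2510"   # top border of the count() table
null_marker = "\u1d3a\u1d41\u1d38\u1d38"     # the superscript NULL marker


def misread(text, wrong_encoding="latin-1"):
    """Encode as UTF-8, then decode with a single-byte encoding,
    which is what a terminal set to the wrong charset effectively does.
    The latin-1 default is an assumption, not taken from the question."""
    return text.encode("utf-8").decode(wrong_encoding)


garbled = misread(border)
# Every box-drawing character becomes 'â' plus two invisible control characters.
print(repr(garbled))

# The bytes themselves were never damaged, so the mix-up is reversible.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored)
```

Each three-byte character starts with the byte 0xE2 or 0xE1, which Latin-1 shows as â or á. That matches the pattern in the output above, and it is why switching the terminal to UTF-8 fixes it without touching the client.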

Related

Regular expression to validate field structure

I would like to write a regular expression on Linux that, using grep, lets me verify that a field contains 15 digits and that the digit in the fifth position (counting from the left) is either a 5 or a 6.
I have managed to express the requirement that it contains at most 15 digits, but I cannot work out how to require that the fifth position is a 5 or a 6. So far I have:
grep -E "^[0-9]{1,15}"
Any idea?
For exactly 15 digits, where the fifth position is either 5 or 6:
grep -E "^[0-9]{4}[56][0-9]{10}$"
^ Start of string
[0-9]{4} Match 4 digits
[56] Match either 5 or 6
[0-9]{10} Match 10 digits
$ End of string
To match the first 5 characters followed by 0 to 10 more digits, allowing a partial match such as 123462222233333 in 12346222223333344444:
grep -Eo "^[0-9]{4}[56][0-9]{0,10}"
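If you want to sanity-check both patterns outside grep, a quick Python sketch along these lines should do. The function names are invented for this example, and Python's re module stands in for grep -E here, which is fine for these patterns since they only use syntax common to both.

```python
import re

# Exactly 15 digits with a 5 or 6 in the fifth position.
# fullmatch() plays the role of the ^...$ anchors in the grep version.
EXACT = re.compile(r"[0-9]{4}[56][0-9]{10}")

# Leading part only: 4 digits, a 5 or 6, then up to 10 more digits (like grep -Eo).
PREFIX = re.compile(r"^[0-9]{4}[56][0-9]{0,10}")


def is_valid_field(value):
    """True if value is exactly 15 digits and the fifth digit is 5 or 6."""
    return EXACT.fullmatch(value) is not None


def leading_match(value):
    """Return the matched prefix, or None if the field does not start correctly."""
    m = PREFIX.match(value)
    return m.group(0) if m else None


print(is_valid_field("123456789012345"))
print(is_valid_field("123476789012345"))
print(leading_match("12346222223333344444"))
```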

Ansible regex_findall multiple strings

Cisco IOS routers, doing a "dir", and I want to grab all file names with ".bin" in the name.
Example string:
Directory of flash0:/
1 -rw- 95890300 May 24 2015 11:27:22 +00:00 c2900-universalk9-mz.SPA.153-3.M5.bin
2 -rw- 68569216 Feb 8 2019 20:15:26 +00:00 c3900e-universalk9-mz.SPA.151-4.M10.bin
3 -rw- 46880 Oct 25 2017 19:08:56 +00:00 pdcamadeusrtra-cfg
4 -rw- 600 Feb 1 2019 19:36:44 +00:00 vlan.dat
260153344 bytes total (95637504 bytes free)
I've figured out how to pull "bin", but I can't figure out how to pull the whole filename (starting with " c", ending in "bin"), because I want to then use the values and delete unwanted files.
I'm new to programming, so the regex examples are a little confusing.
You can use this regex
^[\w\W]+?(?=(c.*\.bin))\1$
^ - Start of string.
[\w\W]+? - Match anything one or more times (lazy mode).
(?=(c.*\.bin)) - Positive lookahead: match c followed by anything followed by \.bin (group 1).
\1 - Match group 1.
$ - End of string.
To match the filenames (here they start with a c) that follow whitespace or sit at the start of the string, you might use a negative lookbehind (?<!\S) to check that what is on the left is not a non-whitespace character.
Then match either one or more non-whitespace characters with \S+, or list the allowed characters in a character class such as [\w.-]+. After that, match a literal dot \. followed by bin.
At the end you might use a word boundary \b to prevent bin from being part of a larger word:
(?<!\S)[\w.-]+\.bin\b
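As far as I know, Ansible's regex_findall filter is built on Python's re.findall, so the pattern can be tried in plain Python before putting it in a playbook. This is only a sketch against the sample dir output from the question; DIR_OUTPUT and BIN_FILE are names made up for the example.

```python
import re

# Sample "dir" output copied from the question.
DIR_OUTPUT = """Directory of flash0:/
1 -rw- 95890300 May 24 2015 11:27:22 +00:00 c2900-universalk9-mz.SPA.153-3.M5.bin
2 -rw- 68569216 Feb 8 2019 20:15:26 +00:00 c3900e-universalk9-mz.SPA.151-4.M10.bin
3 -rw- 46880 Oct 25 2017 19:08:56 +00:00 pdcamadeusrtra-cfg
4 -rw- 600 Feb 1 2019 19:36:44 +00:00 vlan.dat
260153344 bytes total (95637504 bytes free)
"""

# (?<!\S)  : the name must start the string or follow whitespace
# [\w.-]+  : word characters, dots and hyphens
# \.bin\b  : a literal ".bin" that is not part of a longer word
BIN_FILE = re.compile(r"(?<!\S)[\w.-]+\.bin\b")

# findall returns every match, not just the first one.
bin_files = BIN_FILE.findall(DIR_OUTPUT)
for name in bin_files:
    print(name)
```

Because the pattern has no capturing group, findall returns the full filenames as a flat list, which is the shape you want for looping over them in a later task.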
Thank you Code Maniac!
Your code finds one instance, and I needed to find all. Using what you gave me plus messing around with some other examples, I found this to work:
binfiles="{{ dir_response.stdout[0] | regex_findall('\b(?=(c.*.bin))\b') }}"
Now I get this:
TASK [set_fact] ********************************************************************************************************
task path: /export/home/e130885/playbooks/ios-switch-upgrade/ios_clean_flash.yml:16
Tuesday 12 February 2019 08:29:58 -0600 (0:00:00.350) 0:00:03.028 ******
ok: [10.35.91.200] => changed=false
ansible_facts:
binfiles:
- c2900-universalk9-mz.SPA.153-3.M5.bin
- c3900e-universalk9-mz.SPA.151-4.M10.bin
- c2800nm-adventerprisek9-mz.151-4.M12a.bin
Onto the next task of figuring out how to use each element. Thank you!

MemSQL load data infile does not support hexadecimal delimiter

According to this, the MySQL load data infile command works well with a hexadecimal delimiter such as X'01' or, in my case, X'1e'. But the same load data infile command can't be run on MemSQL.
I tried specifying various forms of the same delimiter \x1e, like:
'0x1e' or 0x1e
X'1e'
'\x1e' or 'x1e'
None of the above work; they throw either a syntax error or another error, as shown below.
Here it looks as if the delimiter can't be resolved correctly:
mysql> load data local infile '/container/data/sf10/region.tbl.hex' into table REGION CHARACTER SET utf8 fields terminated by '\x1e' lines terminated by '\n';
ERROR 1261 (01000): Row 1 doesn't contain data for all columns
This is the syntax error:
mysql> load data local infile '/container/data/sf10/region.tbl.hex' into table REGION CHARACTER SET utf8 fields terminated by 0x1e lines terminated by '\n';
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '0x1e lines terminated by '\n'' at line 1
mysql>
The data is actually delimited by the non-printable character \x1e, with lines terminated by a regular \n. cat -A shows the delimiter as ^^, so the delimiter should be correct.
$ cat -A region.tbl.hex
0^^AFRICA^^lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to $
1^^AMERICA^^hs use ironic, even requests. s$
Is there a correct way to use hex values as a delimiter? I can't find this information in the documentation.
For the purpose of comparison, hex delimiter (0x1e) can work well on MySQL:
mysql> load data local infile '/tmp/region.tbl.hex' into table region CHARACTER SET utf8 fields terminated by 0x1e lines terminated by '\n';
Query OK, 5 rows affected (0.01 sec)
Records: 5 Deleted: 0 Skipped: 0 Warnings: 0
MemSQL has supported hex delimiters since 6.7, in the form shown in the last code block of your question. Prior to that, you would need the literal 0x1e character inside the quoted delimiter in your SQL string, which is annoying to do from a CLI. If you're on an older version you may need to upgrade.
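If upgrading is not an option, one way around the CLI awkwardness is to build the statement in a script, where putting the raw byte into the string is easy. The sketch below only constructs the SQL text; build_load_statement is a hypothetical helper, it does no escaping of the path or table name, and how you send the statement to the server is up to your client library.

```python
FIELD_SEP = "\x1e"  # ASCII record separator, the byte cat -A displays as ^^


def build_load_statement(path, table):
    """Return a LOAD DATA statement with the raw 0x1e byte inside the quotes.

    Hypothetical helper for illustration: path and table are interpolated
    as-is, so only use it with trusted values.
    """
    return (
        f"load data local infile '{path}' into table {table} "
        f"character set utf8 fields terminated by '{FIELD_SEP}' "
        "lines terminated by '\\n'"
    )


stmt = build_load_statement("/container/data/sf10/region.tbl.hex", "REGION")
print(repr(stmt))  # repr() makes the embedded \x1e visible
```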

GIZA++ :Forbidden zero sentence length 0

I have been using GIZA++ for sentence translation. When I ran it on a test dataset, it displayed the error "ERROR: Forbidden zero sentence length 0". Is there any way to avoid this error?
I had the same problem with the en-vi corpus. (English-Vietnamese)
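This happens because your corpus contains sentences that are too long, or because it is not clean (for example, it has empty lines).
You should clean up your corpus data.
The Moses clean-corpus-n.perl script does this; the command below keeps only sentence pairs whose length is between 1 and 80 tokens:
~/mosesdecoder/scripts/training/clean-corpus-n.perl ~/corpus/train en vi ~/corpus/train.clean 1 80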
Or you can adjust manually.
Try to cut each line down to fewer than 100 characters or 80 words.
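If you go the manual route, the filtering is simple enough to script. The Python sketch below is a simplified stand-in, not a reimplementation of clean-corpus-n.perl: it only drops sentence pairs where either side is empty or longer than the limit, counting whitespace-separated tokens. The function name and defaults are chosen for this example.

```python
def clean_parallel(src_lines, tgt_lines, min_len=1, max_len=80):
    """Keep only sentence pairs where both sides have min_len..max_len tokens.

    Pairs are filtered together so the two sides stay aligned line by line.
    """
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src.strip(), tgt.strip()))
    return kept


english = ["hello world\n", "\n", "good morning\n"]
vietnamese = ["xin chao\n", "mot dong\n", "   \n"]
print(clean_parallel(english, vietnamese))
```

The important detail is that a blank line on either side removes the whole pair, since a zero-length sentence on one side is what triggers the GIZA++ error.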

Huffman tree, is this correct?

I'm trying to create a correct Huffman tree and was wondering if this one is correct. The top number is the frequency/weight and the bottom number is the ASCII code. The string is
"hhiiiisssss". If I entered this into a text file, there would be only one LF, correct? I'm not sure why my program is reading in two.
              14
              -1
             /  \
            9    5
           -1    s(115)
          /  \
         5    4
        -1    i(105)
       /  \
      3    2
 h(104)    LF(10)
In a text file there would only be one LF if there is only one line of text, correct.
Something else is wrong though. There are only two 'h' in your string but your tree shows three, and a total of 14 characters. I'm guessing it's a typo?
Aside from that it looks OK, and your Huffman codes would be (depending on whether you pick '0' for left or right):
s: 1
i: 01
LF: 001
h: 000
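To double-check the shape of the tree independently of your program, here is a minimal Python sketch that computes Huffman code lengths with a heap. It assumes the input is the string from the question followed by a single LF, so the counts are 2 h, 4 i, 5 s and 1 LF rather than the numbers in your tree; huffman_code_lengths is a name made up for this example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    tiebreak = count()  # unique sequence number so equal weights never compare dicts
    # Each heap entry is (weight, sequence number, {symbol: depth in this subtree}).
    heap = [(w, next(tiebreak), {sym: 0}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them moves one level down.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


lengths = huffman_code_lengths("hhiiiisssss\n")
for sym, bits in sorted(lengths.items(), key=lambda kv: kv[1]):
    print(repr(sym), bits)
```

With those counts the lengths come out as 1 bit for s, 2 for i and 3 each for h and LF, which is the same shape as the codes listed above. The exact bit patterns still depend on which side you label 0.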
