How to make nntplib author name human readable? - character-encoding

python NNTPLib is giving me author name such as ,
"=?Utf-8?B?RGVubmlzIEJhc2hhbQ==?= < someone#someforum.com >"
(Quotes for clarity).
How do i encode this text in human readable format?

The double equals at the end was a dead giveaway. Base 64 uses that for padding strings that do not have an even multiple of 4 bytes.
In python:
print "RGVubmlzIEJhc2hhbQ==".decode('base_64')
In PHP:
<?php echo base64_decode("RGVubmlzIEJhc2hhbQ=="); ?>
Enjoy.
PS... Dennis Basham

Related

How to convert a formatted string into plain text

User copy paste and send data in following format: "𝕛𝕠𝕧π•ͺ π••π•–π•“π•“π•šπ•–"
I need to convert it into plain txt (we can say ascii chars) like 'jovy debbie'
It comes in different font and format:
ex:
'π‘±π’†π’π’Šπ’„π’‚ π‘«π’–π’ˆπ’π’”'
'π™ΆπšŽπšŸπš’πšŽπš•πš’πš— π™½πš’πšŒπš˜πš•πšŽ π™»πšžπš–πš‹πšŠπš'
Any Help will be Appreciated, I already refer other stack overflow question but no luck :(
Those letters are from the Mathematical Alphanumeric Symbols block.
Since they have a fixed offset to their ASCII counterparts, you could use tr to map them, e.g.:
"𝕛𝕠𝕧π•ͺ π••π•–π•“π•“π•šπ•–".tr("𝕒-𝕫", "a-z")
#=> "jovy debbie"
The same approach can be used for the other styles, e.g.
"π‘±π’†π’π’Šπ’„π’‚ π‘«π’–π’ˆπ’π’”".tr("𝒂-𝒛𝑨-𝒁", "a-zA-Z")
#=> "Jenica Dugos"
This gives you full control over the character mapping.
Alternatively, you could try Unicode normalization. The NFKC / NFKD forms should remove most formatting and seem to work for your examples:
"𝕛𝕠𝕧π•ͺ π••π•–π•“π•“π•šπ•–".unicode_normalize(:nfkc)
#=> "jovy debbie"
"π‘±π’†π’π’Šπ’„π’‚ π‘«π’–π’ˆπ’π’”".unicode_normalize(:nfkc)
#=> "Jenica Dugos"

Print 8 digits after dot

I have this number:
$total = '0.04149951583898188';
I want to diplay only the first 8 digits after the dot, like this: 0.0414995
How can I do this?
Here is my answer, guessing from your profile & the code snippet that you’re probably talking about PHP. Do not forget to tell us which language you want help with, we shouldn’t have to guess!
Have a look at PHP’s round() function:
<?php
$total = '0.04149951583898188';
echo round((float)$total, 8); // Cast the $total string variable to a type float, and round it
?>

How to convert .txt files to .xls files using informix 4GL codes

I got a question to be disscuss.I am working on INFORMIX 4GL programs. That programs produce output text files.This is an example of the output:
Lot No|Purchaser name|Billing|Payment|Deposit|Balance|
J1006|JAUHARI BIN HAMIDI|5285.05|4923.25|0.00|361.80|
J1007|LEE, CHIA-JUI AKA LEE, ANDREW J. R.|5366.15|5313.70|0.00|52.45|
J1008|NAZRIN ANEEZA BINTI NAZARUDDIN|5669.55|5365.30|0.00|304.25|
J1009|YAZID LUTFI BIN AHMAD LUTFI|3180.05|3022.30|0.00|157.75|
From that output text files(.txt) files, we can open it manually from the excel(.xls) files.From this case, is that any 4gl codes or any commands that we can use it for open the text files in microsoft excell automatically right after we run the program?If there any ideas,please share with me... Thank You
The output shown is in the normal Informix UNLOAD format, using the pipe as a delimiter between fields. The nearest approach to this for Excel is a CSV file with comma-separated values. Generating one of those from that output is a little fiddly. You need to enclose fields containing a comma inside double quotes. You need to use commas in place of pipes. And you might have to worry about backslashes too.
It is a moot point whether it is easier to do the conversion in I4GL or whether to use a program to do the conversion. I think the latter, so I wrote this script a couple of years ago:
#!/usr/bin/env perl
#
# #(#)$Id: unl2csv.pl,v 1.1 2011/05/17 10:20:09 jleffler Exp $
#
# Convert Informix UNLOAD format to CSV
use strict;
use warnings;
use Text::CSV;
use IO::Wrap;
my $csv = new Text::CSV({ binary => 1 }) or die "Failed to create CSV handle ($!)";
my $dlm = defined $ENV{DBDELIMITER} ? $ENV{DBDELIMITER} : "|";
my $out = wraphandle(\*STDOUT);
my $rgx = qr/((?:[^$dlm]|(?:\\.))*)$dlm/sm;
# $csv->eol("\r\n");
while (my $line = <>)
{
print "1: $line";
MultiLine:
while ($line eq "\\\n" || $line =~ m/[^\\](?:\\\\)*\\$/)
{
my $extra = <>;
last MultiLine unless defined $extra;
$line .= $extra;
}
my #fields = split_unload($line);
$csv->print($out, \#fields);
}
sub split_unload
{
my($line) = #_;
my #fields;
print "$line";
while ($line =~ $rgx)
{
printf "%d: %s\n", scalar(#fields), $1;
push #fields, $1;
}
return #fields;
}
__END__
=head1 NAME
unl2csv - Convert Informix UNLOAD to CSV format
=head1 SYNOPSIS
unl2csv [file ...]
=head1 DESCRIPTION
The unl2csv program converts a file from Informix UNLOAD file format to
the corresponding CSV (comma separated values) format.
The input delimiter is determined by the environment variable
DBDELIMITER, and defaults to the pipe symbol "|".
It is not assumed that each input line is terminated with a delimiter
(there are two variants of the UNLOAD format, one with and one without
the final delimiter).
=head1 EXAMPLES
Input:
10|12|excessive|cost \|of, living|
20|40|bou\\ncing tigger|grrrrrrrr|
Output:
10,12,"excessive","cost |of, living"
20,40,"bou\ncing tigger",grrrrrrrr
=head1 RESTRICTIONS
Since the csv2unl program does not know about binary blob data, it
cannot convert such data into the hex-encoded format that Informix
requires.
It can and does handle text blob data.
=head1 PRE-REQUISITES
Text::CSV_XS
=head1 AUTHOR
Jonathan Leffler <jleffler#us.ibm.com>
=cut
I generate Excel files from 4GL code by writing a XML with the Excel progid ("?mso-application progid=\"Excel.Sheet\"?) so Excel opens it as such.
Its like writing HTML from 4GL, you just wite HTML code to file. But with Excel you write XML.

list of garbage characters like Ò€ℒ

I am using librets to retrieve data form my RETS Server. Somehow librets Encoding method is not working and I am receiving some weird characters in my output. I noticed characters like '’' is replaced with Ò€ℒ. I am unable to find a fix for librets so i decided to replace such garbage characeters with actual values after downloading data. What I need is a list of such garbage string and their equivalent characters. I googled for this but not found any resource. Can anyone point me to the list of such garbage letters and their actual values or a piece of code which can generate such letter.
thanx
Search for the term "UTF-8", because that's what you're seeing.
UTF-8 is a way of representing Unicode characters as a sequence of bytes. ("Unicode characters" are the full range of letters and symbols used all in human languages.) Typically, one Unicode character becomes 1, 2, or 3 bytes in UTF-8. When those bytes (numbers from 0 to 255) are displayed using the character set normally used by Windows, they appear as "garbage" -- in this case, 3 "garbage letters" which are really the 3 bytes of a UTF-8 encoding.
In your example, you started with the smart quote character ’. Its representation in Unicode is the number 8217, or U+2019 (2019 is the hexadecimal for 8217). (Search for "Unicode" for a complete list of Unicode characters and their numbers.) The UTF-8 representation of the number 8217 is the three byte sequence 226, 128, 153. And when you display those three bytes as characters, using the Windows "CP-1252" character encoding (the ordinary way of displaying text on Windows in the USA), they appear as Ò€ℒ. (Search for "CP-1252" to see a table of bytes and characters.)
I don't have any list for you. But you could make one if you wrote a program in a language that has built-in support for Unicode and UTF-8. All I can do is explain what you are seeing.
If there is a way to tell librets to use UTF-8 when downloading, that might automatically solve your problem. I don't know anything about librets, but now that you know the term "UTF-8" you might be able to make progress.
Question reminder:
"...I noticed characters like '’' is replaced with Ò€ℒ... i decided to
replace such garbage characeters with actual values after downloading
data. What I need is a list of such garbage string and their
equivalent characters."
Strictly dealing with this part:
"What I need is a list of such garbage string and their equivalent
characters."
Using php, you can generate these characters and their equivalence. Working with all 1,111,998 Unicode points or 109,449 Utf8 symbols is impractical. You may use the ASCII range in the following loop between &#128 and &#258 or another range that is more relevant to your context.
<?php
for ($i=128; $i<258; $i++)
$tmp1 .= "<tr><td>".htmlentities("&#$i;")."</td><td>".html_entity_decode("&#".$i.";",ENT_NOQUOTES,"utf-8")."</td><td>&#".$i.";</td></tr>";
echo "<table border=1>
<tr><td>&#</td><td>"Garbage"</td><td>symbol</td></tr>";
echo $tmp1;
echo "</table>";
?>
From experience, in an ASCII context, most "garbage" symbols originate in the range &#128 to &#257 + (seldom) &#8129 to &#8246.
In order for the "garbage" symbols to display, the html page charset must be set to iso-1 or whichever other charset that caused the problem in the first place. They will not show if the charset is set to utf-8.
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
.
"i decided to replace such garbage characeters with actual values
after downloading data"
You CANNOT undo the "garbage" with php utf8_decode(), which would actually create more "garbage" on already "garbage". But, you may use the simple and fast search and replace php str_replace() function.
First, generate 2 arrays for each set of "garbage" symbols you wish to replace. The first array is the Search term:
<?php
//ISO 8859-1 (Latin-1) special chars are found in the range 128 to 257
$tmp1 = "\$SearchArr = array(";
for ($i=128; $i<258; $i++)
$tmp1 .= "\"".html_entity_decode("&#".$i.";",ENT_NOQUOTES,"utf-8")."\", ";
$tmp1 = substr($tmp1,0,strlen($tmp1)-2);//erases last comma
$tmp1 .= ");";
$tmp1 = htmlentities($tmp1,ENT_NOQUOTES,"utf-8");
?>
The second array is the replace term:
<?php
//Adapt for your relevant range.
$tmp2 = "\$ReplaceArr = array(\n";
for ($i=128; $i<258; $i++)
$tmp2 .= "\"&#".$i.";\", ";
$tmp2 = substr($tmp2,0,strlen($tmp2)-2);//erases last comma
$tmp2 .= ");";
echo $tmp1."\n<br><br>\n";
echo $tmp2."\n";
?>
Now, you've got 2 arrays that you can copy and paste to use and reuse to clean any of your infected strings like this:
$InfectedString = str_replace($SearchArr,$ReplaceArr,$InfectedString);
Note: utf8_decode() is of no help for cleaning up "garbage" symbols. But, it can be used to prevent further contamination. Alternatively a mb_ function can be useful.

COBOL accurate LENGTH OF (XML-TEXT) when ampersands escaped?

I'm using the intrinsic function XML-PARSE with XML that looks like this:
<MSGBODYTXT>
<LN>One & two</LN>
</MSGBODYTXT>
By my count, the following string is 13 bytes long.
"One & two"
But when I take $
LENGTH OF(XML-TEXT) $
I get only 9 bytes.
What can I do to get the correct, 13-byte length?
The problem is that XML PARSE translates & into the character it represents, an ampersand. If
you look at the CONTENT-CHARACTERS associated with the <LN> tag you will see: One & two which
is 9 characters long, just as the LENGTH OF operator on XML-TEXT says it is.
Note that if you were to use XML GENERATE on data item LN having the value One & two
it will generate as <LN>One & two</LN> which is a symetrical operation.

Resources