How to convert HTML entities to plain text in YQL or Y! Pipes?

I have source text written as HTML entities (e.g. &#99;&#111;&#118;&#101;&#114;).
I want to convert it into simple, readable plain text (e.g. abcdef...).
How can I do that using YQL or Yahoo Pipes?
(For example, I want to convert &#99;&#111;&#118;&#101;&#114; into cover using YQL or Yahoo Pipes.)

Use the following components:
A yql stylesheet
select * from xslt where url="//foo.html" and stylesheet="//bar.xsl"
An entity declaration
<!DOCTYPE xsl:stylesheet[
<!ENTITY 99 "c">
<!ENTITY 111 "o">
<!ENTITY 118 "v">
<!ENTITY 101 "e">
<!ENTITY 114 "r">
]>
References
XML Entity Definitions for Characters (2nd Edition)
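
For comparison, outside YQL the same decoding is available in most standard libraries. Here is a minimal Python sketch, assuming the input uses numeric character references such as &#99; (the function name decode_entities is made up for illustration and is not part of YQL or Pipes):

```python
import html


def decode_entities(text: str) -> str:
    """Turn HTML character references (numeric or named) into plain text."""
    # html.unescape handles decimal (&#99;), hex (&#x63;) and named (&amp;) forms.
    return html.unescape(text)


decoded = decode_entities("&#99;&#111;&#118;&#101;&#114;")
print(decoded)
```

This only shows what the conversion is expected to produce; it does not replace the XSLT approach when the work has to happen inside YQL.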

Related

Pandoc Citeproc doesn't work on HTML format

I'm trying to cite entries from a .bib file in an HTML document, but without success. The same citations work perfectly in Markdown, so my question is: does citeproc work with input formats other than Markdown?
Here are some examples which I use:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="en">
<body>
Test [@test1]
</body>
</html>
Command: pandoc --bibliography=test.bib --citeproc test.html -o test.html -s --metadata-file=test.yaml
The .bib file contains the following:
@article{test1,
author = {Rathod, N and Kulawik, P and Ozogul, Y and Ozogul, F and Bekhit, A},
title = {Recent developments in non-thermal processing for seafood and seafood products: cold plasma, pulsed electric field and high hydrostatic pressure},
journal = {International Journal of Food Science & Technology},
date = {2022},
year = {2022},
pages = {774--790},
volume = {57},
number = {2},
doi = {10.1111/ijfs.15392},
raw = {Rathod, N. B., Kulawik, P., Ozogul, Y., Ozogul, F., & Bekhit, A. E. D. A. (2022). Recent
developments in non-thermal processing for seafood and seafood products: cold plasma, pulsed
electric field and high hydrostatic pressure. International Journal of Food Science &
Technology, 57(2), 774-790. https://doi.org/10.1111/ijfs.15392}
}
I have written a Lua filter, but it covers only some of the cases. I'm new to Lua and cannot yet write a filter as complete as the citation handling we get for Markdown.
Thank you.

How do I recode an XML document in XMLSpy to use entities

I have a rather large XSLT template which contains bilingual text (national characters in UTF-8). I am looking for a function that will recode all CDATA elements inside it to use XML numeric (&#) entities, allowing me to store the XSLT in plain US-ASCII encoding.
Here is a basic example:
<?xml version="1.0" encoding="UTF-8"?>
<test>Soirée</test>
where é is encoded as C3 A9. The desired output would be
<?xml version="1.0" encoding="US-ASCII"?>
<test>Soir&#233;e</test>
where &#233; is the numeric character reference for 'LATIN SMALL LETTER E WITH ACUTE' (U+00E9). Changing only the encoding declaration on the first example results in an error, because the UTF-8 bytes become invalid.
Is there a simple way to do this or do I have to resort to a macro?

Plain-text formatted report into Crystal Reports

I have these plain-text reports. Can I please have some suggestions on what general approach to use to get them into Crystal Reports? Can I parse the file within CR? Do I need to script outside CR and rewrite the data to a different format? I'm new to CR, and if there is different software that would be better for this, I'm open to that. The end goal is to transform these reports into a nicely formatted PDF with a company logo, a few graphics, a styled table, etc.
You can see the file contains data fields that I would need to capture such as account number, name etc. But then also a variable length table with headers.
Example report: (This is the exact format, with just identifying information changed)
MY REPORT Page: #
10/18/17 09:44 AM
ACCOUNT NUMBER : 456789
COMPANY NAME : JOHN DOE
ADDRESS : 8001 ANYWHERE DR
SOME PLACE, USA
SYSTEM TYPE : MAXMODEL9000
ID EVENT TYPE DESCRIPTION
---------- ---------------------- --------------------------------
3 TYPE 1 BLAH BLAH
4 TYPE 1 BLAH
5 TYPE 1 BLAH BLAH BLAH
6 TYPE 2 DR
7 TYPE 3 KITCHEN
11 TYPE 3 SOMETHING
12 TYPE 4 SOME DESCRIPTION
13 TYPE1 TEST
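
If scripting outside CR turns out to be the route, the parsing step could look roughly like the following Python sketch. It is only an illustration under stated assumptions: header fields are taken to be "NAME : value" lines, table rows are taken to start with a numeric ID followed by TYPE and a number, and continuation lines such as the second address line are ignored. The names parse_report, FIELD_RE and ROW_RE are hypothetical.

```python
import re

# Assumed layout: "FIELD NAME : value" for header fields.
FIELD_RE = re.compile(r"^([A-Z][A-Z ]*?)\s+:\s+(.+)$")
# Assumed layout: "<id> TYPE <n> <description>" for table rows ("TYPE1" also accepted).
ROW_RE = re.compile(r"^\s*(\d+)\s+(TYPE\s*\d+)\s+(.+)$")


def parse_report(text):
    """Split a report into a dict of header fields and a list of table rows."""
    fields, rows = {}, []
    for line in text.splitlines():
        line = line.strip()
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
            continue
        match = ROW_RE.match(line)
        if match:
            rows.append({
                "id": int(match.group(1)),
                "event_type": match.group(2),
                "description": match.group(3).strip(),
            })
    return fields, rows


sample = """MY REPORT Page: #
10/18/17 09:44 AM
ACCOUNT NUMBER : 456789
COMPANY NAME : JOHN DOE
ID EVENT TYPE DESCRIPTION
---------- ---------------------- --------------------------------
3 TYPE 1 BLAH BLAH
13 TYPE1 TEST
"""
fields, rows = parse_report(sample)
print(fields)
print(rows)
```

The resulting fields and rows could then be written to CSV or a database table, which a reporting tool can read as a regular data source.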

Simple Wikipedia Text into Plain Text Parser?

I'm searching for a simple parser that translates a String with wiki markup code to readable plain text, e.g.
A lot of these sources can also be used to add to other parts of the article, like the plot section. <font color="silver">[[User:Silver seren|Silver]]</font><font color="blue">[[User talk:Silver seren|seren]]</font><sup>[[Special:Contributions/Silver seren|C]]</sup> 05:34, 22 March 2012 (UTC)
to
A lot of these sources can also be used to add to other parts of the article, like the plot section. SilverserenC 05:34, 22 March 2012 (UTC)
I tried it with DKPro JWPL (which is also where the above example comes from), but this framework's plain-text output doesn't handle wiki talk pages (discussions) correctly. It simply deletes lines that start with one or more ":" characters, which are crucial for talk pages.
Okay, I found out that the old wikipedia parser from JWPL is working: "de.tudarmstadt.ukp.wikipedia.parser"
You can use it like:
MediaWikiParserFactory pf = new MediaWikiParserFactory(Language.english);
MediaWikiParser parser = pf.createParser();
ParsedPage pp = parser.parse("some wiki code with markups");
System.out.println(pp.getText());
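
For simple cases like the signature example in the question, a couple of regular expressions may already be enough. This is a rough Python sketch, not the JWPL parser: it only strips HTML-style tags and reduces [[target|label]] links to their label, and the function name wiki_to_plain is made up for illustration. Templates, tables and other markup are not handled.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
# [[target|label]] -> label, [[target]] -> target
LINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def wiki_to_plain(text: str) -> str:
    """Strip HTML-style tags and reduce wiki links to their visible label."""
    without_tags = TAG_RE.sub("", text)
    return LINK_RE.sub(r"\1", without_tags)


markup = (
    '<font color="silver">[[User:Silver seren|Silver]]</font>'
    '<font color="blue">[[User talk:Silver seren|seren]]</font>'
    "<sup>[[Special:Contributions/Silver seren|C]]</sup> 05:34, 22 March 2012 (UTC)"
)
plain = wiki_to_plain(markup)
print(plain)
```

Leading ":" indentation on talk pages is left untouched by this approach, since nothing in it removes whole lines.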

Is it possible to use double quotes in CSV files on iPad?

I am having issues with the special CSV interpreter (no idea what it's called) in the iPad mobile browser.
The iPad appears to treat the character " as reserved or special. When this character appears, the rest of the line is treated as a literal string instead of being separated into CSV columns.
INPUT:
1111,64-1111-11,Some Tool 12", 112233
Given the input above, the Mobile Safari CSV display shows the following ([] represents a column):
[1111] [64-1111-11] [Some Tool 12, 112233]
Note that the " is missing. Also note that 112233 is not in its own column like it should be.
Question 2:
How can I get the CSV display tool in safari to not treat a six digit number as a phone number?
1234567
Shows up as a hyperlink and asks to "Add Contact" when I click it. I do not want the hyperlink.
UPDATE
iPad is ignoring the escape character (or backslash is not the escape character) for double quotes in CSV files. I am looking at the hex version of the file and I have
\" or 5C 22 (in hex with UTF-8 encoding).
Unfortunately, the iPad displays the backslash and still treats " as a special character, thereby corrupting my data formatting. Does anybody know how I can use " in a CSV file on the iPad?
With regard to the quotes, have you tried escaping them in the output?
EDIT: conventional escaping doesn't work for CSV files, my apologies. Most specifications state the following:
Fields that contain a special character (comma, newline, or double quote), must be enclosed in double quotes.
So, testing this on your CSV snippet, a file formatted like this:
1111,64-1111-11,"Some Tool 12""", 112233
or even like this:
1111,64-1111-11,Some Tool 12"""", 112233
… opens in Mobile Safari OK. How good or bad that looks in Excel you'd need to check.
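
If the file is generated by a script, a CSV library will apply this quoting rule for you. Here is a minimal Python sketch using the standard csv module (the helper name to_csv_line is made up for illustration); it says nothing about how Mobile Safari renders the result, only how the doubled quotes are produced:

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row, quoting fields that contain commas, quotes or newlines."""
    buffer = io.StringIO()
    # Default dialect: fields are quoted only when needed, and " is doubled.
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue()


line = to_csv_line(["1111", "64-1111-11", 'Some Tool 12"', "112233"])
print(line)

# Reading the line back recovers the original values, including the quote.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)
```

The written line matches the first form shown above, except that no space follows the last comma.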
Moving to the second issue, to prevent Mobile Safari from presenting numbers as phone numbers, add this to your page's head element:
<meta name="format-detection" content="telephone=no" />
