Format text with indentation - ruby-on-rails

I am writing multi-lines in the file.
Number Name
Jack
The first line is a header.
I am going to write lines like the above format.
If the number is null, I am writing a line like this.
file.print "Jack".indentation("Number".length + 2)
There are 2 spaces between Number and Name in the header.
But it is showing like this since the length 6 spaces are not the same as "Number".
Number Name
Jack
Is there any solution to format better?

Related

How to make a variable non delimited file to be a delimited one

Hello guys I want to convert my non delimited file into a delimited file
Example of the file is as follows.
Name. CIF Address line 1 State Phn Address line 2 Country Billing Address line 3
Alex. 44A. Biston NJ 25478163 4th,floor XY USA 55/2018 kenning
And so on all the data are in this format.
First three lines are metadata and then the data.
How can I make it delimited in proper format using logic.
There are two parts in the problem:
how to find the column widths
how to split each line into fields and output a new line with delimiters
I could not propose an automated solution for the first one, because (not knowing anything about the metadata format), there is no clear way to find where one column ends and the next one begins. Some of the column headings contain multiple space-separated words and space is also used as a separator between the headings (and apparently one cannot use the rule "more than one space means the end of a heading name" because there's only one space between "Address line 2" and "Country" - and they're clearly separate columns. Clearly, finding the correct column widths requires understanding English and this is not something that you can write a program for.
For the second problem, things are much easier - once you have the column positions. If you figure the column positions manually (or programmatically, if you know something about the metadata that I don't - and you have a simple method for finding what's a column heading), then a program written in AWK can do this, for example:
cols="8,15,32,40,53,66,83,105"
awk_prog='BEGIN {
nt=split(cols,tabs,",")
delim=","
ORS=""
}
{ o=1 ;
for (i in tabs) { t=tabs[i] ; f=substr($0,o,t-o); sub(" *$","",f) ; print f
delim ; o=t } ;
print substr($0, o) "\n"
}'
awk -v cols="$cols" "$awk_prog" input_file
NOTE that the above program does not deal correctly with the case when the separator character (e.g. ",") appears inside the data. If you decide to use this as-is, be sure to use a separator that is not present in the input data. It may be better to modify the code to escape any separator characters found in the input data (there are different ways to do this - depends on what you plan to feed the output file to).

Regex for clearly single line words with wildcards in Swift

I'm attempting to construct a regex string in Swift 4 that gets characters at the start of a line where some are known and others aren't.
Let's say I've got a text file with line breaks for each word that reads as follows:
pucker
tuckered
duckerdinger
sucker punch
I'd like to get every word that contains "cker" in it that's 1 to 8 characters long.
I'm attempting to use this statement ^..cker..{1,8} as my RegEx string. All I'm getting is a partial match in Patterns (a Mac App), but Regex101.com's saying no match, and most importantly, Xcode says I'm using an invalid regex. I've also tried ^(..cker..) and a bazillion other variations.
What am I screwing up and how do I fix it? What I'm trying to do seems like it would be super simple, but I've wasted more time than I care to admit fiddling with it.
Update:
This has been the best I've been able to get so far...
"\\b..cker..", but I'm only able to get words that are exactly 8 characters long. I'd like to capture words that contain "cker" that are the 3rd, 4th, 5th, and 6th letters while capturing words up to 8 characters long.
Try this regex:
\b(?=.*cker)[a-zA-Z]{1,8}\b
Click for Demo
Explanation:
\b - matches a word boundary
(?=.*cker) - Positive Lookahead to make sure our string should contain the character sequence cker
[a-zA-Z]{1,8} - Matches 1 to 8 occurrences of a letter
\b - matches a word boundary

how to tokenize/parse/search&replace document by font AND font style in LibreOffice Writer?

I need to update a bilingual dictionary written in Writer by first parsing all entries into their parts e.g.
main word (font 1, bold)
foreign equivalent transliterated (font 1, italic)
foreign equivalent (font 2, bold)
part of speech (font 1, italic)
Each line of the document is the main word followed by the parts listed above, each separated by a space or punctuation.
I need to automate the process of walking through the whole file, line by line, and place a delimiter between each part, ignoring spaces and punctuation, so I can mass import it into a Calc file. In other words, "each part" is a sequence of character (ignoring spaces and punctuation) that have the same font AND font-style.
I have tried the standard Search&Replace feature, and AltSearch extension, but neither are able to complete the task. The main problem is I am not able to write a search query that says:
Find: consecutive characters with the same font AND font_style, ignore spaces and punctuation
Replace: term found above + "delimiter"
Any suggestions how I can write a script for this, or if an existing tool can solve the problem?
Thanks!
Pseudo code for desired effect:
var delimiter = "|"
Go to beginning of document
While not end of document do:
var $currLine = get line from doc
var $currChar = get next character which is not space or punctuation;
var $font = currChar.font
var $font_style - currChar.font_style (e.g. bold, italic, normal)
While not end of line do:
$currChar = next character which is not space or punctuation;
if (currChar.font != $font || currChar.font_style != $font_style) { // font or style has changed
print $delimiter
$font = currChar.font
$font_style - currChar.font_style (e.g. bold, italic, normal)
}
end While
end While
Here are tips for each of the things your pseudocode does.
First, the easiest way to move line by line is with the TextViewCursor, although it is slow. Notice the XLineCursor section. For the while loop, oVC.goDown() will return false when the end of the document is reached. (oVC is our variable for the TextViewCursor).
Get each character by calling oVC.goRight(0, False) to deselect followed by oVC.goRight(1, True) to select. Then the selected value is obtained by oVC.getString(). To ignore space and punctuation, perhaps use python's isalnum() or the re module.
To determine the font of the character, call oVC.getPropertyValue(attr). Values for attr could simply be CharAutoStyleName and CharStyleName to check for any changes in formatting.
Or grab a list of specific properties such as 'CharFontFamily', 'CharFontFamilyAsian', 'CharFontFamilyComplex', 'CharFontPitch', 'CharFontPitchAsian' etc. Character properties are described at https://wiki.openoffice.org/wiki/Documentation/DevGuide/Text/Formatting.
To insert the delimiter into the text: oVC.getText().insertString(oVC, "|", 0).
This python code from github shows how to do most of these things, although you'll need to read through it to find the relevant parts.
Alternatively, instead of using the LibreOffice API, unzip the .odt file and parse content.xml with a script.

COBOL accurate LENGTH OF (XML-TEXT) when ampersands escaped?

I'm using the intrinsic function XML-PARSE with XML that looks like this:
<MSGBODYTXT>
<LN>One & two</LN>
</MSGBODYTXT>
By my count, the following string is 13 bytes long.
"One & two"
But when I take $
LENGTH OF(XML-TEXT) $
I get only 9 bytes.
What can I do to get the correct, 13-byte length?
The problem is that XML PARSE translates & into the character it represents, an ampersand. If
you look at the CONTENT-CHARACTERS associated with the <LN> tag you will see: One & two which
is 9 characters long, just as the LENGTH OF operator on XML-TEXT says it is.
Note that if you were to use XML GENERATE on data item LN having the value One & two
it will generate as <LN>One & two</LN> which is a symetrical operation.

RAILS 3 CSV "Illegal quoting" is a lie

I've hit a problem during parsing of a CSV file where I get the following error:
CSV::MalformedCSVError: Illegal quoting on line 3.
RAILS code in question:
csv = CSV.read(args.local_file_path, col_sep: "\t", headers: true)
Line 3 in the CSV file is:
A-067067 VO VIA CE 0 8 8 SWCH Ter 4, Loc Is Here, Mne, Per Fl Auia/Sey IMAC NEK_HW 2011-03-09 09:47:44 2011-03-09 11:50:26 2011-01-13 10:49:17 2011-02-14 14:02:43 2011-02-14 14:02:44 0 0 771 771 46273 "[O/H 15/02] B270 W31 ""TEXT TEXT 2 X TEXT SWITC" SOME_TEXT SOME_TEXT N/A Name Here RESOLVED_CLOSED RESOLVED_CLOSED
UPDATE: Tabs don't appear to have come across above. See pastebin RAW TEXT: http://pastebin.com/4gj7iUpP
I've read numerous threads all over StackOverflow and Google about why this is and I understand that. But the CSV row above has perfectly legal quoting does it not?
The CSV is tab delimited and there is only a tab followed by the quote on either side of the column in question. There is 1 quote in that field and it is double quoted to escape it. So what gives? I can't work it out. :(
Assuming I've got something wrong here, I'd like the solution to include a way to work around the issue as I don't have control over how the CSV is constructed.
This part of your CSV is at fault:
46273 "[O/H 15/02] B270 W31 ""TEXT TEXT 2 X TEXT SWITC" SOME_TEXT
At least one of these parts has a stray space:
46273 "
" SOME_TEXT
I'd guess that the "3" and the double are supposed to be separated by one or more tabs but there is a space before the quote. Or, there is a space after the quote on the other end when there are only supposed to be tabs between the closing quote and the "S".
CSV escapes double quotes by double them so this:
"[O/H 15/02] B270 W31 ""TEXT TEXT 2 X TEXT SWITC"
is supposed to be a single filed that contains an embedded quote:
[O/H 15/02] B270 W31 "TEXT TEXT 2 X TEXT SWITC
If you have a space before the first quote or after the last quote then, since your fields are tab delimited, you have an unescaped double quote inside a field and that's where your "illegal quoting" error comes from.
Try sending your CSV file through cat -t (which should represent tabs as ^I) to find where the stray space is.

Resources