How do I import words from a text file? - python-import

words_to_guess = (which code should I enter here in order to import words from a text file?)
My text file is named words.txt, so I tried:
words_to_guess = import words.txt

You can read the content of a .txt file into a variable with this code:
words_to_guess = open("words.txt").read()
Just make sure that the file "words.txt" is in the same directory as the .py file (or if you're compiling it to a .exe, the same directory as the .exe file)
I would also like to point out that based on the screenshot you provided, it looks like you're trying to get a random word from the .txt file. Once you've done the above code, I would recommend adding this code below it as well:
words_to_guess = words_to_guess.split()
This will take the content of "words_to_guess" and split every word into a list that can be further accessed. You can then call:
word = random.choice(words_to_guess)
And it will select a random element from the list and store it in the "word" variable (make sure you have import random at the top of your file), hence providing you a random word from the .txt file.
Just note that split() treats the spaces in between as separators, so if you have an entry like "Halloween Pumpkin" or "American Flag", split() will make each individual word its own element, turning it into ["Halloween", "Pumpkin"] or ["American", "Flag"].
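Putting the pieces together, here is a minimal sketch of the whole thing (assuming the file is named words.txt and sits next to your script):
import random

with open("words.txt") as f:              # the with-block closes the file for you
    words_to_guess = f.read().split()     # one list entry per whitespace-separated word

word = random.choice(words_to_guess)      # pick one word at random
print(word)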
That's all!

Related

jsPDF doesn't show certain letters properly (ū)

I am trying to create a PDF file to export using the jsPDF library. In one of the lines I am trying to write a word that contains the letter 'ū':
doc.text('Hūla', 20, 30);
However, when doing so, the exported file doesn't contain this letter; instead it becomes
'H k l a', with spaces in between and a k instead of the ū.
What can I do in order to have this printed properly?
The solution was to use a font that supports this character. I had to try multiple fonts in order to get it working with the letter ū (it does not belong to one specific language).
The font that worked was Amiri. It also supports Arabic script.

How to make a variable non delimited file to be a delimited one

Hello guys, I want to convert my non-delimited file into a delimited one.
Example of the file is as follows.
Name. CIF Address line 1 State Phn Address line 2 Country Billing Address line 3
Alex. 44A. Biston NJ 25478163 4th,floor XY USA 55/2018 kenning
And so on; all the data are in this format.
The first three lines are metadata and then the data follows.
How can I make it delimited in the proper format using logic?
There are two parts in the problem:
how to find the column widths
how to split each line into fields and output a new line with delimiters
I could not propose an automated solution for the first one, because (not knowing anything about the metadata format) there is no clear way to find where one column ends and the next one begins. Some of the column headings contain multiple space-separated words, and space is also used as a separator between the headings; apparently one cannot even use the rule "more than one space means the end of a heading name", because there is only one space between "Address line 2" and "Country", yet they are clearly separate columns. Finding the correct column widths requires understanding English, and that is not something you can write a program for.
For the second problem, things are much easier - once you have the column positions. If you figure out the column positions manually (or programmatically, if you know something about the metadata that I don't and have a simple way of telling what is a column heading), then a program written in AWK can do this, for example:
cols="8,15,32,40,53,66,83,105"
awk_prog='BEGIN {
    nt = split(cols, tabs, ",")    # tabs[] holds the 1-based start position of each next column
    delim = ","
    ORS = ""                       # print adds no newline of its own
}
{
    o = 1
    for (i = 1; i <= nt; i++) {    # walk the columns in order
        t = tabs[i]
        f = substr($0, o, t - o)   # cut the field out of the fixed-width line
        sub(" *$", "", f)          # trim trailing blanks
        print f delim
        o = t
    }
    print substr($0, o) "\n"       # the last field is whatever remains
}'
awk -v cols="$cols" "$awk_prog" input_file
NOTE that the above program does not deal correctly with the case when the separator character (e.g. ",") appears inside the data. If you decide to use this as-is, be sure to use a separator that is not present in the input data. It may be better to modify the code to escape any separator characters found in the input data (there are different ways to do this - depends on what you plan to feed the output file to).
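If you would rather work in Python than AWK, here is a roughly equivalent sketch; the column positions and the input_file name are only placeholders to adjust by hand, and the same caveat about separator characters in the data applies:
# split fixed-width lines at hand-picked column positions (1-based, as in the AWK version)
cols = [8, 15, 32, 40, 53, 66, 83, 105]
delim = ","

with open("input_file") as src:
    for line in src:
        line = line.rstrip("\n")
        fields, start = [], 0
        for pos in cols:
            fields.append(line[start:pos - 1].rstrip())  # cut the field and trim trailing blanks
            start = pos - 1
        fields.append(line[start:].rstrip())             # last field: whatever remains
        print(delim.join(fields))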

how to tokenize/parse/search&replace document by font AND font style in LibreOffice Writer?

I need to update a bilingual dictionary written in Writer by first parsing all entries into their parts e.g.
main word (font 1, bold)
foreign equivalent transliterated (font 1, italic)
foreign equivalent (font 2, bold)
part of speech (font 1, italic)
Each line of the document is the main word followed by the parts listed above, each separated by a space or punctuation.
I need to automate the process of walking through the whole file, line by line, and place a delimiter between each part, ignoring spaces and punctuation, so I can mass import it into a Calc file. In other words, "each part" is a sequence of characters (ignoring spaces and punctuation) that share the same font AND font style.
I have tried the standard Search&Replace feature and the AltSearch extension, but neither is able to complete the task. The main problem is that I am not able to write a search query that says:
Find: consecutive characters with the same font AND font_style, ignore spaces and punctuation
Replace: term found above + "delimiter"
Any suggestions how I can write a script for this, or if an existing tool can solve the problem?
Thanks!
Pseudocode for the desired effect:
var delimiter = "|"
Go to beginning of document
While not end of document do:
    var $currLine = get line from doc
    var $currChar = get next character which is not space or punctuation;
    var $font = currChar.font
    var $font_style = currChar.font_style (e.g. bold, italic, normal)
    While not end of line do:
        $currChar = next character which is not space or punctuation;
        if (currChar.font != $font || currChar.font_style != $font_style) { // font or style has changed
            print $delimiter
            $font = currChar.font
            $font_style = currChar.font_style (e.g. bold, italic, normal)
        }
    end While
end While
Here are tips for each of the things your pseudocode does.
First, the easiest way to move line by line is with the TextViewCursor, although it is slow. Notice the XLineCursor section. For the while loop, oVC.goDown() will return false when the end of the document is reached. (oVC is our variable for the TextViewCursor).
Get each character by calling oVC.goRight(0, False) to deselect followed by oVC.goRight(1, True) to select. Then the selected value is obtained by oVC.getString(). To ignore space and punctuation, perhaps use python's isalnum() or the re module.
To determine the font of the character, call oVC.getPropertyValue(attr). Values for attr could simply be CharAutoStyleName and CharStyleName to check for any changes in formatting.
Or grab a list of specific properties such as 'CharFontFamily', 'CharFontFamilyAsian', 'CharFontFamilyComplex', 'CharFontPitch', 'CharFontPitchAsian' etc. Character properties are described at https://wiki.openoffice.org/wiki/Documentation/DevGuide/Text/Formatting.
To insert the delimiter into the text: oVC.getText().insertString(oVC, "|", 0).
This python code from github shows how to do most of these things, although you'll need to read through it to find the relevant parts.
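Put together, a rough and untested sketch of those steps as a LibreOffice Python macro might look like this (it assumes the macro runs inside LibreOffice's scripting framework, which provides XSCRIPTCONTEXT, and it only compares the two character-style properties mentioned above):
def add_delimiters(*args):
    doc = XSCRIPTCONTEXT.getDocument()                 # the Writer document the macro runs in
    oVC = doc.getCurrentController().getViewCursor()   # the TextViewCursor from the tips above
    oVC.gotoStart(False)                               # go to the beginning of the document
    prev = None
    while True:
        oVC.goRight(0, False)                          # deselect
        if not oVC.goRight(1, True):                   # select the next character; False at end of doc
            break
        ch = oVC.getString()
        if not ch.isalnum():                           # ignore spaces and punctuation
            continue
        style = (oVC.getPropertyValue("CharStyleName"),
                 oVC.getPropertyValue("CharAutoStyleName"))
        if prev is not None and style != prev:
            # formatting changed: insert the delimiter (exact placement may need tweaking)
            oVC.getText().insertString(oVC, "|", 0)
        prev = style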
Alternatively, instead of using the LibreOffice API, unzip the .odt file and parse content.xml with a script.

Invalid '`' error when using local macro

I am following instructions from this link on how to append Stata files via a foreach loop. I think that it's pretty straightforward.
However, when I try to refer to each f in datafiles in my foreach loop, I receive the error:
invalid `
I've set my working directory and the data is in a subfolder called csvfiles. I am trying to call each file f in the csvfiles subfolder using my local macro datafiles and then append each file to an aggregate Stata dataset called data.dta.
I've included the code from my do file below:
clear
local datafiles: dir "csvfiles" files "*.csv"
foreach f of local datafiles {
    preserve
    insheet using "csvfiles\`f'", clear
    ** add syntax here to run on each file**
    save temp, replace
    restore
    append using temp
}
rm temp
save data.dta, replace
The backslash character has meaning to Stata: it prevents the interpretation of any following character that has a special meaning to Stata. In particular, the left single quote character ` will not be interpreted as indicating a reference to a macro.
But all is not lost: Stata will allow you to use the forward slash character in path names on any operating system, and on Windows will take care of doing what must be done to appease Windows. Replacing your insheet command with
insheet using "csvfiles/`f'", clear
should solve your problem.
Note that the instructions you linked to do exactly that; some of the code includes backslashes in path names, but where a macro is included, forward slashes are used instead.

Parse a Word Document By Font?

I'm currently trying to write a script which would run through a Word document and output to a text file all the lines that are written in a certain font.
So if I had the document:
"This is the first line of the document.
This is the second line of the document.
This is the third line of the document."
And say normal lines are Times New Roman, bold is Arial, and italics is Sans Serif.
Then, ideally, I could parse the document for all lines in Arial and the text file output would have the line:
This is the second line of the document.
Any idea on how to do this from a script? I was thinking about first converting the doc into xml, but I do not think this is possible within a script.
You'll want to use the FIND object, and the FONT property of the FIND object.
So, something like this:
Public Sub FindTest()
    Dim r As Range
    Set r = ActiveDocument.Content
    With r.Find
        .ClearFormatting
        .Style = "SomeStyleName"           ' or filter by font instead, e.g. .Font.Name = "Arial"
        Do While .Execute(Forward:=True, Format:=True) = True
            '---- we found a range; r now covers the matching text
            Dim duperange As Range
            Set duperange = r.Duplicate    ' keep a copy of the found range for further processing
            Debug.Print r.Text             ' print the matching text to the Immediate window
        Loop
    End With
End Sub
Note that where I've specified Style, you could specify font formatting via the FIND.FONT object, or various other formatting options. Just browse around the FIND object to see what's available.
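If a Python route is acceptable instead of VBA, the python-docx library can do a similar filter on run fonts; a minimal sketch, assuming the document has been saved as .docx, the file names are placeholders, and the target font is set directly on the runs (a font inherited from a style shows up as None):
from docx import Document

doc = Document("input.docx")                       # hypothetical input file
with open("arial_lines.txt", "w", encoding="utf-8") as out:
    for para in doc.paragraphs:
        # keep the paragraph if any of its runs is explicitly formatted as Arial
        if any(run.font.name == "Arial" for run in para.runs):
            out.write(para.text + "\n")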
