How do I get rid of whitespace in Python 3 whilst using turtle? - python-3.2

How do you remove whitespace in Python 3?
I am writing a program that needs to pass various tests. The program uses turtle: the user enters various commands and the program runs turtle to carry them out. One of the tests looks like this:
forward 200
right 90
forward 400
right 90
forward 100
right 90
forward 400
right 90
forward 100
The last 2 lines are whitespace. So far my program can run the commands, but when it gets to the whitespace it crashes. How do I get Python to strip this whitespace, and where should I put it in my code? Thanks

I'm not sure how your instructions are being stored so I'll give you an answer for each of my most likely assumptions.
It's a list
This is the easiest one: use a list comprehension to get rid of any instructions made up entirely of whitespace (and strip surrounding whitespace from the rest).
instruction_list = [instruction.strip() for instruction in instruction_list if instruction.strip()]
It's a big long string with newline characters
instruction_list = instruction_string.split('\n') #make a list
instruction_list = [instruction.strip() for instruction in instruction_list if instruction.strip()] #process list as before
instruction_string = '\n'.join(instruction_list) #reconstruct string
It's a file
for line in instruction_file:
    if not line.strip():  # if the line is made up entirely of whitespace
        continue  # skip to the next loop iteration
    process_instruction(line.strip())  # strip the trailing newline before processing
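To show where the blank-line check fits in a turtle program, here is a minimal sketch. It assumes each command is a turtle method name followed by one integer, as in the test above; parse_commands, run_commands and ALLOWED are hypothetical names, not part of the turtle module.

```python
# Hypothetical whitelist of turtle methods the user may call; extend as needed.
ALLOWED = {"forward", "back", "left", "right"}


def parse_commands(text):
    """Turn user input into (command, amount) pairs, skipping blank lines."""
    commands = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # empty or whitespace-only line: nothing to draw, skip it
            continue
        name, amount = line.split()  # e.g. "forward 200" -> "forward", "200"
        if name not in ALLOWED:
            raise ValueError("unknown command: " + name)
        commands.append((name, int(amount)))
    return commands


def run_commands(pen, text):
    """Call the matching method on pen (e.g. a turtle.Turtle) for each command."""
    for name, amount in parse_commands(text):
        getattr(pen, name)(amount)
```

With the real module this would be something like run_commands(turtle.Turtle(), user_input) followed by turtle.done(). Because the whitespace lines are filtered out during parsing, the drawing code never sees them.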

Related

Don't match dot in beginning of string

I have a path as a string, like Folder1/File.png.
If the file or the folder is hidden (its name starts with a dot), I don't want my regex to match it.
regex = %r{([a-zA-Z0-9_ -]*)\/[^.]+$}
input_path = "Folder_1/.file" # This shouldn't be matched.
input_path = "Folder/file.png" # This should be matched.
My regex correctly rejects the first input, but it doesn't match the second one either.
You are currently looking for \/[^.]+$, that is a / followed by any character except . until the end. Since the filename+extension format has a . character, it fails to match the second case.
Instead of using [^.]+$, check only that the character following / is not ., and match everything after that:
([a-zA-Z0-9_ -]*)\/[^.].*$
While there are some suggestions here that work, my suggestion would be
\/[^.][^\/\n]+$
It finds a slash, followed by anything but a dot, which in turn is followed by one, or more, of anything but a slash or a newline.
To handle the two lines given as an example,
Folder_1/.file
Folder/file.png
it takes 8 steps.
The suggested ones all work, but ([a-zA-Z0-9_ -]*)\/[^.] takes 75 steps, ([a-zA-Z0-9_ -]*)\/[^.]+\.[^.]+\z 78 steps and ([a-zA-Z0-9_ -]*)\/[^.].*$ takes 77 steps.
This may be totally irrelevant and I may have missed some angle, but I wanted to mention it ;)
See it here at regex101.
regex = %r{([a-zA-Z0-9_ -]*)\/[^.]}
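To check the 8-step pattern outside regex101, the same expression can be tried in Python's re module, where it behaves the same way for single-line input (Ruby's $ is end-of-line, Python's is end-of-string). A minimal sketch; VISIBLE_LAST_PART and is_visible are hypothetical names:

```python
import re

# The answer's pattern, written for Python's re module. Python has no %r{...}
# delimiter, so the slash does not need escaping.
VISIBLE_LAST_PART = re.compile(r"/[^.][^/\n]+$")


def is_visible(path):
    """True if the part after the last slash is 2+ characters and doesn't start with a dot."""
    return VISIBLE_LAST_PART.search(path) is not None
```

Note that this only inspects the final path component, so a hidden folder such as .git/config still matches; rejecting hidden folders as well would need a check on every component.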

Building LaTeX/TeX arguments in Lua

I use Lua to do some complex work preparing arguments for macros in TeX/LaTeX.
Part I
Here is a trivial minimal example:
\newcommand{\test}{\luaexec{tex.print("11,12")}}% aim to create 11,12
\def\compare#1,#2.{\ifthenelse{#1<#2}{less}{more}}
\string\compare11,12. : \compare11,12.\\ %answer is less
\string\test : \test\\ % answer is 11,12
\string\compare : \compare\test. % generate an error
The last line creates an error. Obviously, TeX did not detect the "," inside \test.
How can I make \test be understood as 11, followed by a comma, followed by 12, rather than as the string 11,12, so that it can be used as a correctly formed argument for \compare?
There are several misunderstandings of how TeX works.
Your \compare macro wants to find something followed by a comma, then something followed by a period. However when you call
\compare\test
no comma is found, so TeX keeps looking for it until finding either the end of file or a \par (or a blank line as well). Note that TeX never expands macros when looking for the arguments to a macro.
You might do
\expandafter\compare\test.
provided that \test immediately expands to tokens in the required format. It doesn't, because the expansion of \test is
\luaexec{tex.print("11,12")}
and the comma is hidden by the braces, so it doesn't count. But it wouldn't help nonetheless.
The problem is the same: when you do
\newcommand{\test}{\luaexec{tex.print("11,12")}}
the argument is not expanded. You might use “expanded definition” with \edef, but the problem is that \luaexec is not fully expandable.
If you do
\edef\test{\directlua{tex.sprint("11,12")}}
then
\expandafter\compare\test.
would work.

Lua pattern help (Double parentheses)

I have been coding a program in Lua that automatically formats IRC logs from a roleplay. In the roleplay logs there is a specific guideline for "Out of character" conversation, for which we use double parentheses. For example: ((<Things unrelated to roleplay go here>)). I have been trying to have my program remove the text between double parentheses, including both pairs of parentheses. The code is:
ofile = io.open("Output.txt", "w")
rfile = io.open("Input.txt", "r")
p = rfile:read("*all")
w = string.gsub(p, "%(%(.*?%)%)", "")
ofile:write(w)
The pattern here is "%(%(.*?%)%)". I've tried multiple variations of the pattern, all without success:
1. %(%(.*?%)%) --Wouldn't do anything.
2. %(%(.*%)%) --Would remove *everything* after the first OOC message.
Then my friend told me that prefixing the parentheses with percent signs wouldn't work, and that I had to use backslashes to 'escape' them.
3. \(\(.*\)\) --resulted in the output file being completely empty.
4. (\(\(.*\)\)) --Same result as above.
5. (\(\(.*?\)\) --would remove large parts of the text for no apparent reason.
6. \(\(.*?\)\) --would just remove all the text except for the last line.
The short, absolute question:
What pattern would I need to use to remove all text between double parentheses, and remove the double parentheses themselves too?
Your friend is thinking of regular expressions. Lua patterns are similar, but different. % is the correct escape character.
Your pattern should be %(%(.-%)%). The - is similar to * in that it matches any number of the preceding item, but while * tries to match as many characters as it can (it's greedy), - matches the fewest characters possible (it's non-greedy). It won't go overboard and match extra double closing parentheses.
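For comparison with the friend's suggestion, this is what the same non-greedy idea looks like in an actual regular-expression engine. It is a Python sketch, not Lua; strip_ooc and OOC are hypothetical names. Python's re uses backslash escapes and *? for a non-greedy repeat, and needs re.DOTALL so that . also crosses newlines, as Lua's . does.

```python
import re

# Regex counterpart of the Lua pattern %(%(.-%)%):
# \( escapes a parenthesis and .*? is the non-greedy repeat.
OOC = re.compile(r"\(\(.*?\)\)", re.DOTALL)  # DOTALL: let . match newlines too


def strip_ooc(log):
    """Remove every ((...)) block, including the double parentheses."""
    return OOC.sub("", log)
```

In Lua itself, the fix is just the pattern from the answer: w = string.gsub(p, "%(%(.-%)%)", "").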

Easiest way to remove a LaTeX tag (but not its content)?

I am using TeXnicCenter to edit a LaTeX document.
I now want to remove a certain tag (say, \emph{blabla}), which occurs multiple times in my document, but not the tag's content (so in this example, I want to remove all emphasis).
What is the easiest way to do so?
I may also use another program that is easily available on Windows 7.
Edit: In response to regex suggestions, it is important that it can deal with nested tags.
Edit 2: I really want to remove the tag from the text file, not just disable it.
Using a regular expression, do something like s/\\emph\{([^\}]*)\}/\1/g. If you are not familiar with regular expressions, this says:
s -- replace
/ -- begin match section
\\emph\{ -- match \emph{
( -- begin capture
[^\}]* -- match any characters except (meaning up until) a close brace because:
[] a group of characters
^ means not or "everything except"
\} -- the close brace
and * means 0 or more times
) -- end capture, because this is the first (in this case only) capture, it is number 1
\} -- match end brace
/ -- begin replace section
\1 -- replace with captured section number 1
/ -- end regular expression, begin extra flags
g -- global flag, meaning do this every time the match is found not just the first time
This is with Perl syntax, as that is what I am familiar with. The following Perl one-liners accomplish two tasks:
perl -pe 's/\\emph\{([^\}]*)\}/\1/g' filename will "test" the substitution by printing the result to the command line
perl -pi -e 's/\\emph\{([^\}]*)\}/\1/g' filename will change the file in place.
Similar commands may be available in your editor, but if not this will (should) work.
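The regex above stops at the first close brace, so with the nested tags the question mentions, \emph{a \textbf{b} c} becomes a \textbf{b c}. A brace-counting pass handles nesting. This is a minimal Python sketch (not Perl), assuming braces are balanced and ignoring escaped braces (\{ and \}), comments, and forms like \emph {x} with a space; strip_command is a hypothetical name.

```python
def strip_command(text, command=r"\emph"):
    """Remove every command{...} wrapper but keep its content, handling nesting."""
    opener = command + "{"  # "\emph{" does not match "\emphasis{"
    result = []
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            start = i + len(opener)
            depth = 1
            j = start
            # Walk forward until the brace that closes this command.
            while j < len(text) and depth > 0:
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                j += 1
            if depth != 0:
                raise ValueError("unbalanced braces after " + opener)
            inner = text[start:j - 1]  # content without the closing brace
            result.append(strip_command(inner, command))  # strip nested copies too
            i = j
        else:
            result.append(text[i])
            i += 1
    return "".join(result)
```

To change a file, read its text, pass it through strip_command, and write the result back.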
Crowley should have added this as an answer, so I will do that for him: if you replace every \emph{ with {, you should be able to do this without disturbing the other content. The content will still be in braces, but unless you have done some odd stuff it shouldn't matter.
The regex would be a simple s/\\emph\{/\{/g but the search and replace in your editor will do that one too.
Edit: Sorry, used the wrong brace in the regex, fixed now.
\renewcommand{\emph}[1]{#1}
Any reasonably advanced editor should let you do a search/replace using regular expressions, replacing \emph{bla} with bla, etc.

Reading EDI Formatted Files

I'm new to EDI, and I have a question.
I have read that you can get most of what you need about an EDI format by looking at the last 3 characters of the ISA line. This is fine if every EDI used line breaks to separate entities, but I have found that many are single line files with any number of characters used as breaks. I have noticed that the VERY last character in every EDI I've parsed is the break character. I've looked at a few hundred, and have found no exceptions to this. If I first grab that character, and use that to obtain the last 3 of the ISA line, should I reasonably expect that I will be able to parse data from an EDI?
I don't know if this helps, but the EDI 'types' in question tend to be 850, 875. I'm not sure if that is a standard or not, but it may be worth mentioning.
The transaction type of EDI doesn't really matter (850 = order, 875 = grocery PO). Having written a few EDI parsers, here are a few things I've found:
You should be able to count on the ISA (and the ISA only) being fixed width (105 characters if memory serves).
Strip off the first 105 characters. Everything after that and before the first occurrence of "GS" is your line terminator (this can be anything, including 0x07, the bell character, so watch out if you're outputting to stdout for debugging or you may have a bunch of beeps coming out of the speaker). Normally this is 1 or 2 characters; sometimes it can be more (if the person sending you the data adds an extra terminator for some reason). Once you have the line terminator, you can get the element (field) delimiter. I normally pull the 3rd character of the GS line and use that, though the 4th character of the ISA line should work as well.
Also be aware that you can get a file with multiple ISAs in it. In that case you cannot count on the line or field separators being the same across ISAs.
Another thing: it is also possible (again, not sure if it's in the spec) for an EDI file to have a variable-length ISA. This is very rare, but I had to accommodate it. If that happens, you have to parse the line into its fields. The last field in the ISA is only one character long, so you can determine the real length of the ISA from it. If it were me, I wouldn't worry about this unless you see a file like it. It is a rare occurrence.
What I've said above may not be to the letter of the "spec". That is, I'm not sure it's legal to have different line separators in the same file (in different ISAs), but it is technically possible, and I accommodate it because I have to process files that come through that way. The EDI processor I use handles upwards of 5000 files a day from over 3000 possible sources of data (so I see a lot of weird stuff).
best regards,
don
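The delimiter-detection steps from the answer above can be sketched in Python. This is a minimal sketch that assumes a single, fixed-width ISA (105 characters through ISA16, the last one-character element), the same terminator after every segment including the last, and no multiple-ISA or variable-length-ISA files; detect_delimiters and parse_segments are hypothetical names.

```python
ISA_LENGTH = 105  # ISA through ISA16, before the segment terminator (fixed width)


def detect_delimiters(edi):
    """Return (element_sep, component_sep, segment_terminator) for one interchange."""
    if not edi.startswith("ISA") or len(edi) <= ISA_LENGTH:
        raise ValueError("input does not start with a complete ISA segment")
    element_sep = edi[3]                 # 4th character of the ISA line
    component_sep = edi[ISA_LENGTH - 1]  # ISA16, the last one-character element
    rest = edi[ISA_LENGTH:]
    gs_pos = rest.find("GS")
    if gs_pos <= 0:
        raise ValueError("no segment terminator followed by GS after the ISA")
    segment_terminator = rest[:gs_pos]   # may be more than one character, e.g. "~\n"
    return element_sep, component_sep, segment_terminator


def parse_segments(edi):
    """Split the interchange into segments, then each segment into elements."""
    element_sep, _, terminator = detect_delimiters(edi)
    segments = [s for s in edi.split(terminator) if s]  # drop empty trailing piece
    return [s.split(element_sep) for s in segments]
```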
EDI content is composed of segments and elements.
To parse it, you will need to break it up into segments first, and then into elements, like so (in PHP):
<?php
$edi = "YOUR EDI STRING!";
$segment_delimiter = "~";
$element_delimiter = "*";
// First break it into segments
$segments = explode($segment_delimiter, $edi);
// Now break each segment into elements
$segs_and_elems = array();
foreach ($segments as $segment) {
    $segment = trim($segment, "\r\n"); // drop line breaks around the terminator
    if ($segment === "") { continue; } // skip the empty piece after the final terminator
    $segs_and_elems[] = explode($element_delimiter, $segment);
}
// To echo out what type of EDI this is, for example:
foreach ($segs_and_elems as $seg) {
    if ($seg[0] == "GS") { echo($seg[1]); }
}
?>
Hope this helps get you started.
For header information, the following Java will let you get the basic info pretty easily.
C# has split as well, and the code looks very similar.
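// Needs java.io.BufferedReader, java.io.FileReader, java.io.IOException, java.util.regex.Pattern
try (BufferedReader fileContent = new BufferedReader(new FileReader(filePathName))) {
    String sCurrentLine = fileContent.readLine();
    // Get the delimiter after ISA; if you know your element delimiter, just force it.
    // We look at lots of different senders' messages, so we're never sure what it will be.
    String delimiterElement = sCurrentLine.substring(3, 4); // 4th character = element delimiter
    // split() takes a regex, so quote the delimiter ("*" would otherwise be a metacharacter).
    // A limit of 17 gives "ISA" plus its 16 elements (if everything is on one line, of course).
    String[] splitMessage = sCurrentLine.split(Pattern.quote(delimiterElement), 17);
    String senderQualifier = splitMessage[5]; // who sent it: ISA05 sender ID qualifier
    String senderID = splitMessage[6];        // who sent it: ISA06 sender ID
    String dateStamp = splitMessage[9];       // ISA09 interchange date
    String timeStamp = splitMessage[10];      // ISA10 interchange time
    String controlNumber = splitMessage[13];  // ISA13 interchange control number
    String testIndicator = splitMessage[15];  // ISA15 test/production indicator
    // ... do stuff with the pieces of info ...
} catch (IOException e) {
    e.printStackTrace();
}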
