FastqGeneralIterator Output - biopython

I'm using FastqGeneralIterator, but it strips the # from the 1st line of each FASTQ record and drops the 3rd line entirely.
I added the # back to the 1st line in the following way:
for line in open("prova_FiltraN_CE_filt.fastq"):
    fout.write(line.replace('SEQ', '#SEQ'))
I also want to restore the 3rd line, which starts with + and has nothing after it. For example:
#SEQILMN0
TCATCGTA....
+
#<BBBFFF.....
Can someone help me?

You can use the string formatting operator %:
from Bio.SeqIO.QualityIO import FastqGeneralIterator
with open("prova_FiltraN_CE_filt.fastq") as handle:
    for (title, sequence, quality) in FastqGeneralIterator(handle):
        print("#%s\n%s\n+\n%s" % (title, sequence, quality))
This prints each record in FASTQ format, using the fields returned by FastqGeneralIterator:
#SEQILMN0
TCATCGTA....
+
#<BBBFFF....
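The same formatting can be written to a file instead of printed. This is a minimal sketch, not a definitive implementation: the function names and the output path are hypothetical, and the `marker` argument is an assumption added here. Standard FASTQ title lines begin with `@`, while the example in this thread shows `#`, so the character is left configurable.

```python
def format_fastq_record(title, sequence, quality, marker="@"):
    """Rebuild the four lines of one FASTQ record from the parsed fields."""
    # FastqGeneralIterator returns the title without its leading marker and
    # does not return the '+' line, so both are added back here.
    return "%s%s\n%s\n+\n%s\n" % (marker, title, sequence, quality)


def rewrite_fastq(in_path, out_path, marker="@"):
    """Copy a FASTQ file, restoring the title marker and a bare '+' line."""
    # Imported here so format_fastq_record can be used without Biopython.
    from Bio.SeqIO.QualityIO import FastqGeneralIterator

    with open(in_path) as handle, open(out_path, "w") as fout:
        for title, sequence, quality in FastqGeneralIterator(handle):
            fout.write(format_fastq_record(title, sequence, quality, marker))
```

A call such as rewrite_fastq("prova_FiltraN_CE_filt.fastq", "out.fastq") would then write one four-line record per read; "out.fastq" is a placeholder name.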

Related

Google-spreadsheets automatically cancels the code's indentation, how to recover?

I use Google Spreadsheets and want to copy Python code into it, but Google Spreadsheets automatically removes the code's indentation.
My code:
import pandas as pd
import csv
rs = pd.read_csv(r'D:/Clustering_TOP.csv',encoding='utf-8')
with open('D:/Clustering_TOP.csv','r') as csvfile:
    reader = csv.reader(csvfile)
    rows = [row for row in reader]
    csv_title = rows[0]
    csv_title = csv_title[1:]
    len_csv_title = len(csv_title)
    for i in range(len_csv_title):
        for j in range(i+1):
            print(str(csv_title[j])+'_'+str(csv_title[i]) + " = " + str(rs[csv_title[i]].corr(rs[csv_title[j]])), end='\t')
        print()
When I paste the code into Google Docs, the code turns into this:
And there is no paste option that keeps the indentation.
How can I keep the indentation of the code?
How about this workaround? From your title ("how to recover?"), I'm not sure whether you want to restore the indentation of a script that has already been pasted, or keep the indentation at the moment you paste it. If it is the latter, there are 2 patterns; please choose one of them. There are several possible workarounds, so please think of this as just one of them.
Pattern 1
Paste the script line by line.
Pattern 2
Paste the script in the cell "A1".
Paste this formula in the cell "A2".
=TRANSPOSE(SPLIT(A1, char(10)))
You can then retrieve just the script, with its indentation, by copying the resulting cells and pasting them as values only.
If this was not what you want, I'm sorry.
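To see what the Pattern 2 formula does, here is a rough Python sketch of the same idea. The function name is hypothetical and this is only an illustration of the behavior, not how the spreadsheet implements it: the text is split on the newline character (char(10)), empty pieces are dropped as SPLIT does by default, and each remaining line keeps its leading spaces.

```python
def split_into_rows(cell_text):
    """Mimic =TRANSPOSE(SPLIT(A1, char(10))): one row per non-empty line."""
    # SPLIT removes empty pieces by default, so blank lines are skipped too.
    return [line for line in cell_text.split("\n") if line != ""]


script = "for i in range(3):\n    print(i)\n\nprint('done')"
for row in split_into_rows(script):
    # repr() makes the preserved leading spaces visible.
    print(repr(row))
```

Note that blank lines in the script do not survive the split, so they would need to be re-added by hand if they matter.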

how to tokenize/parse/search&replace document by font AND font style in LibreOffice Writer?

I need to update a bilingual dictionary written in Writer by first parsing all entries into their parts e.g.
main word (font 1, bold)
foreign equivalent transliterated (font 1, italic)
foreign equivalent (font 2, bold)
part of speech (font 1, italic)
Each line of the document is the main word followed by the parts listed above, each separated by a space or punctuation.
I need to automate the process of walking through the whole file, line by line, and placing a delimiter between each part, ignoring spaces and punctuation, so I can mass import it into a Calc file. In other words, "each part" is a sequence of characters (ignoring spaces and punctuation) that have the same font AND font style.
I have tried the standard Search&Replace feature and the AltSearch extension, but neither is able to complete the task. The main problem is that I am not able to write a search query that says:
Find: consecutive characters with the same font AND font_style, ignore spaces and punctuation
Replace: term found above + "delimiter"
Any suggestions how I can write a script for this, or if an existing tool can solve the problem?
Thanks!
Pseudo code for desired effect:
var delimiter = "|"
Go to beginning of document
While not end of document do:
    var $currLine = get line from doc
    var $currChar = get next character which is not space or punctuation;
    var $font = currChar.font
    var $font_style = currChar.font_style (e.g. bold, italic, normal)
    While not end of line do:
        $currChar = next character which is not space or punctuation;
        if (currChar.font != $font || currChar.font_style != $font_style) { // font or style has changed
            print $delimiter
            $font = currChar.font
            $font_style = currChar.font_style (e.g. bold, italic, normal)
        }
    end While
end While
Here are tips for each of the things your pseudocode does.
First, the easiest way to move line by line is with the TextViewCursor, although it is slow. Notice the XLineCursor section. For the while loop, oVC.goDown() will return false when the end of the document is reached. (oVC is our variable for the TextViewCursor).
Get each character by calling oVC.goRight(0, False) to deselect, followed by oVC.goRight(1, True) to select. The selected character is then obtained with oVC.getString(). To ignore spaces and punctuation, perhaps use Python's isalnum() or the re module.
To determine the font of the character, call oVC.getPropertyValue(attr). Values for attr could simply be CharAutoStyleName and CharStyleName to check for any changes in formatting.
Or grab a list of specific properties such as 'CharFontFamily', 'CharFontFamilyAsian', 'CharFontFamilyComplex', 'CharFontPitch', 'CharFontPitchAsian' etc. Character properties are described at https://wiki.openoffice.org/wiki/Documentation/DevGuide/Text/Formatting.
To insert the delimiter into the text: oVC.getText().insertString(oVC, "|", 0).
This Python code from GitHub shows how to do most of these things, although you'll need to read through it to find the relevant parts.
Alternatively, instead of using the LibreOffice API, unzip the .odt file and parse content.xml with a script.
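The content.xml route can be sketched as follows. This is a minimal sketch under stated assumptions, not a complete solution: it assumes the bold/italic/font changes are stored as automatic text styles applied through text:span elements, it ignores formatting inherited from paragraph styles and special elements such as text:s, and the function names and the "dictionary.odt" path are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def q(prefix, local):
    """Build the '{namespace}local' form ElementTree uses for qualified names."""
    return "{%s}%s" % (NS[prefix], local)


def style_table(root):
    """Map each style name to a (font name, weight, slant) tuple."""
    table = {}
    for style in root.iter(q("style", "style")):
        props = style.find("style:text-properties", NS)
        if props is not None:
            table[style.get(q("style", "name"))] = (
                props.get(q("style", "font-name")),
                props.get(q("fo", "font-weight"), "normal"),
                props.get(q("fo", "font-style"), "normal"),
            )
    return table


def delimit_content_xml(xml_text, delimiter="|"):
    """Return one string per paragraph, with a delimiter at each format change."""
    root = ET.fromstring(xml_text)
    styles = style_table(root)
    lines = []
    for par in root.iter(q("text", "p")):
        # Collect (format, text) runs: span contents plus the plain text around them.
        runs = [(None, par.text or "")]
        for child in par:
            fmt = None
            if child.tag == q("text", "span"):
                fmt = styles.get(child.get(q("text", "style-name")))
            runs.append((fmt, "".join(child.itertext())))
            runs.append((None, child.tail or ""))
        out, prev, seen = [], None, False
        for fmt, text in runs:
            # Runs made only of spaces or punctuation never trigger a delimiter.
            if any(ch.isalnum() for ch in text):
                if seen and fmt != prev:
                    out.append(delimiter)
                prev, seen = fmt, True
            out.append(text)
        lines.append("".join(out))
    return lines


def delimit_odt(path, delimiter="|"):
    """Read content.xml out of an .odt file (a zip archive) and delimit it."""
    with zipfile.ZipFile(path) as odt:
        return delimit_content_xml(odt.read("content.xml"), delimiter)
```

Calling delimit_odt("dictionary.odt") would return the delimited lines, which could then be saved as a text file and imported into Calc with "|" as the separator.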

Copy a table from iPython notebook into Word?

I want to copy a table from iPython notebook into a Word doc. I'm using Word for Mac 2011. The table is a standard pandas output and looks like this:
If I use Apple+C to copy the table, and then paste it into a Word doc, I get this:
Surely there must be an easier way?
Creating a table with the same number of rows/columns in Word and then trying to paste the cells there doesn't work either.
I guess I could screenshot the table, but I'd like to include the raw data in the document if possible.
The problem in this case (from the Word perspective) is not the table layout - it's the paragraph layout. Each paragraph has a substantial indent on right and left, and more space before/after than you would normally want.
I don't think any of the Paste options (e.g. Paste Special) in Word is going to help, unless you paste as unformatted text, then select the text, convert to a table, then proceed from there.
But, even a simple Word VBA macro such as this one will leave you with something a bit more manageable. (Select a table you copied in, then run the macro). A little bit more work on the code would probably allow you to get most of the formatting you want, most of the time.
Sub fixupSelectedTable()
    With Selection.Tables(1).Range.ParagraphFormat
        .LeftIndent = 0
        .RightIndent = 0
        .SpaceBefore = 0
        .SpaceAfter = 0
        .LineSpacingRule = wdLineSpaceSingle
    End With
End Sub
If you are more familiar with Applescript, the equivalent looks something like this:
-- you may need to fix up the application name
-- (I use this to ensure that the script uses the Open Word 2011 doc
-- and does not try to start Word for Mac 15 (2016))
tell application "/Applications/Microsoft Office 2011/Microsoft Word.app"
    tell the paragraph format of the text object of table 1 of the text object of the selection
        set paragraph format left indent to 0
        set paragraph format right indent to 0
        set space before to 0
        set space after to 0
        set line spacing rule to line space single
    end tell
end tell
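For the "paste as unformatted text, then convert to a table" route mentioned above, tab-delimited text tends to convert cleanly because Word can separate text at tabs. A minimal sketch, assuming the table is available as a header list plus a list of rows (for a pandas DataFrame, list(df.columns) and df.values.tolist() would supply these); the function name and sample values are hypothetical.

```python
import csv
import io


def rows_to_tab_text(header, rows):
    """Return the table as tab-delimited text, one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data, only to show the shape of the output.
print(rows_to_tab_text(["name", "score"], [["ann", 1], ["bob", 2]]), end="")
```

The printed text can be copied into Word, selected, and converted to a table with tabs as the separator; the macro above can then tidy the paragraph formatting.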

NotePad++ Changing Few Number Entries

Here is a simple list in which I'd like to change the numbers. The entries look like the ones below, and there are over 300 of them:
tom112
smith113
harry114
linda115
cindy106
samantha147
They need to be changed to
tom212
smith213
harry214
...and so on.
Please assist using notepad++ regular expression.
Thanks.
Assuming it's just a matter of replacing a name followed by a number starting with 1 with the same number but starting with 2 instead:
Press Ctrl + H to open search & replace.
Check Regular expression under Search Mode.
Next to Find what, type or paste: ([a-zA-Z]+)1([0-9]+)
Next to Replace with, type or paste: \12\2 (that is, group 1, a literal 2, then group 2)
Click Replace All and that should do it.
If other characters can appear in the name before the number, add them inside the first set of square brackets alongside a-zA-Z.
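The same substitution can be checked outside Notepad++ before running it on all 300 entries. A minimal Python sketch using the pattern from the steps above; the function name is hypothetical. Note that Python needs \g<1> in the replacement, because \12 would be read there as a reference to group 12.

```python
import re

# Same pattern as in the steps above: letters, a literal 1, then more digits.
PATTERN = re.compile(r"([a-zA-Z]+)1([0-9]+)")


def bump_leading_digit(entry):
    """Replace the 1 that follows the name with a 2, keeping everything else."""
    # \g<1> keeps the literal 2 from being parsed as part of the group number.
    return PATTERN.sub(r"\g<1>2\2", entry)


entries = ["tom112", "smith113", "harry114", "linda115", "cindy106", "samantha147"]
print([bump_leading_digit(e) for e in entries])
```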

GNU-M4: Strip empty lines

How can I strip empty lines (surplus empty lines) from an input file using M4?
I know I can append dnl to the end of each line of my script to suppress the newline output, but the blank lines I mean are not in my script; they are in a data file that is included (where I am not supposed to put dnl's).
I tried something like this:
define(`
',`')
(replace a new-line by nothing)
But it didn't work.
Thanks.
I use divert() around my definitions:
divert(-1) will suppress the output
divert(0) will restore the output
E.g.:
divert(-1)dnl output suppressed starting here
define(..)
define(..)
divert(0)dnl normal output starting here
use_my_definitions()...
I understand your problem to be a data file with extra line breaks, meaning that where you want to have the pattern data<NL>moredata you have things like data<NL><NL>moredata.
Here's a sample to cut/paste onto your command line that uses here documents to generate a data set and runs an m4 script to remove the breaks in the data set. You can see the patsubst command replaces every instance of one or more newlines in sequence (<NL><NL>*) with exactly one newline.
cat > data << -----
1, 2
3, 4
5, 6
7, 8
9, 10
11, 12
e
-----
m4 << "-----"
define(`rmbreaks', `patsubst(`$*', `

*', `
')')dnl
rmbreaks(include(data))dnl
-----
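For comparison, the same "one or more newlines become exactly one newline" rule can be sketched outside m4. This Python version is only an illustration of the substitution, not a replacement for the m4 macro; the function name and the sample text are hypothetical.

```python
import re


def collapse_blank_lines(text):
    """Replace each run of one or more newlines with a single newline."""
    return re.sub(r"\n+", "\n", text)


# Made-up sample with surplus blank lines between the data rows.
sample = "1, 2\n\n3, 4\n\n\n5, 6\n"
print(collapse_blank_lines(sample), end="")
```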
