I am writing a CSV reader to generate GenBank files that capture annotations together with the sequence.
At first I used Bio.SeqRecord and got correctly formatted output, but the SeqRecord class lacks fields that I need.
FEATURES Location/Qualifiers
HCDR1 27..35
HCDR2 50..66
HCDR3 99..109
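(For context, the SeqRecord route presumably looked something like the sketch below; this is a hypothetical reconstruction, not the asker's actual code. Recent Biopython versions also require a molecule_type annotation before SeqIO will write GenBank output.)
# Hypothetical reconstruction of the SeqRecord approach (names are illustrative)
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio import SeqIO

rec = SeqRecord(Seq(row['Seq']), id=row['Sample'],
                description=row['FullCloneName'],
                annotations={"molecule_type": "protein"})
rec.features.append(SeqFeature(FeatureLocation(26, 35), type="HCDR1"))
SeqIO.write(rec, "sample.gb", "genbank")  # yields the desired one-line FEATURES format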
I switched to Bio.GenBank.Record and now have the fields I need, except the annotation formatting is wrong. It can't have the extra "type:", "location:", and "qualifiers:" text, and each feature's information should all be on one line.
FEATURES Location/Qualifiers
type: HCDR1
location: [26:35]
qualifiers:
type: HCDR2
location: [49:66]
qualifiers:
type: HCDR3
location: [98:109]
qualifiers:
The code for pulling annotations is the same for both versions. Only the class changed.
import datetime
import getpass

from Bio.GenBank.Record import Record
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Read CSV entries and create a container with the data
# (row comes from a csv.DictReader loop, one row per record)
container = Record()
container.locus = row['Sample']
container.size = len(row['Seq'])
container.residue_type = "PROTEIN"
container.data_file_division = "PRI"
container.date = datetime.date.today().strftime("%d-%b-%Y")  # today's date
container.definition = row['FullCloneName']
container.accession = [row['Vgene'], row['HCDR3']]
container.version = getpass.getuser()
container.keywords = [row['ProjectName']]
container.source = "test"
container.organism = "Homo sapiens"
container.sequence = row['Seq']

CDRS = ["HCDR1", "HCDR2", "HCDR3"]
for CDR in CDRS:
    start = row['Seq'].find(row[CDR])  # 0-based start of the CDR in the sequence
    end = start + len(row[CDR])
    feature = SeqFeature(FeatureLocation(start=start, end=end), type=CDR)
    container.features.append(feature)
I have looked at the source code for Bio.GenBank.Record but can't figure out why the SeqFeature class produces differently formatted output here than it does under Bio.SeqRecord.
Is there an elegant fix, or do I write a separate tool to reformat the annotations in the GenBank file?
After reading the source code again, I discovered that Bio.GenBank.Record has its own Feature class that takes the key and location as plain strings. These are formatted correctly in the output GenBank file.
from Bio.GenBank.Record import Feature

CDRS = ["HCDR1", "HCDR2", "HCDR3"]
for CDR in CDRS:
    start = row['Seq'].find(row[CDR])
    end = start + len(row[CDR])
    feature = Feature()
    feature.key = CDR
    # GenBank locations are 1-based and inclusive, so shift the 0-based find() result
    feature.location = "{}..{}".format(start + 1, end)
    container.features.append(feature)
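To actually write the file: as far as I can tell, Bio.GenBank.Record has no dedicated writer function; its string representation is what renders the GenBank-formatted text, so something like this sketch (assuming str(Record) is the intended output path) should do:
# Assumption: str() on a Bio.GenBank.Record renders the full GenBank text
with open("{}.gb".format(row['Sample']), "w") as handle:
    handle.write(str(container))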
At the moment, I am both displaying and saving my networkx graph:
import networkx as nx
import matplotlib.pyplot as plt

# G is an existing networkx graph with weighted edges
pos = nx.spring_layout(G)
f1 = plt.figure(figsize=(18, 10))
default_axes = plt.axes(frameon=True)
nx.draw_networkx(G, node_size=600, alpha=0.8, ax=default_axes, pos=pos)
edge_labels = nx.get_edge_attributes(G, "weight")
nx.draw_networkx_edge_labels(G, pos=pos, edge_labels=edge_labels)
plt.savefig('graph.jpg')
I would like to be able to choose whether to display the figure, save it, or do both (what I am doing now).
There is no built-in option for that in networkx. One option is to wrap the code in a function along these lines:
import networkx as nx
import matplotlib.pyplot as plt

def custom(G, plot=True, save_file=False):
    '''Plots G by default; pass a file name string as save_file to save.'''
    pos = nx.spring_layout(G)
    f1 = plt.figure(figsize=(18, 10))
    default_axes = plt.axes(frameon=True)
    nx.draw_networkx(G, node_size=600, alpha=0.8, ax=default_axes, pos=pos)
    edge_labels = nx.get_edge_attributes(G, "weight")
    nx.draw_networkx_edge_labels(G, pos=pos, edge_labels=edge_labels)
    if save_file: plt.savefig(save_file)  # save_file doubles as a custom save name
    if plot: plt.show()
    return
Note that if the figure displays regardless of the options passed to the function, then the inline backend (e.g. in Jupyter) might need to be disabled.
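Hypothetical usage of the wrapper above:
custom(G)                                     # display only (the default)
custom(G, plot=False, save_file="graph.jpg")  # save only
custom(G, save_file="graph.jpg")              # display and save, as before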
I'm getting the same error as this question, but with XQuery:
SaxonApiException: The context item for axis step ./CLIENT is absent
When running from the command line, all is good. So I don't think there is a syntax problem with the XQuery itself. I won't post the input file unless needed.
The XQuery is displayed with a Console.WriteLine before the error appears:
----- Start: XQUERY:
(: FLWOR = For Let Where Order-by Return :)
<MyFlightLegs>
{
for $flightLeg in //FlightLeg
where $flightLeg/DepartureAirport = 'OKC' or $flightLeg/ArrivalAirport = 'OKC'
order by $flightLeg/ArrivalDate[1] descending
return $flightLeg
}
</MyFlightLegs>
----- End : XQUERY:
Error evaluating (<MyFlightLegs {for $flightLeg in root/descendant::FlightLeg[DepartureAirport = "OKC" or ArrivalAirport = "OKC"] ... return $flightLeg}/>) on line 4 column 20
XPDY0002: The context item for axis step root/descendant::FlightLeg is absent
I think that like the other question, maybe my input XML file is not properly specified.
I took the run method of the XQueryToStream class from samples/cs/ExamplesHE.cs.
Code there for easy reference is:
public class XQueryToStream : Example
{
    public override string testName
    {
        get { return "XQueryToStream"; }
    }

    public override void run(Uri samplesDir)
    {
        Processor processor = new Processor();
        XQueryCompiler compiler = processor.NewXQueryCompiler();
        compiler.BaseUri = samplesDir.ToString();
        compiler.DeclareNamespace("saxon", "http://saxon.sf.net/");
        XQueryExecutable exp = compiler.Compile("<saxon:example>{static-base-uri()}</saxon:example>");
        XQueryEvaluator eval = exp.Load();
        Serializer qout = processor.NewSerializer();
        qout.SetOutputProperty(Serializer.METHOD, "xml");
        qout.SetOutputProperty(Serializer.INDENT, "yes");
        qout.SetOutputStream(new FileStream("testoutput.xml", FileMode.Create, FileAccess.Write));
        Console.WriteLine("Output written to testoutput.xml");
        eval.Run(qout);
    }
}
I changed it to take the XQuery file name, the XML file name, and the output file name, and tried to make a static method out of it. (I had success doing the same with the XSLT processor.)
static void DemoXQuery(string xmlInputFilename, string xqueryInputFilename, string outFilename)
{
    // Create a Processor instance.
    Processor processor = new Processor();

    // Load the source document
    DocumentBuilder loader = processor.NewDocumentBuilder();
    loader.BaseUri = new Uri(xmlInputFilename);
    XdmNode indoc = loader.Build(loader.BaseUri);

    XQueryCompiler compiler = processor.NewXQueryCompiler();
    //BaseUri is inconsistent with Transform = Processor?
    //compiler.BaseUri = new Uri(xqueryInputFilename);
    //compiler.DeclareNamespace("saxon", "http://saxon.sf.net/");

    string xqueryFileContents = File.ReadAllText(xqueryInputFilename);
    Console.WriteLine("----- Start: XQUERY:");
    Console.WriteLine(xqueryFileContents);
    Console.WriteLine("----- End : XQUERY:");

    XQueryExecutable exp = compiler.Compile(xqueryFileContents);
    XQueryEvaluator eval = exp.Load();

    Serializer qout = processor.NewSerializer();
    qout.SetOutputProperty(Serializer.METHOD, "xml");
    qout.SetOutputProperty(Serializer.INDENT, "yes");
    qout.SetOutputStream(new FileStream(outFilename, FileMode.Create, FileAccess.Write));

    eval.Run(qout);
}
Also, two questions regarding "BaseUri":
1. Should it be a directory name, or can it be the same as the XQuery file name?
2. I get the compile error "Cannot implicitly convert type 'System.Uri' to 'string'" on:
compiler.BaseUri = new Uri(xqueryInputFilename);
It's exactly what I did for XSLT, which worked. It looks like BaseUri is a string for XQuery but a real Uri object for XSLT. Any reason for the difference?
You seem to be asking a whole series of separate questions, which are hard to disentangle.
Your C# code appears to be compiling the query
<saxon:example>{static-base-uri()}</saxon:example>
which bears no relationship to the XQuery code you supplied that involves MyFlightLegs.
The MyFlightLegs query uses //FlightLeg and is clearly designed to run against a source document containing a FlightLeg element, but your C# code makes no attempt to supply such a document. You need to add an eval.ContextItem = value statement.
Your second C# fragment creates an input document in the line
XdmNode indoc = loader.Build(loader.BaseUri);
but it doesn't supply it to the query evaluator.
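Concretely, a minimal sketch using the names from your own code:
XQueryEvaluator eval = exp.Load();
eval.ContextItem = indoc;  // supply the document built earlier as the context item
eval.Run(qout);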
A base URI can be either a directory or a file; resolving relative.xml against file:///my/dir/ gives exactly the same result as resolving it against file:///my/dir/query.xq. By convention, though, the static base URI of the query is the URI of the resource (e.g. the file) containing the source query text.
Yes, there's a lot of inconsistency in the use of strings versus URI objects in the API design. (There's also inconsistency about the spelling of BaseURI versus BaseUri.) Sorry about that; you're just going to have to live with it.
Bottom line solution based on Michael Kay's response; I added this line of code after doing the exp.Load():
eval.ContextItem = indoc;
The indoc object created earlier is the parsed XML input document that the XQuery processes.
I want to parse a config file which has information like:
[MY_WINDOW_0]
Address = 0xA0B0C0D0
Size = 0x100
Type = cpu0
[MY_WINDOW_1]
Address = 0xB0C0D0A0
Size = 0x200
Type = cpu0
[MY_WINDOW_2]
Address = 0xC0D0A0B0
Size = 0x100
Type = cpu1
into a Lua table as follows:
CPU_TRACE_WINDOWS =
{
    ["cpu0"] = {{address = 0xA0B0C0D0, size = 0x100}, {address = 0xB0C0D0A0, size = 0x200},},
    ["cpu1"] = {{address = 0xC0D0A0B0, size = 0x100},...}
}
I tried my best with some basic Lua string-manipulation functions but couldn't get the output I'm looking for, due to the repetition of strings like 'Address', 'Size', and 'Type' in each section. Also, my actual config file is huge, with 20 such sections.
This is how far I got; it's basically one section of the code, the rest would just repeat the logic.
OriginalConfigFile = "test.cfg"
os.execute("cls")
CPU_TRACE_WINDOWS = {}
local bus
for line in io.lines(OriginalConfigFile) do
    if string.find(line, "Type") ~= nil then
        bus = string.gsub(line, "%a=%a", "")
        k, v = string.match(bus, "(%w+) = (%w+)")
        table.insert(CPU_TRACE_WINDOWS, v)
    end
end
Basically I'm having trouble coming up with the FINAL TABLE STRUCTURE that I need. v here holds the different right-hand values of the "Type" lines, and I'm having issues with arranging them in the table. I'm currently working to find a solution, but I thought I could ask for help in the meantime.
This should work for you. Just change the filename to wherever you have your config file stored.
f, Address, Size, Type = io.input("configfile"), "", "", ""
CPU_TRACE_WINDOWS = {}
for line in f:lines() do
    if line:find("MY_WINDOW") then
        -- start of a new section: reset the per-section variables
        Type = ""
        Address = ""
        Size = ""
    elseif line:find("=") then
        -- e.g. "Address = 0xA0B0C0D0" sets the global variable Address
        _G[line:match("^%a+")] = line:match("[%d%a]+$")
        if line:match("Type") then
            if not CPU_TRACE_WINDOWS[Type] then
                CPU_TRACE_WINDOWS[Type] = {}
            end
            table.insert(CPU_TRACE_WINDOWS[Type], {address = Address, size = Size})
        end
    end
end
It searches for the MY_WINDOW phrase and resets the variables. If the table already exists within CPU_TRACE_WINDOWS, it just appends a new table value; otherwise it creates the table first. Note that this depends on Type always being the last entry in a section; if the order switches anywhere, it will not have all the required information. There may be a cleaner way to do it, but this works (tested on my end).
Edit: Whoops, forgot to change the variables in the middle there if MY_WINDOW matched. That needed to be corrected.
Edit 2: Cleaned up the redundancy with table.insert. Only need it once, just need to make sure the table is created first.
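For reference, running this against the sample config above should leave CPU_TRACE_WINDOWS looking roughly like this (note that Address and Size are captured as strings, not numbers):
CPU_TRACE_WINDOWS = {
    cpu0 = {
        { address = "0xA0B0C0D0", size = "0x100" },
        { address = "0xB0C0D0A0", size = "0x200" },
    },
    cpu1 = {
        { address = "0xC0D0A0B0", size = "0x100" },
    },
}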
As per this FileHelpers 3.1 example, you can automatically detect a CSV file's format using the FileHelpers.Detection.SmartFormatDetector class.
But the example goes no further. How do you use this information to dynamically parse a CSV file? It must have something to do with the DelimitedFileEngine but I cannot see how.
Update:
I figured out a possible way but had to resort to using reflection (which does not feel right). Is there another/better way? Maybe using System.Dynamic? Anyway, here is the code I have so far, it ain't pretty but it works:
// follows on from the smart detector example
FileHelpers.Detection.RecordFormatInfo lDetectedFormat = formats[0];
Type lDetectedClass = lDetectedFormat.ClassBuilderAsDelimited.CreateRecordClass();

List<FieldInfo> lFieldInfoList = new List<FieldInfo>(lDetectedFormat.ClassBuilderAsDelimited.FieldCount);
foreach (FileHelpers.Dynamic.DelimitedFieldBuilder lField in lDetectedFormat.ClassBuilderAsDelimited.Fields)
    lFieldInfoList.Add(lDetectedClass.GetField(lField.FieldName));

FileHelperAsyncEngine lFileEngine = new FileHelperAsyncEngine(lDetectedClass);
int lRecNo = 0;
lFileEngine.BeginReadFile(cReadingsFile);
try
{
    while (true)
    {
        object lRec = lFileEngine.ReadNext();
        if (lRec == null)
            break;
        Trace.WriteLine("Record " + lRecNo);
        lFieldInfoList.ForEach(f => Trace.WriteLine("   " + f.Name + " = " + f.GetValue(lRec)));
        lRecNo++;
    }
}
finally
{
    lFileEngine.Close();
}
Since I use the SmartFormatDetector to determine the exact format of the incoming delimited files, you can use the following approach:
private DelimitedClassBuilder GetFormat(string file)
{
    var detector = new FileHelpers.Detection.SmartFormatDetector();
    var format = detector.DetectFileFormat(file);
    return format.First().ClassBuilderAsDelimited;
}

private List<T> ConvertFile2Objects<T>(string file, out DelimitedFileEngine engine)
{
    var format = GetFormat(file); // get your format info here
    engine = new DelimitedFileEngine(typeof(T)); // define your DelimitedFileEngine

    // set whichever properties of the engine you need
    engine.ErrorMode = ErrorMode.SaveAndContinue; // optional
    engine.Options.Delimiter = format.Delimiter;
    engine.Options.IgnoreFirstLines = format.IgnoreFirstLines;
    engine.Options.IgnoreLastLines = format.IgnoreLastLines;

    // process
    var ret = engine.ReadFileAsList(file);
    this.errorCount = engine.ErrorManager.ErrorCount;
    var err = engine.ErrorManager.Errors;
    engine.ErrorManager.SaveErrors("errors.out");

    // return records; do here what you need
    return ret.Cast<T>().ToList();
}
This is an approach I use in a project where I only know that I have to process delimited files of multiple types.
Attention: I noticed that with the files I received, the SmartFormatDetector has a problem with the tab delimiter. Maybe this should be considered.
Disclaimer: This code is not perfect, but it is in a usable state. Modification and/or refactoring is advised.
I get the following error from caret's trainControl() using the custom-methods syntax documented in the package vignette (PDF) on page 46. Does anyone know if this document is out of date or incorrect? It seems at odds with the caret documentation page, where the "custom" parameter is not used.
> fitControl <- trainControl(custom = list(parameters = lpgrnn$grid, model = lpgrnn$fit,
                                           prediction = lpgrnn$predict, probability = NULL,
                                           sort = lpgrnn$sort, method = "cv"),
                             number = 10)
Error in trainControl(custom = list(parameters = lpgrnn$grid, model = lpgrnn$fit, :
  unused argument (custom = list(parameters = lpgrnn$grid, model = lpgrnn$fit,
    prediction = lpgrnn$predict, probability = NULL, sort = lpgrnn$sort, method = "cv",
    number = 10))
The cited pdf is out of date. The caret website is the canonical source of documentation.
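For anyone landing here: in current caret (6.0+), a custom model is passed to train() itself as a list, not through trainControl(). A rough sketch under that assumption, mapping the question's components onto the list elements described in the website's "Using Your Own Model in train" page (the lpgrnn element names here are guesses; adjust to your actual objects):
## Sketch only: the caret >= 6.0 custom-model interface
lpgrnnMethod <- list(
  type       = "Regression",   # assumption: GRNN used for regression
  library    = NULL,
  parameters = lpgrnn$grid,    # data frame describing the tuning parameters
  grid       = lpgrnn$gridFun, # hypothetical: function generating candidate values
  fit        = lpgrnn$fit,
  predict    = lpgrnn$predict,
  prob       = NULL,
  sort       = lpgrnn$sort
)

fitControl <- trainControl(method = "cv", number = 10)
# model <- train(x, y, method = lpgrnnMethod, trControl = fitControl)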