Hi, I'm studying a plant that has no reference genome; I only have a scaffold-level assembly and a GFF3 annotation file. Can I create a STAR index from that same assembly and GFF3 and do the mapping? If I want to do de novo assembly using Galaxy, can I import the links to the 20 FASTQ files from the RNA-seq of the samples together and construct a reference transcriptome with Trinity? If this is possible, please help me import the links to these files together into Galaxy and build the reference.
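For reference, this is roughly the indexing command I have in mind; a sketch only, with placeholder file names (and I understand --sjdbOverhang should be read length minus 1):

STAR --runMode genomeGenerate \
    --genomeDir star_index \
    --genomeFastaFiles scaffolds.fasta \
    --sjdbGTFfile annotation.gff3 \
    --sjdbGTFtagExonParentTranscript Parent \
    --sjdbOverhang 99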
Thanks a lot
I want to search a sequence for the enzyme CneH16IP, which has the recognition site GAYNNNNNCTTGY. When I import the list of enzymes included in the Restriction package, this enzyme is not among them (code below):
from Bio import Restriction
from Bio.Restriction import *  # pulls every enzyme bundled with Biopython into the namespace
dir()  # lists the imported names; CneH16IP is not among them
I am assuming that not all of the enzymes from REBASE are imported by default, so is there a way to import all of them as an option? Or to add new enzymes? Alternatively, should I just do a string search that allows degenerate and discontinuous bases?
Thank you for any help!
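In case the string-search route is the answer, here is a minimal sketch of what I mean, translating the IUPAC degenerate site into a regular expression (the site GAYNNNNNCTTGY is from REBASE; the toy sequence and everything else below are just illustrative):

import re
from Bio.Seq import Seq

# IUPAC codes in the site: Y = C or T, N = any base
site = re.compile("GA[CT][ACGT]{5}CTTG[CT]")

seq = Seq("TTGATAAAAACTTGCAA")  # toy sequence for illustration only
for strand, text in [("+", str(seq)), ("-", str(seq.reverse_complement()))]:
    for m in site.finditer(text):  # search both strands
        print(strand, m.start(), m.group())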
I'm new to Altova, so correct me if I'm wrong.
I need to map an XML schema file to X12, and I'm having trouble producing the actual X12 structure.
Requirements: X12.6020 - 811 (Consolidated Service Invoice/Statement)
Picture of the Actual X12 Structure and My mapped X12 Structure
https://ibb.co/cWFBAa
Picture of the XML file
https://ibb.co/ggZDqa
Picture of the Mapping from XML to X12
https://ibb.co/k5qWbF
In the above XML, all Notice objects are grouped using the 'group-adjacent' library function based on 'contractId'. Each group should have one 'Detail', and I need to iterate through each individual group and create a 'Sub-detail' and a 'Sub-sub-detail' for each Notice under the group.
I have used the 'Add Duplicate Input After' function on the EDI 811 component to create duplicate 'LoopHL' nodes, i.e. 'LoopHL', 'LoopHL1', and 'LoopHL2' in the mapping picture above.
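To make the hierarchy I am after concrete, here is the grouping logic in plain Python; itertools.groupby is standing in for MapForce's group-adjacent, and the field names are made up for illustration (this is not the Altova API):

from itertools import groupby

# toy Notice records in document order; like group-adjacent, groupby only groups adjacent keys
notices = [
    {"contractId": "C1", "text": "notice 1"},
    {"contractId": "C1", "text": "notice 2"},
    {"contractId": "C2", "text": "notice 3"},
]

for contract_id, group in groupby(notices, key=lambda n: n["contractId"]):
    print("Detail for contract", contract_id)        # one Detail per group
    for notice in group:
        print("  Sub-detail:", notice["text"])       # one Sub-detail per Notice
        print("  Sub-sub-detail:", notice["text"])   # and its Sub-sub-detail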
Ask me if you need any further information that would help you understand my question.
Thanks in advance, guys.
My task is to take a BPMN 2.0 XML file and map it as well as possible (with a certain error rate) to available web services. For example, when my BPMN file describes the process of buying a pizza, I pay 10€ and get back one pizza. Now it should map that BPMN to the web service that needs an input of type int with the name "money", etc.
How is that even possible? I have searched for a few hours now and came up with the following:
I found https://github.com/camunda/camunda-bpm-platform and can easily use it to parse a plain .bpmn file into a Java object structure, which I can then query. Easy.
After parsing the XML notation, I should analyze it and search for elements that input data and elements that output data, since those are the only things I can map to WSDL (WSDL only describes the structure of the web service: names of variables, types of variables, number of variables). Problem: I cannot find any 1:1 elements I can safely declare as "when this BPMN element is used, it 100% means that the process is getting some input named x". What should I do here? What can I map?
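To illustrate the kind of extraction I mean, here is a rough sketch that pulls dataInput/dataOutput declarations straight out of the BPMN 2.0 XML; it is in Python only for brevity (my actual code uses Camunda's Java API), and the file name is a placeholder:

import xml.etree.ElementTree as ET

# namespace of the BPMN 2.0 model schema, as defined by the OMG spec
NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

root = ET.parse("buy_pizza.bpmn").getroot()  # placeholder file name

# each task's ioSpecification declares the data it consumes and produces
for task in root.findall(".//bpmn:task", NS):
    for di in task.findall(".//bpmn:dataInput", NS):
        print("input:", di.get("name"))
    for do in task.findall(".//bpmn:dataOutput", NS):
        print("output:", do.get("name"))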
I found WS-BPEL. As far as I understand, I can somehow transform BPMN into WS-BPEL, which should be a better model of the process and more easily mappable to a WSDL (?). Camunda, however, doesn't offer this functionality, and I am restricted to open-source software.
Any suggestions on what I should do?
Is there any documentation of the moses.ini format for Moses? Running moses at the command line without arguments returns the available feature names but not their available arguments. Additionally, the structure of the .ini file is not specified in the manual, as far as I can see.
The main idea is that the file contains settings that will be used by the translation model. Thus, the documentation of values and options in moses.ini should be looked up in the Moses feature specifications.
Here are some excerpts I found on the Web about moses.ini.
In the Moses Core manual, there are some details:
7.6.5 moses.ini

All feature functions are specified in the [feature] section. It should be in the format:

* Feature-name key1=value1 key2=value2 ...

For example: KENLM factor=0 order=3 num-features=1 lazyken=0 path=file.lm.gz
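Put together, a [feature] section following that format would look roughly like this (the feature names are real Moses features, but the paths and values are illustrative):

[feature]
KENLM factor=0 order=3 num-features=1 lazyken=0 path=file.lm.gz
PhraseDictionaryMemory input-factor=0 output-factor=0 num-features=4 path=phrase-table.gz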
Also, there is a hint on how to print basic statistics about all components mentioned in the moses.ini.
Run the script
analyse_moses_model.pl moses.ini
This can be useful to set the order of mapping steps to avoid explosion of translation options or just to check that the model components are as big/detailed as we expect.
In the Center for Computational Language and EducAtion Research (CLEAR) Wiki, there is a sample file with some documentation:
Parameters
It is recommended to make an .ini file to store all of your settings.
input-factors
- Whether to use a factored model or not

mapping
- Whether to use the LM in memory (T) or read the file from hard disk directly (G)

ttable-file
- Indicates the number of source factors, the number of target factors, the number of scores, and the path to the translation table file

lmodel-file
- Indicates the LM type (0: SRILM, 1: IRSTLM), the factor number used, the order (n-gram) of the LM, and the path to the language model file
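Based on those descriptions, an old-style moses.ini using these sections would look something like this (the paths are placeholders, and the 5 in ttable-file matches a standard five-score phrase table):

[input-factors]
0

[mapping]
T 0

[ttable-file]
0 0 5 /path/to/phrase-table.gz

[lmodel-file]
0 0 3 /path/to/lm.gz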
If that is not enough, there is another description on this page; see the "Decoder configuration file" section:
The sections [ttable-file] and [lmodel-file] contain pointers to the phrase table file and language model file, respectively. You may disregard the numbers on those lines. For the time being, it's enough to know that the last of the numbers in the language model specification is the order of the n-gram model.
The configuration file also contains some feature weights. Note that the [weight-t] section has 5 weights, one for each feature contained in the phrase table.
The moses.ini file created by the training process will not work with your decoder without modification, because it relies on a language model library that is not compiled into our decoder. To make it work, open the moses.ini file and find the language model specification in the line immediately after the [lmodel-file] heading. The first number on this line will be 0, which stands for SRILM. Change it to 8 and leave the rest of the line untouched. Then your configuration should work.
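Concretely, the edit described above looks like this (the factor number, order, and path are illustrative and stay unchanged; only the leading type number changes):

Before:
[lmodel-file]
0 0 3 /path/to/lm.gz

After:
[lmodel-file]
8 0 3 /path/to/lm.gz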
I am trying to use OpenNLP in a project I am working on, and I am very new to it. I tried out Named Entity Recognition with the pre-trained models available at http://opennlp.sourceforge.net/models-1.5/
However, I want to see the training data that was used, i.e. to actually open the .bin file and see its contents in English. Can someone please point me in the right direction?
I have tried to use UltraISO to read the .bin file, but I was not successful.
Please help!
Thanks :)
Use the Unix file command to find the file type, e.g. file en-token.bin. For most OpenNLP .bin files, it will tell you that they are just ZIP files.
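For example:

file en-token.bin
unzip -l en-token.bin

Since the model is a ZIP archive, unzip -l will list its entries; OpenNLP models typically contain a manifest.properties plus the serialized model data, not the original training text.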
The .bin file is actually the bytes of a serialized Java object representing a TokenNameFinder implementation called NameFinderME (ME meaning maximum entropy, which is the main multinomial-logistic-regression(ish) algorithm used in OpenNLP). You will not be able to see the training data by doing anything to this file.
Correction: it's not the name finder itself that is serialized; it's the name finder model (TokenNameFinderModel).