Convert a Tree to SemanticGraph in Stanford parser - parsing

I want to covert a Tree to SemanticGraph in Stanford Parser as followings:
LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
LexicalizedParserQuery lpq=lp.lexicalizedParserQuery();
String sentence="This is a sentence.";
List<CoreLabel> tokenizedSentence = tokenizerFactory.getTokenizer(new StringReader(sentence)).tokenize();
lpq.parse(tokenizedSentence);
Tree depTree = lpq.getBestParse();
SemanticGraph semanticGraph = ParserAnnotatorUtils.generateUncollapsedDependencies(depTree);
ParserAnnotatorUtils.generateUncollapsedDependencies(depTree) works on version 2.0.4. But it does not work on Version 3.5.2.

You can try something like:
Tree tree = ...
GrammaticalStructureFactory gsf = new UniversalEnglishGrammaticalStructureFactory();
SemanticGraph dependencyGraph = SemanticGraphFactory.generateCollapsedDependencies( gsf.newGrammaticalStructure(tree), GrammaticalStructure.Extras.NONE );

Related

How to load Seurat Object into WGCNA Tutorial Format

As far as I can find, there is only one tutorial about loading Seurat objects into WGCNA (https://ucdavis-bioinformatics-training.github.io/2019-single-cell-RNA-sequencing-Workshop-UCD_UCSF/scrnaseq_analysis/scRNA_Workshop-PART6.html). I am really new to programming so it's probably just my inexperience, but I am not sure how to load my Seurat object into a format that works with WGCNA's tutorials (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/).
Here is what I have tried thus far:
This tries to replicate datExpr and datTraits from part I.1:
library(WGCNA)
library(Seurat)
#example Seurat object -----------------------------------------------
ERlist <- list(c("CPB1", "RP11-53O19.1", "TFF1", "MB", "ANKRD30B",
"LINC00173", "DSCAM-AS1", "IGHG1", "SERPINA5", "ESR1",
"ILRP2", "IGLC3", "CA12", "RP11-64B16.2", "SLC7A2",
"AFF3", "IGFBP4", "GSTM3", "ANKRD30A", "GSTT1", "GSTM1",
"AC026806.2", "C19ORF33", "STC2", "HSPB8", "RPL29P11",
"FBP1", "AGR3", "TCEAL1", "CYP4B1", "SYT1", "COX6C",
"MT1E", "SYTL2", "THSD4", "IFI6", "K1AA1467", "SLC39A6",
"ABCD3", "SERPINA3", "DEGS2", "ERLIN2", "HEBP1", "BCL2",
"TCEAL3", "PPT1", "SLC7A8", "RP11-96D1.10", "H4C8",
"PI15", "PLPP5", "PLAAT4", "GALNT6", "IL6ST", "MYC",
"BST2", "RP11-658F2.8", "MRPS30", "MAPT", "AMFR", "TCEAL4",
"MED13L", "ISG15", "NDUFC2", "TIMP3", "RP13-39P12.3", "PARD68"))
tnbclist <- list(c("FABP7", "TSPAN8", "CYP4Z1", "HOXA10", "CLDN1",
"TMSB15A", "C10ORF10", "TRPV6", "HOXA9", "ATP13A4",
"GLYATL2", "RP11-48O20.4", "DYRK3", "MUCL1", "ID4", "FGFR2",
"SHOX2", "Z83851.1", "CD82", "COL6A1", "KRT23", "GCHFR",
"PRICKLE1", "GCNT2", "KHDRBS3", "SIPA1L2", "LMO4", "TFAP2B",
"SLC43A3", "FURIN", "ELF5", "C1ORF116", "ADD3", "EFNA3",
"EFCAB4A", "LTF", "LRRC31", "ARL4C", "GPNMB", "VIM",
"SDR16C5", "RHOV", "PXDC1", "MALL", "YAP1", "A2ML1",
"RP1-257A7.5", "RP11-353N4.6", "ZBTB18", "CTD-2314B22.3", "GALNT3",
"BCL11A", "CXADR", "SSFA2", "ADM", "GUCY1A3", "GSTP1",
"ADCK3", "SLC25A37", "SFRP1", "PRNP", "DEGS1", "RP11-110G21.2",
"AL589743.1", "ATF3", "SIVA1", "TACSTD2", "HEBP2"))
genes = c(unlist(c(ERlist,tnbclist)))
mat = matrix(rnbinom(500*length(genes),mu=500,size=1),ncol=500)
rownames(mat) = genes
colnames(mat) = paste0("cell",1:500)
sobj = CreateSeuratObject(mat)
sobj = NormalizeData(sobj)
sobj$ClusterName = factor(sample(0:1,ncol(sobj),replace=TRUE))
sobj$Patient = paste0("Patient", 1:500)
sobj = AddModuleScore(object = sobj, features = tnbclist,
name = "TNBC_List",ctrl=5)
sobj = AddModuleScore(object = sobj, features = ERlist,
name = "ER_List",ctrl=5)
#WGCNA -----------------------------------------------------------------
sobjwgcna <- sobj
sobjwgcna <- FindVariableFeatures(sobjwgcna, selection.method = "vst", nfeatures = 2000,
verbose = FALSE, assay = "RNA")
options(stringsAsFactors = F)
sobjwgcnamat <- GetAssayData(sobjwgcna)
datExpr <- t(sobjwgcnamat)[,VariableFeatures(sobjwgcna)]
datTraits <- sobjwgcna#meta.data
datTraits = subset(datTraits, select = -c(nCount_RNA, nFeature_RNA))
I then copy-paste the code as written in the WGCNA I.2a tutorial (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-02-networkConstr-auto.pdf), and that all works until I get to this line in the I.3 tutorial (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-03-relateModsToExt.pdf):
MEList = moduleEigengenes(datExpr, colors = moduleColors)
Error in t.default(expr[, restrict1]) : argument is not a matrix
I tried converting both moduleColors and datExpr into a matrix with as.matrix(), but the error still persists.
Hopefully this makes sense, and thanks for reading!
So doing as.matrix(datExpr) right after datExpr <- t(sobjwgcnamat)[,VariableFeatures(sobjwgcna)] worked. I had been trying it right before MEList = moduleEigengenes(datExpr, colors = moduleColors)
and that didn't work. Seems simple but order matters I guess.

Can I get subject verb object in string format using NLP in java?

This is sample code:
LexicalizedParser lp = **new LexicalizedParser("englishPCFG.ser.gz");**
String[] sent = { "This", "is", "an", "easy", "sentence", "." };
Tree parse = (Tree) lp.apply(Arrays.asList(sent));
parse.pennPrint();
System.out.println();
TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
tp.print(parse);
For example: "I use a parser" outputs (in Stanford parser)
nsubj(use-2, I-1)
root(ROOT-0, use-2)
det(parser-4, a-3)
dobj(use-2, parser-4)
But, How can I print subject verb and object without relation.
for example, I want to print
subject = I
verb = use
det = a
object = parser

How to use prepare_analogy_questions and check_analogy_accuracy functions in text2vec package?

Following code:
library(text2vec)
text8_file = "text8"
if (!file.exists(text8_file)) {
download.file("http://mattmahoney.net/dc/text8.zip", "text8.zip")
unzip ("text8.zip", files = "text8")
}
wiki = readLines(text8_file, n = 1, warn = FALSE)
# Create iterator over tokens
tokens <- space_tokenizer(wiki)
# Create vocabulary. Terms will be unigrams (simple words).
it = itoken(tokens, progressbar = FALSE)
vocab <- create_vocabulary(it)
vocab <- prune_vocabulary(vocab, term_count_min = 5L)
# Use our filtered vocabulary
vectorizer <- vocab_vectorizer(vocab)
# use window of 5 for context words
tcm <- create_tcm(it, vectorizer, skip_grams_window = 5L)
RcppParallel::setThreadOptions(numThreads = 4)
glove_model = GloVe$new(word_vectors_size = 50, vocabulary = vocab, x_max = 10, learning_rate = .25)
word_vectors_main = glove_model$fit_transform(tcm, n_iter = 20)
word_vectors_context = glove_model$components
word_vectors = word_vectors_main + t(word_vectors_context)
causes error:
qlst <- prepare_analogy_questions("questions-words.txt", rownames(word_vectors))
> Error in (function (fmt, ...) :
invalid format '%d'; use format %s for character objects
File questions-words.txt from word2vec sources https://github.com/nicholas-leonard/word2vec/blob/master/questions-words.txt
This was a small bug in information message formatting (after introduction of futille.logger). Just fixed it and pushed to github.
You can install updated version of the package with devtools::install_github("dselivanov/text2vec"

Convert Xml String with node prefixes to XElement

This is my xml string
string fromHeader= "<a:From><a:Address>http://ex1.example.org/</a:Address></a:From>";
I want to load it into an XElement, but doing XElement.Parse(fromHeader) gives me an error due to the 'a' prefixes. I tried the following:
XNamespace xNSa = "http://www.w3.org/2005/08/addressing";
string dummyRoot = "<root xmlns:a=\"{0}\">{1}</root>";
var fromXmlStr = string.Format(dummyRoot, xNSa, fromHeader);
XElement xFrom = XElement.Parse(fromXmlStr).Elements().First();
which works, but seriously, do i need 4 lines of code to do this! What is a quickest / shortest way of getting my XElement?
I found out the above 4 lines are equivalent to
XNamespace xNSa = "http://www.w3.org/2005/08/addressing";
XElement xFrom = new XElement(xNSa + "From", new XElement(xNSa + "Address", "http://ex1.example.org/"));
OR ALTERNATIVELY move the NS into the 'From' element before parsing.
var fromStr = "<a:From xmlns:a=\"http://www.w3.org/2005/08/addressing\"><a:Address>http://ex1.example.org/</a:Address></a:From>";
XElement xFrom = XElement.Parse(fromStr);

Is there an easier way to modify a value in a subsubsub record field in Erlang?

So I've got a fairly deep hierarchy of record definitions:
-record(cat, {name = '_', attitude = '_',}).
-record(mat, {color = '_', fabric = '_'}).
-record(packet, {cat = '_', mat = '_'}).
-record(stamped_packet, {packet = '_', timestamp = '_'}).
-record(enchilada, {stamped_packet = '_', snarky_comment = ""}).
And now I've got an enchilada, and I want to make a new one that's
just like it except for the value of one of the subsubsubrecords.
Here's what I've been doing.
update_attitude(Ench0, NewState)
when is_record(Ench0, enchilada)->
%% Pick the old one apart.
#enchilada{stamped_packet = SP0} = Ench0,
#stamped_packet{packet = PK0} = SP0,
#packet{cat = Tag0} = PK0,
%% Build up the new one.
Tude1 = Tude0#cat{attitude = NewState},
PK1 = PK0#packet{cat = Tude1},
SP1 = SP0#stamped_packet{packet = PK1},
%% Thank God that's over.
Ench0#enchilada{stamped_packet = SP1}.
Just thinking about this is painful. Is there a better way?
As Hynek suggests, you can elide the temporary variables and do:
update_attitude(E = #enchilada{stamped_packet = (P = #packet{cat=C})},
NewAttitude) ->
E#enchilada{stamped_packet = P#packet{cat = C#cat{attitude=NewAttitude}}}.
Yariv Sadan got frustrated with the same issue and wrote Recless, a type inferring parse transform for records which would allow you to write:
-compile({parse_transform, recless}).
update_attitude(Enchilada = #enchilada{}, Attitude) ->
Enchilada.stamped_packet.packet.cat.attitude = Attitude.
Try this:
update_attitude(E = #enchilada{
stamped_packet = (SP = #stamped_packet{
packet = (P = #packet{
cat = C
})})}, NewState) ->
E#enchilada{
stamped_packet = SP#stamped_packet{
packet = P#packet{
cat = C#cat{
attitude = NewState
}}}}.
anyway, structures is not most powerful part of Erlang.

Resources