How to parse a TypeScript code base into ASTs

How to parse a TypeScript code base into ASTs - parsing

I want to parse TypeScript projects into ASTs.
I can parse single file by :
import ts = require("typescript");
var fs = require('fs');
var util = require('util');
const ast = ts.createSourceFile('sample.ts', fs.readFileSync('sample.ts').toString(), ts.ScriptTarget.ES6, true);
console.log("AST:"+util.inspect(ast));
I can even loop through the files and filter files by extension and run above code to generate ASt.
However I want to parse the whole project in such a way that the relationships (like imports) will be preserved in AST.
For example:
If, a.ts is referencing var x from b.ts as below:
a.ts:
var y = x;
b.ts:
var x = 5;
In this case signature of x in a .ts should be resolved as : b.ts.x or equivalent.
I just want all such relationships resolved in the ASts as I parse the .ts files one by one.

You can load your project using
ts.createProgram(rootNames: string[], options: CompilerOptions, host?: CompilerHost, oldProgram?: Program)
rootNames is the list of all typescript files in your project. As far as I know, unless you declare type explicitly, AST will have no reference to it.
for eg. If you have
class MyClass {
// some code
}
let instance1 = new MyClass();
let instance2: MyClass = new MyClass();
In AST, node for instance1 will have type property as undefined, for instance2 type property will have proper TypeReference
For type checking you can use Program.getTypeChecker(). This returns a TypeChecker which can be used to analyse files in the program.

Related

Can I parse some F# code at run-time that reference types in my current assembly?

Say I have the following type defined:
type Foo = { A: string; B: int }
I want a function parse, such that:
let myfoo = parse<Foo> "{A = \"foo\"; B = 5}"
gives me an instance of type Foo (or error).
Is this possible using FSharp.Compiler.Service?
UPDATE:
While there are other questions that address parsing of F# code, they don't address having references in the current assembly.

You can do this by referencing the current assembly from the hosted F# interactive - this only works if you are running this from a compiled program (which has assembly located on disk) and if your types are public, but it may do the trick in your case.
Given the usual setup documented on the Embedding F# Interactive page, you can do something like this:
module Program
type Test = { A:int; B:string }
// (omitted code to initialize the fsi service)
let fsiSession = FsiEvaluationSession.Create(...)
// Run #r command to reference the current assembly
let loc = System.Reflection.Assembly.GetExecutingAssembly().Location
fsiSession.EvalInteraction(sprintf "#r #\"%s\"" loc)
// Open the module or namespace containing your types
fsiSession.EvalInteraction("open Program")
// Evaluate code using the type and cast it back to our type
let value = fsiSession.EvalExpression("{A=0; B=\"hi\"}").Value.ReflectionValue :?> Test
printfn "%A" value

SideInputs corrupt the data in DataFlow's Pipeline

I have a Dataflow pipeline (SDK 2.1.0, Apache Beam 2.2.0) which simply reads RDF (in N-Triples, so it's just text files) from GCS, transforms it somehow and writes it back to GCS, but in a different bucket. In this pipeline I employ side inputs which are three single files (one file per side input) and use them in a ParDo.
To work with RDF in Java I use Apache Jena, so each file is read into an instance of Model class. Since Dataflow doesn't have Coder for it, I developed it myself (RDFModelCoder, see below). It works fine in number of other pipelines I created.
The problem with this particular pipeline is when I add the side inputs, the execution fails with an exception indicating a corruption of the data, i.e. some garbage is added. Once I remove the side inputs, the pipeline finishes execution successfully.
The exception (it's thrown from RDFModelCoder, see below):
Caused by: org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1
at org.apache.jena.atlas.io.IO.exception(IO.java:233)
at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77)
at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154)
at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137)
at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:235)
at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:229)
at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:151)
at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:92)
at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:48)
at org.apache.jena.riot.lang.RiotParsers.createParser(RiotParsers.java:57)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:198)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:298)
at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:288)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:237)
at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:417)
at org.apache.jena.riot.RDFDataMgr.parseFromInputStream(RDFDataMgr.java:870)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:268)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:254)
at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:69)
at org.apache.jena.rdf.model.impl.ModelCom.read(ModelCom.java:305)
And here you can see the garbage (at the end):
<http://example.com/typeofrepresentative/08> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#NamedIndividual> . ������** �����I��.�������������u�������
The pipeline:
val one = p.apply(TextIO.read().from(config.getString("source.one")))
.apply(Combine.globally(SingleValue()))
.apply(ParDo.of(ConvertToRDFModel(RDFLanguages.NTRIPLES)))
val two = p.apply(TextIO.read().from(config.getString("source.two")))
.apply(Combine.globally(SingleValue()))
.apply(ParDo.of(ConvertToRDFModel(RDFLanguages.NTRIPLES)))
val three = p.apply(TextIO.read().from(config.getString("source.three")))
.apply(Combine.globally(SingleValue()))
.apply(ParDo.of(ConvertToRDFModel(RDFLanguages.NTRIPLES)))
val sideInput = PCollectionList.of(one).and(two).and(three)
.apply(Flatten.pCollections())
.apply(View.asList())
p.apply(RDFIO.Read
.from(options.getSource())
.withSuffix(RDFLanguages.strLangNTriples))
.apply(ParDo.of(SparqlConstructETL(config, sideInput))
.withSideInputs(sideInput))
.apply(RDFIO.Write
.to(options.getDestination())
.withSuffix(RDFLanguages.NTRIPLES))
And just to provide the whole picture here are implementations of SingleValue and ConvertToRDFModel ParDos:
class SingleValue : SerializableFunction<Iterable<String>, String> {
override fun apply(input: Iterable<String>?): String {
if (input != null) {
return input.joinToString(separator = " ")
}
return ""
}
}
class ConvertToRDFModel(outputLang: Lang) : DoFn<String, Model>() {
private val lang: String = outputLang.name
#ProcessElement
fun processElement(c: ProcessContext?) {
if (c != null) {
val model = ModelFactory.createDefaultModel()
model.read(StringReader(c.element()), null, lang)
c.output(model)
}
}
}
The implementation of RDFModelCoder:
class RDFModelCoder(private val decodeLang: String = RDFLanguages.strLangNTriples,
private val encodeLang: String = RDFLanguages.strLangNTriples)
: AtomicCoder<Model>() {
private val LOG = LoggerFactory.getLogger(RDFModelCoder::class.java)
override fun decode(inStream: InputStream): Model {
val bytes = StreamUtils.getBytes(inStream)
val model = ModelFactory.createDefaultModel()
model.read(ByteArrayInputStream(bytes), null, decodeLang) // the exception is thrown from here
return model
}
override fun encode(value: Model, outStream: OutputStream?) {
value.write(outStream, encodeLang, null)
}
}
I checked the side input files multiple times, they're fine, they have UTF-8 encoding.

Most likely the error is in the implementation of RDFModelCoder. When implementing encode/decode one has to remember that the provided InputStream and OutputStream are not exclusively owned by the current instance being encoded/decoded. E.g. there might be more data in the InputStream after the encoded form of your current Model. When using StreamUtils.getBytes(inStream) you are grabbing both data of the current encoded Model and anything else that was in the stream.
Generally when writing a new Coder it's a good idea to only combine existing Coder's rather than hand-parsing the stream: that is less error-prone. I would suggest to convert the model to/from byte[] and use ByteArrayCoder.of() to encode/decode it.

Apache Jena provides the Elephas IO modules which have Hadoop IO support, since Beam supports Hadoop InputFormat IO you should be able to use that to read in your NTriples file.
This will likely be far more efficient since the NTriples support in Elephas is able to parallelise the IO and avoid caching the entire model into memory (in fact it won't use Model at all):
Configuration myHadoopConfiguration = new Configuration(false);
// Set Hadoop InputFormat, key and value class in configuration
myHadoopConfiguration.setClass("mapreduce.job.inputformat.class",
NTriplesInputFormat.class, InputFormat.class);
myHadoopConfiguration.setClass("key.class", LongWritable.class, Object.class);
myHadoopConfiguration.setClass("value.class", TripleWritable.class, Object.class);
// Set any other Hadoop config you might need
// Read data only with Hadoop configuration.
p.apply("read",
HadoopInputFormatIO.<LongWritable, TripleWritable>read()
.withConfiguration(myHadoopConfiguration);
Of course this may require you to refactor your overall pipeline somewhat.

How do I copy DOORS modules between folders/projects using DXL?

I am new to both DOORS and DXL. I've been trying to copy a module in a project template to any given project folder using DXL, but my approaches haven't been working. Here's the part of my script where the copy and paste operations are attempted:
// Where string originalModule is the path to the module being copied.
// Where string targetPath is the path to where the copied module should be pasted.
ModName_ originalMMP = module(originalModule)
string originalMMPdesc = description(originalMMP)
clipCopy(originalMMP)
clipPaste(targetPath)
clipClear()
Whenever I run my script in the DOORS' DXL editor, I get an error indicating that the functions clipCopy() and clipPaste() have invalid arguments. In the DXL reference manual, it indicates that the type of the arguments should be of Item type, but I'm not totally sure I'm understanding that.
I have tried this other approach as well:
// The same conventions as above are used for the originalModule and targetPath
// string type variables.
// The variable string targetPathTemp contains the path to the replicated
// file New Module Temp
ModName_ originalMMP = module(originalModule)
string originalMMPdesc = description(originalMMP)
bool OK = copy(originalMMP,"New Module Temp", originalMMPdesc)
ModName_ newMMP = module(targetPathTemp)
// Moving and Renaming:
ErrMess = move(newMMP, targetPath)
ErrMess = rename(copiedMMP,newModuleName, originalMMPdesc)
I get the same errors as clipCopy() and clipPaste() for the functions: copy() and move().
Does anyone have any idea of what am I doing wrong, and what exactly am I not understanding?
Thanks in advance!

I think clipCopy and its brethren only work with Items. Use Item originalMMP = item(originalModule) instead of ModName_...

How to create a dynamic variable in dart

I am moving java script to dart, in java script I create dynamic variable like
window["text" + pageNumber] = 123;
alert(window["text" + pageNumber]);
How can I do it with dart?

In Dart Window (the type of window) is a class. You can't dynamically add properties to a Dart class.
window["text" + pageNumber] = 123; would work with a Map. Object representation in JS is quite similar to a map and therefore this works there.
If another class implements the [] operator you could call it on instances of that class as well but it would still not add properties. What it actually does just depends on the implementation of the [] operator.
There are probably different ways in Dart to achieve what you want, but you didn't add details about what actual problem you try to solve.
You can use normal global variables in Dart like explained in
Global Variables in Dart.
For your use case you can create a global Map variable this way
final Map<String,int> myGlobals = <String,int>{};
to create a map that stores integer values with string names.
Set values with myGlobals['someName'] = 123; and read them with print(myGlobals['someName']);.
If you need to set a global value that is also available for JS libraries you might use, you can use dart-js-interop
import 'dart:js';
import 'dart:html';
main() {
int pagenumber = 5;
context['Window']['text$pagenumber'] = 123;
window.alert('${context['Window']['text$pagenumber']}');
}
Try it on DartPad.
Hint:
"text" + pageNumber doesn't work when pageNumber is not a string.
In Dart you can't add string and numbers.
"text" + pageNumber.toString() would work but 'text$pagenumber' is a more darty way to do this. In string interpolation toString() is called automatically for you.
See also Dart js-interop not working if .dart file isn't included.

How to Get Filename when using file pattern match in google-cloud-dataflow

Someone know how to get Filename when using file pattern match in google-cloud-dataflow?
I'm newbee to use dataflow. How to get filename when use file patten match, in this way.
p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*.txt"))
I'd like to how I detect filename that kinglear.txt,Hamlet.txt, etc.

If you would like to simply expand the filepattern and get a list of filenames matching it, you can use GcsIoChannelFactory.match("gs://dataflow-samples/shakespeare/*.txt") (see GcsIoChannelFactory).
If you would like to access the "current filename" from inside one of the DoFn's downstream in your pipeline - that is currently not supported (though there are some workarounds - see below). It is a common feature request and we are still thinking how best to fit it into the framework in a natural, generic and high-performant way.
Some workarounds include:
Writing a pipeline like this (the tf-idf example uses this approach):
DoFn readFile = ...(takes a filename, reads the file and produces records)...
p.apply(Create.of(filenames))
.apply(ParDo.of(readFile))
.apply(the rest of your pipeline)
This has the downside that dynamic work rebalancing features won't work particularly well, because they currently apply at the level of Read PTransform's only, but not at the level of ParDo's with high fan-out (like the one here, which would read a file and produce all records); and parallelization will only work to the level of files but files will not be split into sub-ranges. At the scale of reading Shakespeare this is not an issue, but if you are reading a set of files of wildly different size, some extremely large, then it may become an issue.
Implementing your own FileBasedSource (javadoc, general documentation) which would return records of type something like Pair<String, T> where the String is the filename and the T is the record you're reading. In this case the framework would handle the filepattern matching for you, dynamic work rebalancing would work just fine, however it is up to you to write the reading logic in your FileBasedReader.
Both of these work-arounds are non-ideal, but depending on your requirements, one of them may do the trick for you.

Update based on latest SDK
Java (sdk 2.9.0):
Beams TextIO readers do not give access to the filename itself, for these use cases we need to make use of FileIO to match the files and gain access to the information stored in the file name. Unlike TextIO, the reading of the file needs to be taken care of by the user in transforms downstream of the FileIO read. The results of a FileIO read is a PCollection the ReadableFile class contains the file name as metadata which can be used along with the contents of the file.
FileIO does have a convenience method readFullyAsUTF8String() which will read the entire file into a String object, this will read the whole file into memory first. If memory is a concern you can work directly with the file with utility classes like FileSystems.
From: Document Link
PCollection<KV<String, String>> filesAndContents = p
.apply(FileIO.match().filepattern("hdfs://path/to/*.gz"))
// withCompression can be omitted - by default compression is detected from the filename.
.apply(FileIO.readMatches().withCompression(GZIP))
.apply(MapElements
// uses imports from TypeDescriptors
.into(KVs(strings(), strings()))
.via((ReadableFile f) -> KV.of(
f.getMetadata().resourceId().toString(), f.readFullyAsUTF8String())));
Python (sdk 2.9.0):
For 2.9.0 for python you will need to collect the list of URI from outside of the Dataflow pipeline and feed it in as a parameter to the pipeline. For example making use of FileSystems to read in the list of files via a Glob pattern and then passing that to a PCollection for processing.
Once fileio see PR https://github.com/apache/beam/pull/7791/ is available, the following code would also be an option for python.
import apache_beam as beam
from apache_beam.io import fileio
with beam.Pipeline() as p:
readable_files = (p
| fileio.MatchFiles(‘hdfs://path/to/*.txt’)
| fileio.ReadMatches()
| beam.Reshuffle())
files_and_contents = (readable_files
| beam.Map(lambda x: (x.metadata.path,
x.read_utf8()))

One approach is to build a List<PCollection> where each entry corresponds to an input file, then use Flatten. For example, if you want to parse each line of a collection of files into a Foo object, you might do something like this:
public static class FooParserFn extends DoFn<String, Foo> {
private String fileName;
public FooParserFn(String fileName) {
this.fileName = fileName;
}
#Override
public void processElement(ProcessContext processContext) throws Exception {
String line = processContext.element();
// here you have access to both the line of text and the name of the file
// from which it came.
}
}
public static void main(String[] args) {
...
List<String> inputFiles = ...;
List<PCollection<Foo>> foosByFile =
Lists.transform(inputFiles,
new Function<String, PCollection<Foo>>() {
#Override
public PCollection<Foo> apply(String fileName) {
return p.apply(TextIO.Read.from(fileName))
.apply(new ParDo().of(new FooParserFn(fileName)));
}
});
PCollection<Foo> foos = PCollectionList.<Foo>empty(p).and(foosByFile).apply(Flatten.<Foo>pCollections());
...
}
One downside of this approach is that, if you have 100 input files, you'll also have 100 nodes in the Cloud Dataflow monitoring console. This makes it hard to tell what's going on. I'd be interested in hearing from the Google Cloud Dataflow people whether this approach is efficient.

I also had the 100 input files = 100 nodes on the dataflow diagram when using code similar to #danvk. I switched to an approach like this which resulted in all the reads being combined into a single block that you can expand to drill down into each file/directory that was read. The job also ran faster using this approach rather than the Lists.transform approach in our use case.
GcsOptions gcsOptions = options.as(GcsOptions.class);
List<GcsPath> paths = gcsOptions.getGcsUtil().expand(GcsPath.fromUri(options.getInputFile()));
List<String>filesToProcess = paths.stream().map(item -> item.toString()).collect(Collectors.toList());
PCollectionList<SomeClass> pcl = PCollectionList.empty(p);
for(String fileName : filesToProcess) {
pcl = pcl.and(
p.apply("ReadAvroFile" + fileName, AvroIO.Read.named("ReadFromAvro")
.from(fileName)
.withSchema(SomeClass.class)
)
.apply(ParDo.of(new MyDoFn(fileName)))
);
}
// flatten the PCollectionList, combining all the PCollections together
PCollection<SomeClass> flattenedPCollection = pcl.apply(Flatten.pCollections());

This might be a very late post for the above question, but I wanted to add answer with Beam bundled classes.
This could also be seen as an extracted code from the solution provided by #Reza Rokni.
PCollection<String> listOfFilenames =
pipe.apply(FileIO.match().filepattern("gs://apache-beam-samples/shakespeare/*"))
.apply(FileIO.readMatches())
.apply(
MapElements.into(TypeDescriptors.strings())
.via(
(FileIO.ReadableFile file) -> {
String f = file.getMetadata().resourceId().getFilename();
System.out.println(f);
return f;
}));
pipe.run().waitUntilFinish();
Above PCollection<String> will have a list of files available at any provided directory.

I was struggling with the same use case while using wildcard to read files from GCS but also needed to modify the collection based on the file name.The key is to use ReadFromTextWithFilename instead of readfromtext In java you already have a way out and you can use:
String filename =context.element().getMetadata().resourceId().getCurrentDirectory().toString()
inside your processElement method.
But for Python below technique will work:
-> Use beam.io.ReadFromTextWithFilename for reading the wildcard path from GCS
-> As per the document, ReadFromTextWithFilename returns the file's name and the file's content.
Below is the code snippet:
class GetFileNameFromWildcard(beam.DoFn):
def process(self, element, *args, **kwargs):
file_path, content = element
schema = ["id","name","mob","email","dept","store"]
store_name = file_path.split("/")[-2]
content_list = content.split(",")
content_list.append(store_name)
out_dict = dict(zip(schema,content_list))
print(out_dict)
yield out_dict
def run():
pipeline_options = PipelineOptions()
with beam.Pipeline(options=pipeline_options) as p:
# saving main session so that it can load global namespace on the Cloud Dataflow Worker
init = p | 'Begin Pipeline With Initiator' >> beam.Create(
["pcollection initializer"]) | 'Read From GCS' >> beam.io.ReadFromTextWithFilename(
"gs://<bkt-name>/20220826/*/dlp*", skip_header_lines=1) | beam.ParDo(
GetFileNameFromWildcard()) | beam.io.WriteToText(
'df_out.csv')

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How to parse a TypeScript code base into ASTs - parsing

Related

Can I parse some F# code at run-time that reference types in my current assembly?

SideInputs corrupt the data in DataFlow's Pipeline

How do I copy DOORS modules between folders/projects using DXL?

How to create a dynamic variable in dart

How to Get Filename when using file pattern match in google-cloud-dataflow

Categories

Resources