How to generate truncated Javadoc? - parsing

I encountered a problem with Javadoc generation. I would like to create a truncated Javadoc consisting only of specific comments, namely test cases (the methods marked with the @Test annotation), packages and classes; all other information is not needed. Is it possible to exclude the other elements while generating the Javadoc, without parsing it afterwards?
For example:
/**
 * <b>Test case A:</b>
 *
 * <ol>
 * <li>Navigate to main application</li>
 * <li>In actions drop-down select value = 'Run'</li>
 * <li>Assert that application is running</li>
 * </ol>
 *
 */
@Test
public void testMainApp() throws IOException {
    navigateToMainApp();
    selectAndExecute();
    Assert.assertEquals("Application is not running: ", true, status.equals(RUNNING));
}
Thanks in advance!

You can use Maven to generate Javadoc with the maven-javadoc-plugin (http://maven.apache.org/plugins/maven-javadoc-plugin/javadoc-mojo.html). It lets you exclude the packages and source files for which you do not want to create docs.
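For example, a rough sketch of the plugin configuration might look like the following; the package names are placeholders you would adjust to keep only the packages containing your tests:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <!-- placeholder package names: exclude everything you do not want documented -->
    <excludePackageNames>com.example.internal:com.example.util.*</excludePackageNames>
  </configuration>
</plugin>
Note that these exclude options work at package and source-file granularity; they do not filter individual methods out of the generated pages.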
Hope this helps.

Related

open layers 3 - Namespace "ol" already declared error on startup

I am using the instructions from here on how to get started with OpenLayers, and I got the error Namespace "ol" already declared (source: ol-debug.js) and the error
this.Va is not a function (source: ol.js).
I am pretty sure I have included the ol.js, ol-debug.js and ol.css files properly in my index.html.
Link to the OpenLayers js and css files.
This is the relevant part from ol-debug.js file -
/**
 * Defines a namespace in Closure.
 *
 * A namespace may only be defined once in a codebase. It may be defined using
 * goog.provide() or goog.module().
 *
 * The presence of one or more goog.provide() calls in a file indicates
 * that the file defines the given objects/namespaces.
 * Provided symbols must not be null or undefined.
 *
 * In addition, goog.provide() creates the object stubs for a namespace
 * (for example, goog.provide("goog.foo.bar") will create the object
 * goog.foo.bar if it does not already exist).
 *
 * Build tools also scan for provide/require/module statements
 * to discern dependencies, build dependency files (see deps.js), etc.
 *
 * @see goog.require
 * @see goog.module
 * @param {string} name Namespace provided by this file in the form
 *     "goog.package.part".
 */
goog.provide = function(name) {
  if (goog.isInModuleLoader_()) {
    throw Error('goog.provide can not be used within a goog.module.');
  }
  if (!COMPILED) {
    // Ensure that the same namespace isn't provided twice.
    // A goog.module/goog.provide maps a goog.require to a specific file
    if (goog.isProvided_(name)) {
      throw Error('Namespace "' + name + '" already declared.');
    }
  }
  goog.constructNamespace_(name);
};
You need to include either ol.js or ol-debug.js, not both. The error comes from loading both scripts: the namespace gets declared twice, which creates the conflict.

How to Get Filename when using file pattern match in google-cloud-dataflow

Does someone know how to get the filename when using a file pattern match in google-cloud-dataflow?
I'm a newbie to Dataflow. How can I get the filename when using a file pattern match like this?
p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*.txt"))
I'd like to know how to detect the filename of each record, e.g. kinglear.txt, Hamlet.txt, etc.
If you would like to simply expand the filepattern and get a list of filenames matching it, you can use GcsIoChannelFactory.match("gs://dataflow-samples/shakespeare/*.txt") (see GcsIoChannelFactory).
If you would like to access the "current filename" from inside one of the DoFns downstream in your pipeline, that is currently not supported (though there are some workarounds; see below). It is a common feature request and we are still thinking about how best to fit it into the framework in a natural, generic and efficient way.
Some workarounds include:
Writing a pipeline like this (the tf-idf example uses this approach):
DoFn readFile = ...(takes a filename, reads the file and produces records)...
p.apply(Create.of(filenames))
 .apply(ParDo.of(readFile))
 .apply(the rest of your pipeline)
This has the downside that dynamic work rebalancing won't work particularly well, because it currently applies only at the level of Read PTransforms, not at the level of ParDos with a high fan-out (like the one here, which reads a file and produces all of its records). Parallelization will also only go down to the level of individual files; files will not be split into sub-ranges. At the scale of reading Shakespeare this is not an issue, but if you are reading a set of files of wildly different sizes, some extremely large, it may become one. (A minimal sketch of this approach appears after this list.)
Implementing your own FileBasedSource (javadoc, general documentation) which would return records of a type something like Pair<String, T>, where the String is the filename and T is the record you're reading. In this case the framework handles the filepattern matching for you and dynamic work rebalancing works just fine; however, it is up to you to write the reading logic in your FileBasedReader.
Both of these work-arounds are non-ideal, but depending on your requirements, one of them may do the trick for you.
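As a minimal sketch of the first workaround (not code from the original answer), the following uses Beam's FileSystems utilities to open each matched file and emit (filename, line) pairs; ReadFileFn, the filenames list and the pipeline p are placeholder names:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.MatchResult;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// A DoFn that takes a filename and emits one KV(filename, line) per line of the file.
static class ReadFileFn extends DoFn<String, KV<String, String>> {
  @ProcessElement
  public void processElement(ProcessContext c) throws IOException {
    String fileName = c.element();
    MatchResult.Metadata metadata = FileSystems.matchSingleFileSpec(fileName);
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        Channels.newInputStream(FileSystems.open(metadata.resourceId())),
        StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        c.output(KV.of(fileName, line)); // every record carries its filename
      }
    }
  }
}

// Usage: feed the filenames in with Create, then read them inside the DoFn.
PCollection<KV<String, String>> linesWithFilenames =
    p.apply(Create.of(filenames))
     .apply(ParDo.of(new ReadFileFn()));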
Update based on latest SDK
Java (SDK 2.9.0):
Beam's TextIO readers do not give access to the filename itself. For these use cases we need to use FileIO to match the files and gain access to the information stored in the file name. Unlike TextIO, the reading of the file needs to be taken care of by the user in transforms downstream of the FileIO read. The result of a FileIO read is a PCollection of ReadableFile; the ReadableFile class exposes the file name as metadata, which can be used along with the contents of the file.
ReadableFile has a convenience method readFullyAsUTF8String(), which reads the entire file into a String object; note that this loads the whole file into memory first. If memory is a concern, you can work with the file directly using utility classes like FileSystems.
From: Document Link
PCollection<KV<String, String>> filesAndContents = p
    .apply(FileIO.match().filepattern("hdfs://path/to/*.gz"))
    // withCompression can be omitted - by default compression is detected from the filename.
    .apply(FileIO.readMatches().withCompression(GZIP))
    .apply(MapElements
        // uses static imports from TypeDescriptors (kvs, strings)
        .into(kvs(strings(), strings()))
        .via((ReadableFile f) -> KV.of(
            f.getMetadata().resourceId().toString(), f.readFullyAsUTF8String())));
Python (SDK 2.9.0):
For Python SDK 2.9.0 you will need to collect the list of URIs outside of the Dataflow pipeline and feed it in as a parameter to the pipeline, for example by making use of FileSystems to read the list of files via a glob pattern and then passing that to a PCollection for processing.
Once fileio (see PR https://github.com/apache/beam/pull/7791/) is available, the following code would also be an option for Python:
import apache_beam as beam
from apache_beam.io import fileio

with beam.Pipeline() as p:
    readable_files = (p
                      | fileio.MatchFiles('hdfs://path/to/*.txt')
                      | fileio.ReadMatches()
                      | beam.Reshuffle())
    files_and_contents = (readable_files
                          | beam.Map(lambda x: (x.metadata.path,
                                                x.read_utf8())))
One approach is to build a List<PCollection> where each entry corresponds to an input file, then use Flatten. For example, if you want to parse each line of a collection of files into a Foo object, you might do something like this:
public static class FooParserFn extends DoFn<String, Foo> {
  private final String fileName;

  public FooParserFn(String fileName) {
    this.fileName = fileName;
  }

  @Override
  public void processElement(ProcessContext processContext) throws Exception {
    String line = processContext.element();
    // here you have access to both the line of text and the name of the file
    // from which it came.
  }
}

public static void main(String[] args) {
  ...
  List<String> inputFiles = ...;
  List<PCollection<Foo>> foosByFile =
      Lists.transform(inputFiles,
          new Function<String, PCollection<Foo>>() {
            @Override
            public PCollection<Foo> apply(String fileName) {
              return p.apply(TextIO.Read.from(fileName))
                      .apply(ParDo.of(new FooParserFn(fileName)));
            }
          });
  PCollection<Foo> foos = PCollectionList.<Foo>empty(p).and(foosByFile).apply(Flatten.<Foo>pCollections());
  ...
}
One downside of this approach is that, if you have 100 input files, you'll also have 100 nodes in the Cloud Dataflow monitoring console. This makes it hard to tell what's going on. I'd be interested in hearing from the Google Cloud Dataflow people whether this approach is efficient.
I also had the "100 input files = 100 nodes on the Dataflow diagram" problem when using code similar to @danvk's. I switched to the approach below, which combines all the reads into a single block that you can expand to drill down into each file/directory that was read. In our use case the job also ran faster with this approach than with the Lists.transform approach.
GcsOptions gcsOptions = options.as(GcsOptions.class);
List<GcsPath> paths = gcsOptions.getGcsUtil().expand(GcsPath.fromUri(options.getInputFile()));
List<String> filesToProcess = paths.stream().map(item -> item.toString()).collect(Collectors.toList());

PCollectionList<SomeClass> pcl = PCollectionList.empty(p);
for (String fileName : filesToProcess) {
  pcl = pcl.and(
      p.apply("ReadAvroFile" + fileName, AvroIO.Read.named("ReadFromAvro")
          .from(fileName)
          .withSchema(SomeClass.class))
       .apply(ParDo.of(new MyDoFn(fileName)))
  );
}

// flatten the PCollectionList, combining all the PCollections together
PCollection<SomeClass> flattenedPCollection = pcl.apply(Flatten.pCollections());
This might be a very late post for the above question, but I wanted to add an answer using the classes bundled with Beam.
This could also be seen as code extracted from the solution provided by @Reza Rokni.
PCollection<String> listOfFilenames =
    pipe.apply(FileIO.match().filepattern("gs://apache-beam-samples/shakespeare/*"))
        .apply(FileIO.readMatches())
        .apply(
            MapElements.into(TypeDescriptors.strings())
                .via(
                    (FileIO.ReadableFile file) -> {
                      String f = file.getMetadata().resourceId().getFilename();
                      System.out.println(f);
                      return f;
                    }));
pipe.run().waitUntilFinish();
The PCollection<String> above will contain the names of the files available in the provided directory.
I was struggling with the same use case while using a wildcard to read files from GCS, but I also needed to modify the collection based on the file name. The key is to use ReadFromTextWithFilename instead of ReadFromText. In Java you already have a way out and you can use:
String filename = context.element().getMetadata().resourceId().getCurrentDirectory().toString();
inside your processElement method.
But for Python the technique below will work:
-> Use beam.io.ReadFromTextWithFilename for reading the wildcard path from GCS.
-> As per the documentation, ReadFromTextWithFilename returns the file's name and the file's content.
Below is the code snippet:
class GetFileNameFromWildcard(beam.DoFn):
    def process(self, element, *args, **kwargs):
        file_path, content = element
        schema = ["id", "name", "mob", "email", "dept", "store"]
        store_name = file_path.split("/")[-2]
        content_list = content.split(",")
        content_list.append(store_name)
        out_dict = dict(zip(schema, content_list))
        print(out_dict)
        yield out_dict


def run():
    pipeline_options = PipelineOptions()
    with beam.Pipeline(options=pipeline_options) as p:
        # saving main session so that it can load global namespace on the Cloud Dataflow worker
        init = (p
                | 'Begin Pipeline With Initiator' >> beam.Create(["pcollection initializer"])
                | 'Read From GCS' >> beam.io.ReadFromTextWithFilename(
                    "gs://<bkt-name>/20220826/*/dlp*", skip_header_lines=1)
                | beam.ParDo(GetFileNameFromWildcard())
                | beam.io.WriteToText('df_out.csv'))

Hostname Validator with Annotation Builder

I am currently using the Form Annotation builder with Zend Framework 2 (Latest 2.3.2).
I have a single form validator which does not want to play nice and I cannot find any example documentation on how to make the Hostname validator work properly when allowing local hostnames.
Here is a code snippet of the validator in question:
/**
 * @Form\Type("text")
 * @Form\Required(false)
 * @Form\Options({"label":"name"})
 * @Form\Attributes({"id":"name"})
 * @Form\Filter({"name":"stringtrim"})
 * @Form\Filter({"name":"stringtolower"})
 * @Form\Validator({"name":"stringlength", "options":{"min":"1", "max":"254"}, "break_chain_on_failure":"true"})
 * @Form\Validator({"name":"hostname", "options":{"allow":"\Zend\Validator\Hostname::ALLOW_LOCAL"}, "break_chain_on_failure":"true"})
 * @Form\Validator({"name":"CompanyDns\Validator\DnsName", "break_chain_on_failure":"true"})
 */
public $name;
When the form attempts to validate using a local name I am getting the validators response of:
The input appears to be a local network name but local network names are not allowed
I am following the manual http://framework.zend.com/manual/2.3/en/modules/zend.form.quick-start.html#using-annotations
Any ideas what I am missing or can do to resolve this?
It appears that when using the annotation builder, additional Hostname::* options do not get passed through as one would expect.
So this line:
@Form\Validator({"name":"hostname", "options":{"allow":"\Zend\Validator\Hostname::ALLOW_LOCAL"}, "break_chain_on_failure":"true"})
Should actually read:
@Form\Validator({"name":"hostname", "options":{"allow":"4"}, "break_chain_on_failure":"true"})
If you look at the Hostname validator class, 4 is the value of its ALLOW_LOCAL constant.
This should resolve the problem for you.

wicket: how to stream a resource from a database

I'm trying to generate a sitemap dynamically for a large web site with thousands of pages.
Yes, I have considered generating the sitemap file offline and simply serving it statically, and I might end up doing exactly that. But I think this is a generally useful question:
How can I stream large data from a DB in Wicket?
I followed the instructions at the Wicket SEO page, and was able to get a dynamic sitemap implementation working using a DataProvider. But it doesn't scale- it runs out of memory when it calls my DataProvider's iterator() method with a count arg equal to the total number of objects I'm returning, rather than iterating over them in chunks.
I think the solution lies somewhere with WebResource/ResourceStreamingRequestTarget. But those classes expect an IResourceStream, which ultimately boils down to providing an InputStream implementation, which deals in bytes, rather than DB records. I wouldn't know how to implement the length() method in such a case, as that would require visiting every record ahead of time to compute the overall length.
From the doc of the IResourceStream.length() method:
/**
 * Gets the size of this resource in bytes
 *
 * TODO 1.5: rename to lengthInBytes() or let it return some sort of size object
 *
 * @return The size of this resource in the number of bytes, or -1 if unknown
 */
long length();
So I think it would be OK if your IResourceStream implementation reports that the length is unknown and streams the data directly as you fetch the records from the database.
You could return -1, indicating an unknown length, or you could write the result to a memory buffer or to disk before rendering it to the client.
You could also use this file as a cache, so that you don't need to regenerate it every time this resource is requested (remember you have to handle concurrent requests, though). Dedicated caching solutions (e.g. memcache, ehcache, etc.) can also be considered.
It may be cleaner than publishing a static file, although static files are probably better if performance is critical.
I ended up using an AbstractResourceStreamWriter subclass:
public class SitemapStreamWriter extends AbstractResourceStreamWriter
{
    @Override
    public void write(OutputStream output)
    {
        String HEAD = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
                      "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"\n" +
                      "        xmlns:wicket=\"http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd\">\n";
        try
        {
            output.write(HEAD.getBytes());
            // write out a <loc> entry for each of my pages here
            output.write("</urlset>\n".getBytes());
        }
        catch (IOException e)
        {
            throw new RuntimeException(e.getMessage(), e);
        }
    }
}
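To actually serve the stream, one option is a sketch along these lines, assuming Wicket 1.4-style request targets (the component id "sitemap" and the suggested file name are placeholders, not part of the original answer):
// Sketch only: hand the writer to a ResourceStreamRequestTarget from a link's onClick.
Link<Void> sitemapLink = new Link<Void>("sitemap")
{
    @Override
    public void onClick()
    {
        IResourceStream stream = new SitemapStreamWriter();
        ResourceStreamRequestTarget target = new ResourceStreamRequestTarget(stream);
        target.setFileName("sitemap.xml"); // suggested name for the response
        getRequestCycle().setRequestTarget(target);
    }
};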

MbUnit's row attribute in NUnit?

While reading an ASP.NET MVC code sample that used MbUnit as its testing framework, I saw that it was possible to run a single test against multiple input possibilities by using a Row attribute, like so:
[Test]
[Row("test@test_test.com")]
[Row("sdfdf dsfsdf")]
[Row("sdfdf@.com")]
public void Invalid_Emails_Should_Return_False(string invalidEmail)
{
...
}
I'd like to know if there is an NUnit equivalent of MbUnit's Row attribute, or otherwise an elegant way to achieve this in NUnit. Thanks.
I think you're after the TestCase attribute:
[TestCase(12,3,4)]
[TestCase(12,2,6)]
[TestCase(12,4,3)]
public void DivideTest(int n, int d, int q)
{
    Assert.AreEqual(q, n / d);
}
http://www.nunit.com/index.php?p=testCase&r=2.5.7
NUnit's Sequential attribute does exactly that.
The SequentialAttribute is used on a test to specify that NUnit should generate test cases by selecting individual data items provided for the parameters of the test, without generating additional combinations.
Note: If parameter data is provided by multiple attributes, the order in which NUnit uses the data items is not guaranteed. However, it can be expected to remain constant for a given runtime and operating system.
Example: the following test will be executed three times, as follows:
MyTest(1, "A")
MyTest(2, "B")
MyTest(3, null)
[Test, Sequential]
public void MyTest(
    [Values(1,2,3)] int x,
    [Values("A","B")] string s)
{
    ...
}
Given your example, this would become:
[Test, Sequential]
public void IsValidEmail_Invalid_Emails_Should_Return_False(
    [Values("test@test_test.com",
            "sdfdf dsfsdf",
            "sdfdf@.com")] string invalidEmail)
{
    ...
}
