Extract file content with ManifoldCF - apache-tika

I'm trying to use ManifoldCF with the File System Connector.
It works like a charm: with the Tika content extractor in place, I get all the expected metadata from my documents.
But...
How do I configure ManifoldCF to get the equivalent of this command:
java -jar tika-app-1.9.jar --text
I mean, I want to get the CONTENT of the file and push it to my Output Connections. How is that possible?

You have to set up the transformer in the pipeline. Before you configure your output connector, add the Tika transformer. With this setup your metadata should be extracted based on the document type, and you should then see both the content and the metadata fed into your output connector (e.g. Solr).

Related

How do I convert .bin code property graph to json?

How can I convert a code property graph (CPG) obtained from Joern (https://joern.io/) from .bin format to .json format, so that I can feed it to a graph machine learning library for classification?
Note: CPG = AST + Control Flow Graph + Program Dependency Graph
Task: Machine Learning on Source Code.
You can use the Scala script 'graph-for-funcs.sc', which is included in the Joern scripts directory. However, you need to redirect the output to store it in a file, since the output goes to stdout by default.
I made a custom script to do so.

JMeter v.5+ - how to change retrieved embedded resources format in JMeter v5.2.1 to get full urls?

In JMeter 5.2.1, by default the response lists embedded resources in an abstract format like this:
domain/path
domain/path-0
domain/path-1
domain/path-2
domain/path-3
...
(etc.)
I need to see the actual URL of each embedded resource instead of these abstract names with -0, -1, -2, -3 suffixes (as it worked in JMeter v.3).
How can I configure JMeter to show each embedded resource as a full URL?
Could you please give me a tip or workaround for JMeter v.5+?
If this is something you really need, you can add the following line to the user.properties file (it lives under the "bin" folder of your JMeter installation):
subresults.disable_renaming=true
For one-time usage the property can be overridden via -J command-line argument like:
jmeter -Jsubresults.disable_renaming=true -n -t test.jmx -l result.jtl
Check out the Apache JMeter Properties Customization Guide article for more information on JMeter properties and ways of setting/overriding them. You might also be interested in the "Settings that affect SampleResults" chapter of the JMeter Properties Reference.
However, you should do this only if you plan to use JMeter for some form of functional testing, because it will break the logic of the HTTP Request sampler elapsed time calculation in the HTML Reporting Dashboard.

Unable to see 快乐 characters in HTTP Response Data tab or in Debug Sampler

As you can see in the screenshots, I am setting some User Defined Variables that contain double-byte characters. I'm submitting the HTTP request that creates this customer using UTF-8 encoding. The customer is created with the correct double-byte characters, because I can see them in the web app and in the DB. The problem is that I cannot see them in JMeter: the response data and the Debug Sampler show either little boxes or ??? question marks instead of the characters, while the User Defined Variables element shows them correctly. I've added this to my user.properties file, but that did not help:
sampleresult.default.encoding=UTF-8
How can I see these special characters in the response so I can assert that the record was created correctly? Any advice is appreciated. I am using JMeter 3.1 and JSON endpoints.
User Defined variables
DebugSampler
Now you need to ensure that JMeter itself is using UTF-8 encoding.
Configure the Debug Sampler to show System Properties.
Look for the file.encoding property.
If you see anything other than UTF-8, override the existing property value in one of the following ways:
Permanent: add the following line to the system.properties file (in JMeter's "bin" folder)
file.encoding=UTF-8
JMeter restart will be required to pick the property up
Temporary: set the "file.encoding" property via -D command-line argument as
jmeter -Dfile.encoding=UTF-8 -n -t ....
References:
Java - Supported Encodings
Apache JMeter Properties Customization Guide
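To illustrate the symptom described in the question, here is a minimal Python sketch (not JMeter code) of what happens when correctly encoded UTF-8 bytes pass through a component that uses a different default charset. The helper names decode_as and encode_lossy are made up for this illustration; Python's codecs merely stand in for the JVM's default charset.

```python
# Illustration only: the same characters handled under different charsets.
text = "快乐"
utf8_bytes = text.encode("utf-8")  # what the server actually sends


def decode_as(data: bytes, charset: str) -> str:
    """Decode bytes with the given charset, replacing undecodable bytes."""
    return data.decode(charset, errors="replace")


def encode_lossy(s: str, charset: str) -> str:
    """Push text through a charset that may not contain its characters."""
    return s.encode(charset, errors="replace").decode(charset)


# Each of the two characters takes three bytes in UTF-8.
print(len(utf8_bytes))
# Decoding with the right charset restores the original text.
print(decode_as(utf8_bytes, "utf-8") == text)
# Decoding as ISO-8859-1 yields one garbage character per byte (mojibake).
print(len(decode_as(utf8_bytes, "iso-8859-1")))
# A charset without these characters substitutes "?" for each of them.
print(encode_lossy(text, "us-ascii"))
```

The last case matches the question marks seen in the Debug Sampler: the data is intact on the wire and in the database, and only the display side loses the characters.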

Jmeter doesn't save response data or headers

I'm building some simple load tests for my API, and to make sure everything is on the up and up I'd like to also review the response headers and data. But when I run my test from the command line, then re-open the GUI, add a View Results Tree listener and load the created file, the response headers and response data are empty.
I entered the following values into user.properties (also tried uncommenting those values in jmeter.properties and changing them there, same result)
jmeter.save.saveservice.output_format=csv (tried xml, omitting it, jtl)
jmeter.save.saveservice.data_type=false
jmeter.save.saveservice.label=true
jmeter.save.saveservice.response_code=true
jmeter.save.saveservice.response_data.on_error=true
jmeter.save.saveservice.response_message=true
jmeter.save.saveservice.successful=true
jmeter.save.saveservice.thread_name=true
jmeter.save.saveservice.time=true
jmeter.save.saveservice.subresults=false
jmeter.save.saveservice.assertions=false
jmeter.save.saveservice.latency=true
jmeter.save.saveservice.bytes=true
jmeter.save.saveservice.hostname=true
jmeter.save.saveservice.thread_counts=true
jmeter.save.saveservice.sample_count=true
jmeter.save.saveservice.response_message=true
jmeter.save.saveservice.assertion_results_failure_message=true
jmeter.save.saveservice.timestamp_format=HH:mm:ss
jmeter.save.saveservice.default_delimiter=;
jmeter.save.saveservice.print_field_names=true
But still no luck when opening the result file. I tried declaring the file after the -l flag as results.csv, .jtl, even .xml, but none of them show me the headers and data.
I'm running it locally on Mac OS X 10.10 using the following command, jmeter version is 2.12
java -jar ApacheJMeter.jar -n -t /Users/[username]/Documents/API_test.jmx -l results_15.jtl
I don't know if it's not even saving that data, or if the Listeners can't read it or if I've been cursed but any help is appreciated.
It works fine if I add a Listener and run it using the GUI, but if I try to run my larger tests that way, well, things don't end well for anyone.
So my question is:
How do I save the response header and data to a file when using the command line, and how do I then view said file in jmeter?
Add a Simple Data Writer (under Listeners) and output to a file (NB: different file than your log). Under the 'configure' button, there are all sorts of options of what to save. One of the check boxes is Save Response Header.
This file can get huge if you're saving a bunch of things for every request. One strategy is to check everything, but only save for errors. But you can do whatever works for you.
You can also turn on "Functional Test Mode" which will produce a large file but will contain pretty much anything you might need to debug your test.
Beware, this can create a very large JTL file, so don't forget to turn it off for your large test runs! See JMeter Maven mojo throws IllegalArgumentException with large JTL file
Alternatively, use a View Results Tree listener in the GUI for a small sample of the requests and check the request/response in the GUI (including headers) to debug or check your test.
Add the lines below to the user.properties file:
jmeter.save.saveservice.output_format=xml
jmeter.save.saveservice.response_data=true
jmeter.save.saveservice.samplerData=true
jmeter.save.saveservice.requestHeaders=true
jmeter.save.saveservice.url=true
Then restart the command prompt and run the test again.
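Once the results are written as XML, the file can also be inspected outside the GUI. The following is a minimal Python sketch under stated assumptions: the inline sample imitates the usual shape of an XML JTL file (httpSample elements with lb, rc and s attributes, and responseHeader / responseData child elements), and the labels and payloads in it are invented. The responseHeader element should only appear if response headers are saved as well, which I believe is controlled by jmeter.save.saveservice.responseHeaders=true. Check the element names against your own results file before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that imitates an XML-format JTL file.
SAMPLE_JTL = """<testResults version="1.2">
  <httpSample t="120" lb="GET /users" rc="200" s="true">
    <responseHeader class="java.lang.String">HTTP/1.1 200 OK
Content-Type: application/json</responseHeader>
    <responseData class="java.lang.String">{"id": 1}</responseData>
  </httpSample>
  <httpSample t="95" lb="GET /missing" rc="404" s="false">
    <responseHeader class="java.lang.String">HTTP/1.1 404 Not Found</responseHeader>
    <responseData class="java.lang.String">not found</responseData>
  </httpSample>
</testResults>"""


def read_samples(root):
    """Collect label, status, headers and body for every sample element."""
    samples = []
    for el in root.iter():
        if el.tag not in ("httpSample", "sample"):
            continue
        samples.append({
            "label": el.get("lb"),
            "code": el.get("rc"),
            "success": el.get("s") == "true",
            # Missing child elements fall back to an empty string.
            "headers": el.findtext("responseHeader", default=""),
            "body": el.findtext("responseData", default=""),
        })
    return samples


samples = read_samples(ET.fromstring(SAMPLE_JTL))
for s in samples:
    print(s["label"], s["code"], s["body"])
```

For a real file, replace ET.fromstring(SAMPLE_JTL) with ET.parse("results.jtl").getroot(), where results.jtl is a placeholder for your own results file.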

How to use flume for uploading zip files to hdfs sink

I am new to Flume. My Flume agent has an HTTP server as its source, from which it receives zip files (compressed XML files) at regular intervals. These zip files are very small (less than 10 MB), and I want to put their extracted contents into the HDFS sink. Please share some ideas on how to do this. Do I have to write a custom interceptor?
Flume will try to read your files line by line unless you configure a specific deserializer. A deserializer lets you control how the file is parsed and split into events. You could of course follow the example of the blob deserializer, which is designed for PDFs and such, but I understand that you actually want to unpack the files and then read them line by line. In that case you would need to write a custom deserializer that reads the zip archive and emits one event per line.
Here's the reference in the documentation:
https://flume.apache.org/FlumeUserGuide.html#event-deserializers
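The unpack-then-split logic such a deserializer would need can be sketched in a few lines. This is a Python illustration of the idea only, not a Flume component: an actual deserializer would be a Java class written against Flume's deserializer interface. The function names, the sample file name a.xml and the UTF-8 assumption are all hypothetical.

```python
import io
import zipfile


def zip_bytes_to_lines(payload: bytes):
    """Yield (member_name, line) for every text line of every file in the zip."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as member:
                # Assumes the archived files are UTF-8 text (e.g. XML).
                for raw in io.TextIOWrapper(member, encoding="utf-8"):
                    yield info.filename, raw.rstrip("\r\n")


def make_sample_zip() -> bytes:
    """Build a small in-memory zip so the sketch runs without external files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("a.xml", "<root>\n  <item>1</item>\n</root>\n")
    return buf.getvalue()


# Each yielded line corresponds to what would become one event body.
events = list(zip_bytes_to_lines(make_sample_zip()))
for name, line in events:
    print(name, line)
```

Because the archives are small (under 10 MB), reading each one fully into memory before splitting it is a reasonable simplification here.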
