Detect an image in any document using Apache Tika?

I am using Apache Tika to extract the content of uploaded files, and I do not want to parse files that contain embedded images. Currently I use ToXMLContentHandler and look for an <img> tag:
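import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.AutoDetectParser
import org.apache.tika.sax.ToXMLContentHandler
import scala.xml.XML

val parser = new AutoDetectParser()
val handler = new ToXMLContentHandler()
val metaData = new Metadata
parser.parse(stream, handler, metaData, getParseContext)
// Parse the XHTML output and look for <img> elements inside <body>
val xmlFileContent = XML.loadString(handler.toString)
val isDocHasImg = (xmlFileContent \\ "body" \\ "img").toList.nonEmpty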
Is there any better solution to achieve this? I am using Scala.
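If you keep the XHTML approach above, note that scala.xml lives in the separate scala-xml module. A minimal sketch of the same check using only the JDK's DOM parser, assuming `xhtml` is the string returned by `handler.toString` (the `XhtmlImageCheck` name is hypothetical):

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.xml.sax.InputSource

object XhtmlImageCheck {
  // True if any <body> element in the XHTML contains an <img> element.
  def hasImgTag(xhtml: String): Boolean = {
    // The factory is namespace-unaware by default, so plain tag names match
    // even though the XHTML declares a default namespace.
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new InputSource(new StringReader(xhtml)))
    val bodies = doc.getElementsByTagName("body")
    (0 until bodies.getLength).exists { i =>
      bodies.item(i).asInstanceOf[Element].getElementsByTagName("img").getLength > 0
    }
  }
}
```

This still requires Tika to produce the full XHTML first, so it does not avoid the parsing cost; it only drops the scala-xml dependency.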

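If anyone is looking for a solution, you can use the EmbeddedDocumentExtractor interface:
import java.io.InputStream
import org.apache.tika.extractor.EmbeddedDocumentExtractor
import org.apache.tika.metadata.Metadata
import org.xml.sax.ContentHandler
class EmbeddedImageFinder extends EmbeddedDocumentExtractor {
  // Becomes true once an embedded resource with an image/* content type is seen
  var isImageExists = false
  override def shouldParseEmbedded(metadata: Metadata): Boolean = {
    // Content-Type can be absent, so guard against null before checking it
    if (Option(metadata.get("Content-Type")).exists(_.startsWith("image/"))) {
      isImageExists = true
    }
    false // never descend into embedded resources; we only record whether images exist
  }
  override def parseEmbedded(stream: InputStream, handler: ContentHandler,
                             metadata: Metadata, outputHtml: Boolean): Unit = {}
}
Then register it in the ParseContext you pass to parser.parse, and read the flag afterwards:
val finder = new EmbeddedImageFinder
val context = getParseContext // any ParseContext instance
context.set(classOf[EmbeddedDocumentExtractor], finder)
parser.parse(stream, handler, metaData, context)
val isDocHasImg = finder.isImageExists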

Related

Jenkins scripted Pipeline: How to apply @NonCPS annotation in this specific case

I am working on a scripted Jenkins-Pipeline that needs to write a String with a certain encoding to a file as in the following example:
class Logger implements Closeable {
private final PrintWriter writer
[...]
Logger() {
FileWriter fw = new FileWriter(file, true)
BufferedWriter bw = new BufferedWriter(fw)
this.writer = new PrintWriter(bw)
}
def log(String msg) {
try {
writer.println(msg)
[...]
} catch (e) {
[...]
}
}
}
The above code doesn't work because PrintWriter is not serializable, so I know I have to prevent some of the code from being CPS-transformed. I don't know how to do that, though, since as far as I know the @NonCPS annotation can only be applied to methods.
I know one solution would be to move all output-related code into log(msg) and annotate that method, but then I would have to create a new writer every time the method is called.
Does anyone have an idea how I could fix my code instead?
Thanks in advance!
Here is a way to make this work using a log function defined in a shared library, in vars/log.groovy:
import java.io.FileWriter
import java.io.BufferedWriter
import java.io.PrintWriter
// The annotated variable will become a private field of the script class.
@groovy.transform.Field
PrintWriter writer = null
void call( String msg ) {
if( ! writer ) {
def fw = new FileWriter(file, true)
def bw = new BufferedWriter(fw)
writer = new PrintWriter(bw)
}
try {
writer.println(msg)
[...]
} catch (e) {
[...]
}
}
This works because scripts in the vars folder are instantiated as singletons, which suits a logger perfectly. It works even without the @NonCPS annotation.
Usage in pipeline is as simple as:
log 'some message'
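Outside Jenkins, the same create-once-and-reuse idea can be sketched in plain Scala. This only illustrates lazily opening a single writer and reusing it across calls; it does not address CPS serialization, and the `AppendLogger` class is hypothetical:

```scala
import java.io.{BufferedWriter, File, FileWriter, PrintWriter}

// Hypothetical logger: opens the file in append mode on the first log() call,
// then reuses the same PrintWriter, like the null check on the writer field above.
final class AppendLogger(file: File) extends AutoCloseable {
  private var writer: Option[PrintWriter] = None

  // Create the writer on first use only.
  private def out: PrintWriter = writer.getOrElse {
    val w = new PrintWriter(new BufferedWriter(new FileWriter(file, true)))
    writer = Some(w)
    w
  }

  def log(msg: String): Unit = {
    val w = out
    w.println(msg)
    w.flush() // push the buffered line to the file immediately
  }

  // Close the writer if it was ever opened; a later log() call would reopen it.
  override def close(): Unit = {
    writer.foreach(_.close())
    writer = None
  }
}
```

The Option field plays the role of `writer = null` in the Groovy version; closing explicitly matters because nothing else flushes and releases the file handle.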

Using Vaadin components in a Kotlin JS project

This question is about a Kotlin JS project which uses the Kotlin Frontend Plugin.
I want to use some UI components from the Vaadin Components library.
I have two questions about this:
(1) What would be the best way to include web components in Kotlin JS
=> for my complete code, see the link to the source below. In summary the relevant details are:
build.gradle.kts
kotlinFrontend {
npm {
dependency("@vaadin/vaadin-grid")
}
}
vaadin.grid.Imports.kt
@file:JsModule("@vaadin/vaadin-grid")
@file:JsNonModule
package vaadin.grid
external class GridElement {
companion object
}
Why the companion object? I need it for the workaround (see below).
foo.kt
fun main() {
document.getElementById("container")!!.append {
vaadin_grid {
attributes["id"] = "grid"
}
}
initUI()
}
fun initUI() {
// Force the side-effects of the vaadin modules. Is there a better way?
console.log(GridElement)
val grid = document.querySelector("#grid") /* ?? as GridElement ?? */
}
The console.log call is the ugly workaround I want to avoid: if I don't reference GridElement at all, it is simply not included in my bundle.
The vaadin_grid DSL is defined as a custom kotlinx.html tag which is unrelated code.
(2) I want to keep my code as typed as possible to avoid asDynamic but when I cast the HTMLElement to a Vaadin Element I get ClassCastExceptions (because GridElement is undefined).
For example I want to write something like this:
val grid : GridElement = document.querySelector("#grid") as GridElement
grid.items = ... // vs grid.asDynamic().items which does work
Here is how I define the external GridElement
vaadin/button/Imports.kt
@file:JsModule("@vaadin/vaadin-grid")
@file:JsNonModule
package vaadin.grid
import org.w3c.dom.HTMLElement
abstract external class GridElement : HTMLElement {
var items: Array<*> = definedExternally
}
build/node_modules/@vaadin/vaadin-grid/src/vaadin-grid.js
...
customElements.define(GridElement.is, GridElement);
export { GridElement };
Source example
To run:
From the root of the git repo:
./gradlew 05-kt-frontend-vaadin:build && open 05-kt-frontend-vaadin/frontend.html
I found the answer(s)
For the first question
(1) What would be the best way to include web components in Kotlin JS
Instead of the console.log to trigger the side effects I use require(...)
external fun require(module: String): dynamic
fun main() {
require("@vaadin/vaadin-button")
require("@vaadin/vaadin-text-field")
require("@vaadin/vaadin-grid")
...
}
(credits to someone's answer on the kotlin-frontend-plugin list)
(2) I want to keep my code as typed as possible to avoid asDynamic
Instead of importing @vaadin/vaadin-grid, I have to import the file that actually exposes the element. Then it seems to work, and I can even add generics to my GridElement:
@file:JsModule("@vaadin/vaadin-grid/src/vaadin-grid")
@file:JsNonModule
package vaadin.grid
import org.w3c.dom.HTMLElement
abstract external class GridElement<T> : HTMLElement {
var items: Array<out T> = definedExternally
}
This way I was able to get rid of all the asDynamic calls:
val firstNameField = document.querySelector("#firstName") as TextFieldElement?
val lastNameField = document.querySelector("#lastName") as TextFieldElement?
val addButton = document.querySelector("#addButton") as ButtonElement?
val grid = document.querySelector("#grid") as GridElement<Person>?
val initialPeople: Array<out Person> = emptyArray()
grid?.items = initialPeople
addButton?.addEventListener("click", {
// Read the new person's data
val person = Person(firstNameField?.value, lastNameField?.value)
// Add it to the items
if(grid != null){
val people = grid.items
grid.items = people.plus(person)
}
// Reset the form fields
firstNameField?.value = ""
lastNameField?.value = ""
})

Interception Using StructureMap 3.*

I've done interception using Castle.DynamicProxy and the StructureMap 2.6 API, but now I can't do it with StructureMap 3.0. Could anyone help me find updated documentation or even a demo? Everything I've found seems to be about old versions, e.g. the StructureMap.Interceptors.TypeInterceptor interface.
HAHAA! I f***in did it! Here's how:
public class ServiceSingletonConvention : DefaultConventionScanner
{
public override void Process(Type type, Registry registry)
{
base.Process(type, registry);
if (type.IsInterface || !type.Name.ToLower().EndsWith("service")) return;
var pluginType = FindPluginType(type);
var delegateType = typeof(Func<,>).MakeGenericType(pluginType, pluginType);
// Create FuncInterceptor class with generic argument +
var d1 = typeof(FuncInterceptor<>);
Type[] typeArgs = { pluginType };
var interceptorType = d1.MakeGenericType(typeArgs);
// -
// Create lambda expression for passing it to the FuncInterceptor constructor +
var arg = Expression.Parameter(pluginType, "x");
var method = GetType().GetMethod("GetProxy").MakeGenericMethod(pluginType);
// Create method calling expression
var methodCall = Expression.Call(method, arg);
// Create the lambda expression
var lambda = Expression.Lambda(delegateType, methodCall, arg);
// -
// Create instance of the FuncInterceptor
var interceptor = Activator.CreateInstance(interceptorType, lambda, "");
registry.For(pluginType).Singleton().Use(type).InterceptWith(interceptor as IInterceptor);
}
public static T GetProxy<T>(object service)
{
var proxyGeneration = new ProxyGenerator();
var result = proxyGeneration.CreateInterfaceProxyWithTarget(
typeof(T),
service,
(Castle.DynamicProxy.IInterceptor)(new MyInterceptor())
);
return (T)result;
}
}
The problem here was that SM 3.* allows interception for known types, i.e. doing something like this:
expression.For<IService>().Use<Service>().InterceptWith(new FuncInterceptor<IService>(service => GetProxyFrom(service)));
But what if you'd like to put the interception logic inside a custom scanning convention, where you want to intercept all instances of types with a specific signature (in my case, types whose names end in 'service')?
That's what I've accomplished using Expression API and reflection.
Also, I'm using Castle.DynamicProxy here to create proxy objects for my services.
Hope someone else will find this helpful :)
For new versions, I find the best place to look is the source itself. Well-written projects include test cases, and thankfully StructureMap does.
You can explore the tests here
In the meantime, I've written an example of an ActivatorInterceptor and how to configure it:
static void Main()
{
ObjectFactory.Configure(x =>
{
x.For<Form>().Use<Form1>()
.InterceptWith(new ActivatorInterceptor<Form1>(y => Form1Interceptor(y), "Test"));
});
Application.Run(ObjectFactory.GetInstance<Form>());
}
public static void Form1Interceptor(Form f)
{
//Sets the title of the form window to "Testing"
f.Text = "Testing";
}
EDIT:
How to use a "global" filter using PoliciesExpression
[STAThread]
static void Main()
{
ObjectFactory.Configure(x =>
{
x.Policies.Interceptors(new InterceptorPolicy<Form>(new FuncInterceptor<Form>(y => Intercept(y))));
});
Application.Run(ObjectFactory.GetInstance<Form>());
}
private static Form Intercept(Form form)
{
//Do the interception here
form.Text = "Testing";
return form;
}

symfony 1.4: is there a way to get only the text of renderError() in a template?

I am trying to learn symfony. Is there a method like renderErrorText()? renderError() outputs HTML with ul/li tags, but I just want the text of the error without the HTML.
You need to define your own form formatter. renderError() uses a predefined format that renders the errors as HTML.
Create a file like lib/form/sfWidgetFormSchemaFormatterMyFormatter.class.php:
<?php
class sfWidgetFormSchemaFormatterMyFormatter extends sfWidgetFormSchemaFormatter
{
protected
$rowFormat = '',
$helpFormat = '%help%',
$errorRowFormat = '%errors%',
$errorListFormatInARow = " <ul class=\"error_list\">\n%errors% </ul>\n",
$errorRowFormatInARow = " <li>%error%</li>\n",
$namedErrorRowFormatInARow = " <li>%name%: %error%</li>\n",
$decoratorFormat = '',
$widgetSchema = null,
$translationCatalogue = null;
}
And inside this class, you can define how to format and render an error.
Then you need to set this class as the default formatter for your forms. If this is for your frontend application, do it in apps/frontend/config/frontendConfiguration.class.php:
class frontendConfiguration extends sfApplicationConfiguration
{
public function configure()
{
sfWidgetFormSchema::setDefaultFormFormatterName('MyFormatter');
}
}
Or, if you want it to be global to your project, in config/ProjectConfiguration.class.php:
class ProjectConfiguration extends sfProjectConfiguration
{
public function setup()
{
sfWidgetFormSchema::setDefaultFormFormatterName('MyFormatter');
}
}
You will find:
a documented example here
an example for twitter bootstrap

JAXB annotations needed to bind XML to my class

I have the following classes
@XmlRootElement(name = "ExecutionRequest")
@XmlAccessorType(XmlAccessType.FIELD)
public class ExecutionRequest {
@XmlElement(name="Command")
private String command;
@XmlElementWrapper(name="ExecutionParameters")
@XmlElement(name="ExecutionParameter")
private ArrayList<ExecutionParameter> ExecutionParameters;
}
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class ExecutionParameter {
@XmlElement(name = "Key")
private String key;
@XmlElement(name = "Value")
private String value;
}
When I marshal the ExecutionRequest object, I get the following XML:
<ExecutionRequest>
<Command>RetrieveHeader</Command>
<ExecutionParameters>
<ExecutionParameter>
<Key>tid</Key>
<Value>ASTLGA-ALTE010220101</Value>
</ExecutionParameter>
<ExecutionParameter>
<Key>ctag</Key>
<Value>dq</Value>
</ExecutionParameter>
</ExecutionParameters>
</ExecutionRequest>
This is correct according to the JAXB bindings.
But I want the XML to have all the key/value pairs inside a single ExecutionParameter, like this:
<ExecutionRequest>
<Command>RetrieveHeader</Command>
<ExecutionParameters>
<ExecutionParameter>
<Key>tid</Key>
<Value>ASTLGA-ALTE010220101</Value>
<Key>ctag</Key>
<Value>dq</Value>
</ExecutionParameter>
</ExecutionParameters>
</ExecutionRequest>
Is there any way to get XML like this just by changing the annotations?
Let me know if anything needs clarification.
Thanks in advance.
There isn't any JAXB metadata for that. You could get a compact, easily parseable XML representation by mapping key and value with @XmlAttribute:
<ExecutionParameters>
<ExecutionParameter Key="a" Value="b"/>
<ExecutionParameter Key="c" Value="d"/>
</ExecutionParameters>
UPDATE
If you have to support this XML format, then you could use JAXB with XSLT to get the desired result:
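import javax.xml.bind.JAXBContext;
import javax.xml.bind.util.JAXBSource;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Create Transformer from the XSLT stylesheet
TransformerFactory tf = TransformerFactory.newInstance();
StreamSource xslt = new StreamSource(
"src/example/stylesheet.xsl");
Transformer transformer = tf.newTransformer(xslt);
// Source: the JAXB object graph (request is the ExecutionRequest instance)
JAXBContext jc = JAXBContext.newInstance(ExecutionRequest.class);
JAXBSource source = new JAXBSource(jc, request);
// Result: write the transformed XML to standard output
StreamResult result = new StreamResult(System.out);
// Transform
transformer.transform(source, result);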
For More Information
http://blog.bdoughan.com/2012/11/using-jaxb-with-xslt-to-produce-html.html
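The stylesheet itself is not shown above. As a sketch, here is one possible stylesheet that collapses every ExecutionParameter into a single one, applied with the JDK's javax.xml.transform API from Scala. To stay self-contained it reads the marshalled XML from a string instead of a JAXBSource; the `MergeParameters` object and the stylesheet content are assumptions, not part of the original answer:

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

object MergeParameters {
  // Hypothetical stylesheet: identity copy, except that all <ExecutionParameter>
  // children of <ExecutionParameters> are collapsed into one <ExecutionParameter>.
  val stylesheet: String =
    """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      |  <xsl:output method="xml" indent="yes"/>
      |  <!-- Identity template: copy everything as-is by default -->
      |  <xsl:template match="@*|node()">
      |    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
      |  </xsl:template>
      |  <!-- Emit one ExecutionParameter holding every Key/Value pair in document order -->
      |  <xsl:template match="ExecutionParameters">
      |    <ExecutionParameters>
      |      <ExecutionParameter>
      |        <xsl:copy-of select="ExecutionParameter/*"/>
      |      </ExecutionParameter>
      |    </ExecutionParameters>
      |  </xsl:template>
      |</xsl:stylesheet>""".stripMargin

  def transform(xml: String): String = {
    val transformer = TransformerFactory.newInstance()
      .newTransformer(new StreamSource(new StringReader(stylesheet)))
    val out = new StringWriter()
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out))
    out.toString
  }
}
```

With JAXB you would pass the `JAXBSource` from the answer to `transformer.transform` instead of the string-based `StreamSource`.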
