Reading and saving binary image from OpenLDAP server using Groovy - grails

I'm trying to save an image from an OpenLDAP server. The attribute is stored in binary format and my code appears to work; however, the saved image is corrupted.
I then tried the same thing in PHP and was successful, but I'd like to do it in a Grails project.
PHP Example (works)
<?php
$conn = ldap_connect('ldap.example.com') or die("Could not connect.\n");
ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3);
$dn = 'ou=People,o=Acme';
$ldap_rs = ldap_bind($conn) or die("Can't bind to LDAP");
$res = ldap_search($conn,$dn,"someID=123456789");
$info = ldap_get_entries($conn, $res);
$entry = ldap_first_entry($conn, $res);
$jpeg_data = ldap_get_values_len( $conn, $entry, "someimage-jpeg");
$jpeg_filename = '/tmp/' . basename( tempnam ('.', 'djp') );
$outjpeg = fopen($jpeg_filename, "wb");
fwrite($outjpeg, $jpeg_data[0]);
fclose ($outjpeg);
copy ($jpeg_filename, '/some/dir/test.jpg');
unlink($jpeg_filename);
?>
Groovy Example (does not work)
def ldap = org.apache.directory.groovyldap.LDAP.newInstance('ldap://ldap.example.com/ou=People,o=Acme')
ldap.eachEntry(filter: 'someID=123456789') { entry ->
    new File('/Some/dir/123456789.jpg').withOutputStream {
        // The file is created, but the image is corrupted (its size also doesn't match the PHP output)
        it.write entry.get('someimage-jpeg').getBytes()
    }
}
How can I tell the Apache LDAP library that "someimage-jpeg" is actually binary and not a String? Is there a better, simple library available for reading binary data from an LDAP server? From looking at the Apache mailing list, someone else had a similar issue, but I couldn't find a resolution in the thread.
Technology Stack
Grails 2.2.1
Apache LDAP API 1.0.0 M16

Have you checked whether the image attribute value is base-64 encoded?

I found the answer. The Apache Groovy LDAP library uses JNDI under the hood. When using JNDI, certain well-known attributes are automatically read as binary, but if your LDAP server uses a custom attribute name, the library will not know that it is binary.
For those who come across this problem using Grails, here are the steps to have a specific attribute treated as binary.
Create a new properties file called "jndi.properties" and add it to your grails-app/conf directory (all property files in this folder are automatically included in the classpath).
Add a line to the properties file with the name of the image attribute:
java.naming.ldap.attributes.binary=some_custom_image
Save the file and run the Grails application
Here is some sample code to save a binary entry to a file.
def ldap = LDAP.newInstance('ldap://some.server.com/ou=People,o=Acme')
ldap.eachEntry(filter: 'id=1234567') { entry ->
    new File('/var/dir/something.jpg').withOutputStream {
        it.write entry.image
    }
}
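Outside Grails, the same JNDI environment property can be set programmatically. This is only a minimal sketch: the attribute name some_custom_image, the server URL, and the search filter are illustrative assumptions rather than values from the original post.
import javax.naming.Context
import javax.naming.directory.InitialDirContext
import javax.naming.directory.SearchControls

def env = new Hashtable()
env[Context.INITIAL_CONTEXT_FACTORY] = 'com.sun.jndi.ldap.LdapCtxFactory'
env[Context.PROVIDER_URL] = 'ldap://ldap.example.com/ou=People,o=Acme'
// Tell the JNDI LDAP provider to return this attribute as byte[] instead of String
env['java.naming.ldap.attributes.binary'] = 'some_custom_image'

def ctx = new InitialDirContext(env)
def controls = new SearchControls(searchScope: SearchControls.SUBTREE_SCOPE)
def results = ctx.search('', '(someID=123456789)', controls)
if (results.hasMore()) {
    byte[] image = results.next().attributes.get('some_custom_image').get()
    new File('/var/dir/something.jpg').bytes = image
}
ctx.close()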

Related

Jenkins...Modify XML Tag value in xml file using Groovy in Jenkins

I am using Jenkins for automated deployment.
I need to modify an XML tag value in an XML file using a Groovy script. I am using the Groovy code below. When I try to edit the XML tag value I receive an error: unclassified field xml.uti.node.
Node xml = xmlParser.parse(new File("c:/abc/test.xml"))
xml.DeployerServer.host[0] = '172.20.204.49:7100'
FileWriter fileWriter = new FileWriter("c:/abc/test.xml")
XmlNodePrinter nodePrinter = new XmlNodePrinter(new PrintWriter(fileWriter))
nodePrinter.setPreserveWhitespace(true)
nodePrinter.print(xml)
I need to modify the host tag value; host sits inside the DeployerServer tag.
Any help will be much appreciated.
Here is the script, comments in-line:
//Create file object
def file = new File('c:/abc/test.xml')
//Parse it with XmlSlurper
def xml = new XmlSlurper().parse(file)
//Update the node value using replaceBody
xml.DeployerServer.host[0].replaceBody '172.20.204.49:7100'
//Create the updated xml string
def updatedXml = groovy.xml.XmlUtil.serialize(xml)
//Write the content back
file.write(updatedXml)
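If you'd rather keep the XmlParser approach from the question, a groovy.util.Node value can be changed with setValue() and the tree written back with XmlNodePrinter, as the question already does. A minimal sketch under those assumptions (same file path and element names as the question):
def file = new File('c:/abc/test.xml')
def xml = new XmlParser().parse(file)
// Replace the text of the first <host> element under <DeployerServer>
xml.DeployerServer.host[0].setValue('172.20.204.49:7100')
// Write the tree back, preserving whitespace as in the question
file.withWriter { writer ->
    def printer = new XmlNodePrinter(new PrintWriter(writer))
    printer.preserveWhitespace = true
    printer.print(xml)
}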
I wanted to read / manipulate CSProj and NUSPEC files in a Pipeline script. I could not get past parseText() without the dreaded "SAXParseException: Content is not allowed in prolog".
There are quite a few threads about this error message. What wasn't clear is that both CSProj and NUSPEC files are UTF-8 with a BOM, which is invisible in most editors.
To make it worse, I've been trying to automate the NUSPEC file creation, and there is no way I can tell the tools to change the file encoding.
The answers above helped solve my issue once I added code to look for 65279 as the first character and delete it. I could then parse the XML and carry out the above.
There didn't seem to be a good thread to put this summary on, so I added it to this thread about Jenkins, Groovy and XML files, which is where I found this known Java issue.
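For reference, a minimal sketch of that fix (the file name is illustrative): read the text, drop a leading U+FEFF (decimal 65279) if present, then parse.
def text = new File('c:/abc/test.nuspec').getText('UTF-8')
if (text.startsWith('\uFEFF')) {
    // Drop the invisible BOM character so the parser sees '<' first
    text = text.substring(1)
}
def xml = new XmlSlurper().parseText(text)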
I used PowerShell to make this change in an app.config file.
My problem was with passwords, so I created a credential in Jenkins to store the password.
If you do not need to work with credentials, just remove the withCredentials section.
Here is part of my Jenkinsfile:
def appConfigPath = "\\\\server\\folder\\app.config" // UNC path: backslashes must be escaped in a Groovy string
stage('Change App.Config') {
    steps {
        withCredentials([string(credentialsId: 'CREDENTIAL_NAME', variable: 'PWD')]) {
            powershell(returnStdout: true, script: '''
                Function swapAppSetting {
                    param([string]$key, [string]$value)
                    $obj = $doc.configuration.appSettings.add | where {$_.Key -eq $key}
                    $obj.value = $value
                }
                $webConfig = "''' + appConfigPath + '''"
                $doc = [Xml](Get-Content $webConfig)
                swapAppSetting 'TAG_TO_MODIFY' 'VALUE_TO_CHANGE'
                $doc.Save($webConfig)
            ''')
        }
    }
}
Don't forget to update your PowerShell (minimum version 3).

Jenkins Continuous Integration with Amazon S3 - Everything is uploading to the root?

I'm running Jenkins and I have it successfully working with my GitHub account, but I can't get it working correctly with Amazon S3.
I installed the S3 plugin and when I run a build it successfully uploads to the S3 bucket I specify, but all of the uploaded files end up in the root of the bucket. I have a bunch of folders (such as /css, /js and so on), but all of the files in those folders from GitHub end up in the root of my S3 bucket.
Is it possible to get the S3 plugin to upload and retain the folder structure?
It doesn't look like this is possible. Instead, I'm using s3cmd to do this. You must first install it on your server, and then in one of the bash scripts within a Jenkins job you can use:
s3cmd sync -r -P $WORKSPACE/ s3://YOUR_BUCKET_NAME
That will copy all of the files to your S3 account maintaining the folder structure. The -P keeps read permissions for everyone (needed if you're using your bucket as a web server). This is a great solution using the sync feature, because it compares all your local files against the S3 bucket and only copies files that have changed (by comparing file sizes and checksums).
I have never worked with the S3 plugin for Jenkins (now that I know it exists, I might give it a try), but looking at the code, it seems you can only do what you want with a workaround.
Here's what the plugin code actually does (taken from GitHub); I removed the parts that are not relevant, for readability:
class hudson.plugins.s3.S3Profile, method upload:
final Destination dest = new Destination(bucketName,filePath.getName());
getClient().putObject(dest.bucketName, dest.objectName, filePath.read(), metadata);
Now if you take a look into hudson.FilePath.getName()'s JavaDoc:
Gets just the file name portion without directories.
Now, take a look into the hudson.plugins.s3.Destination's constructor:
public Destination(final String userBucketName, final String fileName) {
    if (userBucketName == null || fileName == null)
        throw new IllegalArgumentException("Not defined for null parameters: " + userBucketName + "," + fileName);
    final String[] bucketNameArray = userBucketName.split("/", 2);
    bucketName = bucketNameArray[0];
    if (bucketNameArray.length > 1) {
        objectName = bucketNameArray[1] + "/" + fileName;
    } else {
        objectName = fileName;
    }
}
The Destination class JavaDoc says:
The convention implemented here is that a / in a bucket name is used to construct a structure in the object name. That is, a put of file.txt to bucket name of "mybucket/v1" will cause the object "v1/file.txt" to be created in the mybucket.
Conclusion: the filePath.getName() call strips off any path prefix you add to the file (S3 does not have directories, only key prefixes; see this and this thread for more info). If you really need to put your files into a "folder" (i.e. give them a key prefix containing a slash), I suggest you append that prefix to your bucket name, as explained in the Destination class JavaDoc.
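To make that concrete, here is a hypothetical illustration that mirrors the constructor logic above (the bucket and file names are made up): configuring the bucket field as "mybucket/css" makes uploaded files land under the css/ prefix.
def configuredBucket = 'mybucket/css'   // what you enter in the plugin's bucket field
def fileName = 'style.css'              // what filePath.getName() returns

def parts = configuredBucket.split('/', 2)
def bucketName = parts[0]
def objectName = parts.length > 1 ? parts[1] + '/' + fileName : fileName

assert bucketName == 'mybucket'
assert objectName == 'css/style.css'    // stored under the css/ "folder"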
Yes, this is possible.
It looks like you'll need a separate S3 plugin entry for each folder destination, however.
"Source" is the file you're uploading.
"Destination bucket" is where you place your path.
Using Jenkins 1.532.2 and S3 Publisher Plug-In 0.5, the job configuration screen rejects additional S3 publish entries. There would also be a significant maintenance benefit for us if the plugin recreated the workspace directory structure, as we'll have many directories to create.
Set up your Git plugin.
Set up your Bash script build step.
Everything in the folder you mark with "*" will go to the bucket.

How do I save the original HTML files with Apache Nutch

I'm new to search engines and web crawlers. I want to store all the original pages of a particular web site as HTML files, but with Apache Nutch I can only get the binary database files. How do I get the original HTML files with Nutch?
Does Nutch support this? If not, what other tools can I use to achieve my goal? (Tools that support distributed crawling are preferred.)
Well, Nutch writes the crawled data in binary form, so if you want it saved in HTML format you will have to modify the code (this will be painful if you are new to Nutch).
If you want a quick and easy solution for getting HTML pages:
If the list of pages/URLs you intend to crawl is fairly small, you're better off with a script that invokes wget for each URL.
OR use the HTTrack tool.
EDIT:
Writing your own Nutch plugin would be great. Your problem will get solved, plus you can contribute back to Nutch by submitting your work! If you are new to Nutch (in terms of code and design), you will have to invest a lot of time building a new plugin; otherwise it's easy to do.
A few pointers to help you get started:
Here is a page that talks about writing your own Nutch plugin.
Start with Fetcher.java; see lines 647-648. That is where you can get the fetched content on a per-URL basis (for those pages which were fetched successfully).
pstatus = output(fit.url, fit.datum, content, status, CrawlDatum.STATUS_FETCH_SUCCESS);
updateStatus(content.getContent().length);
You should add code right after this to invoke your plugin and pass the content object to it. By now you will have guessed that content.getContent() is the content of the URL you want. Inside the plugin code, write it to a file; the filename should be based on the URL name, otherwise it will be difficult to work with. The URL can be obtained from fit.url.
You must make these modifications while running Nutch in Eclipse.
Once you are able to run it, open Fetcher.java and add the lines between the "content saver" comment lines.
case ProtocolStatus.SUCCESS:        // got a page
    pstatus = output(fit.url, fit.datum, content, status, CrawlDatum.STATUS_FETCH_SUCCESS, fit.outlinkDepth);
    updateStatus(content.getContent().length);
    //------------------------------------------- content saver ---------------------------------------------\\
    String filename = "savedsites/" + content.getUrl().replace('/', '-');
    File file = new File(filename);
    file.getParentFile().mkdirs();
    // createNewFile() returns false if the file already existed
    boolean created = file.createNewFile();
    if (!created) {
        System.out.println("File exists.");
    } else {
        FileWriter fstream = new FileWriter(file);
        BufferedWriter out = new BufferedWriter(fstream);
        out.write(content.toString().substring(content.toString().indexOf("<!DOCTYPE html")));
        out.close();
        System.out.println("File created successfully.");
    }
    //------------------------------------------- content saver ---------------------------------------------\\
To update this answer: it is possible to post-process the data in the segment folder of your crawl and read in the HTML (along with the other data Nutch has stored) directly.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.util.NutchConfiguration;

Configuration conf = NutchConfiguration.create();
FileSystem fs = FileSystem.get(conf);
Path file = new Path(segment, Content.DIR_NAME + "/part-00000/data");
SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
try {
    Text key = new Text();
    Content content = new Content();
    // Each record in the segment's content directory is (url, Content)
    while (reader.next(key, content)) {
        System.out.println(new String(content.getContent()));
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    reader.close();
}
The answers here are obsolete. It is now possible to get the plain HTML files with nutch dump. Please see this answer.
In Apache Nutch 2.3.1
You can save the raw HTML by editing the Nutch code. First, run Nutch in Eclipse by following https://wiki.apache.org/nutch/RunNutchInEclipse
Once Nutch runs in Eclipse, edit the file FetcherReducer.java, add this code to the output method, and run ant eclipse again to rebuild the class.
Finally, the raw HTML will be added to the reprUrl column in your database.
if (content != null) {
    ByteBuffer raw = fit.page.getContent();
    if (raw != null) {
        ByteArrayInputStream arrayInputStream = new ByteArrayInputStream(raw.array(), raw.arrayOffset() + raw.position(), raw.remaining());
        Scanner scanner = new Scanner(arrayInputStream);
        scanner.useDelimiter("\\Z"); // read the entire scanner content as one String
        String data = "";
        if (scanner.hasNext()) {
            data = scanner.next();
        }
        fit.page.setReprUrl(StringUtil.cleanField(data));
        scanner.close();
    }
}

Looking for image on HDD rather than in context

I have a multi-module project:
Project
|--src
|-JavaFile.java
Web-Project
|-Web-Content
|-images
| |-logo.PNG
|-pages
|-WEB-INF
Regular Java module - contains src with all the Java files.
Dynamic web project module - contains all the web-related stuff.
Eventually the regular Java module goes into the dynamic web module's lib folder as a jar file.
Problem
After compilation, the Java code looks for the image file at c:\ibm\sdp\server completepath\logo.png rather than in the web context. The image is referenced in the Java code as below, for iText:
Image logo = Image.getInstance("/images/logo.PNG");
Please suggest how I can change my Java code to refer to the image. I am not allowed to change my project structure.
You need to use ServletContext#getResource() or, better, getResourceAsStream() for that. They return a URL and an InputStream, respectively, for the resource in the web content.
InputStream input = getServletContext().getResourceAsStream("/images/logo.PNG");
// ...
This way you're not dependent on where (and how!) the webapp has been deployed. Relying on absolute disk file system paths would only end up in portability headaches.
See also:
getResourceAsStream() vs FileInputStream
Update: as per the comments, you seem to be using iText (you should have clarified that a bit more in the question; I edited it). You can then use the Image#getInstance() method that takes a URL:
URL url = getServletContext().getResource("/images/logo.PNG");
Image image = Image.getInstance(url);
// ...
Update 2: as per the comments, you turn out to be sitting in the JSF context (you should have clarified that as well in the question). You should use ExternalContext#getResource() instead to get the URL:
URL url = FacesContext.getCurrentInstance().getExternalContext().getResource("/images/logo.PNG");
Image image = Image.getInstance(url);
// ...

How to call an input file which is already in the package

In my Hadoop MapReduce application I have one input file. I want the input file to be picked up automatically when I execute the jar of my application. To do this I wrote one class to specify the input, the output and the file itself, but where I call the file I want to specify the file path. To do that I used this code:
QueriesTest.class.getResourceAsStream("/src/main/resources/test")
but it is not working (it cannot read the input file from the generated jar), so I used this one:
URL url = this.getClass().getResource("/src/main/resources/test")
Here I get a problem with the URL. So please help me out. I am using Hadoop 0.21.
I'm not sure what you want to tell us with your resource loading, but the usual way to add an input file is this:
Configuration conf = new Configuration();
Job job = new Job(conf);
Path in = new Path("YOUR_PATH_IN_HDFS");
FileInputFormat.addInputPath(job, in);
job.setInputFormatClass(TextInputFormat.class); // could be a sequencefile also
// set the other stuff
job.waitForCompletion(true);
Make sure your file resides in HDFS then.
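If you really do need to read the file packaged inside the jar (as the question attempts), note that files under src/main/resources end up at the root of the jar, so the classpath resource name is just "/test". A minimal sketch (shown in Groovy; the same getResourceAsStream call works from Java), assuming the QueriesTest class and the test resource from the question:
// Resources under src/main/resources are packaged at the jar root,
// so the resource name is "/test", not "/src/main/resources/test".
InputStream input = QueriesTest.class.getResourceAsStream('/test')
input.withReader('UTF-8') { reader ->
    reader.eachLine { line -> println line }
}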
