How to extract the main text from HTML using Tika - html-parsing

I would like to know how I can extract the main text and the plain text from HTML using Tika.
One possible solution might be to use BoilerpipeContentHandler, but does anyone have sample/demo code that shows how to do it?
Thanks very much in advance.

The BodyContentHandler class doesn't use the Boilerpipe code, so you'll have to explicitly use the BoilerpipeContentHandler. The following code worked for me:
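import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public String[] tika_autoParser() {
    String[] result = new String[3];
    // try-with-resources closes the stream even if parsing fails
    try (InputStream input = new FileInputStream(new File("test.html"))) {
        ContentHandler textHandler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        AutoDetectParser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();
        // Wrap the body handler so that only the text Boilerpipe keeps reaches it
        parser.parse(input, new BoilerpipeContentHandler(textHandler), metadata, context);
        // TITLE is a static constant; newer Tika releases expose it as TikaCoreProperties.TITLE
        result[0] = "Title: " + metadata.get(Metadata.TITLE);
        result[1] = "Body: " + textHandler.toString();
    } catch (IOException | SAXException | TikaException e) {
        // FileNotFoundException is a subclass of IOException, so it is covered here
        e.printStackTrace();
    }
    return result;
}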

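Here is a sample that uses the plain BodyContentHandler without Boilerpipe filtering, so it returns all of the body text (shown here on a PDF file):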
public String[] tika_autoParser() {
String[] result = new String[3];
try {
InputStream input = new FileInputStream(new File("/Users/nazanin/Books/Web crawler.pdf"));
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
AutoDetectParser parser = new AutoDetectParser();
ParseContext context = new ParseContext();
parser.parse(input, textHandler, metadata, context);
result[0] = "Title: " + metadata.get(Metadata.TITLE);
result[1] = "Body: " + textHandler.toString();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (TikaException e) {
e.printStackTrace();
}
return result;
}

Related

When using mina-sshd to execute a command, a timeout occurs. How can I deal with it?

Here is my code. How can I deal with the timeout?
public ConnectResponse runCommand(UserInfo userInfo, String command, long timeout) {
SshClient sshClient = SshClient.setUpDefaultClient();
sshClient.start();
ChannelExec execChannel = null;
try (ClientSession session = sshClient
.connect(userInfo.getUsername(), userInfo.getTargetIp(), 22)
.verify(CLIENT_VERIFY_TIMEOUT)
.getClientSession();) {
session.addPasswordIdentity(userInfo.getAuthentication());
session.auth().verify(SESSION_VERIFY_TIMEOUT);
execChannel = session.createExecChannel(command);
ByteArrayOutputStream out = new ByteArrayOutputStream();
ByteArrayOutputStream err = new ByteArrayOutputStream();
execChannel.setOut(out);
execChannel.setErr(err);
execChannel.open();
Set<ClientChannelEvent> events = execChannel.waitFor(EnumSet.of(ClientChannelEvent.CLOSED), TimeUnit.SECONDS.toMillis(timeout));
session.close(false);
if (events.contains(ClientChannelEvent.TIMEOUT)) {
throw new BusinessException(500, String.format("Command {%s} timed out!", command));
}
return new ConnectResponse(out.toString(), err.toString(), execChannel.getExitStatus());
} catch (Exception e) {
throw new BusinessException(e.getCause(), 500, String.format("Command {%s} raised an exception!", command));
} finally {
if (!Objects.isNull(execChannel)) {
try {
execChannel.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}

Clear packets sent/received

Is it possible to clear the number of packets sent/received and start from 0 again?
The code to get the sent or received packets:
long no_of_packet_Sent = RadioInfo.getNumberOfPacketsSent();
long no_of_packet_Received = RadioInfo.getNumberOfPacketsReceived();
I never found an answer to this, but another option is to write the count to a text file and then subtract the stored value from the current "get number of packets" result.
private static String fileFormatString(String filename) {
// Replace spaces with underscores so the name is safe to use in a file path
return filename.replace(' ', '_');
}
public static String readTextFile(String fName) {
fName = fileFormatString(fName);
String result = null;
FileConnection fconn = null;
DataInputStream is = null;
try {
fconn = (FileConnection) Connector.open(fName, Connector.READ_WRITE);
is = fconn.openDataInputStream();
byte[] data = IOUtilities.streamToBytes(is);
result = new String(data);
} catch (IOException e) {
System.out.println("Error on read: "+fName+" - " + e.getMessage());
} finally {
try {
if (null != is) is.close();
if (null != fconn) fconn.close();
} catch (IOException e) {
System.out.println("Error on read IO: "+fName+" - " + e.getMessage());
}
}
return result;
}
public static void writeTextFile(String fName, String text) {
fName = fileFormatString(fName);
DataOutputStream os = null;
FileConnection fconn = null;
try {
fconn = (FileConnection) Connector.open(fName, Connector.READ_WRITE);
// Create the file if it does not exist yet
if (!fconn.exists()) fconn.create();
os = fconn.openDataOutputStream();
os.write(text.getBytes());
} catch (IOException e) {
System.out.println("Error on write: "+fName+" - " + e.getMessage());
} finally {
try {
if (null != os) os.close();
if (null != fconn) fconn.close();
} catch (IOException e) {
System.out.println("Error on write IO: "+fName+" - " + e.getMessage());
}
}
}
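// At "reset" time: store the current cumulative count as the baseline
long no_of_packet = RadioInfo.getNumberOfPacketsSent() + RadioInfo.getNumberOfPacketsReceived();
DTHelper.writeTextFile(text_file_name, "" + no_of_packet);
// Later: read the baseline back and subtract it from the current cumulative count
String readnumberofbytes = readTextFile(text_file_name);
long Longreadnumberofbytes = Long.parseLong(readnumberofbytes);
long current_no_of_packet = RadioInfo.getNumberOfPacketsSent() + RadioInfo.getNumberOfPacketsReceived();
long CurrentNumberofDataUsed = current_no_of_packet - Longreadnumberofbytes;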
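The baseline-subtraction idea in this answer can be shown on its own, separate from the file handling. The following is only a minimal sketch in standard Java SE 8, not BlackBerry code: the class name ResettableCounter is hypothetical, and the LongSupplier stands in for the sum of the two RadioInfo calls. It keeps the baseline in memory; the answer stores it in a text file instead, presumably so that it survives an application restart.

```java
import java.util.function.LongSupplier;

/** Resettable view over a cumulative counter that cannot itself be cleared. */
class ResettableCounter {
    // Hypothetical stand-in for "packets sent + packets received"
    private final LongSupplier cumulative;
    // Cumulative value recorded at the last reset
    private long baseline = 0;

    ResettableCounter(LongSupplier cumulative) {
        this.cumulative = cumulative;
    }

    /** Remembers the current cumulative value so counting restarts from 0. */
    void reset() {
        baseline = cumulative.getAsLong();
    }

    /** Returns how much the cumulative counter has grown since the last reset. */
    long sinceReset() {
        return cumulative.getAsLong() - baseline;
    }
}
```

Calling reset() plays the role of writing the count to the text file, and sinceReset() plays the role of reading it back and subtracting.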

UnZip/Extract Zip file in blackberry

I have a zip file containing a folder, and inside the folder I have some image files. I wish to extract these images, but I have not been able to find anything on how to do it. I have been looking at ZipMe but have not found any relevant help.
Below is the code that I have developed so far.
ZipEntry dataZE;
InputStream isData;
StringBuffer sbData;
ZipInputStream dataZIS;
String src = "file:///store/home/user/images.zip";
String path = "file:///store/home/";
String fileName = "";
FileConnection f_Conn;
public UnZipper() {
debugger("Unzipper constructor");
try {
f_Conn = (FileConnection) Connector.open(src);
} catch (IOException e) {
debugger("f_conn error :" + e.getMessage());
}
try {
isData = f_Conn.openInputStream();
} catch (IOException e) {
debugger("f_conn error getting ip_stream:" + e.getMessage());
}
sbData = new StringBuffer();
dataZIS = new ZipInputStream(isData);
debugger("got all thing initialized");
}
public void run() {
debugger("unzipper run");
try {
startUnziping();
} catch (IOException e) {
debugger("Error unzipping " + e.getMessage());
}
debugger("finished...");
}
private void startUnziping() throws IOException {
debugger("startUnziping");
dataZE = dataZIS.getNextEntry();
fileName = dataZE.getName();
writeFile();
dataZIS.closeEntry();
debugger(">>>>>>>>>>> : " + fileName);
}
private void readFile() throws IOException {
debugger("readFile");
int ch;
int i = 0;
while ((ch = dataZIS.read()) != -1) {
debugger((i++) + " : " + sbData.toString()
+ " >>> writting this..");
sbData.append(ch);
}
}
private void writeFile() {
debugger("writting file...");
FileConnection f_Conn = null;
byte[] file_bytes = new byte[sbData.length()];
file_bytes = sbData.toString().getBytes();
try {
readFile();
} catch (IOException e) {
debugger("Error while reading " + e.getMessage());
}
try {
f_Conn = (FileConnection) Connector.open(path + fileName);
} catch (IOException e) {
debugger("getting f_conn" + e.getMessage());
}
if (!f_Conn.exists()) {
// create the file first
debugger("I know file does not exists");
try {
f_Conn.mkdir();
} catch (IOException e) {
debugger("Oops!!! error creating fle : " + e.getMessage());
}
}
try {
f_Conn.setWritable(true);
debugger("file is nt writeable");
} catch (IOException e) {
debugger("cannot make it writeable : " + e.getMessage());
}
OutputStream lo_OS = null;
try {
lo_OS = f_Conn.openOutputStream();
debugger("got out Stream hero!!!");
} catch (IOException e) {
debugger("cant get out Stream !!!");
}
try {
lo_OS.write(file_bytes);
debugger("yess...writtent everything");
} catch (IOException e) {
add(new LabelField("Error writing file ..." + e.getMessage()));
}
try {
lo_OS.close();
debugger("now closing connection...");
} catch (IOException e) {
debugger("error closing out stream : " + e.getMessage());
}
}
}
I have been able to get the ZipEntry representing the folder that contains the images, but I have not been able to figure out how to extract those images.
Thanks for the help.
Iterate over all ZipEntry objects in the zip file in your startUnziping method (your code works only with the first one). The entry corresponding to a child file should have a name like "foldername/filename".
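The loop that this answer describes could look roughly like the sketch below. It is written against java.util.zip from standard Java SE, which is an assumption: the ZipInputStream used on BlackBerry (for example the one from ZipMe) has a similar getNextEntry/read shape but is a different class. The class name ZipEntryWalker and the method readAllFiles are hypothetical, and the sketch collects the file contents in memory instead of writing them through a FileConnection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipEntryWalker {
    /** Reads every file entry of a zip stream into memory, keyed by entry name. */
    static Map<String, byte[]> readAllFiles(InputStream zipData) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            // getNextEntry() returns null once every entry has been visited
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // the folder itself carries no data, only its children do
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                // read() returns -1 at the end of the current entry, not of the whole zip
                while ((n = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                // For a file inside a folder the name looks like "foldername/filename"
                files.put(entry.getName(), out.toByteArray());
            }
        }
        return files;
    }
}
```

Reading raw bytes into a buffer, rather than appending each int to a StringBuffer as the question's readFile does, keeps binary image data intact.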

How to send an audio file as attachment in BlackBerry?

How to send an audio file as attachment in BlackBerry SDK 6?
You can convert the audio file into a byte array and then use the following method:
public synchronized boolean sendMail(final byte []data,
final boolean licensed)
{
Folder[] folders = store.list(4);
Folder sentfolder = folders[0];
// create a new message and store it in the sent folder
msg = new Message(sentfolder);
multipart = new Multipart();
textPart = new TextBodyPart(multipart,"Audio");
Address recipients[] = new Address[1];
try {
recipients[0] = new Address(address, "XYZ");
msg.addRecipients(Message.RecipientType.TO, recipients);
msg.setSubject("Audio");
try {
Thread thread = new Thread("Send mail") {
public void run() {
emailSenderIsBusy = true;
try {
attach = new SupportedAttachmentPart(
multipart, "application/octet-stream",
"title",data);
multipart.addBodyPart(textPart);
multipart.addBodyPart(attach);
msg.setContent(multipart);
Transport.send(msg);
}
catch(SendFailedException e)
{
}
catch (final MessagingException e) {
}
catch (final Exception e) {
}
}
};
thread.start();
return true;
}
catch (final Exception e)
{
}
}catch (final Exception e) {
}
return false;
}
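The method above expects the audio as a byte array in its data parameter. As a minimal sketch of that conversion step in portable Java (the class name StreamBytes and the method readFully are hypothetical), an input stream opened on the audio file can be drained into a byte array as follows. On BlackBerry, IOUtilities.streamToBytes, which other snippets on this page use, appears to serve the same purpose.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamBytes {
    /** Drains the stream to its end and returns everything that was read. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```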

J2ME/Blackberry - how to read/write text file?

Please give me sample code for reading and writing a text file in a BlackBerry application.
My code snippet for reading and writing a string to a file:
private String readTextFile(String fName) {
String result = null;
FileConnection fconn = null;
DataInputStream is = null;
try {
fconn = (FileConnection) Connector.open(fName, Connector.READ_WRITE);
is = fconn.openDataInputStream();
byte[] data = IOUtilities.streamToBytes(is);
result = new String(data);
} catch (IOException e) {
System.out.println(e.getMessage());
} finally {
try {
if (null != is)
is.close();
if (null != fconn)
fconn.close();
} catch (IOException e) {
System.out.println(e.getMessage());
}
}
return result;
}
private void writeTextFile(String fName, String text) {
DataOutputStream os = null;
FileConnection fconn = null;
try {
fconn = (FileConnection) Connector.open(fName, Connector.READ_WRITE);
if (!fconn.exists())
fconn.create();
os = fconn.openDataOutputStream();
os.write(text.getBytes());
} catch (IOException e) {
System.out.println(e.getMessage());
} finally {
try {
if (null != os)
os.close();
if (null != fconn)
fconn.close();
} catch (IOException e) {
System.out.println(e.getMessage());
}
}
}
This uses the FileConnection interface.

Resources