Lucene.Net FastVectorHighlighter with fine-grained Chinese segmentation tool doesn't work - highlight

The error is:
System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
Parameter name: length
at System.String.Substring(Int32 startIndex, Int32 length)
at Lucene.Net.Search.VectorHighlight.BaseFragmentsBuilder.MakeFragment(StringBuilder buffer, Int32[] index, Field[] values, WeightedFragInfo fragInfo, String[] preTags, String[] postTags, IEncoder encoder) in C:\BuildAgent\work\b1b63ca15b99dddb\src\Lucene.Net.Highlighter\VectorHighlight\BaseFragmentsBuilder.cs:line 195
at Lucene.Net.Search.VectorHighlight.BaseFragmentsBuilder.CreateFragments(IndexReader reader, Int32 docId, String fieldName, FieldFragList fieldFragList, Int32 maxNumFragments, String[] preTags, String[] postTags, IEncoder encoder) in C:\BuildAgent\work\b1b63ca15b99dddb\src\Lucene.Net.Highlighter\VectorHighlight\BaseFragmentsBuilder.cs:line 146
at Lucene.Net.Search.VectorHighlight.BaseFragmentsBuilder.CreateFragments(IndexReader reader, Int32 docId, String fieldName, FieldFragList fieldFragList, Int32 maxNumFragments) in C:\BuildAgent\work\b1b63ca15b99dddb\src\Lucene.Net.Highlighter\VectorHighlight\BaseFragmentsBuilder.cs:line 99
It is caused by this code in the Lucene.Net source:
protected virtual string MakeFragment(StringBuilder buffer, int[] index, Field[] values, WeightedFragInfo fragInfo,
string[] preTags, string[] postTags, IEncoder encoder)
{
StringBuilder fragment = new StringBuilder();
int s = fragInfo.StartOffset;
int[] modifiedStartOffset = { s };
string src = GetFragmentSourceMSO(buffer, index, values, s, fragInfo.EndOffset, modifiedStartOffset);
int srcIndex = 0;
foreach (SubInfo subInfo in fragInfo.SubInfos)
{
foreach (Toffs to in subInfo.TermsOffsets)
{
fragment
.Append(encoder.EncodeText(src.Substring(srcIndex, (to.StartOffset - modifiedStartOffset[0]) - srcIndex)))
.Append(GetPreTag(preTags, subInfo.Seqnum))
.Append(encoder.EncodeText(src.Substring(to.StartOffset - modifiedStartOffset[0], (to.EndOffset - modifiedStartOffset[0]) - (to.StartOffset - modifiedStartOffset[0]))))
.Append(GetPostTag(postTags, subInfo.Seqnum));
srcIndex = to.EndOffset - modifiedStartOffset[0];
}
}
fragment.Append(encoder.EncodeText(src.Substring(srcIndex)));
return fragment.ToString();
}
With fine-grained segmentation, highlighting with this code goes wrong. The function seems to require the segmented terms to be contiguous, while fine-grained segmentation produces terms that are not (their offsets can overlap). How can I make FastVectorHighlighter highlight text that was indexed with fine-grained segmentation?
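To illustrate the failure mode: if a term starts before the previous term ended, the length passed to Substring in MakeFragment becomes negative. The sketch below is in Java rather than C#, is not part of Lucene.Net, and uses hypothetical names (OffsetMerger, merge, highlight). It assumes term offsets are [start, end) pairs relative to the fragment start, and merges overlapping ranges before inserting tags, which is the kind of logic an overridden MakeFragment might apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class OffsetMerger {

    // Merge overlapping or touching [start, end) ranges into sorted, disjoint ranges.
    static List<int[]> merge(List<int[]> offsets) {
        List<int[]> sorted = new ArrayList<>(offsets);
        sorted.sort(Comparator.comparingInt((int[] o) -> o[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] o : sorted) {
            if (!merged.isEmpty() && o[0] <= merged.get(merged.size() - 1)[1]) {
                // Overlaps (or touches) the previous range: extend it.
                int[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], o[1]);
            } else {
                merged.add(new int[] { o[0], o[1] });
            }
        }
        return merged;
    }

    // Wrap every merged range of src in pre/post tags.
    static String highlight(String src, List<int[]> offsets, String pre, String post) {
        StringBuilder sb = new StringBuilder();
        int srcIndex = 0;
        for (int[] o : merge(offsets)) {
            sb.append(src, srcIndex, o[0]);                      // text before the match
            sb.append(pre).append(src, o[0], o[1]).append(post); // the match itself
            srcIndex = o[1];
        }
        sb.append(src.substring(srcIndex));                      // trailing text
        return sb.toString();
    }

    public static void main(String[] args) {
        // [0,2) and [1,3) overlap, as overlapping tokens from a fine-grained segmenter would.
        List<int[]> offsets = Arrays.asList(
                new int[] { 0, 2 }, new int[] { 1, 3 }, new int[] { 5, 6 });
        System.out.println(highlight("abcdefg", offsets, "<b>", "</b>"));
        // prints <b>abc</b>de<b>f</b>g
    }
}
```

Without the merge step, the second range would ask for a substring of length 1 - 2 = -1, which matches the exception in the stack trace.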

Related

How to walk the parse tree to check for syntax errors in ANTLR

I have written a fairly simple language in ANTLR. Before actually interpreting the code written by a user, I want to parse the code and check for syntax errors. If any are found, I want to output the cause of the error and exit. How can I check the code for syntax errors and output the corresponding error? Please note that, for my purposes, error messages similar to those generated by the ANTLR tool are more than sufficient. For example:
line 3:0 missing ';'
There is an ErrorListener that you can use to get more information.
For example:
...
FormulaParser parser = new FormulaParser(tokens);
parser.IsCompletion = options.IsForCompletion;
ErrorListener errListener = new ErrorListener();
parser.AddErrorListener(errListener);
IParseTree tree = parser.formula();
The only thing you need to do is attach the ErrorListener to the parser.
Here is the code of the ErrorListener:
/// <summary>
/// Error listener recording all errors that Antlr parser raises during parsing.
/// </summary>
internal class ErrorListener : BaseErrorListener
{
private const string Eof = "the end of formula";
public ErrorListener()
{
ErrorMessages = new List<ErrorInfo>();
}
public bool ErrorOccured { get; private set; }
public List<ErrorInfo> ErrorMessages { get; private set; }
public override void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
{
ErrorOccured = true;
if (e == null || e.GetType() != typeof(NoViableAltException))
{
ErrorMessages.Add(new ErrorInfo()
{
Message = ConvertMessage(msg),
StartIndex = offendingSymbol.StartIndex,
Column = offendingSymbol.Column + 1,
Line = offendingSymbol.Line,
Length = offendingSymbol.Text.Length
});
return;
}
ErrorMessages.Add(new ErrorInfo()
{
Message = string.Format("{0}{1}", ConvertToken(offendingSymbol.Text), " unexpected"),
StartIndex = offendingSymbol.StartIndex,
Column = offendingSymbol.Column + 1,
Line = offendingSymbol.Line,
Length = offendingSymbol.Text.Length
});
}
public override void ReportAmbiguity(Antlr4.Runtime.Parser recognizer, DFA dfa, int startIndex, int stopIndex, bool exact, BitSet ambigAlts, ATNConfigSet configs)
{
ErrorOccured = true;
ErrorMessages.Add(new ErrorInfo()
{
Message = "Ambiguity", Column = startIndex, StartIndex = startIndex
});
base.ReportAmbiguity(recognizer, dfa, startIndex, stopIndex, exact, ambigAlts, configs);
}
private string ConvertToken(string token)
{
return string.Equals(token, "<EOF>", StringComparison.InvariantCultureIgnoreCase)
? Eof
: token;
}
private string ConvertMessage(string message)
{
StringBuilder builder = new StringBuilder(message);
builder.Replace("<EOF>", Eof);
return builder.ToString();
}
}
It is just a dummy listener, but you can see what it does, and how you can tell whether an error is a syntax error or an ambiguity. After parsing, you can ask the errorListener directly whether any error occurred.

How to read back a native function's string result with JNA

public interface Kernel32 extends StdCallLibrary {
int GetComputerNameW(Memory lpBuffer, IntByReference lpnSize);
}
public class Kernel32Test {
private static final String THIS_PC_NAME = "tiangao-160";
private static Kernel32 kernel32;
@BeforeClass
public static void setUp() {
System.setProperty("jna.encoding", "GBK");
kernel32 = (Kernel32) Native.loadLibrary("kernel32", Kernel32.class);
}
@AfterClass
public static void tearDown() {
System.setProperty("jna.encoding", null);
}
@Test
public void testGetComputerNameW() {
final Memory lpBuffer = new Memory(1024);
final IntByReference lpnSize = new IntByReference();
final int result = kernel32.GetComputerNameW(lpBuffer, lpnSize);
if (result != 0) {
throw new IllegalStateException(
"calling 'GetComputerNameW(lpBuffer, lpnSize)'failed,errorcode:" + result);
}
final int bufferSize = lpnSize.getValue();
System.out.println("value of 'lpnSize':" + bufferSize);
Assert.assertEquals(THIS_PC_NAME.getBytes().length + 1, bufferSize);
final String name = lpBuffer.getString(0);
System.out.println("value of 'lpBuffer':" + name);
Assert.assertEquals(THIS_PC_NAME, name);
}
}
The official documentation says to use byte[], char[], Memory, or an NIO Buffer to map a char pointer in a native C function. I tried all of the above, as well as String, WString, StringArray, classes extending PointerType, and so on, and none of them worked.
The out parameter 'lpnSize' returns the correct buffer size, but 'lpBuffer' returns 'x>' (I think it's random memory) or nothing, no matter which Java type I map it to. If I write something to the 'lpBuffer' memory first, I read the same thing back after calling the native function.
How can I solve this problem?
You need to use Pointer.getString(0, true) to extract the Unicode string returned by GetComputerNameW.
EDIT
You'll also need to call GetComputerNameW again with the length parameter initialized before the function will fill in the result. Either pass back the same IntByReference to a second call, or initialize the IntByReference to the size of your Memory buffer to have the buffer written to in the first call.
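The wide flag matters because the W variant of the function writes UTF-16 code units, not single-byte characters. Here is a minimal pure-Java sketch of the difference, with no JNA involved. The class and helper names (WideStringDemo, decodeWide, decodeNarrow) are hypothetical, and the byte array stands in for the native buffer after a successful call.

```java
import java.nio.charset.StandardCharsets;

class WideStringDemo {

    // Decode a NUL-terminated UTF-16LE string (what a "wide" read does conceptually).
    static String decodeWide(byte[] buf) {
        int len = 0;
        while (len + 1 < buf.length && !(buf[len] == 0 && buf[len + 1] == 0)) {
            len += 2; // advance one 16-bit code unit at a time
        }
        return new String(buf, 0, len, StandardCharsets.UTF_16LE);
    }

    // Decode the same bytes as a NUL-terminated single-byte string.
    static String decodeNarrow(byte[] buf) {
        int len = 0;
        while (len < buf.length && buf[len] != 0) {
            len++;
        }
        return new String(buf, 0, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Simulate the native buffer: zero-filled, then the name written as UTF-16LE.
        byte[] raw = new byte[64];
        byte[] name = "tiangao-160".getBytes(StandardCharsets.UTF_16LE);
        System.arraycopy(name, 0, raw, 0, name.length);

        System.out.println(decodeWide(raw));   // prints tiangao-160
        System.out.println(decodeNarrow(raw)); // prints t (stops at the first zero byte)
    }
}
```

For ASCII names, every second byte of the wide string is zero, so a single-byte read stops after the first character even when the buffer has been filled correctly.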

Using scanner to read phrases

Hey StackOverflow Community,
I have lines of information from a txt file that I need to parse.
Here are two example lines:
-> date & time AC Power Insolation Temperature Wind Speed
-> mm/dd/yyyy hh:mm.ss kw W/m^2 deg F mph
Using scanner.nextLine() gives me a String with a whole line in it, which I then pass to StringTokenizer, which separates it into individual Strings using whitespace as a separator.
So the first line would break up into:
date
&
time
AC
Power
Insolation
etc...
I need things like "date & time" kept together, and "AC Power" kept together. Is there any way I can specify this using a method already defined in StringTokenizer or Scanner? Or would I have to develop my own algorithm to do this?
Would you suggest I use some other way of parsing lines instead of Scanner? Or is Scanner sufficient for my needs?
ejay
Oh, this one was tricky. Maybe you could build a Trie structure from your tokens. I was bored and wrote a little class which solves your problem. Warning: it's a bit hacky, but it was fun to implement.
The Trie class:
class Trie extends HashMap<String, Trie> {
private static final long serialVersionUID = 1L;
boolean end = false;
public void addToken(String strings) {
addToken(strings.split("\\s+"), 0);
}
private void addToken(String[] strings, int begin) {
if (begin == strings.length) {
end = true;
return;
}
String key = strings[begin];
Trie t = get(key);
if (t == null) {
t = new Trie();
put(key, t);
}
t.addToken(strings, begin + 1);
}
public List<String> tokenize(String data) {
String[] split = data.split("\\s+");
List<String> tokens = new ArrayList<String>();
int pos = 0;
while (pos < split.length) {
int tokenLength = getToken(split, pos, 0);
tokens.add(glue(split, pos, tokenLength));
pos += tokenLength;
}
return tokens;
}
public String glue(String[] parts, int pos, int length) {
StringBuilder sb = new StringBuilder();
sb.append(parts[pos]);
for (int i = pos + 1; i < pos + length; i++) {
sb.append(" ");
sb.append(parts[i]);
}
return sb.toString();
}
private int getToken(String[] tokens, int begin, int length) {
if (end) {
return length;
}
if (begin == tokens.length) {
return 1;
}
String key = tokens[begin];
Trie t = get(key);
if (t != null) {
return t.getToken(tokens, begin + 1, length + 1);
}
return 1;
}
}
And how to use it:
Trie t = new Trie();
t.addToken("AC Power");
t.addToken("date & time");
t.addToken("date & foo");
t.addToken("Speed & fun");
String data = "date & time AC Power Insolation Temperature Wind Speed";
List<String> tokens = t.tokenize(data);
for (String s : tokens) {
System.out.println(s);
}
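Note that the Trie above stops at the first registered phrase it completes, so with both "AC" and "AC Power" registered it would return "AC". A simpler alternative sketch tries the longest candidate first, assuming the set of multi-word phrases is small and known in advance. PhraseTokenizer, addPhrase, and tokenize are hypothetical names, not features of Scanner or StringTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PhraseTokenizer {
    private final Set<String> phrases = new HashSet<>();
    private int maxWords = 1; // word count of the longest registered phrase

    // Register a multi-word phrase that should be kept together.
    void addPhrase(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        phrases.add(String.join(" ", words));
        maxWords = Math.max(maxWords, words.length);
    }

    // Split on whitespace, gluing together the longest registered phrase at each position.
    List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        if (line.trim().isEmpty()) {
            return tokens;
        }
        String[] words = line.trim().split("\\s+");
        int pos = 0;
        while (pos < words.length) {
            int taken = 1; // fall back to a single word
            for (int n = Math.min(maxWords, words.length - pos); n > 1; n--) {
                String candidate = String.join(" ", Arrays.copyOfRange(words, pos, pos + n));
                if (phrases.contains(candidate)) {
                    taken = n;
                    break;
                }
            }
            tokens.add(String.join(" ", Arrays.copyOfRange(words, pos, pos + taken)));
            pos += taken;
        }
        return tokens;
    }

    public static void main(String[] args) {
        PhraseTokenizer t = new PhraseTokenizer();
        t.addPhrase("date & time");
        t.addPhrase("AC Power");
        t.addPhrase("Wind Speed");
        System.out.println(t.tokenize("date & time AC Power Insolation Temperature Wind Speed"));
        // prints [date & time, AC Power, Insolation, Temperature, Wind Speed]
    }
}
```

The trade-off is that each position costs up to maxWords set lookups, which is fine for short header lines like these.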

RSACryptoServiceProvider error

I encountered this error when I deployed the application to the production server, but on my local machine it works fine. The error comes from this line in the SignAndSecureData function: rsa.FromXmlString(xmlKey). Has anyone encountered the error below? I have also included the code snippets below the error message.
Error in: /sample.html.
Error Message:The profile for the user is a temporary profile.
Source: mscorlib
Method: System.Security.Cryptography.SafeProvHandle CreateProvHandle(System.Security.Cryptography.CspParameters, Boolean)
Stack Trace: at System.Security.Cryptography.Utils.CreateProvHandle(CspParameters parameters, Boolean randomKeyContainer)
at System.Security.Cryptography.RSACryptoServiceProvider.ImportParameters(RSAParameters parameters)
at System.Security.Cryptography.RSA.FromXmlString(String xmlString)
at website1.CryptoHelper.SignAndSecureData(String xmlKey, String[] values) in C:\website1\CryptoHelper.cs:line 52
at website1.CryptoHelper.SignAndSecureData(String[] values) in C:\website1\CryptoHelper.cs:line 40
-----------------------------Code-------------------------------
public static string SignAndSecureData(string[] values)
{
string xmlKey = "<RSAKeyValue><Modulus>p9HPjw9PMOCbYlu7YiE5chOOLgLfPR4L9jmcAyjrRsAekw0Z/xhs9G3Nl2P5G+/kMangrwg0egh2ium+3j5NuB0UGFEs8jKk/deSwwbxsxp+0p1JoY6jkHaQ1ItmrDVU5TZGjh7jNjBn5TpsrcFdxkslJp1x9ki248E7z7q1uhs=</Modulus><Exponent>AQAB</Exponent><P>27HXXHera3Voek0qg5pJf8wsl0Tq4xGl+tl1/f0rt1g6hyx4egS4/finWlptUnTnXu81oboYq7mI/kjzFiOPbQ==</P><Q>w41mCFTmdmINIo85D/8umTdwDsC+FOVlyYTVlw/xHBc/HxQQVOQOCVOJA9kZsVSUBr6fXY3yfSe/jxQXyzOSpw==</Q><DP>QCo38TzOZys6YYYKJbe5QccbOu8Y/0rXRGWhDZaU3w64wWQep9ybPyoRjtUcWtnj/Zk1+89Dh1xAA6zAurWWHQ==</DP><DQ>dsWiDDtswshpC+2LjgDCz8KRKBS/Hrf567zncdn36sTfzMOF69mcAOQg2xp4dXFWewY6izsU5hlHSuK8VOodDw==</DQ><InverseQ>WAmgU5XPgZNVXDMqYePpVZzQoiOblX4UlM21xTt/ZmvC7+af0c00LqOW4nbkwDqKCuRcD8X5Yr3H7IraaANjyg==</InverseQ><D>QbMRGAe9T/xOuLYC6Qrqy28+dWLodKvjsPSi0FXfriYekiFJ8SVl2ld2anNYHgjPhGXmMX/7016m0gFqmOU5VV1zzHVH0c0wecnKhhnJC+irjNgNwy9xwM1mnVoce9auk2qiAMhr2cL1NtwUf8cuXBfzm39ZF9Sxsn4fE1+p+ck=</D></RSAKeyValue>";
return SignAndSecureData(xmlKey, values);
}
public static string SignAndSecureData(string xmlKey, string[] values)
{
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml("<x></x>");
for (int i = 0; i < values.Length; i++)
_AddNode(xmlDoc, "v" + i.ToString(), values[i]);
RSACryptoServiceProvider rsa = new RSACryptoServiceProvider();
rsa.FromXmlString(xmlKey);
byte[] signature = rsa.SignData(Encoding.ASCII.GetBytes(xmlDoc.InnerXml),
"SHA1");
_AddNode(xmlDoc, "s", Convert.ToBase64String(signature, 0, signature.Length));
return EncryptCookie(xmlDoc.InnerXml);
}

BlackBerry 6: ListFieldCallback.indexOfList() - how to filter while typing?

I'm trying to display a TextField and a ListField below it:
And I would like to filter (aka "live search") the number of displayed rows, while the user is typing a word into the TextField.
I've tried calling ListField.setSearchable(true) but it doesn't change anything, even if I type words while having the ListField focussed.
And by the way I wonder which TextField to take. I've used AutoCompleteField because it looks exactly as I want the field to be (white field with rounded corners), but it is probably not the best choice (because I don't need AutoCompleteField's drop down list while typing).
Here is my current code -
MyScreen.java:
private ListField presetListField = new ListField();
private MyList presetList = new MyList(presetListField);
private MyScreen() {
int size;
getMainManager().setBackground(_bgOff);
setTitle("Favorites");
BasicFilteredList filterList = new BasicFilteredList();
String[] days = {"Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday"};
int uniqueID = 0;
filterList.addDataSet(uniqueID, days, "days",
BasicFilteredList.COMPARISON_IGNORE_CASE);
// XXX probably a bad choice here?
AutoCompleteField autoCompleteField =
new AutoCompleteField(filterList);
add(autoCompleteField);
presetListField.setEmptyString("* No Favorites *", DrawStyle.HCENTER);
add(presetListField);
presetList.insert("Monday");
presetList.insert("Tuesday");
presetList.insert("Wednesday");
for (int i = 0; i < 16; i++) {
presetList.insert("Favorite #" + (1 + i));
}
}
MyList.java:
public class MyList implements ListFieldCallback {
private Vector _preset = new Vector();
private ListField _list;
public MyList(ListField list) {
_list = list;
_list.setCallback(this);
_list.setRowHeight(-2);
// XXX does not seem to have any effect
_list.setSearchable(true);
}
public void insert(String str) {
insert(str, _preset.size());
}
public void insert(String str, int index) {
_preset.insertElementAt(str, index);
_list.insert(index);
}
public void delete(int index) {
_preset.removeElementAt(index);
_list.delete(index);
}
public void drawListRow(ListField listField,
Graphics g, int index, int y, int width) {
Font f = g.getFont();
Font b = f.derive(Font.BOLD, f.getHeight() * 2);
Font i = f.derive(Font.ITALIC, f.getHeight());
g.setColor(Color.WHITE);
g.drawText((String)_preset.elementAt(index), Display.getWidth()/3, y);
g.setFont(i);
g.setColor(Color.GRAY);
g.drawText("Click to get frequency",
Display.getWidth()/3, y + g.getFont().getHeight());
g.setFont(b);
g.setColor(Color.YELLOW);
g.drawText(String.valueOf(100f + index/10f), 0, y);
}
public Object get(ListField list, int index) {
return _preset.elementAt(index);
}
public int indexOfList(ListField list, String prefix, int start) {
return _preset.indexOf(prefix, start);
}
public int getPreferredWidth(ListField list) {
return Display.getWidth();
}
}
Thank you!
Alex
Have you checked the net.rim.device.api.ui.component.KeywordFilterField ?
