OpenNLP: Training a custom NER Model for multiple entities

OpenNLP: Training a custom NER Model for multiple entities - machine-learning

I am trying training a custom NER model for multiple entities. Here is the sample training data:
count all <START:item_type> operating tables <END> on the <START:location_id> third <END> <START:location_type> floor <END>
count all <START:item_type> items <END> on the <START:location_id> third <END> <START:location_type> floor <END>
how many <START:item_type> beds <END> are in <START:location_type> room <END> <START:location_id> 2 <END>
The NameFinderME.train(.) method takes a string parameter type. What is the use of this parameter? And, how can I train a model for multiple entities (e.g. item_type, location_type, location_id in my case)
public static void main(String[] args) {
String trainingDataFile = "/home/OpenNLPTest/lib/training_data.txt";
String outputModelFile = "/tmp/model.bin";
String sentence = "how many beds are in the hospital";
train(trainingDataFile, outputModelFile, "location_type");
predict(sentence, outputModelFile);
}
private static void train(String trainingDataFile, String outputModelFile, String tagToFind) {
File inFile = new File(trainingDataFile);
NameSampleDataStream nss = null;
try {
nss = new NameSampleDataStream(new PlainTextByLineStream(new java.io.FileReader(inFile)));
} catch (Exception e) {}
TokenNameFinderModel model = null;
int iterations = 100;
int cutoff = 5;
try {
// Does the 'type' parameter mean the entity type that I am trying to train the model for?
// What if I need to train for multiple entities?
model = NameFinderME.train("en", tagToFind, nss, (AdaptiveFeatureGenerator) null, Collections.<String,Object>emptyMap(), iterations, cutoff);
} catch(Exception e) {}
try {
File outFile = new File(outputModelFile);
FileOutputStream outFileStream = new FileOutputStream(outFile);
model.serialize(outFileStream);
}
catch (Exception ex) {}
}
private static void predict(String sentence, String modelFile) throws Exception {
FileInputStream modelInToken = new FileInputStream("/tmp/en-token.bin");
TokenizerModel modelToken = new TokenizerModel(modelInToken);
Tokenizer tokenizer = new TokenizerME(modelToken);
String tokens[] = tokenizer.tokenize(sentence);
FileInputStream modelIn = new FileInputStream(modelFile);
TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
NameFinderME nameFinder = new NameFinderME(model);
Span nameSpans[] = nameFinder.find(tokens);
double[] spanProbs = nameFinder.probs(nameSpans);
for( int i = 0; i<nameSpans.length; i++) {
System.out.println(nameSpans[i]);
}
}

The type argument to NameFinderME.train is used as the default type for training data that does not include a type parameter. This is only relevant if you have a sample that looks like this:
<START> operating tables <END>
Instead of like this:
<START:item_type> operating tables <END>
To train multiple types of entities, the developer documentation says
A training file can contain multiple types. If the training file
contains multiple types the created model will also be able to detect
these multiple types. For now its recommended to only train single
type models, since multi type support is still experimental.
So you could try training on the sample from your question, which includes multiple types, and see how well it works. In this mailing list message, someone asks for the status of training for multiple types and gets this answer:
The code path itself is stable, the reason we put it there is that it
didn't have a good performance on the English data.
Anyway, there performance might highly depend on your data set and the
language.
If you don't get good performance with a model that handles multiple types, the alternative would be to create multiple copies of your training data where each copy is modified to include only one type. You would then train a separate model on each set of training data. At that point you should have a (for example) item_type model, a location_type model, and a location_id model. You could then run your input through each model to detect the different types.

Related

BeanItemContainer unique property values

I am using BeanItemContainer for my Grid. I want to get a unique list of one of the properties. For instance, let's say my beans are as follows:
class Fun {
String game;
String rules;
String winner;
}
This would display as 3 columns in my Grid. I want to get a list of all the unique values for the game property. How would I do this? I have the same property id in multiple different bean classes, so it would be nice to get the values directly from the BeanItemContainer. I am trying to avoid building this unique list before loading the data into the Grid, since doing it that way would require me to handle it on a case by case basis.
My ultimate goal is to create a dropdown in a filter based on those unique values.

There isn't any helper for directly doing what you ask for. Instead, you'd have to do it "manually" by iterating through all items and collecting the property values to a Set which would then at the end contain all unique values.
Alternatively, if the data originates from a database, then you could maybe retrieve the unique values from there by using e.g. the DISTINCT keyword in SQL.

In case anyone is curious, this is how I applied Leif's suggestion. When they enter the dropdown, I cycle through all the item ids for the property id of the column I care about, and then fill values based on that property id. Since the same Grid can be loaded with new data, I also have to "clear" this list of item ids.
filterField.addFocusListener(focus->{
if(!(filterField.getItemIds() instanceof Collection) ||
filterField.getItemIds().isEmpty())
{
BeanItemContainer<T> container = getGridContainer();
if( container instanceof BeanItemContainer && getFilterPropertyId() instanceof Object )
{
List<T> itemIds = container.getItemIds();
Set<String> distinctValues = new HashSet<String>();
for(T itemId : itemIds)
{
Property<?> prop = container.getContainerProperty(itemId, getFilterPropertyId());
String value = null;
if( prop.getValue() instanceof String )
{
value = (String) prop.getValue();
}
if(value instanceof String && !value.trim().isEmpty())
distinctValues.add(value);
}
filterField.addItems(distinctValues);
}
}
});
Minor point: the filterField variable is using the ComboBoxMultiselect add-on for Vaadin 7. Hopefully, when I finally have time to convert to Vaadin 14+, I can do something similar there.

How to get random data from firestore? [duplicate]

It is crucial for my application to be able to select multiple documents at random from a collection in firebase.
Since there is no native function built in to Firebase (that I know of) to achieve a query that does just this, my first thought was to use query cursors to select a random start and end index provided that I have the number of documents in the collection.
This approach would work but only in a limited fashion since every document would be served up in sequence with its neighboring documents every time; however, if I was able to select a document by its index in its parent collection I could achieve a random document query but the problem is I can't find any documentation that describes how you can do this or even if you can do this.
Here's what I'd like to be able to do, consider the following firestore schema:
root/
posts/
docA
docB
docC
docD
Then in my client (I'm in a Swift environment) I'd like to write a query that can do this:
db.collection("posts")[0, 1, 3] // would return: docA, docB, docD
Is there anyway I can do something along the lines of this? Or, is there a different way I can select random documents in a similar fashion?
Please help.

Using randomly generated indexes and simple queries, you can randomly select documents from a collection or collection group in Cloud Firestore.
This answer is broken into 4 sections with different options in each section:
How to generate the random indexes
How to query the random indexes
Selecting multiple random documents
Reseeding for ongoing randomness
How to generate the random indexes
The basis of this answer is creating an indexed field that when ordered ascending or descending, results in all the document being randomly ordered. There are different ways to create this, so let's look at 2, starting with the most readily available.
Auto-Id version
If you are using the randomly generated automatic ids provided in our client libraries, you can use this same system to randomly select a document. In this case, the randomly ordered index is the document id.
Later in our query section, the random value you generate is a new auto-id (iOS, Android, Web) and the field you query is the __name__ field, and the 'low value' mentioned later is an empty string. This is by far the easiest method to generate the random index and works regardless of the language and platform.
By default, the document name (__name__) is only indexed ascending, and you also cannot rename an existing document short of deleting and recreating. If you need either of these, you can still use this method and just store an auto-id as an actual field called random rather than overloading the document name for this purpose.
Random Integer version
When you write a document, first generate a random integer in a bounded range and set it as a field called random. Depending on the number of documents you expect, you can use a different bounded range to save space or reduce the risk of collisions (which reduce the effectiveness of this technique).
You should consider which languages you need as there will be different considerations. While Swift is easy, JavaScript notably can have a gotcha:
32-bit integer: Great for small (~10K unlikely to have a collision) datasets
64-bit integer: Large datasets (note: JavaScript doesn't natively support, yet)
This will create an index with your documents randomly sorted. Later in our query section, the random value you generate will be another one of these values, and the 'low value' mentioned later will be -1.
How to query the random indexes
Now that you have a random index, you'll want to query it. Below we look at some simple variants to select a 1 random document, as well as options to select more than 1.
For all these options, you'll want to generate a new random value in the same form as the indexed values you created when writing the document, denoted by the variable random below. We'll use this value to find a random spot on the index.
Wrap-around
Now that you have a random value, you can query for a single document:
let postsRef = db.collection("posts")
queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
.order(by: "random")
.limit(to: 1)
Check that this has returned a document. If it doesn't, query again but use the 'low value' for your random index. For example, if you did Random Integers then lowValue is 0:
let postsRef = db.collection("posts")
queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: lowValue)
.order(by: "random")
.limit(to: 1)
As long as you have a single document, you'll be guaranteed to return at least 1 document.
Bi-directional
The wrap-around method is simple to implement and allows you to optimize storage with only an ascending index enabled. One downside is the possibility of values being unfairly shielded. E.g if the first 3 documents (A,B,C) out of 10K have random index values of A:409496, B:436496, C:818992, then A and C have just less than 1/10K chance of being selected, whereas B is effectively shielded by the proximity of A and only roughly a 1/160K chance.
Rather than querying in a single direction and wrapping around if a value is not found, you can instead randomly select between >= and <=, which reduces the probability of unfairly shielded values by half, at the cost of double the index storage.
If one direction returns no results, switch to the other direction:
queryRef = postsRef.whereField("random", isLessThanOrEqualTo: random)
.order(by: "random", descending: true)
.limit(to: 1)
queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
.order(by: "random")
.limit(to: 1)
Selecting multiple random documents
Often, you'll want to select more than 1 random document at a time. There are 2 different ways to adjust the above techniques depending on what trade offs you want.
Rinse & Repeat
This method is straight forward. Simply repeat the process, including selecting a new random integer each time.
This method will give you random sequences of documents without worrying about seeing the same patterns repeatedly.
The trade-off is it will be slower than the next method since it requires a separate round trip to the service for each document.
Keep it coming
In this approach, simply increase the number in the limit to the desired documents. It's a little more complex as you might return 0..limit documents in the call. You'll then need to get the missing documents in the same manner, but with the limit reduced to only the difference. If you know there are more documents in total than the number you are asking for, you can optimize by ignoring the edge case of never getting back enough documents on the second call (but not the first).
The trade-off with this solution is in repeated sequences. While the documents are randomly ordered, if you ever end up overlapping ranges you'll see the same pattern you saw before. There are ways to mitigate this concern discussed in the next section on reseeding.
This approach is faster than 'Rinse & Repeat' as you'll be requesting all the documents in the best case a single call or worst case 2 calls.
Reseeding for ongoing randomness
While this method gives you documents randomly if the document set is static the probability of each document being returned will be static as well. This is a problem as some values might have unfairly low or high probabilities based on the initial random values they got. In many use cases, this is fine but in some, you may want to increase the long term randomness to have a more uniform chance of returning any 1 document.
Note that inserted documents will end up weaved in-between, gradually changing the probabilities, as will deleting documents. If the insert/delete rate is too small given the number of documents, there are a few strategies addressing this.
Multi-Random
Rather than worrying out reseeding, you can always create multiple random indexes per document, then randomly select one of those indexes each time. For example, have the field random be a map with subfields 1 to 3:
{'random': {'1': 32456, '2':3904515723, '3': 766958445}}
Now you'll be querying against random.1, random.2, random.3 randomly, creating a greater spread of randomness. This essentially trades increased storage to save increased compute (document writes) of having to reseed.
Reseed on writes
Any time you update a document, re-generate the random value(s) of the random field. This will move the document around in the random index.
Reseed on reads
If the random values generated are not uniformly distributed (they're random, so this is expected), then the same document might be picked a dispropriate amount of the time. This is easily counteracted by updating the randomly selected document with new random values after it is read.
Since writes are more expensive and can hotspot, you can elect to only update on read a subset of the time (e.g, if random(0,100) === 0) update;).

Posting this to help anyone that has this problem in the future.
If you are using Auto IDs you can generate a new Auto ID and query for the closest Auto ID as mentioned in Dan McGrath's Answer.
I recently created a random quote api and needed to get random quotes from a firestore collection.
This is how I solved that problem:
var db = admin.firestore();
var quotes = db.collection("quotes");
var key = quotes.doc().id;
quotes.where(admin.firestore.FieldPath.documentId(), '>=', key).limit(1).get()
.then(snapshot => {
if(snapshot.size > 0) {
snapshot.forEach(doc => {
console.log(doc.id, '=>', doc.data());
});
}
else {
var quote = quotes.where(admin.firestore.FieldPath.documentId(), '<', key).limit(1).get()
.then(snapshot => {
snapshot.forEach(doc => {
console.log(doc.id, '=>', doc.data());
});
})
.catch(err => {
console.log('Error getting documents', err);
});
}
})
.catch(err => {
console.log('Error getting documents', err);
});
The key to the query is this:
.where(admin.firestore.FieldPath.documentId(), '>', key)
And calling it again with the operation reversed if no documents are found.
I hope this helps!

Just made this work in Angular 7 + RxJS, so sharing here with people who want an example.
I used #Dan McGrath 's answer, and I chose these options: Random Integer version + Rinse & Repeat for multiple numbers. I also used the stuff explained in this article: RxJS, where is the If-Else Operator? to make if/else statements on stream level (just if any of you need a primer on that).
Also note I used angularfire2 for easy Firebase integration in Angular.
Here is the code:
import { Component, OnInit } from '#angular/core';
import { Observable, merge, pipe } from 'rxjs';
import { map, switchMap, filter, take } from 'rxjs/operators';
import { AngularFirestore, QuerySnapshot } from '#angular/fire/firestore';
#Component({
selector: 'pp-random',
templateUrl: './random.component.html',
styleUrls: ['./random.component.scss']
})
export class RandomComponent implements OnInit {
constructor(
public afs: AngularFirestore,
) { }
ngOnInit() {
}
public buttonClicked(): void {
this.getRandom().pipe(take(1)).subscribe();
}
public getRandom(): Observable<any[]> {
const randomNumber = this.getRandomNumber();
const request$ = this.afs.collection('your-collection', ref => ref.where('random', '>=', randomNumber).orderBy('random').limit(1)).get();
const retryRequest$ = this.afs.collection('your-collection', ref => ref.where('random', '<=', randomNumber).orderBy('random', 'desc').limit(1)).get();
const docMap = pipe(
map((docs: QuerySnapshot<any>) => {
return docs.docs.map(e => {
return {
id: e.id,
...e.data()
} as any;
});
})
);
const random$ = request$.pipe(docMap).pipe(filter(x => x !== undefined && x[0] !== undefined));
const retry$ = request$.pipe(docMap).pipe(
filter(x => x === undefined || x[0] === undefined),
switchMap(() => retryRequest$),
docMap
);
return merge(random$, retry$);
}
public getRandomNumber(): number {
const min = Math.ceil(Number.MIN_VALUE);
const max = Math.ceil(Number.MAX_VALUE);
return Math.floor(Math.random() * (max - min + 1)) + min;
}
}

The other solutions are better but seems hard for me to understand, so I came up with another method
Use incremental number as ID like 1,2,3,4,5,6,7,8,9, watch out for delete documents else we
have an I'd that is missing
Get total number of documents in the collection, something like this, I don't know of a better solution than this
let totalDoc = db.collection("stat").get().then(snap=>snap.size)
Now that we have these, create an empty array to store random list of number, let's say we want 20 random documents.
let randomID = [ ]
while(randomID.length < 20) {
const randNo = Math.floor(Math.random() * totalDoc) + 1;
if(randomID.indexOf(randNo) === -1) randomID.push(randNo);
}
now we have our 20 random documents id
finally we fetch our data from fire store, and save to randomDocs array by mapping through the randomID array
const randomDocs = randomID.map(id => {
db.collection("posts").doc(id).get()
.then(doc => {
if (doc.exists) return doc.data()
})
.catch(error => {
console.log("Error getting document:", error);
});
})
I'm new to firebase, but I think with this answers we can get something better or a built-in query from firebase soon

After intense argument with my friend, we finally found some solution
If you don't need to set document's id to be RandomID, just name documents as size of collection's size.
For example, first document of collection is named '0'.
second document name should be '1'.
Then, we just read the size of collection, for example N, and we can get random number A in range of [0~N).
And then, we can query the document named A.
This way can give same probability of randomness to every documents in collection.

undoubtedly Above accepted Answer is SuperUseful but There is one case like If we had a collection of some Documents(about 100-1000) and we want some 20-30 random Documents Provided that Document must not be repeated. (case In Random Problems App etc...).
Problem with the Above Solution:
For a small number of documents in the Collection(say 50) Probability of repetition is high. To avoid it If I store Fetched Docs Id and Add-in Query like this:
queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: lowValue).where("__name__", isNotEqualTo:"PreviousId")
.order(by: "random")
.limit(to: 1)
here PreviousId is Id of all Elements that were fetched Already means A loop of n previous Ids.
But in this case, network Call would be high.
My Solution:
Maintain one Special Document and Keep a Record of Ids of this Collection only, and fetched this document First Time and Then Do all Randomness Stuff and check for previously not fetched on App site. So in this case network call would be only the same as the number of documents requires (n+1).
Disadvantage of My solution:
Have to maintain A document so Write on Addition and Deletion. But it is good If reads are very often then Writes which occurs in most cases.

You can use listDocuments() property for get only Query list of documents id. Then generate random id using the following way and get DocumentSnapshot with get() property.
var restaurantQueryReference = admin.firestore().collection("Restaurant"); //have +500 docs
var restaurantQueryList = await restaurantQueryReference.listDocuments(); //get all docs id;
for (var i = restaurantQueryList.length - 1; i > 0; i--) {
var j = Math.floor(Math.random() * (i + 1));
var temp = restaurantQueryList[i];
restaurantQueryList[i] = restaurantQueryList[j];
restaurantQueryList[j] = temp;
}
var restaurantId = restaurantQueryList[Math.floor(Math.random()*restaurantQueryList.length)].id; //this is random documentId

Unlike rtdb, firestore ids are not ordered chronologically. So using Auto-Id version described by Dan McGrath is easily implemented if you use the auto-generated id by the firestore client.
new Promise<Timeline | undefined>(async (resolve, reject) => {
try {
let randomTimeline: Timeline | undefined;
let maxCounter = 5;
do {
const randomId = this.afs.createId(); // AngularFirestore
const direction = getRandomIntInclusive(1, 10) <= 5;
// The firestore id is saved with your model as an "id" property.
let list = await this.list(ref => ref
.where('id', direction ? '>=' : '<=', randomId)
.orderBy('id', direction ? 'asc' : 'desc')
.limit(10)
).pipe(take(1)).toPromise();
// app specific filtering
list = list.filter(x => notThisId !== x.id && x.mediaCounter > 5);
if (list.length) {
randomTimeline = list[getRandomIntInclusive(0, list.length - 1)];
}
} while (!randomTimeline && maxCounter-- >= 0);
resolve(randomTimeline);
} catch (err) {
reject(err);
}
})

I have one way to get random a list document in Firebase Firestore, it really easy. When i upload data on Firestore i creat a field name "position" with random value from 1 to 1 milions. When i get data from Fire store i will set Order by field "Position" and update value for it, a lot of user load data and data always update and it's will be random value.

For those using Angular + Firestore, building on #Dan McGrath techniques, here is the code snippet.
Below code snippet returns 1 document.
getDocumentRandomlyParent(): Observable<any> {
return this.getDocumentRandomlyChild()
.pipe(
expand((document: any) => document === null ? this.getDocumentRandomlyChild() : EMPTY),
);
}
getDocumentRandomlyChild(): Observable<any> {
const random = this.afs.createId();
return this.afs
.collection('my_collection', ref =>
ref
.where('random_identifier', '>', random)
.limit(1))
.valueChanges()
.pipe(
map((documentArray: any[]) => {
if (documentArray && documentArray.length) {
return documentArray[0];
} else {
return null;
}
}),
);
}
1) .expand() is a rxjs operation for recursion to ensure we definitely get a document from the random selection.
2) For recursion to work as expected we need to have 2 separate functions.
3) We use EMPTY to terminate .expand() operator.
import { Observable, EMPTY } from 'rxjs';

Ok I will post answer to this question even thou I am doing this for Android. Whenever i create a new document i initiate random number and set it to random field, so my document looks like
"field1" : "value1"
"field2" : "value2"
...
"random" : 13442 //this is the random number i generated upon creating document
When I query for random document I generate random number in same range that I used when creating document.
private val firestore: FirebaseFirestore = FirebaseFirestore.getInstance()
private var usersReference = firestore.collection("users")
val rnds = (0..20001).random()
usersReference.whereGreaterThanOrEqualTo("random",rnds).limit(1).get().addOnSuccessListener {
if (it.size() > 0) {
for (doc in it) {
Log.d("found", doc.toString())
}
} else {
usersReference.whereLessThan("random", rnds).limit(1).get().addOnSuccessListener {
for (doc in it) {
Log.d("found", doc.toString())
}
}
}
}

Based on #ajzbc answer I wrote this for Unity3D and its working for me.
FirebaseFirestore db;
void Start()
{
db = FirebaseFirestore.DefaultInstance;
}
public void GetRandomDocument()
{
Query query1 = db.Collection("Sports").WhereGreaterThanOrEqualTo(FieldPath.DocumentId, db.Collection("Sports").Document().Id).Limit(1);
Query query2 = db.Collection("Sports").WhereLessThan(FieldPath.DocumentId, db.Collection("Sports").Document().Id).Limit(1);
query1.GetSnapshotAsync().ContinueWithOnMainThread((querySnapshotTask1) =>
{
if(querySnapshotTask1.Result.Count > 0)
{
foreach (DocumentSnapshot documentSnapshot in querySnapshotTask1.Result.Documents)
{
Debug.Log("Random ID: "+documentSnapshot.Id);
}
} else
{
query2.GetSnapshotAsync().ContinueWithOnMainThread((querySnapshotTask2) =>
{
foreach (DocumentSnapshot documentSnapshot in querySnapshotTask2.Result.Documents)
{
Debug.Log("Random ID: " + documentSnapshot.Id);
}
});
}
});
}

If you are using autoID this may also work for you...
let collectionRef = admin.firestore().collection('your-collection');
const documentSnapshotArray = await collectionRef.get();
const records = documentSnapshotArray.docs;
const index = documentSnapshotArray.size;
let result = '';
console.log(`TOTAL SIZE=====${index}`);
var randomDocId = Math.floor(Math.random() * index);
const docRef = records[randomDocId].ref;
result = records[randomDocId].data();
console.log('----------- Random Result --------------------');
console.log(result);
console.log('----------- Random Result --------------------');

Easy (2022). You need something like:
export const getAtRandom = async (me) => {
const collection = admin.firestore().collection('...').where(...);
const { count } = (await collection.count().get()).data();
const numberAtRandom = Math.floor(Math.random() * count);
const snap = await accountCollection.limit(1).offset(numberAtRandom).get()
if (accountSnap.empty) return null;
const doc = { id: snap.docs[0].id, ...snap.docs[0].data(), ref: snap.docs[0].ref };
return doc;
}

The next code (Flutter) will return one or up to ten random documents from a Firebase collection.
None of the documents will be repeated
Max 10 documents can be retrieved
If you pass a greater numberOfDocuments than existing documents in the collection, the loop will never end.
Future<Iterable<QueryDocumentSnapshot>> getRandomDocuments(int numberOfDocuments) async {
// Queried documents
final docs = <QueryDocumentSnapshot>[];
// Queried documents id's. We will use later to avoid querying same documents
final currentIds = <String>[];
do {
// Generate random id explained by #Dan McGrath's answer (autoId)
final randomId = FirebaseFirestore.instance.collection('random').doc().id;
var query = FirebaseFirestore.instance
.collection('myCollection') // Change this for you collection name
.where(FieldPath.documentId, isGreaterThanOrEqualTo: randomId)
.limit(1);
if (currentIds.isNotEmpty) {
// If previously we fetched a document we avoid fetching the same
query = query.where(FieldPath.documentId, whereNotIn: currentIds);
}
final querySnap = await query.get();
for (var element in querySnap.docs) {
currentIds.add(element.id);
docs.add(element);
}
} while (docs.length < numberOfDocuments); // <- Run until we have all documents we want
return docs;
}

Passing query parameters in Dapper using OleDb

This query produces an error No value given for one or more required parameters:
using (var conn = new OleDbConnection("Provider=..."))
{
conn.Open();
var result = conn.Query(
"select code, name from mytable where id = ? order by name",
new { id = 1 });
}
If I change the query string to: ... where id = #id ..., I will get an error: Must declare the scalar variable "#id".
How do I construct the query string and how do I pass the parameter?

The following should work:
var result = conn.Query(
"select code, name from mytable where id = ?id? order by name",
new { id = 1 });

Important: see newer answer
In the current build, the answer to that would be "no", for two reasons:
the code attempts to filter unused parameters - and is currently removing all of them because it can't find anything like #id, :id or ?id in the sql
the code for adding values from types uses an arbitrary (well, ok: alphabetical) order for the parameters (because reflection does not make any guarantees about the order of members), making positional anonymous arguments unstable
The good news is that both of these are fixable
we can make the filtering behaviour conditional
we can detect the category of types that has a constructor that matches all the property names, and use the constructor argument positions to determine the synthetic order of the properties - anonymous types fall into this category
Making those changes to my local clone, the following now passes:
// see https://stackoverflow.com/q/18847510/23354
public void TestOleDbParameters()
{
using (var conn = new System.Data.OleDb.OleDbConnection(
Program.OleDbConnectionString))
{
var row = conn.Query("select Id = ?, Age = ?", new DynamicParameters(
new { foo = 12, bar = 23 } // these names DO NOT MATTER!!!
) { RemoveUnused = false } ).Single();
int age = row.Age;
int id = row.Id;
age.IsEqualTo(23);
id.IsEqualTo(12);
}
}
Note that I'm currently using DynamicParameters here to avoid adding even more overloads to Query / Query<T> - because this would need to be added to a considerable number of methods. Adding it to DynamicParameters solves it in one place.
I'm open to feedback before I push this - does that look usable to you?
Edit: with the addition of a funky smellsLikeOleDb (no, not a joke), we can now do this even more directly:
// see https://stackoverflow.com/q/18847510/23354
public void TestOleDbParameters()
{
using (var conn = new System.Data.OleDb.OleDbConnection(
Program.OleDbConnectionString))
{
var row = conn.Query("select Id = ?, Age = ?",
new { foo = 12, bar = 23 } // these names DO NOT MATTER!!!
).Single();
int age = row.Age;
int id = row.Id;
age.IsEqualTo(23);
id.IsEqualTo(12);
}
}

I've trialing use of Dapper within my software product which is using odbc connections (at the moment). However one day I intend to move away from odbc and use a different pattern for supporting different RDBMS products. However, my problem with solution implementation is 2 fold:
I want to write SQL code with parameters that conform to different back-ends, and so I want to be writing named parameters in my SQL now so that I don't have go back and re-do it later.
I don't want to rely on getting the order of my properties in line with my ?. This is bad. So my suggestion is to please add support for Named Parameters for odbc.
In the mean time I have hacked together a solution that allows me to do this with Dapper. Essentially I have a routine that replaces the named parameters with ? and also rebuilds the parameter object making sure the parameters are in the correct order.
However looking at the Dapper code, I can see that I've repeated some of what dapper is doing anyway, effectively it each parameter value is now visited once more than what would be necessary. This becomes more of an issue for bulk updates/inserts.
But at least it seems to work for me o.k...
I borrowed a bit of code from here to form part of my solution...

The ? for parameters was part of the solution for me, but it only works with integers, like ID. It still fails for strings because the parameter length isn't specifed.
OdbcException: ERROR [HY104] [Microsoft][ODBC Microsoft Access Driver]Invalid precision value
System.Data.Odbc. OdbcParameter.Bind(OdbcStatementHandle hstmt,
OdbcCommand command, short ordinal, CNativeBuffer parameterBuffer, bool allowReentrance)
System.Data.Odbc.OdbcParameterCollection.Bind(OdbcCommand command, CMDWrapper cmdWrapper, CNativeBuffer parameterBuffer)
System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, string method, bool needReader, object[] methodArguments, SQL_API odbcApiMethod)
System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, string method, bool needReader)
System.Data.Common.DbCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
Dapper.SqlMapper.QueryAsync(IDbConnection cnn, Type effectiveType, CommandDefinition command) in SqlMapper.Async.cs
WebAPI.DataAccess.CustomerRepository.GetByState(string state) in Repository.cs
var result = await conn.QueryAsync(sQuery, new { State = state });
WebAPI.Controllers.CustomerController.GetByState(string state) in CustomerController .cs
return await _customerRepo.GetByState(state);
For Dapper to pass string parameters to ODBC I had to specify the length.
var result = await conn.QueryAsync<Customer>(sQuery, new { State = new DbString { Value = state, IsFixedLength = true, Length = 4} });

Simulating a Markov Chain with Neo4J

A Markov chain is composed of a set of states which can transition to other states with a certain probability.
A Markov chain can be easily represented in Neo4J by creating a node for each state, a relationship for each transition, and then annotating the transition relationships with the appropriate probability.
BUT, can you simulate the Markov chain using Neo4J? For instance, can Neo4J be coerced to start in a certain state and then make transitions to the next state and the next state based upon probabilities? Can Neo4J return with a printout of the path that it took through this state space?
Perhaps this is easier to understand with a simple example. Let's say I want to make a 2-gram model of English based upon the text of my company's tech blog. I spin up a script which does the following:
It pulls down the text of the blog.
It iterates over every pair of adjacent letters and creates a node in Neo4J.
It iterates again over every 3-tuple of adjacent letters and then creates a Neo4J directed relationship between the node represented by the first two letters and the node represented by the last two letters. It initializes a counter on this relationship to 1. If the relationship already exists, then the counter is incremented.
Finally, it iterates through each node, counts how many total outgoing transitions have occurred, and then creates a new annotation on each relationship of a particular node equal to count/totalcount. This is the transition probability.
Now that the Neo4J graph is complete, how do I make it create a "sentence" from my 2-gram model of English? Here is what the output might look like:
IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

Neo4j doesn't provide the functionality you're asking for out of the box, but since you've already come as far as correctly populating your database, the traversal that you need is just a few lines of code.
I've recreated your experiment here, with a few modifications. First of all, I populate the database with a single pass through the text (steps 2 and 3), but that's a minor. More importantly, I only store the number of occurrences on each relationship and the total number on the node (step 4), as I don't think there is a need to pre-calculate probabilities.
The code that you're asking for then looks something like this:
/**
* A component that creates a random sentence by a random walk on a Markov Chain stored in Neo4j, produced by
* {#link NGramDatabasePopulator}.
*/
public class RandomSentenceCreator {
private final Random random = new Random(System.currentTimeMillis());
/**
* Create a random sentence from the underlying n-gram model. Starts at a random node an follows random outgoing
* relationships of type {#link Constants#REL} with a probability proportional to that transition occurrence in the
* text that was processed to form the model. This happens until the desired length is achieved. In case a node with
* no outgoing relationships it reached, the walk is re-started from a random node.
*
* #param database storing the n-gram model.
* #param length desired number of characters in the random sentence.
* #return random sentence.
*/
public String createRandomSentence(GraphDatabaseService database, int length) {
Node startNode = randomNode(database);
return walk(startNode, length, 0);
}
private String walk(Node startNode, int maxLength, int currentLength) {
if (currentLength >= maxLength) {
return (String) startNode.getProperty(NAME);
}
int totalRelationships = (int) startNode.getProperty(TOTAL, 0);
if (totalRelationships == 0) {
//terminal node, restart from random
return walk(randomNode(startNode.getGraphDatabase()), maxLength, currentLength);
}
int choice = random.nextInt(totalRelationships) + 1;
int total = 0;
Iterator<Relationship> relationshipIterator = startNode.getRelationships(OUTGOING, REL).iterator();
Relationship toFollow = null;
while (total < choice && relationshipIterator.hasNext()) {
toFollow = relationshipIterator.next();
total += (int) toFollow.getProperty(PROBABILITY);
}
Node nextNode;
if (toFollow == null) {
//no relationship to follow => stay on the same node and try again
nextNode = startNode;
} else {
nextNode = toFollow.getEndNode();
}
return ((String) nextNode.getProperty(NAME)).substring(0, 1) + walk(nextNode, maxLength, currentLength + 1);
}
private Node randomNode(GraphDatabaseService database) {
return random(GlobalGraphOperations.at(database).getAllNodes());
}
}

Search subset of objects using Compass/Lucene

I'm using the searchable plugin for Grails (which provides an API for Compass, which is itself an API over Lucene). I have an Order class that I would like to search but, I don't want to search all the instances of Order, just a subset of them. Something like this:
// This is a Hibernate/GORM call
List<Order> searchableOrders = Customer.findAllByName("Bob").orders
// Now search only these orders with the searchable plugin - something like
searchableOrders.search("name: foo")
In reality the relational query to get the searchableOrders is more complex than this, so I can't do the entire query (Hibernate + compass) in compass alone. Is there a way to search only a subject of instances of a particular class using Compass/Lucene.

One way to do this is with a custom Filter. For example, if you wanted to filter based on ids for your domain class, you would add the id to the searchable configuration for the domain class:
static searchable = {
id name: "id"
}
Then you would write your custom filter (which can go in [project]/src/java):
import org.apache.lucene.search.Filter;
import java.util.BitSet;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexReader;
import java.io.IOException;
import java.util.List;
public class IdFilter extends Filter {
private List<String> ids;
public IdFilter(List<String> ids) {
this.ids = ids;
}
public BitSet bits(IndexReader reader) throws IOException {
BitSet bits = new BitSet(reader.maxDoc());
int[] docs = new int[1];
int[] freqs = new int[1];
for( String id : ids ) {
if (id != null) {
TermDocs termDocs = reader.termDocs(new Term("id", id ) );
int count = termDocs.read(docs, freqs);
if (count == 1) {
bits.set(docs[0]);
}
}
}
return bits;
}
}
Then you would put the filter as an argument to your search (making sure to import the Filter class if its in a different package):
def theSearchResult = MyDomainClass.search(
{
must( queryString(params.q) )
},
params,
filter: new IdFilter( [ "1" ] ))
Here I'm just creating a hard-coded list with a single value of "1" in it, but you could retrieve a list of ids from the database, from a previous search, or wherever.
You could easily abstract the filter I have to take the term name in the constructor, then pass in "name" like you want.

Two ways of doing this:
The easiest from the implementation standpoing is do two searches (one findAll and search) on all objects and then find intersection between them. If you cache the result of findAll call, then you are really down to one query you have to make.
A more "clean" way to do this is to make sure to index the IDs of the domain objects with Searchable, and when you get the findAll result, pass in those IDs into the search query, thus limiting it.
I don't remember the Lucene syntax off the top of my head, but you'd have to do something like
searchableOrders.search("name: foo AND (ID:4 or ID:5 or ID:8 ...)" )
You may run into query size limits in Lucene, but I think there are settings that allows you to control query length.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart