How to detect changes in a time series data using Spark streaming

How to detect changes in a time series data using Spark streaming - time-series

I have a continuous stream of data in Kafka. I want to count the number of times a column value in the stream of data has changed.
Which algorithm I should be using for this?

Is this what you are looking for?
Compare previous value with the current and filter out scenarios where current equals previous. After this do a count.
case class TimeSeriesEntry(
key: String,
timestamp: Instant,
value: Long
)
val timeSeriesData: Dataset[TimeSeriesEntry] = null
timeSeriesData
.groupByKey(_.key)
.mapGroups { (k, timeSeriesEntries: Iterator[TimeSeriesEntry]) =>
val last = timeSeriesEntries.next()
if (!timeSeriesEntries.hasNext) {
(k, true)
} else {
val secondLast = timeSeriesEntries.next()
(k, last != secondLast)
}
}.filter {
_._2
}.groupByKey(_._1)
.count()

Related

Cosmos DB stored procedure: I can query the DB, but when I try to upsert I get a 'not same partition' error

I understand that stored procedures run in the scope of a single partition key.
It is also possible to do operations that change data, not just read it.
ID must be string, so I must roll my own autoincrementer for a separate property to use in documents.
I am trying to make a simple autoincrement number generator that runs in a single stored procedure.
I am partitioning data mimicking a file tree, using forward slashes to separate+concatenate significant bits that make my partition names. Like so:
/sometype/foo/bar/
/sometype/ids/
The first item is always the document type, and every document type will have a 'ids' sub-partition.
Instead of holding documents, the /sometype/ids/ partition will hold and reserve all numerical ids that have been created for this document type, for autoincrement purposes.
this satisfies uniqueness within a partition, stored procedure execution scope, and unique document count within a document type, which is good for my purposes.
I got stumped in a stored procedure where I want to get a specified id, or create it if it does not exist.
I can query my partition with the stored procedure, but the upsert throws an error, using the same partition key.
I designed my database with "pkey" as the name of the property that will holds my partition keys.
Here is the code:
//this stored procedure is always called from a partition of type /<sometype>/ids/ , where <sometype> os one of my document types.
//the /sometype/ids/ is a partition to reserve unique numerical ids, as Cosmos DB does not have a numerical increment out of the box, I am creating a facility for that.
//the actual documents of /sometype/ will be subpartitioned as well for performance.
function getId(opkey, n, id) {
// gets the requested number if available, or next one.
//opkey: string - a partition key of cosmos db of the object that is going to consume the generated ID, if known. must start with /<sometype>/ which is the same that is being used to call this SP
//n: integer - a numerical number for the autoincrement
//id = '' : string - a uuid of the document that is using this id, if known
if (opkey === undefined) throw new Error('opkey cannot be null. must be a string. must be a valid partition key on Cosmos DB.');
n = (n === undefined || n === null)?0:n;
id = (id === undefined || id === null)?'':id;
var collection = getContext().getCollection();
//make opkey parameter into an array
var split_pkey = opkey.split('/');
//recreate the pkey /<sometype>/ids/ because I can't find a reference to this string inside the context.
var idpkey = '/'+split_pkey[1]+'/ids/';
//first query as SQL
//get highest numerical value.
var q = 'SELECT TOP 1 * FROM c \
WHERE c.pkey = \''+idpkey+'\' ORDER BY c.n desc';
//helper function to create uuids. can I ditch it?
function CreateUUID() {
return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
return v.toString(16);
});
}
// Query documents and take 1st item.
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
q
,
function (firstError, feed, options) {
if (firstError) throw "firstError:"+firstError;
//console.log(collection.options.);
console.log(idpkey+', '+n+', '+id+"-");
var maxn = 0;
// take 1st element from feed
if (!feed || !feed.length) {
//var response = getContext().getResponse();
//response.setBody(null);
}
else {
maxn = feed[0].n;
//var response = getContext().getResponse();
//var body = { original: '', document: '', feed: feed[0] };
//response.setBody(JSON.stringify(body));
}
console.log(maxn);
//query for existing numerical value
q = 'SELECT TOP 1 * FROM c \
WHERE c.pkey = \''+idpkey+'\' \
AND \
c.number = '+n+' \
OR \
c.id = \''+id+'\'';
var isAccepted2 = collection.queryDocuments(
collection.getSelfLink(),
q
,
function (secondFetchError, feed2, options2) {
if (secondFetchError) throw "second error:"+secondFetchError;
//if no numerical value found, create a new (autoincrement)
if (!feed || !feed.length) {
console.log("|"+idpkey);
var uuid = CreateUUID();
var newid = {
id:uuid,
pkey:idpkey,
doc_pkey:opkey,
n:maxn+1
};
//here I used the javascript query api
//it throws an error claiming the primary key is different and I don't know why, I am using idpkey all the time
var isAccepted3 = collection.upsertDocument(
collection.getSelfLink(),
newid
,
function (upsertError,feed3,options3){
if (upsertError) throw "upsert error:"+upsertError;
//if (upsertError) console.log("upsert error:|"+idpkey+"|");
//var response = getContext().getResponse();
//response.setBody(feed[0]);
});
if (!isAccepted3) throw new Error('The third query was not accepted by the server.');
console.log(" - "+uuid);
}
else {
//if id found, return it
//maxn = feed[0].n;
var response = getContext().getResponse();
response.setBody(feed[0]);
//var body = { original: '', document: '', feed: feed[0] };
//response.setBody(JSON.stringify(body));
}
});
if (!isAccepted2) throw new Error('The second query was not accepted by the server.');
});
if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
The error message is :
"Requests originating from scripts cannot reference partition keys other than the one for which client request was submitted."
I don't understand why it thinks it is in error, as I am using the variable idpkey in all queries to hold the correct pkey.

Talk about brain fart!
I was violating my own rules because I was misspelling the partition name in the request, making the first part of the partition key /sometype/ different from the parameter sent, causing a mismatch between the execution scope's partition key and the idpkey variable, resulting in the error.

How do I query all documents in a Firestore collection for all strings in an array? [duplicate]

From the docs:
You can also chain multiple where() methods to create more specific queries (logical AND).
How can I perform an OR query?
Example:
Give me all documents where the field status is open OR upcoming
Give me all documents where the field status == open OR createdAt <= <somedatetime>

OR isn't supported as it's hard for the server to scale it (requires keeping state to dedup). The work around is to issue 2 queries, one for each condition, and dedup on the client.
Edit (Nov 2019):
Cloud Firestore now supports IN queries which are a limited type of OR query.
For the example above you could do:
// Get all documents in 'foo' where status is open or upcmoming
db.collection('foo').where('status','in',['open','upcoming']).get()
However it's still not possible to do a general OR condition involving multiple fields.

With the recent addition of IN queries, Firestore supports "up to 10 equality clauses on the same field with a logical OR"
A possible solution to (1) would be:
documents.where('status', 'in', ['open', 'upcoming']);
See Firebase Guides: Query Operators | in and array-contains-any

suggest to give value for status as well.
ex.
{ name: "a", statusValue = 10, status = 'open' }
{ name: "b", statusValue = 20, status = 'upcoming'}
{ name: "c", statusValue = 30, status = 'close'}
you can query by ref.where('statusValue', '<=', 20) then both 'a' and 'b' will found.
this can save your query cost and performance.
btw, it is not fix all case.

I would have no "status" field, but status related fields, updating them to true or false based on request, like
{ name: "a", status_open: true, status_upcoming: false, status_closed: false}
However, check Firebase Cloud Functions. You could have a function listening status changes, updating status related properties like
{ name: "a", status: "open", status_open: true, status_upcoming: false, status_closed: false}
one or the other, your query could be just
...where('status_open','==',true)...
Hope it helps.

This doesn't solve all cases, but for "enum" fields, you can emulate an "OR" query by making a separate boolean field for each enum-value, then adding a where("enum_<value>", "==", false) for every value that isn't part of the "OR" clause you want.
For example, consider your first desired query:
Give me all documents where the field status is open OR upcoming
You can accomplish this by splitting the status: string field into multiple boolean fields, one for each enum-value:
status_open: bool
status_upcoming: bool
status_suspended: bool
status_closed: bool
To perform your "where status is open or upcoming" query, you then do this:
where("status_suspended", "==", false).where("status_closed", "==", false)
How does this work? Well, because it's an enum, you know one of the values must have true assigned. So if you can determine that all of the other values don't match for a given entry, then by deduction it must match one of the values you originally were looking for.
See also
in/not-in/array-contains-in: https://firebase.google.com/docs/firestore/query-data/queries#in_and_array-contains-any
!=: https://firebase.googleblog.com/2020/09/cloud-firestore-not-equal-queries.html

I don't like everyone saying it's not possible.
it is if you create another "hacky" field in the model to build a composite...
for instance, create an array for each document that has all logical or elements
then query for .where("field", arrayContains: [...]

you can bind two Observables using the rxjs merge operator.
Here you have an example.
import { Observable } from 'rxjs/Observable';
import 'rxjs/add/observable/merge';
...
getCombinatedStatus(): Observable<any> {
return Observable.merge(this.db.collection('foo', ref => ref.where('status','==','open')).valueChanges(),
this.db.collection('foo', ref => ref.where('status','==','upcoming')).valueChanges());
}
Then you can subscribe to the new Observable updates using the above method:
getCombinatedStatus.subscribe(results => console.log(results);
I hope this can help you, greetings from Chile!!

We have the same problem just now, luckily the only possible values for ours are A,B,C,D (4) so we have to query for things like A||B, A||C, A||B||C, D, etc
As of like a few months ago firebase supports a new query array-contains so what we do is make an array and we pre-process the OR values to the array
if (a) {
array addObject:#"a"
}
if (b) {
array addObject:#"b"
}
if (a||b) {
array addObject:#"a||b"
}
etc
And we do this for all 4! values or however many combos there are.
THEN we can simply check the query [document arrayContains:#"a||c"] or whatever type of condition we need.
So if something only qualified for conditional A of our 4 conditionals (A,B,C,D) then its array would contain the following literal strings: #["A", "A||B", "A||C", "A||D", "A||B||C", "A||B||D", "A||C||D", "A||B||C||D"]
Then for any of those OR combinations we can just search array-contains on whatever we may want (e.g. "A||C")
Note: This is only a reasonable approach if you have a few number of possible values to compare OR with.
More info on Array-contains here, since it's newish to firebase docs

If you have a limited number of fields, definitely create new fields with true and false like in the example above. However, if you don't know what the fields are until runtime, you have to just combine queries.
Here is a tags OR example...
// the ids of students in class
const students = [studentID1, studentID2,...];
// get all docs where student.studentID1 = true
const results = this.afs.collection('classes',
ref => ref.where(`students.${students[0]}`, '==', true)
).valueChanges({ idField: 'id' }).pipe(
switchMap((r: any) => {
// get all docs where student.studentID2...studentIDX = true
const docs = students.slice(1).map(
(student: any) => this.afs.collection('classes',
ref => ref.where(`students.${student}`, '==', true)
).valueChanges({ idField: 'id' })
);
return combineLatest(docs).pipe(
// combine results by reducing array
map((a: any[]) => {
const g: [] = a.reduce(
(acc: any[], cur: any) => acc.concat(cur)
).concat(r);
// filter out duplicates by 'id' field
return g.filter(
(b: any, n: number, a: any[]) => a.findIndex(
(v: any) => v.id === b.id) === n
);
}),
);
})
);
Unfortunately there is no other way to combine more than 10 items (use array-contains-any if < 10 items).
There is also no other way to avoid duplicate reads, as you don't know the ID fields that will be matched by the search. Luckily, Firebase has good caching.
For those of you that like promises...
const p = await results.pipe(take(1)).toPromise();
For more info on this, see this article I wrote.
J

OR isn't supported
But if you need that you can do It in your code
Ex : if i want query products where (Size Equal Xl OR XXL : AND Gender is Male)
productsCollectionRef
//1* first get query where can firestore handle it
.whereEqualTo("gender", "Male")
.addSnapshotListener((queryDocumentSnapshots, e) -> {
if (queryDocumentSnapshots == null)
return;
List<Product> productList = new ArrayList<>();
for (DocumentSnapshot snapshot : queryDocumentSnapshots.getDocuments()) {
Product product = snapshot.toObject(Product.class);
//2* then check your query OR Condition because firestore just support AND Condition
if (product.getSize().equals("XL") || product.getSize().equals("XXL"))
productList.add(product);
}
liveData.setValue(productList);
});

For Flutter dart language use this:
db.collection("projects").where("status", whereIn: ["public", "unlisted", "secret"]);

actually I found #Dan McGrath answer working here is a rewriting of his answer:
private void query() {
FirebaseFirestore db = FirebaseFirestore.getInstance();
db.collection("STATUS")
.whereIn("status", Arrays.asList("open", "upcoming")) // you can add up to 10 different values like : Arrays.asList("open", "upcoming", "Pending", "In Progress", ...)
.addSnapshotListener(new EventListener<QuerySnapshot>() {
#Override
public void onEvent(#Nullable QuerySnapshot queryDocumentSnapshots, #Nullable FirebaseFirestoreException e) {
for (DocumentSnapshot documentSnapshot : queryDocumentSnapshots) {
// I assume you have a model class called MyStatus
MyStatus status= documentSnapshot.toObject(MyStatus.class);
if (status!= null) {
//do somthing...!
}
}
}
});
}

search in maps dart2 , same as list.indexOf?

I Use this sample for search in Map but not work :|:
var xmenList = ['4','xmen','4xmen','test'];
var xmenObj = {
'first': '4',
'second': 'xmen',
'fifth': '4xmen',
'author': 'test'
};
print(xmenList.indexOf('4xmen')); // 2
print(xmenObj.indexOf('4xmen')); // ?
but I have error TypeError: xmenObj.indexOf$1 is not a function on last code line.
Pelease help me to search in map object simple way same as indexOf.

I found the answer:
print(xmenObj.values.toList().indexOf('4xmen')); // 2
or this:
var ind = xmenObj.values.toList().indexOf('4xmen') ;
print(xmenObj.keys.toList()[ind]); // fifth

Maps are not indexable by integers, so there is no operation corresponding to indexOf. If you see lists as specialized maps where the keys are always consecutive integers, then the corresponding operation should find the key for a given value.
Maps are not built for that, so iterating through all the keys and values is the only way to get that result.
I'd do that as:
K keyForValue<K, V>(Map<K, V> map, V value) {
for (var entry in map.entries) {
if (entry.value == value) return key;
}
return null;
}
The entries getter is introduced in Dart 2. If you don't have that, then using the map.values.toList().indexOf(value) to get the iteration position, and then map.keys.elementAt(thatIndex) to get the corresponding key.
If you really only want the numerical index, then you can skip that last step.
It's not amazingly efficient (you allocate a new list and copy all the values). Another approach is:
int indexOfValue<V>(Map<Object, V> map, V value) {
int i = 0;
for (var mapValue in map.values) {
if (mapValue == value) return i;
i++;
}
return -1;
}

You can search using .where(...) if you want to find all that match or firstWhere if you assume there can only be one or you only want the first
var found = xmenObj.keys.firstWhere(
(k) => xmenObj[k] == '4xmen', orElse: () => null);
print(xmenObj[found]);

How to get random data from firestore? [duplicate]

It is crucial for my application to be able to select multiple documents at random from a collection in firebase.
Since there is no native function built in to Firebase (that I know of) to achieve a query that does just this, my first thought was to use query cursors to select a random start and end index provided that I have the number of documents in the collection.
This approach would work but only in a limited fashion since every document would be served up in sequence with its neighboring documents every time; however, if I was able to select a document by its index in its parent collection I could achieve a random document query but the problem is I can't find any documentation that describes how you can do this or even if you can do this.
Here's what I'd like to be able to do, consider the following firestore schema:
root/
posts/
docA
docB
docC
docD
Then in my client (I'm in a Swift environment) I'd like to write a query that can do this:
db.collection("posts")[0, 1, 3] // would return: docA, docB, docD
Is there anyway I can do something along the lines of this? Or, is there a different way I can select random documents in a similar fashion?
Please help.

Using randomly generated indexes and simple queries, you can randomly select documents from a collection or collection group in Cloud Firestore.
This answer is broken into 4 sections with different options in each section:
How to generate the random indexes
How to query the random indexes
Selecting multiple random documents
Reseeding for ongoing randomness
How to generate the random indexes
The basis of this answer is creating an indexed field that when ordered ascending or descending, results in all the document being randomly ordered. There are different ways to create this, so let's look at 2, starting with the most readily available.
Auto-Id version
If you are using the randomly generated automatic ids provided in our client libraries, you can use this same system to randomly select a document. In this case, the randomly ordered index is the document id.
Later in our query section, the random value you generate is a new auto-id (iOS, Android, Web) and the field you query is the __name__ field, and the 'low value' mentioned later is an empty string. This is by far the easiest method to generate the random index and works regardless of the language and platform.
By default, the document name (__name__) is only indexed ascending, and you also cannot rename an existing document short of deleting and recreating. If you need either of these, you can still use this method and just store an auto-id as an actual field called random rather than overloading the document name for this purpose.
Random Integer version
When you write a document, first generate a random integer in a bounded range and set it as a field called random. Depending on the number of documents you expect, you can use a different bounded range to save space or reduce the risk of collisions (which reduce the effectiveness of this technique).
You should consider which languages you need as there will be different considerations. While Swift is easy, JavaScript notably can have a gotcha:
32-bit integer: Great for small (~10K unlikely to have a collision) datasets
64-bit integer: Large datasets (note: JavaScript doesn't natively support, yet)
This will create an index with your documents randomly sorted. Later in our query section, the random value you generate will be another one of these values, and the 'low value' mentioned later will be -1.
How to query the random indexes
Now that you have a random index, you'll want to query it. Below we look at some simple variants to select a 1 random document, as well as options to select more than 1.
For all these options, you'll want to generate a new random value in the same form as the indexed values you created when writing the document, denoted by the variable random below. We'll use this value to find a random spot on the index.
Wrap-around
Now that you have a random value, you can query for a single document:
let postsRef = db.collection("posts")
queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
.order(by: "random")
.limit(to: 1)
Check that this has returned a document. If it doesn't, query again but use the 'low value' for your random index. For example, if you did Random Integers then lowValue is 0:
let postsRef = db.collection("posts")
queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: lowValue)
.order(by: "random")
.limit(to: 1)
As long as you have a single document, you'll be guaranteed to return at least 1 document.
Bi-directional
The wrap-around method is simple to implement and allows you to optimize storage with only an ascending index enabled. One downside is the possibility of values being unfairly shielded. E.g if the first 3 documents (A,B,C) out of 10K have random index values of A:409496, B:436496, C:818992, then A and C have just less than 1/10K chance of being selected, whereas B is effectively shielded by the proximity of A and only roughly a 1/160K chance.
Rather than querying in a single direction and wrapping around if a value is not found, you can instead randomly select between >= and <=, which reduces the probability of unfairly shielded values by half, at the cost of double the index storage.
If one direction returns no results, switch to the other direction:
queryRef = postsRef.whereField("random", isLessThanOrEqualTo: random)
.order(by: "random", descending: true)
.limit(to: 1)
queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
.order(by: "random")
.limit(to: 1)
Selecting multiple random documents
Often, you'll want to select more than 1 random document at a time. There are 2 different ways to adjust the above techniques depending on what trade offs you want.
Rinse & Repeat
This method is straight forward. Simply repeat the process, including selecting a new random integer each time.
This method will give you random sequences of documents without worrying about seeing the same patterns repeatedly.
The trade-off is it will be slower than the next method since it requires a separate round trip to the service for each document.
Keep it coming
In this approach, simply increase the number in the limit to the desired documents. It's a little more complex as you might return 0..limit documents in the call. You'll then need to get the missing documents in the same manner, but with the limit reduced to only the difference. If you know there are more documents in total than the number you are asking for, you can optimize by ignoring the edge case of never getting back enough documents on the second call (but not the first).
The trade-off with this solution is in repeated sequences. While the documents are randomly ordered, if you ever end up overlapping ranges you'll see the same pattern you saw before. There are ways to mitigate this concern discussed in the next section on reseeding.
This approach is faster than 'Rinse & Repeat' as you'll be requesting all the documents in the best case a single call or worst case 2 calls.
Reseeding for ongoing randomness
While this method gives you documents randomly if the document set is static the probability of each document being returned will be static as well. This is a problem as some values might have unfairly low or high probabilities based on the initial random values they got. In many use cases, this is fine but in some, you may want to increase the long term randomness to have a more uniform chance of returning any 1 document.
Note that inserted documents will end up weaved in-between, gradually changing the probabilities, as will deleting documents. If the insert/delete rate is too small given the number of documents, there are a few strategies addressing this.
Multi-Random
Rather than worrying out reseeding, you can always create multiple random indexes per document, then randomly select one of those indexes each time. For example, have the field random be a map with subfields 1 to 3:
{'random': {'1': 32456, '2':3904515723, '3': 766958445}}
Now you'll be querying against random.1, random.2, random.3 randomly, creating a greater spread of randomness. This essentially trades increased storage to save increased compute (document writes) of having to reseed.
Reseed on writes
Any time you update a document, re-generate the random value(s) of the random field. This will move the document around in the random index.
Reseed on reads
If the random values generated are not uniformly distributed (they're random, so this is expected), then the same document might be picked a dispropriate amount of the time. This is easily counteracted by updating the randomly selected document with new random values after it is read.
Since writes are more expensive and can hotspot, you can elect to only update on read a subset of the time (e.g, if random(0,100) === 0) update;).

Posting this to help anyone that has this problem in the future.
If you are using Auto IDs you can generate a new Auto ID and query for the closest Auto ID as mentioned in Dan McGrath's Answer.
I recently created a random quote api and needed to get random quotes from a firestore collection.
This is how I solved that problem:
var db = admin.firestore();
var quotes = db.collection("quotes");
var key = quotes.doc().id;
quotes.where(admin.firestore.FieldPath.documentId(), '>=', key).limit(1).get()
.then(snapshot => {
if(snapshot.size > 0) {
snapshot.forEach(doc => {
console.log(doc.id, '=>', doc.data());
});
}
else {
var quote = quotes.where(admin.firestore.FieldPath.documentId(), '<', key).limit(1).get()
.then(snapshot => {
snapshot.forEach(doc => {
console.log(doc.id, '=>', doc.data());
});
})
.catch(err => {
console.log('Error getting documents', err);
});
}
})
.catch(err => {
console.log('Error getting documents', err);
});
The key to the query is this:
.where(admin.firestore.FieldPath.documentId(), '>', key)
And calling it again with the operation reversed if no documents are found.
I hope this helps!

Just made this work in Angular 7 + RxJS, so sharing here with people who want an example.
I used #Dan McGrath 's answer, and I chose these options: Random Integer version + Rinse & Repeat for multiple numbers. I also used the stuff explained in this article: RxJS, where is the If-Else Operator? to make if/else statements on stream level (just if any of you need a primer on that).
Also note I used angularfire2 for easy Firebase integration in Angular.
Here is the code:
import { Component, OnInit } from '#angular/core';
import { Observable, merge, pipe } from 'rxjs';
import { map, switchMap, filter, take } from 'rxjs/operators';
import { AngularFirestore, QuerySnapshot } from '#angular/fire/firestore';
#Component({
selector: 'pp-random',
templateUrl: './random.component.html',
styleUrls: ['./random.component.scss']
})
export class RandomComponent implements OnInit {
constructor(
public afs: AngularFirestore,
) { }
ngOnInit() {
}
public buttonClicked(): void {
this.getRandom().pipe(take(1)).subscribe();
}
public getRandom(): Observable<any[]> {
const randomNumber = this.getRandomNumber();
const request$ = this.afs.collection('your-collection', ref => ref.where('random', '>=', randomNumber).orderBy('random').limit(1)).get();
const retryRequest$ = this.afs.collection('your-collection', ref => ref.where('random', '<=', randomNumber).orderBy('random', 'desc').limit(1)).get();
const docMap = pipe(
map((docs: QuerySnapshot<any>) => {
return docs.docs.map(e => {
return {
id: e.id,
...e.data()
} as any;
});
})
);
const random$ = request$.pipe(docMap).pipe(filter(x => x !== undefined && x[0] !== undefined));
const retry$ = request$.pipe(docMap).pipe(
filter(x => x === undefined || x[0] === undefined),
switchMap(() => retryRequest$),
docMap
);
return merge(random$, retry$);
}
public getRandomNumber(): number {
const min = Math.ceil(Number.MIN_VALUE);
const max = Math.ceil(Number.MAX_VALUE);
return Math.floor(Math.random() * (max - min + 1)) + min;
}
}

The other solutions are better but seems hard for me to understand, so I came up with another method
Use incremental number as ID like 1,2,3,4,5,6,7,8,9, watch out for delete documents else we
have an I'd that is missing
Get total number of documents in the collection, something like this, I don't know of a better solution than this
let totalDoc = db.collection("stat").get().then(snap=>snap.size)
Now that we have these, create an empty array to store random list of number, let's say we want 20 random documents.
let randomID = [ ]
while(randomID.length < 20) {
const randNo = Math.floor(Math.random() * totalDoc) + 1;
if(randomID.indexOf(randNo) === -1) randomID.push(randNo);
}
now we have our 20 random documents id
finally we fetch our data from fire store, and save to randomDocs array by mapping through the randomID array
const randomDocs = randomID.map(id => {
db.collection("posts").doc(id).get()
.then(doc => {
if (doc.exists) return doc.data()
})
.catch(error => {
console.log("Error getting document:", error);
});
})
I'm new to firebase, but I think with this answers we can get something better or a built-in query from firebase soon

After intense argument with my friend, we finally found some solution
If you don't need to set document's id to be RandomID, just name documents as size of collection's size.
For example, first document of collection is named '0'.
second document name should be '1'.
Then, we just read the size of collection, for example N, and we can get random number A in range of [0~N).
And then, we can query the document named A.
This way can give same probability of randomness to every documents in collection.

undoubtedly Above accepted Answer is SuperUseful but There is one case like If we had a collection of some Documents(about 100-1000) and we want some 20-30 random Documents Provided that Document must not be repeated. (case In Random Problems App etc...).
Problem with the Above Solution:
For a small number of documents in the Collection(say 50) Probability of repetition is high. To avoid it If I store Fetched Docs Id and Add-in Query like this:
queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: lowValue).where("__name__", isNotEqualTo:"PreviousId")
.order(by: "random")
.limit(to: 1)
here PreviousId is Id of all Elements that were fetched Already means A loop of n previous Ids.
But in this case, network Call would be high.
My Solution:
Maintain one Special Document and Keep a Record of Ids of this Collection only, and fetched this document First Time and Then Do all Randomness Stuff and check for previously not fetched on App site. So in this case network call would be only the same as the number of documents requires (n+1).
Disadvantage of My solution:
Have to maintain A document so Write on Addition and Deletion. But it is good If reads are very often then Writes which occurs in most cases.

You can use listDocuments() property for get only Query list of documents id. Then generate random id using the following way and get DocumentSnapshot with get() property.
var restaurantQueryReference = admin.firestore().collection("Restaurant"); //have +500 docs
var restaurantQueryList = await restaurantQueryReference.listDocuments(); //get all docs id;
for (var i = restaurantQueryList.length - 1; i > 0; i--) {
var j = Math.floor(Math.random() * (i + 1));
var temp = restaurantQueryList[i];
restaurantQueryList[i] = restaurantQueryList[j];
restaurantQueryList[j] = temp;
}
var restaurantId = restaurantQueryList[Math.floor(Math.random()*restaurantQueryList.length)].id; //this is random documentId

Unlike rtdb, firestore ids are not ordered chronologically. So using Auto-Id version described by Dan McGrath is easily implemented if you use the auto-generated id by the firestore client.
new Promise<Timeline | undefined>(async (resolve, reject) => {
try {
let randomTimeline: Timeline | undefined;
let maxCounter = 5;
do {
const randomId = this.afs.createId(); // AngularFirestore
const direction = getRandomIntInclusive(1, 10) <= 5;
// The firestore id is saved with your model as an "id" property.
let list = await this.list(ref => ref
.where('id', direction ? '>=' : '<=', randomId)
.orderBy('id', direction ? 'asc' : 'desc')
.limit(10)
).pipe(take(1)).toPromise();
// app specific filtering
list = list.filter(x => notThisId !== x.id && x.mediaCounter > 5);
if (list.length) {
randomTimeline = list[getRandomIntInclusive(0, list.length - 1)];
}
} while (!randomTimeline && maxCounter-- >= 0);
resolve(randomTimeline);
} catch (err) {
reject(err);
}
})

I have one way to get random a list document in Firebase Firestore, it really easy. When i upload data on Firestore i creat a field name "position" with random value from 1 to 1 milions. When i get data from Fire store i will set Order by field "Position" and update value for it, a lot of user load data and data always update and it's will be random value.

For those using Angular + Firestore, building on #Dan McGrath techniques, here is the code snippet.
Below code snippet returns 1 document.
getDocumentRandomlyParent(): Observable<any> {
return this.getDocumentRandomlyChild()
.pipe(
expand((document: any) => document === null ? this.getDocumentRandomlyChild() : EMPTY),
);
}
getDocumentRandomlyChild(): Observable<any> {
const random = this.afs.createId();
return this.afs
.collection('my_collection', ref =>
ref
.where('random_identifier', '>', random)
.limit(1))
.valueChanges()
.pipe(
map((documentArray: any[]) => {
if (documentArray && documentArray.length) {
return documentArray[0];
} else {
return null;
}
}),
);
}
1) .expand() is a rxjs operation for recursion to ensure we definitely get a document from the random selection.
2) For recursion to work as expected we need to have 2 separate functions.
3) We use EMPTY to terminate .expand() operator.
import { Observable, EMPTY } from 'rxjs';

Ok I will post answer to this question even thou I am doing this for Android. Whenever i create a new document i initiate random number and set it to random field, so my document looks like
"field1" : "value1"
"field2" : "value2"
...
"random" : 13442 //this is the random number i generated upon creating document
When I query for random document I generate random number in same range that I used when creating document.
private val firestore: FirebaseFirestore = FirebaseFirestore.getInstance()
private var usersReference = firestore.collection("users")
val rnds = (0..20001).random()
usersReference.whereGreaterThanOrEqualTo("random",rnds).limit(1).get().addOnSuccessListener {
if (it.size() > 0) {
for (doc in it) {
Log.d("found", doc.toString())
}
} else {
usersReference.whereLessThan("random", rnds).limit(1).get().addOnSuccessListener {
for (doc in it) {
Log.d("found", doc.toString())
}
}
}
}

Based on #ajzbc answer I wrote this for Unity3D and its working for me.
FirebaseFirestore db;
void Start()
{
db = FirebaseFirestore.DefaultInstance;
}
public void GetRandomDocument()
{
Query query1 = db.Collection("Sports").WhereGreaterThanOrEqualTo(FieldPath.DocumentId, db.Collection("Sports").Document().Id).Limit(1);
Query query2 = db.Collection("Sports").WhereLessThan(FieldPath.DocumentId, db.Collection("Sports").Document().Id).Limit(1);
query1.GetSnapshotAsync().ContinueWithOnMainThread((querySnapshotTask1) =>
{
if(querySnapshotTask1.Result.Count > 0)
{
foreach (DocumentSnapshot documentSnapshot in querySnapshotTask1.Result.Documents)
{
Debug.Log("Random ID: "+documentSnapshot.Id);
}
} else
{
query2.GetSnapshotAsync().ContinueWithOnMainThread((querySnapshotTask2) =>
{
foreach (DocumentSnapshot documentSnapshot in querySnapshotTask2.Result.Documents)
{
Debug.Log("Random ID: " + documentSnapshot.Id);
}
});
}
});
}

If you are using autoID this may also work for you...
let collectionRef = admin.firestore().collection('your-collection');
const documentSnapshotArray = await collectionRef.get();
const records = documentSnapshotArray.docs;
const index = documentSnapshotArray.size;
let result = '';
console.log(`TOTAL SIZE=====${index}`);
var randomDocId = Math.floor(Math.random() * index);
const docRef = records[randomDocId].ref;
result = records[randomDocId].data();
console.log('----------- Random Result --------------------');
console.log(result);
console.log('----------- Random Result --------------------');

Easy (2022). You need something like:
export const getAtRandom = async (me) => {
const collection = admin.firestore().collection('...').where(...);
const { count } = (await collection.count().get()).data();
const numberAtRandom = Math.floor(Math.random() * count);
const snap = await accountCollection.limit(1).offset(numberAtRandom).get()
if (accountSnap.empty) return null;
const doc = { id: snap.docs[0].id, ...snap.docs[0].data(), ref: snap.docs[0].ref };
return doc;
}

The next code (Flutter) will return one or up to ten random documents from a Firebase collection.
None of the documents will be repeated
Max 10 documents can be retrieved
If you pass a greater numberOfDocuments than existing documents in the collection, the loop will never end.
Future<Iterable<QueryDocumentSnapshot>> getRandomDocuments(int numberOfDocuments) async {
// Queried documents
final docs = <QueryDocumentSnapshot>[];
// Queried documents id's. We will use later to avoid querying same documents
final currentIds = <String>[];
do {
// Generate random id explained by #Dan McGrath's answer (autoId)
final randomId = FirebaseFirestore.instance.collection('random').doc().id;
var query = FirebaseFirestore.instance
.collection('myCollection') // Change this for you collection name
.where(FieldPath.documentId, isGreaterThanOrEqualTo: randomId)
.limit(1);
if (currentIds.isNotEmpty) {
// If previously we fetched a document we avoid fetching the same
query = query.where(FieldPath.documentId, whereNotIn: currentIds);
}
final querySnap = await query.get();
for (var element in querySnap.docs) {
currentIds.add(element.id);
docs.add(element);
}
} while (docs.length < numberOfDocuments); // <- Run until we have all documents we want
return docs;
}

Is this the best way to sum decimal value types in queries?

I was getting an error when I was execting the following query that summed a non-nullable decimal. The magnitude was null when there was no magnitude value for a location/organization combination.
let q = query { from m in Measure do
where (locations.Contains(m.Location)
&& organizations.Contains(m.Organization))
sumBy m.Magnitude }
Error: The null value cannot be assigned to a member with type System.Decimal which is a non-nullable value type.
I solved this in the following way. Is this there a better way to do this?
let convert (n:Nullable<decimal>) =
let c:decimal = 0M
if n.HasValue then n.Value else c
let q = query { from m in Measure do
let nullable = Nullable<decimal>(m.Magnitude)
where (locations.Contains(m.Location)
&& organizations.Contains(m.Organization))
sumBy (convert(nullable)) }
Thanks. I changed my query to the following.
query { for m in db.Measurement do
let nullable = Nullable<decimal>(m.Magnitude)
where ( m.Measure = me
&& (if slice.Organization > 0L then organizationUnits.Contains( m.PrimalOrganizationUnit.Value ) else true)
&& (if slice.Location > 0L then locationUnits.Contains(m.PrimalLocationUnit.Value) else true) )
sumByNullable (nullable) }

sumByNullable?

This is a LINQ-to-SQL issue (but is by design, since in SQL summing an empty column results in NULL, see here), not specific to F#. Your workaround seems to match the suggestion in the linked MSDN Forums thread.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How to detect changes in a time series data using Spark streaming - time-series

I have a continuous stream of data in Kafka. I want to count the number of times a column value in the stream of data has changed. Which algorithm I should be using for this?

Related

Cosmos DB stored procedure: I can query the DB, but when I try to upsert I get a 'not same partition' error

How do I query all documents in a Firestore collection for all strings in an array? [duplicate]

search in maps dart2 , same as list.indexOf?

How to get random data from firestore? [duplicate]

Is this the best way to sum decimal value types in queries?

Categories

Resources