Solr Join parser performance issue - parsing

Solr Version: 6.3.0
Cloud: Yes
Shards: Single(1)
Data Size: 50GB
Records: 12M
We have a Solr join query that tries to find related ids within the same collection (yes, a self join). This is causing a performance hit.
On analysis we found that Solr scans all the terms of the from_field, irrespective of the q filter, and then intersects them with the to_field terms. Is there a way to ask Solr to filter the terms before intersecting with the to_field in the join parser?
We have around 9M terms in the given Solr field, which we assume to be the cause of the performance hit.
"join": {
"{!join from=from_field to=to_field fromIndex=insight_pats_1_shard1_replica1}to_field: \u0001\u0000\u0000\u0000\u0000\u0000\u0003X\u0002H": {
"time": 16824,
"fromSetSize": 1,
"toSetSize": 0,
"fromTermCount": 8561723,
"fromTermTotalDf": 8561723,
"fromTermDirectCount": 8561505,
"fromTermHits": 0,
"fromTermHitsTotalDf": 0,
"toTermHits": 0,
"toTermHitsTotalDf": 0,
"toTermDirectCount": 0,
"smallSetsDeferred": 0,
"toSetDocsAdded": 0
}
},
"rawquerystring": "*:*",
"querystring": "*:*",
"parsedquery": "(+MatchAllDocsQuery(*:*))/no_coord",
"parsedquery_toString": "+*:*",
"explain": { },
"QParser": "ExtendedDismaxQParser",
"altquerystring": null,
"boost_queries": null,
"parsed_boost_queries": [ ],
"boostfuncs": null,
"filter_queries": [
"account_ids:1",
"{!join from=from_field to=to_field fromIndex=insight_pats_1}to_field:7733576"
],
"parsed_filter_queries": [
"account_ids:1",
"JoinQuery({!join from=from_field to=to_field fromIndex=insight_pats_1_shard1_replica1}to_field: \u0001\u0000\u0000\u0000\u0000\u0000\u0003X\u0002H)"
]

There are two join parsers available:
JoinQParserPlugin
ScoreJoinQParserPlugin
By default, !join uses the JoinQParserPlugin, which is not optimal when the joined records number in the millions. We can ask Solr to use the ScoreJoinQParserPlugin by adding the parameter score=none to the !join command, as shown below.
http://localhost:8983/solr/mycollection/select?fq={!join from=from_field to=to_field fromIndex=from_collection score=none}&indent=on&q=*:*&wt=json&debugQuery=on
We were able to achieve a 30x performance improvement with from_field term counts in the range of 8 million.
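For example, applied to the filter queries from the debug output above, the request might look like the following minimal sketch using Python's requests library (the host, collection, and filter values are taken from the question and may need adjusting for your setup):

import requests

# Sketch: the same filter queries as in the debug output, with score=none added.
params = {
    "q": "*:*",
    "fq": [
        "account_ids:1",
        "{!join from=from_field to=to_field fromIndex=insight_pats_1 score=none}to_field:7733576",
    ],
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/mycollection/select", params=params)
print(resp.json()["response"]["numFound"])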

Related

Binance API response shows wrong minimum order quantity

Trying to hit the fapi.binance.com/fapi/v1/exchangeInfo endpoint in order to get the current minimum order quantity for a pair (let's take ETHUSDT as an example).
Here is what the Binance UI shows as the minQty: the minimum order quantity for ETH is 0.004 ETH.
When hitting the above-mentioned endpoint and inspecting the ETHUSDT-specific part of the response:
{
  "symbol": "ETHUSDT",
  "pair": "ETHUSDT",
  "contractType": "PERPETUAL",
  "deliveryDate": 4133404800000,
  "onboardDate": 1569398400000,
  "status": "TRADING",
  "maintMarginPercent": "2.5000",
  "requiredMarginPercent": "5.0000",
  "baseAsset": "ETH",
  "quoteAsset": "USDT",
  "marginAsset": "USDT",
  "pricePrecision": 2,
  "quantityPrecision": 3,
  "baseAssetPrecision": 8,
  "quotePrecision": 8,
  "underlyingType": "COIN",
  "underlyingSubType": ["Layer-1"],
  "settlePlan": 0,
  "triggerProtect": "0.0500",
  "liquidationFee": "0.015000",
  "marketTakeBound": "0.05",
  "filters": [
    {
      "minPrice": "39.86",
      "maxPrice": "306177",
      "filterType": "PRICE_FILTER",
      "tickSize": "0.01"
    },
    {
      "stepSize": "0.001",
      "filterType": "LOT_SIZE",
      "maxQty": "10000",
      "minQty": "0.001"
    },
    {
      "stepSize": "0.001",
      "filterType": "MARKET_LOT_SIZE",
      "maxQty": "2000",
      "minQty": "0.001"
    },
    {
      "limit": 200,
      "filterType": "MAX_NUM_ORDERS"
    },
    {
      "limit": 10,
      "filterType": "MAX_NUM_ALGO_ORDERS"
    },
    {
      "notional": "5",
      "filterType": "MIN_NOTIONAL"
    },
    {
      "multiplierDown": "0.9500",
      "multiplierUp": "1.0500",
      "multiplierDecimal": "4",
      "filterType": "PERCENT_PRICE"
    }
  ],
  "orderTypes": [
    "LIMIT",
    "MARKET",
    "STOP",
    "STOP_MARKET",
    "TAKE_PROFIT",
    "TAKE_PROFIT_MARKET",
    "TRAILING_STOP_MARKET"
  ],
  "timeInForce": ["GTC", "IOC", "FOK", "GTX"]
}
We can observe that no field here directly matches the minQty shown in the UI, even though some entries in the filters array (like LOT_SIZE and MARKET_LOT_SIZE) come close.
Am I missing something, or is this a Binance API bug?
Took a while to figure it out, but I believe I have come to a solution/explanation.
As I previously noted, we are interested in the filters array found in the response of the fapi.binance.com/fapi/v1/exchangeInfo endpoint, specifically the MARKET_LOT_SIZE (LOT_SIZE if you are not interested in market orders) and MIN_NOTIONAL filters. When I wrote the question I assumed one of those had to be the source of truth, but neither was consistently matching the minimum order quantities seen in the Binance UI.
It turns out it's not one of them, but the two combined... sort of.
For inexperienced traders, a quick definition. Base asset: the asset you are buying. Quote asset: the asset you are acquiring the base asset with. In my case (ETHUSDT), ETH was the base asset and USDT was the quote asset.
The MIN_NOTIONAL filter tells you the minimum amount of the quote asset you are required to spend to acquire the base asset. This filter is equal to 5 USDT for ETHUSDT at the time of posting.
The MARKET_LOT_SIZE filter tells you the minimum amount of the base asset you can buy. This filter is equal to 0.001 ETH for ETHUSDT at the time of posting.
Depending on the price of ETH, 0.001 ETH may cost less than 5 USDT, which would not satisfy the MIN_NOTIONAL filter's requirement. Similarly, if, due to price, 5 USDT bought less than 0.001 ETH, an order with a notional value of 5 USDT would not satisfy the MARKET_LOT_SIZE filter's requirement.
Hence, to calculate the minimum order quantity, we need to take the maximum of the two values, with one caveat: we first need to convert the MIN_NOTIONAL filter value of 5 USDT into ETH by dividing it by the current price of ETH. Beware: if the result of the division has more decimal places than the quantity precision of the asset on Binance, we need to round up to that number of decimals.
Ultimately, the calculation example in Python:
max(
MIN_MARKET_LOT_SIZE,
round_up((MIN_NOTIONAL / CURRENT_ETH_PRICE), ETH_QUANTITY_PRECISION)
)
An example of a decimal rounding-up function can be found here.
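To make that concrete, here is a runnable sketch; the round_up helper and the ETH price are my assumptions for illustration, while the other values come from the exchangeInfo response above:

import math

def round_up(value, decimals):
    # Round up to a fixed number of decimal places.
    factor = 10 ** decimals
    return math.ceil(value * factor) / factor

MIN_MARKET_LOT_SIZE = 0.001     # MARKET_LOT_SIZE minQty, from the response above
MIN_NOTIONAL = 5.0              # MIN_NOTIONAL notional, from the response above
ETH_QUANTITY_PRECISION = 3      # quantityPrecision, from the response above
current_eth_price = 1800.0      # assumed price, for illustration only

min_qty = max(MIN_MARKET_LOT_SIZE,
              round_up(MIN_NOTIONAL / current_eth_price, ETH_QUANTITY_PRECISION))
print(min_qty)  # 0.003 at the assumed price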
The calculation below worked for me.
Divide the MIN_NOTIONAL value by the current price of the symbol to get the minimum quantity for placing an order:
minQty = MIN_NOTIONAL / current price of symbol
In my case the MIN_NOTIONAL value was 5.0; I used 5.5 just to place the order slightly above the minimum quantity.
Use um_futures_client.mark_price(symbol) to get the mark price.
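Putting that together, a minimal sketch, assuming the binance-futures-connector Python package provides the um_futures_client mentioned above:

from binance.um_futures import UMFutures  # assumed: pip install binance-futures-connector

um_futures_client = UMFutures()
# Mark price endpoint; "markPrice" comes back as a string.
mark = float(um_futures_client.mark_price("ETHUSDT")["markPrice"])
# 5.5 instead of the exact 5.0 MIN_NOTIONAL, to stay safely above the minimum.
min_qty = 5.5 / mark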
Thanks

k6: how to manage the rps limit at each stage while increasing the number of VUs

I have a question about a basic term for which I did not find a detailed explanation. Input data: framework k6 v0.25.1, HTTP requests.
Question #1: what is the implementation of a VU (virtual user) from the perspective of:
1) the client side;
2) the server side;
3) client-server interaction?
What should I read to understand the subtleties of what a VU is, in particular within k6?
So far I have found that each VU occupies one network port on both the client and server sides.
Load profiles:
1) rps: 1; vus: 1; duration of N minutes. I see in Grafana that the increase in the number of requests is really minimal, about +1 rps. Everything is fine.
2) rps: 1; vus: 1..1000, ramping up over N minutes via the target option in the stages. I see that the load increased by about +100 rps at peak, even though according to the k6 documentation the rps option is "The maximum number of requests to make per second, in total across all VUs". In other words, instead of ~+100 rps I expected to see a load of ~1 rps, by analogy with experiment #1.
So it is either a k6 bug, where the rps limit incorrectly fails to account for requests across all VU threads, or hidden but intended behavior needed for each VU to exist.
Note: I set an arbitrary timeout at the beginning and end of the scenario to achieve an even load distribution.
Question #2: what could be the cause of this unexpected growth in rps, exceeding the rps limit, when vus is increased?
Example:
import http from "k6/http";

export let options = {
  stages: [
    { duration: "1m", target: 1, rps: 1 },
    { duration: "1m", target: 200, rps: 1 },
    { duration: "1m", target: 500, rps: 1 },
    { duration: "1m", target: 1000, rps: 1 },
    { duration: "1m", target: 500, rps: 1 },
    { duration: "1m", target: 200, rps: 1 },
    { duration: "1m", target: 1, rps: 1 },
  ]
};

export default function() {
  http.get("https://httpbin.test.loadimpact.com/get");
  console.log("request made by VU " + __VU);
};
A Virtual User, or VU, is a k6-specific concept and implementation. A VU is the entity that executes your script, making one or more HTTP requests to your server.
If you are testing a web server, you can think of a VU as being the same as a real user.
If you are testing an API, a single VU can produce more requests per second (RPS) than one real user would. For example, you can define 5 VUs, but each one can produce 10 requests per second. That is why, as your VUs increase, you can reach an RPS limit very quickly.
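One common way to bound the rate per VU, independent of the global rps option, is to pace each iteration with sleep. A minimal sketch (the one-second pause is an assumption, giving roughly one request per VU per second):

import http from "k6/http";
import { sleep } from "k6";

export default function() {
  http.get("https://httpbin.test.loadimpact.com/get");
  sleep(1); // pace each VU to roughly one request per second
}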
You can read more details about the VU definition at this link.

How do I sum the product of two values, across multiple objects in Rails?

Imagine I have a portfolio p that has two port_stocks. What I want to do is run a calculation on each port_stock and then sum up all the results.
[60] pry(main)> p.port_stocks
=> [#<PortStock:0x00007fd520e064e0
     id: 17,
     portfolio_id: 1,
     stock_id: 385,
     volume: 2000,
     purchase_price: 5.9,
     total_spend: 11800.0>,
    #<PortStock:0x00007fd52045be68
     id: 18,
     portfolio_id: 1,
     stock_id: 348,
     volume: 1000,
     purchase_price: 9.0,
     total_spend: 9000.0>]
[61] pry(main)>
So, in essence, using the code above I would like to do this:
ps = p.port_stocks.first                 # id: 17
first = ps.volume * ps.purchase_price    # 2000 * 5.9 = 11,800

ps = p.port_stocks.second                # id: 18
second = ps.volume * ps.purchase_price   # 1000 * 9.0 = 9,000

first + second # => 19,800
I want to simply get 19,800. Ideally I would like to do this in a very Ruby way.
If I were simply summing up all the values in 1 total_spend, I know I could simply do: p.port_stocks.map(&:total_spend).sum and that would be that.
But I'm not sure how to do something similar when I first perform a math operation on each object and then add up the products from all the objects. This should obviously work for 2 objects or 500.
The best way of doing this using Rails is to pass a block to sum, such as the following:
p.port_stocks.sum do |port_stock|
  port_stock.volume * port_stock.purchase_price
end
That uses the method dedicated to totalling figures, and tends to be very fast and efficient - particularly when compared to manipulating the data ahead of calling a straight sum without a block.
A quick benchmark here typically shows it performing ~20% faster than the obvious alternatives.
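A sketch of how such a comparison might be run, assuming the benchmark-ips gem and the p portfolio from the question (not part of the original answer):

require "benchmark/ips"

Benchmark.ips do |x|
  x.report("sum with block") { p.port_stocks.sum { |ps| ps.volume * ps.purchase_price } }
  x.report("map then sum")   { p.port_stocks.map { |ps| ps.volume * ps.purchase_price }.sum }
  x.compare!
end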
I've not been able to test, but give that a try and it should resolve this for you.
Let me know how you get on!
Just a quick update: since you also mention the best Ruby way, sum was introduced in Ruby 2.4, though on older versions of Ruby you can use reduce (also aliased as inject):
p.port_stocks.reduce(0) do |sum, port_stock|
  sum + (port_stock.volume * port_stock.purchase_price)
end
This isn't as efficient as sum, but thought I'd give you the options :)
You are right to use Array#map to iterate over all stocks, but instead of summing the total_spend values, you can calculate the product for each stock; afterwards, you sum all the results and you're done:
p.port_stocks.map { |ps| ps.volume * ps.purchase_price }.sum
Or you could use Enumerable#reduce like SRack did. This would return the result with one step/iteration.
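As a further option, not from the original answers: if port_stocks is an ActiveRecord relation, the multiplication and summation can also be pushed into the database, which avoids loading every PortStock into memory. A sketch, assuming the schema shown in the question:

# Sums volume * purchase_price in SQL rather than in Ruby.
p.port_stocks.sum("volume * purchase_price")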

Does Dart automatically determine which map type to use?

I was working on some code, very similar to the example below, that always seems to iterate in insertion order, despite my never specifying that I want the map to be ordered.
Map<String, Map<String, int>> map = {
  "a": {"value": 1, "price": 2},
  "c": {"value": 3, "price": 8},
  "x": {"value": 2, "price": 1},
  "b": {"value": 1, "price": 8},
};
map.forEach((name, innerMap) => print(name));
This always results in
a
c
x
b
Note: This is what I want it to do, I am just looking for an explanation as to why it does things this way, as it would be interesting to get a better understanding of why.
Dart's core libraries have a couple of implementations of Map. Let's look!
NOTE: I am referencing the dev channel SDK, the one affiliated with Dart 2 (and being used for Flutter), so if you're currently using the stable SDK, the docs might look a little different.
If you navigate to our docs:
-> https://api.dartlang.org/dev
-> dart:core (https://api.dartlang.org/dev/dart-core/dart-core-library.html)
-> Map (https://api.dartlang.org/dev/dart-core/Map-class.html)
You'll see the following description:
Maps, and their keys and values, can be iterated. The order of iteration is defined by the individual type of map.
And for the default constructor (the one used for {} syntax, as well):
Creates a LinkedHashMap instance that contains all key/value pairs of other.
You could create a Map of one of the other two types, HashMap (unordered; https://api.dartlang.org/dev/2.0.0-dev.35.0/dart-collection/HashMap-class.html) or SplayTreeMap (sorted; https://api.dartlang.org/dev/2.0.0-dev.35.0/dart-collection/SplayTreeMap-class.html) if you want different behavior.
The HashMap class can be useful where you want slightly lower memory usage, or where ordering is explicitly not useful or not desired. For most use cases, as you've mentioned, LinkedHashMap is totally appropriate.
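To illustrate the difference, a small sketch (assuming dart:collection; the .of constructors are available since Dart 2):

import 'dart:collection';

void main() {
  // The {} literal creates a LinkedHashMap: insertion-ordered.
  final linked = <String, int>{"a": 1, "c": 3, "x": 2, "b": 1};
  // HashMap: no guaranteed iteration order.
  final unordered = HashMap<String, int>.of(linked);
  // SplayTreeMap: iterates in sorted key order.
  final sorted = SplayTreeMap<String, int>.of(linked);

  print(linked.keys.toList());    // [a, c, x, b]
  print(sorted.keys.toList());    // [a, b, c, x]
  print(unordered.keys.toList()); // some implementation-defined order
}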
Cheers!

NETLOGO: Making a network with an exact number of links

I need to build a network where agents are connected by links, and I want there to be an exact (variable) number of links going from each agent. Let's say, for example, that I want 3 links going from each agent to others. No more, no less. I was trying to use this code:
let num-links (links * number) / 2
while [count links < num-links]
[
  ask one-of turtles
  [
    let choice (min-one-of (other turtles with [not link-neighbor? myself])
      [distance myself])
    if choice != nobody [ create-link-with choice ]
  ]
]
Where "number" is the number of nodes and "links" is number of links i want to go from each agent- But this code unfortunately works so that "links" is really just an average degree of node. So if I want 3 links, i could get all agent (except for example two) with 3 links going from them, but one of them would have only 1 link and another 5 (average is 3 then). Is there some way How to do it.
And is there some way how to do it so that each "link" would be actually two directed links, one going from the node and one going to the node?
And one last question. I want to give this links a variable, but i need to do it so that sum of these variables from each agent is exactly 100 (as percents).
Any help? Thank you very much.
Here is how I create a fixed-degree network for a small network (easy to understand):
to make-legal
  create-turtles 100 [ setxy random-xcor random-ycor ]
  let target-degree 5
  ; keep regenerating until every turtle has the target degree
  while [ min [ count my-links ] of turtles < target-degree ]
  [ ask links [ die ]
    makeNW-Lattice target-degree
  ]
end

to makeNW-Lattice [DD]
  ask turtles
  [ let needed DD - count my-links
    if needed > 0
    [ ; only turtles still below the target degree are candidates
      let candidates other turtles with [ count my-links < DD ]
      create-links-with n-of min (list needed count candidates) candidates
    ]
  ]
end
See NetLogo Efficient way to create fixed number of links for more efficient methods for larger networks.
Please ask separate questions for separate issues.
UPDATE, in response to a comment: ensuring all nodes have the required degree.
Based on the following test code, the basic generator produces a legal network a little under 50% of the time. Therefore I simply threw the original generator into a while loop (the one in make-legal above) and regenerated whenever the network was not legal. This is not a good solution for larger networks, but it is a reasonable hack.
to setup
  let mins (list 0 0 0 0 0 0)
  repeat 100
  [ ask turtles [ die ]
    ask links [ die ]
    create-turtles 100 [ setxy random-xcor random-ycor ]
    makeNW-Lattice 5
    ; record the minimum degree produced by this run
    let this-min min [ count my-links ] of turtles
    set mins replace-item this-min mins (item this-min mins + 1)
  ]
  print mins
end
