Apache Superset caching customization - Docker

I'm trying to integrate Apache Superset into my multi-tenant application and I'm having the following issue with caching:
Superset provides almost everything we need as easily configurable settings in superset_config.py, but in my case I'm trying to make the cache key include the tenant ID so I can have data segregation between tenants. So what I need to do is get the tenant ID from the session and append it to the key.
Note that I'm running superset:2.0.0.
Below are my caching configurations in superset_config.py:
FILTER_STATE_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 86400,
    "CACHE_KEY_PREFIX": "FILTER_STATE_CACHE_CONFIG",
    "CACHE_REDIS_URL": "redis://xxx.xxx.xxx.xxx:6379/0",
}
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_KEY_PREFIX": "DATA_CACHE_CONFIG_",  # make sure this string is unique to avoid collisions
    "CACHE_DEFAULT_TIMEOUT": 86400,  # 60 seconds * 60 minutes * 24 hours
    "CACHE_REDIS_URL": "redis://xxx.xxx.xxx.xxx:6379/0",
}
EXPLORE_FORM_DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_KEY_PREFIX": "EXPLORE_FORM_DATA_CACHE_CONFIG_",  # make sure this string is unique to avoid collisions
    "CACHE_DEFAULT_TIMEOUT": 86400,  # 60 seconds * 60 minutes * 24 hours
    "CACHE_REDIS_URL": "redis://xxx.xxx.xxx.xxx:6379/0",
}
What I did was update the cache.py file in superset/utils/ and amend the set_and_log_cache function to read the tenant ID from the session:
cache_key = cache_key + "" + "Tenant:" + str(session["Tenant_Id"])
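In other words, the behaviour I'm after looks roughly like this (just a sketch; Tenant_Id is a key that my own multi-tenant middleware puts into the Flask session, not something Superset provides):

from flask import session

def tenant_scoped_key(cache_key):
    # Append the tenant ID from the Flask session so cached entries are
    # segregated per tenant; fall back to a shared namespace if no tenant
    # is present (e.g. unauthenticated requests).
    tenant_id = session.get("Tenant_Id", "shared")
    return "{}Tenant:{}".format(cache_key, tenant_id)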
I can see that the keys have their corresponding tenant IDs in the Redis CLI, but caching is not working in Superset!
Is there any sort of configuration I should add, or anything else that I'm missing?
I would really appreciate any kind of help.
Note that I used the docker exec command to enter the container, made my code changes, and committed the container to a new image, which I then use in docker-compose-non-dev.yml.

Related

Gatling: Can individual scenarios be ramped up just like users?

Consider an example of testing APIs with Gatling. Due to an unusual requirement, I had to create a scenario for each user:
var scenarioList // This is of type mutable list
I have plenty of scenarios added to this list, as my request body has to differ for each user or the request won't be processed. These individual scenarios currently have the following Gatling simulation configured:
For example:
scenarioList += scenario1.inject(rampUsers(1) over (1 minutes))
scenarioList += scenario2.inject(rampUsers(1) over (1 minutes))
scenarioList += scenario3.inject(rampUsers(1) over (1 minutes))
... and so on.
Now, in the global setUp, all these scenarios are called as below:
setUp(scenarioList: _*).assertions(
  forAll.successfulRequests.percent.gte(90)
)
Suppose I have 1000 users (the scenarioList size is 1000). The problem here is that all 1000 users would start at the same time, but I want to ramp these users up. So the question becomes one of ramping up the scenarios instead of running them all in parallel.
Is this possible? If not, is there another approach to follow?
I don't have the luxury of running the same scenario with multiple users, as the request body changes. Please let me know.
I was able to solve this problem by using feeders within the scenario, so I don't need to create multiple scenarios.
With feeders, Gatling provides an option to parameterize the request body of any HTTP request.
Code Example:
var randomSession = Iterator.continually(Map("randsession" -> (req.replace("0000000000", randomStringGenerator.randomString(10)))))

val httpConf = http
  .baseURL("http://localhost:5000")
  .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
  .userAgentHeader("Mozilla/4.0(compatible;IE;GACv10.0.0.1)")

val scn = scenario("Activate")
  .feed(randomSession)
  .exec(http("activate request")
    .post("/login/activate")
    .body(StringBody("""${randsession}"""))
    .check(status.is(200)))
  .pause(5)

setUp(
  scn.inject(atOnceUsers(5))
).protocols(httpConf)

Azure SDK Ruby Set Container ACL

I'm using https://github.com/Azure/azure-sdk-for-ruby in my Rails application. I need to set the container policy but I don't know how to create a Signed Identifier instance for the set_container_acl method.
The comments say to pass in an array of "Azure::Entity::SignedIdentifier instances", but when I try to create an instance I get "uninitialized constant Azure::Storage::Entity". I've scoured the net and the documentation and can't find anything about it.
After digging around in the azure gem files, I was able to find a signed identifier file in the service directory. It isn't loaded with azure for some reason, so you have to require it yourself.
require 'azure'
require 'azure/service/signed_identifier'

def some_method
  # Some code here. Create a blobs instance, e.g.:
  # blobs = Azure::Blob::BlobService.new
  sas = Azure::Service::SignedIdentifier.new
  sas.id = identifier
  policy = sas.access_policy
  policy.start = (Time.now - 5 * 60).utc.iso8601
  policy.expiry = (Time.now + 1.years).utc.iso8601
  policy.permission = "r"

  identifiers = [sas]
  options = { timeout: 60, signed_identifiers: identifiers }
  container, signed = blobs.set_container_acl(container_name, "", options)
end

Mongoid caching or multiple requests accessing the Database

I have an application running on Rails 4, using MongoDB and Mongoid to connect the Rails app to MongoDB.
Let me paste in a few lines of my troublemaker code.
def generate_transaction_uuid
  current_time = Time.now
  julian_date = current_time.strftime("%j")
  year = current_time.strftime("%Y")
  date = current_time.strftime("%H")
  channel_identifier = '1'

  last_transaction_random_sequence = TransactionUuid.last.transaction_uuid[-5..-1]
  if last_transaction_random_sequence.to_i == 99999
    running_sequence_numbers = '00000'
  else
    current_sequence_number = (last_transaction_random_sequence.to_i + 1).to_s
    running_sequence_numbers = '0' * (5 - current_sequence_number.length) + current_sequence_number
  end

  uuid = year[-1] + julian_date[-3..-1] + date + channel_identifier + running_sequence_numbers
  TransactionUuid.create(transaction_uuid: uuid)
  uuid
end
The code above generates a transaction UUID. The core of the logic is generating the last 5 digits of the transaction ID, the running sequence number. Currently, the logic is to generate an ID and store it in a 'TransactionUuid' table. When the next request arrives, the last stored UUID is queried from the table, the last 5 digits are extracted from it, and the next sequence number is created. It is again inserted into the DB, and the process goes on.
last_transaction_random_sequence = TransactionUuid.last.transaction_uuid[-5..-1]
if last_transaction_random_sequence.to_i == 99999
  running_sequence_numbers = '00000'
else
  current_sequence_number = (last_transaction_random_sequence.to_i + 1).to_s
  running_sequence_numbers = '0' * (5 - current_sequence_number.length) + current_sequence_number
end
This works fine in most cases and gives a unique transaction ID for each Rails request. But in certain cases it generates duplicate transaction IDs. Exactly why that happens is unclear to me, because from the code it seems evident that the generated transaction IDs should be unique. Also, when I run the code manually it doesn't seem to have a problem.
I inspected the Rails logs to check whether the problem is two simultaneous requests accessing the database at the same time, which is possible. But from the Rails logs, it seems that even requests about 20 seconds apart sometimes generate the same transaction ID. This is what's really confusing me. We use nginx with Passenger to serve our requests.
What exactly might be the issue here: some kind of internal database caching, some totally unrelated weird problem, or maybe even a bug in my code? Any kind of help on this would be much appreciated.

How to set up service method caching in Grails

My application has a couple of services that make external calls via httpClient (GET and POST) whose results are unlikely to change for months, but they are slow, making my application even slower.
Clarification: this is NOT about caching GORM/hibernate/queries to my db.
How can I cache these methods (persistence on disk gets bonus points...) in Grails 2.1.0?
I have installed the grails-cache-plugin, but it doesn't seem to be working, or I configured it wrong (very hard to do, since there are only 2-5 lines to add, but I've managed to do it in the past).
I also tried setting up an nginx proxy cache in front of my app, but when I submit one of my forms with slight changes, I get the first submission as the result.
Any suggestions/ideas will be greatly appreciated.
EDIT: Current solution (based on Marcin's answer)
My Config.groovy (the caching part only):
// caching
grails.cache.enabled = true
grails.cache.clearAtStartup = false
grails.cache.config = {
    defaults {
        timeToIdleSeconds 3600
        timeToLiveSeconds 2629740
        maxElementsInMemory 1
        eternal false
        overflowToDisk true
        memoryStoreEvictionPolicy 'LRU'
    }
    diskStore {
        path 'cache'
    }
    cache {
        name 'scoring'
    }
    cache {
        name 'query'
    }
}
The important parts are:
do not clear at startup (grails.cache.clearAtStartup = false)
overflowToDisk=true persists to disk any results beyond maxElementsInMemory
maxElementsInMemory=1 keeps the number of elements held in memory to a minimum
the 'diskStore' path should be writable by the user running the app
The Grails Cache Plugin works quite well for me under Grails 2.3.11. The documentation is pretty neat, but just to show you a draft...
I use the following settings in Config.groovy:
grails.cache.enabled = true
grails.cache.clearAtStartup = true
grails.cache.config = {
    defaults {
        maxElementsInMemory 10000
        overflowToDisk false
        maxElementsOnDisk 0
        eternal true
        timeToLiveSeconds 0
    }
    cache {
        name 'somecache'
    }
}
Then, in the service I use something like:
@Cacheable(value = 'somecache', key = '#p0.id.toString().concat(#p1)')
def serviceMethod(Domain d, String s) {
    // ...
}
Notice that the somecache part is reused. Also, it was important to use a String as the key in my case; that's why I used toString() on the id.
The plugin can be also set up to use disk storage, but I don't use it.
If it doesn't help, please provide more details on your issue.
This may not help, but if you upgrade the application to Grails 2.4.x you can use the @Memoize annotation. This will automagically cache the results of each method call based upon the arguments passed into it.
In order to store this "almost static" information, you could use Memcached or Redis as a cache system. (There are many others.)
These two cache systems allow you to store key-value data (in your case something like "key_GET": JSON, XML, a map, or a String).
Here is a related post: Memcached vs. Redis?
Regards.

zipline backtesting using non-US (European) intraday data

I'm trying to get zipline working with non-US, intraday data, that I've loaded into a pandas DataFrame:
                        BARC    HSBA     LLOY     STAN
Date
2014-07-01 08:30:00  321.250  894.55  112.105  1777.25
2014-07-01 08:32:00  321.150  894.70  112.095  1777.00
2014-07-01 08:34:00  321.075  894.80  112.140  1776.50
2014-07-01 08:36:00  321.725  894.80  112.255  1777.00
2014-07-01 08:38:00  321.675  894.70  112.290  1777.00
I've followed the moving-averages tutorial here, replacing "AAPL" with my own symbol code and the history calls with "1m" data instead of "1d".
Then I do the final call using algo_obj.run(DataFrameSource(mydf)), where mydf is the dataframe above.
However, all sorts of problems arise related to TradingEnvironment. According to the source code:
# This module maintains a global variable, environment, which is
# subsequently referenced directly by zipline financial
# components. To set the environment, you can set the property on
# the module directly:
# from zipline.finance import trading
# trading.environment = TradingEnvironment()
#
# or if you want to switch the environment for a limited context
# you can use a TradingEnvironment in a with clause:
# lse = TradingEnvironment(bm_index="^FTSE", exchange_tz="Europe/London")
# with lse:
# the code here will have lse as the global trading.environment
# algo.run(start, end)
However, using the context doesn't seem to fully work. I still get errors, for example stating that my timestamps are before the market open (and indeed, looking at trading.environment.open_and_close, the times are for the US market).
My question is, has anybody managed to use zipline with non-US, intra-day data? Could you point me to a resource and ideally example code on how to do this?
N.B. I've seen there are some tests on GitHub that seem related to the trading calendars (tradingcalendar_lse.py, tradingcalendar_tse.py, etc.), but these appear to only handle data at the daily level. I would need to fix:
open/close times
reference data for the benchmark
and probably more ...
I've got this working after fiddling around with the tutorial notebook. Code sample below. It uses the DataFrame mid, as described in the original question. A few points bear mentioning:
Trading calendar: I create one manually and assign it to trading.environment, using non_trading_days from tradingcalendar_lse.py. Alternatively, you could create one that fits your data exactly (though that could be a problem for out-of-sample data). There are two fields that you need to define: trading_days and open_and_closes.
sim_params: there is a problem with the default start/end values because they aren't timezone aware, so you must create a sim_params object and pass start/end parameters with a timezone.
Also, run() must be called with the argument overwrite_sim_params=False, as calculate_first_open/close raise timestamp errors.
I should mention that it's also possible to pass pandas Panel data, with the fields open, high, low, close, price and volume in the minor_axis. But in that case those fields are mandatory, otherwise errors are raised (a rough sketch of that layout is included at the end of this post).
Note that this code only produces a daily summary of the performance. I'm sure there must be a way to get the result at a minute resolution (I thought this was set by emission_rate, but apparently it's not). If anybody knows, please comment and I'll update the code.
Also, I'm not sure what the API call is to invoke 'analyze' (i.e. when using the %%zipline magic in IPython, as in the tutorial, the analyze() method gets called automatically; how do I do this manually?).
import pytz
import pandas as pd
from datetime import datetime
from zipline.algorithm import TradingAlgorithm
from zipline.utils import tradingcalendar
from zipline.utils import tradingcalendar_lse
from zipline.finance.trading import TradingEnvironment
from zipline.api import order_target, record, symbol, history, add_history
from zipline.finance import trading

def initialize(context):
    # Register 2 histories that track minute prices,
    # one with a 10-minute window and one with a 30-minute window
    add_history(10, '1m', 'price')
    add_history(30, '1m', 'price')
    context.i = 0

def handle_data(context, data):
    # Skip first 30 mins to get full windows
    context.i += 1
    if context.i < 30:
        return

    # Compute averages
    # history() has to be called with the same params
    # from above and returns a pandas dataframe.
    short_mavg = history(10, '1m', 'price').mean()
    long_mavg = history(30, '1m', 'price').mean()

    sym = symbol('BARC')

    # Trading logic
    if short_mavg[sym] > long_mavg[sym]:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(sym, 100)
    elif short_mavg[sym] < long_mavg[sym]:
        order_target(sym, 0)

    # Save values for later inspection
    record(BARC=data[sym].price,
           short_mavg=short_mavg[sym],
           long_mavg=long_mavg[sym])

def analyze(context, perf):
    perf["pnl"].plot(title="Strategy P&L")

# This is needed to handle the correct calendar. Assume that market data has the right index for tradeable days.
# Passing in env_trading_calendar=tradingcalendar_lse doesn't appear to work, as it doesn't implement open_and_closes.
trading.environment = TradingEnvironment(bm_symbol='^FTSE', exchange_tz='Europe/London')

# mid is the intraday price DataFrame from the question.
# trading.environment.trading_days = mid.index.normalize().unique()
trading.environment.trading_days = pd.date_range(
    start=mid.index.normalize()[0],
    end=mid.index.normalize()[-1],
    freq=pd.tseries.offsets.CDay(holidays=tradingcalendar_lse.non_trading_days))

trading.environment.open_and_closes = pd.DataFrame(
    index=trading.environment.trading_days,
    columns=["market_open", "market_close"])
trading.environment.open_and_closes.market_open = (
    trading.environment.open_and_closes.index + pd.to_timedelta(60 * 7, unit="T")).to_pydatetime()
trading.environment.open_and_closes.market_close = (
    trading.environment.open_and_closes.index + pd.to_timedelta(60 * 15 + 30, unit="T")).to_pydatetime()

from zipline.utils.factory import create_simulation_parameters
sim_params = create_simulation_parameters(
    # Bug in code doesn't set tz if start/end are not specified
    # (finance/trading.py: SimulationParameters.calculate_first_open[close])
    start=pd.to_datetime("2014-07-01 08:30:00").tz_localize("Europe/London").tz_convert("UTC"),
    end=pd.to_datetime("2014-07-24 16:30:00").tz_localize("Europe/London").tz_convert("UTC"),
    data_frequency="minute",
    emission_rate="minute",
    sids=["BARC"])

# Create algorithm object passing in the initialize and handle_data functions
algo_obj = TradingAlgorithm(initialize=initialize,
                            handle_data=handle_data,
                            sim_params=sim_params)

# Run algorithm
perf_manual = algo_obj.run(mid, overwrite_sim_params=False)  # overwrite == True calls calculate_first_open[close] (see above)
@Luciano: You can add analyze(None, perf_manual) at the end of your code to run the analyze process automatically.
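As a side note on the Panel option mentioned above, here is a rough sketch of the layout I mean (illustrative names and random data; it requires an old pandas version that still ships pd.Panel, which is what zipline of this era depends on):

import numpy as np
import pandas as pd

# Build one DataFrame per sid; its columns become the Panel's minor_axis fields.
fields = ["open", "high", "low", "close", "price", "volume"]
index = pd.date_range("2014-07-01 08:30", periods=5, freq="2T", tz="UTC")
df_barc = pd.DataFrame(np.random.rand(len(index), len(fields)),
                       index=index, columns=fields)

# items = sids, major_axis = timestamps, minor_axis = fields
panel = pd.Panel({"BARC": df_barc})

# Then run the algorithm against the Panel instead of the DataFrame:
# perf = algo_obj.run(panel, overwrite_sim_params=False)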
