How to assert the record count in Couchbase in a Gatling simulation - scala-gatling

I am trying to verify the records that get created while running a Gatling simulation test.
My scenario is: read JSON data from a CSV file and publish it to Kafka, where it is consumed by a microservice that stores the data in Couchbase.
Since Kafka publishes messages asynchronously, there is no way to know how many records actually got created in the database.
Is there any way in Gatling to read the data back from Couchbase and assert on it, so that the simulation fails if the record count in Couchbase does not equal the number of requests?
val scn = scenario("Order test scenario")
  .feed(csv("TestOrder.csv").circular)
  .exec(ProducerBuilder[Array[Byte], Array[Byte]]())

setUp(scn.inject(atOnceUsers(count))).protocols(kafkaProtocol)
//.assertion(getCouchbaseOrderCount == count) // not supported by Gatling

I have resolved this issue by using a tearDown hook in the simulation.
Below is the tearDown code for Gatling:
after {
  println("**************** Asserting scenario *****************")
  assert(orderCount() == count)
}

def orderCount(): Int = {
  val cluster = openConnection
  val bucket: Bucket = cluster.openBucket("Order", "password")
  println(bucket)
  val query = "SELECT meta().id FROM `Order`"
  // Give the asynchronous consumer a moment to persist the records
  Thread.sleep(1000)
  val orderCount: Int = bucket.query(N1qlQuery.simple(query)).allRows().size()
  println(" Order Count :: " + orderCount)
  orderCount
}

Related

How to control whether a resource exists in Terraform?

I am building, with Terraform, a Cloud Scheduler job that will hit a service deployed in Cloud Run. Because the service and the scheduler are deployed in different pipelines, it is possible that the service does not yet exist when the scheduler pipeline runs. This is why I am using a data "google_cloud_run_service" block to retrieve the service and then control, via count in the scheduler block, whether the job is created. It is here that I can't find the right syntax to use in count.
data "google_cloud_run_service" "run-service" {
name = "serv-${var.project_env}"
location = var.region
}
resource "google_cloud_scheduler_job" "job" {
count = length(data.google_cloud_run_service.run-service.status)
name = "snap-job-${var.project_env}"
description = "Call write API on my service"
schedule = "* * * * *"
time_zone = "Etc/UTC"
http_target {
http_method = "GET"
uri = "${data.google_cloud_run_service.run-service.status[0].url}/write"
oidc_token {
service_account_email = google_service_account.sa_scheduler.email
}
}
depends_on = [google_app_engine_application.app]
}
The above length(data.google_cloud_run_service.run-service.status) control has no effect, and Terraform tries to create the scheduler even though no service is defined.
I have also tried other variations with similar results, such as length(data.google_cloud_run_service.run-service.status[0]) > 0 ? 1 : 0.
Other options that I tried do not work either, failing with different errors:
data.google_cloud_run_service.run-service ? 1 : 0 fails with: data.google_cloud_run_service.run-service is object with 9 attributes; The condition expression must be of type bool
data.google_cloud_run_service.run-service.status[0].url ? 1 : 0 fails with: data.google_cloud_run_service.run-service is object with 9 attributes; The condition expression must be of type bool

TensorFlow Federated: How to map remote workers to remote datasets in iterative_process.next?

I would like to point federated_train_data to remote client data, as shown in the code below. Is this possible? How?
If not, what further implementation is required for me to try this out? Kindly point me to the relevant code.
factory = tff.framework.create_executor_factory(make_remote_executor)
context = tff.framework.ExecutionContext(factory)
tff.framework.set_default_context(context)

state = iterative_process.initialize()
state, metrics = iterative_process.next(state, federated_train_data)

def make_remote_executor(inferred_cardinalities):
  """Make remote executor."""

  def create_worker_stack(ex):
    ex = tff.framework.ThreadDelegatingExecutor(ex)
    return tff.framework.ReferenceResolvingExecutor(ex)

  client_ex = []
  num_clients = inferred_cardinalities.get(tff.CLIENTS, None)
  if num_clients:
    print('Inferred that there are {} clients'.format(num_clients))
  else:
    print('No CLIENTS placement provided')

  for _ in range(num_clients or 0):
    channel = grpc.insecure_channel('{}:{}'.format(FLAGS.host, FLAGS.port))
    remote_ex = tff.framework.RemoteExecutor(channel, rpc_mode='STREAMING')
    worker_stack = create_worker_stack(remote_ex)
    client_ex.append(worker_stack)

  federating_strategy_factory = tff.framework.FederatedResolvingStrategy.factory(
      {
          tff.SERVER: create_worker_stack(tff.framework.EagerTFExecutor()),
          tff.CLIENTS: client_ex,
      })
  unplaced_ex = create_worker_stack(tff.framework.EagerTFExecutor())
  federating_ex = tff.framework.FederatingExecutor(federating_strategy_factory,
                                                   unplaced_ex)
  return tff.framework.ReferenceResolvingExecutor(federating_ex)
This is from https://github.com/tensorflow/federated/blob/master/tensorflow_federated/python/examples/remote_execution/remote_executor_example.py
In the linked example, you can see that the client data comes from a per-client tf.data.Dataset generated by the make_federated_data function.
Client data can be supplied in the form of a serializable tf.data.Dataset or, depending on how you're defining your iterative process, you can tff.federated_map some input data (such as client IDs) to datasets using TensorFlow.
Note that RemoteExecutors are not designed to run against data "on clients", that is, on the remote executor itself. They could perhaps be used this way using TensorFlow code to read data from the remote executor's filesystem into a dataset, but in general this is not a supported use-case. The recommended way to handle client data is to have a TensorFlow computation that can generate a tf.data.Dataset representing the client data based on a client ID or other input to the client's TensorFlow computation.
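To make the tff.federated_map approach a bit more concrete, here is a minimal sketch, assuming a TFF version comparable to the one in the linked example; build_dataset_for_client, datasets_from_ids and the toy data are hypothetical placeholders rather than part of that example:

import tensorflow as tf
import tensorflow_federated as tff

# Hypothetical TF computation: given a client ID, build that client's dataset.
# A real loader would read the client's shard from wherever its data lives.
@tff.tf_computation(tf.string)
def build_dataset_for_client(client_id):
  del client_id  # placeholder: a real implementation would key off the ID
  return tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0]).batch(2)

# Federated computation: map per-client IDs (placed at CLIENTS) to datasets.
@tff.federated_computation(tff.FederatedType(tf.string, tff.CLIENTS))
def datasets_from_ids(client_ids):
  return tff.federated_map(build_dataset_for_client, client_ids)

Depending on how the iterative process is defined, this kind of mapping would typically live inside the process itself, so that only lightweight client IDs are passed in from the driver script while the dataset construction runs as TensorFlow on the workers.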

Increasing workers causes Dataflow job to hang on TextIO.Write - executes quickly using DirectRunner - Apache Beam

This program ingests records from a file, parses and saves the records to the database, and writes failure records to a Cloud Storage bucket. The test file I'm using only creates 3 failure records - when run locally the final step parseResults.get(failedRecords).apply("WriteFailedRecordsToGCS", TextIO.write().to(failureRecordsPath)); executes in milliseconds.
In Dataflow I am running the process with 5 workers. The process hangs indefinitely on the write step even after writing the 3 failure records successfully. I can see that it is hanging in the step WriteFailedRecordsToGCS/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Pair with random key.out0
Can anyone let me know why this behaves so differently between DirectRunner and Dataflow? The whole pipeline is below.
StageUtilizationDataSourceOptions options = PipelineOptionsFactory.fromArgs(args).as(StageUtilizationDataSourceOptions.class);

final TupleTag<Utilization> parsedRecords = new TupleTag<Utilization>("parsedRecords") {};
final TupleTag<String> failedRecords = new TupleTag<String>("failedRecords") {};

DrgAnalysisDbStage drgAnalysisDbStage = new DrgAnalysisDbStage(options);
HashMap<String, Client> clientKeyMap = drgAnalysisDbStage.getClientKeys();

Pipeline pipeline = Pipeline.create(options);
PCollectionTuple parseResults = PCollectionTuple.empty(pipeline);

PCollection<String> records = pipeline.apply("ReadFromGCS", TextIO.read().from(options.getGcsFilePath()));

if (FileTypes.utilization.equalsIgnoreCase(options.getFileType())) {
    parseResults = records
        .apply("ConvertToUtilizationRecord", ParDo.of(new ParseUtilizationFile(parsedRecords, failedRecords, clientKeyMap, options.getGcsFilePath()))
        .withOutputTags(parsedRecords, TupleTagList.of(failedRecords)));

    parseResults.get(parsedRecords).apply("WriteToUtilizationStagingTable", drgAnalysisDbStage.writeUtilizationRecordsToStagingTable());
} else {
    logger.error("Unrecognized file type provided: " + options.getFileType());
}

String failureRecordsPath = Utilities.getFailureRecordsPath(options.getGcsFilePath(), options.getFileType());
parseResults.get(failedRecords).apply("WriteFailedRecordsToGCS", TextIO.write().to(failureRecordsPath));

pipeline.run().waitUntilFinish();

neo4j query running slowly in Python for Windows

I'm developing locally on a Windows 10 machine using Python 2.7. I'm using Neo4j 3.0.5 with the Bolt driver for Python.
Connection string is as follows:
db = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("USERNAME", "PASSWORD"), encrypted=False)
When running queries I'm using the following syntax:
with db.session() as s:
    with s.begin_transaction() as tx:
        results = tx.run(
            "MATCH (a:User{username:{username}}) RETURN a.username",
            {
                "username": username
            })
For some reason there is about a 2 second latency.
I ran the following tests (with slightly different syntax, but exactly the same result) and the slow elements seem to be loading the database driver and then the FIRST run of the session.
from neo4j.v1 import GraphDatabase
import timeit

start_time = 0
elapsed = 0
task = ""

def startTimer(taskIn):
    global task
    global start_time
    task = taskIn
    start_time = timeit.default_timer()

def endTimer():
    global task
    global start_time
    global elapsed
    elapsed = round(timeit.default_timer() - start_time, 3)
    print(task, elapsed)

startTimer("GraphDatabase.driver")
db = GraphDatabase.driver("bolt://localhost:7687", auth=("USERNAME", "PASSWORD"))
endTimer()

startTimer("db.session")
session = db.session()
endTimer()

startTimer("query1")
result = session.run(
    "MATCH (a:User{username:{username}}) RETURN a.username, a.password_hash ",
    {"username": "Pingu"})
endTimer()

startTimer("query2")
result = session.run(
    "MATCH (a:User{username:{username}}) RETURN a.username, a.password_hash ",
    {"username": "Pingu"})
endTimer()

startTimer("db.close")
session.close()
endTimer()
The results were as follows:
('GraphDatabase.driver', 1.308)
('db.session', 0.0)
('query1', 1.017)
('query2', 0.001)
('db.close', 0.009)
The string is the test step and the number is the number of seconds for execution.
I'm developing a Flask API and so I can get past the database driver load time by loading it once and then referencing the loaded instance.
However, I can't seem to get past the query1 issue.
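For reference, here is a minimal sketch of the "load the driver once and reference it" workaround described above, assuming the same neo4j 1.x Bolt driver and a Flask app; the /user route and the start-up warm-up query are hypothetical additions that simply pay the slow first session run at boot time instead of on the first API request:

from flask import Flask
from neo4j.v1 import GraphDatabase

app = Flask(__name__)

# Create the driver once at module level so the ~1.3 s driver start-up cost
# is paid when the Flask app boots, not on every request.
db = GraphDatabase.driver("bolt://localhost:7687", auth=("USERNAME", "PASSWORD"))

# Hypothetical warm-up: run a throwaway query at start-up so the slow first
# session run happens before any real request arrives.
with db.session() as warmup:
    warmup.run("RETURN 1").consume()

@app.route("/user/<username>")
def get_user(username):
    with db.session() as s:
        result = s.run(
            "MATCH (a:User{username:{username}}) RETURN a.username",
            {"username": username})
        records = list(result)
        return records[0]["a.username"] if records else ("Not found", 404)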
Running the exact same code on an Ubuntu Server VirtualBox VM is lightning fast, so this seems to be something to do with the Windows implementation.
Any ideas how this can be resolved please?
Thank you very much!

How to fetch records set with a ttl of -1 in aerospike?

I have a lot of records in Aerospike, and I want to fetch the records whose TTL is -1. Please provide a solution.
Just to clarify, setting a TTL of -1 in the client means never expire (equivalent to a default-ttl of 0 in the server's aerospike.conf file), while setting a TTL of 0 in the client means inherit the default-ttl for this namespace.
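As a small illustration of those client-side TTL semantics, here is a sketch using the Python client (the host, namespace, set, keys and bins are made up):

import aerospike

# Hypothetical local node.
client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

# TTL of -1: this record never expires, regardless of the namespace default-ttl.
client.put(('test', 'foo', 'user1'), {'name': 'pingu'}, meta={'ttl': -1})

# TTL of 0: this record inherits the namespace's default-ttl.
client.put(('test', 'foo', 'user2'), {'name': 'robby'}, meta={'ttl': 0})

client.close()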
With Predicate Filtering:
If you're using the Java, C, C#, or Go clients, the easiest way to identify the records with a void time of 0 would be to use a predicate filter.
In the Java app:
Statement stmt = new Statement();
stmt.setNamespace(params.namespace);
stmt.setSetName(params.set);
stmt.setPredExp(
    PredExp.recVoidTime(),
    PredExp.integerValue(0),
    PredExp.integerEqual()
);
RecordSet rs = client.query(null, stmt);
Without Predicate Filtering:
With other clients that don't yet have predicate filtering (Python, PHP, etc), you would do it all through a stream UDF. The filtering logic would have to live inside the UDF.
ttl.lua
local function filter_ttl_zero(rec)
  local rec_ttl = record.ttl(rec)
  if rec_ttl == 0 then
    return true
  end
  return false
end

local function map_record(rec)
  local ret = map()
  for i, bin_name in ipairs(record.bin_names(rec)) do
    ret[bin_name] = rec[bin_name]
  end
  return ret
end

function get_zero_ttl_recs(stream)
  return stream : filter(filter_ttl_zero) : map(map_record)
end
In AQL:
$ aql
Aerospike Query Client
Version 3.12.0
C Client Version 4.1.4
Copyright 2012-2017 Aerospike. All rights reserved.
aql> register module './ttl.lua'
OK, 1 module added.
aql> AGGREGATE ttl.get_zero_ttl_recs() on test.foo
Alternatively, you could run the stream UDF from the client. The following example is for the Python client:
import aerospike
import pprint

config = {'hosts': [('127.0.0.1', 3000)],
          'lua': {'system_path': '/usr/local/aerospike/lua/',
                  'user_path': '/usr/local/aerospike/usr-lua/'}}
client = aerospike.client(config).connect()
pp = pprint.PrettyPrinter(indent=2)

query = client.query('test', 'foo')
query.apply('ttl', 'get_zero_ttl_recs')
records = query.results()
# we expect a dict (map) whose keys are bin names,
# each with the associated bin value
pp.pprint(records)
client.close()
