How do you cache database queries with Google Cloud Run?

I have a server on GCR and it pings a db when called. I was thinking of just adding a simple caching mechanism like:
var lastDBUpdate int64 // bumped by the endpoints that modify the db
var lastCache int64    // when cachedResults was last refreshed

if lastDBUpdate > lastCache {
    lastCache = now
    cachedResults = queryDB() // refresh from the db
    return cachedResults
}
return cachedResults
This would work if there were only one container (i.e. while my backend has little load), but as my app grows and multiple containers are created, the lastDBUpdate and lastCache variables will be out of sync among the different containers. So how can I cache db reads with GCR?

You can use Memorystore.
Here is a guide on how to connect to a Redis instance from Cloud Run.
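For illustration, a minimal sketch (in Java, assuming the Jedis client) of read-through caching against a shared Memorystore/Redis instance. The class name, the queryDatabase() helper, and the REDIS_HOST/REDIS_PORT environment variables are placeholders; reaching Memorystore from Cloud Run also requires a Serverless VPC Access connector.
import redis.clients.jedis.Jedis;

public class CachedReads {
    private static final int TTL_SECONDS = 60;

    // Assumed to point at the Memorystore instance reachable through the VPC connector.
    private final Jedis jedis = new Jedis(
            System.getenv("REDIS_HOST"),
            Integer.parseInt(System.getenv("REDIS_PORT")));

    public String getResults(String cacheKey) {
        String cached = jedis.get(cacheKey); // shared by every container instance
        if (cached != null) {
            return cached;
        }
        String fresh = queryDatabase();
        jedis.setex(cacheKey, TTL_SECONDS, fresh); // TTL bounds staleness
        return fresh;
    }

    public void invalidate(String cacheKey) {
        // Endpoints that modify the db delete the key instead of tracking
        // lastDBUpdate in per-container memory.
        jedis.del(cacheKey);
    }

    private String queryDatabase() {
        return "..."; // placeholder for the real db read
    }
}
In a real service you would use a JedisPool (or another pooled client) rather than a single Jedis instance, since Jedis instances are not thread-safe; either way the cache state lives in Redis, so it stays consistent across containers.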

Related

Resolve MX records for a domain using dnsjava in Kubernetes/Docker

I'm trying to resolve MX records in a Kubernetes pod.
The dnsjava library works when tested on macOS and Ubuntu outside of a container, but returns an empty array once deployed.
What needs to be available in k8s or the docker image for this to work?
See https://github.com/dnsjava/dnsjava
EDIT 1
Record[] records;
try {
    records = new Lookup(mailDomain, Type.MX).run();
} catch (TextParseException e) {
    throw new IllegalStateException(e);
}
if (records != null && records.length > 0) {
    for (final Record record : records) {
        MXRecord mx = (MXRecord) record;
        // do something with mx...
    }
} else {
    log.warn("Failed to determine MX record for {}", mailDomain);
}
The log.warn branch is always executed in K8s. The Docker image is openjdk:11-jdk-slim, i.e. Debian-based. I just tested on Debian outside of Docker and it worked there as well.
In the end I couldn't get dnsjava to work in Docker/K8s.
I used JNDI directly, following https://stackoverflow.com/a/16448180/400048, and it works without any issues, exactly as given in that answer.
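For reference, a minimal sketch of that JNDI-based MX lookup (the exact code in the linked answer may differ slightly):
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class MxLookup {
    public static Attribute lookupMx(String mailDomain) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        // Use the JDK's built-in DNS provider instead of a third-party resolver.
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(mailDomain, new String[] { "MX" });
            return attrs.get("MX"); // null if the domain has no MX records
        } finally {
            ctx.close();
        }
    }
}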

How to parallelize HTTP requests within an Apache Beam step?

I have an Apache Beam pipeline running on Google Dataflow whose job is rather simple:
It reads individual JSON objects from Pub/Sub
Parses them
And sends them via HTTP to some API
This API requires me to send the items in batches of 75. So I built a DoFn that accumulates events in a list and publishes them via this API once I have 75 of them. This turned out to be too slow, so I thought of executing those HTTP requests in different threads using a thread pool instead.
The implementation of what I have right now looks like this:
private class WriteFn : DoFn<TheEvent, Void>() {
    @Transient lateinit var api: TheApi
    @Transient lateinit var currentBatch: MutableList<TheEvent>
    @Transient lateinit var executor: ExecutorService

    @Setup
    fun setup() {
        api = buildApi()
        executor = Executors.newCachedThreadPool()
    }

    @StartBundle
    fun startBundle() {
        currentBatch = mutableListOf()
    }

    @ProcessElement
    fun processElement(processContext: ProcessContext) {
        val record = processContext.element()
        currentBatch.add(record)
        if (currentBatch.size >= 75) {
            flush()
        }
    }

    private fun flush() {
        val payloadTrack = currentBatch.toList()
        executor.submit {
            api.sendToApi(payloadTrack)
        }
        currentBatch.clear()
    }

    @FinishBundle
    fun finishBundle() {
        if (currentBatch.isNotEmpty()) {
            flush()
        }
    }

    @Teardown
    fun teardown() {
        executor.shutdown()
        executor.awaitTermination(30, TimeUnit.SECONDS)
    }
}
This seems to work "fine" in the sense that data is making it to the API. But I don't know if this is the right approach and I have the sense that this is very slow.
The reason I think it's slow is that when load testing (by sending a few million events to Pub/Sub), it takes up to 8 times longer for the pipeline to forward those messages to the API (which has response times of under 8 ms) than it took my laptop to feed them into Pub/Sub.
Is there any problem with my implementation? Is this the way I should be doing this?
Also... am I required to wait for all the requests to finish in my @FinishBundle method (i.e. by getting the futures returned by the executor and waiting on them)?
You have two interrelated questions here:
Are you doing this right / do you need to change anything?
Do you need to wait in @FinishBundle?
The second answer: yes. But actually you need to flush more thoroughly, as will become clear.
Once your @FinishBundle method succeeds, a Beam runner will assume the bundle has completed successfully. But your @FinishBundle only sends the requests - it does not ensure they have succeeded. So you could lose data that way if the requests subsequently fail. Your @FinishBundle method should actually be blocking and waiting for confirmation of success from TheApi. Incidentally, all of the above should be idempotent, since after finishing the bundle, an earthquake could strike and cause a retry ;-)
So to answer the first question: should you change anything? Just the above. The practice of batching requests this way can work as long as you are sure the results are committed before the bundle is committed.
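To make that concrete, here is a sketch (written against the Beam Java SDK; the Kotlin DoFn above has the same shape) of a flush that records the futures and a finishBundle that blocks on them. The pendingRequests field is an illustrative addition, not part of the original code:
// Sketch: api, currentBatch and executor are assumed to be set up as in the question.
private final List<Future<?>> pendingRequests = new ArrayList<>();

private void flush() {
    final List<TheEvent> payload = new ArrayList<>(currentBatch);
    pendingRequests.add(executor.submit(() -> api.sendToApi(payload)));
    currentBatch.clear();
}

@FinishBundle
public void finishBundle() throws Exception {
    if (!currentBatch.isEmpty()) {
        flush();
    }
    // Block until every request in this bundle has succeeded. If any call failed,
    // get() throws, the bundle fails, and the runner retries it - which is why the
    // API calls should be idempotent.
    for (Future<?> pending : pendingRequests) {
        pending.get();
    }
    pendingRequests.clear();
}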
You may find that doing so will cause your pipeline to slow down, because @FinishBundle happens more frequently than @Setup. To batch up requests across bundles you need to use the lower-level features of state and timers. I wrote up a contrived version of your use case at https://beam.apache.org/blog/2017/08/28/timely-processing.html. I would be quite interested in how this works for you.
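For completeness, a condensed sketch of the state-and-timers pattern described in that post (names like BatchRequestsFn and sendBatchAndWait are placeholders; state requires a keyed input, hence the KV element type):
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

class BatchRequestsFn extends DoFn<KV<String, TheEvent>, Void> {
    private static final int BATCH_SIZE = 75;

    @StateId("buffer")
    private final StateSpec<BagState<TheEvent>> bufferSpec = StateSpecs.bag();

    @StateId("count")
    private final StateSpec<ValueState<Integer>> countSpec = StateSpecs.value();

    @TimerId("stale")
    private final TimerSpec staleSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

    @ProcessElement
    public void process(
            ProcessContext context,
            @StateId("buffer") BagState<TheEvent> buffer,
            @StateId("count") ValueState<Integer> count,
            @TimerId("stale") Timer stale) {
        // Make sure a partially filled batch is eventually flushed.
        stale.offset(Duration.standardMinutes(1)).setRelative();

        buffer.add(context.element().getValue());
        Integer stored = count.read();
        int newCount = (stored == null ? 0 : stored) + 1;
        if (newCount >= BATCH_SIZE) {
            sendBatchAndWait(buffer.read()); // blocking call to the API
            buffer.clear();
            count.clear();
        } else {
            count.write(newCount);
        }
    }

    @OnTimer("stale")
    public void onStale(
            @StateId("buffer") BagState<TheEvent> buffer,
            @StateId("count") ValueState<Integer> count) {
        Integer stored = count.read();
        if (stored != null && stored > 0) {
            sendBatchAndWait(buffer.read());
            buffer.clear();
            count.clear();
        }
    }

    private void sendBatchAndWait(Iterable<TheEvent> events) {
        // Send one HTTP request of up to 75 events and block until it succeeds.
    }
}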
It may simply be that the extremely low latency you are expecting, in the low millisecond range, is not available when there is a durable shuffle in your pipeline.

ASP.NET Core 2.0 application running in Linux container suffers from locking issues when there are many calls to Trace.WriteLine()

Our application is an ASP.NET Core 2.0 WebAPI deployed in Linux Docker containers and running in Kubernetes.
During load testing, we discovered intermittent spikes in CPU usage that our application would never recover from.
We used perfcollect to collect traces from a container so that we could compare a successful test and a test with CPU spikes. We discovered that around 75% of the CPU time in the failing test was spent in JIT_MonReliableEnter_Portable, a lock-related runtime function. The caller was System.Diagnostics.TraceSource.dll.
Our application was ported from .NET Framework and contained a lot of calls to System.Diagnostics.Trace.WriteLine(). When we removed all of these, our CPU/memory usage reduced by more than 50% and we don't see the CPU spikes anymore.
I want to understand the cause of this issue.
In the corefx repo, I can see that a default trace listener is set up in TraceInternal.cs:
public static TraceListenerCollection Listeners
{
    get
    {
        InitializeSettings();
        if (s_listeners == null)
        {
            lock (critSec)
            {
                if (s_listeners == null)
                {
                    // In the absence of config support, the listeners by default add
                    // DefaultTraceListener to the listener collection.
                    s_listeners = new TraceListenerCollection();
                    TraceListener defaultListener = new DefaultTraceListener();
                    defaultListener.IndentLevel = t_indentLevel;
                    defaultListener.IndentSize = s_indentSize;
                    s_listeners.Add(defaultListener);
                }
            }
        }
        return s_listeners;
    }
}
I can see that DefaultTraceListener.cs calls Debug.Write():
private void Write(string message, bool useLogFile)
{
    if (NeedIndent)
        WriteIndent();

    // really huge messages mess up both VS and dbmon, so we chop it up into
    // reasonable chunks if it's too big
    if (message == null || message.Length <= InternalWriteSize)
    {
        Debug.Write(message);
    }
    else
    {
        int offset;
        for (offset = 0; offset < message.Length - InternalWriteSize; offset += InternalWriteSize)
        {
            Debug.Write(message.Substring(offset, InternalWriteSize));
        }
        Debug.Write(message.Substring(offset));
    }

    if (useLogFile && !string.IsNullOrEmpty(LogFileName))
        WriteToLogFile(message);
}
In Debug.Unix.cs, I can see that there is a call to SysLog:
private static void WriteToDebugger(string message)
{
    if (Debugger.IsLogging())
    {
        Debugger.Log(0, null, message);
    }
    else
    {
        Interop.Sys.SysLog(Interop.Sys.SysLogPriority.LOG_USER | Interop.Sys.SysLogPriority.LOG_DEBUG, "%s", message);
    }
}
I don't have a lot of experience working with Linux but I believe that I can simulate the call to SysLog by running the following command in the container:
logger --socket-errors=on 'SysLog test'
When I run that command, I get the following response:
socket /dev/log: No such file or directory
So it looks like I can't successfully make calls to SysLog from the container. If this is indeed what is going on when I call Trace.WriteLine(), why is it causing locking issues in my application?
As far as I can tell, EnvVar_DebugWriteToStdErr is not set in my container so it should not be attempting to write to StdErr.
The cause may be that rsyslog is not running. Is it installed in your container? Use a base image that has rsyslog built in.
This link can help too.

How to check whether neo4j is running or not from PHP?

I want to send an email when my db is down, but I don't know how to check from PHP whether neo4j is running or not. I am using the neoxygen neoclient library to connect to neo4j. Is there any way to do this? I am using neo4j 2.3.2.
As neo4j is operated through an HTTP REST interface, you just need to check whether the appropriate host is reachable:
if (#fopen("http://localhost:7474/db/data/","r")) {
// database is up
}
(assuming it's running on localhost)
a) Upgrade to graphaware neo4j-php-client; neoxygen has been deprecated for months and was ported there more than a year ago.
b) You can just do a try/catch on a query:
try {
    $result = $client->run('RETURN 1 AS x');
    if (1 === $result->firstRecord()->get('x')) {
        // db is running
    }
} catch (\Exception $e) {
    // db is not running or the connection cannot be made
}

Neo4j BatchInserter initializing Db on restart

I am using Neo4j BatchInserters to insert nodes in the db, and LuceneBatchInserterIndexProvider for indexes. I am importing data from multiple files. If my process breaks, I want to be able to restart it from the next file. But whenever I restart the process, it creates a new db in the graph folder and new indexes. My initialization code looks like this:
Map<String, String> config = new HashMap<String, String>();
config.put("neostore.nodestore.db.mapped_memory", "2G");
config.put("batch_import.keep_db", "true");
BatchInserter db = BatchInserters.inserter("ttl.db", config);
BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(
db);
index = indexProvider.nodeIndex("ttlIndex",
MapUtil.stringMap("type", "exact"));
index.setCacheCapacity(URI_PROPERTY, indexCache + 1);
Can somebody please help here?
To provide more details: I have multiple files (around 400) that I want to import into Neo4j.
I want to divide my process into batches, and after every batch I want to restart the process.
I used the neo4j batch inserter config batch_import.keep_db = "true". This does not clear the graph, but after a restart the indexer has lost its information. I have this method to check for node existence; I am sure I created the node before the restart.
private Long getNode(String nodeUrl)
{
    IndexHits<Long> hits = index.get(URI_PROPERTY, nodeUrl);
    if (hits.hasNext()) { // node exists
        return hits.next();
    }
    return null;
}
