Clone an Elasticsearch Docker container

I have the following setup:
one ES Docker container (live)
one ES Docker container (working)
I want the working container to run some data changes and save them inside the ES application (these changes will run for a few hours).
After the changes are done, I want to copy the working container (with all its data) and have it replace the live container.
That way I can run the changes over a few hours without any downtime on live (or with minimal downtime).
But I don't know how to "copy" the container including all of its data.
Thank you for your hints.

The Elasticsearch Definitive Guide outlines a process to achieve zero downtime for use cases like yours, making use of Index Aliases.
The idea is to create an Index Alias that your applications will always use to access the live data.
Given an alias named "alias1" that is pointing to an index named "index1", perform the following steps:
Create a new index, named "index2"
Run your batch indexing process
Swap "alias1" to point to "index2"
Clean up "index1"
The alias swapping occurs in a single call, and Elasticsearch performs the action atomically, giving you the zero downtime you desire. The call looks something like this:
POST /_aliases
{
  "actions" : [
    { "remove" : { "index" : "index1", "alias" : "alias1" } },
    { "add" : { "index" : "index2", "alias" : "alias1" } }
  ]
}
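For completeness, a hedged sketch of the initial setup before the batch run (the index and alias names mirror the example above; your applications then always query through the alias rather than the index):
PUT /index1

POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "index1", "alias" : "alias1" } }
  ]
}

GET /alias1/_search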

CDK not updating

Running cdk deploy after updating my Stack:
export function createTaskXXXX (stackScope: Construct, workflowContext: WorkflowContext) {
  const lambdaXXXX = new lambda.Function(stackScope, 'XXXXFunction', {
    runtime: Globals.LAMBDA_RUNTIME,
    memorySize: Globals.LAMBDA_MEMORY_MAX,
    code: lambda.Code.fromAsset(CDK_MODULE_ASSETS_PATH),
    handler: 'xxxx-handler.handler',
    timeout: Duration.minutes(Globals.LAMBDA_DURATION_2MIN),
    environment: {
      YYYY_ENV: (workflowContext.production) ? 'prod' : 'test',
      YYYY_A_LOCATION: `s3://${workflowContext.S3ImportDataBucket}/adata-workflow/split-input/`,
      YYYY_B_LOCATION: `s3://${workflowContext.S3ImportDataBucket}/bdata-workflow/split-input/` // <--- added
    }
  })

  lambdaXXXX.addToRolePolicy(new iam.PolicyStatement({
    effect: Effect.ALLOW,
    actions: ['s3:PutObject'],
    resources: [
      `arn:aws:s3:::${workflowContext.S3ImportDataBucket}/adata-workflow/split-input/*`,
      `arn:aws:s3:::${workflowContext.S3ImportDataBucket}/bdata-workflow/split-input/*` // <---- added
    ]
  }))
I realized that those changes are not reflected in stack.template.json:
...
  "Runtime": "nodejs12.x",
  "Environment": {
    "Variables": {
      "YYYY_ENV": "test",
      "YYYY_A_LOCATION": "s3://.../adata-workflow/split-input/"
    }
  },
  "MemorySize": 3008,
  "Timeout": 120
}
...
I have cleaned cdk.out and tried deploy --force, but I never see any updates.
Is deleting the stack and redeploying the only remaining alternative, or am I missing something? I would think that at least synth should generate different results.
(I also changed to CDK 1.65.0 on my local system to match package.json.)
Thanks.
EDITED: I cloned the project with git, ran npm install and cdk synth again, and finally saw the changes. I would rather not do this every time; any idea what could be blocking the correct synth generation?
EDITED 2: After diffing the bad old project against the fresh clone where synth worked, I realized that some of my project files with a .ts extension (for example cdk.ts, my App definition) also had replicas with .js and .d.ts extensions, such as cdk.js and cdk.d.ts. Could I have run some command by mistake that compiled the TypeScript? I will continue to investigate. Thanks to all for the answers.
Because CDK uses CloudFormation, it computes a ChangeSet to decide what to deploy. That is to say, if it doesn't think anything has changed, it won't change that resource.
This can, of course, be very annoying, as sometimes it decides nothing has changed and doesn't update when there actually is a change. I see this most often with Layers, when using some form of makefile to generate the zips for the layers: even though it produces a "new" zip, whatever it uses to determine whether the zip changed reports it as the same, because of whatever compression/hashing is involved.
You can get around this by putting the current datetime in the description. It's assigned at synth (which is part of cdk deploy), so a fresh now() timestamp makes CloudFormation see the resource as changed on every deployment.
You can also use cdk diff to see what it thinks the changes are.
And finally... always remember to save your file before deployments as, depending on your IDE, it may not be available to the command line ;)
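A minimal sketch of that description trick, assuming a plain CDK Lambda construct (the names and props other than description are placeholders, not taken from the question):
const stampedFn = new lambda.Function(this, 'MyFunction', {
  runtime: lambda.Runtime.NODEJS_14_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'),
  // Stamped at synth time, so CloudFormation always sees a change and updates the resource.
  description: `deployed at ${new Date().toISOString()}`,
});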
Looking at the code I would expect it to update, but I can't tell why it doesn't.
As a workaround, you can comment out the Lambda part once and deploy, then uncomment it and deploy again so the Lambda is recreated.
This is how I do it. Works nicely so far. Basically you can do the following:
Push your lambda code as a zip file to an S3 bucket. The bucket must have versioning enabled.
The CDK code below will do the following:
Create a custom resource. It basically calls s3.listObjectVersions for my lambda zip file in S3. I grab the first returned value, which seems to be the most recent object version all the time (I cannot confirm this with the documentation though). I also create a role for the custom resource.
Create the lambda and specify the code as the zip file in s3 AND THE OBJECT VERSION RETURNED BY THE CUSTOM RESOURCE! That is the most important part.
Create a new lambda version.
Then the lambda's code updates when you deploy the CDK stack!
const versionIdKey = 'Versions.0.VersionId';
const isLatestKey = 'Versions.0.IsLatest'
const now = new Date().toISOString();

const role = new Role(this, 'custom-resource-role', {
  assumedBy: new ServicePrincipal('lambda.amazonaws.com'),
});
role.addManagedPolicy(ManagedPolicy.fromAwsManagedPolicyName('AdministratorAccess')); // you can make this more specific

// I'm not 100% sure this gives you the most recent first, but it seems to be doing that every time for me. I can't find anything in the docs about it...
const awsSdkCall: AwsSdkCall = {
  action: "listObjectVersions",
  parameters: {
    Bucket: buildOutputBucket.bucketName, // s3 bucket with zip file containing lambda code.
    MaxKeys: 1,
    Prefix: LAMBDA_S3_KEY, // S3 key of zip file containing lambda code
  },
  physicalResourceId: PhysicalResourceId.of(buildOutputBucket.bucketName),
  region: 'us-east-1', // or whatever region
  service: "S3",
  outputPaths: [versionIdKey, isLatestKey]
};

const customResourceName = 'get-object-version'
const customResourceId = `${customResourceName}-${now}` // not sure if `now` is necessary...
const response = new AwsCustomResource(this, customResourceId, {
  functionName: customResourceName,
  installLatestAwsSdk: true,
  onCreate: awsSdkCall,
  onUpdate: awsSdkCall,
  policy: AwsCustomResourcePolicy.fromSdkCalls({resources: AwsCustomResourcePolicy.ANY_RESOURCE}), // you can make this more specific
  resourceType: "Custom::ListObjectVersions",
  role: role
})

const fn = new Function(this, 'my-lambda', {
  functionName: 'my-lambda',
  description: `${response.getResponseField(versionIdKey)}-${now}`,
  runtime: Runtime.NODEJS_14_X,
  memorySize: 1024,
  timeout: Duration.seconds(5),
  handler: 'index.handler',
  code: Code.fromBucket(buildOutputBucket, LAMBDA_S3_KEY, response.getResponseField(versionIdKey)), // This is where the magic happens. You tell CDK to use a specific S3 object version when updating the lambda.
  currentVersionOptions: {
    removalPolicy: RemovalPolicy.DESTROY,
  },
});

new Version(this, `version-${now}`, { // not sure if `now` is necessary...
  lambda: fn,
  removalPolicy: RemovalPolicy.DESTROY
})
Do note:
For this to work, you have to upload your lambda zip to S3 before each cdk deploy. It can be the same code as before, but the S3 bucket versioning will create a new object version. I use CodePipeline to do this as part of additional automation.
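For reference, a hedged sketch of that upload step (bucket name, key, and packaging are placeholders; in my setup CodePipeline does the equivalent):
# package the lambda and push it to the versioned bucket before cdk deploy
zip -r lambda.zip index.js node_modules
aws s3 cp lambda.zip s3://my-build-output-bucket/lambda/lambda.zip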

How to add training end callback to AllenNLP config file?

Currently training models using AllenNLP 1.2 and the commands api:
allennlp train -f --include-package custom-exp /usr/training_config/mock_model_config.jsonnet -s test-mock-out
I'm trying to execute a forward pass on a test dataset after training is completed. I know how to add an epoch_callback, but am not sure about the syntax for the end_callback.
In my config.json, I have the following:
{
  ...
  "trainer": {
    ...
    "epoch_callbacks": [{"type": 'log_metrics_to_wandb',},]
  }
  ...
}
I've tried:
"end_callback": [{"type": 'my_custom_function',},]
but got an illegal argument error. Also, I am not sure how I would accurately specify the exact custom function and communicate it to the trainer.
I think you can create a new callback function/object that inherits from TrainerCallback and override the on_end method, and then it should work as expected if you register it the same way as you did log_metrics_to_wandb above.
Just a slightly more complete example for people who are as lost as I was using AllenNLP; this worked for me:
Define the callback, register it, and override whatever method you want to hook:
from allennlp.training.callbacks.callback import TrainerCallback

@TrainerCallback.register("log_metrics_to_wandb")
class LogMetricCallback(TrainerCallback):
    def on_end(self, trainer, metrics, epoch, is_primary=True, **kwargs):
        ...
And add it in the config file under trainer -> callbacks
{
  ...
  "trainer": {
    ...
    "callbacks": [{"type": 'log_metrics_to_wandb',},]
  }
  ...
}
I tested it with version 2.4.0, but according to the documentation it should not have changed much.
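For the original goal of running a forward pass over a test set once training ends, a hedged sketch of such a callback might look like the following; the registration name and what happens inside on_end are assumptions for illustration, not something prescribed by AllenNLP:
from allennlp.training.callbacks.callback import TrainerCallback

@TrainerCallback.register("run_test_forward_pass")  # hypothetical name
class RunTestForwardPass(TrainerCallback):
    def on_end(self, trainer, metrics=None, epoch=None, is_primary=True, **kwargs):
        if not is_primary:
            return
        model = trainer.model
        model.eval()
        # Build or load your test instances here and run them through the model,
        # e.g. with model.forward_on_instances(batch_of_instances).
Then reference it in the config under trainer -> callbacks as {"type": "run_test_forward_pass"}.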

Circle CI replace appsettings.json

I am starting to look at CircleCI to build my projects. At the moment we are using Octopus Deploy, but we want to use something new.
Today we have an appsettings file, e.g. "Appsettings.json".
It has a structure like this:
"ConnectionStrings": {
"DatabaseConnectionString": "MyLocalConnectionString",
"MessageBusConnectionString": "MyLocalConnectionString2"
},
"MessageBus": {
"Sqs": {
"DefaultQueu" : "LocalTestQueu",
"ErrorQueu": "LocalErrorQueu"
}
},
...
I want to replace all the values with new ones.
E.g. DefaultQueu is the name of the key, and I want its value LocalTestQueu to be changed to "MyProductionQueu".
For example, the key in CircleCI would be something like:
MessageBus.Sqs.DefaultQueu = MyProductionQueu
and
ConnectionStrings.DatabaseConnectionString = MyProductionDatabaseConnectionString
How would I do that?
I know there are environment variables, where I can do something like:
"DatabaseConnectionString": "$MyConnectionString"
where it simply string-replaces $MyConnectionString with the real connection string. But that is not what I am looking for.
We have all our local connection strings stored in source control, so we need key/value replacement as described above.
Octopus lets us set something like this up out of the box.
There is no support for this in CircleCI at the moment.

Groovy - How to reflect code from another groovy app to find Classes/Properties/Types within

First, I come from a .NET background, so please excuse my lack of Groovy lingo. Back when I was in a .NET shop, we were using TypeScript with C# to build web apps. In our controllers, we would always receive/respond with DTOs (data transfer objects). This got to be quite the headache: every time you created or modified a DTO, you had to update the TypeScript interface (the d.ts file) that corresponded to it.
So we created a little app (a simple exe) that loaded the webapp's dll, reflected over it to find the DTOs (filtering by specific namespaces), parsed through them to find each class name, its properties, and their data types, generated that information into a string, and finally saved it as a d.ts file.
This app was then configured to run on every build of the website. That way, when you go to run/debug/build the website, it would update your d.ts files automatically - which made working with TypeScript that much easier.
Long story short, how could I achieve this with a Grails Website if I were to write a simple groovy app to generate the d.ts that I want?
-- OR --
How do I get the IDE (ex IntelliJ) to run a groovy file (that is part of the app) that does this generation post-build?
I did find this but still need a way to run on compile:
Groovy property iteration
class Foo {
    def feck = "fe"
    def arse = "ar"
    def drink = "dr"
}

class Foo2 {
    def feck = "fe2"
    def arse = "ar2"
    def drink = "dr2"
}

def f = new Foo()
def f2 = new Foo2()

f2.properties.each { prop, val ->
    if(prop in ["metaClass","class"]) return
    if(f.hasProperty(prop)) f[prop] = val
}

assert f.feck == "fe2"
assert f.arse == "ar2"
assert f.drink == "dr2"
I've been able to extract the Domain Objects and their persistent fields via the following Gant script:
In scripts/Props.groovy:
import static groovy.json.JsonOutput.*

includeTargets << grailsScript("_GrailsBootstrap")

target(props: "Lists persistent properties for each domain class") {
    depends(loadApp)

    def propMap = [:].withDefault { [] }

    grailsApp.domainClasses.each {
        it?.persistentProperties?.each { prop ->
            if (prop.hasProperty('name') && prop.name) {
                propMap[it.clazz.name] << ["${prop.name}": "${prop.getType()?.name}"]
            }
        }
    }

    // do any necessary file I/O here (just printing it now as an example)
    println prettyPrint(toJson(propMap))
}

setDefaultTarget(props)
This can be run via the command line like so:
grails props
Which produces output like the following:
{
  "com.mycompany.User": [
    { "type": "java.lang.String" },
    { "username": "java.lang.String" },
    { "password": "java.lang.String" }
  ],
  "com.mycompany.Person": [
    { "name": "java.lang.String" },
    { "alive": "java.lang.Boolean" }
  ]
}
A couple of drawbacks to this approach are that we don't get any transient properties, and I'm not exactly sure how to hook this into the _Events.groovy eventCompileEnd event.
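As a starting point, a hedged sketch of what such a hook in scripts/_Events.groovy might look like (I have not verified this against every Grails 2.x version; both the include and the target invocation here are assumptions):
// scripts/_Events.groovy -- sketch only
includeTargets << new File("${basedir}/scripts/Props.groovy")

eventCompileEnd = { kind ->
    // run the props target once compilation has finished
    props()
}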
Thanks Kevin! Just wanted to mention that, in order to get this to run, here are a few steps I had to take in my case that I thought I would share:
-> Open up the grails BuildConfig.groovy
-> Change tomcat from build to compile like this:
plugins {
    compile ":tomcat:[version]"
}
-> Drop your Props.groovy into the scripts folder on the root (noting the path to the grails-app folder for reference)
[application root]/scripts/Props.groovy
[application root]/grails-app
-> Open Terminal
gvm use grails [version]
grails compile
grails Props
Note: I was using Grails 2.3.11 for the project I was running this on.
That got everything in the script running successfully for me. Now to modify the println portion to generate TypeScript interfaces.
Will post a github link when it is ready so be sure to check back.

From Jenkins, how do I get a list of the currently running jobs in JSON?

I can find out just about everything about my Jenkins server via the Remote API, but not the list of currently running jobs.
This,
http://my-jenkins/computer/api/json
or
http://my-jenkins/computer/(master)/api/json
Would seem like the most logical choices, but they say nothing (other than the count of jobs) about which jobs are actually running.
There is often confusion between jobs and builds in Jenkins, especially since jobs are often referred to as 'build jobs'.
Jobs (or 'build jobs' or 'projects') contain configuration that describes what to run and how to run it.
Builds are executions of a job. A build contains information about the start and end time, the status, logging, etc.
See https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project for more information.
If you want the jobs that are currently building (i.e. have one or more running builds), the fastest way is to use the REST API with XPath to filter on colors that end with _anime, like this:
http://jenkins.example.com/api/xml?tree=jobs[name,url,color]&xpath=/hudson/job[ends-with(color/text(),%22_anime%22)]&wrapper=jobs
will give you something like:
<jobs>
<job>
<name>PRE_DB</name>
<url>http://jenkins.example.com/job/my_first_job/</url>
<color>blue_anime</color>
</job>
<job>
<name>SDD_Seller_Dashboard</name>
<url>http://jenkins.example.com/job/my_second_job/</url>
<color>blue_anime</color>
</job>
</jobs>
Jenkins uses the color field to indicate the status of the job, where the _anime suffix indicates that the job is currently building.
Unfortunately, this won't give you any information on the actual running build. Multiple instances of the job may be running at the same time, and the running build is not always the last one started.
If you want to list all the running builds, you can also use the REST API to get a fast answer, like this:
http://jenkins.example.com/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds
Will give you something like:
<builds>
  <url>http://jenkins.example.com/job/my_first_job/1412/</url>
  <url>http://jenkins.example.com/job/my_first_job/1414/</url>
  <url>http://jenkins.example.com/job/my_second_job/13126/</url>
</builds>
Here you see a list of all the currently running builds. You will need to parse the URL to separate the job name from the build number. Notice how my_first_job has two builds that are currently running.
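If it helps, a quick sketch of that parsing step (assuming plain job URLs like the ones above, with no folder or multibranch segments):
from urllib.parse import urlparse

def job_and_build(build_url):
    # e.g. http://jenkins.example.com/job/my_first_job/1412/ -> ('my_first_job', 1412)
    parts = urlparse(build_url).path.strip('/').split('/')
    return parts[1], int(parts[2])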
I have a view defined using the View Job Filters Plugin that shows just the currently running jobs; you can then use /api/json on the view page to see just the jobs that are running. I also have one each for aborted, unstable, etc.
UPDATE
Select Edit View → Job Filters → Add Job Filter ▼ → Build Statuses Filter
Build Statuses: ☑ Currently Building
Match Type: Exclude Unmatched - ...
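With such a view in place, the call looks something like this (the view name is a placeholder):
http://jenkins.example.com/view/Currently%20Running/api/json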
Bit of a hack but I think you can infer what jobs are currently running by looking at the color key in the job objects when you do a GET at /jenkins/api/json?pretty=true. If the 'ball' icon for a given job in Jenkins is animated, we know it's running.
Have a look at the array of job objects in the JSON response:
{
  ...
  "jobs" : [
    {
      "name" : "Test Job 1",
      "url" : "http://localhost:8000/jenkins/job/Test%20Job%201/",
      "color" : "blue"
    },
    {
      "name" : "Test Job 2",
      "url" : "http://localhost:8000/jenkins/job/Test%20Job%202/",
      "color" : "blue_anime"
    }
    ...
  ]
}
In this case "color" : "blue_anime" indicates that the job is currently running, and "color" : "blue" indicates that the job is not running.
Hope this helps.
Marshal the output and filter for "building" : true from the following call to the JSON API on a job, using tree to filter out the extraneous stuff (hope this helps):
http://jenkins.<myCompany>.com/job/<myJob>/api/json?pretty=true&depth=2&tree=builds[builtOn,changeSet,duration,timestamp,id,building,actions[causes[userId]]]
will give you something like:
{
  "builds" : [
    {
      "actions" : [
        { },
        {
          "causes" : [
            {
              "userId" : "cheeseinvert"
            }
          ]
        },
        { },
        { },
        { },
        { }
      ],
      "building" : true,
      "duration" : 0,
      "id" : "2013-05-07_13-20-49",
      "timestamp" : 1367958049745,
      "builtOn" : "serverA",
      "changeSet" : { }
    }, ...
You can do this with the jenkins tree api, using an endpoint like this:
http://<host>/api/json?tree=jobs[name,lastBuild[building,timestamp]]
You can see what attributes from lastBuild you can use if you access <job-endpoint>/lastBuild/api/json.
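For example, to see just a few fields for the last build of a single job (host and job name are placeholders):
http://jenkins.example.com/job/my_first_job/lastBuild/api/json?tree=building,timestamp,result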
I had a similar problem where some pipeline builds get stuck in the building state after I restart Jenkins (pipeline jobs are supposed to be durable and resume, but most of the time they get stuck indefinitely).
These builds do not use an executor so the only way to find them is to open every job.
All of the other answers seem to work when the project is considered building, i.e.: the last build is building. But they ignore past builds still building.
The following query works for me and gives me all the currently running builds, i.e.: they do not have a result.
http://localhost:8080/api/xml?tree=jobs[name,builds[fullDisplayName,id,number,timestamp,duration,result]]&xpath=/hudson/job/build[count(result)=0]&wrapper=builds
Nothing worked properly for me, so I copied and modified code from python-jenkins. Since the master node name had changed, it was throwing an exception, and I didn't want to rely on a plugin.
import re
from urllib.parse import urlparse

import jenkins  # python-jenkins; `server` below is a jenkins.Jenkins(...) connection

def get_running_builds():
    builds = []
    nodes = server.get_nodes()
    for node in nodes:
        # the name returned is not the name to lookup when
        # dealing with master :/
        if node['name'] == 'Built-In Node':
            continue
        if node['name'] == 'master':
            node_name = '(master)'
        else:
            node_name = node['name']
        try:
            info = server.get_node_info(node_name, depth=2)
        except jenkins.JenkinsException as e:
            # Jenkins may 500 on depth >0. If the node info comes back
            # at depth 0 treat it as a node not running any jobs.
            if ('[500]' in str(e) and
                    server.get_node_info(node_name, depth=0)):
                continue
            else:
                raise
        for executor in info['executors']:
            executable = executor['currentExecutable']
            if executable and 'number' in executable:
                #print(f'{executable}')
                executor_number = executor['number']
                build_number = executable['number']
                url = executable['url']
                m = re.search(r'/job/([^/]+)/.*', urlparse(url).path)
                job_name = m.group(1)
                builds.append({'name': executable['fullDisplayName'],
                               'number': build_number,
                               'url': url,
                               'node': node_name,
                               'executor': executor_number,
                               'timestamp': executable['timestamp']})
    return builds
The timestamp field gives the time in milliseconds.
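A short usage sketch (the URL and credentials are placeholders; server has to exist before calling the function above):
server = jenkins.Jenkins('http://jenkins.example.com', username='me', password='my-api-token')
for build in get_running_builds():
    print(build['name'], build['url'])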
