I'm trying to convert a predicted RasterFrameLayer in RasterFrames into a GeoTiff file after training a machine learning model.
When using the demo data Elkton-VA from RasterFrames, it works fine.
But when using a cropped Sentinel 2A tif with an NDVI index (normalized from -1000 to 1000), it fails with a NullPointerException in the toRaster step.
It seems to be caused by the NoData values outside the ROI.
The test data is here: geojson and log.
GeoTrellis version: 3.3.0
RasterFrames version: 0.9.0
import geotrellis.proj4.LatLng
import geotrellis.raster._
import geotrellis.raster.io.geotiff.{MultibandGeoTiff, SinglebandGeoTiff}
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
import geotrellis.raster.render.{ColorRamp, ColorRamps, Png}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql._
import org.locationtech.rasterframes._
import org.locationtech.rasterframes.ml.{NoDataFilter, TileExploder}

object ClassificationRaster extends App {

  def readTiff(name: String) = GeoTiffReader.readSingleband(getClass.getResource(s"/$name").getPath)

  def readMtbTiff(name: String): MultibandGeoTiff = GeoTiffReader.readMultiband(getClass.getResource(s"/$name").getPath)

  implicit val spark = SparkSession.builder()
    .master("local[*]")
    .appName(getClass.getName)
    .withKryoSerialization
    .getOrCreate()
    .withRasterFrames

  import spark.implicits._

  val filenamePattern = "xiangfuqu_202003_mask_%s.tif"
  val bandNumbers = "ndvi".split(",").toSeq
  val bandColNames = bandNumbers.map(b ⇒ s"band_$b").toArray
  val tileSize = 256

  val joinedRF: RasterFrameLayer = bandNumbers
    .map { b ⇒ (b, filenamePattern.format(b)) }
    .map { case (b, f) ⇒ (b, readTiff(f)) }
    .map { case (b, t) ⇒ t.projectedRaster.toLayer(tileSize, tileSize, s"band_$b") }
    .reduce(_ spatialJoin _)
    .withCRS()
    .withExtent()

  val tlm = joinedRF.tileLayerMetadata.left.get
  // println(tlm.totalDimensions.cols)
  // println(tlm.totalDimensions.rows)

  joinedRF.printSchema()

  val targetCol = "label"

  val geojsonPath = "/Users/ethan/work/data/L2a10m4326/zds/test.geojson"
  spark.sparkContext.addFile(geojsonPath)
  import org.locationtech.rasterframes.datasource.geojson._

  val jsonDF: DataFrame = spark.read.geojson.load(geojsonPath)
  val label_df: DataFrame = jsonDF
    .select($"CLASS_ID", st_reproject($"geometry", LatLng, LatLng).alias("geometry"))
    .hint("broadcast")

  val df_joined = joinedRF.join(label_df, st_intersects(st_geometry($"extent"), $"geometry"))
    .withColumn("dims", rf_dimensions($"band_ndvi"))

  val df_labeled: DataFrame = df_joined.withColumn(
    "label",
    rf_rasterize($"geometry", st_geometry($"extent"), $"CLASS_ID", $"dims.cols", $"dims.rows")
  )

  df_labeled.printSchema()

  val tmp = df_labeled.filter(rf_tile_sum($"label") > 0).cache()

  val exploder = new TileExploder()
  val noDataFilter = new NoDataFilter().setInputCols(bandColNames :+ targetCol)

  val assembler = new VectorAssembler()
    .setInputCols(bandColNames)
    .setOutputCol("features")

  val classifier = new DecisionTreeClassifier()
    .setLabelCol(targetCol)
    .setFeaturesCol(assembler.getOutputCol)

  val pipeline = new Pipeline()
    .setStages(Array(exploder, noDataFilter, assembler, classifier))

  val evaluator = new MulticlassClassificationEvaluator()
    .setLabelCol(targetCol)
    .setPredictionCol("prediction")
    .setMetricName("f1")

  val paramGrid = new ParamGridBuilder()
    //.addGrid(classifier.maxDepth, Array(1, 2, 3, 4))
    .build()

  val trainer = new CrossValidator()
    .setEstimator(pipeline)
    .setEvaluator(evaluator)
    .setEstimatorParamMaps(paramGrid)
    .setNumFolds(4)

  val model = trainer.fit(tmp)

  val metrics = model.getEstimatorParamMaps
    .map(_.toSeq.map(p ⇒ s"${p.param.name} = ${p.value}"))
    .map(_.mkString(", "))
    .zip(model.avgMetrics)

  metrics.toSeq.toDF("params", "metric").show(false)

  val scored = model.bestModel.transform(joinedRF)

  scored.groupBy($"prediction" as "class").count().show
  scored.show(20)

  val retiled: DataFrame = scored.groupBy($"crs", $"extent").agg(
    rf_assemble_tile(
      $"column_index", $"row_index", $"prediction",
      tlm.tileCols, tlm.tileRows, IntConstantNoDataCellType
    )
  )

  val rf: RasterFrameLayer = retiled.toLayer(tlm)

  val raster: ProjectedRaster[Tile] = rf.toRaster($"prediction", 5848, 4189)

  SinglebandGeoTiff(raster.tile, tlm.extent, tlm.crs)
    .write("/Users/ethan/project/IdeaProjects/learn/spark_ml_learn.git/src/main/resources/easy_b1.tif")

  val clusterColors = ColorRamp(
    ColorRamps.Viridis.toColorMap((0 until 1).toArray).colors
  )

  // val pngBytes = retiled.select(rf_render_png($"prediction", clusterColors)).first // It can output the png.
  // retiled.tile.renderPng(clusterColors).write("/Users/ethan/project/IdeaProjects/learn/spark_ml_learn.git/src/main/resources/classified2.png")
  // Png(pngBytes).write("/Users/ethan/project/IdeaProjects/learn/spark_ml_learn.git/src/main/resources/classified2.png")

  spark.stop()
}
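For reference, here is a quick way to quantify how much of the layer is NoData (a sketch; rf_no_data_cells is one of the standard RasterFrames column functions, and joinedRF and band_ndvi are defined above):

// Sketch: count NoData cells per tile, then sum across the whole layer.
import org.apache.spark.sql.functions.sum
joinedRF
  .select(rf_no_data_cells($"band_ndvi") as "nd")
  .agg(sum($"nd") as "total_nodata_cells")
  .show()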
I suspect there is a bug in the way the toLayer extension method works; I will follow up with a bug report to the RasterFrames project, which will take a little more effort.
Here is a possible workaround that is a little lower level. In this case it writes out 25 non-overlapping GeoTiffs.
import geotrellis.layer.SpatialKey
import geotrellis.spark.ContextRDD
import geotrellis.spark.Implicits._
import geotrellis.store.hadoop.{SerializableConfiguration, _}
import org.apache.hadoop.fs.Path

// Need this to write local files from Spark
val hconf = SerializableConfiguration(spark.sparkContext.hadoopConfiguration)

ContextRDD(
  rf.toTileLayerRDD($"prediction")
    .left.get
    .filter {
      case (_: SpatialKey, null) ⇒ false // remove any null Tiles
      case _ ⇒ true
    },
  tlm)
  .regrid(1024) // regrid the Tiles so that they are 1024 x 1024
  .toGeoTiffs()
  .foreach { case (sk: SpatialKey, gt: SinglebandGeoTiff) ⇒
    val path = new Path(new Path("file:///tmp/output"), s"${sk.col}_${sk.row}.tif")
    gt.write(path, hconf.value)
  }
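If a single GeoTiff is preferred over 25 tiles, the same filtered layer can in principle be stitched into one raster first. A sketch, assuming the whole raster fits in driver memory (stitch collects every tile to the driver) and that the earlier geotrellis.raster._ import and geotrellis.spark implicits are in scope:

import geotrellis.raster.io.geotiff.GeoTiff

val layer = ContextRDD(
  rf.toTileLayerRDD($"prediction")
    .left.get
    .filter {
      case (_: SpatialKey, null) ⇒ false
      case _ ⇒ true
    },
  tlm)

// Mosaic all tiles into a single Raster on the driver, then write one file.
val stitched: Raster[Tile] = layer.stitch()
GeoTiff(stitched, tlm.crs).write("/tmp/output/whole.tif")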
I've been attempting to get cross-compilation to an iOS target working via Nix. My tentative expression starts out like this:
{ iphone ? false }:
let
  crossSystem = if iphone then {
    config = "aarch64-apple-ios";
    sdkVer = "13.7";
    xcodeVer = "11";
    xcodePlatform = "iPhoneOS";
    useiOSPrebuilt = true;
    platform = {};
  } else null;
  pkgs_fn = (import ../repo.nix).nixos_2009;
  cmake = (pkgs_fn {}).cmake;
  pkgs = pkgs_fn { config.allowUnfree = true; crossSystem = crossSystem; };
in
pkgs.stdenv.mkDerivation rec {
  nativeBuildInputs = [
    cmake
  ];
  buildInputs = [
    pkgs.zlib
    # ...
  ];
}
The problem I run into is that the current xcode package in Nixpkgs does not have hashes corresponding to any version I can download from Apple's website. In addition, the versions it nominally supports are extremely old (only the 10.x series and 11).
What's the idiomatic way to get this working? Thanks.
What's the shortest shell.nix equivalent of the following command line?
nix-shell -p "haskell.packages.ghc865.ghcWithPackages (p: [p.ghci-pretty])"
This works, but it's verbose:
# contents of shell.nix file
# run with the following cmd:
# nix-shell shell.nix
{ nixpkgs ? import <nixpkgs> {} }:
let
  inherit nixpkgs;
  inherit (nixpkgs) haskellPackages;
  haskellDeps = a: with a; [
    ipprint
    base
    hscolour
    ghci-pretty
  ];
  ghc = nixpkgs.haskellPackages.ghcWithPackages haskellDeps;
  nixPackages = [
    haskellPackages.cabal-install
    ghc
  ];
in
nixpkgs.stdenv.mkDerivation {
  name = "profile_name";
  buildInputs = nixPackages;
}
You can just copy your command line verbatim like so:
{ pkgs ? import <nixpkgs> {} }:
let
  ghc = pkgs.haskell.packages.ghc865.ghcWithPackages (p: [ p.ghci-pretty ]);
in
pkgs.mkShell {
  buildInputs = [ ghc ];
}
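Saved as shell.nix, a bare nix-shell in the same directory picks it up automatically, so the ghc865 environment with ghci-pretty is one command away:

$ nix-shell
[nix-shell]$ ghci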
The Python Part
I have a Python application with two entry points, json_out and json_in. I can run them both with this default.nix:
with import <nixpkgs> {};
(
  let
    jsonio = python37.pkgs.buildPythonPackage rec {
      pname = "jsonio";
      version = "0.0.1";
      src = ./.;
    };
  in
    python37.withPackages (ps: [ jsonio ])
).env
Like so:
$ nix-shell --run "json_out"
{ "a" : 1, "b" : 2 }
$ nix-shell --run "echo { \"a\" : 1, \"b\" : 2 } | json_in"
keys: a,b
values: 1,2
The System Part
I want to also invoke jq in the nix shell, like this:
$ nix-shell --pure --run "json_out | jq '.a' | json_in"
But I can't, because it is not included. I know that I can add jq to the nix shell using this default.nix:
with import <nixpkgs> {};
stdenv.mkDerivation rec {
  name = "jsonio-environment";
  buildInputs = [ pkgs.jq ];
}
And it works on its own:
$ nix-shell --pure --run "echo { \"a\" : 1, \"b\" : 2 } | jq '.a'"
{ "a" : 1 }
But now I don't have my application:
$ nix-shell --run "json_out | jq '.a'"
/tmp/nix-shell-20108-0/rc: line 1: json_out: command not found
The Question
What default.nix file can I provide that will include both my application and the jq package?
My preferred way to achieve this is to use .overrideAttrs to add additional dependencies to the environment like so:
with import <nixpkgs> {};
(
  let
    jsonio = python37.pkgs.buildPythonPackage rec {
      pname = "jsonio";
      version = "0.0.1";
      src = ./.;
    };
  in
    python37.withPackages (ps: [ jsonio ])
).env.overrideAttrs (drv: {
  buildInputs = [ jq ];
})
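Note that this replaces buildInputs wholesale. If the .env derivation carries buildInputs of its own that you want to keep, a more conservative variant (same structure, just appending) would be:

with import <nixpkgs> {};
(
  let
    jsonio = python37.pkgs.buildPythonPackage rec {
      pname = "jsonio";
      version = "0.0.1";
      src = ./.;
    };
  in
    python37.withPackages (ps: [ jsonio ])
).env.overrideAttrs (drv: {
  # Append jq instead of replacing the existing buildInputs, if any.
  buildInputs = (drv.buildInputs or []) ++ [ jq ];
})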
I needed to:
provide the output of buildPythonPackage as part of the input of mkDerivation, and
omit the env, based on a hint from an error message:
Python 'env' attributes are intended for interactive nix-shell sessions, not for building!
Here's what I ended up with:
with import <nixpkgs> {};
let
  jsonio_installed =
    let
      jsonio_module = python37.pkgs.buildPythonPackage rec {
        pname = "jsonio";
        version = "0.0.1";
        src = ./.;
      };
    in
      python37.withPackages (ps: [ jsonio_module ]);
in
stdenv.mkDerivation rec {
  name = "jsonio-environment";
  buildInputs = [ pkgs.jq jsonio_installed ];
}
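With this default.nix both the application's entry points and jq end up on PATH, so the original pipeline from the question should run:

$ nix-shell --pure --run "json_out | jq '.a' | json_in"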
Background
When I added my first overlay to Nixpkgs, I found that a bunch of system utilities were scheduled to be rebuilt:
these derivations will be built:
/nix/store/028dqnwq36xja16gba3gckq5mcprpn06-postfix-main.cf.drv
/nix/store/b2sch2538ny2drdf9zzknf38grn8d8r3-pcre-8.42.drv
/nix/store/i1k9ksk32ca441zap40z3zddy7bhqx3n-zlib-1.2.11.drv
/nix/store/sypawsb3cwqnnhdl1barv2d8nyvbsxyv-coreutils-8.29.drv
/nix/store/xa4vnajxck2zgvjwp7l71lm11hqnz32r-findutils-4.6.0.drv
...
which is time- and space-consuming. I tried to figure out what was going on, and ended up with this question.
Summary
The idea of self and super in overlays is that self is the accumulated result after all overlays have been applied, while super is the result just before the current overlay is applied.
I thought that an attribute not touched by any overlay would be the same in self and super, but some are not.
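To make that contract concrete, here is a toy fixed point built with lib.fix and lib.extends, the combinators Nixpkgs itself uses to compose overlays (the attribute names a, fromSuper and fromSelf are made up for illustration):

let
  lib = (import <nixpkgs> {}).lib;
  base = self: { a = 1; };
  overlay1 = self: super: {
    fromSuper = super.a; # the value before any later overlay: 1
    fromSelf = self.a;   # the final value after all overlays: 3
  };
  overlay2 = self: super: { a = 3; };
in
  lib.fix (lib.extends overlay2 (lib.extends overlay1 base))
# => { a = 3; fromSelf = 3; fromSuper = 1; }

Here is the check against the real Nixpkgs bash: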
let
  nixpkgs = import <nixpkgs> {
    overlays = [
      # This overlay obtains self.bash and super.bash, and saves them under the
      # attrs "bash-from-self" and "bash-from-super" for further examination
      (self: super: {
        bash-from-self = self.bash;
        bash-from-super = super.bash;
      })
    ];
  };
  # Retrieve bash-from-self (self.bash) and bash-from-super (super.bash)
  # from the overlaid nixpkgs
  inherit (nixpkgs) bash-from-self bash-from-super;
in {
  # Check whether bash-from-self (self.bash) and bash-from-super (super.bash)
  # are the same
  isBashSame = (bash-from-self == bash-from-super);
  inherit bash-from-self bash-from-super;
}
The above evaluates to:
{ isBashSame = false;
bash-from-self = «derivation /nix/store/zvy7mbpxqlplqpflqn5xk9szx25s4mhg-bash-4.4-p23.drv»;
bash-from-super = «derivation /nix/store/2i91sj16snsphvjrbsa62z8m4zhs261c-bash-4.4-p23.drv»; }
This shows that self.bash and super.bash are not the same, even though the bash attribute isn't touched in any overlay. Why is this happening, or am I missing some concept?
Details
More different attributes
Besides bash, there are more attributes that differ:
let
  isAttrSame = attrName:
    let
      nixpkgs = import <nixpkgs> {
        overlays = [
          (_self: _super: { inherit _self _super; })
        ];
      };
      self = nixpkgs._self."${attrName}";
      super = nixpkgs._super."${attrName}";
      isSame = self == super;
    in
      isSame;
in {
  coreutils = isAttrSame "coreutils";
  bash = isAttrSame "bash";
  zsh = isAttrSame "zsh";
  zlib = isAttrSame "zlib";
  stdenv = isAttrSame "stdenv";
  findutils = isAttrSame "findutils";
  gnutar = isAttrSame "gnutar";
  gcc = isAttrSame "gcc";
}
This evaluates to:
{
  bash = false;
  coreutils = false;
  findutils = false;
  gcc = true;
  gnutar = true;
  stdenv = true;
  zlib = false;
  zsh = true;
}
The builder of super.bash is self.bash?
let
  nixpkgs = import <nixpkgs> {
    overlays = [
      (self: super: {
        bash-from-self = self.bash;
        bash-from-super = super.bash;
      })
    ];
  };
  inherit (nixpkgs) bash-from-self bash-from-super;
in {
  bash-from-self-builder = bash-from-self.drvAttrs.builder;
  bash-from-super-builder = bash-from-super.drvAttrs.builder;
  bash-from-self-outPath = bash-from-self.outPath;
  bash-from-super-outPath = bash-from-super.outPath;
}
This evaluates to:
{ bash-from-self-builder = "/nix/store/2ws9cmamvr7xyvdg4d2nnd1bmr1zjrrq-bootstrap-tools/bin/bash";
bash-from-self-outPath = "/nix/store/06z61jbgs0vkw4i9cqqf9yl7zsfkkhw2-bash-4.4-p23";
bash-from-super-builder = "/nix/store/06z61jbgs0vkw4i9cqqf9yl7zsfkkhw2-bash-4.4-p23/bin/bash";
bash-from-super-outPath = "/nix/store/2avim7j13k75k26w18g6br8gai869nm9-bash-4.4-p23"; }
bash-from-self-outPath is the same store path as bash-from-super-builder (06z61...khw2-bash-4.4-p23).
So super.bash is built by self.bash, producing yet another bash (2avim...9nm9-bash-4.4-p23)?
bootstrap-tools/bash --builds--> self.bash --builds--> super.bash
Why this is a problem
I want some of the packages defined in my overlay to depend on bash, coreutils and the like, and I want to use the original versions provided directly by <nixpkgs>, not versions that may be overridden by later overlays. So in this case it seems I should choose super.* rather than self.* for those dependencies.
But some super.stuff is not the original nixpkgs.stuff, which forces rebuilds (there is no binary cache for the modified derivations) and wastes disk space. And self.stuff may be overridden by later overlays. What can I do?
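One direction I can think of (a sketch, not an established idiom): evaluate a second, overlay-free copy of <nixpkgs> and reference that explicitly, so the dependency is pinned to the original derivations rather than to self.* or super.*:

let
  # A second evaluation of nixpkgs with no overlays: these attributes
  # cannot be touched by any overlay, so they match the binary cache.
  original = import <nixpkgs> { overlays = []; };
in
import <nixpkgs> {
  overlays = [
    (self: super: {
      # Hypothetical example: expose the original bash under a new name.
      bash-original = original.bash;
    })
  ];
}

The cost is a second Nixpkgs evaluation, and it only helps if the inner import sees the same nixpkgs and config as the outer one.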