I am getting a NullPointerException that I believe is caused by the way objects are initialised, but I cannot find any supporting documentation.
The following example code illustrates the problem in Scala 2.12.7, and I get the same results in Scala 3.1.3:
```scala
abstract class Item(val collectionName: String)
abstract class ItemCollection(val name: String)

object TechItems extends ItemCollection("tech") {
  // referencing 'name' from 'ItemCollection' superclass
  case object TV extends Item(collectionName = name)
  val items: Map[String, Item] = Map("tv" -> TV)
}

object Test1 extends App {
  // prints 'Some(tech)'
  println(TechItems.items.get("tv").map(_.collectionName))
}

object Test2 extends App {
  // prints 'tech'
  println(TechItems.TV.collectionName)
  // throws NullPointerException
  println(TechItems.items.get("tv").map(_.collectionName))
}
```
When running Test1, the code behaves as you'd expect. When running Test2, however, accessing the map after accessing the TV object directly throws a NullPointerException.
When I no longer reference a field from the superclass, the issue no longer occurs:
```scala
// ...
object TechItems extends ItemCollection("tech") {
  // using String instead of reference to superclass field
  case object TV extends Item(collectionName = "mycollection")
  val items: Map[String, Item] = Map("tv" -> TV)
}
// ...
object Test2 extends App {
  // prints 'mycollection'
  println(TechItems.TV.collectionName)
  // prints 'Some(mycollection)'
  println(TechItems.items.get("tv").map(_.collectionName))
}
```
My current understanding of how TechItems is initialised:
- We access TechItems.TV.collectionName, which begins initialising TechItems.
- An ItemCollection("tech") is created, whose fields are then available inside TechItems (depending on the access modifiers of those superclass fields).
- TV is initialised and references the superclass field name.
- items is initialised and references TV as the value for key "tv".
I am sure that understanding is wrong, but that is what I am here to learn.
My current theory for the NullPointerException:
- We access TechItems.TV.collectionName, which begins initialising TechItems.
- items is initialised alongside TV, but items captures the uninitialised TV as null.
- Our access to TechItems.TV.collectionName returns the value "tech".
- TechItems.items.get("tv") returns Some(null), because TV was still null when items was initialised, as it had not finished initialising yet.
- A NullPointerException is thrown.
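The last two steps of this theory can be checked in isolation. The following minimal sketch (SomeNullDemo and its simplified Item are hypothetical names, not from the question) puts a null value into a Map directly, to mimic the state the theory describes. Map.get wraps whatever is stored, so it returns Some(null) rather than None, and mapping over that Some dereferences null.

```scala
object SomeNullDemo {
  // Simplified stand-in for the question's Item (hypothetical, for illustration).
  final case class Item(collectionName: String)

  // Mimics the broken state: the value stored for "tv" is null instead of an Item.
  val items: Map[String, Item] = Map("tv" -> null)

  def main(args: Array[String]): Unit = {
    // Map.get wraps the stored value as-is, so this is Some(null), not None.
    println(items.get("tv"))

    // Option(x) converts null to None, so this is None.
    println(items.get("tv").flatMap(Option(_)))

    // Mapping over Some(null) calls collectionName on null and throws.
    try println(items.get("tv").map(_.collectionName))
    catch { case e: NullPointerException => println(s"NullPointerException: ${e.getMessage}") }
  }
}
```

Wrapping the lookup in Option(_) turns a stored null into None, but that only hides the symptom; the null itself comes from the initialisation order.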
To me it feels like a somewhat far-fetched theory. I am sure it shows my lack of understanding, and that there is an explanation in some documentation that I have failed to find. Why do I get this NullPointerException? What is the initialisation order? And why does removing the reference to a superclass field affect this initialisation?
Answer:
Wow, this is a good one! Here is what I think is going on ...
Consider this "pseudo-java" code, which I believe more or less accurately reflects what is actually happening in the JVM:
```java
class TechItems extends ItemCollection {
  static MODULE = new TechItems("tech")
  static class TV extends Item {
    static MODULE = new TV(TechItems.MODULE.name)
  }
  val items = Map("tv" -> TV.MODULE)
}
```
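To see the order directly, here is a minimal sketch: the question's classes with logging added. Trace, TraceInitOrder and the log messages are hypothetical additions for illustration only. Based on the analysis in this thread, when TV is accessed first you should see the TechItems body, including the items line, logged before TV's constructor body runs. The exact bytecode differs between Scala versions, and running this inside a REPL or script wrapper may change how nested objects are encoded, so treat the trace as something to inspect rather than a guaranteed result.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: records (and prints) initialization events in order.
object Trace {
  val events: ListBuffer[String] = ListBuffer.empty
  def log(event: String): Unit = {
    events += event
    println(event)
  }
}

abstract class Item(val collectionName: String)
abstract class ItemCollection(val name: String)

object TechItems extends ItemCollection("tech") {
  Trace.log("TechItems: body starts")

  case object TV extends Item(collectionName = name) {
    // Runs after the superclass constructor, i.e. after `name` has been read.
    Trace.log(s"TV: constructed with collectionName=$collectionName")
  }

  val items: Map[String, Item] = Map("tv" -> TV)
  Trace.log(s"TechItems: items = $items")
}

object TraceInitOrder {
  def main(args: Array[String]): Unit = {
    Trace.log("main: accessing TechItems.TV")
    println(TechItems.TV.collectionName)
    Trace.log("main: accessing TechItems.items")
    println(TechItems.items)
  }
}
```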
So, now, when you do print(TechItems.TV.MODULE.collectionName), TechItems.MODULE gets constructed, because we need to pull name out of it to create TV.
This constructor runs up to the Map("tv" -> TV.MODULE) line and puts null into the map (TV.MODULE is still null: we are only figuring out what to pass to its constructor).
If you use "mycollection" instead of name, it becomes static MODULE = new TV("mycollection"), which doesn't trigger the TechItems constructor.
What happens when you don't access TV before looking at items? Well, in that case, TechItems.MODULE gets initialized first, so, by the time you get to the new TV thing, as part of constructing the items, TechItems.MODULE.name is already available, so TV.MODULE can be created and put into the map.
Answer:
Very instructive example indeed, and Dima is absolutely right! In fact, without inspecting the decompiled code, it would be hard to figure out what is happening under the hood. For simplicity, let's assume you just make these 2 calls in order (this reproduces the issue):
```scala
println(TechItems.TV)    // prints 'TV'
println(TechItems.items) // prints 'Map(tv -> null)'
```
Now let's decompile the code and show only the relevant parts (I removed unnecessary code to make it easier to follow). First, these calls:
```java
Predef$.MODULE$.println((Object)Main.TechItems$.TV$.MODULE$);
Predef$.MODULE$.println((Object)Main.TechItems$.MODULE$.items());
```
This was our Main. Now TechItems and TV:
```java
public static class TechItems$ extends ItemCollection {
    public static final TechItems$ MODULE$;
    private static final Map<String, Main.Item> items;

    static {
        MODULE$ = new TechItems$();
        items = (Map)Predef$.MODULE$.Map().apply((Seq)ScalaRunTime$.MODULE$.wrapRefArray(
            (Object[])new Tuple2[] {
                Predef.ArrowAssoc$.MODULE$.$minus$greater$extension(
                    Predef$.MODULE$.ArrowAssoc((Object)"tv"), (Object)TV$.MODULE$)
            }));
    }

    public Map<String, Main.Item> items() {
        return TechItems$.items;
    }

    public TechItems$() {
        super("tech");
    }

    public static class TV$ extends Main.Item implements Product, Serializable {
        public static final TV$ MODULE$;

        static {
            Product.$init$((Product)(MODULE$ = new TV$()));
        }

        public TV$() {
            super(TechItems$.MODULE$.name());
        }
    }
}
```
When our first println statement runs, we trigger the evaluation of TechItems.TV, which translates to TechItems$.TV$.MODULE$. MODULE$ is just a static final reference to TV that gets initialized in TV's static block. To initialize it, the JVM runs that static block, which calls TV's constructor, new TV$(), which in turn triggers the call into TechItems via super(TechItems$.MODULE$.name());
This is the part where it gets interesting: TechItems$.MODULE$ is just the static final reference to TechItems, which has not been referenced yet, so it has not been initialized yet. In the same manner, to initialize it, the JVM runs the static block of TechItems. But this time the static block does more: it has to initialize both TechItems$.MODULE$ and items, because both reside in the same static block.
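Since we are in the middle of initializing TV$.MODULE$, and items requires that same reference, which has not finished initializing, the reference is still null at this point in time, so items is evaluated with TV$.MODULE$ as null. The JVM does not block or fail here: when a thread requests initialization of a class that the same thread is already initializing, the request simply returns, so the code reads the field's default value, null.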
After this, the static block of TechItems$ finishes, then the static block of TechItems.TV finishes, and TV is printed to the console. The second print is then self-explanatory: the call to items() returns TechItems$.items, which was already evaluated during the previous access to TV, so items returns Map(tv -> null), which gets printed.
Observations:
Using case object TV extends Item(collectionName = name) is precisely what triggers the issue. The logical idea is that you do not want to evaluate items before TV finishes evaluation. So you can do one of two things:

- either do not call TV before first calling items (or just TechItems), which will trigger the evaluation of TV and thus the correct initialization of items,
- or (the better solution) delay the evaluation of items as much as possible, until you really need it.

Naturally, the solution to the second point is to make items a lazy val. If we do this, the issue goes away, because items will no longer be evaluated unless we explicitly reference it, so calling just TV no longer triggers it. And if we call items first, it will trigger TV's evaluation first. The decompiled code changes accordingly: items is no longer assigned in the static block; instead, the compiler generates code that initializes it on the first call to items().

Changing it to case object TV extends Item(collectionName = "mycollection") is also a fix. Since TV no longer calls super(TechItems$.MODULE$.name()) at all, evaluating just TV no longer triggers the evaluation of items. TV's constructor becomes super("mycollection"), so the second print correctly evaluates items to Map(tv -> TV). This is why the null goes away when you change it.

This is an example of a circular dependency: TV "kind of" needs items (it reads TechItems.name, and initializing TechItems also evaluates items), and items needs TV. The order of initialization really makes the difference between working code and code that throws nulls at unexpected times. Since TV is initialized lazily, on first use, making items lazy as well removes the cycle from the initialization path. An object definition in Scala behaves much like a lazy val holding an instance of an anonymous class, which gets initialized on demand, the first time it is used.

So the first instinct when you see an object inside another object is to assume the inner object will be initialized lazily (unless explicitly referenced). But because items does reference TV explicitly, even if you don't call TV explicitly, TV will be evaluated when you reference either just TechItems or items directly, whichever comes first, because both are in the same static context, as we saw.
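A minimal sketch of the lazy val fix, using the question's Item and ItemCollection definitions unchanged; itemsBuilt and LazyItemsFix are hypothetical additions for illustration. It accesses TV first, which is the order that broke Test2.

```scala
abstract class Item(val collectionName: String)
abstract class ItemCollection(val name: String)

object TechItems extends ItemCollection("tech") {
  // Hypothetical counter: how many times the `items` initializer has run.
  var itemsBuilt: Int = 0

  case object TV extends Item(collectionName = name)

  // `lazy val`: not computed while TechItems itself is being initialized,
  // only on the first access to `items`, when TV can be fully constructed.
  lazy val items: Map[String, Item] = {
    itemsBuilt += 1
    Map("tv" -> TV)
  }
}

object LazyItemsFix {
  def main(args: Array[String]): Unit = {
    // Access TV first, as in Test2.
    println(TechItems.TV.collectionName)
    println(TechItems.items.get("tv").map(_.collectionName))
  }
}
```

With this change the program should print tech and then Some(tech): initializing TechItems no longer evaluates items, so by the time items is first read, TV is a fully constructed object, and the initializer runs only once.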
