Project

Profile

Help

HostedRedmine.com has moved to the Planio platform. All logins and passwords remained the same. All users will be able to login and use Redmine just as before. Read more...

Task #951985

open

Using granularity for move_fragments causes short int overflow in pathfinding functions.

Added by Lexxie L almost 2 years ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
General
Sprint/Milestone:
-
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Recently I'm challenged by issues in the movement system which, if my problems are tl;dr, go to section 2 to read about the actual bug.

Section 1. Granularity issues in Freeciv movement system.
If your ruleset has some move_fragments value setting like 6, 9, or 12, you get some strange maths phenomena artificially limiting what your ruleset can do. Let's take 9 frags for example, and a unit who has 2mp (18 frags)

Your road types can set their move_costs in such a way to travel the following distances:
[MOVE_COST OF ROAD IN FRAGS]: {TILE DISTANCE}
9 frags: 2 tiles (theoretic case of a road that only speeds movement on rough terrain)
6-8 frags: 3 tiles
5 frags: 4 tiles
4 frags: 5 tiles
3 frags: 6 tiles
2 frags: 9 tiles
1 frag: 18 tiles
0 frag: infinite tiles

For roads where you want distance of 2-9, you have a pretty nice spread of options. But then what happens?
10,11,12,13,14,15,16,17 are all impossible, and then the next jump from 18 is to infinity.

The jump from possible road of 9 distance to 18 distance is very similar to the granularity problems that come from, for example, Warrior A1, next best unit is A2 or DOUBLE the attack. If you're so used to this that you don't blink, imagine an Armor unit where you have to jump from A10 to A20 as the next best unit, and this is what I mean by granularity problem.

The way to solve granularity problems was solved long ago by the ancients, and it's just to increase the number-base you're working with. For example, Clubman A10, Axeman A12, Swordman A15, Horseman A20, etc.

And this is also how to solve the problem with roads. In move_frag 9 system, if my 2mp unit goes 9 tiles down road-type-A and I want to an upgraded road-type-B, I have to double the distance to 18. Then I hit a singularity problem where the next road after that is extreme imbalanced and OP: infinity. Of course, I can choose move_fragments to be a nice higher number, like a highly composite number, and I'm reinventing what the Babylonians and ancient navigators did with the base-60 or base-360, still used in time and directional measurement today!

Section 2. The overflow bug

This is where we start to regret someone thought he's saving 2 nanoseconds or 500 bytes, by using short int move_costs in the pathfinding.c and pftools.c. You see, we actually get less than a short int because there's a thing called

#define PF_TURN_FACTOR 65536

Now our move_fragments for move_cost, moves_left, total_mc, and so are getting multiplied by this 2-byte FACTOR which really means we're using HALF OF A SHORT INT. I will conclude quickly to say this causes int overflow and segfaults when using highly granular move_fragment systems, especially the one that my research revealed to be the most perfect one.

I unfortunately had to hack the entire move system to use long ints everywhere, to fix the segfault, but probably only needs to be done in a few places in pathfinding and pftools. However, when I tried that I must have missed a case and still got segfaults and after a day of hunting it and failing, I thought, F*** this, we have 64-bit CPU for a reason and we're gong to long int at FCW.

You can comment on it how you want but, I can make roads that go any distance from 1 to 120 without a single gap in between, and setting up a fraction reduced GCF and LCD system in client so you never see such crazy big fractions, it's child's play.

So, what will the official upstream response be to the overflow problem for rulesets wanting granularity?

Actions #1

Updated by Marko Lindqvist almost 2 years ago

  • Category set to General

Yeah, even granularity ruleset does not reveal issues on movement granularity yet. Opened a ticket about that: https://osdn.net/projects/freeciv/ticket/45859

Please open future tickets to osdn. It's almost two years since we moved there.

Actions #2

Updated by Marko Lindqvist almost 2 years ago

I had a ruleset (but just one of them) with much increased movement granularity in the latest autogame runner run of S3_1. At least it doesn't seem to cause AI to constantly segfault (or even give any warnings). That makes sense as AI is likely to find targets close enough, so it's not planning routes over very long distances.

So this needs to be reproduced manually (likely easy, but still an additional step)

Actions #3

Updated by Lexxie L almost 2 years ago

It's easy to reproduce. Just set move_fragments really high. Experiment on a FCW server. Why? Because with FCW, goto-pathing is server-side, so you will see in the client with your own eyes what kind of int-overflow data ALL freeciv servers are suffering from, in the internal path_finding. Try a number like 10080 for move_frags. This won't be a long difficult problem to replicate the bug, trust me! Immediate problems instantly.

The issue isn't whether this happens. You can guess it does if we only get to play with half of a short int that because of multiplier called PF_TURN_FACTOR==65536.

The question for me was how to solve it without globally replacing every single variable for move_frags with a long int. PF_TURN_FACTOR is used in ONLY 2 LINES OF CODE. Even so, I globally changed all move_frags to long int after hours of frustration.

end of response

~*~

beginning of fun facts
TL:DR; You can't convince me 10080 is a bad number for move_fragments because it's unreasonably high. Because I've turned into a kind of expert who spent more time than anyone, grappling with multiple problem-sets. Eventually against my first wishes and impressions, I am convinced 10080 is the perfect number of move_frags even though it seems wildly large, counter-intuitive, and a really bad idea (at first impression.) I wanted to hate 10080. But it solves EVERYTHING PERFECTLY.

Long version:
Repeatedly I deal with issues around ruleset balance and exploit prevention in key areas:
movement rates, road-travel-distances, unloading, deboarding, disembarking, load-move-unload-load-move-repeat exploits, and so on. For over two years I do nothing but adding complexities and rules and exceptions to rules and tunings, to meet the ideas and complaints of the player community. Finally I realise in years of becoming expert on these problems, the solution TO ALL OF THEM comes through embracing a Grand-Three-Part Solution. And one of the three pillars is this: a huge-value move_fragments number. And IT'S NOT AS BAD AS IT SOUNDS.

After much careful research and analysing multiple problem areas and scoring candidates on how well they solve all of them, I decided on a weird hybrid. HUGE move_frags but letting the client do fraction reduction with numerator/GCF and divisor/GCF, to not expose the player to crazy fractions, and using a numerological special number that supports it. At the end of the day, 10080 is 99.2% same as the "basis point system" used in financial markets ( ‱). So it's easier for decimal people than 720 or 5540. As far as number theory goes, it's both "highly composite" and "superabundant". Everything is divisible. It literally divides evenly by 1,2,3,4,5,6,7,8,9,10, etc. So any fraction always reduces cleanly into a nice small human-readable fraction, with zero rounding error. Even if you have one road that's 1/3 move, one that's 1/4 move, and one that's 1/5 move, all in the same ruleset. Clean, accurate, always reduces to nice readable fraction WITHOUT rounding error. Injured units also, no rounding error because the hyper-superabundant-divisibility.

(The other two parts of the Grand Three-Part Solution are a new RoadFlag and a new special Effect. With all 3, NIRVANA is finally achieved.)

Here are just a fraction of the many problem sets:
1. A previous problem I never reported, that multiple truncation rounding was creating cases of units losing ridiculous amount of move_frags below what they fairly should have. Slightly injured unit with high move points, doing 2 attacks and suffering small injury each time, can end up erroneously losing more than a full move point from triple-truncation rounding error.
2. The "half-granular-then-infinity problem", where defining the range-distance for a road type can be granular up to a number that is move_frags/2, then the next step up is full move_frags, and the next choice after that is infinity. For example, in a 12-frag ruleset a 1mp unit can travel road-types that give range of 1,2,3,4,6,12, and infinity. Up to 6 is nice and granular, 6-12 very nongranular, then after you don't even get nongranular, you only get infinity as a choice for a faster road.
3. NOT getting strange fractions and rounding errors that sometimes steal or award a unit an extra frag when moving on roads, if the unit is injured, has small move bonus or penalty, or a weird move_rate like 7mp.

Actions #4

Updated by Marko Lindqvist almost 2 years ago

It's the same goto module used by the (desktop) clients and the server, so these problems should show up in the client, and it's a lot easier to run regular client in the debugger than freeciv-web server connected to web-client.

So far I've not encountered any problems (with master HEAD) by simply setting move_frags to 10000. When I (also) set a high move rate for a unit (to increase 'move_frags * move_rate'), I encountered network protocol warning ( https://osdn.net/projects/freeciv/ticket/45950 ) but still no other issues.

Hmm.. the PF_TURN_FACTOR is actually multiplier to the cost, so lets see what increase to that does...

Actions #5

Updated by Marko Lindqvist almost 2 years ago

With all of the movement related values set high, I managed to torture it to fail an assert:

1: in pf_map_path() [../../../../src/common/aicore/path_finding.c::3250]: assertion 'pos->moves_left == param->moves_left_initially' failed.

(The line number won't match version in the repository - I have local logging/debug changes)

Actions #6

Updated by Marko Lindqvist almost 2 years ago

Marko Lindqvist wrote in #note-5:

With all of the movement related values set high, I managed to torture it to fail an assert:

1: in pf_map_path() [../../../../src/common/aicore/path_finding.c::3250]: assertion 'pos->moves_left == param->moves_left_initially' failed.

That one is down to node cost being signed short (i.e. 15 bits). After an overflow wrong cost gets subtracted from moves left. While it's important to have the node structure small (there can be a lot of maps with a lot of nodes), it's not packed --> alignment cause padding after that short. So, sizeof(struct pf_normal_node) with cost as 'signed short' : --> 16, cost as 'signed int' --> 16. Not much saving there ;-)

--

I've now also managed to get some funny routes (round-trip of five turns to the adjacent tile, no ZoC involved) reproduced, so there's more to investigate.

Actions #7

Updated by Marko Lindqvist almost 2 years ago

Marko Lindqvist wrote in #note-6:

Marko Lindqvist wrote in #note-5:

1: in pf_map_path() [../../../../src/common/aicore/path_finding.c::3250]: assertion 'pos->moves_left == param->moves_left_initially' failed.

That one is down to node cost being signed short (i.e. 15 bits).

Split that to https://osdn.net/projects/freeciv/ticket/46039

Just so that it can be fixed already, not waiting when we have time to debug the overall situation more, and that one is known to fix many cases alone.

Actions #8

Updated by Marko Lindqvist over 1 year ago

Marko Lindqvist wrote in #note-6:

I've now also managed to get some funny routes (round-trip of five turns to the adjacent tile, no ZoC involved) reproduced, so there's more to investigate.

That was likely a confusion between high movement value and high movement cost. With the ruleset I used, "single fragment" cost of the river movement was virtually free -> it was not five turns, but five very fast steps.

Overall, I've now played with various values a bit, and there has been no obvious trouble.

Also available in: Atom PDF