Travelling Salesman Problem -- Introduction

JVMTools provides several TSP functions that are extremely powerful and beat the TSP functions that are built in Mathematica by magnitudes, in terms of speed and solution quality. They are so much faster that they allow for the solution of problem sizes that “stock Mathematica” cannot even remotely hope to solve. They also submit several competing algorithms in parallel and return, depending on the user’s choice, either all solutions or only the best solutions, making optimal use of concurrency.

JTravellingSalesman[] to compute a solution of the Travelling Salesman Problem (four methods).
JTravellingSalesmanDelaunay[] to compute a solution of the Travelling Salesman Problem using the Delaunay Triangulation to reduce the number of connectable points.
JTravellingSalesmanMAOS[] to compute a solution of the Travelling Salesman Problem using the MAOS multi-agent optimization system on the JVMTools server.
JPostOpt[] to compute an improved solution of the Travelling Salesman Problem using an existing solution as input (post-optimization).

For example, in the case of the Travelling Salesman Problem, thousands of nodes can be used, whereas a practical limit of the built-in function FindShortestTour[] is a few hundred nodes. They also feature some of the best algorithms known today to solve these problems. For example, MAOS is one of the world-best TSP solvers and has produced the best solutions known today of several benchmarking problems of the TSPLib, some of them to optimality.  Besides, JTravellingSalesmanMAOS[] runs on the JVMTools server, a high-performance AMD Opteron system powered by Fedora 18. The solutions of the server-based computations are also encrypted with 2048 bit cipher strength asymmetric RSA encryption, protecting the confidential data of the user against intercepting traffic sniffers on the internet. (Neither input data nor output results are stored or logged on the JVMTools server.)

JVMTools provides two types of functions for TSP: those that create new solutions from scratch, and those that improve an existing solution.

Initial Methods and Use of Concurrency over Multiple Cores

JVMTools provides seven initial algorithms, FarthestInsert, NearestInsert, NearestNeighbor, DoubleSidedNearestNeighbor, CheapestInsert,  MinimumSpanningTree, and Cristofides, as well as a random method in the multi-threaded TSP functions that use post-optimization and have less than 300 cities. JVMTools also includes the extremely powerful MAOS multi-agent optimization system (http://www.adaptivebox.net/doi/MAOS-TSP, used with permission by the license owners), which, although operating only as a heuristic system as well, provides some of the best solutions to TSP problems that can be obtained. In fact, for almost all TSP problems with only a few thousand nodes the solutions ARE optimal. Two of the NETLib TSP benchmarking problems were first solved to optimality with MAOS.

The function JTravellingSalesman[] accepts an n x 2 array for the edge list together with an n x 1 array for the cost vector and outputs a list with the edges (as pairs of nodes) of the solution. Only one edge weight has to be specified in the input, as the weight of the opposite direction is taken to be the same value. This is sometimes called the "Symmetric Travelilng Salesman Problem". It is, however, not necessary to specify data for all edges that could be placed between the nodes ("complete graph"). In other words, the input graph does not have to be complete, but it is assumed to be symmetric.

Method Options

The initial methods can be called individually:

tspintro_1.gif

or

tspintro_2.gif

as FarthestInsert is the default method, as well as

tspintro_3.gif

tspintro_4.gif

tspintro_5.gif

tspintro_6.gif

tspintro_7.gif

tspintro_8.gif

Usually the FarthestInsert method returns the best initial method solution for just a few cities (up to about 300 cities), which is why it was made the default method. For larger problems FarthestInsert is generally not the best method anymore. However, to obtain really good solutions, post-opt methods have to be applied, even for solutions computed by FarthestInsert, see next subsection. None of the initial methods could ever be good enough as to make post-opt unnecessary. Even the two strong FarthestInsert and Cristofides methods benefit GREATLY from post-opt. It should also be noted, that oftentimes initial methods that produce poor solutions supply the best starting solutions for various post-opt methods. The combination NearestNeighbor / 6NodeSwap oftentimes provides the best solution, although the NearstNeighbor method is oftentimes a very poor initial solution (generally inferior to FarthestInsert and Cristofides). The DoubleSidedNearestNeighbor method is a “two-sided” variant of the Nearest Neighbor method. It generally does NOT provide better solutions than the NearestNeighbor method, but it is significantly faster, and oftentimes provides solutions that are better suited for post-opt.

The following table gives an overview of the run-time complexity of these seven initial methods:

FarthestInsert tspintro_9.gif
NearestInsert tspintro_10.gif
NearestNeighbor tspintro_11.gif
DoubleSidedNearestNeighbor tspintro_12.gif
CheapestInsert tspintro_13.gif
MinimumSpanningTree tspintro_14.gif
Cristofides tspintro_15.gif

In general, if no post-opt is done, FarthestInsert and Cristofides provide the best solutions. The other five methods are only suitable when post-opt is applied, but as post-opt should ALWAYS be done, they were included in JVMTools as these two initial methods usually are NOT the best starting solutions for post-opt.

Although MinimumSpanningTree has only quadratic runtime complexity (like most others), note that its solutions generally are QUITE inferior. The MinimumSpanningTree method ONLY useful for post-opt afterwards.

Output All Solutions or only the Best

With these settings as above, JVMTools uses the default settings for the additional optios Output and Threads, resulting in the submission of as many solutions as there are cores on the user's machine (n), returning only the best solution. The list of nodes is split up into n new node lists that use a multiple of n as the starting node. This guarantees optimal use of all available cores and thereby wasting no computational resource, while still not taking any additional time (compared to submitting just one). The user may want to look at different solutions or get a better solution than with the list of nodes provided by him as the default starting list. In the second case the "solution quality per unit time" in increased as n solutions are taken as candidate solutions for the best solutions that take no longer to compute than only 1. More on the Output and Threads options below.

The Output option specifies if the user wants all solutions returned, or just the best. Best is default, and to get all solutions, the user has to specify All:

tspintro_16.gif

to submit the FarthestInsert method and requesting all solutions (the FarthestInsert method is default).

tspintro_17.gif

to submit the FarthestInsert method and requesting only the best solution (the Output option Best is default).

tspintro_18.gif

to submit the NearestNeighbor method and requesting all solutions

tspintro_19.gif

to submit the NearestNeighbor method and requesting only the best solution.

Specify Number of Concurrent Threads

In the four examples above the Threads option was not set, which means the number of threads used is taken as the number of available cores on the user’s computer. We can request more threads or fewer cores than the user has cores. Using less is never a sensible option, as free threads (equivalent perspective: unused core capacity) remains unused, but by specifying a larger number of threads the quality of the best solution is usually still increased a little bit. However, the run time will at least double as there is at least one thread that has to wait until a core becomes available. In general, any number of threads that is not a multiple of the number of cores means that core capacity remains unused that COULD be used at no additional “cost” (=time).

tspintro_20.gif

to submit the FarthestInsert method and requesting all solutions using 16 threads.

tspintro_21.gif

to submit the FarthestInsert method and requesting only the best solution using 24 threads.

More on the Output and Threads options in the examples below.

Submit Multiple Methods Concurrently

JVMTools allows you to submit multiple methods concurrently. Just specify them as a list to the Method option:

tspintro_22.gif

to submit the FarthestInsert, CheapestInsert, NearestInsert, and MinimumSpanningTree methods concurrently. On a machine with at least 4 cores this will run concurrently.

If the number of cores is a multiple of the number of methods selected, the same methods will be submitted with multiple threads using different starting nodes, as described above. For example, on an 8 core machine

tspintro_23.gif

will submit these four methods with two threads and two different starting nodes each, and

tspintro_24.gif

will submit these two methods with 4 threads and 4 different starting nodes each.

This makes maximum use of parallel hardware while still not increasing the execution time, to generate the maximum number of candidate solutions to pick from.

Other Concurrency Options, Combination with Post-Optimization

JTravellingSalesman[] has three other options that make use of multi-threading:

tspintro_25.gif

tspintro_26.gif

tspintro_27.gif

When the option Concurrent is used, all seven initial method solutions are used in 7 concurrent threads (on an 8-core machine they will therefore run simultaneously). If the Output option is set to All, then all 7 solutions are returned in a list. If it set to Best, which is the default, only the best solution is returned. With seven concurrent methods, it is advisable to use a computer that has at least 8 cores, as computers with 4 and 6 cores would have to necessarily queue up the submissions and process them sequentially. If the user wants to submit only 4 or 6 initial methods concurrently (for example because the computer has less than 8 cores), this can be done as well with the method option that allows for the specification of a list of methods, as described above in the previous section. Concurrent uses all seven.

With the option ConcurrentPostOpt only the best solution of the following four initial/post-opt method combinations is returned (four combinations, three threads):

1. FarthestInsert with 6NodeSwap, CheapestInsert with 6NodeSwap (in one thread, as 6NodeSwap is very fast)
2. NearestNeighbor with 6NodeSwapSingle
3. NearestNeighbor with 6NodeSwapDouble

The solutions returned with the ConcurrentPostOpt method are regularly on the order of 1 - 3 % above optimal, as verified with CPLEX. On a modern 8 GB machine with at least four cores a practical limitation is about 3000 cities. If more cities are used, more than 8 GB memory would be needed, and runtime would exceed one hour.

However, even better solutions can be obtained by calling other post-opt methods in sequence, which may require a bit of experimentation, see next subsection. If the user wants very good solutions

tspintro_28.gif

should be used, which can easily be done with one function call.

This will use six independent threads. Thus, if the user's hardware has at least six cores, they all run in parallel. If less than six cores are available, the threads will be queued accordingly. However, two of the post-opt sequences are only applied if the problem is less than 300 cities. Therefore, if the problem size is at least 300 cities large and only four cores are available, all four post-opt sequences will still run concurrently.


The function JTravellingSalesmanMAOS[] accepts the following parameters:
1. the matrix of node coordinates
2. the number of agents used
3. the size of the maximum learning cycle as termination condition
4. the number of cycles for the best cycle that no longer changes
5. the number of trials
6. the distance function used. The valid distance functions according to the TSPLib specification are: EUC_2D for 2-dimensional distances, rounded to the next integer, GEO for great-circle geo distances, ATT for the AT&T distance function, and CEIL_2D for the 2-dimensional Euclidean metric rounded up.

The computations for JTravellingSalesmanMAOS[] are performed on the JVMTools server. You have to be connected to the internet when running this function. Neither inputs nor output solutions are stored on the JVMTools server.

Simple Example

As a simple introductory example we show a comparison against the solutions of the built-in function FindShortestTour[] from the help browser, but for some power examples see the power examples page. The help browser only uses 10 points (63 coordinate pairs) for the grid construction. To make it more interesting, and to show JVMTools's superiority, we will use 50 here (1547 coordinate points). Already with 50 points Mathematica's function FindShortestTour[] is quite unusable on a problem that was picked for the help browser. As is shown in the table below, one method (GreedyCycle) could not even produce an answer for the relatively small problem with 50 points. For that reason, the Mathematica method FindShortestTour[] was not even considered for the really big examples further down on this page (thousands of nodes, and millions of edges).

This finds all points on an n x n grid with coordinates that are coprime:

tspintro_29.gif

We index the points and store the coordinates:

tspintro_30.gif

tspintro_31.gif

tspintro_32.gif

We compute and store the Euclidean distances for all pairs of points:

tspintro_33.gif

tspintro_34.gif

The method option Concurrent uses your parallel cores to submit solution methods concurrently, returning either all solutions or only the best one, depending on the user’s choice. In addition, all available cores are used to submit not only the seven methods in parallel, but also to submit them with their respective start and end points of the original lists taken at the beginning and in the middle, depending on how many cores are available. For example, on an 8 core machine, all 8 cores are used to compute the seven strategies in parallel.

If you use Method->Concurrent, together with Output->All, all seven initial methods are submitted, returning all 7 solutions:

tspintro_35.gif

tspintro_36.gif

tspintro_37.gif

Although we can see clearly that there can be wide differences between the solutions of the different methods, the quality can usually be improved much by using multithreading over the algorithms when they are submitted with different starting nodes. Here we get the best solution for FarthestInsert, using 8 cores and therefore 8 different starting nodes:

tspintro_38.gif

tspintro_39.gif

Graphics:FormBox[TemplateBox[{Length=, 1116 + 465 Sqrt[2] + 50 Sqrt[5], =, 1885.41}, RowDefault], TraditionalForm]

If you want to see all solutions, you can specify “Output”->”All”, and you will get all 8 solutions:

tspintro_41.gif

tspintro_42.gif

tspintro_43.gif

tspintro_44.gif

tspintro_45.gif

You are guaranteed that all cores are used, if the number of cores is a multiple of 4 (as we have 4 solutions strategies). So if your computer has 4, 8, 12, 16, ... cores, you are guaranteed maximum utilization of your “core capacity”. We show this for 8 cores with the CPU meter and the thread display from VisualVM (using FarthestInsert):

tspintro_46.gif

tspintro_47.gif

Note the 8 threads from the thread pool running concurrently in less than 20 seconds:

tspintro_48.gif

You can set the number of threads to submit explicitly by setting the “Threads”-> n option. You can specify more submissions than you have cores, it just means that not all can run at the same time, some will have to wait until the cores are ready again. In the experience of the author of JVMTools, the solution quality of the best solution is improved a little bit further when the number of threads is twice as the number of available cores, and beyond that the solution quality is generally not increased any more. Here we submit with 16 threads on an 8-core machine:

tspintro_49.gif

tspintro_50.gif

tspintro_51.gif

As you can see the solution quality is slightly better than in the 8 cores example above:

tspintro_52.gif

Graphics:FormBox[TemplateBox[{Length=, 1098 + 478 Sqrt[2] + 48 Sqrt[5], =, 1881.33}, RowDefault], TraditionalForm]

MAOS finds the optimal (!!!) solution in less than 2 seconds (!!!):

tspintro_54.gif

tspintro_55.gif

tspintro_56.gif

tspintro_57.gif

Graphics:FormBox[TemplateBox[{Length=, 1020 + 523 Sqrt[2] + 6 Sqrt[5], =, 1773.05}, RowDefault], TraditionalForm]

The following table compares the solutions and the running times for this problem:

tspintro_59.gif

Note that the optimal solutions returned by MAOS are absolutely not unique. Given that so many segments are of the same length on a rectangular grid, the number of optimal solutions is huge. Here we show just nine solutions that all attain the optimal solution value. If you look closely, you can see that no two solutions are alike. This is due to the random seed and random minimization directions used in the MAOS method:

tspintro_60.gif

tspintro_61.gif

tspintro_62.gif

tspintro_63.gif

tspintro_64.gif

tspintro_65.gif

tspintro_66.gif

tspintro_67.gif

tspintro_68.gif

Post-Optimization Methods

JVMTools provides several very efficient proprietary post-optimization algorithms for the Travelling Salesman Problem that meet industrial-strength needs. Just like the initial methods above, the are written for concurrency and make maximum use of all available cores.

Method Options

tspintro_69.gif

tspintro_70.gif

tspintro_71.gif

tspintro_72.gif

Output All Solutions or Only the Best

JPostOpt[] has the same Output and Threads options as JTravallingSalesman[], explained above. The list of nodes of the tour that is supplied as input in validTSPsolution is split up into n new lists (n is the number of threads supplied by the user, it’s the number of available cores on the user’s machine if not specified, as default) and submitted in parallel. The user can request to receive all solutions by specifying “Output”->”All”, or only the best, by specifying Best for the Output option, or by leaving it out, as Best is the default.

tspintro_73.gif

to submit the 6NodeSwap method and requesting all solutions (the 6NodeSwap method is default).

tspintro_74.gif

to submit the 6NodeSwap method and requesting only the best solution (default).

tspintro_75.gif

to submit the 6NodeSwapSingle method and requesting all solutions.

tspintro_76.gif

to submit the 6NodeSwapSingle method and requesting only the best solution.

Specify Number of Concurrent Threads

In the four examples above the Threads option was not set, which means the number of threads used is taken as the number of available cores on the user's computer. We can request more threads or fewer cores than the user has cores. Using less is never a sensible option, as free threads (equivalent perspective: unused core capacity) remains unused, but by specifying a larger number of threads the quality of the best solution is usually still increased a little bit. However, the run time will at least double as there is at least one thread that has to wait until a core becomes available. In general, any number of threads that is not a multiple of the number of cores means that core capacity remains unused that COULD be used at no additinal “cost” (=time).

tspintro_77.gif

to submit the 6NodeSwap method and requesting all solutions using 16 threads.

tspintro_78.gif

to submit the 6NodeSwap method and requesting only the best solution using 24 threads.

Submit Multiple Methods Concurrently

For the post-opt methods as well, just like for the intial methods, JVMTools allows you to submit multiple methods concurrently. Just specify them as a list to the Method option:

tspintro_79.gif

to submit the 6NodeSwap, 6NodeSwapDouble, 6NodeSwapSingle, and ChainInsertion methods concurrently. On a machine with at least 4 cores this will run concurrently.

If the number of cores is a multiple of the number of methods selected, the same methods will be submitted with multiple threads using different starting nodes, as described above. For example, on an 8 core machine

tspintro_80.gif

will submit these four methods with two threads and two different starting nodes each, and

tspintro_81.gif

will submit these two methods with 4 threads and 4 different starting nodes each.

This makes maximum use of parallel hardware while still not increasing the execution time, to generate the maximum number of candidate solutions to pick from.

Combinations / Repeated Application of Post-Opt Methods

As a rough guideline, CheapestInsert or NearestInsert or FarthestInsert together with 6NodeSwapSingle, ChainInsertion, and 6NodeSwapDouble produced the best solutions of all post-opt method combinations for smaller problems (no more than 500 cities),

tspintro_82.gif

tspintro_83.gif

tspintro_84.gif

and NearestNeighbor together with 6NodeSwap, 6NodeSwapSingle, ChainInsertion, 6NodeSwapDouble or 6NodeSwapDouble, ChainInsertion, 6NodeSwap or ChainInsertion, 6NodeSwapDouble

tspintro_85.gif

tspintro_86.gif

tspintro_87.gif

produced the best solutions of all post-opt method combinations for larger problems (500 - 1500 cities). The best solutions obtained with these post-opt methods (in sequence) regularly produced solutions of the order of 1% above optimal, or better. However, some of these running times can be very long. In particular, the methods ChainInsertion and 6NodeSwapDouble can take 20 minutes and consume 4 GB of data for 1000 cities.

Therefore, JVMTools provides two functions with parallelized post-opt methods for two different purposes. The post-opt method ConcurrentPostOpt in each of its three threads (four method combinations) uses only one post-opt method after an initial method. This provides for very good solutions in very reasonable time (“80/20 rule”). These solutions oftentimes cannot be improved any further by the post-opt methods in JVMTools, including the application of several post-opt methods in sequence. In extensive tests the worst cases were found to be around 5% above optimal, usually 1 - 3% above optimal. But in many cases, in particular in large cases (3000 and more cities) ConcurrentPostOptSequential usually *does* produce solutions that are better than the ones obtained with ConcurrentPostOpt. The post-opt method ConcurrentPostOptSequential implements sequential post-opt methods in parallel and returns the tour with the shortest length (ignoring the other solutions). However, given that they contain ChainInsertion the running times can be *very* long, reaching two hours on sufficiently large problems.

In the following we will frequently compare the JVMTools results with the results obtained with Concorde/CPLEX. As Concorde/CPLEX uses an exact method, we can rely on the fact that its output is guaranteed to be optimal (there is no guarantee when using a heuristic), and we will use the value of the Concorde/CPLEX solution as the optimal solution value. With that optimal solution value we can compute how much above the optimal value our heuristic solutions’ values are. This difference is called “gap” and is usually expressed as “1.1% above optimal” or similar.

Simple Example

Using the coprime numbers example from above, here we call the Travelling Salesman function of JVMTools, using the Cristofides method as initial method (auto-threaded over 8 cores) and then the 6NodeSwap method as post-opt (also auto-threaded over 8 cores).

tspintro_88.gif

tspintro_89.gif

JVMTools needed 13 seconds to compute a solution that is 1.18% above optimal (1773.8). The solution is substantially better than any produced by the built-in methods in FindShortestTour[] and was computed in a fraction of the FindShortestTour[] running time.

Here is the solution:

tspintro_90.gif

Graphics:FormBox[TemplateBox[{Length=, 1031 + 511 Sqrt[2] + 17 Sqrt[5] + Sqrt[10], =, 1794.84}, RowDefault], TraditionalForm]

Note that Cristofides itself can render relatively poor solutions (2014.41 for this problem, as shown above in the Concurrent example). FarthestInsert beats it by 5.4% (1905.67, the first output of the Concurrent example above). But none of the other solutions, including FarthestInsert, produced a starting solution that allowed the post-opt method(s) to compute such a great (final) solution. This underscores that to get truly good solutions, one ALWAYS has to use a combination of initial methdo(s) and post-opt method(s). And that’s why initial methods that generally don’t produce good initial solutions (CheapestInsert, DoubleSidedNearestNeighbor, MinimumSpanningTree, in this example also Cristofides) are still very valuable as they initialize the post-opt stage better.

Here as well we want to see that all cores are used:

tspintro_92.gif

tspintro_93.gif

The first “bump” is for JTravellingSalesman[] (as above), and the second is for JPostOpt[].

tspintro_94.gif

Larger Example (1992 Points)

In the following example we find a Travelling Salesman Tour through all points in the [-300,300] x [-300,300] interval that have integer distance from the origin (excluding all points on the two axes).

tspintro_95.gif

tspintro_96.gif

tspintro_97.gif

tspintro_98.gif

tspintro_99.gif

tspintro_100.gif

tspintro_101.gif

tspintro_102.gif

tspintro_103.gif

tspintro_104.gif

tspintro_105.gif

tspintro_106.gif

The optimal solution has a length of 19080, so JVMTools’ ConcurrentPostOpt is only 3.5% above optimal in a fraction of the running time CPLEX needed to find the optimal solution, and it beats any of the methods built into Mathematica’s FindShortestTour[].

JVMTools provides a single-threaded post-opt method that reduces running time substantially by using the Delaunay Triangulation to include only “nearby” edges of every node in the optimization process, and ignoring the “more distant” edges. Only the edges up to the fourth neighbor of every node are included in the optimization process. This is particularly effective when the set of nodes contains “clusters” of nodes, such as the cities in California, New York, Texas, Illinois, etc. and is less effective when there are no discernible “clusters”, such as in Wisconsin, North Dakota, South Dakota, etc. (see more below in the “Large Examples” section) The solution will likely be not as good as the corresponding solutions returned by the “full” methods (occasionally they are even better!), but the decrease in running time is oftentimes so significant that a slight decrease in the quality of the solution can be acceptable.

tspintro_107.gif

tspintro_108.gif

tspintro_109.gif

tspintro_110.gif

tspintro_111.gif

tspintro_112.gif

tspintro_113.gif

tspintro_114.gif

tspintro_115.gif

This solution is 4% above optimal, but required only 129 seconds running time.

MAOS finds the optimal solution in 8 s (!!!):

tspintro_116.gif

tspintro_117.gif

tspintro_118.gif

tspintro_119.gif

tspintro_120.gif

tspintro_121.gif

The following table compares the solutions and the running times for this problem:

tspintro_122.gif

For much larger examples, please visit the TSP Power Examples page, www.lauschkeconsulting.net/tsppower.html.

Spikey Created with Wolfram Mathematica 9.0