Cartographer grpc crashes when requesting submaps #1867

Open
uahic opened this issue Nov 18, 2021 · 10 comments
Comments

@uahic

uahic commented Nov 18, 2021

I've been evaluating multi-trajectory SLAM via gRPC, with one robot serving as the 'cloud server' (uplink) for another one.
Opening RViz with the map display loads the cartographer-rviz submaps plugin, which makes the grpc_client send a proto requesting a submap from the server. This works on the robot that does NOT act as the cloud server, but it crashes (without any stack trace) on the robot that does.

I'm currently going through the code base to understand more of the details, but up to the point where it touches the pose_graph it looks alright to me.

Potential differences when running a grpc_server as the uplink "master" (aka cloud server) for the 'slave' servers are:

  • Slaves run a LocalTrajectoryUploader
  • The master grpc_server does not have range scan data in its submaps, as this gets filtered out by the slaves before they stream submaps to the master

Other issues I could imagine are conflicting accesses to internal data structures, or missing locks in the threaded code.

Versions
Cartographer is on the current master branch's head
grpc is on v1.10.0
async grpc is on commit 74cbcb37a6713814a1fc928eacbd2e7e3ffb1289
cartographer-project/async_grpc@74cbcb3

@MichaelGrupp is gRPC together with RViz working fine for you? Another question is whether global SLAM optimizations are fed back from the upstream 'master' to its slaves. I can't find that in the code, but I might simply have overlooked it. Thank you very much.

@tristan-schwoerer

tristan-schwoerer commented Nov 23, 2021

Hey, I can't answer your question, but I am currently struggling with the same thing.
I am hosting a server on one machine and registering two robots/trajectories on it. I believe there is simply no feedback from the global optimization on the server back to the robots.
I found this video from ROSCon 2018, https://vimeo.com/293260413, where @MichaelGrupp introduced the cloud computing setup. If I understood it correctly, they feed the remote result back by exporting the pbstream on the server using the write_state service, transferring it to the robots manually, and then localizing in it. It sounds like this is done at intervals and is not really automated by Cartographer.

@uahic
Author

uahic commented Nov 23, 2021

@Tristan9497 I dug into the gRPC code and there is indeed no feedback of the global optimization. The transport isn't hard to do; in fact, it seems all the necessary handlers and protobuf messages are in place (for submaps and the pose graph), and the pose graph class has the corresponding methods to get all of these. However, as far as I could see last week, there are no methods to alter the state of the pose graph nodes. Since the pose graph runs in a separate thread and I don't know how all the data structures are linked, I have held off for the moment on simply inserting new methods that directly manipulate the internals. If you are interested, we could collaborate on this issue.

There is another fork, https://github.com/shreyasgokhale/cartographer_ros (you also need his fork of plain cartographer), which allows you to send the pbstream file to another gRPC server instance. However, as far as I can tell it contains ALL internal data, and every time you do that you would have to relocalize yourself in the existing trajectories. The robots would not really be in the same coordinate system, and the transmission time grows with the size of the recorded map.

@tristan-schwoerer

Hey @uahic, sure, working on this together sounds nice.

Your idea does sound very demanding for the network, although I am convinced it would work really well. The only thing I am concerned about is the localization step.

I was thinking it might be possible to approximate/calculate the current position of the robots by comparing their submaps, meaning the ones from local SLAM and the globally optimized ones. We have very easy access to those, since they are on the ROS network anyway. In the end it is just a little less data than your suggestion, and of course it will grow over time too.

Using the first optimized submap location of both trajectories, we could fairly easily determine the starting positions of the trajectories relative to each other.

Then we could jump to the latest optimized submap, compare it to the first one, and get its position in the "real world". That already gets us very close, depending on the submap size.

The remaining part of the trajectory lies in a not-yet-finished submap, so what is left is the transformation between the latest non-optimized submap and the robot. This should be fairly easy to get, since it is just the local SLAM pose relative to the latest non-optimized submap position.

Adding all of that up should allow us to publish map->odom transforms that are as good as possible without constantly localizing.
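
The chain of transforms described above can be sketched in plain Python. This is only a minimal 2D illustration, not Cartographer or ROS code: `Pose2D`, `map_to_odom`, and all argument names are hypothetical, and it assumes that the pose of the same submap is known both in the local SLAM frame and in the globally optimized frame.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Rigid 2D transform: rotate by theta, then translate by (x, y)."""
    x: float
    y: float
    theta: float

    def __mul__(self, other: "Pose2D") -> "Pose2D":
        # Composition: `other` is applied first, then `self`.
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(
            self.x + c * other.x - s * other.y,
            self.y + s * other.x + c * other.y,
            self.theta + other.theta,
        )

    def inverse(self) -> "Pose2D":
        # Inverse of (R, t) is (R^T, -R^T t).
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(
            -(c * self.x + s * self.y),
            s * self.x - c * self.y,
            -self.theta,
        )


def map_to_odom(global_submap: Pose2D, local_submap: Pose2D,
                local_robot: Pose2D, odom_robot: Pose2D) -> Pose2D:
    """Hypothetical helper: estimate the map->odom transform.

    global_submap: latest optimized submap pose in the map frame (from the server)
    local_submap:  pose of the same submap in the local SLAM frame
    local_robot:   current robot pose in the local SLAM frame
    odom_robot:    current robot pose in the odom frame
    """
    # Correction that moves the local SLAM frame onto the optimized map frame.
    map_from_local = global_submap * local_submap.inverse()
    # Robot pose in the map frame, using only the latest optimized submap.
    map_robot = map_from_local * local_robot
    # map->odom is whatever maps the odom-frame robot pose onto map_robot.
    return map_robot * odom_robot.inverse()
```

The accuracy of the result depends entirely on how recent the optimized submap is; everything after it is covered by local SLAM alone, which matches the "as good as possible without constantly localizing" caveat.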

It sounds a little hacky, though.
Let me know if that makes any sense to you.

@uahic
Author

uahic commented Nov 24, 2021

@Tristan9497 I was referring more to implementing the missing functions of the pose graph class to (simply?) change the pose of an already existing node.

A client sends only the finished submaps of its trajectories to the 'master' (uplink server). After the optimization steps, the poses of multiple nodes of each trajectory have changed and should be communicated back (that part is easy). On the client side, 'only' the modification methods are missing.

Method 1)
The user has to disable global optimization on the clients (that is already possible), and the client's pose graph is then constantly modified by that downlink stream so that it reflects the server's result. So basically the work here would be to understand the pose graph and optimizer classes and allow modifications via new API methods.

Method 2)
In the presence of an uplink server, we replace the instantiation of the pose graph class on the client altogether with the posegraph_stub class. This might cause a larger network load, as data from all trajectories gets sent back, and right now I don't know how much work is left in that stub class.
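
To make Method 1 a bit more concrete, here is a rough Python sketch of what a client-side modification method could look like. Everything in it is an assumption for illustration: `ClientPoseGraph` and `apply_remote_poses` are hypothetical names, not part of Cartographer's API, and a single lock stands in for whatever synchronization the real pose graph would need.

```python
import threading
from typing import Dict, Tuple

NodeId = Tuple[int, int]            # (trajectory_id, node_index)
Pose = Tuple[float, float, float]   # (x, y, theta)


class ClientPoseGraph:
    """Hypothetical stand-in for the client-side pose graph."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._global_poses: Dict[NodeId, Pose] = {}

    def add_node(self, node_id: NodeId, pose: Pose) -> None:
        # Called by local SLAM when a new node is inserted.
        with self._lock:
            self._global_poses[node_id] = pose

    def apply_remote_poses(self, updates: Dict[NodeId, Pose]) -> int:
        """Overwrite the poses of already existing nodes with the
        optimized poses received from the uplink server.

        Nodes the client does not know are ignored. Returns the number
        of nodes that were updated.
        """
        updated = 0
        with self._lock:
            for node_id, pose in updates.items():
                if node_id in self._global_poses:
                    self._global_poses[node_id] = pose
                    updated += 1
        return updated

    def global_pose(self, node_id: NodeId) -> Pose:
        with self._lock:
            return self._global_poses[node_id]
```

A downlink handler would call `apply_remote_poses` with each batch of optimized poses. Holding one lock for the whole batch keeps readers from seeing a half-applied update, at the cost of blocking local SLAM insertions for the duration of the batch.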

@tristan-schwoerer

Oh, I see now what you are trying to do, and I think this is definitely the right way to go. To be honest, I need to spend some time with the Cartographer pose_graph code to actually know what is going on in there. I think this would be an amazing feature to have.

If I have time on the weekend, I will investigate a little.

@adiego73

@Tristan9497 @uahic I am currently facing the same problem (not the crash, but the sync between master and slave). Have you started working on this issue? I am willing to contribute and work with you on this; however, I am not using ROS, just Cartographer standalone.

@bufeng-12

bufeng-12 commented Jan 28, 2022 via email

@uahic
Author

uahic commented Jan 28, 2022

@adiego73 Hi Diego, not yet. The reason is that there is currently no demand for it at my institution (it may come back soon, though!), so I can't work on this issue, at least not during official working hours. Still, I'd consider working on it at a very slow pace. ROS doesn't matter much for this issue (it is basically just an additional wrapper package), and as far as I can see the methods to exchange data via gRPC already exist for virtually all the interesting data structures.

I studied the RFCs of the former Cartographer developer group. They mentioned the desire to implement all of this, but postponed it with some comments that they still had to decide how to implement it, which sounds like it may not be as trivial as it looks (potentially!). I can't (yet) judge the internals of the classes that run the optimization loops asynchronously, but it all really seems to boil down to "how to insert data on the fly without messing up the optimization and constraints". I can't remember all the details from before Christmas, when the knowledge was 'fresh', but I think studying these classes (pose graph optimization or something like that, or just the pose graph?) is really necessary.
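
One generic way to approach "inserting data on the fly" is to queue incoming updates and apply them only between optimization runs, so that the optimizer never sees its inputs change mid-run. The Python sketch below only illustrates that idea under my own assumptions; `DeferredPoseUpdates` and its methods are hypothetical, and I am not claiming this is how the Cartographer classes work or should be changed.

```python
import queue
from typing import Callable, Dict, Tuple

NodeId = Tuple[int, int]            # (trajectory_id, node_index)
Pose = Tuple[float, float, float]   # (x, y, theta)


class DeferredPoseUpdates:
    """Hypothetical holder that defers external pose updates until
    the optimization loop reaches a safe point."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[Tuple[NodeId, Pose]]" = queue.Queue()
        self.poses: Dict[NodeId, Pose] = {}

    def submit(self, node_id: NodeId, pose: Pose) -> None:
        # Safe to call from any thread, e.g. a gRPC handler thread.
        self._pending.put((node_id, pose))

    def run_optimization_step(
        self, optimize: Callable[[Dict[NodeId, Pose]], None]
    ) -> int:
        """Apply everything queued so far, then run one optimization.

        Only the optimization thread calls this, so `poses` is never
        modified while `optimize` is running. Returns the number of
        queued updates that were applied.
        """
        applied = 0
        while True:
            try:
                node_id, pose = self._pending.get_nowait()
            except queue.Empty:
                break
            self.poses[node_id] = pose
            applied += 1
        optimize(self.poses)
        return applied
```

Updates submitted while `optimize` is running simply wait in the queue until the next step. The open question from the RFCs remains: whether externally injected poses are consistent with the constraints the optimizer already holds.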

The first step may be to draw some rough diagrams of the overall architecture and the interactions between the classes. I'm very busy right now, but I will come back to this issue.

@adiego73

Great, I will start by looking at those classes to understand how all this works. Thanks!

@tristan-schwoerer

@adiego73 Hey, sorry, I was not able to work on this and did not get any further, since I was too busy with the project I was working on at the time.


4 participants