portable (save-image) #60

Open
dragoncoder047 opened this issue Oct 6, 2023 · 8 comments

@dragoncoder047

Currently, save-image just compacts the used cells into the smallest possible contiguous block of memory and then dumps the workspace as a binary blob. If you save this blob to an SD card and try to use it on multiple platforms running different revisions of uLisp, it won't work (memory/workspace locations, sizeof(void*) differences, builtin symbol indexes, etc.).

Could a portable format for storing "compressed" images be developed? Then if the save-image source is the SD card, it can be saved to the portable format. That way if I'm developing a program on one microcontroller, get it to work, and then transfer the image to another microcontroller with a different feature set, it will still work.

My idea for a portable format would be to save the bit size of the microcontroller, so pointers can be resized appropriately, and then only save the offset of the object from the beginning of the array, so it can be loaded regardless of what the value of &Workspace[0] is (i.e. instead of writing (uintptr_t)obj, write (((uintptr_t)obj - (uintptr_t)&Workspace[0]) / sizeof(struct sobject))).
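Here is a minimal sketch of that pointer-to-offset conversion. The `object` struct, `WORKSPACESIZE`, and `Workspace` below are simplified stand-ins rather than uLisp's real definitions, and `ptr_to_offset` / `offset_to_ptr` are hypothetical names. Shifting indices by one so that 0 can stand for nil is my own assumption, and it only applies to fields that really hold pointers (not tagged type or number data).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for uLisp's cell type and workspace (assumptions). */
typedef struct sobject {
  struct sobject *car;
  struct sobject *cdr;
} object;

#define WORKSPACESIZE 64
static object Workspace[WORKSPACESIZE];

/* Convert a workspace pointer to a position-independent cell index.
   0 is reserved for nil, so real cells are stored as index + 1. */
uint32_t ptr_to_offset(const object *obj) {
  if (obj == NULL) return 0;
  ptrdiff_t index = obj - Workspace;  /* already scaled by sizeof(object) */
  assert(index >= 0 && index < WORKSPACESIZE);
  return (uint32_t)index + 1;
}

/* Convert a stored cell index back to a pointer into this platform's workspace. */
object *offset_to_ptr(uint32_t offset) {
  if (offset == 0) return NULL;
  assert(offset <= WORKSPACESIZE);
  return &Workspace[offset - 1];
}
```

Because the stored value is a cell index rather than a byte address, the same file could in principle be read back on a platform with a different pointer width or workspace location.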

The other problem is that it will still clobber existing code when it is loaded. If only there were a way to compile a module file on an SD card, and then load only the compiled binary image of the module without clobbering the existing code!

@technoblogy
Owner

Hi, good to hear from you again.

This is a great idea, but I can think of a number of things that would make it very hard to implement, some of which you've identified. One is that the builtin symbols are different between different versions and revisions of uLisp. I wonder if it's worth the effort?

An alternative would be to do (pprintall) to an SD card, and then eval it all back in when you want to load it. That also has the advantage that it doesn't clobber existing code (assuming no name clashes). This could be made simpler with a couple of extra functions.

@dragoncoder047
Author

dragoncoder047 commented Oct 6, 2023

What I was going for is to be able to only store the (compiled, compressed) images on the SD card, and not have to store the source code. Kind of the same way Python reads .py files and writes the compiled bytecode to .pyc files which it uses if the original source hasn't changed. A quirk of this is you can distribute the .pyc file only and it will run fine!

The problem of the builtin symbols being different can be handled by storing every symbol as a non-builtin symbol, i.e. a base40 or long symbol. The loader can then check each symbol and update it if it's a builtin. That way, it can properly report Error: undefined: some-system-specific-thing when the code is moved to a platform that doesn't have some-system-specific-thing built in.
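As a rough sketch of the loader side: resolve each stored name against whatever builtin table the target platform has, and fall back to "not a builtin" otherwise. The `builtin_names` table and `resolve_builtin` function are hypothetical and purely illustrative; uLisp's real lookup table and symbol encoding are different.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical builtin table for one platform; the contents and order
   would differ between uLisp versions, which is the whole problem. */
static const char *const builtin_names[] = { "nil", "car", "cdr", "cons" };
#define BUILTIN_COUNT (sizeof builtin_names / sizeof builtin_names[0])

/* Returns this platform's builtin index for name, or -1 if the name is
   not built in here, so the loader can keep it as an ordinary symbol
   and let the evaluator report it as undefined later. */
int resolve_builtin(const char *name) {
  assert(name != NULL);
  for (size_t i = 0; i < BUILTIN_COUNT; i++) {
    if (strcmp(builtin_names[i], name) == 0) return (int)i;
  }
  return -1;
}
```

The point of the design is that the file never contains a builtin index, only names, so the index is always the one the loading platform would use.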

To be able to handle loading a file and not clobbering the existing workspace, it could also be serialized only in reference to itself. You could probably use something like cl-conspack as a starting point...

@technoblogy
Owner

If you feel like tackling it I think it would be a great addition to uLisp! I'm currently preoccupied with designing a good Lisp screen editor for the T-Deck: LilyGO T-Deck uLisp Machine.

@dragoncoder047
Author

Okay, here's a draft:

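object *resurrect(gfun_t gfun) {
  unsigned char op = gfun();
  switch (op) {
    case PAIR: {
      // Read car, then cdr, in separate statements: the evaluation order
      // of function arguments is unspecified, so cons(f(), f()) could swap them.
      // (first probably also needs GC protection while rest is being read.)
      object *first = resurrect(gfun);
      object *rest = resurrect(gfun);
      return cons(first, rest);
    }
    case STRING: return readstring('\0', gfun);
    case NUMBER: return number(/* read number of bytes of a float */);
    case 0: return nil;
    // add more here etc.
    default: error2(PSTR("corrupted object dump"));
  }
  return nil; // not reached: error2() does not return
}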

The dump function would basically be the reverse. Unfortunately, it would recurse infinitely on a circular object, so I'm going to have to think about how to detect that and either bail or properly set the reference.

@technoblogy
Owner

Create circular objects at your own peril!

@dragoncoder047
Author

I think I figured out how to automatically bail on circular objects:

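bool dump(object *obj, pfun_t pfun) {
  if (obj == nil) {
    pfun(0);
    return true;
  }
  // Already marked means obj is on the current path: circular reference.
  if (marked(obj)) return false;
  int type = obj->type;
  object *first = car(obj);  // save car before mark() sets the mark bit in it
  bool ok = true;
  mark(obj);
  switch (type) {
    case STRING:
      pfun(STRING);
      unmark(obj);  // the printer has to see the unmarked object
      prin1object(obj, pfun);
      break;
    // etc.
    default:
      pfun(PAIR);
      // Stop at the first failure; the cdr is not dumped if the car failed.
      ok = dump(first, pfun) && dump(cdr(obj), pfun);
      break;
  }
  unmark(obj);  // always unmark, whether or not the dump succeeded
  return ok;
}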

It would function like the existing markobject() and stop recursing and return false if it finds a circular reference. It can't throw an error because that would longjmp out and the objects wouldn't get unmarked.

@technoblogy
Owner

Great!

@dragoncoder047
Author

Looking at other object-serialization libraries that support circular objects, I think I came up with a solution that CAN correctly serialize circular objects.

First, there would be a function size_t pointers2(object *obj) that scans the Workspace and counts the number of pointers that point to obj.

The serializer would do a couple of extra checks when it serializes a cons pair:

  • If the object is in the already-seen assoc list (i.e. assoc(obj, seen) is not nil) it would emit a BACKREF opcode with the key value in the list and stop. (the key can just be the memory address of the object itself to guarantee it's unique when the deserializer sees it.)
  • If it is not in the seen list, it would check to see if the object has multiple pointers to it (i.e. pointers2(obj) > 1) and if so it would put it in the list and emit a CIRCULAR_PAIR opcode instead of the usual PAIR.

The good part about this is that it will also preserve object identity -- even if an object is not circular, it will still allow an object to be referenced multiple times. That is, the list '(#1=(a) #1#), while it doesn't directly point to itself and will normally be printed as ((a) (a)), you'll still get ((b) (b)) if you do (setf (caar x) 'b), and this will be preserved when the object is deserialized.
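To make the scheme concrete, here is a self-contained toy sketch of it. Everything here is hypothetical and much simpler than uLisp: the `cell` type, the fixed `pool`, the one-byte atoms, and the `writer` / `reader` structs are stand-ins. One deliberate swap: instead of the memory address, the BACKREF key is the order in which CIRCULAR_PAIR cells were emitted, which the reader can reproduce by counting. Also note this `pointers2` only sees pointers stored inside the pool, so a cell referenced from outside it (e.g. only by a C variable) is undercounted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_NIL, OP_ATOM, OP_PAIR, OP_CIRCULAR_PAIR, OP_BACKREF };

typedef struct cell {
  int is_pair;
  int value;               /* atom payload, 0..255 in this sketch */
  struct cell *car, *cdr;  /* pair fields */
} cell;

#define POOL_SIZE 32
static cell pool[POOL_SIZE];  /* toy stand-in for the Workspace */
static size_t pool_used;

static cell *new_cell(int is_pair, int value, cell *a, cell *d) {
  assert(pool_used < POOL_SIZE);
  cell *c = &pool[pool_used++];
  c->is_pair = is_pair; c->value = value; c->car = a; c->cdr = d;
  return c;
}
cell *make_atom(int v) { return new_cell(0, v, NULL, NULL); }
cell *make_pair(cell *a, cell *d) { return new_cell(1, 0, a, d); }

/* Count the car/cdr fields in the pool that point at obj. */
size_t pointers2(const cell *obj) {
  size_t n = 0;
  for (size_t i = 0; i < pool_used; i++) {
    if (!pool[i].is_pair) continue;
    n += (pool[i].car == obj) + (pool[i].cdr == obj);
  }
  return n;
}

typedef struct {
  uint8_t bytes[128]; size_t len;
  const cell *seen[16]; size_t nseen;
} writer;

static void put(writer *w, uint8_t b) {
  assert(w->len < sizeof w->bytes);
  w->bytes[w->len++] = b;
}

void dump(writer *w, const cell *obj) {
  if (obj == NULL) { put(w, OP_NIL); return; }
  if (!obj->is_pair) { put(w, OP_ATOM); put(w, (uint8_t)obj->value); return; }
  for (size_t i = 0; i < w->nseen; i++) {  /* already emitted: back-reference */
    if (w->seen[i] == obj) { put(w, OP_BACKREF); put(w, (uint8_t)i); return; }
  }
  if (pointers2(obj) > 1) {  /* shared or circular: register before recursing */
    assert(w->nseen < 16);
    w->seen[w->nseen++] = obj;
    put(w, OP_CIRCULAR_PAIR);
  } else {
    put(w, OP_PAIR);
  }
  dump(w, obj->car);
  dump(w, obj->cdr);
}

typedef struct {
  const uint8_t *bytes; size_t pos;
  cell *seen[16]; size_t nseen;
} reader;

cell *resurrect(reader *r) {
  uint8_t op = r->bytes[r->pos++];
  switch (op) {
    case OP_ATOM: return make_atom(r->bytes[r->pos++]);
    case OP_BACKREF: return r->seen[r->bytes[r->pos++]];
    case OP_PAIR:
    case OP_CIRCULAR_PAIR: {
      cell *p = make_pair(NULL, NULL);
      /* Register before filling in car/cdr so inner BACKREFs can find it. */
      if (op == OP_CIRCULAR_PAIR) { assert(r->nseen < 16); r->seen[r->nseen++] = p; }
      p->car = resurrect(r);
      p->cdr = resurrect(r);
      return p;
    }
    default: return NULL;  /* OP_NIL (and, in this sketch, anything unknown) */
  }
}
```

Registering the pair before recursing into its car and cdr, on both the writing and reading side, is what lets a cell refer back to itself.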
