Q: Can I make a vstack from another vstack, including only a subset of the images? What will this do?
A:
The way the vstacks work is quite simple. It's exactly like a regular BDB file, but instead of the binary data being stored in an associated file with a deterministic name, each header has a pointer to the file and location where the binary data resides. That is, for each image within the VSTACK, the data can come from an arbitrary different file. If the other file's data is modified, the VSTACK's data will also appear modified. However, if you ever WRITE data (not metadata, but the image itself) to a VSTACK, then the reference to the other file is removed for that single image, and the data is written to a separate local file referenced in the normal way. You can always write header info to the VSTACK without impacting the original (referenced) file.
So, yes, you can use one VSTACK as a source for a second VSTACK and the binary data reference will get copied into the new VSTACK.
Say you have A,B,C and:
e2bdb.py --makevstack=bdb:.#abc bdb:.#A bdb:.#B bdb:.#C
Then you further copy that:
e2bdb.py --makevstack=bdb:.#xxx bdb:.#abc
Then reading from xxx will read binary data from A, B or C. xxx will no longer have any association with abc at all. Writing metadata to abc or xxx will ONLY affect abc or xxx, not A, B or C, and leaves the image data still referencing A, B or C. Writing image data to abc affects ONLY abc: xxx will still reference the original image data in A, B or C. Note that reading an image from xxx, modifying its header, then writing the image back to xxx counts as writing the image data, even if the data hasn't changed. To write metadata only, use a function designed to write only the header, not the whole image.
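If it helps to see the rules above in one place, here is a toy model in plain Python. It does not use EMAN2 at all: the Stack and VStack classes, their method names, and the string "images" are all made up for illustration, and the assumption that headers are copied along with the data pointers is mine. It only mimics the behaviour described above (pointers are copied, header writes stay local, data writes detach a single image).

```python
# Toy model only: plain Python objects standing in for BDB stacks.
# None of these classes or methods exist in EMAN2.

class Stack:
    """A regular stack that owns its image data."""
    def __init__(self, name, images):
        self.name = name
        self.data = list(images)


class VStack:
    """A virtual stack: each entry holds its own header plus a pointer
    (stack, index) to where the binary data actually lives."""
    def __init__(self, name):
        self.name = name
        self.entries = []
        # Data written directly to this vstack ends up here.
        self.local = Stack(name + "_local", [])

    def add(self, source):
        """Append every image of source (a Stack or another VStack)."""
        if isinstance(source, VStack):
            for e in source.entries:
                # Copy the pointer itself, so no link to source remains.
                self.entries.append({"header": dict(e["header"]), "ref": e["ref"]})
        else:
            for i in range(len(source.data)):
                self.entries.append({"header": {}, "ref": (source, i)})

    def read(self, n):
        stack, i = self.entries[n]["ref"]
        return stack.data[i]

    def write_header(self, n, key, value):
        # Metadata only: the data pointer is left untouched.
        self.entries[n]["header"][key] = value

    def write_image(self, n, pixels):
        # Writing data detaches this one image from its original source.
        self.local.data.append(pixels)
        self.entries[n]["ref"] = (self.local, len(self.local.data) - 1)


A = Stack("A", ["a0", "a1"])
B = Stack("B", ["b0"])
C = Stack("C", ["c0"])

abc = VStack("abc")
for s in (A, B, C):
    abc.add(s)

xxx = VStack("xxx")
xxx.add(abc)                              # xxx now points straight at A, B and C

abc.write_header(0, "note", "checked")    # affects abc only
abc.write_image(1, "new a1")              # detaches image 1 of abc only
A.data[0] = "a0 edited"                   # a change in A shows through

print(xxx.read(0))                # a0 edited
print(abc.read(1))                # new a1
print(xxx.read(1))                # a1
print(xxx.entries[0]["header"])   # {}
```

In this sketch a read-modify-write of a whole image would go through write_image and detach the entry, which is why the header-only path is kept as a separate method.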
Yes, you can, of course, make xxx a subset of abc, not a full copy. This can be done with the normal subset mechanism for BDB files. From the Wiki:
--- The 'select.selectname' mechanism allows you to have a local database named 'select', and each key within that database contains a list of integers to be treated as image numbers in the file. That is, bdb:db?select.abc would refer to a database called 'select' with key 'abc' referring to a list of image numbers which would then be dereferenced from 'db'.
The final access method is not very commonly used, but can be quite powerful for specialized purposes. In a typical image stack file, such as SPIDER or IMAGIC format, the individual images are numbered from 0 to n. Say you have a database with 50 images in it, and you want to extract image numbers 0,3,6,10 and 12 from the database. You could do this several ways, including running 5 separate proc2d commands or putting the numbers in a text file and having proc2d use the text file. An alternative would be:
e2proc2d.py bdb:averages?0,3,6,10,12 selected.hed
EMAN2 programs will treat averages?0,3,6,10,12 as if it were actually a database with only 5 images in it, numbered from 0-4: 0=0, 1=3, 2=6, 3=10, 4=12. ---
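To make the renumbering concrete, here is a small plain-Python sketch of how such a specifier can be turned into a list of original image numbers. The function resolve_selection is hypothetical (it is not EMAN2's own parser), and the 'select' database is represented by an ordinary dict.

```python
def resolve_selection(spec, select_db=None):
    """Split a specifier such as 'bdb:averages?0,3,6,10,12' or
    'bdb:db?select.abc' into (database name, list of original image numbers).

    Hypothetical helper for illustration; select_db is a plain dict
    standing in for the local 'select' database."""
    name, sep, query = spec.partition("?")
    if not sep:
        return name, None                      # no selection: whole database
    if query.startswith("select."):
        key = query[len("select."):]
        return name, list(select_db[key])      # dereference the named list
    return name, [int(tok) for tok in query.split(",")]


name, picks = resolve_selection("bdb:averages?0,3,6,10,12")
for new, old in enumerate(picks):
    print(new, "->", old)   # new image number -> original image number
```

The position of each number in the list is the new image number, so image 3 of the selected stack is image 10 of the original.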
That is, say abc had 100,000 images in it. You would then make a python list containing the images you want to include in the new database:
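from EMAN2db import db_open_dict   # assuming db_open_dict is imported from EMAN2's EMAN2db module
sel=db_open_dict("bdb:.#select")
slist=list(range(100000))   # list() so that .remove() also works under Python 3
slist.remove(5000)   # just removing a couple of images from the list
slist.remove(7000)
sel["subset"]=slist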
Then: e2bdb.py --makevstack=bdb:.#xxx bdb:.#abc?select.subset
and you'd have xxx with 99998 images in it.
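If you need to drop more than a handful of images, calling remove() once per image gets slow, since each call scans the list. A rough alternative, in plain Python with no EMAN2 calls, is to build the list in one pass. The helper name build_selection is made up for this example.

```python
def build_selection(n_images, excluded):
    """Return the image numbers 0..n_images-1, minus those in excluded."""
    excluded = set(excluded)    # set membership tests are fast
    return [i for i in range(n_images) if i not in excluded]


slist = build_selection(100000, [5000, 7000])
print(len(slist))   # 99998
```

The resulting list can then be stored under a key in the select database in the same way as in the example above.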