How to deal with Millions of Small files

Discussion in 'Samplers, Synthesizers' started by twoheart, Sep 9, 2023.

  1. twoheart

    twoheart Audiosexual

    Joined:
    Nov 21, 2015
    Messages:
    2,229
    Likes Received:
    1,410
    Location:
    Share many
    When we work with sampled files or MIDI files, we quickly end up with several million mostly quite small files (Nexus, EZ-/Superior Drummer ..., sample collections).

    Moving/copying these vast amounts of files is a challenge for computers. The need to move/copy these files arises, for example, when we set up a new computer or make backups of this valuable content.

    For example, it may be necessary to keep files in sync between multiple computers (desktops <--> laptops).

    How do you deal with this to achieve a good balance between the space needed on the disks and the time needed to copy several million files? Strategies/Tools?
     
  3. Olymoon

    Olymoon Moderator

    Joined:
    Jan 31, 2012
    Messages:
    5,777
    Likes Received:
    4,449
    I use an external drive with all the data for many different plugins.
    And I use symlinks on the computers to point to the folders.
    It also makes it easy to back up your own presets and so on.
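A minimal sketch of the symlink idea above, assuming Python is available. The folder names are hypothetical stand-ins for an external drive and for the path a plugin expects; on Windows, creating symlinks may additionally need admin rights or Developer Mode.

```python
import os
import tempfile
from pathlib import Path


def link_library(real_dir: Path, expected_dir: Path) -> None:
    """Make expected_dir a symlink that points at real_dir.

    Refuses to touch expected_dir if something already exists there.
    """
    if expected_dir.exists() or expected_dir.is_symlink():
        raise FileExistsError(f"{expected_dir} already exists")
    expected_dir.parent.mkdir(parents=True, exist_ok=True)
    # target_is_directory only has an effect on Windows.
    os.symlink(real_dir, expected_dir, target_is_directory=True)


def demo() -> bool:
    """Link a stand-in 'external drive' folder into a stand-in plugin path."""
    with tempfile.TemporaryDirectory() as tmp:
        external = Path(tmp) / "external_drive" / "SampleLib"  # hypothetical
        external.mkdir(parents=True)
        (external / "kick.wav").write_bytes(b"demo")

        plugin_path = Path(tmp) / "ProgramData" / "Vendor" / "SampleLib"  # hypothetical
        link_library(external, plugin_path)

        # The file is reachable through the link without having been copied.
        return (plugin_path / "kick.wav").read_bytes() == b"demo"
```

The data stays in one place and only the small link has to be recreated on each machine.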
     
    • Agree x 3
    • Like x 1
  4. twoheart

    twoheart Audiosexual

    That's part of my approach as well - :like:

    But when I need a copy on more than one computer at the same time (otherwise I would of course just move the external drive to the other PC) and I try to back up a given directory with ~12 million files totalling 70 GB, copying them takes ages (several hours).
    On Windows I solved it by copying all these files once to a virtual drive (.vhdx, R/W, formatted as NTFS with a 4K block size). Now it's just one big file. Copying takes only a few minutes per PC.
    I did not find a utility that can handle that more elegantly.
     
  5. xorome

    xorome Audiosexual

    Joined:
    Sep 28, 2021
    Messages:
    1,328
    Likes Received:
    979
    If you regularly turn on those other machines, then I'd look into continuous file sync (Syncthing, for example). That'll automatically keep files in sync across a bunch of computers/phones without manual intervention. The first sync is going to take ages, of course. Or maybe you'd be better served by a central NAS?
     
    • Interesting x 1
  6. Cclcng

    Cclcng Ultrasonic

    Joined:
    Jun 19, 2021
    Messages:
    81
    Likes Received:
    24
    thought this said "how to deal with millions of small flies" , which I've had to deal with before... insane gnat infestation.
    I'm glad it said files... ahhah :)
     
  7. clone

    clone Audiosexual

    Joined:
    Feb 5, 2021
    Messages:
    8,028
    Likes Received:
    3,510
    The worst example of this is MIDI files. You get some 300 MB pack and think a normal archive will extract in only a couple of minutes, but it takes 10 times longer than whatever you expected. The best answer, for me anyway, has been to delete the entire thing. There is no pack of 600,000 MIDI files that contains something you cannot make yourself in a minute. Once they are extracted, the sheer size of a pack like this will stop you from ever using it anyway. It is bad enough trying to audition vocal samples. Sample-based content like Nexus, Kontakt or Falcon libs is like this too, but at least the content is something worthwhile. Are something like 97,000 Zebra presets really necessary? No thanks. You can make a nice patch faster than trawling through that much mostly random content.
     
    • Agree x 5
    • Like x 1
  8. Trurl

    Trurl Audiosexual

    Joined:
    Nov 17, 2019
    Messages:
    2,480
    Likes Received:
    1,470
    I saw the thread title and wondered if it referred to doing an Arturia install :rofl:

    Thank gawd I don't use midi loops. Every time I install a ToonTrack expansion I have to delete thousands of those little fukkers.
     
    • Like x 2
    • Funny x 1
  9. Olymoon

    Olymoon Moderator

    For backup, I use FreeFileSync. You can configure it as you wish, so I have it set up to back up only what's new, which makes the operation much smaller and faster. I back up once a week, unless I'm doing very intensive work on something. The program can be left alone while it runs.
    Also, all these folders are set up as exclusions in my antivirus.
    https://freefilesync.org/
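To illustrate the "back up only what's new" idea, here is a rough Python sketch. It is not how FreeFileSync works internally; it is just an assumed size-and-timestamp comparison, and the function names and the 2-second slack are made up for the example.

```python
import shutil
from pathlib import Path

# Assumed tolerance for file systems with coarse timestamps (e.g. FAT).
MTIME_SLACK_SECONDS = 2.0


def needs_copy(src: Path, dst: Path) -> bool:
    """True if dst is missing, differs in size, or is clearly older than src."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime - d.st_mtime > MTIME_SLACK_SECONDS


def incremental_backup(source: Path, target: Path) -> int:
    """Copy new or changed files from source to target; return how many were copied."""
    copied = 0
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the modification time
            copied += 1
    return copied
```

A second run over an unchanged folder copies nothing, although it still has to stat every file. The sketch never deletes files from the target that were removed from the source.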
     
  10. wacha

    wacha Member

    Joined:
    Oct 18, 2015
    Messages:
    21
    Likes Received:
    15
    The user zapetto wrote the following about "Arturia V Collection 9 v9.5.2-R2R" on the sister site:

    "Actually that is what I did.
    I created a .vhd file, mounted it as drive, symlinked the c:\ProgramData\Arturia\ into it , installed all, and then I can use this VHD on any computer, even via network, just as one big file, convenient to copy. Also better for SSDs, because the directory entries don't get rewritten 800000 times during installation. During installation I use an old HDD where the VHD resides, not for speed, but for not wearing out my SSDs or USB-Sticks."

    This seems quite an elegant solution, though I'm yet to try it, not very familiar with VHD.
    Maybe you can give it a shot and report back? Hope it helps.
     
    • Interesting x 2
  11. twoheart

    twoheart Audiosexual

    Yes, that is what I do at the moment ...
    It works, but I'm not really happy because it has some flaws:

    1. Mounting the VHD/VHDX on every boot
    2. Massive overhead of VHD files
    3. More wear on the SSD than syncing the small files (zapetto is wrong here)

    1. I need to mount the .vhd/.vhdx every time I boot the target machine the .vhdx file lives on. I managed to solve it with a startup script running in an admin context, but that is definitely not an easy solution for everybody.
    2. The virtual disk formats have a lot of overhead. In my case it doubles the file size, so 70 GB of small files costs me 140 GB of disk space.
    3. zapetto will have to read up on this again. Flash memory just does not work like that. The only thing that matters is the TBW (Total Bytes Written or TeraBytes Written)*. You can read from SSD/flash almost endlessly.
    That means writing a file of 150 GB of data causes far more wear on the SSD than re-writing e.g. 1 million 4K files (4 GB).

    *https://www.dell.com/support/kbdoc/...drive-why-do-solid-state-devices-ssd-wear-out
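The arithmetic behind that comparison, as a small Python sketch. It counts payload bytes only and assumes "4K" means 4 KiB and "Gig" means GiB; file system metadata such as directory entries and the SSD's internal write amplification are not modelled, so treat the ratio as a rough lower-level estimate rather than a wear measurement.

```python
GIB = 1024 ** 3


def payload_written(file_count: int, file_size_bytes: int) -> int:
    """Payload bytes written when file_count files of file_size_bytes are each written once."""
    return file_count * file_size_bytes


# Re-writing 1 million files of 4 KiB each.
small_files = payload_written(1_000_000, 4 * 1024)

# Writing one 150 GiB container image in full.
container = payload_written(1, 150 * GIB)

# How many times more payload the single big file writes.
ratio = container / small_files

if __name__ == "__main__":
    print(f"small files: {small_files / GIB:.2f} GiB")
    print(f"container:   {container / GIB:.2f} GiB")
    print(f"ratio:       {ratio:.1f}x")
```

Under these assumptions the container write is roughly 39 times the payload of re-writing the small files.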

    So, what zapetto does and I do may be interesting and time saving, but it is far from being elegant or optimal :rofl:

    What I'm looking for is more like a packer that is able to pack and unpack on the fly (an R/W ZIP or ISO format, if you will). And the best way to do that would be to just install a driver. That would be great.
     
    Last edited: Sep 9, 2023
  12. wacha

    wacha Member

    Ah, well... I thought it might help.

    When I read what zapetto wrote a couple weeks back I thought that was quite clever. Had to look up VHD though.
    Thanks for letting us know the details.
    You seem quite ahead of me on this so... well, it's the intention that counts :)
     
  13. fiction

    fiction Audiosexual

    Joined:
    Jun 21, 2011
    Messages:
    1,940
    Likes Received:
    706
    Archive them and only extract what you need.
    Maintain a good index (whatever feels good in your eyes) so you can find stuff.
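One way this could look in Python, as a sketch only: pack a folder into a single ZIP, keep a separate JSON index next to it, search the index instead of the archive, and pull out single files on demand. The function names and the index layout are assumptions for illustration.

```python
import json
import zipfile
from pathlib import Path


def build_archive(source: Path, archive: Path, index: Path) -> None:
    """Pack every file under source into one ZIP and write a JSON index of its members.

    The archive and index should live outside source, otherwise they would
    be picked up while the folder is being walked.
    """
    members = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                name = path.relative_to(source).as_posix()
                zf.write(path, arcname=name)
                members.append({"name": name, "size": path.stat().st_size})
    index.write_text(json.dumps(members, indent=2), encoding="utf-8")


def find(index: Path, keyword: str) -> list[str]:
    """Search the index (not the archive) for member names containing keyword."""
    members = json.loads(index.read_text(encoding="utf-8"))
    return [m["name"] for m in members if keyword.lower() in m["name"].lower()]


def extract_one(archive: Path, member: str, dest: Path) -> Path:
    """Extract a single member into dest and return the path it was written to."""
    with zipfile.ZipFile(archive) as zf:
        return Path(zf.extract(member, path=dest))
```

Copying or backing up then means moving one archive plus one small index file, and only the loops you actually want ever hit the disk as individual files.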
     
    • Interesting x 2
    • Agree x 1
  14. twoheart

    twoheart Audiosexual

    absolutely, thank you :like:
    I've just been thinking about it for a few months now. :cheers:
     
  15. Backtired

    Backtired Audiosexual

    Joined:
    Jan 15, 2016
    Messages:
    1,035
    Likes Received:
    725
    about Arturia: i ended up deleting everything from arturia. unacceptable the way the files are handled. took me half a day to transfer a folder to another hard drive.
    interesting thread, thanks to all who shared solutions
     
  16. saccamano

    saccamano Audiosexual

    Joined:
    Mar 26, 2023
    Messages:
    1,486
    Likes Received:
    607
    Location:
    CBGB omfug
    I do regular Acronis images of the entire system at max compression and store them on large offline external HDDs. Any individual file can be restored from the image if something gets mashed, so you're covered that way.
     
  17. Sinus Well

    Sinus Well Audiosexual

    Joined:
    Jul 24, 2019
    Messages:
    2,142
    Likes Received:
    1,650
    Location:
    Sanatorium
    NAS? I don't copy anything back and forth at all. I store data on the NAS and access it when needed, either via tb, Ethernet, Wi-Fi or port sharing. And if I really have to be mobile and need high throughput, well, that's what the mirror is for. Then I just take the image with me.
     
  18. Legotron

    Legotron Audiosexual

    Joined:
    Apr 24, 2017
    Messages:
    2,274
    Likes Received:
    2,198
    Location:
    Hyperborea
    For faster moving, I always pack MIDI files. ZIP or RAR.
     
  19. Havana

    Havana Platinum Record

    Joined:
    May 6, 2022
    Messages:
    360
    Likes Received:
    197
    You just can't beat a Linux system when it comes to copying or deleting large numbers of files. Sad that Linux doesn't support the major DAWs and plugins.
     
  20. fiction

    fiction Audiosexual

    Re-writing 1 million files will change their directory entries too.
     