How to deal with Millions of Small files

Discussion in 'Samplers, Synthesizers' started by twoheart, Sep 9, 2023.

  1. twoheart

    twoheart Audiosexual

    Joined:
    Nov 21, 2015
    Messages:
    2,025
    Likes Received:
    1,242
    Location:
    Share many
    When we work with sampled files or MIDI files, we quickly end up with several million mostly quite small files (Nexus, EZ-/Superior Drummer ..., sample collections).

    Moving/copying these vast amounts of files is a challenge for computers. The need to move/copy these files arises, for example, when we set up a new computer or make backups of this valuable content.

    For example, it may be necessary to keep files in sync between multiple computers (desktops <--> laptops).

    How do you deal with this to achieve a good balance between the space needed on the disks and the time needed to copy several million files? Strategies/Tools?
     
  3. Olymoon

    Olymoon Moderator

    Joined:
    Jan 31, 2012
    Messages:
    5,782
    Likes Received:
    4,445
    I use an external drive with all the data for many different plugins.
    And I use symlinks on the computers to point to the folders.
    It's also useful for easily backing up your own presets and so on.
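
    A minimal sketch of the symlink idea in Python. The function name is made up for illustration, and any paths you pass in are up to your own setup. On Windows, creating symlinks usually needs an elevated prompt or Developer Mode.

```python
import os
from pathlib import Path


def link_library(real_dir: Path, expected_dir: Path) -> None:
    """Make expected_dir a symlink that points at real_dir.

    real_dir is where the data really lives (e.g. on the external drive),
    expected_dir is where the plugin looks for it. Both are hypothetical.
    """
    if expected_dir.is_symlink():
        # Replace a stale link so the call can be repeated safely.
        expected_dir.unlink()
    elif expected_dir.exists():
        # Refuse to touch a real folder that may still hold data.
        raise FileExistsError(f"{expected_dir} exists and is not a symlink")
    expected_dir.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(real_dir, expected_dir, target_is_directory=True)
```

    Called as something like `link_library(Path("E:/PluginData/SomeVendor"), Path("C:/ProgramData/SomeVendor"))`, where both paths are placeholders.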
     
    • Agree x 3
    • Like x 1
  4. twoheart

    twoheart Audiosexual

    Joined:
    Nov 21, 2015
    Messages:
    2,025
    Likes Received:
    1,242
    Location:
    Share many
    That's part of my approach as well - :like:

    But when I need a copy on more than one computer at the same time (otherwise I would just switch the external drive to the other PC, of course) and I try to back up a given directory with ~12 million files totaling 70 GB, it takes ages (some hours) to copy them.
    On Windows I solved it by copying all these files once to a virtual drive (.vhdx, read/write, formatted as NTFS with a 4K block size). Now it's just one big file. Copying takes only a few minutes per PC.
    I did not find a utility that can handle that more elegantly.
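
    For comparison, a rough sketch of the "one big file" idea without a virtual disk: bundle the directory into a single uncompressed tar file with Python's standard `tarfile` module, copy that one file, and unpack it on the target. Unlike a mounted .vhdx it cannot be used in place, and the function names here are only illustrative.

```python
import tarfile
from pathlib import Path


def pack_dir(src_dir: Path, archive: Path) -> None:
    """Bundle src_dir into one uncompressed tar file."""
    with tarfile.open(archive, "w") as tar:
        # arcname keeps the paths inside the archive relative to src_dir.
        tar.add(src_dir, arcname=src_dir.name)


def unpack_dir(archive: Path, dest: Path) -> None:
    """Extract the archive below dest (only use archives you created yourself)."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r") as tar:
        tar.extractall(dest)
```

    Packing and unpacking still touch every small file once on each side; the gain is only in the copy step between machines.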
     
  5. xorome

    xorome Audiosexual

    Joined:
    Sep 28, 2021
    Messages:
    911
    Likes Received:
    690
    If you regularly turn on those other machines, then I'd look into continuous file sync. That'll automatically keep files in sync across a bunch of computers/phones without manual intervention (Syncthing, for example). The first sync is going to take ages, of course. Or maybe you'd be better served by a central NAS?
     
    • Interesting x 1
  6. Cclcng

    Cclcng Ultrasonic

    Joined:
    Jun 19, 2021
    Messages:
    82
    Likes Received:
    24
    thought this said "how to deal with millions of small flies" , which I've had to deal with before... insane gnat infestation.
    I'm glad it said files... ahhah :)
     
  7. clone

    clone Audiosexual

    Joined:
    Feb 5, 2021
    Messages:
    6,789
    Likes Received:
    2,966
    The worst example of this is MIDI files. You get some 300 MB pack and think a normal archive will extract in only a couple of minutes, but it takes 10 times longer than whatever you expected. The best answer, for me anyway, has been to delete the entire thing. There is no pack of 600,000 MIDI files that contains something you cannot make yourself in a minute. After they are extracted, the sheer size of a pack like this will stop you from ever using it anyway. It is bad enough trying to audition vocal samples. Sample-based content like Nexus, Kontakt, or Falcon libs is like this too, but at least the content is something worthwhile. Are something like 97,000 Zebra presets really necessary? No thanks. You can make a nice patch faster than trawling through that much mostly random content.
     
    • Agree x 5
    • Like x 1
  8. Trurl

    Trurl Audiosexual

    Joined:
    Nov 17, 2019
    Messages:
    2,480
    Likes Received:
    1,459
    I saw the thread title and wondered if it referred to doing an Arturia install :rofl:

    Thank gawd I don't use midi loops. Every time I install a ToonTrack expansion I have to delete thousands of those little fukkers.
     
    • Like x 2
    • Funny x 1
  9. Olymoon

    Olymoon Moderator

    Joined:
    Jan 31, 2012
    Messages:
    5,782
    Likes Received:
    4,445
    For backup, I use FreeFileSync. You can configure it as you wish, so I have it set up to back up only what's new, which makes the operation much smaller and faster. I back up once a week, unless I'm doing very intensive work on something. The program can be left alone while doing it.
    Also, all these folders are set as exceptions in my antivirus.
    https://freefilesync.org/
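
    To illustrate the "back up only what's new" idea (this is not how FreeFileSync works internally, just a minimal sketch with made-up function names): copy a file only when the target is missing, has a different size, or is older than the source.

```python
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """True if dst is missing, differs in size, or is older than src."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime


def incremental_backup(src_root: Path, dst_root: Path) -> int:
    """Copy only new or changed files from src_root to dst_root; return the count."""
    copied = 0
    for src in src_root.rglob("*"):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves the modification time
            copied += 1
    return copied
```

    The sketch still has to stat every file on each run and never deletes files that were removed from the source, so a real sync tool does considerably more.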
     
  10. wacha

    wacha Member

    Joined:
    Oct 18, 2015
    Messages:
    21
    Likes Received:
    15
    The user zapetto wrote the following under "Arturia V Collection 9 v9.5.2-R2R" on the sister site:

    "Actually that is what I did.
    I created a .vhd file, mounted it as drive, symlinked the c:\ProgramData\Arturia\ into it , installed all, and then I can use this VHD on any computer, even via network, just as one big file, convenient to copy. Also better for SSDs, because the directory entries don't get rewritten 800000 times during installation. During installation I use an old HDD where the VHD resides, not for speed, but for not wearing out my SSDs or USB-Sticks."

    This seems quite an elegant solution, though I'm yet to try it, not very familiar with VHD.
    Maybe you can give it a shot and report back? Hope it helps.
     
    • Interesting x 2
  11. twoheart

    twoheart Audiosexual

    Joined:
    Nov 21, 2015
    Messages:
    2,025
    Likes Received:
    1,242
    Location:
    Share many
    Yes, that is what I do at the moment ...
    It works, but I'm not really happy because it has some flaws:

    1. Mounting the VHD/VHDX on every boot
    2. Massive overhead of VHD files
    3. More wear on the SSD than syncing the small files (zapetto is wrong here)

    1. I need to mount the .vhd/.vhdx every time I boot the target machine the .vhdx file lives on. I managed to solve it with a startup script running in admin context, but that is definitely not an easy solution for everybody.
    2. The virtual disk formats have a lot of overhead. In my case it doubles the file size, so 70 GB of small files cost me 140 GB of disk space.
    3. zapetto will have to read up on this again. Flash memory just does not work like that. The only thing that matters is the TBW (Total Bytes Written or TeraBytes Written)*. You can read from SSD/flash almost endlessly.
    That means writing a file with 150 GB of data causes far more wear on the SSD than re-writing e.g. 1 million 4K files (4 GB).

    *https://www.dell.com/support/kbdoc/...drive-why-do-solid-state-devices-ssd-wear-out
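
    The arithmetic behind point 3, as a small sketch. It counts payload bytes only and assumes 4 KiB per small file; filesystem metadata updates and the SSD's internal write amplification are ignored, so real-world numbers will differ.

```python
def payload_bytes(file_count: int, file_size: int) -> int:
    """Total payload written when file_count files of file_size bytes are rewritten."""
    return file_count * file_size


GB = 10**9  # decimal gigabyte, the unit TBW ratings are normally quoted in

small_files = payload_bytes(1_000_000, 4 * 1024)  # 1 million 4 KiB files
one_image = payload_bytes(1, 150 * GB)            # one 150 GB image file

print(f"small files: {small_files / GB:.1f} GB")    # small files: 4.1 GB
print(f"image file:  {one_image / GB:.1f} GB")      # image file:  150.0 GB
print(f"ratio:       {one_image / small_files:.0f}x")  # ratio:       37x
```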

    So, what zapetto does and I do may be interesting and time-saving but is far from being elegant or optimal :rofl:

    What I'm looking for is more of a packer that can pack and unpack on the fly (a read/write ZIP or ISO format, if you will). Ideally it would work by just installing a driver. That would be great.
     
    Last edited: Sep 9, 2023
  12. wacha

    wacha Member

    Joined:
    Oct 18, 2015
    Messages:
    21
    Likes Received:
    15
    Ah, well... I thought it might help.

    When I read what zapetto wrote a couple weeks back I thought that was quite clever. Had to look up VHD though.
    Thanks for letting us know the details.
    You seem quite ahead of me on this so... well, it's the intention that counts :)
     
  13. fiction

    fiction Audiosexual

    Joined:
    Jun 21, 2011
    Messages:
    1,903
    Likes Received:
    692
    Archive them and only extract what you need.
    Maintain a good index (whatever feels good in your eyes) so you can find stuff.
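
    A rough sketch of that workflow with Python's standard `zipfile` module, assuming the packs are stored as ZIP archives: list the contents as an index without extracting anything, search the index, then pull out a single file. The function names are made up for illustration.

```python
import zipfile
from pathlib import Path


def build_index(archive: Path) -> list[str]:
    """Return the names of all files stored in the archive, without extracting."""
    with zipfile.ZipFile(archive) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]


def find(index: list[str], term: str) -> list[str]:
    """Case-insensitive substring search over the index."""
    term = term.lower()
    return [name for name in index if term in name.lower()]


def extract_one(archive: Path, member: str, dest: Path) -> Path:
    """Extract a single member into dest and return the extracted path."""
    with zipfile.ZipFile(archive) as zf:
        return Path(zf.extract(member, path=dest))
```

    The index could just as well be saved to a text file once per archive, so searching does not need the archive at hand.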
     
    • Interesting x 2
    • Agree x 1
  14. twoheart

    twoheart Audiosexual

    Joined:
    Nov 21, 2015
    Messages:
    2,025
    Likes Received:
    1,242
    Location:
    Share many
    absolutely, thank you :like:
    I've just been thinking about it for a few months now. :cheers:
     
  15. Backtired

    Backtired Audiosexual

    Joined:
    Jan 15, 2016
    Messages:
    1,000
    Likes Received:
    685
    about Arturia: i ended up deleting everything from arturia. unacceptable the way the files are handled. took me half a day to transfer a folder to another hard drive.
    interesting thread, thanks to all who shared solutions
     
  16. saccamano

    saccamano Rock Star

    Joined:
    Mar 26, 2023
    Messages:
    1,061
    Likes Received:
    417
    Location:
    uranus
    I do regular Acronis images of the entire system at max compression and store them on large offline external HDDs. You can recover any individual file if something gets mashed, so you're covered that way.
     
  17. Sinus Well

    Sinus Well Audiosexual

    Joined:
    Jul 24, 2019
    Messages:
    2,071
    Likes Received:
    1,584
    Location:
    Sanatorium
    NAS? I don't copy anything back and forth at all. I store data on the NAS and access it when needed, either via tb, ethernet, wifi or port sharing. And if I really have to be mobile and need high throughput, well, that's what the mirror is for. Then I just take the image with me.
     
  18. Legotron

    Legotron Audiosexual

    Joined:
    Apr 24, 2017
    Messages:
    1,975
    Likes Received:
    1,876
    Location:
    Hyperborea
    For faster moving, I always pack MIDI files: ZIP or RAR.
     
  19. Havana

    Havana Platinum Record

    Joined:
    May 6, 2022
    Messages:
    351
    Likes Received:
    191
    You just can't beat a Linux system when it comes to copying or deleting large numbers of files. Sad that Linux doesn't support the major DAWs and plugins.
     
  20. fiction

    fiction Audiosexual

    Joined:
    Jun 21, 2011
    Messages:
    1,903
    Likes Received:
    692
    Re-writing 1 million files will change their directory entries too.
     