• Welcome to KawaseFan.net Forum.
 

umires.arc

Started by Naulahauta, November 30, 2015, 06:45:42 PM


Naulahauta

umires.arc

The following post is typed in a monospace font for easier reading.


All of the Umihara Kawase games were recently released on Steam, and after looking into the game files I noticed that all of the games basically depend on three things:
• The executable itself
• A separate folder for music
• the fabled umires.arc

The EXE is obviously required to play the games and the music folder contains all the music, but all of the assets (and I do mean all of them) are hidden inside umires.arc. This includes sprites, textures, models, levels and sound effects, and probably some things I'm missing as well. The thing is, the file is in a proprietary format with no specification, so it can't be unpacked by normal means. No decompressor opens it, but we can fix that.

So, for the sake of ROM hacking (all three games) and data preservation, I propose an umires.arc unpacker. I can't program but I've directed numerous similar projects in the past, so if a volunteer programmer steps up (I'm looking at you, Commando125...) we might be onto some really interesting stuff!


So, how does the format work?
Actually, it's pretty simple. The ARC is a collection of uncompressed files and folders, but before their data, the folder structure is defined at the beginning of the file (the master set).

Anyway, the first four bytes of the umires.arc are always 53 41 43 48, or SACH in ASCII. After that, there's some odd stuff and finally the headers. Observe the hex view:


Looks fishy. I added in some color emphasis so you get the idea:


Pretty simple! After the magic number there are 16 unknown bytes, but after that the fun begins. First you read four bytes as little-endian, in this case 10 00 00 00, so you know there are $10 (sixteen) headers in this file set. Headers are 16 bytes each, so you read 256 bytes (16 * 16 = 256).
They're as follows (I added the line breaks and notes for easier reading):


UNUSEDBYTES.NAMEADDRESS-DATAADDRESS-SIZEOFDATAX
00 00 00 80 2C 1F 00 00 18 01 00 00 00 00 00 00
00 00 00 80 31 1F 00 00 BC 01 00 00 00 00 00 00
00 00 00 80 37 1F 00 00 00 02 00 00 00 00 00 00
00 00 00 80 3B 1F 00 00 24 03 00 00 00 00 00 00
00 00 00 80 41 1F 00 00 98 03 00 00 00 00 00 00
00 00 00 80 45 1F 00 00 AC 03 00 00 00 00 00 00
00 00 00 80 4C 1F 00 00 50 04 00 00 00 00 00 00
00 00 00 80 54 1F 00 00 A4 04 00 00 00 00 00 00
00 00 00 80 5B 1F 00 00 08 05 00 00 00 00 00 00
00 00 00 80 63 1F 00 00 3C 05 00 00 00 00 00 00
00 00 00 80 68 1F 00 00 70 05 00 00 00 00 00 00
00 00 00 80 6C 1F 00 00 04 09 00 00 00 00 00 00
00 00 00 80 70 1F 00 00 0C 19 00 00 00 00 00 00
00 00 00 80 76 1F 00 00 E0 1C 00 00 00 00 00 00
00 00 00 80 7D 1F 00 00 34 1D 00 00 00 00 00 00
00 00 00 80 83 1F 00 00 78 1D 00 00 00 00 00 00

Legend:
UNUSEDBYTES = Unused bytes. Ignorable.
NAMEADDRESS = Where the name is stored; read it as ASCII until you encounter a 00 byte.
DATAADDRESS = Where the data is stored.
SIZEOFDATAX = The size of the file in bytes. If it's 00 00 00 00, then it's a folder.
All four fields are four-byte little-endian values, so 2C 1F 00 00 means address 0x1F2C.
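For anyone who wants to follow along in code, here is a minimal Python sketch of reading the master set as described above. It assumes the 16 unknown bytes occupy offsets 0x04 to 0x13, so the header count sits at 0x14 and the 16-byte headers start at 0x18, and that names are plain ASCII. The function names are made up for illustration; this is not the actual unpacker script.

```python
import struct

MAGIC = b"SACH"
# One master header: UNUSEDBYTES, NAMEADDRESS, DATAADDRESS, SIZEOFDATAX (little-endian u32s)
ENTRY = struct.Struct("<4I")


def read_cstring(data, offset):
    """Read ASCII from `offset` up to (not including) the next 00 byte."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii")


def read_master_set(data):
    """Parse the master header set at the start of umires.arc.

    Assumed layout: 4-byte magic "SACH", 16 unknown bytes, a little-endian
    u32 header count at offset 0x14, then `count` 16-byte headers from 0x18.
    """
    if data[:4] != MAGIC:
        raise ValueError("not an umires.arc: missing SACH magic")
    (count,) = struct.unpack_from("<I", data, 0x14)
    entries = []
    for i in range(count):
        _unused, name_addr, data_addr, size = ENTRY.unpack_from(data, 0x18 + i * ENTRY.size)
        entries.append({
            "name": read_cstring(data, name_addr),
            "data_addr": data_addr,
            "size": size,
            "is_folder": size == 0,  # SIZEOFDATAX of 00 00 00 00 marks a folder
        })
    return entries
```

Fed the first row of the table above plus a dummy name placed at 0x1F2C, this yields a folder entry with data address 0x118.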

So... what now? Well, connect the dots. There isn't much to do but slice the file into files and folders as instructed by the bytes. See this flowchart so you get the idea (this umires.arc is from Umihara Kawase Shun, but all games use the same .arc format):



Coincidentally, the resulting BIN is also an undocumented archive type. Thankfully its format is pretty much the same as described above, except that each entry has only two four-byte pointers (data address and size) and there are no names:
19 00 00 00
CC 00 00 00 CC 29 00 00
98 2A 00 00 1C 40 00 00
etc ...
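A quick check on that sample supports reading each entry as (data address, size): 4 + 0x19 * 8 = 0xCC, so the first file starts right after the table, and 0xCC + 0x29CC = 0x2A98, which is exactly where the second one starts. A minimal Python sketch under that assumption (the function names are hypothetical):

```python
import struct


def parse_bin_table(data):
    """Return (data_address, size) pairs from one of the nested .bin archives.

    Assumed layout, read off the example bytes: a little-endian u32 entry
    count, then `count` pairs of little-endian u32 values. No names are stored.
    """
    (count,) = struct.unpack_from("<I", data, 0)
    return [struct.unpack_from("<2I", data, 4 + i * 8) for i in range(count)]


def extract_bin(data):
    """Slice each entry's payload out of the archive."""
    return [data[addr:addr + size] for addr, size in parse_bin_table(data)]
```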



guh, it's late. I don't know if I'm making any sense but hopefully this is an interesting endeavour. In closing I have to emphasize that this works on all three games.

Commando125

#1
I'm juggling three projects right now: one for here, one for business, and one for friends. I can code the archive unpacker if it's quick and easy to do, though I will need time for other things.

What are the header sets for? Why are they grouped up like that?

Naulahauta

It's okay. I didn't want to sound inconsiderate but you really don't have to do anything if you're already managing several other projects.
Besides, I recently heard about the power of QuickBMS and I might be able to do it myself with the tools. :)
The master header sets tell where other headers are, and those, in turn, tell where the real files are and what they're called.

Commando125

#3
You weren't inconsiderate. I was just going through a lot today. Take a look at that QuickBMS thing, looks cool. I'll play around with making an archive unpacker.

Commando125

#4
Hello again. I cooked up a Python script, and there's lots of interesting stuff in there. I'm not sure about the legality of reverse engineering a data file from a game on the Steam store, so I don't want to be held responsible if the files this script outputs are released on the internet, or if your game data gets fucked up (it can be redownloaded). The rest is up to you guys.

P.S.: The level data from the first game uses the same compression as the SNES game. I want to focus on the SNES version first, but it should be possible to modify it to read the PC port's data files. I think the levels for the other two games are just models.

Naulahauta

#5
Yes, the SNES level format is exactly the same in the NDS version and the PC port. This has great potential because the data is somewhat interchangeable between the ports.

Thank you so much for the script... unfortunately when I run it I get this:


  File "/Users/vernerikontto/Umires/arcdecompress.py", line 84
    print('.', end='', flush=True)
                  ^
SyntaxError: invalid syntax


:c


EDIT:
Well, I modified all the print('.', end='', flush=True) instances to simply print('.') and I got it to run. However, it gives the same error for all three ARCs:
TypeError: The file type is not a valid .arc file for the game

I'm analysing the source as best I can; it's still a bit weird, though.

Commando125

#6
That's strange. It worked for me when I put the archive in the same folder as the script. Make sure the file is named umires.arc and is in that same folder, then run "python arcdecompress.py" from the command line/terminal.

If I remember right from your screenshots, you're on a Mac. Are you? If so, try it on Windows (via Boot Camp) or Linux. I'm not sure the file path package I'm using in Python supports Mac OS X well, if at all (I think some Mac-specific features were removed in 3.0).

If you aren't on a Mac and it didn't work, make sure it is the latest version of Python 3 (not 2). You can remove the print commands without issue. I added the "flush=True" in some of the print functions because Windows command prompt only flushes on a new line otherwise. If the script says it is not a valid .arc file, don't try to force it to go any further because then it is probably reading the wrong file.
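For reference, print('.', end='', flush=True) is Python 3 syntax (the flush argument needs 3.3 or newer), so Python 2 rejects the whole file with a SyntaxError before anything runs. A hedged sketch of a progress-dot helper that behaves the same on both versions by writing to the stream directly; progress_dot is a hypothetical name, not part of the actual script:

```python
import sys


def progress_dot(stream=None):
    """Print one progress dot without a newline and flush it right away.

    Uses stream.write()/flush() instead of print(..., end='', flush=True),
    so it parses and runs on Python 2 as well as Python 3.
    """
    if stream is None:
        stream = sys.stdout
    stream.write(".")
    stream.flush()  # otherwise the Windows console may wait for a newline
```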

Naulahauta

#7
python -V gives me 2.7.6. I just installed Python 3.5.1rc1 and now it works on all three games.
FANTASTIC!! Good job and thank you. This will prove to be a very useful tool.

I'll spend some time verifying/reverse engineering the files now. Let's hope the level formats of Shun and Sayonara are simpler than the SFC version's, heh.

EDIT: The level format of Steam-Shun has been re-encoded and is not the same as in the PSX ISO. In fact, the model/level files seem to have the same header as the model files found in Sayonara Umihara Kawase's port. I compared the PSX, NDS and PC level files and the PC one looks readable while the PSX is a real mish-mash of bytes.
Can someone tell me which is the "simplest" level in Umihara Kawase Shun? Like, the most plain, uninteresting level with the fewest platforms/objects?

Here's a screenshot of a snippet of F54's bytes, formatted and with "00" turned to "..". There are clearly ordered patterns, possibly defining quads/polys/verts. Very interesting.



texh

Simplest field is probably 31 although I'm not sure how the two tadpoles add to its simplicity.

I really wish you luck on this, since if you manage to open the Umihara games to hacking, it would be nothing short of amazing!

Naulahauta

#9
Quote from: texh on December 04, 2015, 02:25:28 PM
Simplest field is probably 31 although I'm not sure how the two tadpoles add to its simplicity.

Well, I checked it out and ..... man, we might be onto something huge.
All levels come in mdl/atr pairs. There's f31.mdl and f31.atr.

I checked out the level on YouTube and made a shitty screenshot map in Photoshop:


Obviously while the game is technically 3D, you can only interact with a 2D plane. So concentrate on the walkable tiles:


Now, let's look at our f31.atr file. There's a bunch of bytes, most are "00" which I replaced with ".." to help legibility, but amongst them there's something very suspicious:


Duuuuude.

I overlaid them on top of the level and lo and behold – this is the interaction map.


It appears that 08 is solid, 00 is air, 10 11 is a light slope right and 12 13 is a light slope left. That's literally it. One byte per block, no compression or anything. The few header bytes serve a purpose obviously, maybe width and height and stuff, but this is definitely the physical attributes of the corresponding level!
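A tiny Python sketch of turning those one-byte tiles into a text map. The glyphs are arbitrary choices, only the byte meanings identified so far are mapped, and anything unrecognised prints as '?':

```python
# Tile bytes identified so far in f31.atr; glyphs are arbitrary.
TILE_GLYPHS = {
    0x00: " ",   # air
    0x08: "#",   # solid
    0x10: "/",   # light slope right (first tile of the pair)
    0x11: "/",   # light slope right (second tile of the pair)
    0x12: "\\",  # light slope left (first tile of the pair)
    0x13: "\\",  # light slope left (second tile of the pair)
}


def render_rows(tiles, width):
    """Turn a flat run of one-byte tiles into text rows `width` tiles wide."""
    rows = []
    for start in range(0, len(tiles), width):
        row = tiles[start:start + width]
        rows.append("".join(TILE_GLYPHS.get(b, "?") for b in row))
    return rows
```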


EDIT EDIT:
I get it now. Technically you could make a "level viewer" already, because the format is so simple. It's only a matter of figuring out the bytes, but at times they're super obvious. I took a look at f00.atr, the first 14 bytes are as follows (legend added):

unknw-width-height------------leveldatasz
06 00 38 00 24 00 00 00 00 00 80 1F 00 00

width is level width / 2 (the level is 112 blocks wide)
height is level height / 2 (the level is 72 blocks high)
leveldatasz is level data size.
unknw is possibly music or time limit. Not sure.

So, after the 14 bytes of header, you simply read 0x1F80 bytes and display them as rows of 112 bytes.
Once you've read those bytes, you read another 0x1F80 bytes and overlay them on top of the first layer.
That's all.
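The same recipe as a Python sketch. It assumes all header fields are little-endian, which fits the numbers: 0x38 = 56, times 2 = 112 blocks wide; 0x24 = 36, times 2 = 72 blocks high; and 112 * 72 = 8064 = 0x1F80, matching leveldatasz. It also assumes the hard-interaction layer comes first and the soft one second. parse_atr is a hypothetical helper name.

```python
import struct

# u16 unknown, u16 width/2, u16 height/2, 4 unknown bytes, u32 layer size = 14 bytes
ATR_HEADER = struct.Struct("<3HII")


def parse_atr(data):
    """Split an .atr file into its hard and soft interaction layers (as rows)."""
    unknown, half_w, half_h, _unk, layer_size = ATR_HEADER.unpack_from(data, 0)
    width, height = half_w * 2, half_h * 2
    if width * height != layer_size:
        # Holds for f00.atr; other files might need a different rule.
        raise ValueError("width * height does not match the layer size")

    def rows(layer):
        return [layer[i:i + width] for i in range(0, len(layer), width)]

    start = ATR_HEADER.size
    hard = data[start:start + layer_size]
    soft = data[start + layer_size:start + 2 * layer_size]
    return {"unknown": unknown, "width": width, "height": height,
            "hard": rows(hard), "soft": rows(soft)}
```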

It's an eyesore but I don't know a better way to present this: image link. I laid out the blue bytes first, then the red ones. If you can recognise field 00 here, you probably get what the bytes mean.
The blue bytes are hard interaction, like slopes and floors.
The red bytes are soft interaction, like bird landing points, spawn-safe areas (the 03 03 03s), enemies, ladders and doors. And of course, the player starting point.

What's interesting is that besides the 61 normal field mdl/atr pairs (named f00.atr, f00.mdl etc.), there are seven pairs named d00.atr/d00.mdl and so on. These are the tutorial cut-scene levels. My best bet for cracking the mdl format is to dive into these, as they're all super super simple.

Stay tuned!!

Alc

Wow! Nice work. I guess it makes sense that there's no compression scheme for the level geometry given it's a CD game with lots of space going spare.

Naulahauta

Quote from: Alc on December 06, 2015, 06:05:23 PM
... given it's a CD game with lots of space going spare.
Quite the opposite, in fact. I'm talking about the PC port. The PSX version is way more complicated. The PSX and PC versions both have a lot of room to spare when compared to SNES, but I'm still baffled how complex the PSX level data looks.

Alc

Old habits dying hard in the PSX era, I guess. Still, whether it's editing levels on the PC port or the PSX original, this is a really cool development, kudos.

Naulahauta

#13
I've started coding an ATR viewer for the game... in JavaScript, of all things...
The bad news is that the code is a mess to look at, but the good news is that the output is SVG!

It's still a work in progress because I have to figure out what all the bytes mean, and so far I've only implemented solids ($00) although I know for certain what $08, $10, $11, $12, $13, $14, $15, $16 and $17 mean.





EDIT:

The output is a mish-mash of blocks but by weeding out the "air" blocks and combining the same-colored ones in Illustrator, it looks really nice!
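The "combine same-colored blocks" step can also be done while generating the SVG, by merging horizontal runs of identical tiles into a single rect. This is a Python sketch of that idea, not the JavaScript viewer itself; the 8-pixel tile size and the colour table are assumptions:

```python
def row_runs(row):
    """Split one row of tile bytes into (start, length, value) runs."""
    runs, start = [], 0
    for i in range(1, len(row) + 1):
        if i == len(row) or row[i] != row[start]:
            runs.append((start, i - start, row[start]))
            start = i
    return runs


def layer_to_svg(rows, tile=8, colors=None):
    """Emit one <rect> per horizontal run of identical non-air tiles."""
    colors = colors if colors is not None else {0x08: "#3060c0"}
    rects = []
    for y, row in enumerate(rows):
        for x, length, value in row_runs(row):
            if value == 0x00:  # air: leave it out entirely
                continue
            fill = colors.get(value, "#999999")  # unknown bytes drawn in grey
            rects.append('<rect x="%d" y="%d" width="%d" height="%d" fill="%s"/>'
                         % (x * tile, y * tile, length * tile, tile, fill))
    width = max((len(r) for r in rows), default=0) * tile
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">%s</svg>'
            % (width, len(rows) * tile, "".join(rects)))
```

Merging only along rows keeps the logic trivial; rectangles that span several rows would still need a second pass (or Illustrator).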


EDIT EDIT:
Haha, looks like the developers screwed up at Field 15! Look at the lone slope tile amidst the solids:


EDIT EDIT EDIT:
The output is a lot cleaner now and I think I've nailed down every hard interaction.

I've got 8 different slopes, solid levels from 1 pixel to 8 pixels high, and air. The soft interaction is rendered on top of the hard one, but since I haven't defined any bytes, you can't see anything. The soft interaction is weird in that the definitions are very likely 2-byte words but can start at an even or an odd index. I'm having some trouble making the viewer understand that 00010000 is the same thing as 00000100. But I'll get there.
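One way around the even/odd problem is to search byte by byte instead of word by word, so a 2-byte word is found at any offset. A minimal Python sketch (find_word and to_cell are hypothetical names); divmod turns a hit back into a (row, column) cell given the level width:

```python
def find_word(layer, word):
    """Return every offset where the 2-byte `word` occurs, even or odd.

    Advancing one byte at a time (not two) means 00 01 00 00 and
    00 00 01 00 both contain the word 01 00, at offsets 1 and 2.
    Overlapping hits are reported too (03 03 03 holds 03 03 twice).
    """
    hits, pos = [], layer.find(word)
    while pos != -1:
        hits.append(pos)
        pos = layer.find(word, pos + 1)
    return hits


def to_cell(offset, width):
    """Convert a byte offset in a layer to (row, column)."""
    return divmod(offset, width)
```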

Can anyone who's played Umihara Kawase Shun a lot tell me the most complex levels? Not necessarily the biggest, but the ones with most crazy shit going on in terms of conveyors, enemies, special/unique objects or something?

Commando125

You have done well on the attribute editor, but you should probably investigate the .mdl format they use, because you won't be able to change the "visual" of the level without it.

Naulahauta

Quote from: Commando125 on December 09, 2015, 08:58:49 AM
You have done well on the attribute editor, but you should probably investigate the .mdl format they use, because you won't be able to change the "visual" of the level without it.
It's not an editor per se, but a viewer. I'm looking into the MDL files too, but it's become apparent that I can't do it without actually buying the game; I simply can't figure it out just by looking at the bytes. But this is a fun exercise, thank you for your kind words.

texh

Quote from: Naulahauta on December 07, 2015, 09:11:29 PM
Can anyone who's played Umihara Kawase Shun a lot tell me the most complex levels? Not necessarily the biggest, but the ones with most crazy shit going on in terms of conveyors, enemies, special/unique objects or something?

F47 has both conveyor belts and moving platforms, which is probably as crazy as it gets. Other than that, the rotating barrels in F22 may be worth checking, and there are moving teeth (what do you even call those?) in F38.

Also, just out of curiosity, could you post a pic of F17? There's a weird bug where you can stand on air in the top right corner, and I have to wonder whether something would show up in your viewer.

Naulahauta

Thanks for the info, I'll look into them! Here's a picture of f17, as you asked.

Naulahauta

#18
sorry for the double post but wow okay this is seriously really cool..! I made myself a virtual machine and played the game on it. It works very well with Ninja Ripper, and I was able to get very sane output. Here's a screen capture of me fooling around.
https://youtu.be/6_OlYoHaoqo

Now I want to make a paper diorama... haha.

Anyway, since the ripper snatches the 3D data in its purest form, this is a very good reference point for how the MDL format works. I'm sure the polygon hierarchy is the same between this rip and the actual MDL.

I'm doing the first direct comparisons tomorrow so stay tuned.


Really hyped!

KawaseFan

#19
That's pretty cool, looking forward to seeing how it progresses!

That F17 picture definitely explains why you can stand on air in that corner - makes me wonder why it wasn't fixed in the PC version.

Naulahauta

My best guess is that in order to make the most faithful port, they simply wrote some kind of wrapper to compile all the original assets from way back into something Windows understands. After all, it's more of a port than a remake.

Commando125

I have been wondering. How are we going to compress these files back to an .arc archive? We would have to find out those "mystery/unknown bytes" in the format.

Naulahauta

#22
Quote from: Commando125 on December 11, 2015, 01:50:31 PM
I have been wondering. How are we going to compress these files back to an .arc archive? We would have to find out those "mystery/unknown bytes" in the format.
You're right: to be absolutely sure things get repacked as they were, we'll have to figure out the exact file format specification. For now, I'll just resort to editing the unpacked individual level files in my editor of choice, and when I want to see my changes I'll simply paste the bytes back into umires.arc. Convoluted, maybe, but at least it's progress. The only downside is that I can't add or remove a single byte. The size has to stay exactly the same.
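The paste-back workflow can be scripted so it refuses anything that would change the file size, and also checks that the old bytes really are where you think they are before overwriting them. A minimal Python sketch; the function names are hypothetical and the offset has to come from your own unpacking (best used on a copy of umires.arc):

```python
def patch_buffer(buf, offset, original, replacement):
    """Replace `original` with `replacement` at `offset` in a bytearray, without resizing."""
    if len(replacement) != len(original):
        raise ValueError("replacement must be exactly %d bytes long" % len(original))
    if bytes(buf[offset:offset + len(original)]) != original:
        raise ValueError("the bytes at this offset are not the ones you expected")
    buf[offset:offset + len(original)] = replacement


def patch_file(path, offset, original, replacement):
    """Apply patch_buffer to a file on disk."""
    with open(path, "rb") as f:
        buf = bytearray(f.read())
    patch_buffer(buf, offset, original, replacement)
    with open(path, "wb") as f:
        f.write(buf)
```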


EDIT EDIT:
Oh man, oh man.. Oh man. This is great.
There's a suspicious array of 2880 bytes in the d01.mdl file, which is the very first demo level that auto-plays before Field 00. I edited these bytes and the borders around the block structures disappeared. Obviously the data is related, but how? Well, I counted the polygons:


Amount of triangles: 72.
72 * 2 = 144.
The array of 2880 bytes in the file repeats the strange structure every 20 bytes.
20 * 144 = 2880.

This is it! One 20-byte blob is a definition for one point! I'm still hacking but I just had to tell someone, this is super exciting

EDIT EDIT EDIT:
YEAH this is it. One 20-byte row is definitely one point. To stitch together a quick and dirty point cloud, I simply converted the bytes to 4-bytes-x, 4-bytes-y, 4-bytes-z, 8-bytes-ignore. Compare:


^ the real level


^ my output


The output looks.. wrong, but you can definitely see that we're on the right track. Maybe the right order is zxy or something, or maybe the 8 other bytes tell something we don't know yet.

Now I need some sleep.

Naulahauta

#23
Shameful double-post for exposure: I extracted a point cloud directly from the MDL file! VIDEO LINK
I read the bytes all wrong. Here are the bytes of the first point definition:
00 00 80 3F 00 00 00 00 00 00 A8 40 80 80 80 FF 00 24 00 2C
Instead of reading them as integers, I had to read them as little-endian floats. I used this page and got myself the following readout:
3F 80 00 00: 1
00 00 00 00: 0
40 A8 00 00: 5.25
FF 80 80 80: NaN
2C 00 24 00: 1.8209878e-12


Now, obviously the last two values aren't meaningful as floats, but the first three look totally reasonable. By interpreting them as x, y and z I got the first point of the cloud shown above.
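The readout above lists each value with its bytes reversed (3F 80 00 00 instead of 00 00 80 3F) because that is how a little-endian float looks when written as a plain hex number. Python's struct module does the same decoding directly; this sketch just reproduces the readout from the bytes in the post (as_le_floats is a hypothetical name):

```python
import struct

first_point = bytes.fromhex("00 00 80 3F 00 00 00 00 00 00 A8 40 80 80 80 FF 00 24 00 2C")


def as_le_floats(record):
    """Interpret a byte string as consecutive little-endian 32-bit floats."""
    return struct.unpack("<%df" % (len(record) // 4), record)


values = as_le_floats(first_point)
print(values[:3])  # x, y, z of the first point: (1.0, 0.0, 5.25)
```

The fourth value decodes as NaN and the fifth as roughly 1.82e-12, matching the readout and confirming those two fields are not floats.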

EDIT EDIT:

I've got... some progress.
• All MDLs begin with the same word, 18 00, which is a pointer to where all the data begins.
• After that come the bytes 44 52 00 01 00 00 43 00, which are always the same too.
• After that, there's a word for object amount (OBJAM), then a word for texture amount (TEXAM), then a word that's always 02 00.
• Then comes the first pointer. Four bytes - the pointer to the object headers (OBHDR).
• Then comes the second pointer. Four bytes - the pointer to the texture definitions (TXPTR).
• Then comes a strange pattern. It's always 0x1608 bytes and appears to be 0x54 bytes wide. This is the same in every single MDL. I don't know what this means.

• After those bytes, we've arrived at the first object. Every object begins with a 0x14-byte subheader. The subheader has:
•• Four bytes for a pointer to where the point list begins
•• Four bytes for a pointer to an unknown array of 0x0C bytes
•• Two bytes for the amount of points in point list
•• Four unknown bytes
•• Two bytes for which texture to use for this object
•• Two unknown bytes
•• FF FF
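The fixed part of the MDL layout so far, as a Python sketch. Adding up the header fields gives 2 + 8 + 2 + 2 + 2 + 4 + 4 = 0x18 bytes, which matches the 18 00 pointer to where the data begins, and the subheader fields add up to 0x14 as stated. Field names follow the post (OBJAM, TEXAM, OBHDR, TXPTR); the function names, the little-endian assumption and the sanity checks are mine:

```python
import struct

MDL_HEADER = struct.Struct("<H8sHHHII")   # 0x18 bytes
SUBHEADER = struct.Struct("<IIH4sHH2s")   # 0x14 bytes
MDL_MAGIC = bytes.fromhex("44 52 00 01 00 00 43 00")


def parse_mdl_header(data):
    """Decode the fixed header at the very start of an .mdl file."""
    data_start, magic, objam, texam, _always_two, obhdr, txptr = MDL_HEADER.unpack_from(data, 0)
    if magic != MDL_MAGIC:
        raise ValueError("unexpected bytes after the data-start word")
    return {"data_start": data_start, "objam": objam, "texam": texam,
            "obhdr": obhdr, "txptr": txptr}


def parse_subheader(data, offset):
    """Decode one object's 0x14-byte subheader starting at `offset`."""
    (points_ptr, array_ptr, n_points, _unk4, texture, _unk2,
     terminator) = SUBHEADER.unpack_from(data, offset)
    if terminator != b"\xff\xff":
        raise ValueError("subheader does not end in FF FF")
    return {"points_ptr": points_ptr, "array_ptr": array_ptr,
            "n_points": n_points, "texture": texture}
```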

After the subheader there's a varying number of words. I don't know where that number comes from. Sometimes it's 0x7E, sometimes it's 0x86... no idea.

• But then we're at the point list as told by the subheader's first pointer. Its format is as follows:
•• Four bytes for x (float)
•• Four bytes for y (float)
•• Four bytes for z (float)
•• Four bytes for vertex shading (possibly unused!)
•• Four bytes for unknown
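Combining the point-list layout with the subheader fields, here is a Python sketch of reading one object's points. As a consistency check, 144 records of 20 bytes is exactly the 2880-byte array found in d01.mdl. read_points is a hypothetical name; the shading and unknown fields are kept as raw bytes since their meaning isn't settled:

```python
import struct

POINT = struct.Struct("<3f4s4s")  # x, y, z (little-endian floats), shading, unknown: 20 bytes


def read_points(data, offset, count):
    """Read `count` consecutive 20-byte point records starting at `offset`.

    `offset` and `count` would come from the object's subheader
    (point-list pointer and point amount).
    """
    points = []
    for i in range(count):
        x, y, z, shading, unknown = POINT.unpack_from(data, offset + i * POINT.size)
        points.append({"xyz": (x, y, z), "shading": shading, "unknown": unknown})
    return points
```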

After the point list, there's the weird array of 0x0C bytes as told by the subheader's second pointer.

... Then the same thing repeats for the next object, OBJAM times in total.

• After all objects have been listed, we've arrived at the offset that OBHDR told us about. Its format is as follows:
•• Two bytes for the size of the object header
•• Two bytes for unknown
•• Two bytes for texture name pointer
•• 32 bytes for unknown
•• Two bytes for unique texture ID
•• Four bytes for unknown
Repeat this OBJAM times.

After those, we've arrived at the offset that TXPTR told us about. It's ASCII and says fb0-l05 fb0-l0d fb0-blkface. Those are the names of the texture files this level loads to its memory, and there's three of them because TEXAM is 03 00.

End of file. Note: the bytes between the texture definitions and the object headers were the ASCII names of the individual textures.

Commando125

I'm not sure it's a good idea to keep hacking the PC games now that they're no longer sold on the Steam store.

badlose

Maybe now is the BEST TIME to hack actually! Also:

http://www.greenmangaming.com/search/?q=umihara

https://www.humblebundle.com/store/search/search/umihara

Hopefully you can still download from Steam if you buy from another store!

Commando125

I don't get it. Why would it be the best time?

badlose

People are still going to find a way to get the game if they want it.