Distributed & Decentralized Systems Curriculum
Decentralized Systems · IPFS Decentralized Storage

Key Question

How does Filecoin incentivize persistent storage in a peer-to-peer network?

Deep Dive

IPFS solves distribution: anyone who has a file can serve it. It does not solve retention, because no one is obligated to keep your data. If every peer that holds your file deletes it, the file becomes unretrievable. This is the best-effort model: IPFS peers keep the data they care about and discard the rest.

Filecoin adds economic incentives on top of IPFS. The core idea: clients pay miners to store data, and miners prove they’re storing it correctly over time. If they fail, they lose money.

The Storage Deal lifecycle:

Deal Flow:

1. Client                  2. Client                  3. Miner
   │                          │                          │
   ▼                          ▼                          ▼
┌──────────┐            ┌──────────┐            ┌──────────────┐
│ Client   │ ──deal────►│ Publish  │ ──deal────►│   Miner      │
│ has CID  │   params   │ deal to  │   accept   │   seals      │
│ QmX...   │            │blockchain│            │   the data   │
└──────────┘            └──────────┘            └──────┬───────┘
       │                                               │
       │                                               ▼
       │                                        ┌──────────────┐
       │                                        │  Proof-of-   │
       │                                        │  Replication │
       │                                        │  (PoRep)     │
       │                                        └──────┬───────┘
       │                                               │
       │                                               ▼
       │                                  ┌─────────────────────────┐
       │                                  │  Periodic Proof-of-     │
       └──────────── payments ───────────►│  Spacetime (PoSt)       │
                                          │  Every 24 hours         │
                                          └─────────────────────────┘

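Proof-of-Replication (PoRep): When a deal starts, the miner must prove that they hold their own physically unique copy of the data. This blocks attacks in which a miner is paid for storage it does not actually dedicate: for example, claiming to store 1,000 copies of a file (possibly under many identities) while keeping a single copy, or keeping nothing and regenerating or fetching the data only when challenged (the “generation attack”).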

PoRep (simplified):

Original data (D) ──► Sealing ──► Encoded data (E) stored on disk
                           │
                           ▼
                 Miner generates proof π:
                 "I have E committed at this point in time"
                           │
                           ▼
                 Verifier checks π against hash of D
                 
If miner only has (D) without sealing: PoRep fails.
If miner shares (E) with another miner: PoRep fails (different sealing key).

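Sealing is a deliberately slow, sequential computation (tens of minutes in this simplified picture; in practice it can take hours, depending on sector size and hardware). Each sealed copy is unique to the miner, so one copy can’t serve as proof for multiple deals.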
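The uniqueness property can be illustrated with a toy model. The sketch below is not Filecoin's sealing algorithm: `toy_seal`, `toy_unseal`, and the miner IDs are hypothetical names, and a chained SHA-256 keystream stands in for the real, far slower encoding. It shows only that the same data D yields a different encoded copy E for each miner, and that each copy can still be decoded back to D.

```python
import hashlib

def keystream(seed: bytes, length: int) -> bytes:
    """Derive `length` bytes by chaining SHA-256 calls.

    Each block is the hash of the previous one, so the stream must be
    computed in order (a toy stand-in for sealing's sequential work).
    """
    out = bytearray()
    block = hashlib.sha256(seed).digest()
    while len(out) < length:
        out.extend(block)
        block = hashlib.sha256(block).digest()
    return bytes(out[:length])

def toy_seal(data: bytes, miner_id: str) -> bytes:
    """Encode D into a miner-specific replica E by XOR with the keystream."""
    ks = keystream(miner_id.encode(), len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

def toy_unseal(sealed: bytes, miner_id: str) -> bytes:
    """XOR is its own inverse, so unsealing reuses the same keystream."""
    return toy_seal(sealed, miner_id)

data = b"the same client data, stored by two miners"
replica_a = toy_seal(data, "miner-a")   # hypothetical miner IDs
replica_b = toy_seal(data, "miner-b")

print(replica_a != replica_b)                      # replicas differ per miner
print(toy_unseal(replica_a, "miner-a") == data)    # the owner can decode
print(toy_unseal(replica_a, "miner-b") == data)    # another miner's key fails
```

In this toy, miner B cannot pass off miner A's replica as its own, because decoding it with B's identity does not give back D.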

Proof-of-Spacetime (PoSt): Storage is about persistence, not just one-time proof. PoSt proves that a miner continues to hold the sealed data over time.

PoSt timeline:

Deal start                    Deal end
    │                             │
    ▼                             ▼
    ├─────┬─────┬─────┬─────┬─────┤
    │PoSt │PoSt │PoSt │PoSt │PoSt │
    │  0  │  1  │  2  │  3  │  4  │
    └─────┴─────┴─────┴─────┴─────┘
      ▲     ▲     ▲     ▲     ▲
      │     │     │     │     │
    Every 24 hours (one proof per proving period, regardless of deal value)

Each PoSt: a cryptographic proof using random challenges on the sealed data.
If miner misses a PoSt deadline → slashed (collateral forfeited).

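Each PoSt challenge is derived from fresh randomness that the miner can’t predict in advance, and the answer is due within a fixed deadline window. To respond, the miner must open the challenged positions of the sealed sector. Because re-sealing takes far longer than the window, a miner that discarded the sealed data can’t regenerate it in time. This is what rules out outsourcing: if you could produce a valid PoSt without holding the sealed data yourself, you wouldn’t need to store it at all.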
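A random-challenge proof can be sketched with a Merkle tree over the sealed data. This is a toy model under stated assumptions, not Filecoin's actual proof system: the chunking, the `epoch_seed` values, and all function names are hypothetical. The verifier keeps only the Merkle root; each period it picks a chunk index from fresh randomness, and the miner must return that chunk plus its Merkle path.

```python
import hashlib
import random

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(chunks):
    """Hash the chunks into a Merkle tree; return every level, leaves first."""
    level = [h(c) for c in chunks]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]   # pad odd levels by repeating the last node
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify(root, chunk, index, path):
    """Recompute the root from the chunk and its path; compare to the commitment."""
    node = h(chunk)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Setup: the miner commits to its sealed chunks; the verifier stores only the root.
sealed_chunks = [f"sealed-chunk-{i}".encode() for i in range(8)]
levels = build_levels(sealed_chunks)
root = levels[-1][0]

def challenge(epoch_seed: int, n_chunks: int) -> int:
    """Derive the challenged chunk index from per-period randomness."""
    return random.Random(epoch_seed).randrange(n_chunks)

for epoch_seed in (101, 102, 103):   # three proving periods, hypothetical seeds
    i = challenge(epoch_seed, len(sealed_chunks))
    ok = verify(root, sealed_chunks[i], i, prove(levels, i))
    print(f"period seed {epoch_seed}: challenged chunk {i}, proof valid = {ok}")
```

A miner that has thrown away the challenged chunk cannot answer, because any substitute chunk hashes to a different leaf and the recomputed root no longer matches.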

Faults and slashing:

Failure                                     Penalty
------------------------------------------  --------------------------------------------
Missing a PoSt deadline                     Collateral slashed (up to entire deal value)
Sector termination                          Collateral slashed + deal payment returned
Double-dealing (same data to two clients)   Slashing + reputation damage
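To make the penalty logic concrete, here is a toy collateral ledger. The numbers are invented for illustration and do not reflect Filecoin's actual fee schedule: `PENALTY_PER_MISS`, the starting collateral, and the `MinerAccount` class are all hypothetical.

```python
from dataclasses import dataclass

PENALTY_PER_MISS = 10   # hypothetical units forfeited per missed proof

@dataclass
class MinerAccount:
    collateral: int
    missed: int = 0

    def record_period(self, proof_submitted: bool) -> None:
        """Settle one proving period: a missed proof burns part of the collateral."""
        if not proof_submitted:
            self.missed += 1
            self.collateral = max(0, self.collateral - PENALTY_PER_MISS)

account = MinerAccount(collateral=100)
# Five proving periods; the miner is offline for the third and fourth.
for submitted in [True, True, False, False, True]:
    account.record_period(submitted)

print(account)  # MinerAccount(collateral=80, missed=2)
```

With one proof due per 24-hour period, an outage spanning two periods costs two penalties in this model.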

IPFS vs Filecoin:

              IPFS                          Filecoin
        ┌──────────────┐              ┌──────────────────┐
        │ Best effort  │              │ Guaranteed       │
        │ "I share what│              │ "I am paid to    │
        │  I like"     │              │  store this"     │
        └──────────────┘              └──────────────────┘
               │                              │
               │                              │
        No incentives                Collateral-backed
        No penalties                 Slashing penalties
        Anyone can serve             Miners must prove

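How they work together: Filecoin is built on the same foundations as IPFS, using libp2p for networking and content addressing for data, so a file is identified by the same kind of CID in both systems. A Filecoin miner can also run an IPFS node and serve an unsealed copy of the data it stores, in which case any IPFS peer can retrieve that CID without using Filecoin. In that arrangement IPFS handles distribution, while Filecoin supplies the paid, provable commitment that keeps the data from disappearing.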

Check Your Understanding

  1. Why can’t Filecoin use Proof-of-Spacetime alone without Proof-of-Replication?
  2. What happens to a Filecoin miner’s collateral if they lose power for 48 hours?
  3. Can an IPFS client retrieve a file stored via Filecoin without using Filecoin software themselves?

The “So What?”

Filecoin transforms decentralized storage from a volunteer effort into a marketplace. The combination of PoRep (unique copies) and PoSt (persistent storage) creates a trustless storage commodity: you pay for persistence, and cryptographic proofs show that the deal is being honored. This is the economic layer that IPFS was designed to work with.


✏️ Exercises

IPFS & Decentralized Storage: Exercises

Exercise 1: CID Mutability

Alice creates a file hello.txt with content “Hello, world!” and adds it to IPFS. She gets CID QmPZ9gcCe5rsLKbrJfFQW9dLLgKNLoJN7da8uDNmhCWZqJ. She then changes the file to “Hello, world?” (changing ! to ?) and adds it again.

  1. Does the CID change? Why or why not?
  2. Bob downloads both files. How can he verify which one is the original?
  3. If Alice wants to share a link that always points to her latest version, what IPFS mechanism does she need?

Exercise 2: IPFS vs BitTorrent

Consider downloading a 2 GB open-source operating system ISO. Compare IPFS and BitTorrent:

  1. Discovery: How does each system find peers who have the file?
  2. Verification: How does each system verify that downloaded data is correct?
  3. Incentives: How does each system encourage peers to upload after downloading?
  4. Merkle structure: Both BitTorrent and IPFS use a Merkle tree / DAG. Is there a conceptual difference in how they structure and address data?

Exercise 3: PoRep Necessity

Filecoin miners earn money by storing clients’ data. A dishonest miner considers the following attack:

  • Client wants to store 100 copies of a 1 GB dataset D.
  • Miner stores 1 copy of D and claims to have 100.
  • When challenged with PoRep, miner quickly generates the sealed data from the single copy.

Explain why this attack fails due to the design of Proof-of-Replication. Be specific about the sealing process and what makes each sealed copy unique to a specific miner and a specific deal.

Bonus: Could this attack work if Filecoin used only Proof-of-Spacetime without Proof-of-Replication?

IPFS & Decentralized Storage: Solutions

Exercise 1 Solution

1. Does the CID change?

Yes. The CID is derived from a cryptographic hash of the file’s content (for a CIDv0 such as the Qm... value here, a SHA-256 multihash of the encoded root block rather than of the raw bytes alone). Changing even one byte completely changes the hash output (avalanche effect):

  • Original: “Hello, world!” → QmPZ9gcCe5rsLKbrJfFQW9dLLgKNLoJN7da8uDNmhCWZqJ
  • Modified: “Hello, world?” → a completely different CID

The two CIDs share no relationship. You cannot derive one from the other.
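The avalanche effect is easy to observe directly. The sketch below hashes only the raw bytes of each string with SHA-256; a real CID hashes the encoded block and adds multihash and base encoding, so these digests will not match the CID in the exercise. `differing_bits` is a helper written for this illustration.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = hashlib.sha256(b"Hello, world!").digest()
modified = hashlib.sha256(b"Hello, world?").digest()

print(original.hex())
print(modified.hex())
print(f"{differing_bits(original, modified)} of 256 bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to flip even though the inputs differ in a single character.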

2. How to verify which is original?

Download each file and recompute its hash. If the result matches the CID that Alice published, you have exactly the bytes she added. Because the two CIDs differ, you can tell the files apart, but a CID carries no timestamp or ordering: unless Alice tells you which CID is the “original,” you can’t know her authorial intent. You can only be certain about the content.

3. Always pointing to the latest version?

Alice needs IPNS (InterPlanetary Name System). IPNS creates a pointer from Alice’s PeerID (public key hash) to a CID. Alice can update the pointer: ipns://QmAlicePeerID always resolves to her latest CID. Users who trust Alice’s public key will always get her latest file.
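The mutable-pointer idea can be sketched as a name that maps to the newest record seen. This is a deliberately reduced model: real IPNS records are signed with the owner's private key and verified by resolvers, which this sketch omits entirely. `NameRecord`, `ToyResolver`, and the placeholder CIDs are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NameRecord:
    name: str       # stands in for the hash of Alice's public key
    cid: str        # the CID the name currently points to
    sequence: int   # higher sequence number = newer record

class ToyResolver:
    """Keeps the newest record seen for each name."""

    def __init__(self) -> None:
        self._records = {}

    def publish(self, record: NameRecord) -> bool:
        current = self._records.get(record.name)
        if current is not None and record.sequence <= current.sequence:
            return False            # stale or replayed record: ignore it
        self._records[record.name] = record
        return True

    def resolve(self, name: str) -> str:
        return self._records[name].cid

resolver = ToyResolver()
resolver.publish(NameRecord("alice", "cid-of-version-1", sequence=1))
resolver.publish(NameRecord("alice", "cid-of-version-2", sequence=2))
resolver.publish(NameRecord("alice", "cid-of-version-1", sequence=1))  # replay, rejected

print(resolver.resolve("alice"))  # cid-of-version-2
```

The name stays fixed while the CID behind it changes, which is exactly what an immutable CID alone cannot offer.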

Exercise 2 Solution

1. Discovery:

BitTorrent                                 IPFS
-----------------------------------------  ----------------------------------------------
Centralized tracker or DHT with infohash   Kademlia DHT keyed by CID
Tracker returns list of peers              DHT returns provider records
PEX (Peer Exchange) for gossip             BitSwap handles peer discovery during exchange

BitTorrent traditionally relied on trackers (centralized). Modern BitTorrent uses DHT (Mainline DHT), which inspired IPFS’s DHT. IPFS’s approach is fully decentralized from the start.

2. Verification:

BitTorrent                                                   IPFS
-----------------------------------------------------------  --------------------------------------------
Merkle tree: root hash (infohash), 256 KB piece hashes       Merkle DAG: every node is content-addressed
Verify each piece against its hash in the torrent metadata   Verify each block against its CID on download
Root hash in .torrent file or magnet link                    CID is the root hash

Both verify downloaded data against hashes. The key difference: BitTorrent’s hash structure is flat (one level of piece hashes under the infohash; only BitTorrent v2 introduces per-file Merkle trees), while IPFS’s Merkle DAG can be nested (trees within trees).
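The flat, one-level check on the BitTorrent side can be sketched as follows. This is an illustration rather than a torrent client: the piece size is tiny on purpose, the payload is made up, and the helper names are hypothetical. SHA-1 is used because BitTorrent v1 hashes pieces with it.

```python
import hashlib

PIECE_SIZE = 4  # bytes; tiny on purpose, real torrents use far larger pieces

def split_pieces(data: bytes, size: int = PIECE_SIZE) -> list:
    """Cut the payload into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def piece_hashes(data: bytes) -> list:
    """The flat hash list that the torrent metadata would carry."""
    return [hashlib.sha1(p).digest() for p in split_pieces(data)]

def verify_piece(index: int, piece: bytes, expected: list) -> bool:
    """A downloaded piece is accepted only if it matches its slot in the list."""
    return hashlib.sha1(piece).digest() == expected[index]

payload = b"open-source-iso!"
expected = piece_hashes(payload)
pieces = split_pieces(payload)

print(len(pieces))                            # 4
print(verify_piece(2, pieces[2], expected))   # True
print(verify_piece(2, b"evil", expected))     # False
```

Every piece is checked against a single list, with no intermediate nodes. In a nested DAG, by contrast, a block can itself contain links to further blocks, each verified against its own CID.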

3. Incentives:

BitTorrent                                                   IPFS
-----------------------------------------------------------  ------------------------------------------------------------
Tit-for-tat: “I’ll only upload to you if you upload to me”   BitSwap barter: “I’ll trade blocks I have for blocks I want”
Strict: peer is choked if they don’t reciprocate             Soft: based on credit/debit ratios
Leeching is directly punished                                Leeching is indirectly punished (credit score drops)

BitTorrent’s tit-for-tat is more aggressive about enforcing sharing. IPFS’s BitSwap is more flexible: a peer with low credit can still fetch data, just at lower priority.
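The “soft” credit idea can be made concrete with the example strategy given in the IPFS whitepaper, which defines a debt ratio r = bytes_sent / (bytes_recv + 1) and a sigmoid probability of sending to a peer. Deployed Bitswap implementations are free to use other strategies, so treat this as an illustration of the principle rather than a description of current behavior; the function names are chosen for this sketch.

```python
import math

def debt_ratio(bytes_sent: int, bytes_recv: int) -> float:
    """How much we have sent a peer relative to what it has sent us."""
    return bytes_sent / (bytes_recv + 1)

def send_probability(r: float) -> float:
    """Whitepaper example: P(send | r) = 1 - 1 / (1 + exp(6 - 3r))."""
    return 1 - 1 / (1 + math.exp(6 - 3 * r))

# A peer that reciprocates keeps r low; a leecher lets r grow.
for sent, recv in [(0, 0), (1_000, 1_000), (2_000, 999), (4_000, 999)]:
    r = debt_ratio(sent, recv)
    print(f"sent={sent:>5} recv={recv:>5}  r={r:.2f}  P(send)={send_probability(r):.4f}")
```

The probability falls smoothly as the debt ratio grows, so a peer in debt is served less often rather than being cut off outright.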

4. Merkle structure difference:

BitTorrent’s Merkle tree is a static structure: the piece list is fixed when the torrent is created. You cannot add files or reorganize without creating a new torrent.

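IPFS’s Merkle DAG is a composable structure: you can add files, create directories, and link objects arbitrarily. Each change produces a new root CID, but unchanged subtrees keep their CIDs and are reused. IPFS directories are themselves DAG nodes with their own CIDs; a BitTorrent v1 torrent records file paths in its metadata, but pieces are hashed over the concatenated files, so a directory is not a separately addressable object.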
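A minimal content-addressed DAG shows how nesting and reuse work. This sketch uses JSON and hex SHA-256 digests as stand-ins for IPFS's real block encoding and CIDs; `put`, `put_file`, `put_dir`, and the sample contents are hypothetical.

```python
import hashlib
import json

store = {}  # toy block store: content address -> encoded node

def put(node: dict) -> str:
    """Encode a node deterministically and store it under its own hash."""
    encoded = json.dumps(node, sort_keys=True).encode()
    address = hashlib.sha256(encoded).hexdigest()
    store[address] = encoded
    return address

def put_file(content: str) -> str:
    return put({"type": "file", "data": content})

def put_dir(entries: dict) -> str:
    """A directory is just a node whose links are the addresses of its children."""
    return put({"type": "dir", "links": entries})

readme = put_file("read me")
photo = put_file("pretend image bytes")
docs_v1 = put_dir({"README": readme})
root_v1 = put_dir({"docs": docs_v1, "photo": photo})

# "Editing" means building new nodes; unchanged children keep their addresses.
docs_v2 = put_dir({"README": readme, "NOTES": put_file("new notes")})
root_v2 = put_dir({"docs": docs_v2, "photo": photo})

print(root_v1 != root_v2)   # True: the root address changes
print(len(store))           # 7: only three new nodes were added for version 2
```

Both roots link to the same `photo` and `readme` nodes, so the second version costs only the new file, the new `docs` directory, and the new root.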

Exercise 3 Solution

Why the attack fails:

Proof-of-Replication relies on sealing, a sequential, resource-intensive encoding process that ties a specific copy of the data to a specific miner:

Sealing process (simplified):

Original data D
      │
      ├──► Miner ID (M) ────────┐
      ├──► Deal ID (Deal) ──────┤
      ├──► Random nonce ────────┤
      └─────────────────────────┘
               │
               ▼
         Layer-by-layer encoding (AES + PoSW)
               │
               ▼
         Sealed sector S = Seal(D, M, Deal, nonce)
         Time: ~30 minutes per sector

Each sealed copy S is different because:

  • The miner ID is different (each miner has a unique public key)
  • The deal ID is different (each deal is a separate contract)
  • The nonce is different (random value per deal)

For the miner claiming 100 copies stored:

  • They would need 100 different sealed sectors: S₁, S₂, …, S₁₀₀
  • Each requires ~30 minutes of sequential computation
  • They cannot compute 100 proofs from 1 copy because the sealing input (miner ID, deal ID) differs per deal
  • PoRep challenges ask about the sealed data, which is unique per copy

If caught cheating (unable to produce the correct PoRep), the miner’s entire collateral for all 100 deals is slashed.

Bonus: Without PoRep (PoSt only):

The attack would likely succeed. PoSt proves that you currently hold some data, but it doesn’t prove that the data is a unique copy. With PoSt alone:

  • Store 1 copy of D
  • Generate PoSt for that one copy
  • Have the same PoSt serve as proof for all 100 deals (since PoSt just proves “I have this hash”)
  • Collect 100× payment for 1× storage

This is why PoRep is necessary: it binds each deal to a physically distinct, computation-bound encoding.
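The contrast can be shown with toy commitments. Neither function below is Filecoin's real commitment scheme: a plain SHA-256 over the inputs stands in for a commitment to Seal(D, M, Deal, nonce), and the deal number is reused as a stand-in for the random nonce so the output is reproducible. All names are hypothetical.

```python
import hashlib

def unsealed_commitment(data: bytes) -> str:
    """Without sealing, a commitment depends on the data alone."""
    return hashlib.sha256(data).hexdigest()

def sealed_commitment(data: bytes, miner_id: str, deal_id: int, nonce: bytes) -> str:
    """Toy stand-in for a commitment to Seal(D, M, Deal, nonce)."""
    material = b"|".join([miner_id.encode(), str(deal_id).encode(), nonce, data])
    return hashlib.sha256(material).hexdigest()

data = b"dataset D"
deals = range(100)

# PoSt only: every deal is answered by the same commitment to one stored copy.
without_porep = {unsealed_commitment(data) for _ in deals}

# With PoRep: each deal binds the data to its own sealing inputs.
with_porep = {
    sealed_commitment(data, "miner-1", d, d.to_bytes(4, "big")) for d in deals
}

print(len(without_porep), len(with_porep))  # 1 100
```

One stored object satisfies all 100 deals in the first case, while the second case demands 100 distinct encodings, each of which must be computed and kept.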