| Commit message | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
On Windows, UDPConn.ReadFrom returns an error for packets larger
than the receive buffer. The error is not marked temporary, causing
our loop to exit when the first oversized packet arrived. The fix
is to treat this particular error as temporary.
Fixes: #1579, #2087
Updates: #2082
|
|
|
|
|
| |
The test expected the timeout to fire after a matcher for the response
was added, but the timeout is random and fired sooner sometimes.
|
|
|
|
|
|
| |
This change makes it possible to add peers without providing their IP
address. The endpoint of the target node is resolved using the discovery
protocol.
|
|
|
|
|
|
| |
This change simplifies the dial scheduling logic because it
no longer needs to track whether the discovery table has been
bootstrapped.
|
| |
|
| |
|
|
|
|
| |
thanks to Felix Lange (fjl) for help with design & impl
|
| |
|
|\
| |
| | |
eth, p2p, rpc/api: polish protocol info gathering
|
| | |
|
| | |
|
|/ |
|
|
|
|
| |
The strict matching can get in the way of protocol upgrades.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nodeDB.querySeeds was not safe for concurrent use but could be called
concurrently on multiple goroutines in the following case:
- the table was empty
- a timed refresh started
- a lookup was started and initiated refresh
These conditions are unlikely to coincide during normal use, but are
much more likely to occur all at once when the user's machine just woke
from sleep. The root cause of the issue is that querySeeds reused the
same leveldb iterator until it was exhausted.
This commit moves the refresh scheduling logic into its own goroutine
(so only one refresh is ever active) and changes querySeeds to not use
a persistent iterator. The seed node selection is now more random and
ignores nodes that have not been contacted in the last 5 days.
|
| |
|
|\
| |
| | |
fdtrack: hide message
|
| |
| |
| |
| | |
This reverts commit 5c949d3b3ba81ea0563575b19a7b148aeac4bf61.
|
| |
| |
| |
| |
| |
| |
| |
| | |
PR #1621 changed Table locking so the mutex is not held while a
contested node is being pinged. If multiple nodes ping the local node
during this time window, multiple ping packets will be sent to the
contested node. The changes in this commit prevent multiple packets by
tracking whether the node is being replaced.
|
| | |
|
|/
|
|
| |
Might solve #1579
|
|\
| |
| | |
p2p: validate recovered ephemeral pubkey
|
| | |
|
| |
| |
| |
| | |
We had the wrong value (12) since forever.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If the timeout fired (even just nanoseconds) before the deadline of the
next pending reply, the timer was not rescheduled. The timer would've
been rescheduled anyway once the next packet was sent, but there were
cases where no next packet could ever be sent due to the locking issue
fixed in the previous commit.
As timing-related bugs go, this issue had been present for a long time
and I could never reproduce it. The test added in this commit did
reproduce the issue on about one out of 15 runs.
|
| |
| |
| |
| |
| |
| | |
Table.mutex was being held while waiting for a reply packet, which
effectively made many parts of the whole stack block on that packet,
including the net_peerCount RPC call.
|
| | |
|
| |
| |
| |
| | |
Not closing the table used to be fine, but now the table has a database.
|
| |
| |
| |
| |
| | |
Package fdtrack logs statistics about open file descriptors.
This should help identify the source of #1549.
|
| |
| |
| |
| | |
I forgot to update one instance of "go-ethereum" in commit 3f047be5a.
|
|/
|
|
|
| |
All code outside of cmd/ is licensed as LGPL. The headers
now reflect this by calling the whole work "the go-ethereum library".
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lookup calls would spin out of control when network connectivity was
lost. The throttling that was in place only took effect when the table
returned zero results, which doesn't happen very often.
The new throttling should not have a negative impact when the host is
online. Lookups against the network take some time and dials for all
results must complete or hit the cache before a new one is started. This
usually takes longer than four seconds, leaving online lookups
unaffected.
Fixes #1296
|
| |
|
|
|
|
|
|
|
| |
As of this commit, we no longer rely on the protocol handler to report
write errors in a timely fashion. When a write fails, shutdown is
initiated immediately and no new writes can start. This will also
prevent new writes from starting after Server.Stop has been called.
|
|
|
|
| |
rand.Source isn't safe for concurrent use.
|
| |
|
| |
|
|
|
|
|
| |
The previous value of 5 seconds causes timeouts for legitimate messages
if large messages are sent.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This detects hanging connections sooner. We send a ping every 15s and
other implementations have similar limits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The most visible change is event-based dialing, which should be an
improvement over the timer-based system that we have at the moment.
The dialer gets a chance to compute new tasks whenever peers change or
dials complete. This is better than checking peers on a timer because
dials happen faster. The dialer can now make more precise decisions
about whom to dial based on the peer set and we can test those
decisions without actually opening any sockets.
Peer management is easier to test because the tests can inject
connections at checkpoints (after enc handshake, after protocol
handshake).
Most of the handshake stuff is now part of the RLPx code. It could be
exported or moved to its own package because it is no longer entangled
with Server logic.
|
| |
|
|
|
|
|
|
| |
The previous limit was 10MB which is unacceptable for all kinds
of reasons, the most important one being that we don't want to
allow the remote side to make us allocate 10MB at handshake time.
|
| |
|
| |
|
| |
|
|\
| |
| | |
p2p: tweak connection limits
|
| | |
|
| | |
|
| |
| |
| |
| |
| | |
This should increase the speed a bit because all findnode
results (up to 16) can be verified at the same time.
|
| |
| |
| |
| |
| |
| | |
The returned reason is currently not used except for the log
message. This change makes the log messages a bit more useful.
The handshake code also returns the remote reason.
|
| |
| |
| |
| |
| | |
On the test network, we've seen that it becomes harder to connect
if the queues are so short.
|
| |
| |
| |
| |
| |
| | |
People still get confused about the messages. This commit changes
the levels so that the only thing printed at the default level (info)
is a successful mapping.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The test listens for multicast UDP packets on the default interface
because I couldn't get it to work reliably on loopback without massive
changes to goupnp. This means that the test might fail when there is a
UPnP-enabled router attached on that interface. I checked that locally
by looping the test and it passes reliably because the local SSDP server
always responds faster.
|
|/
|
|
|
|
|
|
|
|
|
|
| |
Concurrent calls to Interface methods on autodisc could return a "not
discovered" error if the discovery did not finish before the call.
autodisc.wait expected the done channel to carry the found Interface
but it was closed instead.
The fix is to use sync.Once for now, which is easier to get right.
And there is a test. Finally.
This will have to change again when we introduce re-discovery.
|
|
|
|
|
| |
The code assumed that Table.closest always returns at least 13 nodes.
This is not true for small tables (e.g. during bootstrap).
|
| |
|
|
|
|
| |
neighbours packets.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have a UDP protocol that specifies any messages as large as 4KB. Aside from having been implemented for months and being a necessity for encryption and piggy-backing packets, 1280 bytes is ideal, and it means this TODO can be completed!
Why 1280 bytes?
* It's less than the default MTU for most WAN/LAN networks. That means fewer fragmented datagrams (esp on well-connected networks).
* Fragmented datagrams and dropped packets suck and add latency while the OS waits for a dropped fragment that never arrives (blocking readLoop())
* Most of our packets are < 1280 bytes.
* 1280 bytes is the minimum MTU for IPv6; on IPv6, a datagram < 1280 bytes will *never* be fragmented.
UDP datagrams are dropped. A lot! And fragmented datagrams are worse. If a single packet has a 30% chance of being dropped, then a datagram split into two fragments has roughly a 51% chance of being lost (1 - 0.7^2), because losing either fragment loses the whole datagram. More importantly, we have signed packets and can't do anything with a packet unless we receive the entire datagram because the signature can't be verified. The same is true when we have encrypted packets.
So the ideal buffer size for receiving datagrams is a number under 1400 bytes, and the IPv6 lower bound of 1280 bytes makes it a non-decision. On IPv4 most ISPs and 3G/4G/LTE networks have an MTU just over 1400 and *never* over 1500. Never. That means packets over 1500 (in reality: ~1450) bytes are fragmented. And probably dropped a lot.
Just to prove the point, here are pings sending non-fragmented packets over wifi/ISP, and a second set of pings via cell-phone tethering. It's important to note that, if *any* router between my system and the EC2 node has a lower MTU, the message would not go through:
On wifi w/normal ISP:
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=104.831 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=119.004 ms
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 104.831/111.918/119.004/7.087 ms
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1
Tethering to O2:
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=107.844 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=105.127 ms
1458 bytes from 52.6.250.242: icmp_seq=2 ttl=42 time=120.483 ms
1458 bytes from 52.6.250.242: icmp_seq=3 ttl=42 time=102.136 ms
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
With the introduction of static/trusted nodes, the peer count
can go above MaxPeers. Update the capacity check to handle this.
While here, decouple the trusted nodes check from the handshake
by passing a function instead.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| | |
Last minute p2p fixes
|
| | |
|
| | |
|
| | |
|
| | |
|
|\ \
| |/
|/| |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| | |
Conflicts:
p2p/server_test.go
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
The previous metric was pubkey1^pubkey2, as specified in the Kademlia
paper. We missed that EC public keys are not uniformly distributed.
Using the hash of the public keys addresses that. It also makes it
a bit harder to generate node IDs that are close to a particular node.
|
| | |
|
| | |
|
|/
|
|
|
|
| |
This commit changes the discovery protocol to use the new "v4" endpoint
format, which allows for separate UDP and TCP ports and makes it
possible to discover the UDP address after NAT.
|
|
|
|
|
| |
p2p.Msg.ReceivedAt can be used for determining block propagation from
beginning to end.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
This helps with fixing the tests for cmd/geth to run without networking.
|
|
|
|
|
| |
We decode into [1]DiscReason in a few places. That doesn't work anymore
because package rlp no longer accepts RLP lists for byte arrays.
|
| |
|
| |
|
|
|
|
|
| |
The dial timer was not reset properly when the peer count reached
MaxPeers.
|
| |
|
| |
|
| |
|
|
|
|
| |
removePeer can be called even after listenLoop and dialLoop have returned.
|
|
|
|
|
|
|
|
|
| |
Peer.readLoop will only terminate if the connection is closed. Fix the
hang by closing the connection before waiting for readLoop to terminate.
This also removes the british disconnect procedure where we're waiting
for the remote end to close the connection. I have confirmed with
@subtly that cpp-ethereum doesn't adhere to it either.
|
|
|
|
| |
This regression was introduced in b3c058a9e4e9.
|
|
|
|
|
|
|
| |
This is supposed to apply some back pressure so Server is not accepting
more connections than it can actually handle. The current limit is 50.
This doesn't really need to be configurable, but we'll see how it
behaves in our test nodes and adjust accordingly.
|
| |
|
|
|
|
|
|
|
| |
As of this commit, p2p will disconnect nodes directly after the
encryption handshake if too many peer connections are active.
Errors in the protocol handshake packet are now handled more politely
by sending a disconnect packet before closing the connection.
|
|
|
|
| |
netWrapper already sets a read deadline in ReadMsg.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
There were multiple synchronization issues in the disconnect handling,
all caused by the odd special-casing of Peer.readLoop errors. Remove the
special handling of read errors and make readLoop part of the Peer
WaitGroup.
Thanks to @Gustav-Simonsson for pointing at arrows in a diagram
and playing rubber-duck.
|
|
|
|
|
|
|
| |
This commit introduces a new (temporary) peer selection
strategy based on random lookups.
While we're here, also implement the TODOs in dialLoop.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a fix for an attack vector where the discovery protocol could be
used to amplify traffic in a DDoS attack. A malicious actor would send a
findnode request with the IP address and UDP port of the target as the
source address. The recipient of the findnode packet would then send a
neighbors packet (which is 16x the size of findnode) to the victim.
Our solution is to require a 'bond' with the sender of findnode. If no
bond exists, the findnode packet is not processed. A bond between nodes
α and β is created when α replies to a ping from β.
This (initial) version of the bonding implementation might still be
vulnerable against replay attacks during the expiration time window.
We will add stricter source address validation later.
|
|
|
|
|
|
| |
The primary motivation for doing this right now is that old PoC 8
nodes and newer PoC 9 nodes keep discovering each other, causing
handshake failures.
|
| |
|
|\ |
|
| | |
|
| |
| |
| |
| |
| |
| | |
This is better because protocols might not actually read the payload for
some errors (msg too big, etc.) which can be a pain to test with the old
behaviour.
|
| |
| |
| |
| | |
This helps a lot with debugging.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Message encoding functions have been renamed to catch any uses.
The switch to the new encoder can cause subtle incompatibilities.
If there are any users outside of our tree, they will at least be
alerted that there was a change.
NewMsg no longer exists. The replacements for EncodeMsg are called
Send and SendItems.
|
|/ |
|
|\ |
|
| | |
|
|/ |
|
| |
|
|
|
|
|
| |
It is unused and untested right now. We can
bring it back later if required.
|
|
|
|
|
| |
Until chunked frames are implemented we cannot send messages
with a size overflowing uint24.
|
|
|
|
| |
They got lost in the transition to rlpxFrameRW.
|
|
|
|
|
|
|
|
|
| |
With RLPx frames, the message code is contained in the
frame and is no longer part of the encoded data.
EncodeMsg, Msg.Decode have been updated to match.
Code that decodes RLP directly from Msg.Payload will need
to change.
|
| |
|
|
|
|
|
|
|
|
|
| |
This mostly changes how information is passed around.
Instead of using many function parameters and return values,
put the entire state in a struct and pass that.
This also adds back derivation of ecdhe-shared-secret. I deleted
it by accident in a previous refactoring.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
This should prevent connection drops.
|
|
|
|
|
|
| |
The diff is a bit bigger than expected because the protocol handshake
logic has moved out of Peer. This is necessary because the protocol
handshake will have custom framing in the final protocol.
|
|\
| |
| | |
Cleanup imports
|
| |
| |
| |
| | |
My temporary fix was merged upstream.
|
| |
| |
| |
| |
| | |
We forgot to update this reference when moving ecies into the
go-ethereum repo.
|
|/
|
|
|
|
| |
Range expressions capture the length of the slice once before the first
iteration. A range expression cannot be used here since the loop
modifies the slice variable (including length changes).
|
| |
|
|\ |
|
| | |
|
|/
|
|
|
| |
* ECIES moved from obscuren to ethereum
* Added html META[name=badge] to reflect menuItem.secondaryTitle
|
|
|
|
| |
For compatibility with cpp-ethereum
|
| |
|
|
|
|
|
| |
udp.Table was assigned after the readLoop started, so
packets could arrive and be processed before the Table was there.
|
|
|
|
|
| |
addPeer doesn't allow self connects, but we can avoid opening
connections in the first place.
|
| |
|
|
|
|
|
| |
The deflect logic called Disconnect on the peer, but the peer never ran
and wouldn't process the disconnect request.
|
|
|
|
|
|
|
|
| |
There are now two deadlines, frameReadTimeout and payloadReadTimeout.
The frame timeout is longer and allows for connections that are idle.
The message timeout is still short and ensures that we don't get stuck
in the middle of a message.
|
| |
|
| |
|
|
|
|
| |
This deletes the old NAT implementation.
|
|
|
|
|
|
| |
I have verified that UPnP and NAT-PMP work against an older version of
the MiniUPnP daemon running on pfSense. This code is kind of hard to
test automatically.
|
| |
|
| |
|
| |
|
|
|
|
| |
The unit test hooks were turned on 'in production'.
|
|
|
|
|
| |
The discovery RPC protocol does not yet distinguish TCP and UDP ports.
But it can't hurt to do so in our internal model.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Overview of changes:
- ClientIdentity has been removed, use discover.NodeID
- Server now requires a private key to be set (instead of public key)
- Server performs the encryption handshake before launching Peer
- Dial logic takes peers from discover table
- Encryption handshake code has been cleaned up a bit
- baseProtocol is gone because we don't exchange peers anymore
- Some parts of baseProtocol have moved into Peer instead
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
- add const length params for handshake messages
- add length check to fail early
- add debug logs to help interop testing (!ABSOLUTELY SHOULD BE DELETED LATER)
- wrap connection read/writes in error check
- add cryptoReady channel in peer to signal when secure session setup is finished
- wait for cryptoReady or timeout in TestPeersHandshake
|
|
|
|
| |
this is directly copied in the auth message
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
- set proper public key serialisation length in pubLen = 64
- reset all sizes and offsets
- rename from DER to S (we are not using DER encoding)
- add remoteInitRandomPubKey as return value to respondToHandshake
- add ImportPublicKey with error return to read both EC golang.elliptic style 65 byte encoding and 64 byte one
- add ExportPublicKey falling back to go-ethereum/crypto.FromECDSAPub() chopping off the first byte
- add Import - Export tests
- all tests pass
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
- abstract the entire handshake logic in cryptoId.Run() taking session-relevant parameters
- changes in peer to accommodate how the encryption layer would be switched on
- modify arguments of handshake components
- fixed test getting the wrong pubkey but it still crashes on DH in newSession()
|
| |
|
|
|
|
| |
secp256k1-go panics
|
| |
|
| |
|
|
|
|
|
|
| |
- add session token check and fallback to shared secret in responder call too
- use explicit length for the types of new messages
- fix typo resp[resLen-1] = tokenFlag
|
|
|
|
|
|
|
|
| |
- correct sizes for the blocks : sec signature 65, ecies sklen 16, keylength 32
- added allocation to Xor (should be optimized later)
- no pubkey reader needed, just do with copy
- restructuring now into INITIATE, RESPOND, COMPLETE -> newSession initialises the encryption/authentication layer
- crypto identity can be part of client identity, some initialisation when server created
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
The test now checks that the number of addresses is correct
and terminates cleanly.
|
|
|
|
| |
It had been moved to Peer, probably for debugging.
|
|
|
|
|
|
|
|
|
|
|
|
| |
...and make it a top-level function instead.
The original idea behind having EncodeMsg in the interface was that
implementations might be able to encode RLP data to their underlying
writer directly instead of buffering the encoded data. The encoder
will buffer anyway, so that doesn't matter anymore.
Given the recent problems with EncodeMsg (copy-pasted implementation
bug) I'd rather implement once, correctly.
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| | |
p2p: fix decoding of disconnect reason
|
| | |
|
| |
| |
| |
| | |
Test-tastic.
|
| | |
|
| | |
|
| | |
|
|/ |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Whoa, one more big commit. I didn't manage to untangle the
changes while working towards compatibility.
|
| |
|
| |
|