Realization and Utilization of high-BW TCP on real applications
Kei Hiraki
Data Reservoir / GRAPE-DR project, The University of Tokyo


Computing System for real Scientists
Fast CPU, huge memory and disks, good graphics: cluster technology, DSM technology, graphics processors, and Grid technology.
Very fast remote file accesses: global file systems, data-parallel file systems, replication facilities.
Transparency to local computation: no complex middleware, and no or only small modifications to existing software.
Real scientists are not computer scientists, and computer scientists are not workforces for real scientists.

Objectives of Data Reservoir / GRAPE-DR (1)
Sharing scientific data between distant research institutes: physics, astronomy, earth science, and simulation data.
Very high-speed single-file transfer on a Long Fat pipe Network: > 10 Gbps, > 20,000 km, > 400 ms RTT.
High utilization of the available bandwidth: transferred file data rate > 90% of the available bandwidth, including header overheads and initial negotiation overheads.
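To make the "> 10 Gbps, > 20,000 km, > 400 ms RTT" target concrete, the sketch below (not from the original slides; it only reuses the target figures quoted above) computes the bandwidth-delay product, i.e. how much unacknowledged data a single TCP stream must keep in flight to fill such a path.

```c
/* Bandwidth-delay product for the Data Reservoir target path.
 * Sketch only: the 10 Gbps / 400 ms figures are the targets quoted
 * in the slides, not measured values. */
#include <stdio.h>

int main(void)
{
    double rate_bps = 10e9;      /* > 10 Gbps target                       */
    double rtt_s    = 0.400;     /* > 400 ms RTT on a ~20,000 km path      */

    double bdp_bytes   = rate_bps * rtt_s / 8.0;
    double mtu_payload = 1448.0; /* TCP payload of a 1500-byte MTU segment
                                    (20 B IP + 32 B TCP with timestamps)   */

    printf("BDP: %.0f MB in flight\n", bdp_bytes / 1e6);
    printf("cwnd: ~%.0f segments at 1500-byte MTU\n", bdp_bytes / mtu_payload);
    return 0;
}
```

Filling the pipe therefore needs a window of roughly 500 MB (several hundred thousand segments), which is why a single loss event followed by linear congestion-avoidance growth can take tens of minutes to repair, as the later slides report.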


Objectives of Data Reservoir / GRAPE-DR (2)
GRAPE-DR: a very high-speed attached processor for a server (2004 - 2008), successor of the GRAPE-6 astronomical simulator.
2 PFLOPS on a 128-node cluster system: 1 GFLOPS / processor, 1024 processors / chip, 8 chips / PCI card, 2 PCI cards / server, 2M processors / system.
(Diagram: GRAPE-6 and scientific data sources connected over a very-high-speed network to Data Reservoir nodes for data analysis at the University of Tokyo: Belle experiments, the X-ray astronomy satellite ASCA (Asuka), the SUBARU Telescope, Nobeyama Radio Observatory (VLBI), nuclear experiments, and the Digital Sky Survey.)

Basic Architecture
Data-intensive scientific computation through global networks: Data Reservoir nodes at each site provide local file accesses and distributed shared files (a DSM-like architecture) over a high-latency, very-high-bandwidth network, using cache disks, disk-block-level access, and parallel, multi-stream transfer between sites.

File accesses on Data Reservoir (diagram): scientific detectors and user programs access file servers through an IP switch; file servers stripe data across disk servers (1st-level striping), and each disk server stripes across its disks (2nd-level striping); disks are accessed by iSCSI; the servers are IBM x345 (2.6 GHz x 2); global data transfer runs between Data Reservoir sites.

Problems found in the 1st-generation Data Reservoir
Low TCP bandwidth due to packet losses: TCP congestion window size control recovers very slowly from the fast-recovery phase (> 20 min).
Unbalance among parallel iSCSI streams, caused by packet scheduling in switches and routers; users and other network users are interested only in the total behavior of the parallel TCP streams.
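The slides only name the two striping levels; the sketch below is a hypothetical illustration (the stripe widths and names are invented, not taken from the Data Reservoir implementation) of how a logical block address could be mapped first across file servers and then across the disks behind each disk server, with the final access issued over iSCSI.

```c
/* Hypothetical two-level striping map: logical block -> (file server,
 * disk, local block).  Stripe widths are illustrative only. */
#include <stdio.h>

#define FILE_SERVERS   4   /* 1st-level striping across file servers     */
#define DISKS_PER_SRV  8   /* 2nd-level striping across a server's disks */

struct location {
    int  file_server;   /* which file server owns the stripe            */
    int  disk;          /* which disk behind that server's disk server  */
    long local_block;   /* block number on that disk (iSCSI LBA)        */
};

static struct location map_block(long logical_block)
{
    struct location loc;
    loc.file_server = (int)(logical_block % FILE_SERVERS);     /* 1st level */
    long per_server = logical_block / FILE_SERVERS;
    loc.disk        = (int)(per_server % DISKS_PER_SRV);       /* 2nd level */
    loc.local_block = per_server / DISKS_PER_SRV;
    return loc;
}

int main(void)
{
    for (long lb = 0; lb < 10; lb++) {
        struct location l = map_block(lb);
        printf("logical %ld -> server %d, disk %d, LBA %ld\n",
               lb, l.file_server, l.disk, l.local_block);
    }
    return 0;
}
```

Because the mapping is purely arithmetic, each disk server can issue its iSCSI reads and writes independently, which is what enables the disk-block-level, parallel multi-stream transfer described above.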

Fast Ethernet vs. GbE
iperf over 30 seconds: minimum and average throughput with Fast Ethernet exceed those with GbE. (Figure: FE vs. GbE throughput traces.)
The packet transmission rate is bursty: transmission happens within 20 ms of a 200 ms RTT, the sender is idle for the remaining 180 ms, and packet losses occur.
Packet spacing, the ideal story: transmit one packet every RTT/cwnd, i.e. a 24 µs interval for 500 Mbps with a 1500-byte MTU. This is a high load for a software-only implementation, but the overhead stays low because pacing is needed only in the slow-start phase. (Figure: packets spaced at RTT/cwnd intervals across one RTT.)
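As a rough sketch of the packet-spacing idea (an illustration, not the modified sender actually used in the experiments), the code below computes the RTT/cwnd interval and reproduces the 24 µs figure quoted for 500 Mbps with a 1500-byte MTU.

```c
/* Packet spacing: send one packet every RTT/cwnd instead of a burst.
 * Sketch only; the figures are the example values from the slide. */
#include <stdio.h>

int main(void)
{
    double rtt_s    = 0.200;      /* 200 ms RTT                  */
    double mtu_bits = 1500 * 8;   /* 1500-byte MTU               */
    double rate_bps = 500e6;      /* target 500 Mbps             */

    /* cwnd (in packets) needed to sustain the target rate */
    double cwnd_pkts = rate_bps * rtt_s / mtu_bits;

    /* interval between packets if they are spread over the whole RTT */
    double gap_s = rtt_s / cwnd_pkts;   /* equals mtu_bits / rate_bps */

    printf("cwnd ~= %.0f packets, inter-packet gap = %.1f us\n",
           cwnd_pkts, gap_s * 1e6);
    return 0;
}
```

Spacing packets 24 µs apart is expensive to do in software alone, but because the worst burstiness occurs during slow start, limiting pacing to that phase keeps the overhead low; the IPG (inter-packet gap) settings on the next slide achieve a similar effect at the MAC layer.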

Example case of an 8-byte IPG: fast retransmit succeeds and the transition to congestion avoidance is smooth, but congestion avoidance takes 28 minutes to recover to 550 Mbps.
Best case, a 1023-byte IPG: behaves like the Fast Ethernet case, with a proper transmission rate; spurious retransmits occur due to reordering.
Unbalance within parallel TCP streams: unbalance among parallel iSCSI streams is caused by packet scheduling in switches and routers. This is meaningless unfairness among the streams, since users and other network users care only about the total behavior of the parallel TCP streams.
Our approach: keep the total of the per-stream congestion windows (Σ cwnd_i) constant, for fair TCP network usage toward other users, and balance the individual cwnd_i by communicating among the parallel TCP streams. (Figure: per-stream bandwidth vs. time, before and after balancing.)
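A minimal sketch of the balancing idea, under the interpretation above (the aggregate window is held constant while the per-stream shares are equalized); the snapshot values and the rebalance policy are illustrative, not the actual Data Reservoir mechanism.

```c
/* Rebalance per-stream congestion windows while keeping the aggregate
 * constant, so the parallel transfer stays fair to other traffic.
 * Illustrative sketch only. */
#include <stdio.h>

#define NSTREAMS 4

int main(void)
{
    /* Hypothetical snapshot: switches/routers have skewed the streams. */
    double cwnd[NSTREAMS] = { 120.0, 30.0, 90.0, 60.0 };   /* in packets */

    double total = 0.0;
    for (int i = 0; i < NSTREAMS; i++)
        total += cwnd[i];

    /* Keep the aggregate window unchanged, but move window credit from
     * fast streams toward slow ones in small steps. */
    double target = total / NSTREAMS;
    double step   = 0.1;              /* move 10% of the gap per round */
    for (int i = 0; i < NSTREAMS; i++)
        cwnd[i] += step * (target - cwnd[i]);

    double after = 0.0;
    for (int i = 0; i < NSTREAMS; i++) {
        printf("stream %d: cwnd %.1f\n", i, cwnd[i]);
        after += cwnd[i];
    }
    printf("aggregate before %.1f, after %.1f (unchanged)\n", total, after);
    return 0;
}
```

Because every stream moves toward the common target by the same fraction, the corrections cancel and the aggregate window, and hence the load seen by other users of the path, stays constant.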

3rd-Generation Data Reservoir
Hardware and software basis for 100 Gbps distributed data-sharing systems: 10 Gbps disk data transfer by a single Data Reservoir server; transparent support for multiple filesystems (detection of modified disk blocks); hardware (FPGA) implementation of inter-layer coordination mechanisms; a 10 Gbps Long Fat pipe Network emulator and a 10 Gbps data logger.

Utilization of a 10 Gbps network: a single-box 10 Gbps Data Reservoir server.
Quad Opteron server with multiple PCI-X buses (prototype: Sun V40z), two Chelsio T110 TCP-offloading NICs, disk arrays for the necessary disk bandwidth, and the Data Reservoir software (iSCSI daemon, disk driver, data transfer manager).
(Diagram: quad Opteron server (Sun V40z) running Linux 2.6.6, with Chelsio T110 TCP NICs and Ultra320 SCSI adaptors on separate PCI-X buses, connected by 10GBASE-SR to a 10G Ethernet switch.)

Tokyo-CERN experiment (Oct. 2004)
CERN - Amsterdam - Chicago - Seattle - Tokyo over SURFnet - CAnet 4 - IEEAF/Tyco - WIDE: an 18,500 km WAN PHY connection.
Performance results: 7.21 Gbps (TCP payload) with standard Ethernet frame size, iperf; 7.53 Gbps (TCP payload) with 8K jumbo frames, iperf; 8.8 Gbps disk-to-disk performance with 9 servers, 36 disks, and 36 parallel TCP streams.
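For context on the iperf numbers, the sketch below computes a wire-level ceiling on TCP payload rate for standard and jumbo frames over 10 Gigabit Ethernet. This is simple per-frame overhead accounting under stated assumptions (the "8K jumbo frame" is treated as an 8192-byte MTU, and the WAN PHY/OC-192 framing actually used on this path is ignored); the measured 7.2-8.8 Gbps figures were further limited by the end systems' I/O buses, as later slides note.

```c
/* Ceiling on TCP payload throughput over 10GbE for a given MTU.
 * Rough accounting: Ethernet preamble+IFG+header+FCS = 38 B/frame,
 * IP header 20 B, TCP header 20 B + 12 B timestamp option. */
#include <stdio.h>

static double payload_fraction(double mtu)
{
    double eth_overhead = 38.0;        /* preamble 8 + IFG 12 + MAC 14 + FCS 4 */
    double tcpip_hdrs   = 20.0 + 32.0; /* IP + TCP with timestamps             */
    return (mtu - tcpip_hdrs) / (mtu + eth_overhead);
}

int main(void)
{
    double line_rate = 10e9;           /* 10GbE LAN-PHY line rate */
    printf("1500 B MTU: %.2f Gbps payload ceiling\n",
           line_rate * payload_fraction(1500.0) / 1e9);
    printf("8192 B MTU: %.2f Gbps payload ceiling\n",
           line_rate * payload_fraction(8192.0) / 1e9);
    return 0;
}
```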

Network topology of the CERN-Tokyo experiment
Path: Tokyo - Seattle - Vancouver - Calgary - Minneapolis - Chicago - Amsterdam - CERN (Geneva), over IEEAF, CANARIE (CAnet 4), and SURFnet; the end systems connect through L1 or L2 switches. (Figure: network used in the experiment.)
Data Reservoir at the University of Tokyo: IBM x345 servers (dual Intel Xeon 2.4 GHz, 2 GB memory, Linux 2.6.6 on Nos. 2-7, Linux 2.4.x on No. 1) and Opteron servers (dual Opteron 248, 2.2 GHz, 1 GB memory, Linux 2.6.6 on Nos. 2-6) with Chelsio T110 NICs, connected over GbE through a Fujitsu XG800 12-port switch and a Foundry BI MG8.
Data Reservoir at CERN (Geneva): the same IBM x345 and Opteron/Chelsio T110 configuration, connected over GbE through a Fujitsu XG800, a Foundry FEX, a Foundry NetIron 40G, and an Extreme Summit 400.
The wide-area path runs over 10GBASE-LW through T-LEX, Pacific Northwest Gigapop, StarLight, and NetherLight on WIDE / IEEAF, CAnet 4, and SURFnet.

LSR experiments
Target: > 30,000 km LSR distance, with L3 switching at Chicago and Amsterdam.
Period of the experiment: 12/20 - 1/3, the holiday season, when public research networks are relatively vacant.
System configuration: a pair of Opteron servers with Chelsio T110 NICs (at N-Otemachi), another pair of Opteron servers with Chelsio T110 NICs for competing-traffic generation, and a ClearSight 10 Gbps packet analyzer for packet capturing.

Single-stream TCP path: Tokyo - Chicago - Amsterdam - New York - Chicago - Tokyo.
(Figure 2: network used in the experiment. Tokyo to Chicago runs from the University of Tokyo through T-LEX, Seattle (Pacific Northwest Gigapop), Vancouver, Calgary, and Minneapolis on IEEAF/Tyco/WIDE and CANARIE (CAnet 4) to StarLight; Chicago to Amsterdam and Amsterdam to New York (MANLAN) run on SURFnet via NetherLight and the University of Amsterdam; New York to Chicago runs on Abilene; Chicago back to Tokyo runs on APAN/JGN2 (TransPAC) and WIDE. Equipment along the path includes Force10 E1200 and E600, Foundry NetIron 40G, ONS 15454 and OME 6550 optical gear, HDXc and T640 core nodes, Cisco 12416 and Cisco 6509 routers, Procket 8801 and 8812 routers, and a Fujitsu XG800 switch; the end systems are Opteron servers with Chelsio T110 NICs plus a ClearSight 10 Gbps packet-capture system; links are OC-192 and 10GbE WAN PHY.)
Network traffic on routers and switches during the submitted run was observed at StarLight (Force10 E1200), the University of Amsterdam (Force10 E600), Abilene (T640, NYCM to CHIN), and TransPAC (Procket 8801).


Summary: single-stream TCP
We removed the TCP-related difficulties; now I/O bus bandwidth is the bottleneck, and cheap, simple servers can enjoy a 10 Gbps network.
There is a lack of methodology for high-performance network debugging: 3 days of debugging (working overnight) yield 1 day of stable operation (usable for measurements). The network seems to "feel fatigue" and some trouble always happens; we need something more effective.
Detailed issues: flow control (and QoS), buffer size and policy, optical-level settings.
(Photos: systems used in the long-distance TCP experiments at CERN, Pittsburgh, and Tokyo.)
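On the "buffer size and policy" point, here is a minimal sketch of sizing a sender's socket buffer to the bandwidth-delay product using standard BSD socket calls. The path figures are the ones used throughout these slides, and whether the kernel grants such a size depends on its configured limits (e.g. net.core.wmem_max and tcp_wmem on Linux); this is not the tuning procedure from the experiments themselves.

```c
/* Size the TCP send buffer to the bandwidth-delay product of the path.
 * Sketch: 10 Gbps x 400 ms, roughly 500 MB; the kernel may grant less
 * than requested depending on its configured limits. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    double rate_bps = 10e9;
    double rtt_s    = 0.400;
    int bdp = (int)(rate_bps * rtt_s / 8.0);    /* ~500 MB */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bdp, sizeof(bdp)) < 0)
        perror("setsockopt(SO_SNDBUF)");

    int actual;
    socklen_t len = sizeof(actual);
    getsockopt(s, SOL_SOCKET, SO_SNDBUF, &actual, &len);
    printf("requested %d bytes, kernel granted %d bytes\n", bdp, actual);
    return 0;
}
```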

Efficient and effective utilization of the high-speed Internet
Efficient and effective utilization of a 10 Gbps network is still very difficult.
PHY, MAC, data link, and switches: 10 Gbps is ready to use.
Network interface adaptors: 8 Gbps is ready to use, 10 Gbps within several months; proper offloading and RDMA implementations are needed.
I/O bus of a server: 20 Gbps is necessary to drive a 10 Gbps network.
Drivers and operating system: too many interrupts, buffer memory management.
File system: slow NFS service, consistency problems.

Difficulties in the 10 Gbps Data Reservoir (disk-to-disk, single-stream TCP data transfer)
High CPU utilization (performance is limited by the CPU): too many context switches, too many interrupts from the network adaptor (> 30,000/s), and data copies from buffer to buffer.
I/O bus bottleneck: PCI-X/133 allows at most 7.6 Gbps of data transfer; we are waiting for PCI-X/266 or PCI Express x8/x16 NICs.
Disk performance: performance limits of the RAID adaptor, and the number of disks needed for the transfer (> 40 disks are required).
File system: high bandwidth in file service is more difficult than data sharing.

High-speed IP networks in supercomputing (GRAPE-DR project)
World's fastest computing system: 2 PFLOPS in 2008 (performance on actual application programs).
Construction of a general-purpose massively parallel architecture with low power consumption at PFLOPS-range performance; an MPP architecture more general-purpose than vector architectures.
Use of commodity networks for the interconnect: 10 Gbps optical network (2008) plus MEMS switches, 100 Gbps optical network (2010).
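The two bottleneck figures quoted above (the ~7.6 Gbps PCI-X limit and the > 30,000 interrupts/s) can be sanity-checked with simple arithmetic; the sketch below does so, with the bus-efficiency factor and the interrupt-coalescing batch size being assumed illustrative values, not figures from the slides.

```c
/* Back-of-the-envelope check of the I/O-bus and interrupt-rate limits.
 * The 90% PCI-X efficiency and 32-frame coalescing batch are assumed
 * illustrative values, not measurements from the slides. */
#include <stdio.h>

int main(void)
{
    /* PCI-X/133: 64-bit bus clocked at 133 MHz */
    double pcix_raw = 64.0 * 133e6;               /* ~8.5 Gbps raw              */
    double pcix_eff = pcix_raw * 0.90;            /* protocol overhead -> ~7.6 Gbps */

    /* Interrupt rate at 10 Gbps with 1500-byte frames */
    double frames_per_s   = 10e9 / (1500.0 * 8.0); /* ~833k frames/s            */
    double irq_uncoalesced = frames_per_s;
    double irq_coalesced   = frames_per_s / 32.0;  /* one IRQ per 32 frames     */

    printf("PCI-X/133: %.1f Gbps raw, ~%.1f Gbps effective\n",
           pcix_raw / 1e9, pcix_eff / 1e9);
    printf("frames/s at 10 Gbps, 1500 B: %.0f\n", frames_per_s);
    printf("interrupts/s: %.0f uncoalesced, %.0f with 32-frame coalescing\n",
           irq_uncoalesced, irq_coalesced);
    return 0;
}
```

Even with aggressive coalescing the interrupt rate stays in the tens of thousands per second, which is why the slides point to offloading and eventually FPGA-level coordination rather than software tuning alone.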

Hierarchical construction of GRAPE-DR
512 PE / chip, 512 GFLOPS / chip; 2K PE / PCI board, 2 TFLOPS / PCI board; 8K PE / server, 8 TFLOPS / server; 1M PE / node, 1 PFLOPS / node; 2M PE / system, 2 PFLOPS / system.
Network architecture inside a GRAPE-DR system (diagram): AMD-based servers with processing elements attached to the memory bus, an adaptive compiler, a 100 Gbps iSCSI connection to the outside IP network, a MEMS-based optical switch, an IP storage system, a "total system conductor" for dynamic optimization, and a highly functional router (Fujitsu Computer Technologies, Ltd.).
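The hierarchy above is a chain of multiplications; the short check below walks it level by level. The 1 GFLOPS per PE is implied by 512 PE / 512 GFLOPS per chip, and the chips-per-board, boards-per-server, and servers-per-node counts are inferred from the PE totals at each level on this slide (they differ from the 2004 plan quoted earlier).

```c
/* Multiply out the GRAPE-DR hierarchy: chip -> PCI board -> server ->
 * node -> system, checking that 2M PE at ~1 GFLOPS/PE gives ~2 PFLOPS. */
#include <stdio.h>

int main(void)
{
    long pe_per_chip       = 512;
    long chips_per_board   = 4;    /* 2K PE / board  = 4 chips     */
    long boards_per_server = 4;    /* 8K PE / server = 4 boards    */
    long servers_per_node  = 128;  /* 1M PE / node   = 128 servers */
    long nodes_per_system  = 2;    /* 2M PE / system = 2 nodes     */

    long pe_total = pe_per_chip * chips_per_board * boards_per_server
                  * servers_per_node * nodes_per_system;

    double gflops_per_pe = 512.0 / 512.0;   /* 512 GFLOPS per 512-PE chip */
    printf("%ld PE total -> %.1f PFLOPS\n",
           pe_total, pe_total * gflops_per_pe / 1e6);
    return 0;
}
```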
